The core focus of this internship is to enhance, evaluate, and scale an LLM-based Java-to-Kotlin (J2K) translation tool. The intern will be responsible for the following key tasks:
Pipeline Expansion: Expand the current evaluation pipeline with a robust set of paired examples sourced from GitHub: Java code together with its reference Kotlin translation ("Gold Kotlin").
Framework and LLM Integration: Scale the pipeline to evaluate a wider variety of agent frameworks and LLMs, including Claude Code, Codex, Gemini, various open-source models, and self-hosted offline models.
Evaluation & Dataset Creation: Analyze evaluation data (using methodologies similar to OpenAI's or Anthropic's evals) to identify areas where the models underperform, with the goal of building a reinforcement learning (RL) dataset for improving open-source models.
AST/PSI Analysis: Implement heuristic-based methods that find and pair corresponding code declarations between Java sources and their Kotlin translations, using abstract syntax trees (ASTs) or IntelliJ's Program Structure Interface (PSI), so that translation accuracy can be measured systematically.
(optional) Offline Model Testing: Host and evaluate local models to test the feasibility of shipping a fully offline version of the J2K translation tool that still delivers quality results.
Final Deliverables: By the end of the project, the intern should deliver a comprehensive evaluation setup for various agent frameworks, define specific performance metrics for the J2K tool, and implement tangible improvements to the tool through refined prompting, code adjustments, or a combination of both.
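To make the declaration-pairing task above concrete, here is a minimal sketch of the heuristic idea: key Java methods and Kotlin functions on name and parameter count, then pair the declarations that appear in both sources. All names here (the `DeclarationPairer` class, the regex patterns) are illustrative assumptions; a real implementation would walk actual syntax trees via a parser such as JavaParser, or IntelliJ's PSI, rather than rely on regular expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Crude regex stand-in for AST/PSI-based declaration pairing between a
// Java source and its Kotlin translation. Declarations are keyed on
// "name/arity" so same-name overloads with different arity stay distinct.
public class DeclarationPairer {

    // Matches Java method headers like "public int add(int a, int b)".
    // Deliberately naive: a real tool would use an AST, not a regex.
    private static final Pattern JAVA_METHOD = Pattern.compile(
        "\\b(?:public|protected|private)\\s+(?:static\\s+)?\\w+\\s+(\\w+)\\s*\\(([^)]*)\\)");

    // Matches Kotlin function headers like "fun add(a: Int, b: Int): Int".
    private static final Pattern KOTLIN_FUN = Pattern.compile(
        "\\bfun\\s+(\\w+)\\s*\\(([^)]*)\\)");

    private static Map<String, String> declarations(String source, Pattern pattern) {
        Map<String, String> decls = new LinkedHashMap<>();
        Matcher m = pattern.matcher(source);
        while (m.find()) {
            String params = m.group(2).trim();
            int arity = params.isEmpty() ? 0 : params.split(",").length;
            decls.put(m.group(1) + "/" + arity, m.group(0));
        }
        return decls;
    }

    /** Returns name/arity keys mapped to {javaHeader, kotlinHeader} pairs. */
    public static Map<String, String[]> pair(String javaSource, String kotlinSource) {
        Map<String, String> java = declarations(javaSource, JAVA_METHOD);
        Map<String, String> kotlin = declarations(kotlinSource, KOTLIN_FUN);
        Map<String, String[]> pairs = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : java.entrySet()) {
            if (kotlin.containsKey(e.getKey())) {
                pairs.put(e.getKey(), new String[] { e.getValue(), kotlin.get(e.getKey()) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String java = "class Calc { public int add(int a, int b) { return a + b; } }";
        String kotlin = "class Calc { fun add(a: Int, b: Int): Int = a + b }";
        // Prints the keys of the paired declarations: [add/2]
        System.out.println(pair(java, kotlin).keySet());
    }
}
```

Once declarations are paired this way, per-declaration accuracy metrics (e.g. compile success or semantic equivalence of each pair) can be aggregated across the Gold Kotlin dataset.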
The ideal candidate should have a mix of standard programming skills and specialized experience with AI models.
Core Requirements:
Solid proficiency in Java.
Solid proficiency in Kotlin.
Practical experience working with LLM Agents.
Strong Pluses:
Previous experience writing plugins for IntelliJ IDEA.
The ability to write custom skills or tools for Claude.
Nice-to-Haves:
Experience in building and running formal LLM evaluations (evals).
Knowledge of compilers and code structures (AST/PSI).