Large Language Models (LLMs) have a wide range of applications in the software engineering domain. In the AI Agents and Planning team, we focus on agentic systems, where LLMs can interact with external environments, such as IDEs or web browsers, and autonomously work through complex, open-ended tasks. With LLM-based agents, it becomes possible to tackle tasks resembling daily developer work (e.g., SWE-bench). However, building agents also brings technical challenges: with interactive environments instead of static data points, data preparation, training, and evaluation all require significant effort.
In this project, we invite you to work on technical tasks that arise in AI agent research, directly contributing to our team's ongoing projects (with potential publications). Your tasks could include:
Data filtering and preparation (e.g., gathering agent trajectories for further fine-tuning or extending the existing dataset)
Working on evaluation infrastructure (e.g., supporting new benchmarks and agentic scaffolds or scaling the existing infrastructure)
Fine-tuning models (e.g., SFT or DPO using established infrastructure)
Requirements:
Strong Python programming skills
Ability to work with existing codebases and learn unfamiliar frameworks
Experience with Docker and familiarity with data analysis and experiment tracking tools
Basic understanding of NLP, LLMs, and AI agents
Would be a plus:
Experience with our tech stack:
Agentic scaffolds: LangGraph
Training: VeRL & LLaMA-Factory
Inference: vLLM
Orchestration: Kubernetes, ZenML
Experience with AI agents
Experience with LLM fine-tuning and evaluation
Relevant publications