The Mellum team aims to train the best possible code models for AI Assistant features, including completion, next-edit suggestion, AI chat, and agents. Our key constraint is inference efficiency: we want to offer our models to users at no additional cost. This project explores knowledge distillation methods to elicit higher-quality responses from small models.
All development will be done in our Mellum code repository, where we use NVIDIA Nemotron-Bridge for training, interfacing with a Kubernetes (k8s) GPU cluster.
Apply knowledge distillation in both pre- and post-training settings.
Survey and interpret literature on knowledge distillation.
Design and run experiments on real models with real training infrastructure.
Evaluate results and iterate on findings.
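To make the core technique concrete, below is a minimal sketch of classic soft-target knowledge distillation (Hinton-style): the student is trained on a mix of temperature-softened teacher probabilities and the usual hard-label cross-entropy. It is written in plain NumPy for illustration; the function names, the temperature `T=2.0`, and the mixing weight `alpha=0.5` are illustrative assumptions, not our production setup.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KD: alpha * KL(teacher || student) at temperature T
    plus (1 - alpha) * cross-entropy on the hard labels."""
    p_teacher = softmax(teacher_logits, T)          # softened teacher targets
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    # KL term, scaled by T^2 so its gradient magnitude matches the CE term
    kd = (p_teacher * (np.log(p_teacher + 1e-12) - log_p_student)) \
        .sum(axis=-1).mean() * T ** 2
    # standard hard-label cross-entropy at T = 1
    log_probs = np.log(softmax(student_logits) + 1e-12)
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    return alpha * kd + (1 - alpha) * ce
```

In practice the same loss shape carries over to token-level distillation of language models, where the distributions are over the vocabulary at each position; when the teacher and student agree exactly, the KL term vanishes and only the hard-label term remains.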
Solid foundational understanding of deep learning.
Experience with transformer-based language models and frameworks such as PyTorch.
Good programming skills in Python.
Working proficiency with GPU clusters.