The rise of Large Language Models (LLMs) has produced powerful new developer tools, but growing reliance on these cloud-based services also poses a significant security risk. For instance, code completion queries sent from a developer's machine can leak distinctive statistical patterns, potentially enabling the service provider or an adversary to reconstruct the developer's proprietary codebase. One of JetBrains Research's core goals is to build the knowledge and tools needed to integrate AI into IDEs safely and transparently, primarily by exploring effective privacy-preserving techniques.
The project will research novel privacy-preserving techniques that prevent statistical leakage and query reconstruction in cloud-based LLMs, so that developers' code remains secure.
Good understanding of machine learning fundamentals and model evaluation
Strong background in statistics, probability and linear algebra
Strong coding skills in Python and familiarity with ML prototyping
Familiarity with the basics of training and evaluating LLMs
Understanding of formal privacy models or information theory