
Development process mining

Description

Motivation

IDE development doesn't happen purely in code. In fact, the process mostly consists of feature requests and bug reports; writing and reviewing code is only a small part of it, aimed at goals set by one of the former.

While we keep a long record of interactions with our users, we seem to underutilize that data. The problem is that the sheer number of interactions happening every day makes it challenging to extract insight about what is important and what is not.

If a ticket gets a lot of upvotes, does it mean that users at scale will in fact benefit if we implement the feature? How does the development cost compare to the loss from ignoring the issue? What if we observe a lot of negative comments that mention refusing to renew the license? Do these comments translate into an actual loss for the business, or are they just a small fraction of loud, angry users?

These are questions that everyone has an opinion about, and acts upon, but we could answer them with data.

Baseline

The baseline approach that we have in mind is to merge our issue tracker data with various business and usage metrics. We want to establish causal links between comments/issues sentiment and future perceived quality, expressed in either usage patterns or purchasing behaviour.
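As a minimal sketch of that merge (all table and column names here are hypothetical, not our actual schema), one could aggregate per-product comment sentiment and join it with a business metric:

```python
import pandas as pd

# Hypothetical issue-tracker export: one row per comment, with a
# precomputed sentiment score in [-1, 1].
comments = pd.DataFrame({
    "issue_id": [101, 101, 102, 103],
    "product": ["IDEA", "IDEA", "PyCharm", "IDEA"],
    "sentiment": [-0.8, -0.4, 0.6, 0.1],
})

# Hypothetical business metric: per-product license renewal rate.
renewals = pd.DataFrame({
    "product": ["IDEA", "PyCharm"],
    "renewal_rate": [0.91, 0.95],
})

# Average sentiment per product, joined with the business metric,
# so the two signals can be examined side by side.
per_product = (
    comments.groupby("product", as_index=False)["sentiment"].mean()
    .merge(renewals, on="product")
)
print(per_product)
```

Establishing an actual causal link would of course require much more than a join, but this is the shape of the starting dataset.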

Our intuition is that users who visit our issue tracker are the most loyal ones, so their perspective should generalize to a wider audience.

The main focus of this project is on user sentiment expressed in free-form text. Consequently, some feature engineering workload is expected here. We specifically don't impose any limitations on "how" the resulting model should work. We can start small with some simple indicators and a regression, and incrementally go all the way up to a sophisticated LLM-powered survival analysis.
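The "simple indicators and a regression" starting point might look like the toy sketch below: entirely synthetic data, with sentiment-derived features predicting a renewal outcome via logistic regression.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Entirely synthetic data: one row per user, with simple
# sentiment-derived indicators as features.
rng = np.random.default_rng(0)
n = 200
mean_sentiment = rng.uniform(-1, 1, n)  # average sentiment of a user's comments
n_negative = rng.poisson(2, n)          # count of that user's negative comments
X = np.column_stack([mean_sentiment, n_negative])

# Synthetic target: renewal is (noisily) more likely with positive sentiment.
renewed = (mean_sentiment + rng.normal(0, 0.5, n) > 0).astype(int)

model = LogisticRegression().fit(X, renewed)
print(model.coef_)  # the sign and size of the sentiment coefficient are the quantities of interest
```

Nothing here is specific to our data; it only illustrates the lower end of the modeling spectrum the paragraph describes.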

One notable property of issue tracker data is that it contains reports for our EAP builds. These builds are testing grounds: teams try out different changes in EAP builds, collect feedback, and either proceed with the changes or postpone them. EAP builds are free for everyone to use, and users opt in to sending usage statistics on them.

This makes EAPs both:
• Feedback-rich: we have both explicit feedback from the issue tracker and implicit feedback from usage statistics. Although we have no direct correspondence between issue creators/commenters and IDE users, we can still assess the prevalence of some large-scale behaviour.
• Predictive: the quality of EAP builds, the feedback on them, and the reaction to that feedback directly influence what will be shipped in future release versions and how it might be perceived.
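To make the "no per-user link, only large-scale behaviour" point concrete, a per-build comparison of the two feedback channels could be sketched like this (build names and columns are invented for illustration):

```python
import pandas as pd

# Hypothetical explicit feedback: issues filed against EAP builds.
issues = pd.DataFrame({
    "build": ["2025.1 EAP1", "2025.1 EAP1", "2025.1 EAP2"],
    "is_regression": [True, False, True],
})

# Hypothetical implicit feedback: opt-in usage statistics per build.
usage = pd.DataFrame({
    "build": ["2025.1 EAP1", "2025.1 EAP2"],
    "median_session_min": [42.0, 31.5],
})

# Place the per-build regression rate next to the usage signal;
# the comparison is at the build level, never the user level.
signals = (
    issues.groupby("build", as_index=False)["is_regression"].mean()
    .rename(columns={"is_regression": "regression_rate"})
    .merge(usage, on="build")
)
print(signals)
```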

Extensions

We want to start this project with something rather limited and focused, but the scope can be extended with backward-looking indicators, such as support requests or social media posts, along with harder-to-extract forward-looking sources like internal discussions. That said, this is a pilot project, so we want to validate the initial approach before diving into more sophisticated data sources.

What's in it for an intern?

This project is an opportunity to work on research with potential impact on JetBrains' day-to-day operations. The work will be done in close collaboration with the data analytics team, which has both rich research experience and accumulated domain knowledge.

Requirements

Candidates should have at least:

  • Basic NLP knowledge or curiosity

  • Experience in time series analysis

Admission

Internship projects 2025-2026

Contact details

internship@jetbrains.com

Preferred internship location

Armenia
Cyprus
Czechia
Germany
Netherlands
Poland
Serbia
UK

Technologies

Natural languages
SQL
Virtualisation

Area

Data Science
Machine Learning
Research

Internship timing preferences

Flexible start

Candidate graduation status

Final-year students preferred

Additional information

Math-heavy
Potential thesis