JetBrains Marketplace hosts thousands of plugins, but we lack a reliable, scalable way to (1) detect near-duplicate plugins, (2) quantify real code changes between releases, and (3) validate some author claims. This creates risk (disguised updates, IP theft), slows reviews, and in some cases blocks safe auto-approvals.
Proposal: compute a privacy-preserving structural fingerprint for every plugin binary and use it for similarity search and change analysis, integrated into submission review.
We have an existing static analyzer, Plugin Verifier, which can be used as the extraction layer.
As our team member, you will work on:
Generating a hashed, non-reversible “Code DNA” from the plugin bytecode structure (classes, signatures, inheritance, API references), using the existing Plugin Verifier (PV) model and its output as the feature source
Storing per-version fingerprints and computing similarity across Marketplace and churn between releases
Integrating the tool into the Marketplace processes: flagging large or suspicious updates and surfacing the most similar plugins and the API footprint
Requirements:
Proficiency in Kotlin or Java and basic concurrency
Knowledge of set-similarity and hashing techniques and of approximate nearest neighbor algorithms
Familiarity with Plugin Verifier or JVM class/bytecode reading concepts is a plus
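To give a feel for the fingerprinting and similarity tasks described above, here is a minimal Java sketch of one possible approach: salted hashing of structural features followed by a MinHash signature, which estimates Jaccard set similarity. This is an illustration under assumptions, not the project's design. The class name, the salt, the signature length, and the feature strings are all hypothetical, and in practice the features would come from Plugin Verifier output rather than hand-written strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Set;

// Illustrative sketch only: all names, the salt, and the feature strings are hypothetical.
class CodeDnaSketch {
    static final int NUM_HASHES = 128;                 // signature length (assumed)
    static final String SALT = "example-secret-salt";  // placeholder; a real salt would be kept private

    // Salted SHA-256 of a feature string, truncated to the first 64 bits.
    static long baseHash(String feature) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(SALT.getBytes(StandardCharsets.UTF_8));
            byte[] digest = md.digest(feature.getBytes(StandardCharsets.UTF_8));
            long value = 0;
            for (int i = 0; i < 8; i++) {
                value = (value << 8) | (digest[i] & 0xFFL);
            }
            return value;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // 64-bit finalizer (SplitMix64 constants), used to derive many hash functions from one base hash.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // MinHash signature: for each derived hash function, keep the minimum value over all features.
    static long[] signature(Set<String> features) {
        long[] sig = new long[NUM_HASHES];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String feature : features) {
            long base = baseHash(feature);
            for (int i = 0; i < NUM_HASHES; i++) {
                long h = mix(base + (i + 1) * 0x9E3779B97F4A7C15L);
                if (h < sig[i]) {
                    sig[i] = h;
                }
            }
        }
        return sig;
    }

    // Fraction of matching slots, an estimate of the Jaccard similarity of the two feature sets.
    static double similarity(long[] a, long[] b) {
        int equal = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                equal++;
            }
        }
        return (double) equal / a.length;
    }

    public static void main(String[] args) {
        // Hypothetical features for two releases of the same plugin.
        Set<String> v1 = Set.of(
                "class:com.example.MyAction",
                "super:com.intellij.openapi.actionSystem.AnAction",
                "method:actionPerformed(AnActionEvent)");
        Set<String> v2 = Set.of(
                "class:com.example.MyAction",
                "super:com.intellij.openapi.actionSystem.AnAction",
                "method:actionPerformed(AnActionEvent)",
                "method:update(AnActionEvent)");
        double sim = similarity(signature(v1), signature(v2));
        System.out.println("estimated similarity: " + sim);
        System.out.println("estimated churn: " + (1.0 - sim));
    }
}
```

Only the fixed-length signature would need to be stored per plugin version. Comparing the signatures of two releases gives a rough churn estimate, and comparing against other plugins' signatures gives candidates for near-duplicate review. Whether salted hashing alone is enough for the "non-reversible" goal is an open design question that this sketch does not settle.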