
What Is the Mellum2 AI Model?
Mellum2, released by JetBrains, is a 12-billion-parameter model specialized for software engineering tasks. It uses a Mixture-of-Experts (MoE) architecture that activates only a fraction of those parameters for each token. This keeps inference cheap enough for the model to run as one component in multi-model AI pipelines. It targets tasks such as code generation, debugging, and conversational programming assistance.
According to JetBrains, Mellum2 serves as a “focal model” within larger AI systems. It provides specialized capabilities that complement larger general-purpose frontier models rather than replacing them. It is released as open source under the Apache 2.0 license, which permits commercial use, modification, and redistribution.
How Does the Mixture-of-Experts Architecture Work in Mellum2?
Mellum2 has 12 billion parameters in total but activates only about 2.5 billion for each token. Its per-token compute is therefore closer to that of a 2.5B dense model than a 12B one, although all 12 billion parameters must still be held in memory. A router makes this possible by sending each token to 8 of the model's 64 experts and skipping the other 56. Individual experts can then specialize while each forward pass stays cheap.
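The routing described above can be sketched as generic top-k gating in plain Python. The sketch uses the 64-expert, top-8 figures from the text. Everything else is an illustrative assumption and not JetBrains' implementation: the router details (a softmax over only the selected logits), the toy experts that just scale their input, and the 4-dimensional hidden vector.

```python
import math
import random

# Figures reported for Mellum2 in the text above.
NUM_EXPERTS = 64
TOP_K = 8
TOTAL_PARAMS = 12e9
ACTIVE_PARAMS = 2.5e9


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Generic top-k gating: keep the k highest-scoring experts and
    renormalize their scores into gate weights that sum to 1.
    (Assumption: Mellum2's exact router may differ.)"""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(x, router_logits, experts, k=TOP_K):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(x)
    for idx, gate in route(router_logits, k):
        y = experts[idx](x)  # the other experts are never called for this token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy usage: 64 hypothetical "experts" that just scale the input vector.
random.seed(0)
hidden = [0.5, -1.0, 2.0, 0.25]  # hypothetical 4-dim token representation
experts = [(lambda x, s=(e + 1) / NUM_EXPERTS: [s * xi for xi in x])
           for e in range(NUM_EXPERTS)]
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
output = moe_layer(hidden, logits, experts)
print(len(selected))  # 8
print(f"{TOP_K / NUM_EXPERTS:.1%} of experts, "
      f"{ACTIVE_PARAMS / TOTAL_PARAMS:.1%} of parameters active per token")
```

The last line prints 12.5% of experts against 20.8% of parameters. That gap is consistent with the usual MoE design, in which attention and embedding weights run for every token and only the expert layers are sparse. However, the source does not break Mellum2's parameter count down this way.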
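Mellum2 has 28 layers and uses Grouped-Query Attention (GQA) with 32 query heads. In GQA, several query heads share each key/value head, which shrinks the key/value cache. The model's context window is 131,072 tokens (128K). It also uses Sliding Window Attention, in which each token attends only to a fixed window of preceding tokens. This bounds the attention cost of very long inputs.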
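Both attention mechanisms can be sketched in a few lines of Python. The 32 query heads and the 131,072-token context come from the text. The number of key/value heads (8) and the window size (4,096) are hypothetical placeholders, since the source states neither.

```python
# Reported in the text above.
NUM_QUERY_HEADS = 32
CONTEXT_LENGTH = 131_072

# Hypothetical values for illustration only; the source gives neither.
NUM_KV_HEADS = 8
WINDOW = 4_096


def kv_head_for(query_head, n_q=NUM_QUERY_HEADS, n_kv=NUM_KV_HEADS):
    """Grouped-Query Attention: each block of n_q // n_kv consecutive
    query heads reads the same key/value head."""
    if n_q % n_kv != 0:
        raise ValueError("query heads must split evenly into K/V groups")
    return query_head // (n_q // n_kv)


def can_attend(q_pos, k_pos, window=WINDOW):
    """Causal sliding-window mask: a query sees its own position and the
    window - 1 positions before it, never future positions."""
    return 0 <= q_pos - k_pos < window


def keys_attended(q_pos, window=WINDOW):
    """How many key positions the query at q_pos can attend to."""
    return min(q_pos + 1, window)


print([kv_head_for(h) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
print(keys_attended(CONTEXT_LENGTH - 1))   # 4096 with the sliding window
print(keys_attended(CONTEXT_LENGTH - 1, window=CONTEXT_LENGTH))  # 131072, full causal attention
```

With these assumed numbers, the key/value cache stores 8 heads instead of 32, a 4x reduction. The last token of a full 131,072-token input also attends to 4,096 positions rather than 131,072. The source does not say whether all 28 layers use the sliding window or only some of them.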
What Are the Benchmark Results for Mellum2?
JetBrains reports that Mellum2 is competitive with other models in the 4B–14B parameter range across several benchmarks. On coding, it scored 78.4 on EvalPlus and 67.1 on MultiPL-E. On tool use, it scored 66.3 on BFCL v3, and on math, 41.7 on AIME 2025+2026.
JetBrains also reports strong results on knowledge and conversational tasks, including a score of 78.1 on the MMLU-Redux general-knowledge benchmark.
Frequently Asked Questions
What is the Mellum2 model used for?
Mellum2 is designed for software engineering tasks such as code generation, debugging, and conversational programming assistance. It is meant to serve as a specialized “focal” component within larger AI systems rather than replace general-purpose frontier models.
What is the architecture of Mellum2?
Mellum2 uses a Mixture-of-Experts architecture with 12 billion total parameters, of which about 2.5 billion (8 of 64 experts) are active per token. It has 28 layers and uses Grouped-Query Attention with 32 query heads and Sliding Window Attention. Its context window is 131,072 tokens.
How does Mellum2 perform in benchmarks?
According to JetBrains, Mellum2 is competitive with models in the 4B–14B range. Its reported scores are 78.4 on EvalPlus and 67.1 on MultiPL-E (coding), 66.3 on BFCL v3 (tool use), 41.7 on AIME 2025+2026 (math), and 78.1 on MMLU-Redux (knowledge).