Add Mellum (Mellum 2) model support by jedisct1 · Pull Request #1339 · ml-explore/mlx-lm

jedisct1 · 2026-06-02T09:00:05Z

Add support for the Mellum (Mellum 2) architecture

This adds a mellum.py model implementation so mlx-lm can load and run JetBrains'
Mellum 2 models, e.g. JetBrains/Mellum2-12B-A2.5B-Thinking
and JetBrains/Mellum2-12B-A2.5B-Instruct
(model_type: mellum, MellumForCausalLM).

Architecture

Mellum 2 is a Mixture-of-Experts model in the Qwen3 lineage with a few distinctive features:

- MoE on every layer: 64 experts, 8 active per token, with a linear router and softmax/top-k
  renormalized gating (same block shape as qwen3_moe).
- Per-head QK-norm: RMSNorm applied to the query/key heads (Qwen3 style).
- Hybrid attention: each layer is either sliding-window or full attention, driven by the
  config's layer_types list (sliding_window = 1024). Implemented with the same
  global/sliding mask split and KVCache / RotatingKVCache pairing as gemma3_text.
- Per-layer RoPE: full-attention layers use YaRN scaling, sliding-window layers use plain
  RoPE. Both rope configs come from the config's rope_parameters mapping, keyed by attention type.
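
The hybrid-attention and per-layer RoPE items above can be sketched as a lookup from layer_types to a cache kind and a rope config. This is a minimal illustration, not the code in mellum.py: the toy config, the key names "full_attention" and "sliding_attention", the rope_type values, and the plan_layers helper are assumptions made for the example. Only layer_types, rope_parameters, sliding_window = 1024, KVCache and RotatingKVCache come from the description.

```python
from dataclasses import dataclass


@dataclass
class LayerPlan:
    index: int
    attention: str  # attention type taken from layer_types
    cache: str      # name of the cache class the layer would pair with
    rope: dict      # rope settings selected for this layer


def plan_layers(config: dict) -> list[LayerPlan]:
    """Pick a cache kind and a rope config for each layer (hypothetical helper)."""
    plans = []
    for index, kind in enumerate(config["layer_types"]):
        # rope_parameters is keyed by attention type, so one lookup per layer.
        rope = config["rope_parameters"][kind]
        # Sliding-window layers only need the last `sliding_window` positions.
        cache = "RotatingKVCache" if kind == "sliding_attention" else "KVCache"
        plans.append(LayerPlan(index, kind, cache, rope))
    return plans


# Toy config: key names and values are assumptions, not the published config.
config = {
    "sliding_window": 1024,
    "layer_types": ["sliding_attention", "sliding_attention", "full_attention"],
    "rope_parameters": {
        "full_attention": {"rope_type": "yarn"},
        "sliding_attention": {"rope_type": "default"},
    },
}

for plan in plan_layers(config):
    print(plan.index, plan.attention, plan.cache, plan.rope["rope_type"])
```

Keying the rope settings by attention type keeps the per-layer choice to a single dictionary lookup, which is the property the description relies on.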

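The softmax/top-k renormalized gating from the first item in the list above can be illustrated in plain Python. This is a sketch under assumptions, not the PR's implementation: it takes a softmax over all expert logits, keeps the top-k probabilities, and rescales them to sum to 1. The route function and the toy logits are hypothetical.

```python
import math


def route(logits: list[float], top_k: int) -> list[tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k experts of one token."""
    # Numerically stable softmax over every expert.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k most probable experts.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = order[:top_k]
    # Renormalize so the kept weights sum to 1.
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]


# Toy router output for one token: 64 experts, 8 active, as in the description.
logits = [0.1 * i for i in range(64)]
for expert, weight in route(logits, top_k=8):
    print(expert, round(weight, 4))
```

The token's output would then be the weighted sum of the chosen experts' outputs, using these weights.
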
The implementation composes the well-tested qwen3_moe MoE/attention block with the gemma3_text
hybrid-mask scheme, plus per-layer rope selection via initialize_rope. The YaRN attention_factor
in the published config (1.2772588722239782) matches the YarnRoPE default mscale exactly.
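
The attention_factor match can be checked with a few lines of arithmetic. The sketch below assumes the usual YaRN scaling term, 0.1 * mscale * ln(scale) + 1, and assumes the YarnRoPE default follows it with mscale = 1. The scaling factor of 16 is not stated in this PR; it is inferred here by inverting that formula, so treat it as an assumption.

```python
import math


def yarn_get_mscale(scale: float, mscale: float = 1.0) -> float:
    """YaRN attention-scaling term (assumed form): 0.1 * mscale * ln(scale) + 1."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


# Value quoted in the PR description.
PUBLISHED_ATTENTION_FACTOR = 1.2772588722239782

# ASSUMPTION: invert the formula to recover the scaling factor it implies.
implied_scale = math.exp((PUBLISHED_ATTENTION_FACTOR - 1.0) / 0.1)
print("implied scaling factor:", implied_scale)

# Recompute the factor from a scale of 16 and compare with the published value.
recomputed = yarn_get_mscale(16.0)
print("recomputed attention factor:", recomputed)
print("matches published value:", math.isclose(recomputed, PUBLISHED_ATTENTION_FACTOR))
```

If the published config uses a different scaling factor or a non-default mscale, the comparison would need to be redone with those values.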

Testing

Converted JetBrains/Mellum2-12B-A2.5B-Thinking to MLX (bf16) and confirmed weights load with no
missing or unexpected keys and that generation is coherent on reasoning and coding prompts.

nastya236 · 2026-06-06T14:00:56Z

Thanks for adding a new model.

I tried the thinking variant and the output loops on <|im_end|> / </tool_call> after the assistant turn ends:
mlx_lm.chat --model JetBrains/Mellum2-12B-A2.5B-Thinking --max-tokens 4096

>> Are you an instruct model?
<think>
[reasoning omitted for brevity]
</think>

Yes, I am an instruct model. I'm designed to follow instructions, answer questions, and assist with a wide range of tasks based on user input. My purpose is to provide helpful, accurate, and context-aware responses. Let me know how I can assist you!<|im_end|>
</tool_call><|im_end|>
</tool_call><|im_end|>
</tool_call><|im_end|>
</tool_call><|im_end|>
</tool_call><|im_end|>
</tool_call><|im_end|>
</tool_call><|im_end|>
</tool_call><|im_end|>
</tool_call>
</think>

I think the probable reason is that Mellum2's generation_config.json has "eos_token_id": 0 (<|endoftext|>), so mlx-lm
never stops on the chat-end token.
I don't think this blocks merging the model, but if you plan to use it I'd recommend updating generation_config.json upstream so that eos_token_id is a list that includes <|im_end|> (id 28), e.g. "eos_token_id": [0, 28]. Qwen3.6, for example, adds <|im_end|> to its generation_config.json so that generation stops when that token is produced.
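
A small sketch of the suggested change, assuming the config is handled as a plain dict loaded from generation_config.json. The with_extra_eos helper is hypothetical and only illustrates turning a single eos_token_id into a list that also contains <|im_end|> (id 28, as quoted above).

```python
import json


def with_extra_eos(generation_config: dict, extra_ids: list[int]) -> dict:
    """Return a copy whose eos_token_id is a list that includes extra_ids."""
    current = generation_config.get("eos_token_id")
    if current is None:
        ids = []
    elif isinstance(current, int):
        ids = [current]  # a single id becomes a one-element list
    else:
        ids = list(current)
    for token_id in extra_ids:
        if token_id not in ids:  # keep the list free of duplicates
            ids.append(token_id)
    return {**generation_config, "eos_token_id": ids}


original = {"eos_token_id": 0}            # <|endoftext|> only
patched = with_extra_eos(original, [28])  # add <|im_end|>
print(json.dumps(patched))  # {"eos_token_id": [0, 28]}
```

The helper returns a new dict instead of mutating its input, so the original config can still be inspected or written back unchanged.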

nastya236 · 2026-06-06T14:06:43Z

Would you mind rebasing this branch?

Mellum 2 is a Qwen3-lineage Mixture-of-Experts model with MoE on every layer, per-head QK-norm, a hybrid of sliding-window and full-attention layers driven by layer_types, and per-layer RoPE where full-attention layers use YaRN scaling and sliding-window layers use plain RoPE. The rope settings come from the config's rope_parameters mapping, keyed by attention type.

jedisct1 · 2026-06-06T16:31:19Z

Rebased!

jedisct1 · 2026-06-06T16:34:16Z

And the MLX models at https://huggingface.co/jedisct1/models already have the tool calling fix.

jedisct1 force-pushed the add-mellum branch from ae50fc7 to b9a7962 Compare June 2, 2026 09:24

nastya236 added the enhancement New feature or request label Jun 6, 2026

nastya236 self-requested a review June 6, 2026 09:54

nastya236 approved these changes Jun 6, 2026

jedisct1 force-pushed the add-mellum branch from b9a7962 to 6622e42 Compare June 6, 2026 16:31

nastya236 merged commit e476a22 into ml-explore:main Jun 6, 2026
2 checks passed

nastya236 mentioned this pull request Jun 8, 2026

converted repos missing some eos ids for mlx-community/Ministral-3-3B-Instruct-2512-4bit, mlx-community/Qwen3.5-9B-MLX-4bit #1296

Closed

nastya236 merged 1 commit into ml-explore:main from jedisct1:add-mellum
