Support Qwen3.5-MoE MoE MTP heads by janfeddersen-wq · Pull Request #84 · youssofal/MTPLX

janfeddersen-wq · 2026-05-25T13:00:51Z

Summary

Adds support for Qwen3.5-MoE MTP heads to the native Qwen MTP path.

The MTP head on Qwen3.5-MoE checkpoints (e.g. Qwen/Qwen3.5-122B-A10B) is itself an MoE block — router gate + per-expert MLPs + a shared_expert and shared_expert_gate — whereas the existing path assumed a dense single-MLP head. As a result these models were recognized as qwen3-next-mtp but rejected at the tensor gate with invalid-mtp-tensor-layout.

No new backend is required: mlx-lm's qwen3_5 DecoderLayer already instantiates SparseMoeBlock when num_experts > 0, so the MTP block builds correctly once the weights are stacked. The change is two small, dense-safe additions:

- artifacts.py: for MoE heads, derive the expected MTP key set from num_experts (mtp.layers.*.mlp.{gate, experts.{i}.*, shared_expert.*, shared_expert_gate}) instead of the fixed dense 15, so inspect's tensor gate passes. No-op for dense heads.
- mtp_patch.py: _stack_mtp_moe_experts stacks per-expert mtp.layers.*.mlp.experts.{i}.{proj} tensors into the switch_mlp layout SwitchGLU expects, mirroring the stacking mlx-lm performs for the main decoder layers. No-op for dense heads.
- Tests for the MoE tensor gate and the expert stacking.
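As a rough illustration of the artifacts.py change, the sketch below derives the expected MLP keys for one MTP layer from num_experts. The helper name, the `.weight` suffix, and the three projection names (gate_proj, up_proj, down_proj) are assumptions made for this example, not the code in the PR. Under those assumptions the dense layout contributes 3 MLP keys out of the fixed 15, and a 256-expert head comes to the 785 tensors reported under Verification.

```python
# Hypothetical helper: names and key layout are assumptions for illustration,
# not the actual code in artifacts.py.
PROJS = ("gate_proj", "up_proj", "down_proj")


def expected_mtp_mlp_keys(num_experts: int, layer: int = 0) -> set[str]:
    """Expected MLP weight keys for one MTP layer (dense if num_experts == 0)."""
    mlp = f"mtp.layers.{layer}.mlp"
    if num_experts <= 0:
        # Dense head: a single MLP with three projections.
        return {f"{mlp}.{p}.weight" for p in PROJS}
    # MoE head: router gate, shared-expert gate, shared expert, per-expert MLPs.
    keys = {f"{mlp}.gate.weight", f"{mlp}.shared_expert_gate.weight"}
    keys |= {f"{mlp}.shared_expert.{p}.weight" for p in PROJS}
    keys |= {
        f"{mlp}.experts.{i}.{p}.weight"
        for i in range(num_experts)
        for p in PROJS
    }
    return keys


dense = expected_mtp_mlp_keys(0)
moe = expected_mtp_mlp_keys(256)
non_mlp = 15 - len(dense)        # keys common to both layouts (assumed: 15 minus the dense MLP)
total_moe = non_mlp + len(moe)
print(len(dense), len(moe), total_moe)  # 3 773 785
```

Because the dense branch returns the same three keys as before, a gate built this way leaves dense heads unchanged.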

Verification

Tested on Qwopus3.5-122B-A10B (qwen3_5_moe, 256 experts / 8 active), bf16 MTP sidecar grafted onto a 4-bit base:

- mtplx inspect → can_run: true, tensor gate 785/785, 0 missing / 0 extra.
- 120-token greedy run: accepted_by_depth = [40, 19, 3] of [57, 57, 56] drafted → ~70% depth-1 acceptance, 120 tokens in 57 verify passes (~2.1 tokens/verify). A mis-loaded MoE head would accept ~0%.
- pytest tests/test_artifacts.py tests/test_mtp_patch.py green.
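The acceptance figures follow directly from the counts quoted above. A small check, using only the numbers from this run (the variable names are chosen for this example):

```python
accepted = [40, 19, 3]    # accepted_by_depth from the 120-token greedy run
drafted = [57, 57, 56]    # drafts proposed at each depth
tokens, verify_passes = 120, 57

# Per-depth acceptance rate: accepted drafts divided by drafts proposed.
rates = [a / d for a, d in zip(accepted, drafted)]
print([f"{r:.1%}" for r in rates])                      # ['70.2%', '33.3%', '5.4%']
print(f"{tokens / verify_passes:.2f} tokens/verify")    # 2.11 tokens/verify
```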

Known limitation — MoE exactness

At temperature 0, MTP vs non-MTP greedy decode is ~98% identical and re-converges immediately, but occasionally flips a single token. This is the MoE router hitting a near-tie that resolves differently under batched verification vs single-token autoregressive decode (a known MoE/FP effect), not a drafting error. The max_diff = 0.0 exactness guarantee was established on a dense model; strict bit-exactness for MoE heads likely needs separate handling (e.g. fp32 router logits during verify). Flagging for discussion — happy to follow up.
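To make the near-tie effect concrete, here is a toy example in plain Python. It does not reproduce the MLX kernels or the actual router; it only shows that summing the same terms in a different order can change a floating-point result by one unit in the last place, which is enough to flip a top-1 choice between two almost-equal logits. The numbers and the two-expert setup are invented for illustration.

```python
def top1(logits):
    """Index of the largest logit; ties resolve to the lowest index."""
    return max(range(len(logits)), key=logits.__getitem__)


parts = [0.1, 0.2, 0.3]   # partial sums feeding expert 1's logit (toy values)
expert0 = 0.6             # competing expert's logit

# Same terms, two reduction orders, as different kernels or batch shapes might use.
left_to_right = (parts[0] + parts[1]) + parts[2]
right_to_left = parts[0] + (parts[1] + parts[2])

print(top1([expert0, left_to_right]))   # 1: expert 1 wins by one ulp
print(top1([expert0, right_to_left]))   # 0: exact tie, lowest index wins
```

Computing router logits in higher precision during verify, as suggested above, would shrink the window in which such ties occur but would not by itself guarantee identical reduction order.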

🤖 Generated with Claude Code

The native Qwen MTP path assumed a dense single-MLP MTP head: a fixed 15-tensor gate and no expert stacking. Qwen3.5-MoE checkpoints whose MTP head is itself an MoE block (router gate + per-expert MLPs + a shared expert / shared_expert_gate) were therefore rejected with `invalid-mtp-tensor-layout`, even though the runtime can already build the block -- mlx-lm's qwen3_5 DecoderLayer instantiates SparseMoeBlock whenever num_experts > 0.

- artifacts: derive the expected MTP key set from num_experts for MoE heads instead of the hard-coded dense 15, so the tensor gate passes (no-op for dense heads).
- mtp_patch: stack per-expert mtp.layers.*.mlp.experts.{i}.* weights into the switch_mlp layout SwitchGLU expects, mirroring the stacking mlx-lm performs for the main decoder layers (no-op for dense heads).
- tests for the MoE tensor gate and the expert stacking.

Verified on Qwopus3.5-122B-A10B (qwen3_5_moe, 256 experts, bf16 MTP sidecar grafted onto a 4-bit base): `mtplx inspect` passes (785/785 tensors) and MTP speculative decoding runs with ~70% depth-1 acceptance (~2.1 tokens per target verify pass).
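The expert stacking described in the commit message can be sketched as follows. This is a minimal illustration of the pattern, not the PR's `_stack_mtp_moe_experts`: the function name, the regular expression, and the `stack` parameter are assumptions for this example. With MLX one would presumably pass `mx.stack`; the default here is `list`, so the sketch runs on nested Python lists without any dependency.

```python
import re
from collections import defaultdict

# Matches per-expert keys such as mtp.layers.0.mlp.experts.7.up_proj.weight
_EXPERT_KEY = re.compile(r"^(mtp\.layers\.\d+\.mlp)\.experts\.(\d+)\.(\w+)\.(\w+)$")


def stack_mtp_moe_experts(weights: dict, stack=list) -> dict:
    """Return a copy of `weights` with per-expert tensors merged per projection.

    `stack` joins a list of per-expert tensors along a new leading axis.
    Dense heads have no matching keys, so they pass through unchanged.
    """
    out = {}
    groups = defaultdict(dict)  # (prefix, proj, param) -> {expert_index: tensor}
    for key, value in weights.items():
        m = _EXPERT_KEY.match(key)
        if m is None:
            out[key] = value  # router gate, shared expert, attention, norms, ...
            continue
        prefix, idx, proj, param = m.groups()
        groups[(prefix, proj, param)][int(idx)] = value
    for (prefix, proj, param), experts in groups.items():
        n = len(experts)
        if sorted(experts) != list(range(n)):
            raise ValueError(f"non-contiguous expert indices for {prefix}.{proj}.{param}")
        # Stack in expert order so row i of the result belongs to expert i.
        out[f"{prefix}.switch_mlp.{proj}.{param}"] = stack([experts[i] for i in range(n)])
    return out


toy = {
    "mtp.fc.weight": [[1.0]],
    "mtp.layers.0.mlp.gate.weight": [[0.5], [0.5]],
    "mtp.layers.0.mlp.experts.1.up_proj.weight": [[2.0]],
    "mtp.layers.0.mlp.experts.0.up_proj.weight": [[1.0]],
}
stacked = stack_mtp_moe_experts(toy)
print(stacked["mtp.layers.0.mlp.switch_mlp.up_proj.weight"])  # [[[1.0]], [[2.0]]]
```

Grouping by expert index rather than relying on dictionary order keeps the stacked tensor aligned with the router's expert numbering even when checkpoint shards list experts out of order.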

janfeddersen-wq requested a review from youssofal as a code owner May 25, 2026 13:00

janfeddersen-wq wants to merge 1 commit into youssofal:main from janfeddersen-wq:qwen3-5-moe-mtp
