Scope quantization-config served-id inference to Qwen3.6-27B by wwadge · Pull Request #77 · youssofal/MTPLX

wwadge · 2026-05-21T07:58:54Z

Summary

_public_model_id_from_metadata treated the config.json quantization layout (e.g. Q4 weights with a Q8 head) as proof that the model was a Qwen3.6-27B MTPLX artifact, so unrelated third-party builds were served under one of the mtplx-qwen36-27b-* ids on /v1/models.
Concretely, mtplx serve --model .../samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed returned {"id":"mtplx-qwen36-27b-optimized-quality", ...}.
This PR gates the quantization-config fallback on _public_model_id_from_name having already identified the folder as a Qwen3.6-27B MTPLX variant. Explicit metadata fields (public_model_id, served_model_id, model_id, precision_variant, artifact_role, verified_on.model) still take precedence, so the existing "runtime metadata before folder name" behavior is preserved. Folders that are not recognized as Qwen3.6-27B MTPLX variants now fall through to the basename-sanitized id (here, qwen3.6-35b-a3b-4bit-mtplx-optimized-speed).
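The resolution order described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: resolve_served_id, looks_like_qwen36_27b_mtplx, sanitize_basename and QUANT_FALLBACK_ID are hypothetical names, the name check is a plain substring match, and the single fallback id stands in for whatever mapping from quantization layout to id the real code uses. Only three of the explicit metadata fields are modeled.

```python
import re
from pathlib import Path

# Hypothetical placeholder: the real mapping from quantization layout to a
# specific mtplx-qwen36-27b-* id is not shown in this PR.
QUANT_FALLBACK_ID = "mtplx-qwen36-27b-optimized-quality"

# Subset of the explicit metadata fields named in the PR description.
EXPLICIT_FIELDS = ("public_model_id", "served_model_id", "model_id")


def sanitize_basename(model_path: str) -> str:
    """Lowercase the folder name and replace unexpected characters with '-'."""
    name = Path(model_path).name.lower()
    return re.sub(r"[^a-z0-9._-]+", "-", name).strip("-")


def looks_like_qwen36_27b_mtplx(model_path: str) -> bool:
    """Assumed name check: the folder name mentions both the family and MTPLX."""
    name = Path(model_path).name.lower()
    return "qwen3.6-27b" in name and "mtplx" in name


def resolve_served_id(model_path: str, metadata: dict, config: dict) -> str:
    # 1. Explicit runtime metadata always wins.
    for field in EXPLICIT_FIELDS:
        value = metadata.get(field)
        if value:
            return str(value)
    # 2. The quantization layout is only trusted when the folder name has
    #    already identified a Qwen3.6-27B MTPLX variant.
    if "quantization" in config and looks_like_qwen36_27b_mtplx(model_path):
        return QUANT_FALLBACK_ID
    # 3. Everything else is served under its sanitized folder name.
    return sanitize_basename(model_path)
```

Under these assumptions, a third-party 35B-A3B folder with a quantization block in config.json skips step 2 and is served under its own sanitized name, which is the behavior the regression test is meant to pin down.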

Test plan

pytest tests/test_default_models.py (all 42 tests pass, including the new regression covering the third-party 35B-A3B Q4/Q8 case)
pytest tests/test_public_cli.py::test_quality_model_ref_uses_quality_public_model_id tests/test_public_cli.py::test_legacy_optimized_model_ref_uses_neutral_public_model_id tests/test_public_cli.py::test_explicit_model_id_wins_over_loaded_artifact_identity
Manual: start mtplx serve --model .../Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed and confirm /v1/models reports qwen3.6-35b-a3b-4bit-mtplx-optimized-speed.
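The manual check could also be scripted along these lines. This is a sketch under assumptions: the response shape of /v1/models is not spelled out in this PR, so extract_model_ids (a hypothetical helper) accepts either an OpenAI-style {"data": [...]} list or a single {"id": ...} object, and the base URL of the running server has to be supplied by the caller.

```python
import json
from urllib.request import urlopen


def extract_model_ids(payload):
    """Collect ids from a {"data": [...]} list, a bare list, or a single object."""
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):
        entries = payload["data"]
    elif isinstance(payload, list):
        entries = payload
    else:
        entries = [payload]
    return [e["id"] for e in entries if isinstance(e, dict) and "id" in e]


def served_model_ids(base_url, timeout=5.0):
    """Fetch /v1/models from a running server and return the reported ids."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urlopen(url, timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```

With the server from the manual step running, served_model_ids called with its base URL should include qwen3.6-35b-a3b-4bit-mtplx-optimized-speed and none of the mtplx-qwen36-27b-* ids.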

🤖 Generated with Claude Code

The config.json quantization layout (Q4 weights with a Q8 head, flat Q8, etc.) is shared across many MLX builds, so it cannot identify the model family on its own. _public_model_id_from_metadata was using it as the last-resort signal and unconditionally returning one of the mtplx-qwen36-27b-* ids, so a third-party artifact like Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed was being served as mtplx-qwen36-27b-optimized-quality on /v1/models.

Gate that fallback on _public_model_id_from_name already identifying the folder as a Qwen3.6-27B MTPLX variant. Explicit metadata fields (public_model_id, served_model_id, model_id, precision_variant, artifact_role, verified_on.model) still take precedence, so the existing "runtime metadata before folder name" behavior is preserved. Unknown folders now fall through to the basename-sanitized id, e.g. qwen3.6-35b-a3b-4bit-mtplx-optimized-speed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wwadge · 2026-05-21T07:59:47Z

I had trouble serving 35b via --model, this fixed it.

wwadge requested a review from youssofal as a code owner May 21, 2026 07:58

This was referenced May 21, 2026

Reproducing the 2.24× speedup: documented MLX patch alone doesn't get there — please publish the actual fork commit #64

Open

Feature Request: Native MTPLX conversion for Qwen3.6-35B-A3B (Q2_K_XL / IQ4_XS) #85

Open

wwadge wants to merge 1 commit into youssofal:main from wwadge:fix-served-model-id-third-party-quant
