
Scope quantization-config served-id inference to Qwen3.6-27B #77

Open
wwadge wants to merge 1 commit into youssofal:main from wwadge:fix-served-model-id-third-party-quant

wwadge commented May 21, 2026

Summary

  • _public_model_id_from_metadata was treating the config.json quantization layout (e.g. Q4 weights with a Q8 head) as proof the model was a Qwen3.6-27B MTPLX artifact, so unrelated third-party builds got served under one of the mtplx-qwen36-27b-* ids on /v1/models.
  • Concretely, mtplx serve --model .../samuelfaj/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed was returning {"id":"mtplx-qwen36-27b-optimized-quality", ...}.
  • Gate the quantization-config fallback on _public_model_id_from_name already identifying the folder as a Qwen3.6-27B MTPLX variant. Explicit metadata fields (public_model_id, served_model_id, model_id, precision_variant, artifact_role, verified_on.model) still take precedence, so the existing "runtime metadata before folder name" behavior is preserved. Unknown folders fall through to the basename-sanitized id (qwen3.6-35b-a3b-4bit-mtplx-optimized-speed).
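A rough Python sketch of the lookup order described above, to make the new gate concrete. The function resolve_served_id, its parameters, and the QWEN36_27B_PREFIX constant are hypothetical stand-ins (name_id plays the role of _public_model_id_from_name's result, quant_id the id the quantization layout alone would suggest); they are not the project's real signatures. precision_variant, artifact_role and verified_on.model are left out for brevity, and the basename sanitizer is reduced to lowercasing.

```python
from typing import Mapping, Optional

# Hypothetical prefix shared by the Qwen3.6-27B MTPLX served ids
# (the PR refers to them as mtplx-qwen36-27b-*).
QWEN36_27B_PREFIX = "mtplx-qwen36-27b-"


def resolve_served_id(
    metadata: Mapping[str, object],
    folder_name: str,
    name_id: Optional[str],
    quant_id: Optional[str],
) -> str:
    """Sketch of the post-PR lookup order (hypothetical signature).

    metadata: explicit runtime metadata fields.
    name_id:  id recognised from the folder name, or None if unknown.
    quant_id: id the config.json quantization layout alone would suggest.
    """
    # 1. Explicit metadata still takes precedence. Only the plain id fields
    #    are modelled; precision_variant, artifact_role and verified_on.model
    #    are omitted from this sketch.
    for field in ("public_model_id", "served_model_id", "model_id"):
        value = metadata.get(field)
        if isinstance(value, str) and value:
            return value

    # 2. The quantization-layout fallback is now gated: it only applies when
    #    the folder name already identifies a Qwen3.6-27B MTPLX variant.
    name_is_qwen36_27b = name_id is not None and name_id.startswith(QWEN36_27B_PREFIX)
    if quant_id is not None and name_is_qwen36_27b:
        return quant_id

    # 3. A recognised folder name comes next.
    if name_id is not None:
        return name_id

    # 4. Unknown folders fall through to a basename-derived id
    #    (simplified here to lowercasing the folder name).
    return folder_name.lower()


# The third-party 35B-A3B build: the layout suggests a 27B id, but the folder
# name is not a 27B MTPLX variant, so the gate rejects the layout-based guess.
print(resolve_served_id(
    {}, "Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed",
    name_id=None, quant_id="mtplx-qwen36-27b-optimized-quality",
))
# prints qwen3.6-35b-a3b-4bit-mtplx-optimized-speed
```

Before this change, step 2 effectively ran without the name_is_qwen36_27b check, which is how the 35B-A3B build ended up served as mtplx-qwen36-27b-optimized-quality.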

Test plan

  • pytest tests/test_default_models.py (all 42 tests pass, including the new regression covering the third-party 35B-A3B Q4/Q8 case)
  • pytest tests/test_public_cli.py::test_quality_model_ref_uses_quality_public_model_id tests/test_public_cli.py::test_legacy_optimized_model_ref_uses_neutral_public_model_id tests/test_public_cli.py::test_explicit_model_id_wins_over_loaded_artifact_identity
  • Manual: start mtplx serve --model .../Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed and confirm /v1/models reports qwen3.6-35b-a3b-4bit-mtplx-optimized-speed.
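The manual check in the last step could be scripted roughly as below. This is a sketch under assumptions: it expects an OpenAI-style {"object": "list", "data": [{"id": ...}]} body from /v1/models (the PR only shows that entries carry an "id"), and the helper names and base_url parameter are hypothetical, since the PR does not say which address mtplx serve listens on.

```python
import json
import urllib.request
from typing import List


def served_ids(models_body: str) -> List[str]:
    """Return the model ids listed in a /v1/models response body.

    Assumes an OpenAI-style {"object": "list", "data": [{"id": ...}, ...]}
    payload.
    """
    payload = json.loads(models_body)
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_models(base_url: str, timeout: float = 5.0) -> str:
    """Fetch /v1/models from a running server; base_url is an assumption
    (whatever address `mtplx serve` reports)."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def check_served_id(models_body: str, expected: str) -> bool:
    """True if `expected` is among the served ids."""
    return expected in served_ids(models_body)
```

With the server running, check_served_id(fetch_models(base_url), "qwen3.6-35b-a3b-4bit-mtplx-optimized-speed") should return True after this change and False before it. Parsing is kept separate from fetching so the check can also be run against a saved response.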

🤖 Generated with Claude Code

The config.json quantization layout (Q4 weights with a Q8 head, flat Q8,
etc.) is shared across many MLX builds, so it cannot identify the model
family on its own. _public_model_id_from_metadata was using it as the
last-resort signal and unconditionally returning one of the
mtplx-qwen36-27b-* ids, so a third-party artifact like
Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed was being served as
mtplx-qwen36-27b-optimized-quality on /v1/models.

Gate that fallback on _public_model_id_from_name already identifying the
folder as a Qwen3.6-27B MTPLX variant. Explicit metadata fields
(public_model_id, served_model_id, model_id, precision_variant,
artifact_role, verified_on.model) still take precedence, so the existing
"runtime metadata before folder name" behavior is preserved. Unknown
folders now fall through to the basename-sanitized id, e.g.
qwen3.6-35b-a3b-4bit-mtplx-optimized-speed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
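The commit message says unknown folders fall back to a basename-sanitized id. The project's actual sanitization rules are not shown in the PR; the hypothetical sanitize_basename below is only a guess that reproduces the one example given: lowercase the last path component and collapse runs of other characters into hyphens. The /path/to/ prefix is a placeholder.

```python
import re
from pathlib import PurePosixPath


def sanitize_basename(model_path: str) -> str:
    """Hypothetical sanitizer: derive a served id from a model folder path."""
    # Final path component; PurePosixPath ignores a trailing slash.
    name = PurePosixPath(model_path).name.lower()
    # Collapse every run of characters outside [a-z0-9._-] into one hyphen.
    name = re.sub(r"[^a-z0-9._-]+", "-", name)
    # Trim separators left at the edges; fall back if nothing survives.
    return name.strip("-._") or "model"


print(sanitize_basename("/path/to/Qwen3.6-35B-A3B-4bit-MTPLX-Optimized-Speed"))
# prints qwen3.6-35b-a3b-4bit-mtplx-optimized-speed
```

Keeping dots and hyphens preserves version strings such as "3.6" in the id, which matches the qwen3.6-35b-a3b-4bit-mtplx-optimized-speed example.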
wwadge requested a review from youssofal as a code owner May 21, 2026 07:58
wwadge (author) commented May 21, 2026

I had trouble serving the 35B model via --model; this fixed it.
