refactor(aliases): explicit quantization suffix on every alias (BREAKING) by raullenchai · Pull Request #547 · raullenchai/Rapid-MLX

raullenchai · 2026-06-10T00:09:36Z

Summary

Every alias in aliases.json now carries an explicit quantization suffix — qwen3.5-4b-4bit instead of qwen3.5-4b, gemma-4-12b-qat-8bit instead of gemma-4-12b-qat. The bit width is readable off the alias rather than hidden in hf_path.

BREAKING — legacy short names no longer resolve. Migration: append the canonical quant suffix (rapid-mlx models lists the new names).

Naming template

Documented in a new README "Naming convention" section:

<family>-<version>-<params>-<modality?>-<technique?>-<quant>

Mirrors LM Studio's …-MLX-4bit / …-MLX-8bit HuggingFace convention. Quant suffix is mandatory; technique (-qat, -distill) stacks before quant.

Quant vocabulary formalised: -4bit, -8bit, -2bit, -3bit, -6bit, -mxfp4, -mxfp4-q8, -dwq, -ud, -unpacked.
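
The "quant suffix is mandatory" rule can be checked mechanically. The following is a minimal sketch, not code from the PR: the helper name `quant_suffix` is hypothetical, and the suffix tuple is copied from the vocabulary line above.

```python
from typing import Optional

# Quant suffixes copied from the vocabulary listed in the PR description.
QUANT_SUFFIXES = (
    "-4bit", "-8bit", "-2bit", "-3bit", "-6bit",
    "-mxfp4", "-mxfp4-q8", "-dwq", "-ud", "-unpacked",
)


def quant_suffix(alias: str) -> Optional[str]:
    """Return the quant suffix the alias ends with, or None if it has none.

    Hypothetical helper for illustration; no listed suffix is a tail of
    another, so the order of the checks does not matter.
    """
    for suffix in QUANT_SUFFIXES:
        if alias.endswith(suffix):
            return suffix
    return None


if __name__ == "__main__":
    for name in ("qwen3.5-4b-4bit", "gemma-4-12b-qat-8bit", "qwen3.5-4b"):
        print(name, "->", quant_suffix(name))
```

A registry-wide check would then amount to asserting that `quant_suffix(name)` is not `None` for every alias name.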

Schema bug fixed

phi4-14b was pointing at mlx-community/phi-4-mini-instruct-4bit (a ~4B model). Now:

phi-4-14b-4bit → mlx-community/phi-4-4bit (real 14B)
phi-4-mini-4bit → mlx-community/phi-4-mini-instruct-4bit (new alias for the mini)

Codename aliases dropped (duplicate hf_path of an explicit entry)

Dropped	References resolve to
`deepseek-v4-flash`	`deepseek-v4-flash-8bit`
`gemma4`	`gemma-4-12b-qat-4bit`
`nemotron-nano`	`nemotron-30b-4bit`

The gemma4 parser identifier (registered in gemma4_tool_parser.py) is untouched — that's a parser family, not an alias.
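
The codename aliases were dropped because each one duplicated the hf_path of an explicit entry. A rough sketch of how such duplicates could be found is below. It assumes the registry is a mapping from alias name to an object with an `hf_path` field, which is a guess at the shape of aliases.json; `shared_hf_paths` and `SAMPLE` are illustrative names, not part of the repository.

```python
from collections import defaultdict

# Illustrative pre-rename sample; the registry shape (alias -> {"hf_path": ...})
# is an assumption, not the confirmed aliases.json schema.
SAMPLE = {
    "gemma4": {"hf_path": "mlx-community/gemma-4-12B-it-qat-4bit"},
    "gemma-4-12b-qat-4bit": {"hf_path": "mlx-community/gemma-4-12B-it-qat-4bit"},
    "phi-4-14b-4bit": {"hf_path": "mlx-community/phi-4-4bit"},
}


def shared_hf_paths(aliases: dict) -> dict:
    """Map each hf_path used by more than one alias to its sorted alias names."""
    by_path = defaultdict(list)
    for name, entry in aliases.items():
        by_path[entry["hf_path"]].append(name)
    return {path: sorted(names) for path, names in by_path.items() if len(names) > 1}


if __name__ == "__main__":
    print(shared_hf_paths(SAMPLE))
```

After the rename, the same check run over the live registry should return an empty mapping, since no two aliases share an hf_path any more.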

Sweep scope

107 files changed: aliases.json, README.md, 100 active source files (tests, docs, scripts, install.sh, Issue templates, harness/scorecard/latest.md), 5 generation_config fixtures renamed, 2 harness baselines renamed.
Skipped on purpose — historical snapshots whose model field records the alias the bench was originally run under:
- evals/results/*.json (83 files)
- harness/runs/**
- reports/mhi/*.json
- reports/benchmarks/**

Tooling (gitted so the operation is reproducible)

scripts/rename_aliases.py — regenerates aliases.json + rename_map.json from a small rule set.
scripts/sweep_alias_refs.py — applies the rename map to every active file with the exclusion list above.
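
To make the sweep step concrete, here is a hedged sketch of applying a rename map to text. It is not the contents of scripts/sweep_alias_refs.py: the function names, the prefix form of the exclusion list, and the token-boundary regex are all assumptions chosen for illustration. The boundary check keeps an already-explicit name such as `qwen3.5-4b-8bit` from being rewritten through its `qwen3.5-4b` prefix, and `gemma4` is skipped because the same literal is also the parser identifier.

```python
import re

# Prefix form of the "skipped on purpose" list above (assumed representation).
EXCLUDED_PREFIXES = (
    "evals/results/",
    "harness/runs/",
    "reports/mhi/",
    "reports/benchmarks/",
)

# Literals left alone even though they appear in the rename map.
SKIP_LITERALS = {"gemma4"}


def is_excluded(path: str) -> bool:
    """True when a repo-relative path is a historical snapshot."""
    return path.startswith(EXCLUDED_PREFIXES)


def sweep_text(text: str, rename_map: dict) -> str:
    """Replace whole legacy alias tokens with their canonical names."""
    active = {
        old: new
        for old, new in rename_map.items()
        if old != new and old not in SKIP_LITERALS
    }
    if not active:
        return text
    # Longest names first; the lookarounds reject matches that sit inside a
    # longer token made of word characters or hyphens.
    names = sorted(active, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![\w-])(?:" + "|".join(re.escape(n) for n in names) + r")(?![\w-])"
    )
    return pattern.sub(lambda m: active[m.group(0)], text)


if __name__ == "__main__":
    demo_map = {"qwen3.5-4b": "qwen3.5-4b-4bit", "gemma4": "gemma-4-12b-qat-4bit"}
    print(sweep_text("serve qwen3.5-4b or qwen3.5-4b-8bit, parser gemma4", demo_map))
```

Under these assumptions the rewrite is idempotent: running it a second time leaves the text unchanged, because every canonical name continues with a hyphen after the legacy prefix. A purely textual sweep like this can still rewrite strings that only look like aliases, such as upstream repo ids, so the result needs review.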

Test plan

pytest tests/ → 4793 passed, 11 skipped, 7 xfailed (no regressions)
ruff check clean
ruff format clean
rapid-mlx models lists 72 aliases, all explicit
rapid-mlx info gemma-4-12b-qat-4bit resolves to mlx-community/gemma-4-12B-it-qat-4bit
rapid-mlx info phi-4-14b-4bit resolves to mlx-community/phi-4-4bit (real 14B)
pr_validate SOP run (round 1 + 2)
make check SKIPPED — alias rename is surface-touching, not inference-path

🤖 Generated with Claude Code

…uffix

BREAKING: every legacy short alias has been renamed to its canonical explicit form. ``rapid-mlx serve qwen3.5-4b`` no longer works; use ``rapid-mlx serve qwen3.5-4b-4bit``. ``rapid-mlx models`` lists the 72 new names.

Naming template (now documented in README "Naming convention"):

<family>-<version>-<params>-<modality?>-<technique?>-<quant>

The quantization suffix is mandatory. It mirrors LM Studio's ``…-MLX-4bit`` / ``…-MLX-8bit`` HuggingFace convention so the bit width is readable off the alias instead of hidden in ``hf_path``.

Aliases renamed: 51 (e.g. ``qwen3.5-4b`` → ``qwen3.5-4b-4bit``, ``gemma-4-12b-qat`` → ``gemma-4-12b-qat-4bit``).
Aliases already explicit: 23 (e.g. ``qwen3.5-4b-8bit``, ``deepseek-v4-flash-2bit``).
Aliases added: 1 (``phi-4-mini-4bit``, separating the 4B mini from the real Phi-4 14B; see phi4-14b fix below).

Codename aliases dropped: 3
* ``deepseek-v4-flash``: duplicate hf_path of ``deepseek-v4-flash-8bit``; references now resolve to that name.
* ``gemma4``: duplicate hf_path of ``gemma-4-12b-qat-4bit``; references resolve to that name. (The ``gemma4`` *parser* identifier is untouched; that's the tool/reasoning parser family, not an alias.)
* ``nemotron-nano``: duplicate hf_path of ``nemotron-30b-4bit``; references resolve to that name.

Schema bug fixed: ``phi4-14b`` was pointing at ``mlx-community/phi-4-mini-instruct-4bit`` (a ~4B model). It is now ``phi-4-14b-4bit`` pointing at ``mlx-community/phi-4-4bit`` (the real 14B). The old mini target moves to ``phi-4-mini-4bit``.

Non-bit-width quant suffixes formalised: ``-mxfp4``, ``-mxfp4-q8``, ``-dwq``, ``-ud``, ``-3bit``, ``-6bit``, ``-unpacked`` (Bonsai's no-quantization tier). Picked from HF community / mlx-community / LM Studio conventions.

Scope of the sweep (107 files):
* vllm_mlx/aliases.json: the 51 renames + 3 drops + 1 new.
* README.md: new "Naming convention" section + family lineup table rewritten with the explicit names.
* 100 active source files: tests, docs, scripts, install.sh, Issue templates, harness/scorecard/latest.md.
* tests/fixtures/generation_configs/*.json: file basenames renamed alongside.
* harness/baselines/full-qwen3.5-35b.json → -8bit; .6-35b → -4bit.
* vllm_mlx/cli.py: Alias column widened 22 → 24 to fit the longest new name (``deepseek-v4-flash-8bit``).

Deliberately NOT swept (historical snapshots whose ``model`` field records the alias under which the benchmark was originally run; rewriting them is rewriting history):
* evals/results/*.json (83 files)
* harness/runs/** (timestamped doctor harness runs)
* reports/mhi/*.json (timestamped MHI reports)
* reports/benchmarks/** (README-refresh bench snapshots)

Tools introduced for the rename (gitted so the operation is reproducible and future renames can reuse the machinery):
* scripts/rename_aliases.py: generates the new aliases.json + scripts/rename_map.json from a small declarative rule set.
* scripts/sweep_alias_refs.py: applies the rename map to every active source file, with the EXCLUDED prefixes above.

Tests: 4793 passed, 11 skipped, 7 xfailed (no regressions).
Lint: ``ruff check`` clean.
Format: ``ruff format`` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-10T00:10:25Z

PR #547 validation scorecard

Title: refactor(aliases): explicit quantization suffix on every alias (BREAKING)
Author: raullenchai
Diff: 100 file(s), +1356/-746 LOC, blast radius: high

Verdict: MERGE-SAFE

step	status	summary	time
`fetch`	PASS	100 files, +1356/-746 LOC, blast=high	0.8s
`test_plan_check`	PASS	all 8 test-plan item(s) checked	0.0s
`cl_description_quality`	PASS	title OK + body has rationale (2769 chars)	0.0s
`codex_review`	skip	codex CLI not found on PATH (install: `npm i -g @openai/codex`)	0.0s
`supply_chain`	PASS	no hooks touched, no suspicious patterns, deps clean	8.0s
`lint`	PASS	clean (53 file(s))	0.1s
`targeted_tests`	skip	too many test targets (42) — deferring to full_unit step	0.0s

Codex BLOCKING (round 1, pr_validate):
- ``tests/test_aliases_contract.py:455-457`` had duplicate ``"nemotron-30b-4bit"`` keys after the sweep collapsed both ``nemotron-30b`` and ``nemotron-nano`` to the same canonical name. Python silently keeps only the last value for a repeated key, so the test no longer pinned both pre-rename cases. Collapsed to a single entry (the post-rename registry now contains only one nemotron alias).
- ``tests/test_model_profiles_ssot.py:90-91`` had the same duplicate-in-tuple-iteration pattern. Same fix.
- ``test_reverse_lookup_for_shared_hf_path_is_deterministic`` and ``test_reverse_lookup_handles_deepseek_v4_flash_duplicate`` were pinning the tie-break for two now-removed duplicate-hf_path pairs (``nemotron-30b`` / ``nemotron-nano`` and ``deepseek-v4-flash`` / ``deepseek-v4-flash-8bit``). After the rename, no pair of aliases shares an hf_path, so the tie-break is unreachable from the live registry. Removed both tests with a comment pointing at the remaining reverse-lookup mechanism test (``test_reverse_lookup_index_built_once_after_first_load``).

Lint (ruff check + ruff format):
- Auto-fixable F401 / F541 / I001 across the 4 PR-touched scripts (``bench_engine_parity.py``, ``bench_readme_refresh.py``, ``local_bench_vs_ollama.py``, ``mhi_eval.py``). These were pre-existing issues the sweep re-surfaced.
- Manual fixes inside ``scripts/mhi_eval.py``:
  * ``from tau_bench.types import EnvRunResult`` is an availability probe, so it is annotated ``# noqa: F401``.
  * E741 single-letter ``l`` rebound to ``ch``.
- Ruff format applied to the 7 touched .py files.

Full-unit (e2e):
- ``test_weather_with_fallback`` + ``test_multi_step_tool_chain`` failed once on the initial run; both passed on local rerun against the same live qwen3.5-4b-4bit server. These are model-behaviour tests (which tool name the model picks for a given prompt) and are flaky by design; the failures were not caused by this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
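
The duplicate-key failure mode in the round 1 notes is easy to miss because a dict literal with a repeated key is valid Python. As a small illustration (not tooling from this PR; `duplicate_dict_keys` and `SRC` are hypothetical names), the standard `ast` module can flag repeated constant keys before the literal is ever evaluated:

```python
import ast

# A dict literal shaped like the post-sweep test data: two legacy names
# collapsed onto the same canonical key.
SRC = '''CASES = {
    "nemotron-30b-4bit": "from nemotron-30b",
    "nemotron-30b-4bit": "from nemotron-nano",
}
'''


def duplicate_dict_keys(source: str) -> list:
    """Return (line, key) for each repeated constant key in any dict literal."""
    dupes = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # node.keys holds None for "**mapping" entries; skip non-constants.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    dupes.append((key.lineno, key.value))
                seen.add(key.value)
    return dupes


if __name__ == "__main__":
    print(duplicate_dict_keys(SRC))
    namespace = {}
    exec(SRC, namespace)
    # Only the last value survives at runtime.
    print(namespace["CASES"])
```

At runtime the literal evaluates to a one-entry dict holding the second value, which is why the affected test silently stopped covering one of its two cases.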

Codex BLOCKING (round 2, pr_validate):
- ``tests/parsers/regressions/test_issue_513_harmony_streamable_parser.py:702``: the sweep rewrote the canonical spoofing example ``evil-org/gpt-oss-20b`` into ``evil-org/gpt-oss-20b-mxfp4-q8``. The spoof shape (third-party org publishing under the same bare repo name OpenAI uses) is the exact case this matcher must reject; the alias-suffixed variant tests a strictly easier case. Restored the canonical form. (Adding the suffixed variant separately would be redundant because the matcher already covers the broader shape.)
- Same file line 716: the sweep also rewrote ``openai/gpt-oss-20b`` (OpenAI's real bare repo id on HuggingFace) into the rapid-mlx alias ``openai/gpt-oss-20b-mxfp4-q8``. The bare repo id is what the matcher actually sees from upstream tokenizers, so dropping the unsuffixed form would have left a gap. Restored the bare repo id and the bare-suffix variants of the local-path examples (``/models/gpt-oss-20b``, etc.) alongside the alias-suffixed cases the sweep wrote.

Codex NIT (round 2, pr_validate):
- ``harness/README.md:104`` previously said the ``full`` tier's two baselines are "both 8-bit". After the rename, ``qwen3.6-35b`` is the 4-bit variant. Reworded to call out each model's quant.
- ``scripts/rename_aliases.py``: the ``dropped`` counter always printed 0 because dropped codename aliases store their redirect target as a non-None string in ``rename_map``. Reworked the three counters (renamed / dropped / kept) to compute from the input data's perspective so they always sum back to the input alias count and ``dropped`` is the real number of MANUAL ``drop=True`` specs the script processed. Verified: against ``main``'s aliases.json the script prints ``48 renamed, 3 dropped, 23 kept`` (= 74).
- ``scripts/sweep_alias_refs.py``: the comment promised a "hand-written pass below" for ``gemma4`` that did not exist (the sweep deliberately leaves every ``gemma4`` occurrence alone because the literal is also the parser ID). Reworked the comment to make the no-op intent and reason explicit so a future maintainer doesn't hunt for a missing implementation.

Tests: 4789 passed, 11 skipped, 7 xfailed. ``test_simple_exec`` / ``test_multi_step_tool_chain`` flaked again (model-behaviour pick of tool name varies run-to-run); rerunning against the same live server passes both, same as round 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
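
The counter fix described above can be sketched as follows. This is an illustration under assumptions, not the code in scripts/rename_aliases.py: `summarize` and its parameters are hypothetical, and the drop specs are modelled as a plain set of names. The point is that a dropped alias still has a non-None redirect target, so testing the target for None never counts a drop; classifying each input name exactly once keeps the three counters summing to the input size.

```python
def summarize(old_names, rename_map, dropped):
    """Classify every input alias exactly once: dropped, renamed, or kept."""
    counts = {"renamed": 0, "dropped": 0, "kept": 0}
    for name in old_names:
        if name in dropped:
            # Checked first: a dropped alias still maps to a redirect target.
            counts["dropped"] += 1
        elif rename_map.get(name, name) != name:
            counts["renamed"] += 1
        else:
            counts["kept"] += 1
    # Invariant from the commit message: counters sum to the input count.
    assert sum(counts.values()) == len(old_names)
    return counts


if __name__ == "__main__":
    old = ["qwen3.5-4b", "qwen3.5-4b-8bit", "gemma4", "nemotron-nano"]
    mapping = {
        "qwen3.5-4b": "qwen3.5-4b-4bit",
        "qwen3.5-4b-8bit": "qwen3.5-4b-8bit",
        "gemma4": "gemma-4-12b-qat-4bit",
        "nemotron-nano": "nemotron-30b-4bit",
    }
    print(summarize(old, mapping, dropped={"gemma4", "nemotron-nano"}))
```

The figures reported in the commit message follow the same invariant: 48 + 3 + 23 = 74.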

Codex round-3 NIT: the checked-in ``rename_map.json`` was the result of an idempotent rerun against the already-renamed ``aliases.json``, so every entry was an identity mapping (e.g. ``qwen3.5-4b-4bit`` → ``qwen3.5-4b-4bit``). A maintainer running ``scripts/sweep_alias_refs.py`` from a pre-rename checkout to verify the operation is reproducible would see the sweep do nothing because no legacy name (``qwen3.5-4b``, ``gemma4``, ``nemotron-nano``, …) was in the map. Regenerated from ``main``'s ``vllm_mlx/aliases.json`` so the map now contains the real 74-entry legacy → canonical mapping plus the three dropped codename redirects (``deepseek-v4-flash`` → ``deepseek-v4-flash-8bit``, ``gemma4`` → ``gemma-4-12b-qat-4bit``, ``nemotron-nano`` → ``nemotron-30b-4bit``). The current ``aliases.json`` is untouched — only the auxiliary map file used by the sweep tool changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
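
The round-3 finding suggests a cheap guard: refuse to check in a rename map that consists only of identity entries. The sketch below is hypothetical (neither function exists in the repository as far as this page shows) and also simulates how rerunning the generator against an already-renamed registry produces such a map.

```python
def is_identity_map(rename_map: dict) -> bool:
    """True when no entry changes a name (an empty map also counts)."""
    return all(old == new for old, new in rename_map.items())


def rerun_against_renamed(rename_map: dict) -> dict:
    """Simulate regenerating the map from the already-renamed registry.

    Every canonical name maps to itself, so the legacy names vanish.
    """
    return {new: new for new in rename_map.values()}


if __name__ == "__main__":
    real = {
        "qwen3.5-4b": "qwen3.5-4b-4bit",
        "gemma4": "gemma-4-12b-qat-4bit",
        "nemotron-nano": "nemotron-30b-4bit",
    }
    print(is_identity_map(real))
    print(is_identity_map(rerun_against_renamed(real)))
```

A sweep driven by the second map would change nothing, which matches the symptom described in the commit message.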

… multi-tool PydanticAI defaults max_tokens to ~1024. On verbose 4B-class models (qwen3.5-4b-4bit) the multi-turn and sequential-tool-call test paths spill past the cap and PydanticAI raises ``Model token limit (provider default) exceeded`` before any response is generated. That ceiling is a client-side default, not a rapid-mlx server contract, so the SDK integration test should bypass it: pass ``model_settings={"max_tokens": 2048}`` on tests 5 and 6. release-check-m3 G7 PydanticAI now 6/6 PASS on qwen3.5-4b-4bit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Major release: every alias in ``vllm_mlx/aliases.json`` now carries an explicit quantization suffix (``-4bit`` / ``-8bit`` / ``-mxfp4`` / ``-dwq`` / etc.), and no implicit-quant short forms remain. The three legacy codename aliases (``deepseek-v4-flash``, ``gemma4``, ``nemotron-nano``) were dropped; the ``phi4-14b`` schema bug (name claimed 14B but hf_path pointed at phi-4-mini ~4B) was fixed by renaming to ``phi-4-14b-4bit`` AND swapping hf_path to the real Phi-4 14B; ``phi-4-mini-4bit`` was added to preserve the small-model entry. README now documents the six-segment naming template ``<family>-<version>-<params>-<modality?>-<technique?>-<quant>`` and the canonical quant-suffix table. Total: 74 → 72 aliases (3 dropped, 1 added). Old short names are not deprecated; they're just gone, per user direction ("not many users"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

raullenchai and others added 5 commits June 9, 2026 17:38

raullenchai merged commit 806fefe into main Jun 10, 2026
15 checks passed

raullenchai deleted the feat/explicit-alias-naming branch June 10, 2026 01:56

raullenchai merged 6 commits into main from feat/explicit-alias-naming
