fix(load): VLM model loading fixes for oQ-quantized checkpoints by a4501150 · Pull Request #1247 · jundot/omlx

a4501150 · 2026-05-13T18:02:44Z

Summary

- Expand per-layer quantization config keys for VLM model-tree paths so `quantization` config matches the MLX model parameter hierarchy (e.g. `language_model.model.layers.N` vs `model.layers.N`)
- Centralise pre-load patches in oQ `_measure_sensitivity` so MTP/nested-visual patches are active during sensitivity measurement
- Remap nested visual keys (`language_model.model.visual.*` → `vision_tower.*`) for MLX-format VLM models where mlx-vlm skips `Model.sanitize`
- Fix nested-visual patch idempotency: use function-attribute marker instead of module-level flag so the wrap can re-apply if another patch (e.g. MTP runtime) overwrites `Model.sanitize`
- Add inline nested-visual post-fixup in all three MTP sanitize functions
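The idempotency fix can be illustrated with a minimal sketch. It assumes a stand-in `Model` class and a hypothetical `apply_nested_visual_patch` helper and marker name; none of these are taken from the omlx source. The point is only that a marker stored on the wrapped function vanishes when another patch replaces `Model.sanitize`, so the wrap re-applies, whereas a module-level flag would stay set.

```python
class Model:
    """Stand-in for a model class whose sanitize hook gets patched (hypothetical)."""

    def sanitize(self, weights):
        return dict(weights)


def apply_nested_visual_patch(model_cls):
    """Wrap model_cls.sanitize once; re-wrap if something replaced it.

    Returns True when a wrap was installed, False when it was already present.
    """
    current = model_cls.sanitize
    # The marker lives on the function object, so it disappears as soon as
    # another patch swaps in a different function.
    if getattr(current, "_nested_visual_wrapped", False):
        return False

    prefix = "language_model.model.visual."

    def wrapped(self, weights):
        out = current(self, weights)
        return {
            ("vision_tower." + k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in out.items()
        }

    wrapped._nested_visual_wrapped = True
    model_cls.sanitize = wrapped
    return True


first = apply_nested_visual_patch(Model)   # installs the wrap
second = apply_nested_visual_patch(Model)  # already wrapped, no-op


def other_sanitize(self, weights):
    """Simulates a later patch (e.g. an MTP runtime patch) overwriting sanitize."""
    return dict(weights)


Model.sanitize = other_sanitize
third = apply_nested_visual_patch(Model)   # marker is gone, so the wrap re-applies
result = Model().sanitize({"language_model.model.visual.blocks.0.w": 1})
```

With a module-level boolean, the third call would have been skipped and the overwritten `sanitize` would have stayed unwrapped.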

Background on nested visual bug

Qwen3.6-35B-A3B nests ViT weights at model.language_model.visual.*. mlx-vlm's sanitize uses if/elif that matches model.language_model first, rewriting to language_model.model.visual.* — the model.visual → vision_tower branch never fires. For non-MLX-format models, the existing qwen3_6_nested_visual sanitize wrap catches this. But mlx-vlm skips sanitize entirely for MLX-format checkpoints (format=mlx in safetensors metadata), so oQ output models fail with "333 parameters not in model". The new _remap_nested_visual_on_load context manager intercepts nn.Module.load_weights during the scoped vlm_load() call to remap keys before they reach the model.
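A rough sketch of the load-time remap described above, under stated assumptions: `Module` is a stand-in class rather than the real `mlx.nn.Module`, weights are assumed to arrive as an iterable of `(key, value)` pairs, and the helper names are illustrative rather than the PR's actual code.

```python
from contextlib import contextmanager

NESTED_PREFIX = "language_model.model.visual."
TARGET_PREFIX = "vision_tower."


def remap_nested_visual(weights):
    """Rewrite nested visual keys; leave every other key untouched."""
    remapped = []
    for key, value in weights:
        if key.startswith(NESTED_PREFIX):
            key = TARGET_PREFIX + key[len(NESTED_PREFIX):]
        remapped.append((key, value))
    return remapped


class Module:
    """Stand-in for the framework base class (hypothetical)."""

    def __init__(self):
        self.loaded = {}

    def load_weights(self, weights, strict=True):
        self.loaded.update(dict(weights))
        return self


@contextmanager
def remap_nested_visual_on_load(module_cls):
    """Temporarily patch module_cls.load_weights to remap keys first."""
    original = module_cls.load_weights

    def patched(self, weights, *args, **kwargs):
        return original(self, remap_nested_visual(weights), *args, **kwargs)

    module_cls.load_weights = patched
    try:
        yield
    finally:
        # Restore even if loading raises, so the patch stays scoped.
        module_cls.load_weights = original


model = Module()
with remap_nested_visual_on_load(Module):
    model.load_weights([
        ("language_model.model.visual.patch_embed.weight", 1),
        ("language_model.model.layers.0.weight", 2),
    ])
```

Scoping the patch with a context manager keeps the interception limited to the single load call, so later `load_weights` calls elsewhere are unaffected.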

Test plan

- `pytest tests/test_oq.py -v`
- Server loads `Qwen3.6-35B-A3B-uncensored-heretic` (non-MLX format): nested-visual sanitize wrap fires
- Server loads `Qwen3.6-35B-A3B-uncensored-heretic-oQ6-mtp` (MLX format): `_remap_nested_visual_on_load` remaps 333 keys

🤖 Generated with Claude Code

- Expand per-layer quant keys for VLM model-tree paths so quantization config matches the MLX model parameter hierarchy
- Centralise pre-load patches in oQ _measure_sensitivity
- Remap nested visual keys (language_model.model.visual.* -> vision_tower.*) for MLX-format VLM models where mlx-vlm skips Model.sanitize
- Fix nested-visual patch idempotency: use function-attribute marker instead of module-level flag
- Add inline nested-visual post-fixup in MTP sanitize functions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jundot · 2026-05-14T15:22:22Z

Thanks for this, the load-path fixes look solid and the nested-visual / quant-key remapping all check out.

Merging this with two follow-ups from me, no action needed on your side:

1. Dropping the new TestBuiltinCalibration class. It expects mixed/chat/bartowski categories and 3000+ samples, but the oq_calibration_data.json in the repo has 7 categories / ~600 samples; it looks like the updated JSON wasn't part of the diff. The reasoning addition in _load_builtin_calibration is fine since that key already exists in the shipped JSON.
2. _measure_sensitivity now goes through maybe_apply_pre_load_patches, which leaves mtp_active=False and doesn't apply the mlx-vlm runtime patch. For an MLX-format VLM checkpoint with MTP heads, mlx-vlm skips sanitize entirely, so the language_model.mtp.* weights stay in the dict but no MTP head is attached. load_weights then rejects them and sensitivity silently returns empty. The text path is fine (the patched qwen35_model.sanitize self-consistently strips mtp.* when no head). I'll restore the head attachment for the VLM path in a follow-up.
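To make the "self-consistent" text-path behaviour concrete, here is a minimal sketch of dropping MTP weights when no head is attached. The function name, the `has_mtp_head` parameter, and the key-matching rule are assumptions for illustration only, not the patched `qwen35_model.sanitize` itself.

```python
def strip_mtp_if_no_head(weights, has_mtp_head):
    """Keep MTP weights only when the model has a head to receive them."""
    if has_mtp_head:
        return dict(weights)
    # Assumed rule: a key belongs to MTP if any dotted segment is exactly "mtp".
    return {k: v for k, v in weights.items() if "mtp" not in k.split(".")}


weights = {
    "language_model.mtp.fc.weight": 1,
    "language_model.model.layers.0.weight": 2,
}
without_head = strip_mtp_if_no_head(weights, has_mtp_head=False)
with_head = strip_mtp_if_no_head(weights, has_mtp_head=True)
```

When sanitize is skipped altogether, as for MLX-format VLM checkpoints, no step like this runs, which is why the stray `mtp.*` keys reach `load_weights`.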

…on tests

PR #1247 routed _measure_sensitivity through maybe_apply_pre_load_patches, which leaves mtp_active False. For MLX-format VLM checkpoints mlx-vlm skips sanitize, so the language_model.mtp.* weights stay in the dict but no head gets attached. load_weights then rejects them and sensitivity silently returns {}.

Re-apply the mlx-vlm runtime MTP patch and set mtp_active True for the VLM load when the source declares MTP heads. The text path is unchanged since the patched qwen35_model.sanitize already strips mtp.* when no head is attached.

Also drop TestBuiltinCalibration. It expects calibration categories and sample counts that the shipped oq_calibration_data.json does not have (the updated JSON was not part of #1247).

And update the #1204 discovery-failure test for the new sensitivity-before-discovery ordering.

a4501150 · 2026-05-14T15:32:45Z

thanks @jundot

For the first one: that should belong to PR #1246, I cherry-picked the wrong files. If we decide to merge that one, a follow-up could move TestBuiltinCalibration into PR #1246.

Pulls in 8 upstream commits, most relevantly:
- 386e16f fix(tests): repair pre-existing upstream test failures and import guards (jundot#1244). Restores list-shaped GitHub releases payload in test_admin_update_check / test_admin_auth fixtures. Was committed upstream 2026-05-14 10:21, after this branch's previous merge of main and before the next. Branch was unknowingly running with these 4 tests failing the entire time.
- 4fe004d feat: add Hermes Agent quick launch (jundot#1250)
- ccfba1d fix(load): VLM model loading fixes for oQ-quantized checkpoints (jundot#1247)
- 51907f0 fix(oq): restore MTP head attach for VLM sensitivity
- and others

Without jundot#1244 we keep inheriting the broken admin-auth / update-check tests as branch-only baseline failures. The fix landed 8 hours before today's MRU work and was never picked up because the branch hadn't merged main since.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>



test_is_mtp_eligible_requires_mtp_forward_and_solo_batch was written against a pre-jundot#1247 contract where head presence implied MTP active. Commit 23ca7dc decoupled head attachment from inference-time MTP for the VLM load path and added an is_mtp_active() gate. Set the flag around the True-expected assertion, restore in finally, and add a new "head attached but flag off" case to lock in the post-23ca7dc semantics.

test_patch_wraps_target_processors stubbed the wrong module path and class name for dots_ocr (mlx_vlm.models.dots_ocr.processing / DotsOcrProcessor), but the patcher imports mlx_vlm.models.dots_ocr.processing_dots_ocr and looks up DotsVLProcessor (per commit a1987ed, where the test was added). The fake_import branch never matched, the dots branch silently fell through to the except clause, and the wrap-marker assertion failed. Align the test's stubs with the actual (module_path, cls_name) tuples in vlm.py.

Both are test-only fixes. Refs jundot#1259.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 6057304)
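The "set the flag, assert, restore in finally" pattern from the first fix might look roughly like the sketch below. The flag storage, `set_mtp_active`, `FakeModel`, and this toy `is_mtp_eligible` are all hypothetical stand-ins; only the shape of the test is the point.

```python
_MTP_ACTIVE = False  # hypothetical process-wide flag


def is_mtp_active():
    return _MTP_ACTIVE


def set_mtp_active(value):
    global _MTP_ACTIVE
    _MTP_ACTIVE = bool(value)


class FakeModel:
    """Model with an MTP head attached (signalled here by mtp_forward)."""

    def mtp_forward(self, tokens):
        return tokens


def is_mtp_eligible(model, batch_size):
    # Toy gate: head presence alone is not enough, the flag must also be on.
    return is_mtp_active() and hasattr(model, "mtp_forward") and batch_size == 1


def test_is_mtp_eligible_gate():
    model = FakeModel()
    previous = is_mtp_active()
    set_mtp_active(True)
    try:
        assert is_mtp_eligible(model, batch_size=1) is True
        assert is_mtp_eligible(model, batch_size=2) is False
    finally:
        # Restore so the flag cannot leak into other tests.
        set_mtp_active(previous)
    # "Head attached but flag off" case.
    assert is_mtp_eligible(model, batch_size=1) is False


test_is_mtp_eligible_gate()
```

Restoring in `finally` matters because the flag is global state: a failed assertion would otherwise leave it set for every later test in the run.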


…lm_sync

Re-applies pieces from main commits ccfba1d (jundot#1247) + a6781db (jundot#209) lost when vlm.py took --ours in the v0.3.9rc1 merge.

(1) _remap_nested_visual_on_load context manager: mlx-vlm's load_model skips Model.sanitize when safetensors metadata declares format=mlx. oQ output is MLX-format, so the nested-visual key fixup that sanitize normally applies never fires. The wrapper intercepts Module.load_weights and remaps 'language_model.model.visual.*' -> 'vision_tower.*'. Without this, Qwen3.6-35B-A3B oQ variants fail to load with '333 parameters not in model'.

(2) maybe_load_custom_quantization dispatch: ParoQuant checkpoints require a non-standard loader. Returns (model, processor) on match; falls through to standard vlm_load() otherwise. Without this, ParoQuant VLM checkpoints fail at load with mlx-vlm's standard pipeline.
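The dispatch in (2) follows a common "try custom loaders, else fall through" shape. The sketch below uses made-up loader functions and assumes a custom loader signals "no match" by returning `None`; the real `maybe_load_custom_quantization` signature is not shown in this thread.

```python
def load_with_custom_fallback(path, custom_loaders, standard_load):
    """Return the first custom loader's (model, processor), else the standard load."""
    for loader in custom_loaders:
        result = loader(path)
        if result is not None:  # assumed convention: None means "not my format"
            return result
    return standard_load(path)


def fake_custom_loader(path):
    """Hypothetical loader that only recognises paths ending in '-custom'."""
    if path.endswith("-custom"):
        return ("custom-model", "custom-processor")
    return None


def fake_standard_load(path):
    """Hypothetical stand-in for the standard loading pipeline."""
    return ("standard-model", "standard-processor")


matched = load_with_custom_fallback("ckpt-custom", [fake_custom_loader], fake_standard_load)
fallback = load_with_custom_fallback("ckpt-plain", [fake_custom_loader], fake_standard_load)
```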

…rphaned (jundot#1247)

Re-applies the per-layer quant expansion missing from feature's vlm.py (--ours took feature, lost main's ccfba1d). Flattens nested per-layer quant configs (e.g. language_model.model.layers.N vs model.layers.N) into a uniform schema so oQ quantization reserves bits per actual model layer rather than per top-level config key. Without the expansion, oQ-quantized VLM checkpoints with nested model hierarchies may load with wrong/missing per-layer quant attributes. No-op for configs without per-layer quant data.
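As a hedged illustration of the key expansion, the sketch below assumes per-layer overrides are dict-valued entries keyed by module path, with scalar entries such as `bits` and `group_size` as global defaults. The function name and the single `language_model.` prefix are assumptions; the actual expansion in vlm.py may cover more paths.

```python
def expand_per_layer_quant_keys(quant_config, prefix="language_model."):
    """Add a VLM-tree alias for every per-layer override keyed by a text-model path."""
    expanded = dict(quant_config)
    for key, value in quant_config.items():
        # Dict-valued entries are treated as per-layer overrides; scalars are
        # global settings and are left alone.
        if isinstance(value, dict) and not key.startswith(prefix):
            expanded.setdefault(prefix + key, value)
    return expanded


config = {
    "group_size": 64,
    "bits": 4,
    "model.layers.0.mlp.down_proj": {"group_size": 64, "bits": 6},
}
expanded = expand_per_layer_quant_keys(config)
unchanged = expand_per_layer_quant_keys({"group_size": 64, "bits": 4})
```

Keeping the original keys alongside the prefixed aliases means a lookup succeeds whichever hierarchy the loaded model exposes, and a config with no per-layer entries passes through unchanged.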

jundot merged commit ccfba1d into jundot:main May 14, 2026

richgoodson mentioned this pull request May 16, 2026

fix(tests): align two stale tests in #1259 with current implementation #1287

Merged

a4501150 deleted the pr/vlm-load-fixes branch May 20, 2026 15:24
