fix(load): VLM model loading fixes for oQ-quantized checkpoints #1247
Merged
- Expand per-layer quant keys for VLM model-tree paths so quantization config matches the MLX model parameter hierarchy
- Centralise pre-load patches in oQ _measure_sensitivity
- Remap nested visual keys (language_model.model.visual.* -> vision_tower.*) for MLX-format VLM models where mlx-vlm skips Model.sanitize
- Fix nested-visual patch idempotency: use function-attribute marker instead of module-level flag
- Add inline nested-visual post-fixup in MTP sanitize functions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
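Why a function-attribute marker is more robust than a module-level flag can be shown with a small, self-contained sketch. It assumes a toy `Model` class in place of the real mlx-vlm model, and the marker name `_nested_visual_wrapped` is hypothetical; the actual omlx patch may be structured differently. A module-level "already patched" flag stays `True` even after another patch (e.g. the MTP runtime patch) replaces `Model.sanitize` outright, so the nested-visual wrap would never be re-applied. A marker stored on the wrapper function disappears together with that function, so a later call can detect the loss and wrap again.

```python
import functools

# Hypothetical marker name; the real attribute used in omlx may differ.
_MARKER = "_nested_visual_wrapped"
_SRC, _DST = "language_model.model.visual.", "vision_tower."


class Model:
    """Toy stand-in for an mlx-vlm Model class (not the real one)."""

    def sanitize(self, weights):
        return dict(weights)


def remap_nested_visual(weights):
    """Rename language_model.model.visual.* keys to vision_tower.*."""
    return {
        (_DST + k[len(_SRC):] if k.startswith(_SRC) else k): v
        for k, v in weights.items()
    }


def apply_nested_visual_patch(cls):
    """Wrap cls.sanitize unless the current function already carries the marker.

    Returns True if a wrap was installed, False if it was already in place.
    """
    current = cls.sanitize
    if getattr(current, _MARKER, False):
        return False

    @functools.wraps(current)
    def sanitize(self, weights):
        # Run whatever sanitize was installed before us, then fix nested keys.
        return remap_nested_visual(current(self, weights))

    setattr(sanitize, _MARKER, True)  # the marker lives on the function itself
    cls.sanitize = sanitize
    return True
```

One side effect of `functools.wraps` worth noting: it copies the wrapped function's `__dict__`, so if a later patch wraps this wrapper with `functools.wraps`, the marker is inherited. That is the desired outcome here, because the nested-visual wrap is still in the call chain.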
Owner
Thanks for this, the load-path fixes look solid and the nested-visual / quant-key remapping all check out. Merging this with two follow-ups from me, no action needed on your side:
jundot added a commit that referenced this pull request on May 14, 2026
…on tests

PR #1247 routed _measure_sensitivity through maybe_apply_pre_load_patches, which leaves mtp_active False. For MLX-format VLM checkpoints mlx-vlm skips sanitize, so the language_model.mtp.* weights stay in the dict but no head gets attached. load_weights then rejects them and sensitivity silently returns {}.

Re-apply the mlx-vlm runtime MTP patch and set mtp_active True for the VLM load when the source declares MTP heads. The text path is unchanged since the patched qwen35_model.sanitize already strips mtp.* when no head is attached.

Also drop TestBuiltinCalibration. It expects calibration categories and sample counts that the shipped oq_calibration_data.json does not have (the updated JSON was not part of #1247). And update the #1204 discovery-failure test for the new sensitivity-before-discovery ordering.
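The failure mode in the commit message above can be reproduced with a pure-Python sketch. Everything here is a hypothetical stand-in: `measure_sensitivity` plays the role of `_measure_sensitivity`, the parameter names are invented, and the returned scores are placeholders rather than real sensitivity values. It shows the three cases: the text path, where sanitize strips `mtp.*`; the MLX-format VLM path before the fix, where sanitize is skipped and the strict load fails, so the result is `{}`; and the VLM path after the fix, where the head is attached so the `mtp.*` weights have somewhere to go.

```python
MTP_PREFIX = "language_model.mtp."  # key prefix named in the commit message


def model_param_names(mtp_active):
    """Toy parameter tree: MTP head parameters exist only when the head is attached."""
    names = {"language_model.model.layers.0.mlp.weight"}
    if mtp_active:
        names.add(MTP_PREFIX + "fc.weight")
    return names


def load_weights(param_names, weights):
    """Strict load: any checkpoint key without a matching parameter is an error."""
    unexpected = set(weights) - param_names
    if unexpected:
        raise ValueError(f"{len(unexpected)} parameters not in model")


def measure_sensitivity(weights, mtp_active, run_sanitize):
    """Hypothetical stand-in for _measure_sensitivity; scores are placeholders."""
    if run_sanitize and not mtp_active:
        # Text path: the patched sanitize strips mtp.* when no head is attached.
        weights = {k: v for k, v in weights.items() if not k.startswith(MTP_PREFIX)}
    try:
        load_weights(model_param_names(mtp_active), weights)
    except ValueError:
        return {}  # the silent failure described in the commit message
    return {name: 0.0 for name in weights}
```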
Contributor
Author
blightbow added a commit to blightbow/omlx that referenced this pull request on May 15, 2026
Pulls in 8 upstream commits, most relevantly:

- 386e16f fix(tests): repair pre-existing upstream test failures and import guards (jundot#1244): restores list-shaped GitHub releases payload in test_admin_update_check / test_admin_auth fixtures. Was committed upstream 2026-05-14 10:21, after this branch's previous merge of main and before the next. Branch was unknowingly running with these 4 tests failing the entire time.
- 4fe004d feat: add Hermes Agent quick launch (jundot#1250)
- ccfba1d fix(load): VLM model loading fixes for oQ-quantized checkpoints (jundot#1247)
- 51907f0 fix(oq): restore MTP head attach for VLM sensitivity
- and others

Without jundot#1244 we keep inheriting the broken admin-auth / update-check tests as branch-only baseline failures. The fix landed 8 hours before today's MRU work and was never picked up because the branch hadn't merged main since.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
panwudi pushed a commit to panwudi/flyto-mlx that referenced this pull request on May 16, 2026
…ot#1247)

- Expand per-layer quantization config keys for VLM model-tree paths so `quantization` config matches the MLX model parameter hierarchy (e.g. `language_model.model.layers.N` vs `model.layers.N`)
- Centralise pre-load patches in oQ `_measure_sensitivity` so MTP/nested-visual patches are active during sensitivity measurement
- Remap nested visual keys (`language_model.model.visual.*` → `vision_tower.*`) for MLX-format VLM models where mlx-vlm skips `Model.sanitize`
- Fix nested-visual patch idempotency: use function-attribute marker instead of module-level flag so the wrap can re-apply if another patch (e.g. MTP runtime) overwrites `Model.sanitize`
- Add inline nested-visual post-fixup in all three MTP sanitize functions

Qwen3.6-35B-A3B nests ViT weights at `model.language_model.visual.*`. mlx-vlm's sanitize uses an if/elif chain that matches `model.language_model` first, rewriting to `language_model.model.visual.*`, so the `model.visual → vision_tower` branch never fires. For non-MLX-format models, the existing `qwen3_6_nested_visual` sanitize wrap catches this. But mlx-vlm skips sanitize entirely for MLX-format checkpoints (`format=mlx` in safetensors metadata), so oQ output models fail with "333 parameters not in model". The new `_remap_nested_visual_on_load` context manager intercepts `nn.Module.load_weights` during the scoped `vlm_load()` call to remap keys before they reach the model.

- [x] `pytest tests/test_oq.py -v`
- [x] Server loads `Qwen3.6-35B-A3B-uncensored-heretic` (non-MLX format): nested-visual sanitize wrap fires
- [x] Server loads `Qwen3.6-35B-A3B-uncensored-heretic-oQ6-mtp` (MLX format): `_remap_nested_visual_on_load` remaps 333 keys

🤖 Generated with [Claude Code](https://claude.com/claude-code)
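The if/elif ordering problem described above is easy to see in a reduced form. The `simplified_sanitize` function below is a hypothetical simplification of the prefix-rewrite pattern the commit message describes, not mlx-vlm's actual code; `nested_visual_post_fixup` is likewise an illustrative name for the kind of inline post-fixup the last bullet adds. Because `model.language_model.visual.*` matches the first branch, the `model.visual` branch is never reached for those keys, and a second pass is needed to move them under `vision_tower.*`.

```python
def simplified_sanitize(weights):
    """Simplified illustration of an if/elif prefix rewrite (not mlx-vlm's code)."""
    out = {}
    for key, value in weights.items():
        if key.startswith("model.language_model"):
            key = key.replace("model.language_model", "language_model.model", 1)
        elif key.startswith("model.visual"):
            # Never reached for model.language_model.visual.*: the first branch wins.
            key = key.replace("model.visual", "vision_tower", 1)
        out[key] = value
    return out


def nested_visual_post_fixup(weights):
    """Second pass: move keys mis-nested under language_model.model.visual.*."""
    src, dst = "language_model.model.visual.", "vision_tower."
    return {
        (dst + k[len(src):] if k.startswith(src) else k): v
        for k, v in weights.items()
    }
```

Running the fixup after sanitize, rather than reordering the branches, leaves the upstream function untouched, which matches the wrap-style patching the commit uses.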
panwudi pushed a commit to panwudi/flyto-mlx that referenced this pull request on May 18, 2026
…on tests

PR jundot#1247 routed _measure_sensitivity through maybe_apply_pre_load_patches, which leaves mtp_active False. For MLX-format VLM checkpoints mlx-vlm skips sanitize, so the language_model.mtp.* weights stay in the dict but no head gets attached. load_weights then rejects them and sensitivity silently returns {}.

Re-apply the mlx-vlm runtime MTP patch and set mtp_active True for the VLM load when the source declares MTP heads. The text path is unchanged since the patched qwen35_model.sanitize already strips mtp.* when no head is attached.

Also drop TestBuiltinCalibration. It expects calibration categories and sample counts that the shipped oq_calibration_data.json does not have (the updated JSON was not part of jundot#1247). And update the jundot#1204 discovery-failure test for the new sensitivity-before-discovery ordering.

(cherry picked from commit 51907f0)
panwudi pushed a commit to panwudi/flyto-mlx that referenced this pull request on May 18, 2026
test_is_mtp_eligible_requires_mtp_forward_and_solo_batch was written against a pre-jundot#1247 contract where head presence implied MTP active. Commit 23ca7dc decoupled head attachment from inference-time MTP for the VLM load path and added an is_mtp_active() gate. Set the flag around the True-expected assertion, restore in finally, and add a new "head attached but flag off" case to lock in the post-23ca7dc semantics.

test_patch_wraps_target_processors stubbed the wrong module path and class name for dots_ocr (mlx_vlm.models.dots_ocr.processing / DotsOcrProcessor), but the patcher imports mlx_vlm.models.dots_ocr.processing_dots_ocr and looks up DotsVLProcessor (per commit a1987ed, where the test was added). The fake_import branch never matched, the dots branch silently fell through to the except clause, and the wrap-marker assertion failed. Align the test's stubs with the actual (module_path, cls_name) tuples in vlm.py.

Both are test-only fixes. Refs jundot#1259.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 6057304)
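The "set the flag, restore in finally" pattern from the first fix can be sketched as follows. The module-level `_mtp_active` variable, `set_mtp_active`, the `mtp` attribute used as "head attached", and the exact eligibility rule are all hypothetical stand-ins for the omlx internals; only the `is_mtp_active()` name comes from the commit message. The test covers the new "head attached but flag off" case and restores the previous flag value even if an assertion fails, so the global state cannot leak into other tests.

```python
_mtp_active = False  # hypothetical module-level gate


def set_mtp_active(value):
    global _mtp_active
    _mtp_active = value


def is_mtp_active():
    return _mtp_active


def is_mtp_eligible(model, batch_size):
    """Hypothetical gate: head attached AND flag on AND a single-request batch."""
    head_attached = getattr(model, "mtp", None) is not None
    return head_attached and is_mtp_active() and batch_size == 1


class ModelWithHead:
    mtp = object()  # stands in for an attached MTP head


def test_is_mtp_eligible_requires_flag_and_solo_batch():
    model = ModelWithHead()
    previous = is_mtp_active()
    try:
        # Head attached but flag off: not eligible once the two are decoupled.
        set_mtp_active(False)
        assert not is_mtp_eligible(model, batch_size=1)
        set_mtp_active(True)
        assert is_mtp_eligible(model, batch_size=1)
        assert not is_mtp_eligible(model, batch_size=2)
    finally:
        set_mtp_active(previous)  # never leak the flag into later tests
```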
jundot pushed a commit that referenced this pull request on May 19, 2026
test_is_mtp_eligible_requires_mtp_forward_and_solo_batch was written against a pre-#1247 contract where head presence implied MTP active. Commit 23ca7dc decoupled head attachment from inference-time MTP for the VLM load path and added an is_mtp_active() gate. Set the flag around the True-expected assertion, restore in finally, and add a new "head attached but flag off" case to lock in the post-23ca7dc semantics.

test_patch_wraps_target_processors stubbed the wrong module path and class name for dots_ocr (mlx_vlm.models.dots_ocr.processing / DotsOcrProcessor), but the patcher imports mlx_vlm.models.dots_ocr.processing_dots_ocr and looks up DotsVLProcessor (per commit a1987ed, where the test was added). The fake_import branch never matched, the dots branch silently fell through to the except clause, and the wrap-marker assertion failed. Align the test's stubs with the actual (module_path, cls_name) tuples in vlm.py.

Both are test-only fixes. Refs #1259.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
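The second fix hinges on a patcher that swallows import errors, so a stub registered under the wrong `(module_path, cls_name)` pair fails silently instead of loudly. The sketch below shows that shape under stated assumptions: `patch_processors`, `make_fake_import`, and the `_omlx_wrapped` marker are hypothetical names, the importer is injected as a parameter so the test never touches a real `mlx_vlm` install, and only the two module-path/class-name pairs come from the commit message.

```python
import importlib
import types

# (module_path, cls_name) pair the commit says the patcher actually uses.
TARGETS = [("mlx_vlm.models.dots_ocr.processing_dots_ocr", "DotsVLProcessor")]


def patch_processors(targets, import_module=importlib.import_module):
    """Hypothetical patcher: mark each target class; skip any that fail to import."""
    wrapped = []
    for module_path, cls_name in targets:
        try:
            cls = getattr(import_module(module_path), cls_name)
        except (ImportError, AttributeError):
            continue  # silent skip: this is how a mismatched stub went unnoticed
        cls._omlx_wrapped = True  # hypothetical wrap marker
        wrapped.append(cls)
    return wrapped


def make_fake_import(module_path, cls_name):
    """Build a fake importer that only knows one stubbed module and class."""
    stub_cls = type(cls_name, (), {})
    stub_module = types.SimpleNamespace(**{cls_name: stub_cls})

    def fake_import(name):
        if name == module_path:
            return stub_module
        raise ImportError(name)

    return fake_import, stub_cls
```

With the old stub (`processing` / `DotsOcrProcessor`) nothing is wrapped and no error surfaces; with a stub built from the patcher's own tuple, the marker is set.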
ziya32 pushed a commit to ziya32/omlx that referenced this pull request on May 19, 2026
…lm_sync

Re-applies pieces from main commits ccfba1d (jundot#1247) + a6781db (jundot#209) lost when vlm.py took --ours in the v0.3.9rc1 merge.

(1) _remap_nested_visual_on_load context manager: mlx-vlm's load_model skips Model.sanitize when safetensors metadata declares format=mlx. oQ output is MLX-format, so the nested-visual key fixup that sanitize normally applies never fires. The wrapper intercepts Module.load_weights and remaps 'language_model.model.visual.*' -> 'vision_tower.*'. Without this, Qwen3.6-35B-A3B oQ variants fail to load with '333 parameters not in model'.

(2) maybe_load_custom_quantization dispatch: ParoQuant checkpoints require a non-standard loader. Returns (model, processor) on match; falls through to standard vlm_load() otherwise. Without this, ParoQuant VLM checkpoints fail at load with mlx-vlm's standard pipeline.
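Point (1) can be illustrated with a scoped monkeypatch built on `contextlib.contextmanager`. This is a sketch, not the omlx implementation: `Module` is a toy stand-in for `mlx.nn.Module`, its `load_weights` is assumed to take a list of `(name, value)` pairs with a `strict` flag, and `remap_nested_visual_on_load` is a hypothetical name modeled on `_remap_nested_visual_on_load`. The original method is restored in `finally`, so the remap only applies to loads that happen inside the `with` block.

```python
import contextlib

SRC, DST = "language_model.model.visual.", "vision_tower."


class Module:
    """Toy stand-in for mlx.nn.Module (hypothetical, not the real class)."""

    def __init__(self, param_names):
        self.param_names = set(param_names)
        self.loaded = {}

    def load_weights(self, weights, strict=True):
        weights = dict(weights)
        extra = set(weights) - self.param_names
        if strict and extra:
            raise ValueError(f"{len(extra)} parameters not in model")
        self.loaded.update(weights)
        return self


@contextlib.contextmanager
def remap_nested_visual_on_load(module_cls=Module):
    """Temporarily wrap module_cls.load_weights to rename nested visual keys."""
    original = module_cls.load_weights

    def load_weights(self, weights, *args, **kwargs):
        remapped = [
            (DST + k[len(SRC):] if k.startswith(SRC) else k, v)
            for k, v in weights
        ]
        return original(self, remapped, *args, **kwargs)

    module_cls.load_weights = load_weights
    try:
        yield
    finally:
        module_cls.load_weights = original  # undo the patch even on error
```

Patching the class rather than one instance means every submodule load inside the scoped call is covered, which is why the scope has to stay as narrow as the single load call.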
ziya32 pushed a commit to ziya32/omlx that referenced this pull request on May 19, 2026
…rphaned (jundot#1247)

Re-applies the per-layer quant expansion missing from feature's vlm.py (--ours took feature, lost main's ccfba1d). Flattens nested per-layer quant configs (e.g. language_model.model.layers.N vs model.layers.N) into a uniform schema so oQ quantization reserves bits per actual model layer rather than per top-level config key. Without the expansion, oQ-quantized VLM checkpoints with nested model hierarchies may load with wrong/missing per-layer quant attributes. No-op for configs without per-layer quant data.
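A minimal sketch of the per-layer quant key expansion, assuming an MLX-style `quantization` dict in which global `bits` / `group_size` sit next to per-layer override dicts keyed by parameter path, and assuming overrides are written as `model.layers.N...` while the VLM model tree uses `language_model.model.layers.N...`. The function names and the single `language_model.` prefix are hypothetical; the real expansion in vlm.py may handle more prefixes or the opposite direction. Without the alias, a lookup by the VLM path silently falls back to the global settings; with no per-layer entries, the config comes back unchanged.

```python
def expand_per_layer_quant_keys(quant_cfg, vlm_prefix="language_model."):
    """Add VLM-tree aliases for per-layer overrides; global keys are untouched."""
    expanded = dict(quant_cfg)
    for key, value in quant_cfg.items():
        if isinstance(value, dict) and not key.startswith(vlm_prefix):
            # Per-layer override: also register it under the VLM model-tree path.
            expanded.setdefault(vlm_prefix + key, value)
    return expanded


def quant_params_for(path, quant_cfg):
    """Return (bits, group_size) for a parameter path, falling back to globals."""
    override = quant_cfg.get(path)
    if isinstance(override, dict):
        return override["bits"], override["group_size"]
    return quant_cfg["bits"], quant_cfg["group_size"]
```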
Summary
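- Expand per-layer quantization config keys for VLM model-tree paths so `quantization` config matches the MLX model parameter hierarchy (e.g. `language_model.model.layers.N` vs `model.layers.N`)
- Centralise pre-load patches in oQ `_measure_sensitivity` so MTP/nested-visual patches are active during sensitivity measurement
- Remap nested visual keys (`language_model.model.visual.*` → `vision_tower.*`) for MLX-format VLM models where mlx-vlm skips `Model.sanitize`
- Fix nested-visual patch idempotency: use function-attribute marker instead of module-level flag so the wrap can re-apply if another patch (e.g. MTP runtime) overwrites `Model.sanitize`
- Add inline nested-visual post-fixup in all three MTP sanitize functions

Background on nested visual bug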
Qwen3.6-35B-A3B nests ViT weights at `model.language_model.visual.*`. mlx-vlm's sanitize uses an if/elif chain that matches `model.language_model` first, rewriting to `language_model.model.visual.*`, so the `model.visual → vision_tower` branch never fires. For non-MLX-format models, the existing `qwen3_6_nested_visual` sanitize wrap catches this. But mlx-vlm skips sanitize entirely for MLX-format checkpoints (`format=mlx` in safetensors metadata), so oQ output models fail with "333 parameters not in model". The new `_remap_nested_visual_on_load` context manager intercepts `nn.Module.load_weights` during the scoped `vlm_load()` call to remap keys before they reach the model.

Test plan

- [x] `pytest tests/test_oq.py -v`
- [x] Server loads `Qwen3.6-35B-A3B-uncensored-heretic` (non-MLX format): nested-visual sanitize wrap fires
- [x] Server loads `Qwen3.6-35B-A3B-uncensored-heretic-oQ6-mtp` (MLX format): `_remap_nested_visual_on_load` remaps 333 keys

🤖 Generated with Claude Code