Merged
Update mlx-ruby submodule to 476f721 (main) which includes:
- mx.array() dtype: keyword argument support (Issue #1)
- mx.mean() keepdims: parameter (Issue #2)
- Array#coerce for numeric-left ops (Issues #7, #8)
- update_modules_impl Module→Hash recursion fix (Issue #12)
- MLX_BUILD_SAFETENSORS=ON by default (Issue #5)

Update mlx-onnx submodule to 128d7de which adds GreaterEqual ONNX lowering (Issue #14a). All 11 dense models now export to ONNX with 100% node coverage.

Workarounds removed across lib/ and test/:
- .array(X).astype(T) → .array(X, dtype: T) in 50+ locations
- expand_dims(mean(x, axis), -1) → mean(x, axis, keepdims: true)
- Local update_modules_impl patch (now in upstream)

PRD updated to mark 8 issues as resolved, with remaining open issues documented.

https://claude.ai/code/session_01BTKLwVSDwcZ7bcG4StMxUZ
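For readers unfamiliar with "numeric-left ops": Ruby resolves an expression such as `2 * arr` by calling `arr.coerce(2)` and retrying the operation on the returned pair. The sketch below shows that protocol in plain Ruby. `TinyArray` is a hypothetical stand-in invented for illustration; it is not the mlx-ruby class and makes no claim about how Array#coerce is implemented upstream.

```ruby
# Hypothetical stand-in for an array type (NOT the mlx-ruby implementation).
class TinyArray
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Array-left multiplication: TinyArray * Numeric or TinyArray * TinyArray.
  def *(other)
    case other
    when Numeric
      TinyArray.new(@values.map { |v| v * other })
    when TinyArray
      TinyArray.new(@values.zip(other.values).map { |a, b| a * b })
    else
      raise TypeError, "cannot multiply TinyArray by #{other.class}"
    end
  end

  # Numeric-left support: when Ruby evaluates `2 * arr`, Integer#* calls
  # arr.coerce(2) and then computes first * second on the returned pair.
  def coerce(numeric)
    [TinyArray.new(Array.new(@values.length, numeric)), self]
  end
end

arr = TinyArray.new([1.0, 2.0, 3.0])
left  = 2 * arr   # goes through coerce
right = arr * 2   # goes straight to TinyArray#*
puts left.values == right.values
```

Without `coerce`, the `2 * arr` line would raise a TypeError, which is the kind of failure a numeric-left workaround has to route around.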
Port SwitchLinear and SwitchGLU from Python mlx-lm switch_layers.py, replacing per-token tolist routing loops in mixtral.rb and deepseek.rb with batched gather_mm operations. This eliminates the SIGILL crashes during ONNX tracing (data-dependent control flow from tolist) and brings MoE models to 95-98% ONNX node coverage.

- Add lib/mlx_lm/models/switch_layers.rb with SwitchLinear and SwitchGLU
- Refactor mixtral.rb SparseMoeBlock to use SwitchGLU
- Refactor deepseek.rb DeepseekMoE and MoEGate to use SwitchGLU
- Add sanitize methods for stacking per-expert weights on load

https://claude.ai/code/session_01BTKLwVSDwcZ7bcG4StMxUZ
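The "stacking per-expert weights on load" step can be pictured with a minimal plain-Ruby sketch. Everything here is an assumption made for illustration: the key layout (`<prefix>.experts.<index>.<proj>.weight`), the `switch_mlp` target name, and the helper `stack_expert_weights` are hypothetical, and nested Ruby arrays stand in for tensors. The real sanitize methods in mixtral.rb and deepseek.rb may differ.

```ruby
# Assumed (hypothetical) checkpoint key layout:
#   "<prefix>.experts.<index>.<proj>.weight"
EXPERT_KEY = /\A(?<prefix>.+)\.experts\.(?<index>\d+)\.(?<proj>\w+)\.weight\z/

# Collect per-expert entries and replace them with one stacked entry per
# projection, ordered by expert index. Non-expert keys pass through unchanged.
def stack_expert_weights(weights)
  grouped = Hash.new { |hash, key| hash[key] = {} }
  result = {}

  weights.each do |key, value|
    match = EXPERT_KEY.match(key)
    if match
      target = "#{match[:prefix]}.switch_mlp.#{match[:proj]}.weight"
      grouped[target][match[:index].to_i] = value
    else
      result[key] = value
    end
  end

  grouped.each do |target, by_index|
    # A Ruby array of matrices plays the role of a tensor with a new
    # leading expert axis.
    result[target] = by_index.sort_by { |index, _| index }.map(&:last)
  end

  result
end

weights = {
  "layers.0.mlp.experts.1.up_proj.weight" => [[3, 4]],
  "layers.0.mlp.experts.0.up_proj.weight" => [[1, 2]],
  "layers.0.mlp.gate.weight"              => [[0.5]]
}
stacked = stack_expert_weights(weights)
puts stacked.keys.sort.inspect
```

Once the experts share one stacked tensor, routing can index into the expert axis in a single batched operation instead of looping over tokens, which is the property the commit relies on to remove data-dependent control flow.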
Exhaustive class-by-class comparison of upstream Python mlx-lm (v0.30.7, ~130 files, ~575 classes) against the Ruby port (33 files, 13 architectures). Covers all 11 categories: models, generation, sampling, KV cache, tokenizer, quantization, tuner, CLI/server, tool parsers, chat templates, and utilities.

Current coverage: 13/107 model architectures (12%), core inference pipeline fully functional.

Biggest gaps: 94 missing models, training pipeline, batch generation, HF Hub integration, advanced quantization, tool calling.

https://claude.ai/code/session_01BTKLwVSDwcZ7bcG4StMxUZ
Adds a detailed 9-phase implementation plan covering:
- Phase 1 (A-E): Shared infrastructure (RoPE variants, extended cache, MLA, SSM, gated delta, etc.)
- Phases 2-7: All ~96 missing model architectures grouped by dependency/complexity
- Phase 8 (A-C): HuggingFace Hub integration (download, save, upload)
- Phase 9 (A-E): ONNX export support for all model architectures

Phase 9 leverages the existing onnx_export_test.rb subprocess-based testing infrastructure, which auto-generates compat and export tests from TINY_CONFIGS entries.

Notes that ArgPartition and GatherMM are now supported upstream in mlx-onnx (commit 33d4b2e), resolving the MoE compat report false positives.

https://claude.ai/code/session_01BTKLwVSDwcZ7bcG4StMxUZ
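As a rough illustration of generating compat and export tests from a config registry, the sketch below defines one method per (model, check) pair with `define_method`. It is not the contents of onnx_export_test.rb: the `TINY_CONFIGS` entries, the `GeneratedOnnxChecks` class, and the method naming are all hypothetical, and the method bodies return a description of the work instead of spawning a subprocess.

```ruby
# Hypothetical registry; the real TINY_CONFIGS entries are not shown in this PR.
TINY_CONFIGS = {
  "llama"   => { hidden_size: 16, num_hidden_layers: 1 },
  "mixtral" => { hidden_size: 16, num_hidden_layers: 1, num_local_experts: 2 }
}.freeze

# One generated method per (model, check kind) pair. Adding a registry entry
# is enough to get both checks for a new architecture.
class GeneratedOnnxChecks
  TINY_CONFIGS.each do |name, config|
    %w[compat export].each do |kind|
      define_method("test_#{name}_#{kind}") do
        # Placeholder body: a real test would run the check in a subprocess
        # and assert on its result.
        { model: name, kind: kind, config: config }
      end
    end
  end
end

generated = GeneratedOnnxChecks.instance_methods(false).sort
puts generated.inspect
```

The design point is that test coverage follows the registry: each new architecture added in Phases 2-7 would only need a small config entry to be picked up by the Phase 9 checks.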
- Delete prd/conversion_plan.md (original 12-phase plan, now superseded)
- Create prd/implementation_plan.md with 9-phase execution plan:
    Phase 1: Shared infra (RoPE, cache, MLA, SSM, gated delta, etc.)
    Phase 2-7: All ~96 missing model architectures by dependency group
    Phase 8: HuggingFace Hub integration
    Phase 9: ONNX export validation for all ~109 architectures
- Each phase includes a native library gap report checkpoint to surface missing mlx-ruby/mlx-onnx functionality before proceeding
- Move detailed phase content from parity checklist to implementation plan; checklist now references the plan document

https://claude.ai/code/session_01BTKLwVSDwcZ7bcG4StMxUZ