Add JANG model loader integration by samuelfaj · Pull Request #212 · raullenchai/Rapid-MLX

samuelfaj · 2026-05-05T01:09:34Z

Summary

Detect local or Hugging Face models with jang_config.json before the vendored architecture fallback.
Route JANGTQ/MXTQ models through jang_tools.load_jangtq.load_jangtq_model and standard JANG models through jang_tools.loader.load_jang_model.
Add optional rapid-mlx[jang] dependency extra and regression tests for JANGTQ, JANG v2, and normal DeepSeek V4 fallback behavior.
Patch DeepSeek V4 JANGTQ tokenizer loading so jang-tools does not fall through Transformers AutoConfig for the vendored deepseek_v4 architecture.
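
A minimal sketch of the detection step listed above, for local directories only. It assumes jang_config.json sits at the model root and exposes a top-level weight_format key; neither detail is confirmed by this description. The helper name detect_jang_route and its return labels are hypothetical, and the real jang_tools loader calls are deliberately left out because their signatures are not shown here.

```python
import json
from pathlib import Path
from typing import Optional


def detect_jang_route(model_dir: str) -> Optional[str]:
    """Return "jangtq", "jang", or None for a local model directory.

    Assumption: jang_config.json lives at the model root and carries a
    top-level "weight_format" key. Both details are illustrative.
    """
    config_path = Path(model_dir) / "jang_config.json"
    if not config_path.is_file():
        # No JANG marker: the caller keeps the vendored architecture fallback.
        return None
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    weight_format = str(config.get("weight_format", "")).lower()
    # MXTQ bundles need the TQ-aware loader; anything else is treated as standard JANG.
    return "jangtq" if weight_format == "mxtq" else "jang"
```

A caller would map "jangtq" to jang_tools.load_jangtq.load_jangtq_model, "jang" to jang_tools.loader.load_jang_model, and None to the existing loader.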

Root cause

DeepSeek V4 JANGTQ bundles declare weight_format: mxtq and store routed experts as tq_packed/tq_norms tensors. The existing loader treated them like normal DeepSeek V4 MLX weights, so mlx_lm.load_model rejected thousands of unexpected JANGTQ parameters. During live validation, jang-tools also hit a DSV4 tokenizer/EOS expansion path that calls Transformers AutoConfig; the wrapper now patches that call for DSV4 JANGTQ to load tokenizer.json directly.
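
To illustrate why the plain loader saw "unexpected" parameters, here is a small sketch that separates TQ tensors from regular weights by name. The suffixes tq_packed and tq_norms come from the description above; the dotted key layout and the helper name split_tq_keys are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Suffixes taken from the PR description; the dotted key layout is assumed.
TQ_SUFFIXES = ("tq_packed", "tq_norms")


def split_tq_keys(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition weight names into (tq_keys, regular_keys), preserving order."""
    tq_keys: List[str] = []
    regular_keys: List[str] = []
    for name in names:
        # Compare only the last dotted component of the tensor name.
        leaf = name.rsplit(".", 1)[-1]
        if leaf in TQ_SUFFIXES:
            tq_keys.append(name)
        else:
            regular_keys.append(name)
    return tq_keys, regular_keys
```

A non-empty first list would be a hint that the checkpoint needs the JANGTQ path rather than the standard DeepSeek V4 MLX loader.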

Validation

uv run --extra dev --extra jang python -m pytest tests/test_jangtq_loader.py tests/test_deepseek_v4_vendored.py -q
uv run --extra dev ruff check pyproject.toml vllm_mlx/utils/tokenizer.py tests/test_jangtq_loader.py
uv run --extra jang python - <<'PY' ... import jang_tools ... PY
Local model detection: DeepSeek-V4-Flash-JANGTQ detected as weight_format=mxtq, profile=JANGTQ2.
Live serve validation reached DSV4 streaming hydrate, replaced 129 routed TQ modules, loaded 85 regular shards, patched 43 SwitchGLU instances, then exposed a tokenizer path bug that this branch patches.

samuelfaj · 2026-05-05T01:24:16Z

Validation update:

Full JANGTQ serve startup completed locally for /Users/samuelfajreldines/dev/models/DeepSeek-V4-Flash-JANGTQ.
Hydration replaced 129 DSV4 routed TQ modules, loaded 85 regular shards, patched 43 SwitchGLU instances, completed warmup, and served on port 8011.
OpenAI-compatible /v1/chat/completions request returned HTTP 200 with model=local, prompt_tokens=9, completion_tokens=8, total_tokens=17.
Additional compatibility fixes landed in the branch for DSV4 tokenizer metadata and MLX scalar RoPE offsets under rapid-mlx batching.
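
The usage numbers reported above can be sanity-checked mechanically. This is a small sketch that assumes the response body follows the usual OpenAI chat-completions shape with a usage object; the helper name usage_is_consistent is hypothetical and is not part of this branch.

```python
from typing import Any, Dict


def usage_is_consistent(response: Dict[str, Any]) -> bool:
    """Check that prompt_tokens + completion_tokens equals total_tokens."""
    usage = response.get("usage") or {}
    try:
        prompt = int(usage["prompt_tokens"])
        completion = int(usage["completion_tokens"])
        total = int(usage["total_tokens"])
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric fields count as inconsistent.
        return False
    return prompt >= 0 and completion >= 0 and prompt + completion == total
```

For the request above (prompt_tokens=9, completion_tokens=8, total_tokens=17) the check passes, since 9 + 8 = 17.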

samuelfaj · 2026-05-05T02:20:05Z

Final validation update:

Fixed quality issue by routing DSV4 JANGTQ through direct mlx_lm.generate on the model-owning MLX worker instead of the continuous batching generator path, which produced corrupted/repetitive tokens for this runtime.
Server validation command completed on port 8013 with /Users/samuelfajreldines/dev/models/DeepSeek-V4-Flash-JANGTQ.
/v1/chat/completions simple math request returned HTTP 200 with content exactly 4, prompt_tokens=17, completion_tokens=1, total_tokens=18.
/v1/chat/completions exact-ok request returned HTTP 200 with content exactly ok, prompt_tokens=9, completion_tokens=1, total_tokens=10.
Regression tests: uv run --extra dev --extra jang python -m pytest tests/test_jangtq_loader.py tests/test_deepseek_v4_vendored.py -q passed, 12 tests.
Ruff passed for changed files.
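
The routing decision described in this update can be sketched as a pure function. This is an illustration under assumptions, not the code in the branch: the function name, the "direct" and "batched" labels, and the idea of keying on a model_type string plus weight_format are all hypothetical.

```python
def choose_generation_path(model_type: str, weight_format: str) -> str:
    """Return "direct" for DeepSeek V4 JANGTQ models, otherwise "batched".

    "direct" stands for calling mlx_lm.generate on the model-owning worker;
    "batched" stands for the continuous batching generator path.
    """
    is_dsv4_jangtq = (
        model_type == "deepseek_v4" and weight_format.lower() == "mxtq"
    )
    return "direct" if is_dsv4_jangtq else "batched"
```

Keeping the decision in one small function makes it easy to remove the special case once batching produces the same tokens as the direct path.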

samuelfaj · 2026-05-05T03:05:28Z

Performance/streaming update:

The DeepSeek V4 JANGTQ direct fallback now uses mlx_lm.stream_generate for streaming requests, so tokens are delivered as they are produced instead of waiting for full completion.
Non-streaming requests keep the safe direct mlx_lm.generate path.
Added an explicit TODO in the direct fallback explaining the future real batching fix: compare BatchGenerator logits/output against mlx_lm.generate, then fix cache offset handling, prompt-cache merge/extract, and RoPE position state until batching is bit-consistent with the direct path.
Live streaming validation returned SSE chunks with content exactly ok and final usage prompt_tokens=9, completion_tokens=2, total_tokens=11.
Focused tests passed: 17 tests.
Ruff passed.
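
One possible starting point for the comparison named in the TODO is to locate the first token at which the batched output departs from the direct output. The sketch below works on plain lists of token ids and does not touch MLX; the helper name first_divergence is hypothetical, and comparing token ids rather than logits is a simplification.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def first_divergence(
    reference: Sequence[int], candidate: Sequence[int]
) -> Optional[int]:
    """Return the first index where the sequences differ, or None if identical.

    A length mismatch counts as a divergence at the end of the shorter sequence.
    """
    missing = object()  # Sentinel that never equals a real token id.
    pairs = zip_longest(reference, candidate, fillvalue=missing)
    for index, (ref_token, cand_token) in enumerate(pairs):
        if ref_token != cand_token:
            return index
    return None
```

Feeding it the token ids from the direct path as reference and from the batching path as candidate narrows the search to a single step, where cache offsets and RoPE positions can then be inspected.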

raullenchai · 2026-05-09T15:42:32Z

Hi @samuelfaj — thanks for the work. Applying our new SOP §0 necessity gate (see docs/development/pr_merge_sop.md) I need a demand signal before merging.

Holding for clarification, not closing yet.

Reasoning:

This adds 4007 lines for JANG/JANGTQ model support including a new [jang] extras dependency. That's significant scope.
I searched our issues for "JANG" — zero hits. No one has filed a model-support request for JANG/JANGTQ.
This PR is also stacked on #204 (Fix Qwen tool-call OpenAI translation), #205 (Add serve TUI monitor), and #206 (Improve Hermes tool-call recovery): tests/test_cli_tui_ready.py, tests/test_chat_tool_retry.py, etc. show up here. With #205 now closed, this will need a rebase to drop the TUI bits.

To unlock merge, I need one or more of:

User demand: a GitHub issue from a user (you or someone else) saying "I want to serve JANG model X with rapid-mlx and it doesn't work". Even one is enough.
JANG popularity signal: pointer to a HuggingFace model page using JANGTQ/MXTQ format with non-trivial download counts, or a community discussion (Reddit/Discord/X) showing people are trying to run JANG locally.
Scope split: separate the JANG-specific changes (vllm_mlx/jang_tools/*, tests/test_jangtq_loader.py, jang detection in loader, [jang] extras) from the unrelated infra changes (anthropic auth, completions, health, request_metrics, etc.). The current diff makes it impossible to review JANG support on its own merits.

For now please rebase on top of latest main (which now has #260, #262, #258 merged) and drop the parts that came from #205/#212-stack-overlap. After that I can give the JANG-specific surface the focused review it deserves.

Apologies for the friction — the necessity gate is new this week and I'm working through the backlog. Your #204 (Qwen tool-call fix) is being reviewed now since it has clear user value.

raullenchai · 2026-06-07T22:47:53Z

Thanks for putting this together. Two requests before review:

(1) Please split this into independent PRs. The diff is +4007 LOC across 27 files but the title scopes it to the JANG loader. The JANG-loader part is a coherent change on its own:

pyproject.toml (the [jang] extra)
vllm_mlx/utils/tokenizer.py (DSV4 JANGTQ tokenizer patch)
tests/test_jangtq_loader.py
whichever loader-routing code path detects jang_config.json before the vendored-arch fallback

The TUI (vllm_mlx/tui.py +736), metrics middleware (vllm_mlx/middleware/metrics.py +247, vllm_mlx/request_metrics.py +201), chat-route refactor (vllm_mlx/routes/chat.py +374), postprocessor changes (vllm_mlx/service/postprocessor.py +176), and batched-engine changes (vllm_mlx/engine/batched.py +224) are each their own scope and should be reviewed separately — they're unrelated to JANG and bundling them makes the diff impossible to review responsibly.

(2) Verify the JANG import path. The PR imports jang_tools.loader.load_jang_model and jang_tools.load_jangtq.load_jangtq_model, but the package published on PyPI is named jang, not jang-tools (https://pypi.org/project/jang/ — jang-tools returns 404). Either the published name has changed since you tested, or the imports here won't resolve on a clean install. Please:

Confirm the actual import path on a fresh venv: run uv venv && uv pip install jang, then compare python -c "import jang_tools" against python -c "import jang".
Pin the exact version in the [jang] extra (jang-tools>=X.Y or jang>=X.Y) — this is a single-maintainer dependency with custom Metal kernels, so an unpinned floor is risky.
Add a one-line note in the PR description acknowledging the JANGQ-AI ecosystem is a small Apple-Silicon community (no academic backing, single primary maintainer at jangq.ai) so reviewers understand the supply-chain shape.
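
For the import-path check, a small sketch that reports which top-level module name resolves in the current environment without importing it. The helper name first_importable is hypothetical; it only handles top-level names, and it says nothing about which PyPI distribution provided the module.

```python
import importlib.util
from typing import Iterable, Optional


def first_importable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first top-level module name that can be found, else None."""
    for name in candidates:
        # find_spec returns None for a missing top-level module and does not
        # execute the module's code.
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

Calling first_importable(["jang_tools", "jang"]) in the fresh venv would show whether the imports used in this PR resolve after installing the published package.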

Happy to review the loader-only PR once it's split out — that part looks reasonable on first read.

samuelfaj and others added 18 commits May 4, 2026 15:53

Fix Qwen tool call OpenAI translation

7b13ea4

Preserve tool schemas after streamed content

5c9b4e8

Coerce generic tool arguments from schema

5594261

Handle additional OpenCode tool call formats

7fa174d

Preserve code brackets near partial tool markers

ff6f247

Fix PR check failures

0b64dcf

Add serve TUI monitor

8b42dc6

Fix TUI PR CI failures

4d5a3b7

Add TUI request throughput metrics

b2b98b2

Enhance serve TUI request metrics

7f3a1ee

Improve Hermes tool-call recovery

a1a188e

Merge remote-tracking branch 'origin/add-serve-tui' into new-main

3841801

Merge remote-tracking branch 'origin/hermes-pr204-tool-recovery' into new-main

bbc6136

Add JANG model loader integration

bfeb2f2

Merge pull request #1 from samuelfaj/add-jangtq-loader

907d343

Add JANG model loader integration

Patch DeepSeek V4 JANGTQ tokenizer loading

4ce7046

Apply JANG tokenizer metadata

1746f84

Patch JANGTQ RoPE batching offset

7ac0c59

Use direct generation for DeepSeek V4 JANGTQ

197243b

samuelfaj added 2 commits May 4, 2026 23:51

Wait for server readiness before TUI

1ad7852

Stream direct JANGTQ generation

9fd2f5a

Track direct JANGTQ prefill progress

0ee615b

samuelfaj force-pushed the add-jangtq-loader-v2 branch from 2f48ce6 to 0ee615b Compare May 5, 2026 03:31

samuelfaj added 3 commits May 5, 2026 00:42

Cap default direct JANG generation

eebf7dd

Sanitize direct JANG tool prompts

63eabbb

Merge remote-tracking branch 'upstream/main'

05c1f30

# Conflicts: # vllm_mlx/routes/chat.py

samuelfaj marked this pull request as draft May 5, 2026 14:23

Restore direct JANG tool execution

ae6a2af

samuelfaj marked this pull request as ready for review May 5, 2026 15:48

Improve direct JANG tool artifact fallback

9b0bb10

samuelfaj force-pushed the add-jangtq-loader-v2 branch from ea128df to 9b0bb10 Compare May 5, 2026 15:58

samuelfaj wants to merge 27 commits into raullenchai:main from samuelfaj:add-jangtq-loader-v2
