feat: Qwen3.5 V2 VM bytecode runtime support #3

Draft
nuri-yoo wants to merge 1 commit into mlc from feat/qwen35-v2-runtime

@nuri-yoo nuri-yoo commented May 1, 2026

Summary

Adds two pieces required to load and execute V2 VM bytecode artifacts that mlc-llm produces for Qwen3.5 (and other Gated DeltaNet hybrid) models on stock relax runtimes.

Changes

src/runtime/vm/attn_backend.cc — Accept both "tirx" (current) and "tir" (older artifact name) at every match site inside ConvertPagedPrefillFunc / ConvertRaggedPrefillFunc / ConvertPagedDecodeFunc / Convert*-TreeMaskFunc. Strictly widens the accepted set; no rejection path changes.
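
For reviewers, the widened match can be pictured as a single predicate applied at each site. This is a minimal sketch under assumptions, not the code in this PR: `IsTirBackendName` is a hypothetical helper name, and it assumes the backend kind is compared as a plain string.

```cpp
#include <string>

// Hypothetical helper (not from the PR): returns true for the current
// backend name "tirx" and for the older artifact name "tir".
// Every other name is rejected exactly as before.
inline bool IsTirBackendName(const std::string& name) {
  return name == "tirx" || name == "tir";
}
```

Because the predicate only adds "tir" to the accepted names, any name that was rejected before is still rejected.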

src/runtime/device_c_api.cc (new, 108 LoC) — Flat C-ABI wrapper around tvm::runtime::DeviceAPI::Get(...). Re-exports TVMDeviceAPIGet / TVMDeviceAPICopyDataFromTo / TVMDeviceAPIStreamSync etc. so Rust crates that link against the C symbols can use this build without re-deriving the wrapper. Strict re-export, no logic.
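
The general shape of such a flat C-ABI shim can be sketched as follows. Everything here is an assumption made for illustration: `sketch::DeviceAPI` is a stand-in class rather than the real `tvm::runtime::DeviceAPI`, and the `SketchDeviceAPI*` function names and signatures are hypothetical, not the symbols exported by device_c_api.cc.

```cpp
#include <cstddef>
#include <cstring>

namespace sketch {

// Stand-in for a C++ device interface. NOT the real tvm::runtime::DeviceAPI.
class DeviceAPI {
 public:
  virtual ~DeviceAPI() = default;
  virtual void CopyDataFromTo(const void* from, void* to, std::size_t nbytes) = 0;
  virtual void StreamSync() = 0;
  // Returns a process-wide instance, or nullptr for an unknown device type.
  static DeviceAPI* Get(int device_type);
};

// Trivial host implementation so the sketch runs without any device.
class HostDeviceAPI final : public DeviceAPI {
 public:
  void CopyDataFromTo(const void* from, void* to, std::size_t nbytes) override {
    std::memcpy(to, from, nbytes);
  }
  void StreamSync() override {}  // host copies complete synchronously
};

DeviceAPI* DeviceAPI::Get(int device_type) {
  static HostDeviceAPI host;
  return device_type == 1 ? &host : nullptr;  // 1 = host, in this sketch only
}

}  // namespace sketch

extern "C" {

// Opaque handle handed to C or Rust callers.
typedef void* SketchDeviceAPIHandle;

SketchDeviceAPIHandle SketchDeviceAPIGet(int device_type) {
  return sketch::DeviceAPI::Get(device_type);
}

// Returns 0 on success and -1 on failure; exceptions never cross the C boundary.
int SketchDeviceAPICopyDataFromTo(SketchDeviceAPIHandle handle, const void* from,
                                  void* to, std::size_t nbytes) {
  if (handle == nullptr) return -1;
  try {
    static_cast<sketch::DeviceAPI*>(handle)->CopyDataFromTo(from, to, nbytes);
    return 0;
  } catch (...) {
    return -1;
  }
}

int SketchDeviceAPIStreamSync(SketchDeviceAPIHandle handle) {
  if (handle == nullptr) return -1;
  try {
    static_cast<sketch::DeviceAPI*>(handle)->StreamSync();
    return 0;
  } catch (...) {
    return -1;
  }
}

}  // extern "C"
```

Each exported function only forwards to the C++ object behind the opaque handle, so a Rust crate can declare the same signatures in an `extern "C"` block and link against them directly.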

Verification

End-to-end on macOS arm64 + Metal:

  • Qwen3-0.6B / Qwen3-8B (V1 KvCache)
  • Qwen3.5-0.8B / 2B / 4B / 9B (V2 hybrid Gated DeltaNet)
  • BAAI/bge-m3 (V1 embedding)

All load and run with semantically correct outputs.

Test plan

  • Build relax with the new branch and verify existing V1 artifacts still load
  • Build the matching mlc-llm against this relax and verify V2-bytecode rt.dylib loads
  • Run an external Rust consumer (e.g. ailoy) that links against the C-ABI shim

Adds two pieces required to load and execute V2 VM bytecode artifacts
that mlc-llm produces for Qwen3.5 (and other Gated DeltaNet hybrid)
models on stock relax runtimes:

1. src/runtime/vm/attn_backend.cc
   Accept both "tirx" (current) and "tir" (older artifact name) at
   every match site inside ConvertPagedPrefillFunc /
   ConvertRaggedPrefillFunc / ConvertPagedDecodeFunc / Convert*-
   TreeMaskFunc. Strictly widens the accepted set; no rejection
   path changes.

2. src/runtime/device_c_api.cc (new, 108 LoC)
   Flat C-ABI wrapper around tvm::runtime::DeviceAPI::Get(...).
   Re-exports TVMDeviceAPIGet / ...CopyDataFromTo / ...StreamSync
   etc. so Rust crates that link against the C symbols can use
   this build without re-deriving the wrapper. Strict re-export,
   no logic.

Verified end-to-end on macOS arm64 + Metal: Qwen3-{0.6B,8B} (V1
KvCache), Qwen3.5-{0.8B,2B,4B,9B} (V2 hybrid Gated DeltaNet), and
BAAI/bge-m3 (V1 embedding) all load and run.