29 commits
d203eed
feat: native MTP speculative decoding for Qwen3.5
AirRunner Mar 12, 2026
651b945
fix(mtp): eliminate SSM state contamination on draft rejection
AirRunner Mar 12, 2026
43f4205
fix(mtp): server integration (yield types, cache fallback, batching)
AirRunner Mar 12, 2026
7449a00
fix(mtp): address @janhilgard code review feedback (double-norm, quan…
AirRunner Mar 12, 2026
71011ab
feat(mtp): add --mtp CLI flag for generate and server
AirRunner Mar 12, 2026
44430cc
test(mtp): add unit tests for MTP speculative decoding
AirRunner Mar 12, 2026
78622eb
fix(mtp): warn when --mtp flag is used with a model without MTP head
AirRunner Mar 12, 2026
50faf3e
style: apply black and isort formatting
AirRunner Mar 13, 2026
be3bbf3
fix(mtp): stack per-expert MTP weights for MoE models in sanitize()
AirRunner Mar 17, 2026
ce0bcb7
fix(mtp): raise clear error when config has MTP but weights do not
AirRunner Mar 22, 2026
4ffc627
feat(mtp): add probabilistic draft acceptance for stochastic samplers
AirRunner Apr 3, 2026
9c734c2
refactor(mtp): clean up mtp_generate_step and add dynamic MTP/batch s…
AirRunner Apr 3, 2026
77f616d
fix(mtp): always leave 1 token for _step_backbone in _prefill
AirRunner Apr 4, 2026
67c10e6
fix(mtp): correct prev_tokens management for logits processors
AirRunner Apr 25, 2026
b7f8aa4
refactor(mtp): thread prev_tokens explicitly through mtp_generate_step
AirRunner Apr 25, 2026
48e1fca
refactor(cache): declare rollback_state as class attribute on ArraysC…
AirRunner Apr 26, 2026
8a52379
fix(mtp): support input_embeddings in mtp_generate_step and fix logit…
AirRunner Apr 29, 2026
fae9fa1
fix(mtp): support both fused and per-expert MTP weights in qwen3_5_mo…
AirRunner Apr 29, 2026
32fdaa3
fix(mtp): remove spurious mtp_cache trim on draft rejection
AirRunner May 4, 2026
13f157b
fix(mtp): use residual sampling on rejection at temp>0
AirRunner May 5, 2026
6594348
fix(mtp): reduce residual sampling to 1 sync, correct z=0 fallback
AirRunner May 6, 2026
87f1b09
feat(mtp): native sampling params, XTC draw sharing, correct lp_accept
AirRunner May 7, 2026
a2f1374
quality: replace import functools with from functools import partial
AirRunner May 7, 2026
b1dad14
fix(mtp): prefill MTP cache during prompt prefill
AirRunner May 7, 2026
ffac433
fix(mtp): clear Metal allocator cache every 256 tokens during decode
AirRunner May 8, 2026
a5a82a9
style(mtp): move u after _step_backbone
AirRunner May 9, 2026
c47c1cb
qwen3_5: remove mtp.fc exclusion from quant_predicate
AirRunner May 11, 2026
6222938
test(mtp): remove stale quant_predicate test
AirRunner May 15, 2026
f840f6c
fix(mtp): commit accepted draft token to mtp_cache via batched forward
AirRunner May 15, 2026
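Several commits above (probabilistic draft acceptance for stochastic samplers, residual sampling on rejection at temp>0, the z=0 fallback) refer to the standard speculative-sampling acceptance rule. As a rough illustration only, here is a minimal pure-Python sketch of that textbook rule for a single drafted token. It is not the code from this PR. The function name `speculative_accept`, the list-based probability vectors, and the choice to fall back to sampling from p when z=0 are assumptions made for this sketch.

```python
import random
from typing import List, Sequence, Tuple


def speculative_accept(
    p: Sequence[float],
    q: Sequence[float],
    draft: int,
    rng: random.Random,
) -> Tuple[bool, int]:
    """Accept or reject one drafted token (textbook speculative sampling).

    p     -- target-model probabilities at this position
    q     -- draft-head probabilities the draft token was sampled from
    draft -- index of the drafted token
    Returns (accepted, token). Hypothetical helper, not the PR's API.
    """
    # Keep the draft with probability min(1, p[x] / q[x]).
    if q[draft] > 0.0 and rng.random() < min(1.0, p[draft] / q[draft]):
        return True, draft

    # On rejection, resample from the residual max(0, p - q) / z.
    residual: List[float] = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    z = sum(residual)
    # Assumed fallback: if z == 0 (p and q agree up to rounding), sample from p.
    weights = residual if z > 0.0 else list(p)
    token = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return False, token
```

Summing the accept and reject paths, each token x is emitted with probability min(p(x), q(x)) + max(0, p(x) - q(x)) = p(x). That is why, at temp>0, a rejection has to resample from the residual rather than from p directly: resampling from p would bias the output away from the target distribution.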