I've tested Qwen3.6-27B-oQ5-fp16-mtp and Qwen3.6-27B-oQ8-fp16-mtp, both generated from the original weights. Acceptance looks high (85.3% in both logged runs):
2026-05-06 20:12:45,019 - omlx.patches.mlx_lm_mtp.batch_generator - INFO - MTP[1] finish=length tokens=128 cycles=68 accept=58/68 (85.3%) emits[init=2,draft=58,bonus=58,verify=10] timing[backbone=1305.3ms mtp=324.7ms sample=6296.6ms cache=17.4ms]
2026-05-06 20:12:45,056 - omlx.scheduler - INFO - Cache phase timings: cleanup_finished_sync=0.1ms/2, store_cache_main_eval=8.0ms/2, store_cache_main_prep=8.1ms/2
2026-05-06 20:13:12,767 - omlx.patches.mlx_lm_mtp.batch_generator - INFO - MTP path activated for uid=2 (model has mtp_forward, batch=1)
2026-05-06 20:13:21,280 - omlx.patches.mlx_lm_mtp.batch_generator - INFO - MTP[2] finish=length tokens=128 cycles=68 accept=58/68 (85.3%) emits[init=2,draft=58,bonus=58,verify=10] timing[backbone=1323.1ms mtp=325.9ms sample=6410.3ms cache=18.2ms]
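To make those log lines easier to compare, here is a minimal sketch that pulls the counters out of one MTP summary line and works out the acceptance rate, tokens emitted per cycle, and each timing bucket's share of the total. The function name `summarize_mtp_line` is mine, not part of oMLX, and I'm assuming the four `timing[...]` buckets are additive and mean what their names suggest; I don't know how they are actually measured.

```python
import re

# One MTP summary line copied from the log above (timestamp/logger prefix dropped).
LOG_LINE = (
    "MTP[1] finish=length tokens=128 cycles=68 accept=58/68 (85.3%) "
    "emits[init=2,draft=58,bonus=58,verify=10] "
    "timing[backbone=1305.3ms mtp=324.7ms sample=6296.6ms cache=17.4ms]"
)


def summarize_mtp_line(line):
    """Hypothetical helper: extract counters from an MTP summary log line."""
    tokens = int(re.search(r"tokens=(\d+)", line).group(1))
    cycles = int(re.search(r"cycles=(\d+)", line).group(1))
    accepted, drafted = map(int, re.search(r"accept=(\d+)/(\d+)", line).groups())

    # Parse "name=123.4ms" pairs inside timing[...].
    timing_body = re.search(r"timing\[([^\]]*)\]", line).group(1)
    timing_ms = {
        name: float(ms) for name, ms in re.findall(r"(\w+)=([\d.]+)ms", timing_body)
    }
    total_ms = sum(timing_ms.values())  # assumption: buckets are additive

    return {
        "accept_rate": accepted / drafted,
        "tokens_per_cycle": tokens / cycles,
        "total_ms": total_ms,
        "timing_share": {name: ms / total_ms for name, ms in timing_ms.items()},
    }


summary = summarize_mtp_line(LOG_LINE)
print(f"accept rate      : {summary['accept_rate']:.1%}")
print(f"tokens per cycle : {summary['tokens_per_cycle']:.2f}")
for name, share in summary["timing_share"].items():
    print(f"{name:>8} share   : {share:.1%}")
```

On the first log line this gives about 1.88 tokens per cycle, and the `sample` bucket accounts for roughly 79% of the summed timing (6296.6 ms of 7944.0 ms), far more than `backbone` and `mtp` combined. If the buckets really are what they look like, that might be where to look first.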
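However, oQ5 decodes slower with MTP enabled (e.g. 15.4 vs 16.9 tok/s at pp1024), and while oQ8 does improve (13.3 vs 11.3 tok/s at pp1024), the gain is smaller than I would expect at that acceptance rate.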
MTP ON
oMLX - LLM inference, optimized for your Mac
https://github.com/jundot/omlx
Benchmark Model: Qwen3.6-27B-oQ5-fp16-mtp
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test            TTFT(ms)  TPOT(ms)       pp TPS      tg TPS   E2E(s)   Throughput  Peak Mem
pp1024/tg128      7308.2     65.63  140.1 tok/s  15.4 tok/s   15.643   73.6 tok/s  19.80 GB
pp4096/tg128     28089.0     65.23  145.8 tok/s  15.5 tok/s   36.373  116.1 tok/s  21.22 GB
pp8192/tg128     56390.1     64.54  145.3 tok/s  15.6 tok/s   64.586  128.8 tok/s  22.25 GB
pp16384/tg128   115483.6     70.19  141.9 tok/s  14.4 tok/s  124.397  132.7 tok/s  23.75 GB
pp32768/tg128   244621.0     75.65  134.0 tok/s  13.3 tok/s  254.228  129.4 tok/s  26.74 GB
MTP OFF
Benchmark Model: Qwen3.6-27B-oQ5-fp16-mtp
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test            TTFT(ms)  TPOT(ms)       pp TPS      tg TPS   E2E(s)   Throughput  Peak Mem
pp1024/tg128      7318.8     59.80  139.9 tok/s  16.9 tok/s   14.913   77.2 tok/s  19.46 GB
pp4096/tg128     28026.7     61.19  146.1 tok/s  16.5 tok/s   35.798  118.0 tok/s  20.88 GB
pp8192/tg128     56369.0     62.09  145.3 tok/s  16.2 tok/s   64.254  129.5 tok/s  21.91 GB
pp16384/tg128   115442.2     66.20  141.9 tok/s  15.2 tok/s  123.849  133.3 tok/s  23.41 GB
pp32768/tg128   244644.6     73.36  133.9 tok/s  13.7 tok/s  253.961  129.5 tok/s  26.41 GB
MTP ON
Benchmark Model: Qwen3.6-27B-oQ8-fp16-mtp
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test            TTFT(ms)  TPOT(ms)       pp TPS      tg TPS   E2E(s)   Throughput  Peak Mem
pp1024/tg128      7207.5     75.86  142.1 tok/s  13.3 tok/s   16.841   68.4 tok/s  28.81 GB
pp4096/tg128     27623.5     70.30  148.3 tok/s  14.3 tok/s   36.552  115.6 tok/s  30.26 GB
pp8192/tg128     55456.9     71.22  147.7 tok/s  14.2 tok/s   64.502  129.0 tok/s  31.29 GB
pp16384/tg128   113586.4     78.26  144.2 tok/s  12.9 tok/s  123.526  133.7 tok/s  32.79 GB
pp32768/tg128   240711.8     80.76  136.1 tok/s  12.5 tok/s  250.969  131.1 tok/s  35.79 GB
MTP OFF
Benchmark Model: Qwen3.6-27B-oQ8-fp16-mtp
================================================================================
Single Request Results
--------------------------------------------------------------------------------
Test            TTFT(ms)  TPOT(ms)       pp TPS      tg TPS   E2E(s)   Throughput  Peak Mem
pp1024/tg128      7201.1     89.31  142.2 tok/s  11.3 tok/s   18.543   62.1 tok/s  28.34 GB
pp4096/tg128     27628.8     91.76  148.3 tok/s  11.0 tok/s   39.283  107.5 tok/s  29.80 GB
pp8192/tg128     55460.2     92.42  147.7 tok/s  10.9 tok/s   67.198  123.8 tok/s  30.82 GB
pp16384/tg128   113573.6     95.73  144.3 tok/s  10.5 tok/s  125.732  131.3 tok/s  32.32 GB
pp32768/tg128   240794.7    102.56  136.1 tok/s   9.8 tok/s  253.819  129.6 tok/s  35.32 GB
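To put a number on "not as much as expected", here is a small sketch that computes the MTP ON / MTP OFF ratio of decode throughput (the tg TPS column) for each prompt length. The values are copied from the four tables above; the helper name `mtp_speedup` is just for illustration.

```python
# Decode throughput (tg TPS, tok/s) copied from the four benchmark tables above,
# keyed by prompt length (the N in ppN/tg128).
TG_TPS = {
    "oQ5": {
        "on":  {1024: 15.4, 4096: 15.5, 8192: 15.6, 16384: 14.4, 32768: 13.3},
        "off": {1024: 16.9, 4096: 16.5, 8192: 16.2, 16384: 15.2, 32768: 13.7},
    },
    "oQ8": {
        "on":  {1024: 13.3, 4096: 14.3, 8192: 14.2, 16384: 12.9, 32768: 12.5},
        "off": {1024: 11.3, 4096: 11.0, 8192: 10.9, 16384: 10.5, 32768: 9.8},
    },
}


def mtp_speedup(model):
    """Hypothetical helper: ratio of MTP-on to MTP-off decode speed per prompt length."""
    on, off = TG_TPS[model]["on"], TG_TPS[model]["off"]
    return {pp: on[pp] / off[pp] for pp in on}


for model in TG_TPS:
    for pp, ratio in mtp_speedup(model).items():
        print(f"{model} pp{pp:<6} MTP on/off = {ratio:.2f}x")
```

This works out to roughly 0.91x to 0.97x for oQ5 (a slowdown at every prompt length) and roughly 1.18x to 1.30x for oQ8. For comparison, the logs show 128 tokens over 68 cycles, about 1.88 tokens per cycle, so if a cycle cost about the same as one normal decode step I would naively expect something closer to that figure. That is only a rough ceiling under that assumption, not a measured target.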
Happy to test anything else if it helps.