Add n-gram speculative fallback for native MTP by youndukn · Pull Request #1319 · jundot/omlx

youndukn · 2026-05-20T04:41:32Z

Summary

Add optional draftless n-gram speculation in the native MTP BatchGenerator path.
Prefer used n-gram continuations, then repeated prompt n-grams, and fall back adaptively to native MTP when the n-gram lookup misses.
Add model settings/profile fields, prompt-token tracking, focused tests, and a concise benchmark note.

Disabled by default.
The current safe target is greedy decoding with short drafts on long-context roleplay or conversations with repeated structure.
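
For reviewers unfamiliar with draftless n-gram speculation, the lookup order described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `find_continuation`, `propose_draft`, `next_draft`, and the `mtp_draft` callable are hypothetical names, the default `ngram_size` and `max_draft` values are made up, and it assumes "used n-gram continuations" means continuations already seen in the generated output.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def find_continuation(
    haystack: Sequence[int], key: Sequence[int], max_draft: int
) -> Optional[List[int]]:
    """Return up to max_draft tokens that followed the most recent
    occurrence of key in haystack, or None if there is no such match."""
    n = len(key)
    key = list(key)
    # Scan right to left so the most recent match wins. Starting at
    # len - n - 1 guarantees at least one token follows the match.
    for start in range(len(haystack) - n - 1, -1, -1):
        if list(haystack[start:start + n]) == key:
            return list(haystack[start + n:start + n + max_draft])
    return None


def propose_draft(
    prompt: Sequence[int],
    generated: Sequence[int],
    ngram_size: int = 3,  # hypothetical default
    max_draft: int = 4,   # hypothetical default; the PR only says "short drafts"
) -> Optional[List[int]]:
    """Look up the trailing n-gram in generated tokens first, then the prompt."""
    context = list(prompt) + list(generated)
    if len(context) < ngram_size:
        return None
    key = context[-ngram_size:]
    draft = find_continuation(generated, key, max_draft)
    if draft is None:
        draft = find_continuation(prompt, key, max_draft)
    return draft


def next_draft(
    prompt: Sequence[int],
    generated: Sequence[int],
    mtp_draft: Callable[[Sequence[int], Sequence[int]], List[int]],
    ngram_size: int = 3,
    max_draft: int = 4,
) -> Tuple[str, List[int]]:
    """Use the n-gram draft when one exists; otherwise fall back to MTP.

    mtp_draft stands in for the native MTP drafter and is an assumption here.
    """
    draft = propose_draft(prompt, generated, ngram_size, max_draft)
    if draft is not None:
        return "ngram", draft
    return "mtp", mtp_draft(prompt, generated)
```

In either case the drafted tokens would still need to be verified by the target model before being accepted; under greedy decoding that amounts to keeping the longest draft prefix that matches the model's own argmax choices.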

python -m py_compile omlx/model_settings.py omlx/model_profiles.py omlx/engine/batched.py omlx/scheduler.py omlx/patches/mlx_lm_mtp/batch_generator.py tests/test_mlx_lm_mtp_patch.py
PYTHONPATH=/Users/youndukn/projects/oMLX pytest -q tests/test_mlx_lm_mtp_patch.py::TestModelSettingsMtp
PYTHONPATH=/Users/youndukn/projects/oMLX pytest -q tests/test_mlx_lm_mtp_patch.py

40-turn roleplay benchmark, 320 generated tokens, greedy decoding:

feat(mtp): add ngram speculative fallback

1ce631a

youndukn force-pushed the codex/ngram-mtp-speculation branch from 09b7ab7 to 1ce631a May 20, 2026 04:51

youndukn closed this May 20, 2026