fix(admin/benchmark): forward SpecPrefill model settings to engine by SuperMarioYL · Pull Request #1191 · jundot/omlx

SuperMarioYL · 2026-05-11T19:21:54Z

Summary

The dashboard benchmark path read ModelSettings.specprefill_enabled only
to tag run.experimental_features, but never forwarded the matching
specprefill_keep_pct / specprefill_threshold overrides to
engine.stream_generate (in _run_single_test) or to
engine_core.add_request (in _run_batch_test).

The chat-completion path already loads these from ModelSettingsManager
(see server.py around the per-request override block),
so end-users see SpecPrefill engage at their configured threshold during
normal chat — but a benchmark for the same model silently falls back to
DEFAULT_THRESHOLD=8192 and reports numbers that don't reflect the
configured threshold (e.g. the issue's reproducer: threshold set to 512,
benchmark only engages SpecPrefill above 8192).

What changed

omlx/admin/benchmark.py:

Build a specprefill_kwargs dict once at run-start, alongside the
existing experimental-features snapshot.
Add an optional specprefill_kwargs param to _run_single_test and
_run_batch_test and forward via **kwargs into
engine.stream_generate(...) / engine_core.add_request(...).
Pass specprefill_kwargs through to the warmup call too, so the
warmup path doesn't load a different code branch than the measured
runs.

When the model has no SpecPrefill settings the dict stays empty and no
specprefill_* kwarg leaks into stream_generate — preserving the
default engine path for models the user has not opted in for.

tests/test_benchmark.py:

New TestSpecPrefillSettingsLoad class with three regression tests
that mock stream_generate / add_request and assert the threshold
and keep_pct round-trip end-to-end through the benchmark helpers, plus
a no-leakage case for models without SpecPrefill enabled.

Test plan

python -m py_compile omlx/admin/benchmark.py tests/test_benchmark.py
Isolated test run of the four TestSpecPrefillSettingsLoad cases
(stubbed mlx.core to import omlx.admin.benchmark standalone on
a Linux dev machine without the MLX runtime) — all pass on the
branch and the assertions fail on main as expected.
Full pytest tests/test_benchmark.py in the project MLX
environment (not runnable in this dev environment without the
Apple Silicon MLX runtime).

Reviewer kill-question (anticipated)

"Why not push the settings-manager load into engine.stream_generate
itself?"

The engine layer is intentionally settings-manager-agnostic — every
caller (chat-completion, benchmark, tests) forwards the per-request
override. Pushing settings-manager into the engine would duplicate the
existing forward-flow and add a new dependency cycle from
omlx/engine/* into omlx.model_settings. The benchmark route is the
only layer that was missing the forward step, so the fix lives there.

The dashboard benchmark path read the model's `specprefill_enabled` flag only to tag `run.experimental_features` but never forwarded the matching `specprefill_keep_pct` / `specprefill_threshold` overrides to `engine.stream_generate` (in `_run_single_test`) or to `engine_core.add_request` (in `_run_batch_test`). The chat-completion path already loads these from `ModelSettingsManager` and passes them per-request, so end-users see SpecPrefill engage at their configured threshold during normal chat but the benchmark silently falls back to `DEFAULT_THRESHOLD=8192` and reports numbers that don't match the model's configuration. Build a `specprefill_kwargs` dict once at run-start, alongside the existing experimental-features snapshot, and thread it through the warmup call, every `_run_single_test` invocation, and every `_run_batch_test` invocation. When the model has no SpecPrefill settings the dict stays empty and no `specprefill_*` kwarg leaks into `stream_generate` — preserving the default engine path for models the user has not opted in for. Adds three regression tests under `TestSpecPrefillSettingsLoad` that mock `stream_generate` / `add_request` and assert the threshold / keep_pct round-trip end-to-end through the benchmark helpers, plus a no-leakage case. Fixes jundot#1145.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(admin/benchmark): forward SpecPrefill model settings to engine#1191

fix(admin/benchmark): forward SpecPrefill model settings to engine#1191
SuperMarioYL wants to merge 1 commit into
jundot:mainfrom
SuperMarioYL:fix/benchmark-specprefill-settings-1145

SuperMarioYL commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

SuperMarioYL commented May 11, 2026

Summary

What changed

Test plan

Reviewer kill-question (anticipated)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant