Skip to content

fix(admin/benchmark): forward SpecPrefill model settings to engine#1191

Open
SuperMarioYL wants to merge 1 commit into
jundot:mainfrom
SuperMarioYL:fix/benchmark-specprefill-settings-1145
Open

fix(admin/benchmark): forward SpecPrefill model settings to engine#1191
SuperMarioYL wants to merge 1 commit into
jundot:mainfrom
SuperMarioYL:fix/benchmark-specprefill-settings-1145

Conversation

@SuperMarioYL
Copy link
Copy Markdown

Summary

Fixes #1145.

The dashboard benchmark path read ModelSettings.specprefill_enabled only
to tag run.experimental_features, but never forwarded the matching
specprefill_keep_pct / specprefill_threshold overrides to
engine.stream_generate (in _run_single_test) or to
engine_core.add_request (in _run_batch_test).

The chat-completion path already loads these from ModelSettingsManager
(see server.py around the per-request override block),
so end-users see SpecPrefill engage at their configured threshold during
normal chat — but a benchmark for the same model silently falls back to
DEFAULT_THRESHOLD=8192 and reports numbers that don't reflect the
configured threshold (e.g. the issue's reproducer: threshold set to 512,
benchmark only engages SpecPrefill above 8192).

What changed

omlx/admin/benchmark.py:

  • Build a specprefill_kwargs dict once at run-start, alongside the
    existing experimental-features snapshot.
  • Add an optional specprefill_kwargs param to _run_single_test and
    _run_batch_test and forward via **kwargs into
    engine.stream_generate(...) / engine_core.add_request(...).
  • Pass specprefill_kwargs through to the warmup call too, so the
    warmup path doesn't load a different code branch than the measured
    runs.

When the model has no SpecPrefill settings the dict stays empty and no
specprefill_* kwarg leaks into stream_generate — preserving the
default engine path for models the user has not opted in for.

tests/test_benchmark.py:

  • New TestSpecPrefillSettingsLoad class with three regression tests
    that mock stream_generate / add_request and assert the threshold
    and keep_pct round-trip end-to-end through the benchmark helpers, plus
    a no-leakage case for models without SpecPrefill enabled.

Test plan

  • python -m py_compile omlx/admin/benchmark.py tests/test_benchmark.py
  • Isolated test run of the four TestSpecPrefillSettingsLoad cases
    (stubbed mlx.core to import omlx.admin.benchmark standalone on
    a Linux dev machine without the MLX runtime) — all pass on the
    branch and the assertions fail on main as expected.
  • Full pytest tests/test_benchmark.py in the project MLX
    environment (not runnable in this dev environment without the
    Apple Silicon MLX runtime).

Reviewer kill-question (anticipated)

"Why not push the settings-manager load into engine.stream_generate
itself?"

The engine layer is intentionally settings-manager-agnostic — every
caller (chat-completion, benchmark, tests) forwards the per-request
override. Pushing settings-manager into the engine would duplicate the
existing forward-flow and add a new dependency cycle from
omlx/engine/* into omlx.model_settings. The benchmark route is the
only layer that was missing the forward step, so the fix lives there.

The dashboard benchmark path read the model's `specprefill_enabled`
flag only to tag `run.experimental_features` but never forwarded the
matching `specprefill_keep_pct` / `specprefill_threshold` overrides to
`engine.stream_generate` (in `_run_single_test`) or to
`engine_core.add_request` (in `_run_batch_test`).

The chat-completion path already loads these from
`ModelSettingsManager` and passes them per-request, so end-users see
SpecPrefill engage at their configured threshold during normal chat
but the benchmark silently falls back to `DEFAULT_THRESHOLD=8192` and
reports numbers that don't match the model's configuration.

Build a `specprefill_kwargs` dict once at run-start, alongside the
existing experimental-features snapshot, and thread it through the
warmup call, every `_run_single_test` invocation, and every
`_run_batch_test` invocation. When the model has no SpecPrefill
settings the dict stays empty and no `specprefill_*` kwarg leaks into
`stream_generate` — preserving the default engine path for models the
user has not opted in for.

Adds three regression tests under
`TestSpecPrefillSettingsLoad` that mock `stream_generate` /
`add_request` and assert the threshold / keep_pct round-trip end-to-end
through the benchmark helpers, plus a no-leakage case.

Fixes jundot#1145.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] v0.3.8 - SpecPrefill model settings are not loaded and always fallback to default values such as DEFAULT_THRESHOLD=8192

1 participant