fix(admin/benchmark): forward SpecPrefill model settings to engine #1191
Open
SuperMarioYL wants to merge 1 commit into
The dashboard benchmark path read the model's `specprefill_enabled` flag only to tag `run.experimental_features` but never forwarded the matching `specprefill_keep_pct` / `specprefill_threshold` overrides to `engine.stream_generate` (in `_run_single_test`) or to `engine_core.add_request` (in `_run_batch_test`). The chat-completion path already loads these from `ModelSettingsManager` and passes them per-request, so end-users see SpecPrefill engage at their configured threshold during normal chat but the benchmark silently falls back to `DEFAULT_THRESHOLD=8192` and reports numbers that don't match the model's configuration.

Build a `specprefill_kwargs` dict once at run-start, alongside the existing experimental-features snapshot, and thread it through the warmup call, every `_run_single_test` invocation, and every `_run_batch_test` invocation. When the model has no SpecPrefill settings the dict stays empty and no `specprefill_*` kwarg leaks into `stream_generate`, preserving the default engine path for models the user has not opted in for.

Adds three regression tests under `TestSpecPrefillSettingsLoad` that mock `stream_generate` / `add_request` and assert the threshold / keep_pct round-trip end-to-end through the benchmark helpers, plus a no-leakage case.

Fixes jundot#1145.
Summary
Fixes #1145.
The dashboard benchmark path read `ModelSettings.specprefill_enabled` only to tag `run.experimental_features`, but never forwarded the matching `specprefill_keep_pct` / `specprefill_threshold` overrides to `engine.stream_generate` (in `_run_single_test`) or to `engine_core.add_request` (in `_run_batch_test`).

The chat-completion path already loads these from `ModelSettingsManager` (see `server.py` around the per-request override block), so end-users see SpecPrefill engage at their configured threshold during normal chat, but a benchmark for the same model silently falls back to `DEFAULT_THRESHOLD=8192` and reports numbers that don't reflect the configured threshold (e.g. the issue's reproducer: threshold set to 512, benchmark only engages SpecPrefill above 8192).
What changed
`omlx/admin/benchmark.py`:
- Build a `specprefill_kwargs` dict once at run-start, alongside the existing experimental-features snapshot.
- Add a `specprefill_kwargs` param to `_run_single_test` and `_run_batch_test` and forward it via `**kwargs` into `engine.stream_generate(...)` / `engine_core.add_request(...)`.
- Thread `specprefill_kwargs` through to the warmup call too, so the warmup path doesn't load a different code branch than the measured runs.

When the model has no SpecPrefill settings the dict stays empty and no `specprefill_*` kwarg leaks into `stream_generate`, preserving the default engine path for models the user has not opted in for.
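The build-once-and-forward flow described above could look roughly like the sketch below. It is not the code in this PR: `FakeModelSettings`, `build_specprefill_kwargs`, `RecordingEngine`, and `run_single_test` are hypothetical stand-ins, and the sketch assumes `stream_generate` is an async generator that accepts the overrides as keyword arguments.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeModelSettings:
    """Hypothetical stand-in for the real ModelSettings object."""
    specprefill_enabled: bool = False
    specprefill_keep_pct: Optional[float] = None
    specprefill_threshold: Optional[int] = None


def build_specprefill_kwargs(settings: Optional[FakeModelSettings]) -> dict[str, Any]:
    """Return per-request overrides, or {} when the model has not opted in."""
    if settings is None or not settings.specprefill_enabled:
        return {}
    kwargs: dict[str, Any] = {}
    if settings.specprefill_keep_pct is not None:
        kwargs["specprefill_keep_pct"] = settings.specprefill_keep_pct
    if settings.specprefill_threshold is not None:
        kwargs["specprefill_threshold"] = settings.specprefill_threshold
    return kwargs


class RecordingEngine:
    """Hypothetical engine that records the kwargs each call receives."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    async def stream_generate(self, prompt: str, **kwargs: Any):
        self.calls.append(kwargs)
        yield "token"


async def run_single_test(engine, prompt, specprefill_kwargs=None):
    """Forward the snapshot unchanged; an empty dict adds no kwargs at all."""
    async for _ in engine.stream_generate(prompt, **(specprefill_kwargs or {})):
        pass


if __name__ == "__main__":
    engine = RecordingEngine()
    snapshot = build_specprefill_kwargs(FakeModelSettings(True, 0.3, 512))
    asyncio.run(run_single_test(engine, "warmup", snapshot))  # warmup call
    asyncio.run(run_single_test(engine, "measured", snapshot))  # measured run
    print(engine.calls)
```

Building the dict once and reusing it for warmup and measured runs keeps both on the same engine branch, which is the property the third bullet is after.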
`tests/test_benchmark.py`:
- Add a `TestSpecPrefillSettingsLoad` class with three regression tests that mock `stream_generate` / `add_request` and assert the threshold and keep_pct round-trip end-to-end through the benchmark helpers, plus a no-leakage case for models without SpecPrefill enabled.
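As an illustration of the round-trip and no-leakage assertions described above, here is a minimal sketch for the batch path. It does not reproduce the tests in this PR: `run_batch_test` is a hypothetical stand-in for `_run_batch_test`, and the `request_id` / `prompt` parameters of `add_request` are assumptions made for the example.

```python
from unittest.mock import MagicMock


def run_batch_test(engine_core, prompts, specprefill_kwargs=None):
    """Hypothetical stand-in for the benchmark's batch helper."""
    extra = specprefill_kwargs or {}
    for i, prompt in enumerate(prompts):
        engine_core.add_request(request_id=f"bench-{i}", prompt=prompt, **extra)


def test_threshold_and_keep_pct_round_trip():
    engine_core = MagicMock()
    overrides = {"specprefill_threshold": 512, "specprefill_keep_pct": 0.3}
    run_batch_test(engine_core, ["a", "b"], overrides)
    assert engine_core.add_request.call_count == 2
    # Every request in the batch must carry the configured overrides.
    for call in engine_core.add_request.call_args_list:
        assert call.kwargs["specprefill_threshold"] == 512
        assert call.kwargs["specprefill_keep_pct"] == 0.3


def test_no_leakage_without_settings():
    engine_core = MagicMock()
    run_batch_test(engine_core, ["a"])
    _, kwargs = engine_core.add_request.call_args
    # Without settings, no specprefill_* key may reach the engine.
    assert not any(key.startswith("specprefill_") for key in kwargs)
```

Both functions are plain `pytest`-style tests and need nothing beyond the standard library's `unittest.mock`.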
Test plan
- `python -m py_compile omlx/admin/benchmark.py tests/test_benchmark.py`
- `TestSpecPrefillSettingsLoad` cases (stubbed `mlx.core` to import `omlx.admin.benchmark` standalone on a Linux dev machine without the MLX runtime): all pass on the branch, and the assertions fail on `main` as expected.
- `pytest tests/test_benchmark.py` in the project MLX environment (not runnable in this dev environment without the Apple Silicon MLX runtime).
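One common way to stub `mlx.core` for a standalone import is to register placeholder modules in `sys.modules` before importing the module under test. The sketch below shows that general technique only; the PR does not say how the stub was built. `install_mlx_stub` is a hypothetical helper, and stubbing only `array` is an assumption, since the attributes needed depend on what `omlx.admin.benchmark` touches at import time.

```python
import sys
import types
from unittest.mock import MagicMock


def install_mlx_stub() -> types.ModuleType:
    """Register placeholder `mlx` / `mlx.core` modules unless already loaded."""
    if "mlx.core" in sys.modules:
        return sys.modules["mlx.core"]

    mlx = types.ModuleType("mlx")
    mlx.__path__ = []  # mark as a package so `import mlx.core` resolves

    core = types.ModuleType("mlx.core")
    # Assumption: add whichever attributes the imported module actually uses.
    core.array = MagicMock(name="mlx.core.array")

    mlx.core = core
    sys.modules["mlx"] = mlx
    sys.modules["mlx.core"] = core
    return core


if __name__ == "__main__":
    install_mlx_stub()
    import mlx.core as mx  # resolves to the stub when MLX is not installed

    print(type(mx.array))
```

The stub has to be installed before the first `import omlx.admin.benchmark`, for example at the top of a `conftest.py`, because Python caches modules after their first import.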
Reviewer kill-question (anticipated)
The engine layer is intentionally settings-manager-agnostic — every
caller (chat-completion, benchmark, tests) forwards the per-request
override. Pushing settings-manager into the engine would duplicate the
existing forward-flow and add a new dependency cycle from
`omlx/engine/*` into `omlx.model_settings`. The benchmark route is the only layer that was missing the forward step, so the fix lives there.