feat(bench): inline run-time settings modal on bench tabs by jasonpaulso · Pull Request #1187 · jundot/omlx

jasonpaulso · 2026-05-11T18:25:21Z

Summary

Add a per-tab run-time settings modal to the Benchmark and Accuracy Benchmark tabs. Each tab can now adjust the full set of model settings (sampling, MTP, dflash, specprefill, turboquant, VLM MTP, etc.) for the next run without permanently mutating the model's saved settings — useful when iterating on bench configurations.

The current dashboard requires opening the model-settings modal (which writes back to disk on save), making it awkward to do quick what-if comparisons on bench runs. This change makes those overrides ephemeral by default with an explicit Save / Save as Profile / Reset workflow.

What changed

New shared macro dashboard/_settings_fields.html — the Basic + Advanced settings form, parameterized on a state object name (modelSettings, benchSettings, or accSettings). Replaces the inline form in _modal_model_settings.html.
New bench-tab modal dashboard/_modal_bench_settings.html — wraps the macro, adds bench-specific actions (Save, Save as Profile, Run, Reset) and a profile-creation form. Bound to a per-tab state object so the bench and accuracy tabs maintain independent overrides.
Generic panel ops in dashboard.js — _modelToSettingsState / _settingsStateToPayload / _emptySettingsState / _ENGINE_INIT_KEYS / _panelHydrate / _panelDirty / _panelRequiresReload / _panelReset / _panelSave / _panelSaveAsProfile. Bench and accuracy tabs expose thin wrappers (benchSettingsSave(), accSettingsReset(), etc.) so templates remain readable.
Bench/accuracy backends (omlx/admin/benchmark.py, omlx/admin/accuracy_benchmark.py, omlx/model_settings.py) — accept an optional settings_override payload on the run request and apply it ephemerally for that run. Persisted model settings are unchanged unless the user explicitly saves.
Tests — added coverage in tests/test_benchmark.py, tests/test_accuracy_benchmark.py, tests/test_model_settings.py for the override path and ephemeral application.

Field coverage

The shared macro covers every field the model-settings modal already exposes, including the recently-added upstream additions:

dflash_in_memory_cache_max_entries (feat(dflash): add configurable prefix cache max_entries setting #1120)
vlm_mtp_enabled / vlm_mtp_draft_model / vlm_mtp_draft_block_size (phase 2A/2B)

_ENGINE_INIT_KEYS is updated so the "requires reload" hint surfaces correctly when any of those new fields change in a bench panel.

Test plan

Bench tab → open settings modal → all fields render and bind to benchSettings.
Accuracy bench tab → same flow with accSettings, independent of bench tab state.
Edit a setting in the bench panel, run benchmark → run uses the override; saved model settings unchanged.
Save writes the override back to the model's persisted settings.
Save as Profile creates a named profile from the current panel state.
Reset restores the last hydrated baseline.
Original model-settings modal in the admin tab still renders and saves correctly (no regression from macro extraction).
Profile pills row renders on bench panels and applies on click.
pytest tests/test_model_settings.py tests/test_benchmark.py tests/test_accuracy_benchmark.py — 113 passed (1 pre-existing failure on main unrelated to this PR).

Notes

The macro lives in a separate file (_settings_fields.html) so future model-settings additions only need to be made in one place to flow to all three contexts (modal + 2 bench panels).
Bench-panel save reuses the existing PUT /admin/api/models/{id}/settings endpoint; there's no new persistence path.

Copilot

Pull request overview

Adds per-tab “Run-time Settings” modals for the Benchmark and Accuracy Benchmark tabs, enabling per-run (ephemeral) ModelSettings overrides without mutating persisted model_settings.json unless the user explicitly saves.

Changes:

Refactors the model settings form into a shared Jinja macro and introduces new bench/accuracy run-time settings modals bound to per-tab Alpine state.
Adds a backend “ephemeral overrides” layer in ModelSettingsManager and plumbs settings_override through benchmark/accuracy run requests for per-run application.
Updates dashboard JS to hydrate/compare/save per-tab settings state, and adds tests for override behavior and request schema.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`tests/test_model_settings.py`	Adds coverage for `ModelSettingsManager.ephemeral_overrides` behavior (stacking, unknown keys, reversion).
`tests/test_benchmark.py`	Verifies `BenchmarkRequest.settings_override` defaults/accepts dict.
`tests/test_accuracy_benchmark.py`	Verifies `AccuracyBenchmarkRequest.settings_override` defaults/accepts dict.
`omlx/model_settings.py`	Implements ephemeral override stacking and merge-on-read in `get_settings()`.
`omlx/admin/templates/dashboard/_settings_fields.html`	New shared macro for rendering settings fields in modal/panel contexts.
`omlx/admin/templates/dashboard/_modal_model_settings.html`	Replaces large inline form with shared `settings_fields` macro.
`omlx/admin/templates/dashboard/_modal_bench_settings.html`	New run-time settings modal macro for bench/accuracy tabs (Save / Save as Profile / Run / Reset).
`omlx/admin/templates/dashboard/_bench.html`	Adds “Run-time settings” button and modified badge for throughput bench tab.
`omlx/admin/templates/dashboard/_bench_accuracy.html`	Adds “Run-time settings” button and modified badge for accuracy bench tab.
`omlx/admin/templates/dashboard.html`	Includes one run-time settings modal instance per bench tab.
`omlx/admin/templates/base.html`	Replaces lucide polling with a MutationObserver-based icon replacement pass.
`omlx/admin/static/js/dashboard.js`	Adds per-tab settings state, hydrate/dirty/save/apply ops, and sends `settings_override` on run requests.
`omlx/admin/benchmark.py`	Accepts `settings_override` and applies it ephemerally for the duration of the run.
`omlx/admin/accuracy_benchmark.py`	Accepts `settings_override` and applies it ephemerally for the duration of the run.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jasonpaulso · 2026-05-19T15:25:29Z

+                                    <label class="block text-xs font-bold uppercase tracking-wider text-neutral-500 mb-2">{{ t('modal.model_settings.ttl_seconds') }}</label>
+                                    <input type="number" x-model.number="{{ state }}.ttl_seconds"
+                                           {% if mode == 'modal' %}:placeholder="ttlPlaceholder"{% else %}placeholder="default"{% endif %}
+                                           :disabled="{{ selected_model }}?.pinned"
+                                           min="0" step="60"
+                                           :class="{{ selected_model }}?.pinned ? 'bg-neutral-100 text-neutral-400 cursor-not-allowed' : ''"
+                                           class="w-full px-4 py-2.5 border border-neutral-200 rounded-xl text-sm focus:ring-2 focus:ring-neutral-900 focus:border-transparent transition-all">


@copilot apply changes based on this feedback

Four issues called out in review: 1. **Panel apply helpers persisted to the wrong model and hit the backend.** `_panelApplyProfile` / `_panelApplyTemplate` reused `applyProfileToForm` / `applyTemplateToForm`, which both POST to `/admin/api/models/${selectedModel.id}/...` and update server-side `active_profile_name`. In the bench/accuracy panels `selectedModel` may be unset or differ from the panel's `benchModelId` / `accModelId`, so the helpers (a) could target the wrong model, and (b) broke the ephemeral-by-default contract for bench panels. Fix: extract pure `_mergeProfileSettingsIntoState`, `_mergePresetSettingsIntoState`, and `_resetStatePresetFields` helpers that operate on any target state object without touching `selectedModel`/`activeProfileName` or calling the backend. The modal wrappers (`applyProfileToForm`, `applyPresetToForm`) now use these helpers and keep their own POST/active-profile bookkeeping. The panel wrappers (`_panelApplyProfile`, `_panelApplyTemplate`, `_panelApplyPreset`) use the helpers directly — no fetch, no `selectedModel` dependency, panel state only. `_panelApplyTemplate` previously also created a server-side profile when the template's name was new; that side effect is dropped too. Templates carry the same shape as profiles, so we merge them into panel state directly. 2. **`ttl_seconds || null` collapsed explicit 0.** The TTL field accepts 0 (meaning "no TTL"), but `s.ttl_seconds || null` mapped 0 → null, making it impossible to send an explicit 0 via `settings_override` or the bench panel's Save. Use `Number.isFinite(s.ttl_seconds) ? s.ttl_seconds : null` to mirror the temperature/top_p/top_k pattern already used elsewhere in `_settingsStateToPayload`. 3. **TTL placeholder hard-coded `"default"` in panel mode.** The modal already has a localized `ttlPlaceholder` getter that produces the right hint (`ttl_pinned` / `ttl_global_fallback` / `ttl_no_ttl`). The macro now binds `:placeholder="ttlPlaceholder"` unconditionally so the bench/accuracy panels render the same localized hint. The getter degrades cleanly when `selectedModel` is unset (the `pinned` branch is gated by an optional-chain). 4. **`_panelHydrate` silently swallowed non-OK profile fetches.** Other dashboard fetches redirect to `/admin` on 401 so users get a visible session-expired prompt; the profile fetch did not. It also discarded all other errors. Now: 401 redirects, other non-OK responses surface via `settingsStatus`, and network errors are both logged and surfaced. Verified: `node --check` clean, `dashboard.html` renders (3 `:placeholder="ttlPlaceholder"` bindings, 0 `placeholder="default"` residues), `tests/test_benchmark.py tests/test_accuracy_benchmark.py tests/test_model_settings.py` — 113 passed.

Run-time settings now live in a per-tab modal triggered from the Configuration card header on Throughput and Intelligence. Settings are ephemeral by default — applied only to the next bench run via a new SettingsManager.ephemeral_overrides() context manager that stacks overrides on top of the persisted layer. Both runners accept an optional settings_override field on the request body and wrap their model load + run in the override context, covering engine-init flags and sampling-class settings (bench reloads the model each run). Inside the modal: profile / global-template / preset pills scoped to the selected model, Reset (header), Save as Profile, Save to model settings (with dynamic save / saving / saved icon states), and a Run-benchmark shortcut that fires the run and dismisses the modal. Settings field markup is extracted to a shared Jinja macro (_settings_fields.html) so the bench-settings and model-settings modals stay in lockstep. base.html: Lucide processor is now resilient — per-icon try/catch (one bad icon no longer kills the whole pass, which was silently dropping icons later in DOM order on the accuracy tab) and a MutationObserver replaces the 300ms poll, catching dynamically revealed icons immediately. Tests: 13 ephemeral-override unit tests covering nested / concurrent / exception / mutual-exclusion paths, plus schema tests for both bench endpoints.

Four issues called out in review: 1. **Panel apply helpers persisted to the wrong model and hit the backend.** `_panelApplyProfile` / `_panelApplyTemplate` reused `applyProfileToForm` / `applyTemplateToForm`, which both POST to `/admin/api/models/${selectedModel.id}/...` and update server-side `active_profile_name`. In the bench/accuracy panels `selectedModel` may be unset or differ from the panel's `benchModelId` / `accModelId`, so the helpers (a) could target the wrong model, and (b) broke the ephemeral-by-default contract for bench panels. Fix: extract pure `_mergeProfileSettingsIntoState`, `_mergePresetSettingsIntoState`, and `_resetStatePresetFields` helpers that operate on any target state object without touching `selectedModel`/`activeProfileName` or calling the backend. The modal wrappers (`applyProfileToForm`, `applyPresetToForm`) now use these helpers and keep their own POST/active-profile bookkeeping. The panel wrappers (`_panelApplyProfile`, `_panelApplyTemplate`, `_panelApplyPreset`) use the helpers directly — no fetch, no `selectedModel` dependency, panel state only. `_panelApplyTemplate` previously also created a server-side profile when the template's name was new; that side effect is dropped too. Templates carry the same shape as profiles, so we merge them into panel state directly. 2. **`ttl_seconds || null` collapsed explicit 0.** The TTL field accepts 0 (meaning "no TTL"), but `s.ttl_seconds || null` mapped 0 → null, making it impossible to send an explicit 0 via `settings_override` or the bench panel's Save. Use `Number.isFinite(s.ttl_seconds) ? s.ttl_seconds : null` to mirror the temperature/top_p/top_k pattern already used elsewhere in `_settingsStateToPayload`. 3. **TTL placeholder hard-coded `"default"` in panel mode.** The modal already has a localized `ttlPlaceholder` getter that produces the right hint (`ttl_pinned` / `ttl_global_fallback` / `ttl_no_ttl`). The macro now binds `:placeholder="ttlPlaceholder"` unconditionally so the bench/accuracy panels render the same localized hint. The getter degrades cleanly when `selectedModel` is unset (the `pinned` branch is gated by an optional-chain). 4. **`_panelHydrate` silently swallowed non-OK profile fetches.** Other dashboard fetches redirect to `/admin` on 401 so users get a visible session-expired prompt; the profile fetch did not. It also discarded all other errors. Now: 401 redirects, other non-OK responses surface via `settingsStatus`, and network errors are both logged and surfaced. Verified: `node --check` clean, `dashboard.html` renders (3 `:placeholder="ttlPlaceholder"` bindings, 0 `placeholder="default"` residues), `tests/test_benchmark.py tests/test_accuracy_benchmark.py tests/test_model_settings.py` — 113 passed.

Copilot AI review requested due to automatic review settings May 11, 2026 18:25

Copilot AI reviewed May 11, 2026

View reviewed changes

jasonpaulso force-pushed the pr/bench-inline-settings branch from 3b2e672 to 43b09ad Compare May 11, 2026 18:48

jasonpaulso added 2 commits May 19, 2026 08:38

jasonpaulso force-pushed the pr/bench-inline-settings branch from 43b09ad to ec4cb0f Compare May 19, 2026 15:42

jasonpaulso added a commit to jasonpaulso/omlx-plus that referenced this pull request May 19, 2026

merge: PR jundot#1187 bench inline settings

0d198fb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(bench): inline run-time settings modal on bench tabs#1187

feat(bench): inline run-time settings modal on bench tabs#1187
jasonpaulso wants to merge 2 commits into
jundot:mainfrom
jasonpaulso:pr/bench-inline-settings

jasonpaulso commented May 11, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

jasonpaulso May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jasonpaulso commented May 11, 2026

Summary

What changed

Field coverage

Test plan

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

jasonpaulso May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants