feat(health): expose WebUI stream runtime diagnostics #2524

Merged
1 commit merged into nesquena:master from AJV20:feat/webui-runtime-diagnostics
May 20, 2026

Conversation

AJV20 (Contributor) commented May 18, 2026

Thinking Path

  • WebUI chat slowness or stalls are hard to diagnose from the browser when SSE streams exist but no tab is attached, or when events are buffering offline.
  • /health?deep=1 already collects operational checks for supervisors and support workflows.
  • Adding non-sensitive stream counters there gives operators evidence without exposing prompts, tool arguments, event payloads, paths, or secrets.

What Changed

  • Adds StreamChannel.diagnostic_snapshot() with subscriber and offline-buffer counts.
  • Adds _stream_runtime_diagnostics() to summarize active WebUI SSE streams.
  • Includes the runtime summary in deep health checks under checks.stream_runtime.
  • Adds regression coverage for per-channel counters and aggregate stream diagnostics.
  • Updates CHANGELOG.md.

Why It Matters

When WebUI feels slower than a messaging chat, operators can now tell whether the server is actively streaming, whether any browser subscribers are attached, and whether offline events are accumulating.
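
As a rough illustration of the three questions above, an operator-side helper might classify a stream-runtime summary like this. The key names (`active_streams`, `total_subscribers`, `total_offline_buffered_events`) and the helper `describe_stream_runtime` are hypothetical; the real payload under `checks.stream_runtime` may be shaped differently.

```python
def describe_stream_runtime(summary: dict) -> str:
    """Turn a stream-runtime summary into a one-line operator hint.

    The key names are assumptions for illustration only.
    """
    active = summary.get("active_streams", 0)
    subscribers = summary.get("total_subscribers", 0)
    buffered = summary.get("total_offline_buffered_events", 0)

    if active == 0:
        # Nothing is streaming, so slowness lies elsewhere.
        return "idle: the server is not streaming"
    if subscribers == 0:
        # Streams exist but no browser tab is attached to any of them.
        return (
            f"detached: {active} stream(s) running, no browser attached, "
            f"{buffered} event(s) buffered"
        )
    if buffered > 0:
        # Subscribers exist, yet some events are still held offline.
        return (
            f"lagging: {subscribers} subscriber(s) attached but "
            f"{buffered} event(s) still buffered offline"
        )
    return f"healthy: {active} stream(s) with {subscribers} subscriber(s)"
```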

Verification

  • python -m pytest tests/test_webui_runtime_diagnostics.py -q -o 'addopts='
  • python -m py_compile api/config.py api/routes.py
  • git diff --check

Risks / Follow-ups

  • The endpoint exposes stream ids and counts only; it intentionally avoids payloads and local filesystem/process details.
  • A follow-up UI could surface these counters in the control center or troubleshooting panel.
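
One way to guard the "ids and counts only" property in a test is to walk the summary and flag any leaf that is not a plain integer. The sketch below is an assumption about how such a check could look, not the regression test shipped in this PR; `non_count_fields` and the sample key names are hypothetical.

```python
def non_count_fields(summary: dict, prefix: str = "") -> list:
    """Return dotted paths of leaves that are not plain integer counts."""
    offenders = []
    for key, value in summary.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Stream ids appear as dict keys, so recurse into nested dicts.
            offenders.extend(non_count_fields(value, path))
        elif isinstance(value, bool) or not isinstance(value, int):
            # Strings, lists, floats, and booleans are all treated as suspect.
            offenders.append(path)
    return offenders
```

A test could then assert that `non_count_fields(summary) == []` for the deep health output, so a later change that adds a payload or path field fails loudly.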

Model Used

OpenAI gpt-5.5 via Hermes Agent, with repository tools and targeted pytest/py_compile verification.

@nesquena-hermes (Collaborator)

Phase 0 quick note — agent review queued, holding for cron tick

Thanks @AJV20 for the diagnostics/surface-context/dashboard-link burst. These three (#2523 + #2524 + #2533) are well-scoped and CI-eligible, but they're <6 hours old from a first-time contributor with several other in-flight PRs (#2527 + #2526 are getting maintainer-review treatment separately).

For this sweep I'm holding all three out of the batch and letting the agent-review cron pick them up on its next tick — that's a structured per-PR diff/test/integration read with a recap comment, after which the parallel reviewer typically follows. Once both have landed, they'll be eligible for the next batch.

Nothing wrong with the PRs themselves at a glance — the burst pattern just means "scrutiny stays per-PR, not 'batch everything because they're all small'". Will revisit on the next sweep.

nesquena-hermes force-pushed the feat/webui-runtime-diagnostics branch from ffd8567 to 54b6c38 on May 19, 2026 22:48
nesquena-hermes closed this pull request by merging all changes into nesquena:master in 9c983e6 on May 20, 2026
stevesu2021 pushed a commit to AIBusinessPlatformCenter/hermes-webui-original that referenced this pull request May 20, 2026
Unreleased section now reflects:
- PR nesquena#2598 live tool event dedup (AJV20)
- PR nesquena#2533 browser dashboard links (AJV20)
- PR nesquena#2607 messaging transcript dedup (AJV20)
- PR nesquena#2521 Geist Contrast skin (intellectronica)
- PR nesquena#2524 SSE runtime diagnostics endpoint (AJV20)

Removed merge markers and consolidated stray entries that leaked into the v0.51.94 release block.
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 20, 2026
… 0.51.95) (#569)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/nesquena/hermes-webui](https://github.com/nesquena/hermes-webui) | patch | `0.51.92` → `0.51.95` |

---

### Release Notes

<details>
<summary>nesquena/hermes-webui (ghcr.io/nesquena/hermes-webui)</summary>

### [`v0.51.95`](https://github.com/nesquena/hermes-webui/blob/HEAD/CHANGELOG.md#v05195--2026-05-20--Release-BS-stage-388--5-PR-batch--live-tool-callback-event-dedup--browser-only-dashboard-links--messaging-transcript-merge-alignment--Geist-Contrast-skin--SSE-runtime-diagnostics)

[Compare Source](nesquena/hermes-webui@v0.51.94...v0.51.95)

##### Fixed

- **PR [#2598](nesquena/hermes-webui#2598)** by [@AJV20](https://github.com/AJV20): Surface live tool activity when Hermes Agent reports tools through its dedicated `tool_start_callback` / `tool_complete_callback` path, so browser chat shows the existing running tool cards instead of appearing idle until the final answer. The legacy `on_tool` callback path now early-returns for `tool.started` and `tool.completed` events when the structured callback path is already wired, preventing the same tool event from being emitted twice to the SSE stream.
- **PR [#2533](nesquena/hermes-webui#2533)** by [@AJV20](https://github.com/AJV20): Allow Settings → System to save public browser-only Official Hermes Dashboard links (for reverse-proxy URLs) without treating them as server-side probe targets. URL sanitization runs against the configured link before save; the dashboard probe is skipped for browser-only links.
- **PR [#2607](nesquena/hermes-webui#2607)** by [@AJV20](https://github.com/AJV20): Deduplicate messaging/CLI session transcript rows when the sidecar and state store encode the same no-id message with equivalent timestamps in different formats (e.g. `"10.0"` vs `10`), preventing repeated visible chat rows after session reconstruction. The messaging-display merge now reuses `api.models._session_message_merge_key(...)` instead of an ad-hoc dedup key, aligning with the existing append-only merge path.

##### Added

- **PR [#2521](nesquena/hermes-webui#2521)** by [@intellectronica](https://github.com/intellectronica): Add the Geist Contrast skin to the appearance picker. New light + dark variant pair with a high-contrast yellow-on-black accent and Geist editorial typography. Default unchanged; opt in via Settings → Appearance → Skin → Geist Contrast. Slash command `/theme geist-contrast` now resolves correctly because the lookup matches against `skin.value` rather than `skin.name`. Documented in `THEMES.md` with a forward-compatible skin count (no hard-coded value).
- **PR [#2524](nesquena/hermes-webui#2524)** by [@AJV20](https://github.com/AJV20): Add non-sensitive SSE stream runtime diagnostics to deep health checks (`/health?deep=1`), including active stream count, subscriber totals, and offline buffered-event counts for stuck or slow WebUI chat investigations. Read-only telemetry; existing surfaces unchanged.

### [`v0.51.94`](https://github.com/nesquena/hermes-webui/blob/HEAD/CHANGELOG.md#v05194--2026-05-19--Release-BR-stage-387--10-PR-full-sweep-batch--Slice-4b-runner-adapter-facade--folder-zip-download--partial-recovery-marker-dedupe--browser-api-client-side-timeout--auto-compression-card-rotation-finish--composer-draft-rollback-fix--metadata-count-reconciliation--active-session-refresh-on-external-sidecar-updates--indexed-context-metadata--gateway-queues-approval-peek)

[Compare Source](nesquena/hermes-webui@v0.51.93...v0.51.94)

##### Fixed

- **PR [#&#8203;2566](nesquena/hermes-webui#2566 by [@&#8203;bjb2](https://github.com/bjb2) — Add `GET /api/folder/download?session_id=...&path=...` streaming-zip endpoint with pre-flight 413 on size/file-count cap exceeded, `os.walk(followlinks=False)` plus per-symlink workspace-root resolution check, `allowZip64=True` for large files, and a "Download Folder" item in the workspace file context menu (dir items only). Configurable caps via `HERMES_WEBUI_FOLDER_ZIP_MAX_MB` (1024 default) and `HERMES_WEBUI_FOLDER_ZIP_MAX_FILES` (50000 default). `download_folder` i18n key added across all 11 locales with `// TODO: translate` fallback markers for non-en entries.
- **PR [#&#8203;2593](nesquena/hermes-webui#2593 by [@&#8203;Michaelyklam](https://github.com/Michaelyklam) (closes [#&#8203;2592](nesquena/hermes-webui#2592)) — Deduplicate cancelled/recovered partial assistant markers using the full `(content, reasoning, partial tool calls)` payload instead of only non-empty text content. Tool-only failed turns no longer append identical empty-content `_partial` messages repeatedly. Full session loads collapse adjacent duplicate partial markers from already-bloated session files while preserving a `.partial-bak-<timestamp>` backup. New helpers `_partial_message_signature()` (api/streaming.py:2593-2622) + `_partial_marker_already_present()` (api/streaming.py:2625-2641) scope the dedup search to the current user turn only.
- **PR [#&#8203;2597](nesquena/hermes-webui#2597 by [@&#8203;dso2ng](https://github.com/dso2ng) (closes [#&#8203;2539](nesquena/hermes-webui#2539)) — Add a 30s default client-side timeout to the shared browser `api()` helper, with per-call `timeoutMs` overrides, `AbortController`-based cancellation, a timeout toast, and explicit 60s/120s ceilings for legitimately longer update flows. Body-read phase also raced against the timeout so a server that replies headers-OK and then stalls mid-JSON rejects cleanly. New `tests/test_api_timeout.py` covers default, override, abort, and body-read-stall paths.
- **PR [#&#8203;2601](nesquena/hermes-webui#2601 by [@&#8203;starship-s](https://github.com/starship-s) — Prevent the composer-draft rollback regression introduced by [#&#8203;2581](nesquena/hermes-webui#2581 active-session external-refresh polling. Adds `opts.preserveActiveInput` to `_restoreComposerDraft` and gates the overwrite on `current && current !== text`, keeping the guard co-located with the function that owns the contract. Backend `s.save(touch_updated_at=False)` for `/api/session/draft` so draft autosaves no longer falsely advance `updated_at` and trigger the refresh poll. Supersedes parallel-discovery PR [#&#8203;2602](nesquena/hermes-webui#2602).
- **PR [#&#8203;2603](nesquena/hermes-webui#2603 by [@&#8203;starship-s](https://github.com/starship-s) — Finish the running auto-compression card after the backend rotates the session id. The `compressed` SSE listener at `static/messages.js:1829-1862` used to early-return whenever `S.session.session_id !== activeSid`, but the `state` event listener at `:1656-1662` already rotates `window._compressionUi.sessionId` to the continuation id before `compressed` arrives. The strict active-session check is replaced with a cross-session safety check that still rejects mismatched events but no longer rejects the legitimate post-rotation `done` payload, so the elapsed-timer "compressing…" state no longer freezes after rotation completes.
- **PR [#&#8203;2604](nesquena/hermes-webui#2604 by [@&#8203;Michaelyklam](https://github.com/Michaelyklam) (closes [#&#8203;2594](nesquena/hermes-webui#2594)) — Reconcile session metadata counts in the `/api/session?messages=0` fast path. Replaces the prior `max(sidecar_count, state_count)` heuristic with `len(merge_session_messages_append_only(sidecar_messages, state_db_messages))` so the metadata-only count matches the full-load count. Closes the followup issue filed against PR [#&#8203;2581](nesquena/hermes-webui#2581) / v0.51.93 — sidebar refresh polling no longer loops forever when `state.db` retains old rows that the append-only merge correctly filters out.
- **PR [#&#8203;2605](nesquena/hermes-webui#2605 by [@&#8203;LumenYoung](https://github.com/LumenYoung) (refs [#&#8203;2581](nesquena/hermes-webui#2581)) — Make the metadata-only `/api/session?messages=0&resolve_model=0` path return the persisted sidecar `message_count` from `Session._metadata_message_count` when no session-index entry exists, so the active-session external-refresh signal still trips on legacy sessions whose sidecar contains externally-appended content. Composed cleanly with [#&#8203;2604](nesquena/hermes-webui#2604) (the legacy-fallback applies only when the reconciled merged count is zero).
- **PR [#&#8203;2573](nesquena/hermes-webui#2573 by [@&#8203;espokaos-ops](https://github.com/espokaos-ops) (closes [#&#8203;2510](nesquena/hermes-webui#2510)) — Persist session-level approvals when a "Allow for this session" click lands while a stream is active and `_pending` is empty. The approval flow now peeks `_gateway_queues[sid]` to recover the queued `_ApprovalEntry`'s `pattern_keys` so `approve_session()` records the approval; the next dangerous command in the same session no longer asks again. Reduced scope to peek-only per prior review note; the `agent_session_key` round-trip plumbing was dropped (it was dead on the WebUI streaming path).

##### Added

- **PR [#&#8203;2599](nesquena/hermes-webui#2599 by [@&#8203;Michaelyklam](https://github.com/Michaelyklam) (refs [#&#8203;1925](nesquena/hermes-webui#1925)) — Add the Slice 4b `RunnerRuntimeAdapter` facade — a protocol-translator client over a future runner/sidecar backend. The facade delegates `start_run`, `observe_run`, `get_run`, and control calls to an injected runner client, normalizes results into the existing `RunStartResult`/`RunEventStream`/`RunStatus`/`ControlResult` dataclasses, carries explicit `profile`/`workspace`/`model` payload fields, and returns bounded `unsupported` control results without owning `AIAgent`, stream lifecycle, cancel/approval/clarify queues, goal state, or cached-agent table. No route wiring, no default-on runner mode, no public response-shape change.
- **PR [#&#8203;2600](nesquena/hermes-webui#2600 by [@&#8203;LumenYoung](https://github.com/LumenYoung) (refs [#&#8203;2266](nesquena/hermes-webui#2266)) — Slimmer WebUI follow-up from the closed LCM/context-engine PR [#&#8203;2266](nesquena/hermes-webui#2266). Adds rendering and persistence for context-engine compression-anchor metadata (when present on a session or live compression event) including an "Indexed context" detail line on auto-compression cards. No agent-layer clone orchestration; WebUI-only metadata surface.

### [`v0.51.93`](https://github.com/nesquena/hermes-webui/blob/HEAD/CHANGELOG.md#v05193--2026-05-19--Release-BQ-stage-386--10-PR-full-sweep-batch--RFC-Slice-4-runnersidecar-gate--workspace-tree-toggle-width-CSS-variable--settled-file-markdown-link-rendering--prompt-cache-coverage-percentage-fix--terminal-shell-shutdown-reap--configured-model-picker-provider-preservation--profile-aware-assistant-display-names--statedb-reconciliation-slice-1--queued-message-cross-session-drain-fix--stale-stream-writeback-supersede)

[Compare Source](nesquena/hermes-webui@v0.51.92...v0.51.93)

##### Fixed

- **PR [#&#8203;2580](nesquena/hermes-webui#2580 by [@&#8203;Michaelyklam](https://github.com/Michaelyklam) (refs [#&#8203;2571](nesquena/hermes-webui#2571)) — Centralize the workspace-tree toggle slot width into a `--file-tree-toggle-width` CSS variable at `:root`, referenced from both `.file-tree-toggle` and `.file-tree-toggle-placeholder` so a future width adjustment can't silently desync the two rules. Closes the followup issue filed against PR [#&#8203;2563](nesquena/hermes-webui#2563) / v0.51.92.
- **PR [#&#8203;2576](nesquena/hermes-webui#2576 by [@&#8203;dobby-d-elf](https://github.com/dobby-d-elf) (closes [#&#8203;470](nesquena/hermes-webui#470)) — Preserve labeled `file://` links in settled markdown by rewriting them to `/api/media?path=...&inline=1` before the sanitizer drops them. The streamed and settled markdown paths are now symmetric on local-file anchors, while raw `file://` image sources continue to be blocked.
- **PR [#&#8203;2579](nesquena/hermes-webui#2579 by [@&#8203;starship-s](https://github.com/starship-s) (refs [#&#8203;2419](nesquena/hermes-webui#2419), [#&#8203;2421](nesquena/hermes-webui#2421)) — Fix the prompt-cache hit percentage to display the fraction of the prompt served from cache (`cache_read / prompt_total`) instead of the meaningless `cache_read / (cache_read + cache_write)`. New `api/usage.py` `prompt_cache_hit_percent()` helper matches Hermes Agent's log convention; UI labels updated across all locales.
- **PR [#&#8203;2582](nesquena/hermes-webui#2582 by [@&#8203;Michaelyklam](https://github.com/Michaelyklam) (refs [#&#8203;2577](nesquena/hermes-webui#2577)) — Harden embedded workspace-terminal shell cleanup so graceful WebUI shutdowns close/reap every active PTY shell and the spawned shell receives a Linux parent-death signal (`PR_SET_PDEATHSIG`) if the WebUI process dies. The terminal close path now waits again after `SIGKILL` so timed-out shells don't remain unreaped.
- **PR [#&#8203;2583](nesquena/hermes-webui#2583 by [@&#8203;dobby-d-elf](https://github.com/dobby-d-elf) — Make assistant display names properly profile-aware. The saved assistant-name preference applies only to the literal `default` profile; named profiles use their own profile name. Centralizes `assistantDisplayName()` resolution across composer placeholder, `document.title` via `syncTopbar()`, message role labels via `_assistantRoleHtml()`, browser notifications, cancel-copy fallback, and empty-state on session delete.
- **PR [#&#8203;2584](nesquena/hermes-webui#2584 by [@&#8203;wirtsi](https://github.com/wirtsi) (closes [#&#8203;2585](nesquena/hermes-webui#2585)) — Prevent queued follow-up messages from draining into the wrong chat when the user switches sessions during the 120ms `setBusy(false)` drain window. The drain-time guard re-queues against `sid` (not the currently-viewed session) and `_sendInProgressSid` captures the activeSid at the commit point so the re-entrant `send()` path no longer reads a stale `S.session.session_id`.
- **PR [#&#8203;2587](nesquena/hermes-webui#2587 by [@&#8203;AJV20](https://github.com/AJV20) — Allow a still-running stream that was mistakenly marked interrupted by stale-pending recovery to replace its own recovery marker when it later finishes, while continuing to block stale writeback after any newer turn appends transcript content. Three new tests in `tests/test_session_sidecar_repair.py` cover the supersede-allowed and the two refuse cases.
- **PR [#&#8203;2588](nesquena/hermes-webui#2588 by [@&#8203;Michaelyklam](https://github.com/Michaelyklam) (refs [#&#8203;2569](nesquena/hermes-webui#2569)) — Preserve the configured provider when choosing a configured model from the composer picker. `_getOptionProviderId()` now reads `data-provider` from temporary `<option data-custom="1">` rows (created by `selectModelFromDropdown` for configured models outside the native catalog), so the next send routes through the correct provider instead of falling back to whatever provider was already active.

##### Changed

- **PR [#&#8203;2581](nesquena/hermes-webui#2581 by [@&#8203;LumenYoung](https://github.com/LumenYoung) (refs [#&#8203;2194](nesquena/hermes-webui#2194)) — First recovery slice from the closed reconciliation PR [#&#8203;2194](nesquena/hermes-webui#2194). Routes streaming session reconstruction and sidebar metadata through the reconciled state.db/session-summary path with a metadata-only fast path for sidebar polls and a single-snapshot reuse on the streaming hot path. Includes the reviewer-requested `_new_turn_context_from_messages` extraction so both legacy and streaming paths share the `_drop_checkpointed_current_user_from_context` + casual-fresh-chat suppression behavior (refs [#&#8203;1217](nesquena/hermes-webui#1217) / [#&#8203;2308](nesquena/hermes-webui#2308)). 923 LOC across `api/models.py`, `api/routes.py`, `api/streaming.py`, `static/sessions.js` + four new test files; second-pass agent diff review LGTM after the streaming-path regression was caught and fixed.

##### Documentation

- **PR [#&#8203;2575](nesquena/hermes-webui#2575 by [@&#8203;Michaelyklam](https://github.com/Michaelyklam) (refs [#&#8203;1925](nesquena/hermes-webui#1925)) — Advance the runtime-adapter RFC to the Slice 4 runner/sidecar planning gate after [#&#8203;2560](nesquena/hermes-webui#2560) shipped the queue-staging clarification. The RFC now marks queue routing as staged by default, defines Slice 4a as a docs/test contract before any runner code lands, and pins default-off feature-flagging, restart/reattach success criteria, control parity, profile/workspace payload isolation, and explicit non-goals for legacy-backend removal or server-side queue scheduler work.

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/569