fix: surface auto-compression handoff #2567

Merged
1 commit merged into nesquena:master from dso2ng:fix/2477-compression-handoff-event
May 19, 2026

Contributor

@dso2ng commented May 19, 2026

Thinking Path

#2477 was split into small slices. Slice A (elapsed timer) has shipped via #2512, and fallback/rate-limit warning transparency has shipped via #2505. The remaining narrow WebUI gap is the compressed-session handoff: during automatic compression the backend can rotate from the original browser session id to a new continuation id before the compressed SSE event is emitted.

Before this PR, that completion event used the rotated s.session_id. The frontend listener still compared d.session_id against the original activeSid, so the active browser stream could drop the completion event and keep showing a stale running compression card.

Refs #2477 (Slice B only).

What Changed

  • Keep the compressed SSE event correlated to the origin stream session:
    • session_id / old_session_id now carry the pre-compression origin session id.
    • new_session_id / continuation_session_id carry the compressed continuation id.
  • Update the frontend compressed listener to correlate on old_session_id before falling back to session_id.
  • Keep the continuation id as transient compression-card metadata and show it in the expanded automatic-compression detail line:
    • Continued in compressed session: <id>
  • Add regression coverage for the rotated-session completion path.
  • Add an Unreleased changelog entry.

Why It Matters

A long WebUI run should not look stuck right after auto-compression succeeds. The completion event must still reach the active browser stream even if the backend session id has already rotated to the compressed continuation session.

This keeps the completion card visible without inserting fake transcript messages or changing model context.

Scope Boundaries

This PR is Slice B only.

In scope:

  • auto-compression compressed SSE payload metadata
  • frontend correlation of the completion event
  • transient completion-card detail text

Out of scope:

Visual Evidence

Before/after illustration (not added to the code branch):

[Image: auto-compression handoff visual evidence]

Verification

Targeted local verification on ea978a198931:

  • node --check static/ui.js
  • node --check static/messages.js
  • $HOME/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_auto_compression_card.py -q
    • 36 passed
  • $HOME/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_auto_compression_card.py tests/test_streaming_max_tokens_quota.py tests/test_inflight_stream_reuse.py tests/test_compression_snapshot_runtime_clear.py tests/test_session_lineage_collapse.py tests/test_streaming_session_sidebar.py -q
    • 73 passed
  • git diff --check
  • public diff hygiene:
    • non_ascii_added_lines=0
    • secret_like_added_lines=0

Full suite on the PR branch:

  • $HOME/.hermes/hermes-agent/venv/bin/python -m pytest tests/ -q
    • 5968 passed, 6 skipped, 3 xpassed, 8 subtests passed, 4 failed

The 4 failures are existing upstream baseline failures in tests/test_issue1527_lmstudio_base_url_classification.py; the same file fails on clean origin/master@718a4c76:

  • $HOME/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_issue1527_lmstudio_base_url_classification.py -q
    • clean origin/master: 4 failed, 1 passed
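The baseline comparison above amounts to a set difference between the failures on the PR branch and the failures on clean origin/master. A minimal sketch, assuming failures are identified by pytest node ids; the function name `introduced_failures` and the individual test ids are hypothetical placeholders, not names taken from the repository:

```python
from typing import Iterable, List


def introduced_failures(
    branch_failed: Iterable[str], baseline_failed: Iterable[str]
) -> List[str]:
    """Return failures present on the branch but absent from the baseline."""
    return sorted(set(branch_failed) - set(baseline_failed))


# Hypothetical node ids: only the file name comes from the PR description.
TEST_FILE = "tests/test_issue1527_lmstudio_base_url_classification.py"
baseline = [f"{TEST_FILE}::hypothetical_case_{i}" for i in range(4)]
branch = list(baseline)  # the same 4 failures reproduce on the PR branch

new_failures = introduced_failures(branch, baseline)
print(new_failures)
```

An empty result is what supports the claim that the branch introduces no new failures.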

Risks / Follow-ups

Low risk. The patch keeps the event shape backward-compatible by adding explicit metadata rather than removing fields. The frontend still ignores events for unrelated active sessions; it simply correlates completion using the origin id after compression rotation.

The continuation detail is transient UI only. It does not push into S.messages, does not alter durable transcript history, and does not change model-facing context.

Model Used

AI-assisted: OpenAI GPT-5.5 via Hermes Agent, with repository tools, local pytest verification, and an independent reviewer subagent.

@nesquena-hermes
Collaborator

Phase 0 review — narrow Slice B of #2477, contract-clean; ready for batch

Pulled the PR head (ea978a19) and read the three-file diff against origin/master. This is the right shape for the remaining handoff gap — backward-compatible event payload, correlation on the origin id, transient UI only.

The bug shape this closes

Before this PR, the compression-side-effects block in api/streaming.py:4348-4435 would rotate session_id to the agent's new id (_agent_sid), persist the new id under SESSION_AGENT_LOCKS / SESSION_AGENT_CACHE, and then emit the completion event with 'session_id': s.session_id — i.e. the rotated id. The browser's listener (static/messages.js:1812) compared d.session_id !== activeSid and bounced the event for the original sid, leaving the running auto-compression card alive until the next /api/sessions poll healed it.

Server-side payload — clean

if _compression_continuation_session_id is None:
    _compression_continuation_session_id = s.session_id
put('compressed', {
    'session_id': _compression_origin_session_id,
    'old_session_id': _compression_origin_session_id,
    'new_session_id': _compression_continuation_session_id,
    'continuation_session_id': _compression_continuation_session_id,
    'message': 'Context auto-compressed to continue the conversation',
    'usage': _live_usage_snapshot(),
})

Two things I like here:

  • session_id is now the origin id (not the rotated one). That keeps the legacy contract intact for any consumer that only knows the original session_id field — they get the correlation they expected, just under different semantics. Subtle but right.
  • Both new_session_id and continuation_session_id are populated, so consumers that picked either name as their canonical handoff key continue to work.

The fallback at _compression_continuation_session_id is None: ... = s.session_id handles the "compression detected via compressor state, not via sid rotation" path (api/streaming.py:4412-4416). That's the case where _compressed = True was set but no actual rotation happened — s.session_id is unchanged, so the continuation id matches the origin id, which is correct: there is no handoff to surface, the card just transitions to done.
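The payload construction and its fallback can be sketched in isolation. This is an illustrative stand-in rather than the code in api/streaming.py: the helper name `build_compressed_payload` and its parameters are assumptions, and the `usage` field is omitted because `_live_usage_snapshot()` is not reproduced here.

```python
from typing import Any, Dict, Optional


def build_compressed_payload(
    origin_sid: str,
    current_sid: str,
    continuation_sid: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a `compressed` event payload keyed to the origin session.

    Hypothetical helper: `current_sid` plays the role of `s.session_id`
    at emission time.
    """
    # Fallback: no rotation was recorded, so use the session's current id.
    if continuation_sid is None:
        continuation_sid = current_sid
    return {
        "session_id": origin_sid,
        "old_session_id": origin_sid,
        "new_session_id": continuation_sid,
        "continuation_session_id": continuation_sid,
        "message": "Context auto-compressed to continue the conversation",
    }


# Rotation happened: the continuation id differs from the origin id.
rotated = build_compressed_payload("sid-origin", "sid-cont", "sid-cont")
# Compression detected without rotation: both ids collapse to the origin.
unrotated = build_compressed_payload("sid-origin", "sid-origin")
```

In the unrotated case `new_session_id` equals `old_session_id`, which matches the "no handoff to surface" reading above.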

Client correlation — symmetric

const eventSid=d.old_session_id||d.session_id||activeSid;
if(eventSid!==activeSid) return;
const continuationSid=d.new_session_id||d.continuation_session_id||'';

Preferring old_session_id first is the correct order: with this PR, session_id is already the origin id, but if a future change re-points session_id to the continuation id (e.g. for symmetry with other event types), the listener still correlates correctly via old_session_id.
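To make the precedence argument concrete, here is a Python model of the listener's correlation logic. It is a sketch that mirrors the JavaScript `||` chain with Python's `or`; the function name `correlate_compressed` and the sample session ids are hypothetical, and the real listener lives in static/messages.js.

```python
from typing import Any, Mapping, Optional


def correlate_compressed(event: Mapping[str, Any], active_sid: str) -> Optional[str]:
    """Return the continuation id ('' if absent) when the event belongs to
    the active stream, or None when the event should be ignored."""
    # Prefer old_session_id, then session_id, then assume the active stream.
    event_sid = event.get("old_session_id") or event.get("session_id") or active_sid
    if event_sid != active_sid:
        return None
    return event.get("new_session_id") or event.get("continuation_session_id") or ""


ACTIVE = "sid-origin"

# Payload shape after this PR: accepted.
post_pr = {"session_id": "sid-origin", "old_session_id": "sid-origin",
           "new_session_id": "sid-cont", "continuation_session_id": "sid-cont"}
# Payload shape before this PR (rotated session_id only): dropped.
pre_pr = {"session_id": "sid-cont"}
# Hypothetical future shape where session_id is re-pointed: still accepted.
future = {"session_id": "sid-cont", "old_session_id": "sid-origin",
          "new_session_id": "sid-cont"}
# Event for some other stream: dropped.
unrelated = {"session_id": "sid-other", "old_session_id": "sid-other"}

results = [correlate_compressed(e, ACTIVE) for e in (post_pr, pre_pr, future, unrelated)]
print(results)
```

The `future` case is the one that motivates checking `old_session_id` first: correlation survives even if `session_id` changes meaning again.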

Transient UI only — verified

static/ui.js:5018-5021:

if(running)return elapsedLabel?`Elapsed: ${elapsedLabel}`:base;
const continuation=String(state&&state.continuationSessionId||'').trim();
const handoff=continuation?`Continued in compressed session: ${continuation}`:'';

continuationSessionId lives in the state object handed to setCompressionUi(state) — that's transient compression-card metadata, not a transcript message. window._compressionUi=null immediately follows (static/messages.js:1837) once appendLiveCompressionCard anchors the card to the turn, so renderMessages() won't duplicate it. Scope boundaries claimed in the PR body match what the code actually does.
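The handoff string derivation is small enough to model directly. The sketch below reproduces only the two quoted lines that compute `continuation` and `handoff`, in Python; how the result is combined with the rest of the detail line is not shown in the excerpt and is deliberately left out. The function name `handoff_detail` is hypothetical.

```python
from typing import Any, Mapping, Optional


def handoff_detail(state: Optional[Mapping[str, Any]]) -> str:
    """Return the handoff text for a finished compression card, or ''."""
    # Mirrors String(state && state.continuationSessionId || '').trim()
    raw = (state or {}).get("continuationSessionId") or ""
    continuation = str(raw).strip()
    if not continuation:
        return ""
    return f"Continued in compressed session: {continuation}"


with_id = handoff_detail({"continuationSessionId": "  sid-cont  "})
without_id = handoff_detail({"continuationSessionId": ""})
no_state = handoff_detail(None)
```

Because the value is read from the state object and returned as a string, nothing in this path touches transcript storage, which is consistent with the "transient UI only" claim.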

Verification matches

Full suite on the PR branch reports the 4 known tests/test_issue1527_lmstudio_base_url_classification.py upstream baseline failures, which I confirm also fail on clean origin/master. Not introduced by this PR.

Lock migration interaction — confirmed safe

The migration block at api/streaming.py:4400-4410 runs before the put('compressed', ...) call at :4430, so by the time the event is emitted, the new sid is already aliased to the same _agent_lock. No window where a subsequent caller would race against a stale lock entry for the origin id.
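The ordering argument can be illustrated with a toy registry. This is a sketch under stated assumptions: `SESSION_AGENT_LOCKS` is modeled as a plain dict, the function `migrate_lock_then_emit` and the recording callback are hypothetical, and keeping the origin entry in place after aliasing is an assumption made for the example, not something confirmed by the excerpt.

```python
import threading
from typing import Any, Callable, Dict, List

# Toy stand-in for the module-level lock registry named in the review.
SESSION_AGENT_LOCKS: Dict[str, Any] = {}


def migrate_lock_then_emit(
    origin_sid: str,
    continuation_sid: str,
    put: Callable[[str, Dict[str, Any]], None],
) -> None:
    """Alias the continuation id to the origin's lock, then emit the event."""
    agent_lock = SESSION_AGENT_LOCKS.setdefault(origin_sid, threading.Lock())
    # Step 1: the new sid shares the same lock object before anything is emitted.
    SESSION_AGENT_LOCKS[continuation_sid] = agent_lock
    # Step 2: only now is the completion event published.
    put("compressed", {"old_session_id": origin_sid,
                       "new_session_id": continuation_sid})


observed: List[bool] = []


def recording_put(event: str, data: Dict[str, Any]) -> None:
    """Record whether both ids resolve to one lock at emission time."""
    new_lock = SESSION_AGENT_LOCKS.get(data["new_session_id"])
    old_lock = SESSION_AGENT_LOCKS.get(data["old_session_id"])
    observed.append(new_lock is not None and new_lock is old_lock)


migrate_lock_then_emit("sid-origin", "sid-cont", recording_put)
```

Any consumer that reacts to the event and looks up the lock by the continuation id therefore finds the same lock object that guarded the origin session.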

LGTM for batch as Slice B of #2477.

@nesquena-hermes closed this pull request by merging all changes into nesquena:master in 71c7035 on May 19, 2026
Michaelyklam pushed a commit to Michaelyklam/hermes-webui that referenced this pull request May 19, 2026
# Conflicts:
#	CHANGELOG.md
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 19, 2026
… 0.51.92) (#560)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/nesquena/hermes-webui](https://github.com/nesquena/hermes-webui) | patch | `0.51.90` → `0.51.92` |

---

### Release Notes

<details>
<summary>nesquena/hermes-webui (ghcr.io/nesquena/hermes-webui)</summary>

### [`v0.51.92`](https://github.com/nesquena/hermes-webui/blob/HEAD/CHANGELOG.md#v05192--2026-05-19--Release-BP-stage-385--7-PR-full-sweep-batch--RFC-Slice-3c-clarification--workspace-tree-icon-alignment--project-move-cache-refresh--auto-compression-handoff-metadata--Grok-OAuth-provider-catalog--anonymous-custom-endpoint-picker-fallback--PWA-standalone-reload--pull-to-refresh)

[Compare Source](nesquena/hermes-webui@v0.51.91...v0.51.92)

##### Fixed

- **PR [#2563](nesquena/hermes-webui#2563)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2554](nesquena/hermes-webui#2554)): Align workspace-tree file rows with sibling directory rows by reserving the same expand/collapse toggle slot for files via a new `.file-tree-toggle-placeholder` element. Expanded directories now show child files stepped in at the same icon column as child folders. Directory toggles and file interactions are unchanged; source-level regression coverage and before/after PNGs included.
- **PR [#2561](nesquena/hermes-webui#2561)** by [@nanookclaw](https://github.com/nanookclaw) (closes [#2551](nesquena/hermes-webui#2551)): Refresh the authoritative `_allSessions` cache when the project picker moves a session to/from a project. Previous code mutated only the shallow sidebar row copy, so `renderSessionListFromCache()` re-read the unchanged cache and repainted a stale project dot until the next `/api/sessions` poll healed the UI. Both the "Removed from project" and "Moved to <project>" branches now write the new `project_id` into `_allSessions[idx]` before re-rendering.
- **PR [#2567](nesquena/hermes-webui#2567)** by [@dso2ng](https://github.com/dso2ng) (refs [#2477](nesquena/hermes-webui#2477)): Surface automatic-compression handoff metadata through the `compressed` SSE event so the active browser stream keeps its completion card even after the backend rotates the session id from the origin to a compressed continuation. The event now carries both `old_session_id` and `new_session_id`/`continuation_session_id`; the frontend `compressed` listener accepts either, and the automatic-compression detail line names the compressed continuation session so the done state isn't silently dropped.
- **PR [#2568](nesquena/hermes-webui#2568)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2545](nesquena/hermes-webui#2545)): Add the Hermes Agent `xai-oauth` provider to the WebUI's OAuth provider catalog so Grok OAuth accounts authenticated via the Hermes CLI appear in Settings → Providers and the `/api/models` picker. The provider is treated as CLI-managed OAuth (no WebUI API-key form) and uses the live Hermes CLI model catalog when available with a Grok 4.20 static fallback.
- **PR [#2550](nesquena/hermes-webui#2550)** by [@espokaos-ops](https://github.com/espokaos-ops) (refs [#2542](nesquena/hermes-webui#2542)): Keep anonymous custom OpenAI-compatible endpoints in the model picker even when the configured `/v1/models` probe fails. Lightweight relays and llama-server-style deployments that authenticate `/v1/chat/completions` but not `/v1/models` no longer have their provider group silently dropped from the picker. Users can type a model id manually in the free-form input when no live catalog is available.

##### Added

- **PR [#2548](nesquena/hermes-webui#2548)** by [@espokaos-ops](https://github.com/espokaos-ops): Add a PWA-standalone reload affordance. A small refresh button appears in the app titlebar (visible only under `@media (display-mode: standalone), (display-mode: fullscreen)`) so users running the WebUI as an installed home-screen PWA can reload without re-launching the app. Adds a complementary pull-to-refresh gesture on the messages container with an 80px threshold and a smooth-scroll-to-top guard so accidental triggers while reading history feel intentional. 4-viewport screenshots (390/1280/1440/1920, light/dark, hover/idle) included under `docs/pr-media/2548/`.

##### Documentation

- **PR [#2560](nesquena/hermes-webui#2560)** by [@Michaelyklam](https://github.com/Michaelyklam) (refs [#1925](nesquena/hermes-webui#1925)): Clarify the RuntimeAdapter Slice 3c state after [#2544](nesquena/hermes-webui#2544) shipped. The RFC now distinguishes shipped `/api/goal` routing through `RuntimeAdapter.update_goal(...)` from the still-staged `queue_message(...)` protocol method, and explicitly warns not to add a new server-side queue endpoint or queue scheduler merely for adapter symmetry while `/queue` remains browser-side queue/drain behavior.

### [`v0.51.91`](https://github.com/nesquena/hermes-webui/blob/HEAD/CHANGELOG.md#v05191--2026-05-18--Release-BO-stage-384--5-PR-full-sweep-batch--reasoning-replay-history-fix--archive-extract-per-session-inbox--fallback-streaming-warnings--sanitized-custom-provider-env-hints--Slice-3c-queuegoal-adapter-routing)

[Compare Source](nesquena/hermes-webui@v0.51.90...v0.51.91)

##### Fixed

- **PR [#2536](nesquena/hermes-webui#2536)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2514](nesquena/hermes-webui#2514), refs [#2535](nesquena/hermes-webui#2535)): Stop reasoning-only Thinking entries from being replayed into provider-facing history as blank assistant turns. Long WebUI sessions were accumulating duplicated stale Thinking blocks and inflated Activity/tool metadata on later turns when reasoning-only display entries (from interrupted/canceled turns) got reinserted into the restored conversation history. The fix keeps visible Thinking cards in the transcript while filtering them out of provider-facing replay. Settled compact Activity rerenders now also clear previously inserted Thinking rows before rebuilding the visible transcript.
- **PR [#2520](nesquena/hermes-webui#2520)** by [@OneFat3](https://github.com/OneFat3) (refs [#2247](nesquena/hermes-webui#2247)): Route archive extraction (`/api/upload/extract`) through the per-session attachment inbox (`_session_attachment_dir`) instead of hardcoded `Path(s.workspace)`, matching the single-file upload path. Extracted archives now land at `<attachment_root>/<session_id>/<archive_stem>/` so session deletion cleanup covers them and per-session isolation is preserved when `HERMES_WEBUI_ATTACHMENT_DIR` is configured.
- **PR [#2505](nesquena/hermes-webui#2505)** by [@cyberdyne187](https://github.com/cyberdyne187): Surface provider fallback and rate-limit lifecycle notices as auto-clearing fallback warnings in the streaming composer status. The new bridge in `_agent_status_callback` matches agent lifecycle messages containing `rate limited` / `switching to fallback` / `falling back` / `fallback activated` / `trying fallback` and emits them as `warning` events with `type=fallback`, so the existing `static/messages.js` warning channel surfaces them with the correct auto-clear contract instead of letting them drop silently.
- **PR [#2556](nesquena/hermes-webui#2556)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2541](nesquena/hermes-webui#2541)): Sanitize auto-generated custom-provider API-key environment variable names so endpoint-derived provider ids such as `custom:gpu.local-8000` use POSIX-safe names like `CUSTOM_GPU_LOCAL_8000_API_KEY`. Runtime custom-provider key resolution now checks the sanitized env var first and falls back to the legacy punctuation-preserving name with a one-shot deprecation warning. Configured literal `api_key` values and explicit `key_env` config are unchanged.

##### Documentation

- **PR [#2544](nesquena/hermes-webui#2544)** by [@Michaelyklam](https://github.com/Michaelyklam) (refs [#1925](nesquena/hermes-webui#1925)): Implement the first Slice 3c RuntimeAdapter control routing. `RuntimeAdapter` / `LegacyJournalRuntimeAdapter` now expose `queue_message(...)` and `update_goal(...)` as protocol-translator delegates, and the `/api/goal` route uses `update_goal(...)` only when `HERMES_WEBUI_RUNTIME_ADAPTER=legacy-journal` is enabled while preserving the legacy-direct response shape. The change keeps `/queue`'s existing browser-side drain semantics and goal post-turn evaluation in the current agent loop; no runner/sidecar, WebUI-owned queue, goal scheduler, cached-agent table, or execution-survives-restart claim is introduced.

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/560
Isla-Liu pushed a commit to Isla-Liu/hermes-webui that referenced this pull request May 20, 2026
Isla-Liu added a commit to Isla-Liu/hermes-webui that referenced this pull request May 22, 2026

2 participants