
fix: clear fallback streaming warnings #2505

Merged: 1 commit merged into nesquena:master from cyberdyne187:fix/fallback-streaming-warning-lifecycle on May 18, 2026

Conversation

@cyberdyne187

Summary:

  • Emit fallback/rate-limit lifecycle notices as warning events with type=fallback so the composer status auto-clears correctly.
  • Preserve existing auto-compression lifecycle SSE behavior.
  • Add regression coverage and an Unreleased changelog note.

Test Plan:

  • pytest tests/test_auto_compression_card.py tests/test_streaming_max_tokens_quota.py tests/test_inflight_stream_reuse.py -q
  • git diff --check

@nesquena-hermes
Collaborator

Summary

Reading api/streaming.py:2978-3020 on this branch and the agent-side emitters in agent/conversation_loop.py, the change tightens a real-but-narrow gap: rate-limit and fallback lifecycle messages were being dropped silently before reaching the SSE stream, so users never learned that their stream switched providers mid-turn. The new warning bridge surfaces them through the existing warning channel that static/messages.js:1903-1913 already handles. The shape is right; the match list has some sharp edges worth tightening before this lands.

Code reference

The new branch on this PR:

_is_fallback_notice = (
    _kind == 'lifecycle'
    and (
        'rate limited' in _lower
        or 'switching to fallback' in _lower
        or 'falling back' in _lower
        or 'fallback activated' in _lower
        or 'pool may recover' in _lower
        or 'waiting for pool' in _lower
    )
)
if _is_fallback_notice:
    put('warning', {'type': 'fallback', 'message': _message})

The frontend already drains this event at static/messages.js:1903-1913:

source.addEventListener('warning',e=>{
  if(!S.session||S.session.session_id!==activeSid) return;
  try{
    const d=JSON.parse(e.data);
    setComposerStatus(`${d.message||'Warning'}`);
    if(d.type==='fallback') setTimeout(()=>setComposerStatus(''),4000);
  }catch(_){}
});

So type: 'fallback' is correctly wired to the auto-clearing 4-second composer status already in place.
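
For readers less familiar with the transport, here is a minimal sketch of how a named warning event with a JSON payload is framed on a Server-Sent Events stream, which is what lets the addEventListener('warning', ...) handler fire and JSON.parse(e.data) succeed. format_sse is a hypothetical helper written for illustration; the real put(...) in api/streaming.py is not shown in this thread and may queue or frame events differently.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Frame one Server-Sent Event: an event line, a data line, then a blank line.

    Hypothetical helper for illustration only; not taken from api/streaming.py.
    """
    # json.dumps escapes newlines inside strings, so one data: line is enough.
    data = json.dumps(payload, ensure_ascii=False)
    return f"event: {event}\ndata: {data}\n\n"


frame = format_sse(
    "warning",
    {"type": "fallback", "message": "Rate limited — switching to fallback provider..."},
)
print(frame, end="")
```

The event name selects which browser listener runs, and the type field inside the payload is what the frontend checks before scheduling the 4-second clear.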

Diagnosis / Recommendation

A few things worth checking before merge:

  1. The match list doesn't line up with what the agent actually emits. A grep of ~/.hermes/hermes-agent for the strings in the match list shows that the actual emit sites at agent/conversation_loop.py:1128, 1198, 1260, 2260, 2621, 2692 use phrases like "Empty/malformed response — switching to fallback...", "Max retries (...) for invalid responses — trying fallback...", "Rate limited — switching to fallback provider...", and "Non-retryable error (HTTP ...) — trying fallback...". The new match list catches 'switching to fallback' and 'rate limited' (good), but 'trying fallback' is the more common emit phrase and isn't covered, so a 5xx-retryable fallback or invalid-response fallback would still be silently dropped. Either add 'trying fallback' to the list, or simplify to a single 'fallback' in _lower check and keep 'rate limited' in _lower alongside it for rate-limit notices that arrive before any fallback fires.
  2. 'pool may recover' / 'waiting for pool' aren't currently in the agent. grep -rn "pool may recover" ~/.hermes/hermes-agent returns no hits. These look speculative. Either drop them or point at the future emit site so reviewers know why they're listed.
  3. Non-fatal warnings can arrive in quick succession. setComposerStatus(...) in the existing handler overwrites the previous string, so a turn that rate-limits, falls back, then rate-limits again on the fallback will flash three statuses one after another. Each fallback warning also schedules its own 4-second timer and none of them is cancelled, so the timer from an earlier warning can clear a later message before that message has had its full 4 seconds. Probably fine in practice, but it can be jarring for a user who is reading along. Not a blocker.
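
As a rough illustration of points 1 and 2, the matcher could be pulled into a small predicate along these lines. This is a sketch, not the code on the branch: is_fallback_notice and FALLBACK_PHRASES are hypothetical names, and the phrase list simply combines the strings discussed above with 'trying fallback' added and the two pool phrases dropped.

```python
# Hypothetical phrase list assembled from the review discussion; the
# authoritative matcher lives in api/streaming.py and may differ.
FALLBACK_PHRASES = (
    "rate limited",
    "switching to fallback",
    "trying fallback",
    "falling back",
    "fallback activated",
)


def is_fallback_notice(kind: str, message: str) -> bool:
    """Return True for lifecycle messages that should surface as fallback warnings."""
    if kind != "lifecycle":
        return False
    lower = message.lower()
    return any(phrase in lower for phrase in FALLBACK_PHRASES)


samples = [
    "Rate limited — switching to fallback provider...",
    "Non-retryable error (HTTP ...) — trying fallback...",
    "Max retries (...) for invalid responses — trying fallback...",
    "Waiting for pool",  # speculative phrase, intentionally not matched
]
for text in samples:
    print(is_fallback_notice("lifecycle", text), text)
```

Keeping the phrases in one tuple makes it easy to see at review time which agent messages are covered and which are not.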

Test plan

The regression in tests/test_auto_compression_card.py confirms the existing compressing path still works. For the new warning emission, an integration-shaped test that drives _agent_status_callback("lifecycle", "Rate limited — switching to fallback provider...") and asserts that a warning event was emitted with type: 'fallback' would close the loop. That test belongs in tests/test_streaming_* alongside the neighboring streaming tests.
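
A test of that shape might look roughly like the following. It is a self-contained sketch under stated assumptions: make_status_callback is a hypothetical stand-in for the bridge inside _agent_status_callback (the real callback and how it obtains put are not shown in this thread), and "Compressing context..." is an invented lifecycle message used only as a negative case.

```python
from typing import Callable


def make_status_callback(put: Callable[[str, dict], None]) -> Callable[[str, str], None]:
    """Hypothetical stand-in for the bridge inside _agent_status_callback."""
    phrases = ("rate limited", "switching to fallback", "trying fallback")

    def callback(kind: str, message: str) -> None:
        lower = message.lower()
        if kind == "lifecycle" and any(p in lower for p in phrases):
            put("warning", {"type": "fallback", "message": message})

    return callback


def test_lifecycle_rate_limit_emits_fallback_warning():
    events = []  # records every (event_name, payload) pair passed to put
    callback = make_status_callback(lambda name, data: events.append((name, data)))

    callback("lifecycle", "Rate limited — switching to fallback provider...")
    callback("lifecycle", "Compressing context...")  # invented, must not match

    assert events == [
        ("warning", {"type": "fallback",
                     "message": "Rate limited — switching to fallback provider..."}),
    ]


test_lifecycle_rate_limit_emits_fallback_warning()
```

In the real suite the fake put would be replaced by whatever queue or recorder the neighboring streaming tests already use.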

Manual: configure a provider you know to rate-limit, watch the composer status when fallback kicks in. Before this PR, nothing visible; after, a brief "Rate limited — switching to fallback..." status that clears after 4s.

The CHANGELOG entry covers the user-visible behavior change. Worth landing once the match list is tightened to cover 'trying fallback' and the speculative phrases are dropped.

@cyberdyne187 force-pushed the fix/fallback-streaming-warning-lifecycle branch from a12ef2e to 58e4462 on May 18, 2026 15:44
@cyberdyne187
Author

Thanks — addressed the review feedback.

Updated the lifecycle fallback matcher to include 'trying fallback', which covers the invalid-response / non-retryable fallback paths emitted by agent/conversation_loop.py.

Also removed the speculative 'pool may recover' / 'waiting for pool' strings since there are no current emit sites for those phrases, and added predicate coverage for the actual rate-limit and 'trying fallback' lifecycle messages.

Verification run locally:

  • python -m pytest tests/test_auto_compression_card.py tests/test_streaming_max_tokens_quota.py tests/test_streaming_session_sidebar.py -q → 38 passed
  • python -m py_compile api/streaming.py tests/test_auto_compression_card.py
  • git diff --check

Branch is rebased on current origin/master and pushed.

@cyberdyne187 force-pushed the fix/fallback-streaming-warning-lifecycle branch from 58e4462 to 42b97d1 on May 18, 2026 17:22
@nesquena-hermes closed this pull request by merging all changes into nesquena:master in 718a4c7 on May 18, 2026
Michaelyklam pushed a commit to Michaelyklam/hermes-webui that referenced this pull request May 18, 2026
# Conflicts:
#	CHANGELOG.md
eleboucher pushed a commit to eleboucher/homelab that referenced this pull request May 19, 2026
… 0.51.92) (#560)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [ghcr.io/nesquena/hermes-webui](https://github.com/nesquena/hermes-webui) | patch | `0.51.90` → `0.51.92` |

---

### Release Notes

<details>
<summary>nesquena/hermes-webui (ghcr.io/nesquena/hermes-webui)</summary>

### [`v0.51.92`](https://github.com/nesquena/hermes-webui/blob/HEAD/CHANGELOG.md#v05192--2026-05-19--Release-BP-stage-385--7-PR-full-sweep-batch--RFC-Slice-3c-clarification--workspace-tree-icon-alignment--project-move-cache-refresh--auto-compression-handoff-metadata--Grok-OAuth-provider-catalog--anonymous-custom-endpoint-picker-fallback--PWA-standalone-reload--pull-to-refresh)

[Compare Source](nesquena/hermes-webui@v0.51.91...v0.51.92)

##### Fixed

- **PR [#2563](nesquena/hermes-webui#2563)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2554](nesquena/hermes-webui#2554)) — Align workspace-tree file rows with sibling directory rows by reserving the same expand/collapse toggle slot for files via a new `.file-tree-toggle-placeholder` element. Expanded directories now show child files stepped in at the same icon column as child folders. Directory toggles and file interactions are unchanged; source-level regression coverage and before/after PNGs included.
- **PR [#2561](nesquena/hermes-webui#2561)** by [@nanookclaw](https://github.com/nanookclaw) (closes [#2551](nesquena/hermes-webui#2551)) — Refresh the authoritative `_allSessions` cache when the project picker moves a session to/from a project. Previous code mutated only the shallow sidebar row copy, so `renderSessionListFromCache()` re-read the unchanged cache and repainted a stale project dot until the next `/api/sessions` poll healed the UI. Both the "Removed from project" and "Moved to <project>" branches now write the new `project_id` into `_allSessions[idx]` before re-rendering.
- **PR [#2567](nesquena/hermes-webui#2567)** by [@dso2ng](https://github.com/dso2ng) (refs [#2477](nesquena/hermes-webui#2477)) — Surface automatic-compression handoff metadata through the `compressed` SSE event so the active browser stream keeps its completion card even after the backend rotates the session id from the origin to a compressed continuation. The event now carries both `old_session_id` and `new_session_id`/`continuation_session_id`; the frontend `compressed` listener accepts either, and the automatic-compression detail line names the compressed continuation session so the done state isn't silently dropped.
- **PR [#2568](nesquena/hermes-webui#2568)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2545](nesquena/hermes-webui#2545)) — Add the Hermes Agent `xai-oauth` provider to the WebUI's OAuth provider catalog so Grok OAuth accounts authenticated via the Hermes CLI appear in Settings → Providers and the `/api/models` picker. The provider is treated as CLI-managed OAuth (no WebUI API-key form) and uses the live Hermes CLI model catalog when available with a Grok 4.20 static fallback.
- **PR [#2550](nesquena/hermes-webui#2550)** by [@espokaos-ops](https://github.com/espokaos-ops) (refs [#2542](nesquena/hermes-webui#2542)) — Keep anonymous custom OpenAI-compatible endpoints in the model picker even when the configured `/v1/models` probe fails. Lightweight relays and llama-server-style deployments that authenticate `/v1/chat/completions` but not `/v1/models` no longer have their provider group silently dropped from the picker. Users can type a model id manually in the free-form input when no live catalog is available.

##### Added

- **PR [#2548](nesquena/hermes-webui#2548)** by [@espokaos-ops](https://github.com/espokaos-ops) — Add a PWA-standalone reload affordance. A small refresh button appears in the app titlebar (visible only under `@media (display-mode: standalone), (display-mode: fullscreen)`) so users running the WebUI as an installed home-screen PWA can reload without re-launching the app. Adds a complementary pull-to-refresh gesture on the messages container with an 80px threshold and a smooth-scroll-to-top guard so accidental triggers while reading history feel intentional. 4-viewport screenshots (390/1280/1440/1920, light/dark, hover/idle) included under `docs/pr-media/2548/`.

##### Documentation

- **PR [#2560](nesquena/hermes-webui#2560)** by [@Michaelyklam](https://github.com/Michaelyklam) (refs [#1925](nesquena/hermes-webui#1925)) — Clarify the RuntimeAdapter Slice 3c state after [#2544](nesquena/hermes-webui#2544) shipped. The RFC now distinguishes shipped `/api/goal` routing through `RuntimeAdapter.update_goal(...)` from the still-staged `queue_message(...)` protocol method, and explicitly warns not to add a new server-side queue endpoint or queue scheduler merely for adapter symmetry while `/queue` remains browser-side queue/drain behavior.

### [`v0.51.91`](https://github.com/nesquena/hermes-webui/blob/HEAD/CHANGELOG.md#v05191--2026-05-18--Release-BO-stage-384--5-PR-full-sweep-batch--reasoning-replay-history-fix--archive-extract-per-session-inbox--fallback-streaming-warnings--sanitized-custom-provider-env-hints--Slice-3c-queuegoal-adapter-routing)

[Compare Source](nesquena/hermes-webui@v0.51.90...v0.51.91)

##### Fixed

- **PR [#2536](nesquena/hermes-webui#2536)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2514](nesquena/hermes-webui#2514), refs [#2535](nesquena/hermes-webui#2535)) — Stop reasoning-only Thinking entries from being replayed into provider-facing history as blank assistant turns. Long WebUI sessions were accumulating duplicated stale Thinking blocks and inflated Activity/tool metadata on later turns when reasoning-only display entries (from interrupted/canceled turns) got reinserted into the restored conversation history. The fix keeps visible Thinking cards in the transcript while filtering them out of provider-facing replay. Settled compact Activity rerenders now also clear previously inserted Thinking rows before rebuilding the visible transcript.
- **PR [#2520](nesquena/hermes-webui#2520)** by [@OneFat3](https://github.com/OneFat3) (refs [#2247](nesquena/hermes-webui#2247)) — Route archive extraction (`/api/upload/extract`) through the per-session attachment inbox (`_session_attachment_dir`) instead of hardcoded `Path(s.workspace)`, matching the single-file upload path. Extracted archives now land at `<attachment_root>/<session_id>/<archive_stem>/` so session deletion cleanup covers them and per-session isolation is preserved when `HERMES_WEBUI_ATTACHMENT_DIR` is configured.
- **PR [#2505](nesquena/hermes-webui#2505)** by [@cyberdyne187](https://github.com/cyberdyne187) — Surface provider fallback and rate-limit lifecycle notices as auto-clearing fallback warnings in the streaming composer status. The new bridge in `_agent_status_callback` matches agent lifecycle messages containing `rate limited` / `switching to fallback` / `falling back` / `fallback activated` / `trying fallback` and emits them as `warning` events with `type=fallback`, so the existing `static/messages.js` warning channel surfaces them with the correct auto-clear contract instead of letting them drop silently.
- **PR [#2556](nesquena/hermes-webui#2556)** by [@Michaelyklam](https://github.com/Michaelyklam) (closes [#2541](nesquena/hermes-webui#2541)) — Sanitize auto-generated custom-provider API-key environment variable names so endpoint-derived provider ids such as `custom:gpu.local-8000` use POSIX-safe names like `CUSTOM_GPU_LOCAL_8000_API_KEY`. Runtime custom-provider key resolution now checks the sanitized env var first and falls back to the legacy punctuation-preserving name with a one-shot deprecation warning. Configured literal `api_key` values and explicit `key_env` config are unchanged.

##### Documentation

- **PR [#2544](nesquena/hermes-webui#2544)** by [@Michaelyklam](https://github.com/Michaelyklam) (refs [#1925](nesquena/hermes-webui#1925)) — Implement the first Slice 3c RuntimeAdapter control routing. `RuntimeAdapter` / `LegacyJournalRuntimeAdapter` now expose `queue_message(...)` and `update_goal(...)` as protocol-translator delegates, and the `/api/goal` route uses `update_goal(...)` only when `HERMES_WEBUI_RUNTIME_ADAPTER=legacy-journal` is enabled while preserving the legacy-direct response shape. The change keeps `/queue`'s existing browser-side drain semantics and goal post-turn evaluation in the current agent loop; no runner/sidecar, WebUI-owned queue, goal scheduler, cached-agent table, or execution-survives-restart claim is introduced.

</details>

---


This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.erwanleboucher.dev/eleboucher/homelab/pulls/560