Skip to content

release: v0.50.250#1368

Merged
nesquena-hermes merged 7 commits into
masterfrom
release/v0.50.250
Apr 30, 2026
Merged

release: v0.50.250#1368
nesquena-hermes merged 7 commits into
masterfrom
release/v0.50.250

Conversation

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Release v0.50.250

Small batch — 2 PRs, one with a critical pre-release fix.

Constituent PRs

# Author Subject Status
#1367 nesquena-hermes fix(clarify-sse): stale-detector health timer ✅ already nesquena APPROVED on its standalone PR
#1366 @JKJameson fix: guard finalizeThinkingCard with session ID check ⚠️ pre-release fix applied — see below

Pre-release fix on #1366 (critical)

The contributor's guard was broken as shipped:

// In finalizeThinkingCard():
const _guardTurn = $('liveAssistantTurn');
if(_guardTurn && S.session && _guardTurn.dataset.sessionId !== S.session.session_id) return;

liveAssistantTurn.dataset.sessionId is never set anywhere in the repo. So undefined !== "<some-id>" is always true, and finalizeThinkingCard() always early-returns when there's a liveAssistantTurn — breaking the streaming UI completely (every assistant turn's thinking card would stay open forever).

29 contributor tests + CI green don't catch this because none of them exercise streaming end-to-end with the thinking-card lifecycle.

Fix applied at commit bc10a22:

  • Added if(S.session) <var>.dataset.sessionId=S.session.session_id; at all 3 sites that create liveAssistantTurn in static/ui.js (lines ~3174, ~3550, ~4338)
  • Added tests/test_pr1366_finalize_thinking_card_guard.py with 3 source-level invariants:
    1. The guard must read dataset.sessionId and compare against S.session.session_id
    2. Every site that sets <var>.id='liveAssistantTurn' must also stamp <var>.dataset.sessionId (catches future regressions)
    3. Sanity check on the number of creation sites

Pre-release gate

Plan after approval

  1. Merge this PR (--merge, no squash — preserves all 4 commits)
  2. Tag v0.50.250 on the merge commit
  3. Push tag, restart production at port 8787, verify cache-bust + SSE liveness via webui_qa_agent.sh
  4. Close fix: guard finalizeThinkingCard with session ID check #1366 + fix(clarify-sse): make health timer a stale-detector, not unconditional reconnect #1367 with credit comments
  5. Cleanup /tmp/wt-v0.50.250

Held PRs (NOT in this release)

Compatibility notes

nesquena-hermes and others added 3 commits April 30, 2026 22:25
…al reconnect (#1367)

Follow-up to v0.50.249 / PR #1365 absorbing Opus SHOULD-FIX #2.
Originally reset out of #1365 because the reviewer flagged it as
out-of-scope; brought back per follow-up guidance that
correctness-improving changes should ship even when out of scope.

The clarify SSE health timer at static/messages.js:1715 was an
unconditional 60s force-reconnect, not the 'no event in 60s' detector
its comment claimed. Now actually a stale-detector that tracks
lastEventAt on initial+clarify event arrivals; only reconnects when
the gap exceeds 60s. Under healthy conditions the timer never fires.

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Without this check, switching browser tabs while a stream is running
causes finalizeThinkingCard() to operate on the wrong session's
thinking card DOM — the card belongs to the stream that started it,
not the session currently displayed in the tab. The guard ensures
finalize only runs when the live assistant turn's session matches
the current session.

Co-authored-by: Josh <josh@fyul.link>
Bundles 2 PRs:
- #1366 fix: guard finalizeThinkingCard with session ID check (with pre-release fix)
- #1367 fix(clarify-sse): stale-detector health timer (Opus SHOULD-FIX from v0.50.249)

Pre-release fix on #1366: the contributor's guard depends on
liveAssistantTurn.dataset.sessionId, but no code in the repo sets
that attribute. Without the fix, the guard would always early-return
(undefined !== sid is always true), breaking the streaming UI
completely — every assistant turn's thinking card would stay open
forever. Added per-site stamps at all 3 places that create
liveAssistantTurn in static/ui.js, plus a regression test that fails
any future creation site that forgets the stamp.
Copy link
Copy Markdown
Owner

@nesquena nesquena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — end-to-end ✅ (clean approve, bot's pre-release fix verified)

Release v0.50.250 — small batch of 2 PRs. The bot caught and fixed a critical pre-release bug in #1366 (Josh's guard depended on a dataset.sessionId attribute that no code set anywhere in the repo). Verified the fix is correct and the regression test catches the missing-stamp invariant.

Squash audit

2566b43  #1367  fix(clarify-sse): stale-detector health timer        (already nesquena APPROVED)
d0257e8  #1366  fix: guard finalizeThinkingCard with session ID check  @JKJameson
bc10a22         release: v0.50.250 (CHANGELOG + per-site stamps + regression test)

#1366's Author: Josh <josh@fyul.link> preserved. #1367 is the same JS already approved on its standalone PR. ✅

What the bot caught (already fixed) — guard with no data source

Josh's commit d0257e8 adds the guard at static/ui.js:4288-4289:

const _guardTurn = $('liveAssistantTurn');
if(_guardTurn && S.session && _guardTurn.dataset.sessionId !== S.session.session_id) return;

But liveAssistantTurn.dataset.sessionId was never set anywhere in the repo before this PR. So undefined !== "<sid>" is always true, the guard always early-returns whenever liveAssistantTurn exists, and finalizeThinkingCard() would silently no-op for every assistant turn. The thinking card stays as a spinner forever.

CI was green because none of the 29 #1366 contributor tests + 3447 baseline tests exercise streaming end-to-end with the thinking-card lifecycle.

The bot's release commit bc10a22 adds the missing stamps at all three creation sites in static/ui.js:

  • Line 3179 in renderMessages() — when re-rendering messages and the assistant message has _live=true.
  • Line 3551 in appendLiveToolCard() — when creating a turn for a tool card before the assistant text starts.
  • Line 4339 in appendThinking() — when creating the initial thinking spinner.

All three use the same idempotent pattern: if(S.session) turn.dataset.sessionId=S.session.session_id; immediately after turn.id='liveAssistantTurn';.

The regression test at tests/test_pr1366_finalize_thinking_card_guard.py walks static/ui.js with a regex, finds every <var>.id='liveAssistantTurn' assignment, and asserts a matching <var>.dataset.sessionId= appears within 500 chars after. I verified the test catches the missing-stamp regression by simulating removal of one stamp:

Sites: [(currentAssistantTurn, 149507, True), (turn, 167877, False), (turn, 206329, True)]
Unstamped: [(turn, 167877)]
Test correctly catches the missing stamp ✅

Behavioural harness — guard logic across 6 scenarios

Extracted the guard into Node:

✅ No live turn: CONTINUE
✅ No session: CONTINUE
✅ Stamped A, viewing A (match): CONTINUE
✅ Stamped A, viewing B (mismatch — bug case): EARLY_RETURN
✅ Unstamped (undefined), viewing B (broken-shipped behavior): EARLY_RETURN
✅ Stamped B, viewing B (B has live turn): CONTINUE

The "Unstamped" case demonstrates exactly why Josh's PR was broken without the bot's fix: when no creation site stamps the dataset attribute, EVERY call to finalizeThinkingCard() early-returns once a liveAssistantTurn exists.

Traced against upstream hermes-agent

Frontend-only changes. No upstream interaction.

End-to-end trace — cross-session DOM protection

  1. User in session A, types message, stream starts. appendThinking() at static/ui.js:4334 creates the live turn → turn.id='liveAssistantTurn', turn.dataset.sessionId='A'.
  2. User switches to session B (via session click or cross-tab storage event from PR #1359). loadSession(B)renderMessages() rebuilds the DOM for B's content. The old liveAssistantTurn element from A is removed.
  3. If B has a live in-flight stream too, B's renderMessages at static/ui.js:3174-3179 creates a new live turn stamped with B.
  4. Late callback from A's stream fires finalizeThinkingCard():
    • _guardTurn = $('liveAssistantTurn') → returns whatever's in DOM now (B's turn if B has one, else null).
    • Case (a) DOM has stale A turn (renderMessages hadn't yet run): _guardTurn.dataset.sessionId === 'A', S.session.session_id === 'B' → mismatch → early return. ✅ Protects A's stale DOM.
    • Case (b) DOM has B's turn (renderMessages already replaced): B === B → match → continues. Operates on B's DOM. ⚠️ The guard CANNOT distinguish "called for B legitimately" from "called by A's late callback while B is current."
    • Case (c) DOM has nothing: _guardTurn null → guard passes → most of finalizeThinkingCard's body no-ops because thinkingRow etc don't exist.

Case (b) is the partial-coverage gap — see Minor observations.

Other audit — confirmed correct

  • PR #1367 (clarify-sse stale-detector) — already approved on standalone PR. Same JS, no drift.
  • No new endpoints / config / env vars. Frontend-only.
  • No XSS surface. dataset.sessionId is read-only used for comparison; never injected as HTML.
  • No memory leak — the stamp is just a string property on a DOM element that's eventually GC'd along with the element.
  • Thread safety — single-threaded JS event loop. S.session reads are atomic.
  • Backwards compat — pre-fix sessions saved without the stamp don't exist persistently; the stamp lives only on transient DOM elements. Every page load creates fresh stamps.
  • CHANGELOG entries are detailed and credit Josh.

Edge-case matrix

Scenario Pre-bot-fix (broken) Post-fix
Session A streaming, user stays on A Guard early-returns (undefined !== A); thinking card stays as spinner Guard passes (A === A); finalize runs ✅
Session A streaming, user switches to B with B-live Guard early-returns; spinner sticks for both Case (b) — guard passes for B's turn ⚠️ partial
Session A streaming, user switches to B with no live Guard early-returns; spinner sticks Guard passes (no liveAssistantTurn); body no-ops on missing nodes ✅
Stale A turn in DOM, current session is B Guard early-returns (correct outcome but for wrong reason) Guard early-returns (A !== B); ✅ correctly protects
S.session is null (transient nav) Guard early-returns Guard passes (S.session && short-circuits); body checks for nodes ✅
Future creation site forgot to stamp Test fails ✅ Test still passes ✅
Three creation sites consolidated to fewer test_at_least_three_live_turn_sites fails (sanity) Acceptable to relax if intentional

Tests

  • PR's own: tests/test_pr1366_finalize_thinking_card_guard.py 3/3 pass; test_clarify_sse.py 29/29 pass.
  • Full suite: 3398 passed, 54 skipped, 3 xpassed, 0 failed in 16.49s on bc10a22. (PR description claims 3450; counting drift consistent with prior batch releases — both runs pass all SSE / lifecycle / streaming tests.)
  • CI: 3.11/3.12/3.13 all green.
  • Sanity check on regression test effectiveness: I verified that mutating one stamp out causes test_live_turn_creation_sites_stamp_session_id to fail with a clear error message naming the unstamped site.

Minor observations (non-blocking)

  • Partial coverage of cross-session callback case: when both A and B have live streams and the user switches A→B, the guard at line 4289 doesn't fully prevent A's late callback from finalizing B's thinking card. A complete fix would have finalizeThinkingCard(streamSessionId) accept the stream's session_id as a parameter and compare directly. Out of scope for this release; the current fix is a strict improvement (handles the stale-A-turn-still-in-DOM case correctly).
  • No JS unit test — same constraint as PR #1367. The 3 source-level tests + behavioural harness cover what's reasonable without a JS test infrastructure.
  • PR description test count off by ~50 between claimed (3450) and local (3398). Consistent with prior batch releases. Not a defect.
  • Stamp uses if(S.session) guard — if S.session is null at creation time (transient state during session loading), the stamp is omitted. The DOM stays unstamped. Then the guard at finalize-time will early-return because dataset.sessionId === undefined. This is the safe default — better to skip cleanup than corrupt the wrong session's DOM.
  • No coverage of appendLiveToolCard's creation path in any existing test. The regression test asserts the stamp exists at the source; behavioural coverage of that creation path would need a streaming integration test.

Recommendation

Approved. The bot's pre-release fix correctly addresses the broken-as-shipped guard in #1366 — without it, every assistant turn's thinking card would stay as a spinner forever. Per-site stamps at all 3 liveAssistantTurn creation sites + a regression test that catches future site additions that forget the stamp. Behavioural harness confirms the guard logic is correct across 6 scenarios. PR #1367's content is identical to what was already approved on its standalone PR.

Parked at approval — ready for the release agent's merge/tag pipeline.

The timer fires every 60s either way; what changed is whether it
triggers a reconnect. Under steady clarify traffic the reconnect
never happens; on long-idle sessions it still reconnects every
60-120s (the residual idle churn is now a tracked follow-up rather
than the original v0.50.249 unconditional per-minute reconnect).

Tightens the CHANGELOG language to match observed behavior.
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Opus pre-release review

Verdict: SHIP. No MUST-FIX or SHOULD-FIX. 3 NITs, 1 of which (CHANGELOG accuracy) I fixed in 0735042.

Q1 — Diagnosis on #1366: VERIFIED CORRECT

  • git grep "dataset\.sessionId|data-session-id" in master returns zero results
  • Contributor branch pr-1366 only contains the comparison line at static/ui.js:4281; no setters anywhere
  • As shipped, _guardTurn.dataset.sessionId would always be undefined, making undefined !== "<sid>" always true → finalize would unconditionally early-return → streaming UI completely broken (thinking cards never close)

Q2 — 3-site stamps: VERIFIED CORRECT

All 3 sites that set <var>.id='liveAssistantTurn' are immediately followed by a dataset.sessionId stamp:

  • static/ui.js:3174 (renderMessages, currentAssistantTurn) → stamp at :3179
  • static/ui.js:3550 (appendLiveToolCard, turn) → stamp at :3551
  • static/ui.js:4338 (appendThinking, turn) → stamp at :4339

Manually ran the 3 regression-test assertions against release/v0.50.250:static/ui.js — all pass.

Q3 — #1367 stale-detector: VERIFIED CORRECT

Server pushes only initial and clarify event types (per api/routes.py:2940/2956); plus : keepalive comments at 30s. Client correctly listens to both. Comments are silently consumed by EventSource (browser-spec behavior), confirming the fix's core premise. startClarifyPolling is idempotent so the recursive reconnect doesn't stack.

NITs (non-blocking)

🟦 NIT-1 — guard semantics narrower than the PR title implies. The 4 call sites in static/messages.js other than the tool handler do NOT pre-guard before calling finalizeThinkingCard() (done, apperror, cancel, _handleStreamError). The new in-function guard catches one race (stale liveAssistantTurn from session A lingering after switch to B with no live stream). It does NOT catch: A's done fires after switch to B where B has its own liveAssistantTurn with sessionId=B — guard sees B === B, lets it through, A's callback mutates B's live turn. Truly complete fix would push a per-handler guard or have finalizeThinkingCard(activeSid) accept the stream's session id and compare against dataset.sessionId. Strictly an improvement over master; tracking as follow-up.

🟦 NIT-2 — #1367 still reconnects every ~120s on idle sessions. Because _lastClarifyEventAt only bumps on initial/clarify events, idle sessions still churn at half the v0.50.249 rate, not zero. The original CHANGELOG line claiming "the timer never fires under healthy conditions" was technically wrong — fixed in 0735042 with accurate language ("steady traffic: no reconnects; idle: still ~60-120s, tracked as follow-up").

🟦 NIT-3 — defensive if(S.session) redundant at 2 of 3 stamp sites. appendThinking (4338) and appendLiveToolCard (3550) already guard with if(!S.session||!S.activeStreamId) return; at the top of the function. Harmless — consistency is fine.

Branch state after Opus fix

@nesquena — ready for re-review on HEAD 0735042. Tests: 3450 passing, CI green on Python 3.11/3.12/3.13.

Feco Linhares and others added 3 commits April 30, 2026 22:38
Phase 2 of #1003: extend the autosave pattern from the Appearance
panel to the Preferences panel so all preference changes are saved
automatically without requiring a manual 'Save Settings' click.

Mirrors the Phase 1 (Appearance) pattern exactly:
- 350ms debounce on field changes (500ms additional debounce on
  the bot_name text input — effective ~850ms latency for typing)
- Inline status feedback (saving / saved / failed + retry button)
- Clears dirty flag and hides unsaved-changes bar after successful save
- Password field excluded — still requires explicit save (security)
- Model selector excluded — still requires explicit save

13 fields now autosaving: send_key, language, show_token_usage,
simplified_tool_calling, show_cli_sessions, sync_to_insights,
check_for_updates, sound_enabled, notifications_enabled,
sidebar_density, auto_title_refresh_every, busy_input_mode, bot_name.

i18n keys (settings_autosave_saving/saved/failed/retry) already exist
in all 8 locales from Phase 1.

Co-authored-by: Feco Linhares <feco.linhares@gmail.com>
9 source-level invariants covering #1369:
- All 13 preference fields appear in _preferencesPayloadFromUi
- Listeners use _schedulePreferencesAutosave, not _markSettingsDirty
- Password field STILL uses _markSettingsDirty (security invariant)
- _autosavePreferencesSettings clears _settingsDirty + hides unsaved bar on success
- Status div present in static/index.html
- Status function uses shared i18n keys from Phase 1
- Retry function falls back gracefully when no stored payload
- Debounce clears prior timer (350ms, matching Phase 1)
- Phase 1 (Appearance) autosave still intact
…odel pending (Opus SHOULD-FIX Q1)

Pre-release Opus review of v0.50.250 caught a UX regression in PR
#1369: _autosavePreferencesSettings unconditionally cleared
_settingsDirty=false and hid the unsaved-changes bar on every
successful autosave. But password and model are still committed via
the explicit 'Save Settings' button (password for security; model
goes through /api/default-model). Race scenario:

  1. User opens System pane, types a new password (sets
     _settingsDirty=true; bar appears on close)
  2. User switches to Preferences, toggles any checkbox -> autosave
     fires -> _settingsDirty=false, bar permanently suppressed
  3. User closes panel -> _closeSettingsPanel short-circuits because
     !_settingsDirty -> typed password silently discarded
     (loadSettingsPanel blanks pwField.value='' on next open)

Same shape with model selector: pick a new default model, then
toggle any preference -> autosave fires -> no warning on close ->
model never persists.

Fix: only clear _settingsDirty and hide settingsUnsavedBar when both
the password field is empty AND the model selector matches its
on-open snapshot.

Pinned by an updated regression test asserting the conditional guard
exists.
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Update: PR #1369 added to v0.50.250

Per the user's directive, PR #1369 (@fecolinhares — autosave preferences settings, Phase 2 of #1003) has been added to this batch. Per agreement, no second independent-review round is required for the combined release; instead it gets:

  1. ✅ Opus pre-release advisor review
  2. ✅ My own structural review

My review of #1369 — APPROVE with one fix applied

Pattern conformance: Phase 2 cleanly mirrors the Phase 1 (Appearance) pattern. Same 350ms debounce, same i18n keys (already in all 8 locales from Phase 1), same status div + retry button shape. 12 of 13 fields use the standard _schedulePreferencesAutosave listener; bot_name (text input) gets an extra 500ms wrapper for typing-flicker reasons.

Security invariant preserved: Password field still calls _markSettingsDirty, NOT autosave. Verified by static grep — only the password addEventListener('input',...) block contains _markSettingsDirty after this PR. Model selector also still uses _markSettingsDirty (different commit endpoint).

Server-side compat: I read api/config.py:save_settings — it does load-mutate-write only on payload-present keys. A preferences-only POST does NOT overwrite Phase 1 fields (theme/skin/font_size).

Added regression suite: tests/test_1003_preferences_autosave.py (9 tests) — pins all 13 fields, security invariant, debounce behavior, status function shape, retry fallback, and Phase 1 still-intact.

Opus pre-release review of #1369

🟨 SHOULD-FIX Q1 — Autosave-clears-dirty regression (FIXED in f8754de)

_autosavePreferencesSettings unconditionally cleared _settingsDirty=false and hid settingsUnsavedBar on every successful autosave. But password and model are still committed via the explicit "Save Settings" button. Race:

  1. User types new password → _settingsDirty=true
  2. User toggles any preference → autosave fires → _settingsDirty=false, bar suppressed
  3. User closes panel → _closeSettingsPanel short-circuits on !_settingsDirty → typed password silently discarded (next loadSettingsPanel blanks the field)

Same shape with model selector. Real UX bug.

Fix applied: only clear _settingsDirty + hide bar when password is empty AND model matches its on-open snapshot:

const pwField=$('settingsPassword');
const pwDirty=!!(pwField&&pwField.value);
const modelSel=$('settingsModel');  // (corrected from initial 'settingsHermesDefaultModel')
const modelDirty=!!(modelSel&&((modelSel.value||'')!==(_settingsHermesDefaultModelOnOpen||'')));
if(!pwDirty&&!modelDirty){
  _settingsDirty=false;
  const bar=$('settingsUnsavedBar');
  if(bar) bar.style.display='none';
}

Regression test in test_1003_preferences_autosave.py updated to pin the conditional guard.

🟦 NIT (deferred):

  • Q2_rememberPreferencesSaved writes to localStorage['hermes-pref-send_key'] and hermes-pref-language but nothing reads those keys (canonical write site is i18n.js). Dead code, harmless. Skip.
  • Q4 — "Save Settings" button is now misleading (covers only password + model). Could relabel or hide-when-clean. Skip.
  • Q5b — Server-side save_settings has no file lock. Concurrent Phase 1 + Phase 2 POSTs (theme + checkbox within 350ms) could clobber on disk via load-modify-write. Pre-existing concurrency hole, newly exposed by autosave. Worth a follow-up.
  • bot_name 850ms effective latency — fine, deserves a one-line comment.

🟥 MUST-FIX: None.

Final branch state

f8754de  Opus Q1 fix (conditional dirty-clear)
3a13be2  test: Phase 2 preferences autosave regression suite
645dfa2  #1369  fix: autosave preferences settings  @fecolinhares
0735042  docs(changelog): correct stale-detector claim
bc10a22  release: v0.50.250 (CHANGELOG + 3-site stamps + #1366 regression test)
d0257e8  #1366  fix: guard finalizeThinkingCard
2566b43  #1367  fix(clarify-sse): stale-detector health timer

Tests

  • Final suite: 3459 passed, 0 failed (3450 → 3459, +9 from test_1003_preferences_autosave.py)
  • All 9 new regression tests pinned: 13-field coverage, security invariant, debounce, retry, Phase 1 intact, Q1 conditional guard
  • The earlier flaky test_setup_allowed_with_confirm_overwrite was an order-dependent isolation issue (passes on its own, passes on full run without -x); unrelated to this batch

Per the user's directive, no second nesquena review round needed for the combined release. Proceeding to merge once CI re-runs green on f8754de.

@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Shipped — v0.50.250 ✅

Merge: ffd11037 (2026-04-30 22:49 UTC)
Tag: v0.50.250
Production restarted, health check OK, cache-bust live (?v=v0.50.250).
SSE liveness checks (new in webui_qa_agent.sh) PASS for both /api/approval/stream and /api/clarify/stream.

What shipped

# Author Subject
#1366 @JKJameson finalizeThinkingCard cross-tab guard (with 3-site stamps + regression test)
#1367 nesquena-hermes clarify SSE stale-detector health timer
#1369 @fecolinhares Preferences autosave Phase 2 (#1003)

Plus pre-release fixes:

Tests

Hygiene complete

  • ✅ PR release: v0.50.250 #1368 merged with --merge (preserves all 7 commits)
  • ✅ Tag v0.50.250 pushed on merge commit ffd11037
  • ✅ All 3 constituent PRs closed with credit comments
  • ✅ Worktree /tmp/wt-v0.50.250 removed
  • ✅ Local branches deleted
  • ✅ Production restarted, runtime SSE liveness verified

Held PRs (NOT in this release, still on hold)

What's tracked as follow-up

Per your directive: PR #1369 was added without requiring a second independent review round; combined release shipped after Opus + my structural review caught and fixed the SHOULD-FIX dirty-flag race.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants