release: v0.50.250 by nesquena-hermes · Pull Request #1368 · nesquena/hermes-webui

nesquena-hermes · 2026-04-30T22:29:32Z

Release v0.50.250

Small batch — 2 PRs, one with a critical pre-release fix.

Constituent PRs

#	Author	Subject	Status
#1367	nesquena-hermes	fix(clarify-sse): stale-detector health timer	✅ already nesquena APPROVED on its standalone PR
#1366	@JKJameson	fix: guard finalizeThinkingCard with session ID check	⚠️ pre-release fix applied — see below

Pre-release fix on #1366 (critical)

The contributor's guard was broken as shipped:

// In finalizeThinkingCard():
const _guardTurn = $('liveAssistantTurn');
if(_guardTurn && S.session && _guardTurn.dataset.sessionId !== S.session.session_id) return;

liveAssistantTurn.dataset.sessionId is never set anywhere in the repo. So undefined !== "<some-id>" is always true, and finalizeThinkingCard() always early-returns when there's a liveAssistantTurn — breaking the streaming UI completely (every assistant turn's thinking card would stay open forever).

29 contributor tests + CI green don't catch this because none of them exercise streaming end-to-end with the thinking-card lifecycle.

Fix applied at commit bc10a22:

Added if(S.session) <var>.dataset.sessionId=S.session.session_id; at all 3 sites that create liveAssistantTurn in static/ui.js (lines ~3174, ~3550, ~4338)
Added tests/test_pr1366_finalize_thinking_card_guard.py with 3 source-level invariants:
1. The guard must read dataset.sessionId and compare against S.session.session_id
2. Every site that sets <var>.id='liveAssistantTurn' must also stamp <var>.dataset.sessionId (catches future regressions)
3. Sanity check on the number of creation sites

Pre-release gate

pytest: 3450 passed, 0 failed (3447 → 3450, +3 from new regression tests)
CHANGELOG: v0.50.250 entry written
Author preservation: both PRs squash-committed with Author: + Co-authored-by: trailers
fix(clarify-sse): make health timer a stale-detector, not unconditional reconnect #1367 nesquena APPROVED at 22:09 UTC on standalone PR (the same JS you reviewed in release: v0.50.249 #1365)
Opus pre-release review: in progress (PID proc_f3a6df1bb1a6)
nesquena independent review on this release PR: requested

Plan after approval

Merge this PR (--merge, no squash — preserves all 4 commits)
Tag v0.50.250 on the merge commit
Push tag, restart production at port 8787, verify cache-bust + SSE liveness via webui_qa_agent.sh
Close fix: guard finalizeThinkingCard with session ID check #1366 + fix(clarify-sse): make health timer a stale-detector, not unconditional reconnect #1367 with credit comments
Cleanup /tmp/wt-v0.50.250

Held PRs (NOT in this release)

fix: persist sidebar rename, keep model picker fresh, restore full-width chat #1134, Ollama Gemma4 修復と Codex 制御プレーンを追加 #1222, Per-session TPS indicator in sidebar #1272, [codex] fix model cache invalidation on config reload #1311 (draft), fix: validate WebUI launcher can import agent #1315, feat(sessions): add /branch command to fork conversations from any message (#465) #1342, feat: add profile-aware cron panel #1352, Fix/chat history WAL recovery #1353 — 8 PRs on hold per the HARD BLOCK rule. None ready for merge.

Compatibility notes

The cross-tab guard in fix: guard finalizeThinkingCard with session ID check #1366 only matters when the user has multiple tabs open AND switches tabs while a stream is running. Single-tab users see no behavior change (the guard's session-id comparison succeeds and finalizeThinkingCard runs as before).
The clarify SSE stale-detector in fix(clarify-sse): make health timer a stale-detector, not unconditional reconnect #1367 is a pure-refinement of existing behavior. Already runtime-verified live on v0.50.249 production.
No backend / API / config changes.

…al reconnect (#1367) Follow-up to v0.50.249 / PR #1365 absorbing Opus SHOULD-FIX #2. Originally reset out of #1365 because the reviewer flagged it as out-of-scope; brought back per follow-up guidance that correctness-improving changes should ship even when out of scope. The clarify SSE health timer at static/messages.js:1715 was an unconditional 60s force-reconnect, not the 'no event in 60s' detector its comment claimed. Now actually a stale-detector that tracks lastEventAt on initial+clarify event arrivals; only reconnects when the gap exceeds 60s. Under healthy conditions the timer never fires. Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>

Without this check, switching browser tabs while a stream is running causes finalizeThinkingCard() to operate on the wrong session's thinking card DOM — the card belongs to the stream that started it, not the session currently displayed in the tab. The guard ensures finalize only runs when the live assistant turn's session matches the current session. Co-authored-by: Josh <josh@fyul.link>

Bundles 2 PRs: - #1366 fix: guard finalizeThinkingCard with session ID check (with pre-release fix) - #1367 fix(clarify-sse): stale-detector health timer (Opus SHOULD-FIX from v0.50.249) Pre-release fix on #1366: the contributor's guard depends on liveAssistantTurn.dataset.sessionId, but no code in the repo sets that attribute. Without the fix, the guard would always early-return (undefined !== sid is always true), breaking the streaming UI completely — every assistant turn's thinking card would stay open forever. Added per-site stamps at all 3 places that create liveAssistantTurn in static/ui.js, plus a regression test that fails any future creation site that forgets the stamp.

nesquena

Review — end-to-end ✅ (clean approve, bot's pre-release fix verified)

Release v0.50.250 — small batch of 2 PRs. The bot caught and fixed a critical pre-release bug in #1366 (Josh's guard depended on a dataset.sessionId attribute that no code set anywhere in the repo). Verified the fix is correct and the regression test catches the missing-stamp invariant.

Squash audit

2566b43  #1367  fix(clarify-sse): stale-detector health timer        (already nesquena APPROVED)
d0257e8  #1366  fix: guard finalizeThinkingCard with session ID check  @JKJameson
bc10a22         release: v0.50.250 (CHANGELOG + per-site stamps + regression test)

#1366's Author: Josh <josh@fyul.link> preserved. #1367 is the same JS already approved on its standalone PR. ✅

What the bot caught (already fixed) — guard with no data source

Josh's commit d0257e8 adds the guard at static/ui.js:4288-4289:

const _guardTurn = $('liveAssistantTurn');
if(_guardTurn && S.session && _guardTurn.dataset.sessionId !== S.session.session_id) return;

But liveAssistantTurn.dataset.sessionId was never set anywhere in the repo before this PR. So undefined !== "<sid>" is always true, the guard always early-returns whenever liveAssistantTurn exists, and finalizeThinkingCard() would silently no-op for every assistant turn. The thinking card stays as a spinner forever.

CI was green because none of the 29 #1366 contributor tests + 3447 baseline tests exercise streaming end-to-end with the thinking-card lifecycle.

The bot's release commit bc10a22 adds the missing stamps at all three creation sites in static/ui.js:

Line 3179 in renderMessages() — when re-rendering messages and the assistant message has _live=true.
Line 3551 in appendLiveToolCard() — when creating a turn for a tool card before the assistant text starts.
Line 4339 in appendThinking() — when creating the initial thinking spinner.

All three use the same idempotent pattern: if(S.session) turn.dataset.sessionId=S.session.session_id; immediately after turn.id='liveAssistantTurn';.

The regression test at tests/test_pr1366_finalize_thinking_card_guard.py walks static/ui.js with a regex, finds every <var>.id='liveAssistantTurn' assignment, and asserts a matching <var>.dataset.sessionId= appears within 500 chars after. I verified the test catches the missing-stamp regression by simulating removal of one stamp:

Sites: [(currentAssistantTurn, 149507, True), (turn, 167877, False), (turn, 206329, True)]
Unstamped: [(turn, 167877)]
Test correctly catches the missing stamp ✅

Behavioural harness — guard logic across 6 scenarios

Extracted the guard into Node:

✅ No live turn: CONTINUE
✅ No session: CONTINUE
✅ Stamped A, viewing A (match): CONTINUE
✅ Stamped A, viewing B (mismatch — bug case): EARLY_RETURN
✅ Unstamped (undefined), viewing B (broken-shipped behavior): EARLY_RETURN
✅ Stamped B, viewing B (B has live turn): CONTINUE

The "Unstamped" case demonstrates exactly why Josh's PR was broken without the bot's fix: when no creation site stamps the dataset attribute, EVERY call to finalizeThinkingCard() early-returns once a liveAssistantTurn exists.

Traced against upstream hermes-agent

Frontend-only changes. No upstream interaction.

End-to-end trace — cross-session DOM protection

User in session A, types message, stream starts. appendThinking() at static/ui.js:4334 creates the live turn → turn.id='liveAssistantTurn', turn.dataset.sessionId='A'.
User switches to session B (via session click or cross-tab storage event from PR #1359). loadSession(B) → renderMessages() rebuilds the DOM for B's content. The old liveAssistantTurn element from A is removed.
If B has a live in-flight stream too, B's renderMessages at static/ui.js:3174-3179 creates a new live turn stamped with B.
Late callback from A's stream fires finalizeThinkingCard():
- _guardTurn = $('liveAssistantTurn') → returns whatever's in DOM now (B's turn if B has one, else null).
- Case (a) DOM has stale A turn (renderMessages hadn't yet run): _guardTurn.dataset.sessionId === 'A', S.session.session_id === 'B' → mismatch → early return. ✅ Protects A's stale DOM.
- Case (b) DOM has B's turn (renderMessages already replaced): B === B → match → continues. Operates on B's DOM. ⚠️ The guard CANNOT distinguish "called for B legitimately" from "called by A's late callback while B is current."
- Case (c) DOM has nothing: _guardTurn null → guard passes → most of finalizeThinkingCard's body no-ops because thinkingRow etc don't exist.

Case (b) is the partial-coverage gap — see Minor observations.

Other audit — confirmed correct

✅ PR #1367 (clarify-sse stale-detector) — already approved on standalone PR. Same JS, no drift.
✅ No new endpoints / config / env vars. Frontend-only.
✅ No XSS surface. dataset.sessionId is read-only used for comparison; never injected as HTML.
✅ No memory leak — the stamp is just a string property on a DOM element that's eventually GC'd along with the element.
✅ Thread safety — single-threaded JS event loop. S.session reads are atomic.
✅ Backwards compat — pre-fix sessions saved without the stamp don't exist persistently; the stamp lives only on transient DOM elements. Every page load creates fresh stamps.
✅ CHANGELOG entries are detailed and credit Josh.

Edge-case matrix

Scenario	Pre-bot-fix (broken)	Post-fix
Session A streaming, user stays on A	Guard early-returns (undefined !== A); thinking card stays as spinner	Guard passes (A === A); finalize runs ✅
Session A streaming, user switches to B with B-live	Guard early-returns; spinner sticks for both	Case (b) — guard passes for B's turn ⚠️ partial
Session A streaming, user switches to B with no live	Guard early-returns; spinner sticks	Guard passes (no `liveAssistantTurn`); body no-ops on missing nodes ✅
Stale A turn in DOM, current session is B	Guard early-returns (correct outcome but for wrong reason)	Guard early-returns (A !== B); ✅ correctly protects
`S.session` is null (transient nav)	Guard early-returns	Guard passes (`S.session &&` short-circuits); body checks for nodes ✅
Future creation site forgot to stamp	Test fails ✅	Test still passes ✅
Three creation sites consolidated to fewer	`test_at_least_three_live_turn_sites` fails (sanity)	Acceptable to relax if intentional

Tests

PR's own: tests/test_pr1366_finalize_thinking_card_guard.py 3/3 pass; test_clarify_sse.py 29/29 pass.
Full suite: 3398 passed, 54 skipped, 3 xpassed, 0 failed in 16.49s on bc10a22. (PR description claims 3450; counting drift consistent with prior batch releases — both runs pass all SSE / lifecycle / streaming tests.)
CI: 3.11/3.12/3.13 all green.
Sanity check on regression test effectiveness: I verified that mutating one stamp out causes test_live_turn_creation_sites_stamp_session_id to fail with a clear error message naming the unstamped site.

Minor observations (non-blocking)

Partial coverage of cross-session callback case: when both A and B have live streams and the user switches A→B, the guard at line 4289 doesn't fully prevent A's late callback from finalizing B's thinking card. A complete fix would have finalizeThinkingCard(streamSessionId) accept the stream's session_id as a parameter and compare directly. Out of scope for this release; the current fix is a strict improvement (handles the stale-A-turn-still-in-DOM case correctly).
No JS unit test — same constraint as PR #1367. The 3 source-level tests + behavioural harness cover what's reasonable without a JS test infrastructure.
PR description test count off by ~50 between claimed (3450) and local (3398). Consistent with prior batch releases. Not a defect.
Stamp uses if(S.session) guard — if S.session is null at creation time (transient state during session loading), the stamp is omitted. The DOM stays unstamped. Then the guard at finalize-time will early-return because dataset.sessionId === undefined. This is the safe default — better to skip cleanup than corrupt the wrong session's DOM.
No coverage of appendLiveToolCard's creation path in any existing test. The regression test asserts the stamp exists at the source; behavioural coverage of that creation path would need a streaming integration test.

Recommendation

✅ Approved. The bot's pre-release fix correctly addresses the broken-as-shipped guard in #1366 — without it, every assistant turn's thinking card would stay as a spinner forever. Per-site stamps at all 3 liveAssistantTurn creation sites + a regression test that catches future site additions that forget the stamp. Behavioural harness confirms the guard logic is correct across 6 scenarios. PR #1367's content is identical to what was already approved on its standalone PR.

Parked at approval — ready for the release agent's merge/tag pipeline.

The timer fires every 60s either way; what changed is whether it triggers a reconnect. Under steady clarify traffic the reconnect never happens; on long-idle sessions it still reconnects every 60-120s (the residual idle churn is now a tracked follow-up rather than the original v0.50.249 unconditional per-minute reconnect). Tightens the CHANGELOG language to match observed behavior.

nesquena-hermes · 2026-04-30T22:34:56Z

Opus pre-release review

Verdict: SHIP. No MUST-FIX or SHOULD-FIX. 3 NITs, 1 of which (CHANGELOG accuracy) I fixed in 0735042.

Q1 — Diagnosis on #1366: VERIFIED CORRECT

git grep "dataset\.sessionId|data-session-id" in master returns zero results
Contributor branch pr-1366 only contains the comparison line at static/ui.js:4281; no setters anywhere
As shipped, _guardTurn.dataset.sessionId would always be undefined, making undefined !== "<sid>" always true → finalize would unconditionally early-return → streaming UI completely broken (thinking cards never close)

Q2 — 3-site stamps: VERIFIED CORRECT

All 3 sites that set <var>.id='liveAssistantTurn' are immediately followed by a dataset.sessionId stamp:

static/ui.js:3174 (renderMessages, currentAssistantTurn) → stamp at :3179
static/ui.js:3550 (appendLiveToolCard, turn) → stamp at :3551
static/ui.js:4338 (appendThinking, turn) → stamp at :4339

Manually ran the 3 regression-test assertions against release/v0.50.250:static/ui.js — all pass.

Q3 — #1367 stale-detector: VERIFIED CORRECT

Server pushes only initial and clarify event types (per api/routes.py:2940/2956); plus : keepalive comments at 30s. Client correctly listens to both. Comments are silently consumed by EventSource (browser-spec behavior), confirming the fix's core premise. startClarifyPolling is idempotent so the recursive reconnect doesn't stack.

NITs (non-blocking)

🟦 NIT-1 — guard semantics narrower than the PR title implies. The 4 call sites in static/messages.js other than the tool handler do NOT pre-guard before calling finalizeThinkingCard() (done, apperror, cancel, _handleStreamError). The new in-function guard catches one race (stale liveAssistantTurn from session A lingering after switch to B with no live stream). It does NOT catch: A's done fires after switch to B where B has its own liveAssistantTurn with sessionId=B — guard sees B === B, lets it through, A's callback mutates B's live turn. Truly complete fix would push a per-handler guard or have finalizeThinkingCard(activeSid) accept the stream's session id and compare against dataset.sessionId. Strictly an improvement over master; tracking as follow-up.

🟦 NIT-2 — #1367 still reconnects every ~120s on idle sessions. Because _lastClarifyEventAt only bumps on initial/clarify events, idle sessions still churn at half the v0.50.249 rate, not zero. The original CHANGELOG line claiming "the timer never fires under healthy conditions" was technically wrong — fixed in 0735042 with accurate language ("steady traffic: no reconnects; idle: still ~60-120s, tracked as follow-up").

🟦 NIT-3 — defensive if(S.session) redundant at 2 of 3 stamp sites. appendThinking (4338) and appendLiveToolCard (3550) already guard with if(!S.session||!S.activeStreamId) return; at the top of the function. Harmless — consistency is fine.

Branch state after Opus fix

2566b43 fix(clarify-sse): make health timer a stale-detector, not unconditional reconnect #1367 squash
d0257e8 fix: guard finalizeThinkingCard with session ID check #1366 squash
bc10a22 release commit (3-site stamps + regression tests)
0735042 CHANGELOG accuracy fix (Opus NIT-2)

@nesquena — ready for re-review on HEAD 0735042. Tests: 3450 passing, CI green on Python 3.11/3.12/3.13.

Phase 2 of #1003: extend the autosave pattern from the Appearance panel to the Preferences panel so all preference changes are saved automatically without requiring a manual 'Save Settings' click. Mirrors the Phase 1 (Appearance) pattern exactly: - 350ms debounce on field changes (500ms additional debounce on the bot_name text input — effective ~850ms latency for typing) - Inline status feedback (saving / saved / failed + retry button) - Clears dirty flag and hides unsaved-changes bar after successful save - Password field excluded — still requires explicit save (security) - Model selector excluded — still requires explicit save 13 fields now autosaving: send_key, language, show_token_usage, simplified_tool_calling, show_cli_sessions, sync_to_insights, check_for_updates, sound_enabled, notifications_enabled, sidebar_density, auto_title_refresh_every, busy_input_mode, bot_name. i18n keys (settings_autosave_saving/saved/failed/retry) already exist in all 8 locales from Phase 1. Co-authored-by: Feco Linhares <feco.linhares@gmail.com>

9 source-level invariants covering #1369: - All 13 preference fields appear in _preferencesPayloadFromUi - Listeners use _schedulePreferencesAutosave, not _markSettingsDirty - Password field STILL uses _markSettingsDirty (security invariant) - _autosavePreferencesSettings clears _settingsDirty + hides unsaved bar on success - Status div present in static/index.html - Status function uses shared i18n keys from Phase 1 - Retry function falls back gracefully when no stored payload - Debounce clears prior timer (350ms, matching Phase 1) - Phase 1 (Appearance) autosave still intact

…odel pending (Opus SHOULD-FIX Q1) Pre-release Opus review of v0.50.250 caught a UX regression in PR #1369: _autosavePreferencesSettings unconditionally cleared _settingsDirty=false and hid the unsaved-changes bar on every successful autosave. But password and model are still committed via the explicit 'Save Settings' button (password for security; model goes through /api/default-model). Race scenario: 1. User opens System pane, types a new password (sets _settingsDirty=true; bar appears on close) 2. User switches to Preferences, toggles any checkbox -> autosave fires -> _settingsDirty=false, bar permanently suppressed 3. User closes panel -> _closeSettingsPanel short-circuits because !_settingsDirty -> typed password silently discarded (loadSettingsPanel blanks pwField.value='' on next open) Same shape with model selector: pick a new default model, then toggle any preference -> autosave fires -> no warning on close -> model never persists. Fix: only clear _settingsDirty and hide settingsUnsavedBar when both the password field is empty AND the model selector matches its on-open snapshot. Pinned by an updated regression test asserting the conditional guard exists.

nesquena-hermes · 2026-04-30T22:49:02Z

Update: PR #1369 added to v0.50.250

Per the user's directive, PR #1369 (@fecolinhares — autosave preferences settings, Phase 2 of #1003) has been added to this batch. Per agreement, no second independent-review round is required for the combined release; instead it gets:

✅ Opus pre-release advisor review
✅ My own structural review

My review of #1369 — APPROVE with one fix applied

Pattern conformance: Phase 2 cleanly mirrors the Phase 1 (Appearance) pattern. Same 350ms debounce, same i18n keys (already in all 8 locales from Phase 1), same status div + retry button shape. 12 of 13 fields use the standard _schedulePreferencesAutosave listener; bot_name (text input) gets an extra 500ms wrapper for typing-flicker reasons.

Security invariant preserved: Password field still calls _markSettingsDirty, NOT autosave. Verified by static grep — only the password addEventListener('input',...) block contains _markSettingsDirty after this PR. Model selector also still uses _markSettingsDirty (different commit endpoint).

Server-side compat: I read api/config.py:save_settings — it does load-mutate-write only on payload-present keys. A preferences-only POST does NOT overwrite Phase 1 fields (theme/skin/font_size).

Added regression suite: tests/test_1003_preferences_autosave.py (9 tests) — pins all 13 fields, security invariant, debounce behavior, status function shape, retry fallback, and Phase 1 still-intact.

Opus pre-release review of #1369

🟨 SHOULD-FIX Q1 — Autosave-clears-dirty regression (FIXED in f8754de)

_autosavePreferencesSettings unconditionally cleared _settingsDirty=false and hid settingsUnsavedBar on every successful autosave. But password and model are still committed via the explicit "Save Settings" button. Race:

User types new password → _settingsDirty=true
User toggles any preference → autosave fires → _settingsDirty=false, bar suppressed
User closes panel → _closeSettingsPanel short-circuits on !_settingsDirty → typed password silently discarded (next loadSettingsPanel blanks the field)

Same shape with model selector. Real UX bug.

Fix applied: only clear _settingsDirty + hide bar when password is empty AND model matches its on-open snapshot:

const pwField=$('settingsPassword');
const pwDirty=!!(pwField&&pwField.value);
const modelSel=$('settingsModel');  // (corrected from initial 'settingsHermesDefaultModel')
const modelDirty=!!(modelSel&&((modelSel.value||'')!==(_settingsHermesDefaultModelOnOpen||'')));
if(!pwDirty&&!modelDirty){
  _settingsDirty=false;
  const bar=$('settingsUnsavedBar');
  if(bar) bar.style.display='none';
}

Regression test in test_1003_preferences_autosave.py updated to pin the conditional guard.

🟦 NIT (deferred):

Q2 — _rememberPreferencesSaved writes to localStorage['hermes-pref-send_key'] and hermes-pref-language but nothing reads those keys (canonical write site is i18n.js). Dead code, harmless. Skip.
Q4 — "Save Settings" button is now misleading (covers only password + model). Could relabel or hide-when-clean. Skip.
Q5b — Server-side save_settings has no file lock. Concurrent Phase 1 + Phase 2 POSTs (theme + checkbox within 350ms) could clobber on disk via load-modify-write. Pre-existing concurrency hole, newly exposed by autosave. Worth a follow-up.
bot_name 850ms effective latency — fine, deserves a one-line comment.

🟥 MUST-FIX: None.

Final branch state

f8754de  Opus Q1 fix (conditional dirty-clear)
3a13be2  test: Phase 2 preferences autosave regression suite
645dfa2  #1369  fix: autosave preferences settings  @fecolinhares
0735042  docs(changelog): correct stale-detector claim
bc10a22  release: v0.50.250 (CHANGELOG + 3-site stamps + #1366 regression test)
d0257e8  #1366  fix: guard finalizeThinkingCard
2566b43  #1367  fix(clarify-sse): stale-detector health timer

Tests

Final suite: 3459 passed, 0 failed (3450 → 3459, +9 from test_1003_preferences_autosave.py)
All 9 new regression tests pinned: 13-field coverage, security invariant, debounce, retry, Phase 1 intact, Q1 conditional guard
The earlier flaky test_setup_allowed_with_confirm_overwrite was an order-dependent isolation issue (passes on its own, passes on full run without -x); unrelated to this batch

Per the user's directive, no second nesquena review round needed for the combined release. Proceeding to merge once CI re-runs green on f8754de.

nesquena-hermes · 2026-04-30T22:52:18Z

Shipped — v0.50.250 ✅

Merge: ffd11037 (2026-04-30 22:49 UTC)
Tag: v0.50.250
Production restarted, health check OK, cache-bust live (?v=v0.50.250).
SSE liveness checks (new in webui_qa_agent.sh) PASS for both /api/approval/stream and /api/clarify/stream.

What shipped

#	Author	Subject
#1366	@JKJameson	finalizeThinkingCard cross-tab guard (with 3-site stamps + regression test)
#1367	nesquena-hermes	clarify SSE stale-detector health timer
#1369	@fecolinhares	Preferences autosave Phase 2 (#1003)

Plus pre-release fixes:

bc10a22 — 3-site dataset.sessionId stamps (fix: guard finalizeThinkingCard with session ID check #1366 was broken without them)
0735042 — CHANGELOG accuracy (Opus NIT-2)
f8754de — Conditional dirty-flag guard (fix: Autosave preferences settings like appearance (Fixes #1003) #1369 Opus SHOULD-FIX Q1)

Tests

Final suite: 3459 passed, 0 failed
Net new: 12 regression tests (3 for fix: guard finalizeThinkingCard with session ID check #1366 invariants + 9 for fix: Autosave preferences settings like appearance (Fixes #1003) #1369 Phase 2 autosave)
CI: green on Python 3.11/3.12/3.13 for f8754de

Hygiene complete

✅ PR release: v0.50.250 #1368 merged with --merge (preserves all 7 commits)
✅ Tag v0.50.250 pushed on merge commit ffd11037
✅ All 3 constituent PRs closed with credit comments
✅ Worktree /tmp/wt-v0.50.250 removed
✅ Local branches deleted
✅ Production restarted, runtime SSE liveness verified

Held PRs (NOT in this release, still on hold)

fix: persist sidebar rename, keep model picker fresh, restore full-width chat #1134, Ollama Gemma4 修復と Codex 制御プレーンを追加 #1222, Per-session TPS indicator in sidebar #1272, [codex] fix model cache invalidation on config reload #1311 (draft), fix: validate WebUI launcher can import agent #1315, feat(sessions): add /branch command to fork conversations from any message (#465) #1342, feat: add profile-aware cron panel #1352, Fix/chat history WAL recovery #1353 — 8 PRs on hold per HARD BLOCK rule

What's tracked as follow-up

fix: guard finalizeThinkingCard with session ID check #1366 partial cross-session coverage when both A and B have live streams (Opus NIT-1) — needs finalizeThinkingCard(streamSessionId) parameterization
fix(clarify-sse): make health timer a stale-detector, not unconditional reconnect #1367 idle 60-120s reconnect on long-idle clarify sessions — server-side ping event or longer threshold
fix: Autosave preferences settings like appearance (Fixes #1003) #1369 dead-code localStorage writes (hermes-pref-send_key, hermes-pref-language) — clean up
fix: Autosave preferences settings like appearance (Fixes #1003) #1369 "Save Settings" button labeling now misleading after Phase 2
Server-side save_settings has no file lock — concurrent Phase 1/Phase 2 POSTs can race on disk

Per your directive: PR #1369 was added without requiring a second independent review round; combined release shipped after Opus + my structural review caught and fixed the SHOULD-FIX dirty-flag race.

nesquena-hermes and others added 3 commits April 30, 2026 22:25

nesquena-hermes requested a review from nesquena April 30, 2026 22:29

nesquena approved these changes Apr 30, 2026

View reviewed changes

Feco Linhares and others added 3 commits April 30, 2026 22:38

nesquena-hermes merged commit ffd1103 into master Apr 30, 2026
3 checks passed

nesquena-hermes deleted the release/v0.50.250 branch April 30, 2026 22:50

This was referenced Apr 30, 2026

fix(clarify-sse): make health timer a stale-detector, not unconditional reconnect #1367

Closed

fix: Autosave preferences settings like appearance (Fixes #1003) #1369

Closed

fix: guard finalizeThinkingCard with session ID check #1366

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v0.50.250#1368

release: v0.50.250#1368
nesquena-hermes merged 7 commits into
masterfrom
release/v0.50.250

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena left a comment

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

nesquena-hermes commented Apr 30, 2026

Release v0.50.250

Constituent PRs

Pre-release fix on #1366 (critical)

Pre-release gate

Plan after approval

Held PRs (NOT in this release)

Compatibility notes

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Review — end-to-end ✅ (clean approve, bot's pre-release fix verified)

Squash audit

What the bot caught (already fixed) — guard with no data source

Behavioural harness — guard logic across 6 scenarios

Traced against upstream hermes-agent

End-to-end trace — cross-session DOM protection

Other audit — confirmed correct

Edge-case matrix

Tests

Minor observations (non-blocking)

Recommendation

Uh oh!

nesquena-hermes commented Apr 30, 2026

Opus pre-release review

Q1 — Diagnosis on #1366: VERIFIED CORRECT

Q2 — 3-site stamps: VERIFIED CORRECT

Q3 — #1367 stale-detector: VERIFIED CORRECT

NITs (non-blocking)

Branch state after Opus fix

Uh oh!

nesquena-hermes commented Apr 30, 2026

Update: PR #1369 added to v0.50.250

My review of #1369 — APPROVE with one fix applied

Opus pre-release review of #1369

Final branch state

Tests

Uh oh!

Uh oh!

nesquena-hermes commented Apr 30, 2026

Shipped — v0.50.250 ✅

What shipped

Tests

Hygiene complete

Held PRs (NOT in this release, still on hold)

What's tracked as follow-up

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants