release: v0.50.251 by nesquena-hermes · Pull Request #1371 · nesquena/hermes-webui

nesquena-hermes · 2026-04-30T23:08:22Z

Release v0.50.251

Single-PR release — #1370 from @dso2ng (their 4th this week — 3 in v0.50.249 all clean). Builds on #1358's sidebar lineage collapse.

Constituent PR

#	Author	Subject	Size
#1370	@dso2ng	fix: expose session lineage metadata in API	+256/-1

Pre-release fixes applied

🟨 Perf: full table scan → indexed scoped query (commit `571cfed`)

The contributor's read_session_lineage_metadata did SELECT id, parent_session_id, end_reason FROM sessions (full table scan) on every call. Function runs on every sidebar refresh.

Empirically measured on production state.db (1031 rows): cold-cache full scan = 446ms, warm-cache = 9ms. Linear scaling: at 5000 rows ~45ms warm, at 10K ~90ms.

Fix: parameterized WHERE id IN (...) query that hits PRIMARY KEY + the existing idx_sessions_parent index. Now ~0.2ms regardless of total row count (50× faster at 1000 rows). Depth-bounded to 20 hops to cap query count under pathological data.

🟨 Orphan-parent guard

When a session row's parent_session_id references a parent that doesn't exist in state.db (parent pruned/deleted), the original API exposed the dangling reference. #1358's frontend _sessionLineageKey falls through to parent_session_id when _lineage_root_id is missing — orphans would create never-collapsing single-row groups in the sidebar.

Fix: only expose parent_session_id when the parent actually exists in the rows we fetched. Reproduced the bug, applied the fix, verified.

Regression suite (+5 tests)

tests/test_pr1370_lineage_metadata_perf_and_orphan.py:

Perf invariant — intercepts every SQL query the function issues; asserts no SELECT FROM sessions without a WHERE clause. Locks the perf invariant in source, catches future regressions.
Orphan suppression — builds a DB with orphan reference, asserts not exposed.
Cycle termination — A→B→A cycle in worker thread with 2s threading.Event watchdog (catches infinite loops).
End-to-end behavioral — 4-segment compression chain returns correct root + segment_count.
Boundary — non-compression end_reason stops the walk.

Pre-release gate

pytest: 3467 passed, 0 failed (3459 → 3467, +8 new tests)
CHANGELOG: v0.50.251 entry written
Author preservation: squash-committed with Author: Dennis Soong + Co-authored-by trailer
Opus pre-release review: in progress (PID proc_70cd56df11fb)
nesquena independent review on this release PR: requested

Plan after approval

Merge with --merge (no squash — preserves contributor + release commits)
Tag v0.50.251, push tag
Restart production; verify SSE liveness via webui_qa_agent.sh
Close fix: expose session lineage metadata in API #1370 with credit comment
Cleanup /tmp/wt-v0.50.251

Held PRs (NOT in this release)

8 PRs on hold per HARD BLOCK rule: fix: persist sidebar rename, keep model picker fresh, restore full-width chat #1134, Ollama Gemma4 修復と Codex 制御プレーンを追加 #1222, Per-session TPS indicator in sidebar #1272, [codex] fix model cache invalidation on config reload #1311 (draft), fix: validate WebUI launcher can import agent #1315, feat(sessions): add /branch command to fork conversations from any message (#465) #1342, feat: add profile-aware cron panel #1352, Fix/chat history WAL recovery #1353

Compatibility notes

Pure additive change to /api/sessions response shape — adds _lineage_root_id, _compression_segment_count, parent_session_id keys to applicable rows. Existing consumers unaffected.
Function degrades gracefully: missing state.db, old schema, or any DB exception → returns empty dict (no API regression).
No new env vars, no config changes.

PR #1358 added the client-side lineage collapse helper, but /api/sessions often did not include _lineage_root_id for the WebUI JSON sessions visible in the sidebar. In that case the helper has no grouping key and multiple same-title continuation rows remain visible. This PR: - Reads parent_session_id and end_reason from state.db.sessions for the WebUI sidebar's session ids - Walks the parent chain when end_reason is 'compression' or 'cli_close', producing _lineage_root_id and _compression_segment_count - Cycle-detects via a 'seen' set - Preserves projected lineage metadata on imported/gateway session rows - Allows sidebar collapse to group cross-surface continuation chains (CLI-close → WebUI continuation) while keeping non-continuation parent rows flat Co-authored-by: Dennis Soong <dso2ng@gmail.com>

@dso2ng

… suite) Bundles: - #1370 fix: expose session lineage metadata in API (@dso2ng) Pre-release fixes applied: 1) Perf: replaced full table scan with parameterized WHERE id IN (...) query. Original code did SELECT id, parent_session_id, end_reason FROM sessions on every sidebar refresh. Measured 9ms cached scan at 1000 rows in production (up to ~450ms cold-cache); scales linearly. New approach hits PRIMARY KEY + idx_sessions_parent — 50x faster at 1000 rows, ~0.2ms regardless of total row count. Depth-bounded to 20 hops to cap query count under pathological data. 2) Orphan-parent guard: suppress parent_session_id in API output when the referenced parent row doesn't exist in state.db. The frontend's #1358 _sessionLineageKey falls through to parent_session_id when _lineage_root_id is missing — orphan references would create never-collapsing single-row groups in the sidebar. 3) Regression suite (5 tests in test_pr1370_lineage_metadata_perf_and_orphan.py): - Pins the no-full-scan invariant by intercepting all SQL queries and asserting no SELECT FROM sessions without a WHERE clause - Pins orphan-parent suppression - Pins cycle termination via threading.Event watchdog (2s timeout) - End-to-end test for 4-segment compression chain root resolution - Pins non-compression end_reason boundary stops walk

nesquena

Review — end-to-end ✅ (clean approve, bot's perf and orphan fixes verified)

Release v0.50.251 — single-PR release of @dso2ng's #1370 (their 4th this week; 3 in v0.50.249 all clean). Builds on #1358's sidebar lineage collapse by exposing the lineage metadata to the frontend for WebUI JSON sessions (not just imported gateway rows). The bot caught two pre-release issues: a perf regression (full table scan) and an orphan-parent leak.

Squash audit

7da1e07  #1370  fix: expose session lineage metadata in API     @dso2ng (Dennis Soong)
571cfed         release: v0.50.251 (perf + orphan fix + 5 tests)

Author: Dennis Soong <dso2ng@gmail.com> preserved on the contributor commit. ✅

Bot's pre-release fixes — both correct

Perf fix at api/agent_sessions.py:284-303: replaced the original full table scan (SELECT id, parent_session_id, end_reason FROM sessions) with a parameterized WHERE id IN (...) query that walks the parent chain in batches of up to 20 hops. Each query hits PRIMARY KEY + the existing idx_sessions_parent index (verified at /tmp/hermes-agent-fresh/hermes_state.py:98). The regression test at test_does_not_full_scan_sessions_table intercepts every SQL query the function issues and asserts no SELECT FROM sessions without a WHERE clause — locks the perf invariant in source.

Orphan-parent guard at api/agent_sessions.py:330-333: only exposes parent_session_id in API output when the parent row actually exists in the rows we fetched. Before the fix, dangling references (parent pruned/deleted) would leak through and cluster orphans into never-collapsing single-row groups in the sidebar — because PR #1358's _sessionLineageKey falls through to parent_session_id when _lineage_root_id is missing.

Traced against upstream hermes-agent

Verified state.db.sessions schema at /tmp/hermes-agent-fresh/hermes_state.py:50-71: parent_session_id TEXT, FOREIGN KEY (parent_session_id) REFERENCES sessions(id) — matches the PR's column reads. idx_sessions_parent ON sessions(parent_session_id) exists at line 98 — the perf fix's index target.

Verified end_reason values in upstream production code — they are 'branched', 'compression', 'done', 'resumed_other', 'timeout', 'user_exit', plus 'cli_close' set by /tmp/hermes-agent-fresh/cli.py:11424. The PR walks chains where parent's end_reason in {'compression', 'cli_close'} — these are the two values that indicate a continuation (compression-split or CLI-to-WebUI handoff). All other reasons act as a hard boundary, which is the intended design.

'compressed' (with 'd') appears only in upstream tests/test_hermes_state.py — test typo, not in production paths. ✅

End-to-end trace

/api/sessions GET at api/routes.py calls all_sessions().
all_sessions() at api/models.py:751-833 builds the session list (index path or full scan), then calls _enrich_sidebar_lineage_metadata(result) at line 174 or 182.
_enrich_sidebar_lineage_metadata at api/models.py:152-164:
- Calls _active_state_db_path() which derives state.db from active profile (or HERMES_HOME fallback).
- Calls read_session_lineage_metadata(db_path, {session_ids}).
- Mutates each session dict in-place with the returned metadata.
read_session_lineage_metadata at api/agent_sessions.py:258-356:
- Empty/missing-DB short-circuit → {}.
- Schema check via PRAGMA table_info(sessions) — if parent_session_id or end_reason columns missing (older agent build), returns {}. Graceful degradation.
- Iterative parent-chain fetch: 20 hops max, each hop fetches via WHERE id IN (?, ?, …), queues unfetched parents.
- For each wanted sid:
  - Expose parent_session_id only when the parent is in rows (orphan suppression).
  - Walk parent chain, extending lineage on end_reason in {'compression', 'cli_close'}, breaking on cycle (parent_id in seen) or boundary.
  - Set _lineage_root_id and _compression_segment_count only when chain has segments beyond the tip itself.
Frontend (static/sessions.js:1219-1234 from PR #1358) groups by _lineage_root_id/lineage_root_id/parent_session_id and renders the latest tip per group.

Cross-tool consistency

Read-only on state.db: only SELECT queries, no writes. ✅ No contention with the agent CLI's writes.
Connection lifecycle: opens fresh sqlite3.connect() per call, closes via with block. ✅ No connection leak.
Schema tolerance: PRAGMA check + try/except → empty dict on schema/DB issues. ✅ Backwards-compatible with older agent builds.
Imported gateway rows preserved at api/models.py:1045-1049: _lineage_root_id, _lineage_tip_id, _compression_segment_count, parent_session_id keys flow through to the API. (_lineage_tip_id is computed only by read_importable_agent_session_rows, not by read_session_lineage_metadata. The frontend doesn't rely on it — it picks the latest tip by timestamp from the collapsed group at static/sessions.js:1230.)

Race / lock analysis

No locks needed — read-only SQLite SELECT. SQLite's default rollback journal mode supports multi-reader. Modern Python ships SQLite 3.37+ with multi-reader concurrency.
Concurrent /api/sessions requests: each handler thread opens its own connection. SQLite supports concurrent readers.
State.db writes from agent CLI: SELECT may briefly block if writer holds an exclusive lock during a transaction commit. Bounded wait, no deadlock. Acceptable.
No mutation of shared dicts. ✅

Security audit

✅ Parameterized SQL at lines 297-300 — ? placeholders for every session_id; no string interpolation of user data.
✅ Path source: _active_state_db_path() reads HERMES_HOME env or active profile config — trusted, not user input.
✅ Cycle protection: seen set at line 343 + 20-hop cap at line 287 → bounded query count regardless of DB pathology.
✅ Schema-injection-proof: PRAGMA only checks column names; doesn't pass them to SQL.
✅ No new endpoints, no new env vars, no config changes.
✅ Path(db_path) + db_path.exists(): no traversal surface (path is internal).

Edge-case matrix

Scenario	Expected	Actual
Empty `wanted` set	Return `{}`	✅ short-circuit at line 277
`state.db` missing	Return `{}`	✅ short-circuit at line 278
Schema missing `parent_session_id`/`end_reason`	Return `{}` (older agent)	✅ guard at lines 285-286
4-segment compression chain (root→seg2→seg3→tip)	root_id=root, segment_count=4	✅ test passes
Single session, no parent	No lineage keys exposed	✅ root_id == sid → not set
Parent exists, end_reason=`user_stop`	parent_session_id exposed; no `_lineage_root_id`	✅ test passes
Parent exists, end_reason=`cli_close` (cross-surface)	Lineage extends through boundary	✅ test passes
Orphan parent (not in DB)	Suppress `parent_session_id`	✅ test passes; bot's pre-release fix
Cycle A↔B, both compression	Walk terminates within bounded count	✅ test passes (worker thread + 2s watchdog)
1000+ sessions in `wanted`	~0.2ms per query, indexed	✅ bot's perf measurement
5000+ sessions	Linear scaling, ~1ms total	✅ confirmed
`?` placeholder limit	SQLite 3.37+ supports 32766	⚠️ (see minor observations)

Tests

Targeted tests:
- tests/test_pr1370_lineage_metadata_perf_and_orphan.py — 5/5 pass (perf invariant via SQL interception, orphan suppression, cycle termination via threading.Event watchdog, 4-segment chain root, non-compression boundary).
- tests/test_session_lineage_metadata_api.py — 3/3 pass (end-to-end through all_sessions(), non-compression boundary, cli_close cross-surface).
- tests/test_gateway_sync.py — 24/24 pass (lineage assertions added to test_compression_chain_collapses_to_latest_tip_in_sidebar).
Full suite: 3415 passed, 54 skipped, 3 xpassed, 0 failed in 16.22s on 571cfed.
CI: still pending at fetch time (3.11/3.12/3.13). Mirrors the local result.

Minor observations (non-blocking)

No batching for very large wanted sets: at the first hop, wanted may have N session_ids → query has N placeholders. SQLite 3.37+ default SQLITE_LIMIT_VARIABLE_NUMBER is 32766; older bundled SQLite (some macOS, very old Linux distros) is 999. A user with >999 sessions on legacy Python could hit "too many SQL variables." Edge case; would manifest as the function returning {} (caught by the broad except Exception). Acceptable graceful degradation but worth a follow-up to chunk into batches of e.g. 500 if it ever becomes a real issue.
with sqlite3.connect() doesn't close the connection — only commits/rolls back. The connection's underlying file handle is closed when the connection object is garbage-collected. For a hot path on many sessions this could leave stale FDs briefly. Minor; SQLite reuses cached pages within a process anyway.
_lineage_tip_id is set only by read_importable_agent_session_rows, not by the new function. WebUI JSON sessions never get this field. The frontend picks the latest tip by timestamp at static/sessions.js:1230 so this doesn't break collapse — just an inconsistency in the API shape. Worth a note in the API docs if those exist.
PR description claims 3467 passed; my local run shows 3415. Consistent counting drift with prior batch releases — both runs include all new tests passing.
'cli_close' end_reason is hardcoded once in upstream cli.py at line 11424 (in a single shutdown path). If a user's CLI quits via a different path (signal, crash, kill), end_reason may not be set, and the chain won't extend. The user would see two separate sidebar rows for what's logically one continuation. Acceptable — only the explicit clean-shutdown path is treated as a continuation.

Recommendation

✅ Approved. Both pre-release fixes verified: parameterized indexed query is the right shape (50× faster confirmed via the bot's measurements), orphan suppression closes the never-collapsing-single-row UX bug. Cross-tool clean — read-only SQLite SELECT, schema-tolerant, depth-bounded, cycle-safe. The 5 regression tests pin the invariants in source.

Parked at approval — ready for the release agent's merge/tag pipeline.

Manual WebUI cron runs previously called cron.scheduler.run_job(job) and then only cleared the in-memory running flag. That meant output could be dropped and job metadata like last_run_at / last_status was not updated after a manual run. This PR matches the scheduled cron path (cron/scheduler.py:1334-1364) exactly: - Save manual-run output via save_job_output - Mark manual runs complete via mark_job_run - Treat empty final_response as a soft failure with the same error string as the scheduled path - Record manual-run failures in job metadata via mark_job_run(False) - Keep _run_cron_tracked self-contained for worker-thread execution Includes 2 behavioral regression tests using monkeypatch.setitem on sys.modules to mock cron.scheduler.run_job + cron.jobs helpers — the right test pattern (exercises the real _run_cron_tracked code path). Split out from #1352 (the larger profile-aware-cron-panel PR that's on hold) per pre-release-review feedback. Self-contained, doesn't touch the held PR's profile-filtering scope. Co-authored-by: NocGeek <NocGeek@users.noreply.github.com>

@NocGeek

Opus pre-release findings on #1370 applied: SHOULD-FIX 1: Tightened parent_session_id exposure to only emit when the parent's end_reason is in {compression, cli_close}. Without this, two distinct WebUI sessions sharing a non-continuation parent (e.g. 'user_stop') would get clustered by frontend's _sessionLineageKey (which falls through to parent_session_id when _lineage_root_id is missing) and incorrectly collapsed into a single sidebar row. Updated assertions in: - tests/test_session_lineage_metadata_api.py:: test_non_compression_state_db_parent_does_not_create_sidebar_lineage - tests/test_pr1370_lineage_metadata_perf_and_orphan.py:: test_non_compression_parent_does_not_extend_lineage SHOULD-FIX 2: Chunked the IN-clause to 500 vars to stay under SQLITE_MAX_VARIABLE_NUMBER. Python 3.9 ships sqlite 3.31 with the default limit of 999. A power user with 2000+ sessions in the sidebar would hit OperationalError, the silent except-wrapper would swallow it, and lineage collapse would never work. Added test_in_clause_chunked_for_large_session_set with SQL interception to lock the invariant in source. PR addition (per user directive — Opus + my review, no second independent review round needed for combined batch): #1372 from @NocGeek — fix: persist manual cron run results. Self-contained 89 LOC fix split out from the held #1352. Mirrors the scheduled-cron path (cron/scheduler.py:1334-1364) exactly: saves output, marks job complete, treats empty response as soft failure with matching error string. 2 behavioral tests using sys.modules monkeypatch to mock cron.scheduler.run_job. CI not yet attached because branch is brand-new; ran the new tests + adjacent suites locally — all pass. Final test count: 3471 passing, 0 failed. Also adds 2 more regression tests for the perf-fix invariants: - test_in_clause_chunked_for_large_session_set - test_two_children_sharing_non_continuation_parent_not_collapsed

nesquena-hermes · 2026-04-30T23:19:29Z

Update: 2 substantive commits added since approval

Per the late-substantive-commits-invalidate-approval rule in ~/WebUI/docs/agent-memory/independent-review.md, flagging the changes since your APPROVED on 571cfed so you can decide whether to re-review or wave through.

What changed

89dcab8 — PR #1372 added (@NocGeek manual cron run persistence) — independently APPROVED on its own PR at 23:16 UTC by you. Per the user's earlier directive ("add it with Opus + your review and merge together without re-requesting independent review"), no second review round needed for this addition. Self-contained 89 LOC fix that's already independently reviewed; mirrors the scheduled-cron path's persistence handling exactly.

63251ad — Opus SHOULD-FIX 1+2 applied to PR #1370 — substantive contract tightening:

🟨 SHOULD-FIX 1 (Opus): tightened parent_session_id exposure to only emit when the parent's end_reason is in {compression, cli_close} — not for user_stop/etc. Without this, two distinct WebUI sessions sharing a non-continuation parent would get clustered by the frontend's _sessionLineageKey (which falls through to parent_session_id when _lineage_root_id is missing) and incorrectly collapsed into a single sidebar row.

This changes a behavior your edge-case matrix specifically traced and approved ("Parent exists, end_reason=user_stop → parent_session_id exposed; no _lineage_root_id ✅ test passes"). The new behavior: parent_session_id is also NOT exposed in that case. I updated the assertion in test_session_lineage_metadata_api.py::test_non_compression_state_db_parent_does_not_create_sidebar_lineage from assert == "parent" to assert "parent_session_id" not in row.

🟨 SHOULD-FIX 2 (Opus): chunked the IN-clause to 500 vars to stay under SQLITE_MAX_VARIABLE_NUMBER. Python 3.9 ships sqlite 3.31 with default limit 999. A power user with 2000+ sessions in the sidebar would hit OperationalError: too many SQL variables, the silent except-wrapper would swallow it, and lineage collapse would never work for them. Added test_in_clause_chunked_for_large_session_set that intercepts SQL queries and asserts no IN clause exceeds 999 placeholders. Your minor observations section mentioned this exact concern — the fix addresses it.

Plus a 7th regression test (test_two_children_sharing_non_continuation_parent_not_collapsed) for the SHOULD-FIX 1 contract.

Tests

Full suite: 3471 passed, 0 failed (+12 from the contributor's 5 lineage tests + my 7 perf/orphan/contract tests)
CI re-running on 63251ad

What I want from you

The cleanest path: re-review 63251ad. Specifically the SHOULD-FIX 1 behavior change (parent_session_id no longer exposed for non-continuation parents). Your original review's edge-case matrix marked the looser behavior as "✅ test passes" — the assertion flip means that matrix entry now reads ✅ for the tightened behavior. Took the cheap-win SHOULD-FIX rather than punting per the reviewer-flagged-fix-in-release-not-followup.md skill pattern.

If you'd rather defer SHOULD-FIX 1 to a follow-up: I can revert 63251ad's SHOULD-FIX 1 portion (keep SHOULD-FIX 2 chunking + #1372 + the test additions) and ship the looser behavior with a follow-up PR. Just let me know.

Branch state:

63251ad  Opus SHOULD-FIX 1+2 + 2 new regression tests
89dcab8  #1372 @NocGeek manual cron run persistence (APPROVED on own PR)
571cfed  release: v0.50.251 (your APPROVED commit)
7da1e07  #1370 @dso2ng (squashed)

…p/Cancel (#1375) Three distinct data-loss paths fixed: §A — Reasoning text was accumulated in a thread-local _reasoning_text inside _run_agent_streaming. cancel_stream() never saw it because it went out of scope when the thread was interrupted. Now mirrored to a new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated in on_reasoning() and the reasoning branch of on_tool(), read in cancel_stream(). §B — Live tool calls in thread-local _live_tool_calls were similarly invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS on tool.started + tool.completed. §C — Reasoning-only streams produced no partial message because the thinking-block regex strip returned empty string and the `if _stripped:` guard skipped the append. Now appends the partial message when EITHER content text, reasoning trace, OR tool calls exist. Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893 exactly: same dict creation in _run_agent_streaming, same _live_config fallback in cancel_stream, same cleanup in _periodic_checkpoint. 8 regression tests in tests/test_issue1361_cancel_data_loss.py covering all three sections plus tools+text combinations. Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>

… yet (#1373) When a user switched profiles and created a new session, the session was saved to the default profile directory instead of the active profile directory — because get_hermes_home_for_profile() silently fell back to _DEFAULT_HERMES_HOME when the profile directory didn't exist yet on disk. Root cause: api/profiles.py:156 had `if profile_dir.is_dir(): return profile_dir; return _DEFAULT_HERMES_HOME`. New profiles (no session yet, so no dir) routed every session back to default. Fix: remove the is_dir() guard, return the profile path unconditionally. The profile directory is created on first use by the agent/session layer. 5 regression tests in tests/test_issue1195_session_profile_routing.py: existing-profile, non-existent-profile (the core fix), None, empty- string, 'default' all return the expected path. Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>

@bergeouss

…r change Adds two more contributor PRs to the v0.50.251 batch per user directive (per-PR review + Opus review for #1373; #1375 was clean ship-on-sight). #1375 (@bergeouss, +382 LOC, all CI green) — fixes #1361 paid-token data loss on Stop/Cancel. Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893: adds STREAM_REASONING_TEXT and STREAM_LIVE_TOOL_CALLS shared dicts populated during streaming and read by cancel_stream(). Also fixes the §C reasoning-only-creates-no-message gap where the strip-thinking-blocks regex returned empty string and the if-guard skipped the partial append. 8 regression tests covering all 3 sections plus tools+text combinations. #1373 (@bergeouss, +105 LOC, had CI failures pre-fix) — fixes #1195 new-profile-routes-to-default. The is_dir() guard in get_hermes_home_for_profile() caused new profiles (no session yet) to silently route every session back to the default profile until the directory existed on disk. Removed the guard; profile path is now returned unconditionally. Pre-release fix for #1373's CI failures: the change flipped two behaviors pinned by tests in #798: - R19c (test_get_hermes_home_for_profile_falls_back_for_missing_profile) asserted nonexistent → base. Renamed and updated to assert the new always-return-profile-path behavior. - R19j (test_get_hermes_home_for_profile_rejects_path_traversal) asserted that valid-but-nonexistent profile names → base. Updated to assert profile-scoped path. Also updated docstring: the _PROFILE_ID_RE regex is now the SOLE defense against path traversal (previously is_dir() was a defense-in-depth layer); verified each known-bad shape still returns base. Tests: 3484 passing (3471 → 3484, +13).

nesquena

Round 2 Review — end-to-end ✅ (clean approve, both my round-1 SHOULD-FIX flags addressed + 3 new PRs traced)

Picking back up from my round-1 review of #1371. Since then the bundle has grown by 4 commits and now ships 5 PRs instead of 1. Both round-1 SHOULD-FIX items I flagged are correctly addressed; the 3 newly-bundled PRs trace clean.

Squash audit (what's new since round 1)

571cfed  release: v0.50.251 (round 1 — already approved by me)
89dcab8  #1372  fix: persist manual cron run results                           @NocGeek (already approved on standalone PR)
63251ad  release: apply Opus SHOULD-FIX 1+2 + add #1372
c5f4f56  #1375  fix(#1361): preserve reasoning, tool calls, partial output     @bergeouss
f14280e  #1373  fix(#1195): route sessions to profile dir even when missing    @bergeouss
d071e46  release: add #1373 + #1375; fix R19c/R19j contracts

All contributor commits preserve Author: + Co-authored-by: trailers. ✅

My round-1 SHOULD-FIX items — both fixed ✅

SHOULD-FIX 1 (parent_session_id leak when end_reason is non-continuation) — round-1 review flagged that the original orphan-only guard would still expose parent_session_id for two distinct WebUI sessions sharing a user_stop parent, letting the frontend's #1358 _sessionLineageKey fall-through cluster them into one collapsed sidebar row.

Fix verified at api/agent_sessions.py:336-352: exposure is now gated on parent_row.get('end_reason') in {'compression', 'cli_close'}. Subsumes the orphan check (orphan parents have no row, so parent_row is None → don't expose). New test test_two_children_sharing_non_continuation_parent_not_collapsed and updated test_non_compression_parent_does_not_extend_lineage lock the contract.

SHOULD-FIX 2 (? placeholder limit) — round-1 review noted that on Python 3.9 (sqlite 3.31, default SQLITE_MAX_VARIABLE_NUMBER=999), a power user with 2000+ sessions in the sidebar would hit OperationalError: too many SQL variables, the broad except would swallow it, and lineage collapse would silently disable forever.

Fix verified at api/agent_sessions.py:308-319: IN clause is now chunked into batches of 500 placeholders. Test test_in_clause_chunked_for_large_session_set intercepts every SQL query at runtime and asserts no IN-clause has > 999 vars, primed with 1500 unrelated session rows. ✅

Newly-bundled PRs traced

PR #1372 — manual cron persistence (@NocGeek)

Already approved on its standalone PR (#1372). Identical content squashed into this batch. No drift.

PR #1375 — preserve reasoning, tool_calls, partial output on Stop/Cancel (@bergeouss)

Three data-loss paths fixed via two new shared dicts mirroring the existing STREAM_PARTIAL_TEXT pattern from #893:

STREAM_REASONING_TEXT at api/config.py:2118 — accumulated by on_reasoning callback at api/streaming.py:1581-1583 and the reasoning branch of on_tool at api/streaming.py:1616-1618. Read by cancel_stream at api/streaming.py:2662-2666.
STREAM_LIVE_TOOL_CALLS at api/config.py:2119 — accumulated on tool.started at api/streaming.py:1635-1641 and updated on tool.completed at api/streaming.py:1670-1678. Read by cancel_stream at api/streaming.py:2667-2671.
§C (reasoning-only message creation) at api/streaming.py:2768-2786: the pre-fix if _stripped: guard skipped append when the partial-text strip returned empty (reasoning-only stream). Now appends if ANY of _stripped / _has_reasoning / _has_tools are truthy.

Race / lock analysis:

STREAM_REASONING_TEXT[stream_id] += str(text): not atomic on its own but the dict-item assignment is. Each stream_id is written by exactly one agent worker thread, so no concurrent writer. The HTTP cancel handler reads under STREAMS_LOCK, agent writer doesn't hold it — same pattern as STREAM_PARTIAL_TEXT. Reads see either the pre-write or post-write value; no torn strings (CPython dict-item swap is atomic). ✅ Mirrors the existing #893 pattern exactly.
STREAM_LIVE_TOOL_CALLS[stream_id].append(...): list append is atomic in CPython (GIL). Cancel reads via list(_cancel_tool_calls) to copy — may miss a concurrent append, acceptable. ✅
Cleanup at api/streaming.py:2469-2473 pops both dicts in the finally block. ✅

Cross-tool consistency for the partial tool_calls shape:

The shared dict stores {name, args, done} — webui-display shape, NOT OpenAI/Anthropic API shape (no id field, no function.{name,arguments} wrapper).
I traced whether this could break the next-turn API call. It does NOT, because:
- Cancel only mutates _cs.messages (UI transcript) at api/streaming.py:2786. It does NOT mutate _cs.context_messages (model-facing history).
- The next turn calls _session_context_messages(s) at api/streaming.py:1108-1113, which returns s.context_messages if set (preferred) or falls back to s.messages. Since the previous successful turn set s.context_messages = _next_context_messages at api/streaming.py:2001, context_messages is populated and the partial UI message is not in the model-facing history.
- Net: malformed tool_calls live in the UI transcript only; they're never serialized to the provider API. ✅
This is a subtle invariant. Worth documenting that the partial message's tool_calls are display-only — see Minor observations.

8 regression tests in test_issue1361_cancel_data_loss.py. All pass.

PR #1373 — route sessions to profile dir even when dir doesn't exist (@bergeouss)

Single-line fix at api/profiles.py:153-156: removed the is_dir() fallback that silently routed new profiles to default until the dir existed.

Pre-fix R19j concern: was is_dir() part of the path-traversal defense? Verified by reading api/profiles.py:153:

if not name or name == 'default' or not _PROFILE_ID_RE.match(name):
    return _DEFAULT_HERMES_HOME

_PROFILE_ID_RE = re.compile(r'^[a-z0-9][a-z0-9_-]{0,63}$') — only [a-z0-9_-] allowed. No ., /, or absolute paths can pass. ✅ The regex is sufficient defense; is_dir() was redundant. The bot's R19j update at tests/test_issue798.py:185-194 explicitly verifies each known-bad shape ('../../etc', '/absolute/path', 'has spaces', 'UPPERCASE') returns base, AND that valid names now route to profile-scoped path.

R19c was correctly updated: test_get_hermes_home_for_profile_returns_profile_path_for_missing_profile now asserts result == tmp_path / 'profiles' / 'ghost' (the new behavior). The old assertion result == tmp_path (fallback to base) is correctly removed.

5 new regression tests in test_issue1195_session_profile_routing.py. All pass.

End-to-end traces — round 2 additions

#1375 cancel-data-loss flow:

User sends message → _run_agent_streaming allocates STREAM_REASONING_TEXT[stream_id] = '' and STREAM_LIVE_TOOL_CALLS[stream_id] = [] at api/streaming.py:1376-1378.
Agent starts emitting reasoning → on_reasoning callback bumps both _reasoning_text (thread-local) AND STREAM_REASONING_TEXT[stream_id] (shared).
Agent calls a tool → on_tool (event_type='tool.started') appends to _live_tool_calls (thread-local) AND STREAM_LIVE_TOOL_CALLS[stream_id] (shared, with done=False).
User clicks Cancel → POST /api/chat/cancel → cancel_stream(stream_id).
Inside STREAMS_LOCK: capture _cancel_partial_text, _cancel_reasoning, _cancel_tool_calls from shared dicts (with _live_config re-read fallback for hot-reload scenarios).
Outside lock, under _agent_lock: build _partial_msg if any of stripped text / reasoning / tools is non-empty. Append to _cs.messages. Append cancel marker with _error=True.
_cs.save().
Display: UI shows the partial message including reasoning trace and tool_calls. Next turn: context_messages is intact from previous successful turn; partial message is NOT replayed to the API.

#1373 profile routing flow:

User switches to profile alice (newly created, no session yet).
Creates a new session → new_session(profile='alice') → s.profile = 'alice'.
Save path resolves via get_hermes_home_for_profile('alice') at api/profiles.py:155-156 → _DEFAULT_HERMES_HOME / 'profiles' / 'alice' (regardless of whether the dir exists yet).
Pre-fix: this path was rejected because the dir didn't exist; fallback to _DEFAULT_HERMES_HOME. Session saved to default profile dir, isolation broken.
Post-fix: profile path returned unconditionally. Agent/session layer creates the dir on first save.

Edge-case matrix (round 2 additions)

Scenario	Pre-fix	Post-fix
2000+ sessions in sidebar on Python 3.9 / SQLite 3.31	OperationalError swallowed → no lineage	Chunked to 500/IN — no error ✅
Two WebUI sessions share `user_stop` parent	`parent_session_id` exposed → frontend collapses both into one row	`parent_session_id` suppressed → both render flat ✅
Cancel mid-reasoning (no visible tokens)	No partial message saved → reasoning lost	Partial message with `reasoning` field saved ✅
Cancel mid-tool-call (tool started, not done)	Tool call lost from session	`tool_calls` saved (display-only) ✅
Cancel with reasoning + tools + text	Only text saved	All three saved ✅
Tool_calls in partial message replayed in next turn API call	n/a	Not replayed — `context_messages` (model-facing) unchanged by cancel; partial is display-only in `messages` ✅
New profile, no session yet	Session saved to default profile dir	Session saved to profile dir ✅
Path traversal in profile name (`'../../etc'`)	Regex blocks → returns base	Regex blocks → returns base ✅
Manual cron run output	Discarded	Saved + job marked ✅

Other audit — confirmed correct

✅ Auth gate unchanged — all changes are inside handle_get/handler scope already gated by check_auth.
✅ No XSS surface — no innerHTML / template literal additions; webui-only data fields.
✅ No new endpoints — only enriches existing /api/sessions payload, internal cancel_stream, internal _run_cron_tracked.
✅ No config.yaml writes — read-only on state.db.
✅ Cross-tool: _lineage_root_id / _lineage_tip_id / _compression_segment_count / parent_session_id are webui-only display keys. CLI doesn't read them. Verified in upstream tools/, cron/, agent/, hermes_cli/.
✅ Backwards compat — Sessions without lineage metadata, profiles without dirs, cancels without reasoning/tool_calls all degrade gracefully.

Tests

Targeted (round 2 additions):
- test_pr1370_lineage_metadata_perf_and_orphan.py — 7/7 pass (5 from round 1 + 2 new for SHOULD-FIX 1+2).
- test_session_lineage_metadata_api.py — 3/3 pass (updated for tightened parent_session_id semantics).
- test_issue1361_cancel_data_loss.py — 8/8 pass (3 §A + 2 §B + 3 §C).
- test_issue1195_session_profile_routing.py — 5/5 pass.
- test_issue798.py (R19c/R19j updated) — 20/20 pass.
- test_cron_manual_run_persistence.py — 2/2 pass.
Full suite: 3432 passed, 54 skipped, 3 xpassed, 0 failed in 17.48s on d071e46.
CI: 3.11/3.12/3.13 all green.

(PR description claims 3484 — mine is 3432; consistent counting drift across batch releases.)

Minor observations (non-blocking)

Partial message tool_calls shape is webui-only: the {name, args, done} shape stored on partial messages won't replay to the provider API because cancel doesn't mutate s.context_messages. This is a subtle invariant worth documenting in the cancel block — a future refactor that decides to mutate context_messages from cancel (or that switches _session_context_messages to prefer s.messages) would silently break the API. Adding a comment near api/streaming.py:2782-2785 saying "tool_calls saved here are display-only; do NOT add this message to context_messages" would lock the contract in source.
Regression test for the next-turn invariant. A test that calls _session_context_messages(s) after a cancel-with-tool_calls and asserts the partial isn't in the result would lock the invariant in source.
Reasoning trace replayed in conversation_history? The partial message has reasoning set, but _API_SAFE_MSG_KEYS = {'role', 'content', 'tool_calls', 'tool_call_id', 'name', 'refusal'} doesn't include reasoning so it's stripped. Even if the partial message somehow ended up in context_messages, the reasoning field would be dropped. ✅
Test count drift — PR description says 3484, my run says 3432. Same pattern as prior releases. Not a defect.
is_dir() removal as a bug-fix vs defense-in-depth tradeoff — for #1373: the regex is the SOLE path-traversal guard now. R19j explicitly verifies each known-bad shape. If a future commit weakens _PROFILE_ID_RE, traversal would become possible. The regex is conservative (^[a-z0-9][a-z0-9_-]{0,63}$), so this is theoretical.

Recommendation

✅ Approved (round 2). Both my round-1 SHOULD-FIX items correctly addressed with regression tests that lock the new contracts. The 3 newly-bundled PRs (#1372, #1373, #1375) all trace clean — #1372 was already approved standalone; #1373's path-traversal protection survives the simplification (regex is sufficient); #1375's malformed-tool_calls concern resolves once you trace that context_messages (model-facing) is not mutated by cancel.

Parked at approval — ready for the release agent's merge/tag pipeline.

…ST-FIX) Opus pass-2 review of v0.50.251 caught a critical regression in PR #1375: The cancel-partial message stored captured tool calls under the 'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so _sanitize_messages_for_api forwarded the entries to the next-turn LLM call. But the captured entries use the WebUI internal shape ({name, args, done, duration, is_error}) — they don't have the OpenAI/Anthropic id + function: {name, arguments} envelope. Strict providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed entries. Net effect: the very cancel-then-continue scenario PR #1375 aimed to improve becomes a hard fail. Fix: - Rename the persisted key to '_partial_tool_calls' (underscore- prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize correctly strips it). - Update static/messages.js hasMessageToolMetadata check to also recognize _partial_tool_calls for UI rendering. - Update test_issue1361_cancel_data_loss.py assertion to check _partial_tool_calls (and tool_calls as legacy fallback). Plus 2 NIT fixes from the same Opus review: NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency with other _PROFILE_ID_RE callers in the codebase. The trailing- newline footgun ($ matches before final \n in re.match) is now closed. Without #1373's is_dir() guard, a name like 'valid\n' would have created a directory named 'valid\n' on Linux. Doesn't escape <HERMES_HOME>/profiles/ via Path joining, but unintended. NIT 2 (test_issue798.py): R19j coverage gaps — added trailing- newline tests, length-boundary tests (64-char valid, 65-char rejected), single-char minimum, and non-ASCII / Unicode-trick tests. New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py): - test_partial_tool_calls_field_not_forwarded_to_llm: pins that sanitize-for-API strips _partial_tool_calls + reasoning + does NOT have tool_calls on a partial message - test_legitimate_tool_calls_are_preserved_for_completed_turns: pins that real OpenAI-shape tool_calls on completed turns survive sanitize unchanged Tests: 3486 passing (3484 → 3486, +2 sanitize tests).

nesquena-hermes · 2026-04-30T23:44:28Z

Update: 2 more PRs added + Opus pass-2 catches MUST-FIX in #1375

Per the user's directive to evaluate all new PRs for the merge queue, two more PRs added to v0.50.251 since the last comment.

PRs added since last update

c5f4f56 — #1375 (@bergeouss) preserve reasoning, tool calls, partial output on Stop/Cancel

382 LOC, all 3 CI Python versions green on its own PR
Mirrors existing STREAM_PARTIAL_TEXT pattern from Bug: Stop_generation button deletes LLM response #893 — adds STREAM_REASONING_TEXT and STREAM_LIVE_TOOL_CALLS shared dicts populated during streaming and read by cancel_stream()
8 regression tests covering §A reasoning, §B tools, §C reasoning-only-creates-message
Approve-on-sight quality

f14280e — #1373 (@bergeouss) route sessions to profile dir on first use (with R19c/R19j fix)

105 LOC, fixes webui: session saved to wrong profile when switching agents #1195
Removes the is_dir() fallback in get_hermes_home_for_profile() — new profiles no longer route to default
Original PR had CI failure on 3.12 because it flipped 2 behaviors pinned by tests in Profile isolation bug: switching profile in one browser client affects all other clients #798 (R19c fall-back, R19j path-traversal-defense). I updated both test assertions to match the new behavior
Path traversal still blocked by the _PROFILE_ID_RE regex (verified)

Opus pass-2 review caught MUST-FIX in #1375

After adding both PRs to the batch, I ran a second Opus pre-release pass focused on the additions. Opus caught a real correctness regression in #1375 that I'd missed:

🟥 MUST-FIX — _partial_msg['tool_calls'] = list(_cancel_tool_calls) stored captured tool calls under the tool_calls key, which is whitelisted by _API_SAFE_MSG_KEYS and forwarded to the next-turn LLM call. But the captured entries use the WebUI internal shape ({name, args, done, duration, is_error}) — they don't have OpenAI/Anthropic's id + function: {name, arguments} envelope. Strict providers would 400 on the malformed entries, turning a "data lost on cancel" bug into a "next message returns 400" bug — arguably worse, since the user can't recover by typing again.

Fix at f53556b: renamed the key to _partial_tool_calls (underscore-prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize correctly strips it). Updated static/messages.js to recognize both. Added tests/test_pr1375_partial_tool_calls_sanitize.py with 2 tests pinning the sanitize-roundtrip invariant.

2 NITs from the same Opus pass also fixed

🟦 NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency with other _PROFILE_ID_RE callers in the codebase. Closes the trailing-newline footgun ($ matches before final \n in re.match).

🟦 NIT 2 (tests/test_issue798.py R19j coverage): added trailing-newline tests, length-boundary tests (64-char valid, 65-char rejected), single-char minimum, and non-ASCII / Unicode-trick tests.

#1374 put on hold

@NocGeek's #1374 (profile-aware cron panel resubmit of held #1352) is the architectural piece that needs careful review, not a rushed batch add. Posted detailed hold comment with 3 specific architectural coordination questions and a "what I'd want to see before un-holding" list.

Final branch state on `release/v0.50.251` (HEAD `f53556b`)

f53556b  Opus pass-2 MUST-FIX (rename to _partial_tool_calls) + 2 NITs + sanitize test
d071e46  release: add #1373 + #1375; fix R19c/R19j contracts
f14280e  #1373 fix(#1195): route sessions to profile dir on first use
c5f4f56  #1375 fix(#1361): preserve reasoning, tool calls, partial output on Stop/Cancel
63251ad  release: apply Opus SHOULD-FIX 1+2 + add #1372
89dcab8  #1372 fix: persist manual cron run results
571cfed  release: v0.50.251 (#1370 perf fix + orphan-parent guard) ← your earlier APPROVED
7da1e07  #1370 fix: expose session lineage metadata in API

Tests

Final suite: 3486 passed, 0 failed (3471 → 3486, +15 from fix(#1195): route sessions to profile dir even when dir doesn't exist yet #1373's 5 + fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel #1375's 8 + sanitize test's 2)
All 3 Python versions CI passing on prior commits; will re-run on f53556b

What I'm waiting on

Re-review on f53556b. Substantive changes since 571cfed (your earlier APPROVED):

Already-approved-elsewhere additions: fix: persist manual cron run results #1372 (you APPROVED), fix(#1195): route sessions to profile dir even when dir doesn't exist yet #1373 (independently reviewed by you on its own PR? unclear — it was just opened today), fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel #1375 (independently reviewed by you on its own PR? unclear)
Opus SHOULD-FIX 1+2 on fix: expose session lineage metadata in API #1370 (parent_session_id contract tightening + IN-clause chunking)
Opus MUST-FIX on fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel #1375 (tool_calls field rename to prevent LLM 400)
2 NITs on fix(#1195): route sessions to profile dir even when dir doesn't exist yet #1373 (re.fullmatch + R19j coverage gaps)

Cleanest path: re-review f53556b. If you'd rather defer some of the late additions to v0.50.252 (say, reverting #1373 and shipping just #1370 + #1372 + #1375), I can split the batch — let me know which way to go.

nesquena

Round 3 Review — end-to-end ✅ (clean approve, my round-2 observation hardened into a MUST-FIX)

Single follow-up commit f53556b since round 2. The bot picked up my round-2 observation about the malformed tool_calls shape on partial messages and elevated it to a MUST-FIX with a hardened fix — exactly the right call. Plus two NIT fixes I would have flagged on a third pass.

Squash audit

571cfed   release: v0.50.251 (round 1 — already approved)
89dcab8   #1372  fix: persist manual cron run results
63251ad   release: apply Opus SHOULD-FIX 1+2 + add #1372
c5f4f56   #1375  fix(#1361): preserve reasoning, tool calls, partial output
f14280e   #1373  fix(#1195): route sessions to profile dir
d071e46   release: add #1373 + #1375; fix R19c/R19j contracts (round 2 — already approved)
f53556b   release: rename tool_calls → _partial_tool_calls + re.fullmatch + R19j coverage  ← NEW

Round-2 observation now MUST-FIX'd ✅

In round 2 I flagged this as a non-blocking observation:

Partial message tool_calls shape is webui-only: the {name, args, done} shape stored on partial messages won't replay to the provider API because cancel doesn't mutate s.context_messages. This is a subtle invariant worth documenting.

But on closer trace, the invariant only holds when s.context_messages is populated. Looking at api/streaming.py:1108-1113:

def _session_context_messages(session):
    context_messages = getattr(session, 'context_messages', None)
    if isinstance(context_messages, list) and context_messages:
        return context_messages
    return session.messages or []

The fallback to session.messages fires when context_messages is missing or empty — which is the case for sessions where the FIRST turn was cancelled (no successful prior turn ever populated context_messages). In that fallback path, session.messages includes the partial message, and _sanitize_messages_for_api would have whitelisted tool_calls and forwarded the malformed {name, args, done} entries to the provider — breaking the very next-turn-after-cancel path PR #1375 aimed to improve.

The bot caught this and applied the right fix at api/streaming.py:2785:

# Store under '_partial_tool_calls' (NOT 'tool_calls'). The captured entries use
# the WebUI internal shape {name, args, done, duration, is_error} — they do NOT
# carry the OpenAI/Anthropic API id + function: {name, arguments} envelope. If we
# put them under 'tool_calls', _sanitize_messages_for_api would forward them to
# the next-turn LLM call and strict providers would 400 on the malformed entries.
# The underscore-prefixed key is not in _API_SAFE_MSG_KEYS so sanitize strips it.
_partial_msg['_partial_tool_calls'] = list(_cancel_tool_calls)

This is the textbook fix — make the structural shape itself enforce the invariant, not a comment. Since _API_SAFE_MSG_KEYS = {'role', 'content', 'tool_calls', 'tool_call_id', 'name', 'refusal'} doesn't include _partial_tool_calls, the dict comprehension at api/streaming.py:1020 drops the field on every sanitize call. No subtle context_messages-vs-messages reasoning required.

Frontend update at static/messages.js:1075-1081 recognizes both tool_calls and _partial_tool_calls for UI rendering — display continuity preserved.

The new regression test test_pr1375_partial_tool_calls_sanitize.py pins both invariants:

_partial_tool_calls AND tool_calls AND reasoning are all stripped from a partial message after sanitize.
Real OpenAI-shape tool_calls (with id + function: {name, arguments} envelope) survive sanitize on completed turns.

NIT 1 — `re.match` → `re.fullmatch` for `_PROFILE_ID_RE`

The bot caught a real footgun introduced by removing the is_dir() guard in #1373. re.match(r'^[a-z0-9][a-z0-9_-]{0,63}$', name):

^ anchors at start ✅
$ anchors at end OR before a final newline (per Python regex docs)
So 'valid\n' would have matched, and without is_dir() as a defense layer, Path(_DEFAULT_HERMES_HOME) / 'profiles' / 'valid\n' would have created a directory with a literal newline character on Linux. Not a directory escape (Path joining is sandboxed), but unintended.

Confirmed by harness:

'valid'              match=True   fullmatch=True
'valid\n'            match=True   fullmatch=False   ← the footgun
'a\n'                match=True   fullmatch=False
'a\nb'               match=False  fullmatch=False
'../../etc'          match=False  fullmatch=False

Fix verified at api/profiles.py:153: re.match → re.fullmatch. Trailing newline now correctly returns base.

NIT 2 — Expanded R19j coverage

tests/test_issue798.py:194-208 now covers:

Trailing newline ('valid\n', 'a\n') → base.
Length boundaries: 64 chars (max valid) → profile path; 65 chars → base.
Single-char minimum ('a') → profile path.
Non-ASCII Unicode-trick names ('voilà', '名前') → base.

These are exactly the boundary conditions a future weakening of _PROFILE_ID_RE would silently regress. Locked in.

Other audit — what I checked

✅ Field rename is complete: searched for all reads/writes of tool_calls on _partial=True messages. The only producer is the cancel-stream path (now writes _partial_tool_calls); the only consumer is the frontend UI (now reads both keys). No orphan readers.
✅ Backwards compat: any existing partial messages on disk with the old tool_calls field are display-only — sanitize strips them safely as long as orphan-detection runs correctly. Verified _sanitize_messages_for_api's orphan-detection path: tool_calls without id aren't added to valid_tool_call_ids, so any tool result that was somehow paired would be dropped as orphan. The malformed assistant tool_calls would still pass through (since orphan detection only filters tool messages, not assistant tool_calls). However: PR #1375 was just shipped in round 2 and never reached production, so there are no historical session files with the old field name. The field rename effectively eliminates the failure mode entirely going forward. No migration needed.
✅ _API_SAFE_MSG_KEYS set unchanged at api/streaming.py:67 — _partial_tool_calls correctly absent from the whitelist.
✅ Frontend recognizes both keys at static/messages.js:1075-1081. The check hasTc || hasPartialTc || hasTu now triggers tool-card rendering on partial messages too.
✅ No additional reads of _partial_tool_calls elsewhere needed — display path already handles m.tool_calls for legacy completed-turn messages and now m._partial_tool_calls for partial messages via the same code path.

Edge-case matrix (round 3 additions)

Scenario	Round 2 behavior	Round 3 behavior
Cancel mid-tool, then send next message (with `context_messages` populated)	Partial tool_calls saved on `s.messages` only; not in `context_messages`; API doesn't see them	Same — but now structurally enforced via field rename ✅
Cancel on FIRST turn (no `context_messages` yet)	Fallback to `session.messages` → malformed tool_calls forwarded → API 400 ⚠️	`_partial_tool_calls` stripped by sanitize → no malformed entries forwarded ✅
Profile name with trailing newline (`'valid\n'`)	`re.match` accepts → directory with `\n` created on Linux	`re.fullmatch` rejects → returns base ✅
Profile name 64 chars (max valid)	Same as round 2	Tested explicitly ✅
Profile name 65 chars	Same as round 2	Tested explicitly → base ✅
Real `tool_calls` on completed turn	Pass through	Pass through ✅ (preservation test)
Frontend UI rendering of partial tool_calls	UI reads `m.tool_calls` (worked)	UI reads `m._partial_tool_calls` OR `m.tool_calls` ✅

End-to-end trace — partial-tool-calls flow (post-rename)

User starts message → stream begins → tool starts → STREAM_LIVE_TOOL_CALLS[stream_id].append({...webui-shape...}).
User clicks Cancel → cancel_stream reads from shared dict.
Builds _partial_msg and stores tools at api/streaming.py:2785 under _partial_tool_calls key.
Appends to _cs.messages and _cs.save().
Display: frontend's attachLiveStream at static/messages.js:1075-1081 renders the tool cards via the hasPartialTc branch.
Next turn: _session_context_messages(s) returns either:
- s.context_messages (preferred) → partial msg never persisted there; safe.
- s.messages (fallback when context_messages empty) → partial msg present BUT _sanitize_messages_for_api strips _partial_tool_calls (not in whitelist) → API gets a clean {role: 'assistant', content: ''} (or with text content) → no 400.

Tests

New regression test: test_pr1375_partial_tool_calls_sanitize.py — 2/2 pass (sanitize-strip + completed-turn preservation).
Updated existing test: test_issue1361_cancel_data_loss.py::test_cancel_with_tool_calls_preserves_tools checks _partial_tool_calls OR tool_calls OR tools — backwards-compatible assertion.
Updated R19j coverage: test_issue798.py::test_get_hermes_home_for_profile_rejects_path_traversal — 20/20 pass in test_issue798.py (added 5 new boundary cases).
Full suite: 3434 passed, 54 skipped, 3 xpassed, 0 failed in 16.94s on f53556b.
CI: 3.11 ✅ pass, 3.12 ✅ pass, 3.13 ✅ pass.

(PR description claims 3486; my count 3434 — same counting drift as prior rounds.)

Behavioural harness — confirmed `re.match` footgun

'valid'              match=True   fullmatch=True
'valid\n'            match=True   fullmatch=False   ← rejected only by fullmatch
'a\n'                match=True   fullmatch=False
'a\nb'               match=False  fullmatch=False
'../../etc'          match=False  fullmatch=False

The fix is necessary — re.match would have allowed valid\n through.

Minor observations (non-blocking)

The _partial_tool_calls rename is a one-way door: future code that adds tool_call-related logic must consider this field. The comment at api/streaming.py:2783-2799 makes the rationale explicit, which is the right place for it. ✅
No test for the cancel-on-first-turn → next-turn-API-call path specifically. The new test_partial_tool_calls_field_not_forwarded_to_llm covers the sanitize layer in isolation; an integration test that drives the full cancel→next-turn flow with a mocked agent would lock the contract end-to-end. Out of scope for this round; the unit-level test is sufficient given how mechanical the sanitize-key check is.
The two completed-turn tool_calls use id + function: {name, arguments} — the test is OpenAI-shape only. Anthropic uses {type: 'tool_use', id, name, input}. Sanitize doesn't transform; it just keeps. So both shapes pass through unchanged. Acceptable.

Recommendation

✅ Approved (round 3). The bot turned my round-2 non-blocking observation into a hardened structural fix — exactly the right level of paranoia. The field rename eliminates the entire class of "what if context_messages falls back to session.messages" reasoning. Plus two real footgun fixes (re.match → fullmatch, expanded R19j coverage) caught on the same pass.

Parked at approval — ready for the release agent's merge/tag pipeline.

nesquena-hermes · 2026-04-30T23:54:54Z

Shipped — v0.50.251 ✅

Merge: 031feda3 (2026-04-30 23:50 UTC)
Tag: v0.50.251
Production restarted, health check OK, cache-bust live (?v=v0.50.251).
SSE liveness checks PASS for both /api/approval/stream and /api/clarify/stream.

What shipped (4 contributor PRs)

#	Author	Subject
#1370	@dso2ng	Sidebar lineage metadata in API for WebUI JSON sessions (extends #1358's collapse helper)
#1372	@NocGeek	Manual cron run persistence (split from held #1352)
#1373	@bergeouss	New profiles route sessions to profile dir on first use (#1195)
#1375	@bergeouss	Stop/Cancel preserves reasoning + tool calls + partial output (#1361)

Pre-release fixes applied (5 total across 2 Opus passes)

Opus pass-1:

Perf: full table scan → parameterized WHERE id IN (...) (fix: expose session lineage metadata in API #1370 — 50× faster)
SHOULD-FIX 1: chunked IN clause to 500 vars vs SQLITE_MAX_VARIABLE_NUMBER
SHOULD-FIX 2: parent_session_id only emitted for compression/cli_close parents

Opus pass-2:
4. MUST-FIX: renamed tool_calls → _partial_tool_calls on cancel-partial messages (would have 400'd strict providers on next-turn after cancel)
5. NIT: re.match → re.fullmatch on _PROFILE_ID_RE (closes trailing-newline footgun)

Plus contract updates: R19c + R19j tests in test_issue798.py flipped to match #1373's new behavior.

Tests

Final suite: 3486 passed, 0 failed (3447 base → 3486, +39 net new tests)
3 nesquena-APPROVED rounds (commit 571cfed, d071e46, f53556b)
CI green on Python 3.11/3.12/3.13 for f53556b

Hygiene complete

✅ PR release: v0.50.251 #1371 merged with --merge (preserves all 7 commits)
✅ Tag v0.50.251 pushed on 031feda3
✅ All 4 constituent PRs (fix: expose session lineage metadata in API #1370, fix: persist manual cron run results #1372, fix(#1195): route sessions to profile dir even when dir doesn't exist yet #1373, fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel #1375) closed with structured credit comments + Opus-fix rationale for the contributors
✅ feat: add profile-aware cron panel #1352 closed as superseded by fix: persist manual cron run results #1372 (shipped) + feat: add profile-aware cron panel #1374 (held architectural piece)
✅ Issues bug(streaming): Stop/Cancel discards AI's already-streamed reasoning + tool output (and possibly user prompt) — paid tokens lost #1361 and webui: session saved to wrong profile when switching agents #1195 auto-closed by squash; feat: add profile-aware cron panel #1352 closed manually
✅ Worktree /tmp/wt-v0.50.251 removed
✅ Local branches deleted
✅ Production restarted, runtime SSE liveness verified

Held PRs (NOT in this release, still on hold)

8 PRs on hold per HARD BLOCK rule: #1134, #1222, #1272, #1311 (draft), #1315, #1342, #1353, #1374

Lessons captured

Pattern conformance is a heuristic, not a guarantee. PR fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel #1375 mirrored the existing STREAM_PARTIAL_TEXT pattern from Bug: Stop_generation button deletes LLM response #893 perfectly — but introduced a new field (tool_calls) that Bug: Stop_generation button deletes LLM response #893 didn't have, and that field happened to be in _API_SAFE_MSG_KEYS. Pattern-extension trap (pattern Hermes Web UI — Sprints 11-14: multi-provider models, settings, sessi… #2 in ~/WebUI/docs/agent-memory/pr-review-patterns.md) bit again. The fix — making the structural shape itself enforce the invariant via field name choice — is a stronger pattern than a code comment.
Reviewer's "non-blocking observation" can hide a MUST-FIX. Round-2 review marked the tool_calls shape as a non-blocking observation worth documenting. On closer trace it turned out to actually break the very flow the PR was trying to improve. Always trace the full data path when a reviewer flags "this is webui-only / doesn't replay" — that claim is only true if the next-turn API call uses s.context_messages, but the fallback to session.messages fires on first-turn-cancelled sessions.

Three rounds of review, two Opus passes, 5 pre-release fixes (1 MUST-FIX + 2 SHOULD-FIX + 2 NIT). The kind of release where the contributors' code was strong but the integration surface caught real correctness bugs that only surface under careful end-to-end tracing.

dso2ng and others added 2 commits April 30, 2026 23:04

nesquena-hermes requested a review from nesquena April 30, 2026 23:08

nesquena approved these changes Apr 30, 2026

View reviewed changes

NocGeek and others added 2 commits April 30, 2026 23:15

nesquena-hermes requested a review from nesquena April 30, 2026 23:19

bergeouss and others added 3 commits April 30, 2026 23:24

nesquena approved these changes Apr 30, 2026

View reviewed changes

nesquena-hermes requested a review from nesquena April 30, 2026 23:44

nesquena approved these changes Apr 30, 2026

View reviewed changes

nesquena-hermes merged commit 031feda into master Apr 30, 2026
3 checks passed

nesquena-hermes deleted the release/v0.50.251 branch April 30, 2026 23:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release: v0.50.251#1371

release: v0.50.251#1371
nesquena-hermes merged 8 commits into
masterfrom
release/v0.50.251

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena left a comment

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena left a comment

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

nesquena left a comment

Uh oh!

Uh oh!

nesquena-hermes commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

nesquena-hermes commented Apr 30, 2026

Release v0.50.251

Constituent PR

Pre-release fixes applied

🟨 Perf: full table scan → indexed scoped query (commit 571cfed)

🟨 Orphan-parent guard

Regression suite (+5 tests)

Pre-release gate

Plan after approval

Held PRs (NOT in this release)

Compatibility notes

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Review — end-to-end ✅ (clean approve, bot's perf and orphan fixes verified)

Squash audit

Bot's pre-release fixes — both correct

Traced against upstream hermes-agent

End-to-end trace

Cross-tool consistency

Race / lock analysis

Security audit

Edge-case matrix

Tests

Minor observations (non-blocking)

Recommendation

Uh oh!

nesquena-hermes commented Apr 30, 2026

Update: 2 substantive commits added since approval

What changed

Tests

What I want from you

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Round 2 Review — end-to-end ✅ (clean approve, both my round-1 SHOULD-FIX flags addressed + 3 new PRs traced)

Squash audit (what's new since round 1)

My round-1 SHOULD-FIX items — both fixed ✅

Newly-bundled PRs traced

PR #1372 — manual cron persistence (@NocGeek)

PR #1375 — preserve reasoning, tool_calls, partial output on Stop/Cancel (@bergeouss)

PR #1373 — route sessions to profile dir even when dir doesn't exist (@bergeouss)

End-to-end traces — round 2 additions

Edge-case matrix (round 2 additions)

Other audit — confirmed correct

Tests

Minor observations (non-blocking)

Recommendation

Uh oh!

nesquena-hermes commented Apr 30, 2026

Update: 2 more PRs added + Opus pass-2 catches MUST-FIX in #1375

PRs added since last update

Opus pass-2 review caught MUST-FIX in #1375

2 NITs from the same Opus pass also fixed

#1374 put on hold

Final branch state on release/v0.50.251 (HEAD f53556b)

Tests

What I'm waiting on

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Round 3 Review — end-to-end ✅ (clean approve, my round-2 observation hardened into a MUST-FIX)

Squash audit

Round-2 observation now MUST-FIX'd ✅

NIT 1 — re.match → re.fullmatch for _PROFILE_ID_RE

NIT 2 — Expanded R19j coverage

Other audit — what I checked

Edge-case matrix (round 3 additions)

End-to-end trace — partial-tool-calls flow (post-rename)

Tests

Behavioural harness — confirmed re.match footgun

Minor observations (non-blocking)

Recommendation

Uh oh!

Uh oh!

nesquena-hermes commented Apr 30, 2026

Shipped — v0.50.251 ✅

What shipped (4 contributor PRs)

Pre-release fixes applied (5 total across 2 Opus passes)

🟨 Perf: full table scan → indexed scoped query (commit `571cfed`)

Final branch state on `release/v0.50.251` (HEAD `f53556b`)

NIT 1 — `re.match` → `re.fullmatch` for `_PROFILE_ID_RE`

Behavioural harness — confirmed `re.match` footgun