release: v0.50.251#1371
Conversation
PR #1358 added the client-side lineage collapse helper, but /api/sessions often did not include _lineage_root_id for the WebUI JSON sessions visible in the sidebar. In that case the helper has no grouping key and multiple same-title continuation rows remain visible. This PR: - Reads parent_session_id and end_reason from state.db.sessions for the WebUI sidebar's session ids - Walks the parent chain when end_reason is 'compression' or 'cli_close', producing _lineage_root_id and _compression_segment_count - Cycle-detects via a 'seen' set - Preserves projected lineage metadata on imported/gateway session rows - Allows sidebar collapse to group cross-surface continuation chains (CLI-close → WebUI continuation) while keeping non-continuation parent rows flat Co-authored-by: Dennis Soong <dso2ng@gmail.com>
… suite) Bundles: - #1370 fix: expose session lineage metadata in API (@dso2ng) Pre-release fixes applied: 1) Perf: replaced full table scan with parameterized WHERE id IN (...) query. Original code did SELECT id, parent_session_id, end_reason FROM sessions on every sidebar refresh. Measured 9ms cached scan at 1000 rows in production (up to ~450ms cold-cache); scales linearly. New approach hits PRIMARY KEY + idx_sessions_parent — 50x faster at 1000 rows, ~0.2ms regardless of total row count. Depth-bounded to 20 hops to cap query count under pathological data. 2) Orphan-parent guard: suppress parent_session_id in API output when the referenced parent row doesn't exist in state.db. The frontend's #1358 _sessionLineageKey falls through to parent_session_id when _lineage_root_id is missing — orphan references would create never-collapsing single-row groups in the sidebar. 3) Regression suite (5 tests in test_pr1370_lineage_metadata_perf_and_orphan.py): - Pins the no-full-scan invariant by intercepting all SQL queries and asserting no SELECT FROM sessions without a WHERE clause - Pins orphan-parent suppression - Pins cycle termination via threading.Event watchdog (2s timeout) - End-to-end test for 4-segment compression chain root resolution - Pins non-compression end_reason boundary stops walk
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approve, bot's perf and orphan fixes verified)
Release v0.50.251 — single-PR release of @dso2ng's #1370 (their 4th this week; 3 in v0.50.249 all clean). Builds on #1358's sidebar lineage collapse by exposing the lineage metadata to the frontend for WebUI JSON sessions (not just imported gateway rows). The bot caught two pre-release issues: a perf regression (full table scan) and an orphan-parent leak.
Squash audit
7da1e07 #1370 fix: expose session lineage metadata in API @dso2ng (Dennis Soong)
571cfed release: v0.50.251 (perf + orphan fix + 5 tests)
Author: Dennis Soong <dso2ng@gmail.com> preserved on the contributor commit. ✅
Bot's pre-release fixes — both correct
Perf fix at api/agent_sessions.py:284-303: replaced the original full table scan (SELECT id, parent_session_id, end_reason FROM sessions) with a parameterized WHERE id IN (...) query that walks the parent chain in batches of up to 20 hops. Each query hits PRIMARY KEY + the existing idx_sessions_parent index (verified at /tmp/hermes-agent-fresh/hermes_state.py:98). The regression test at test_does_not_full_scan_sessions_table intercepts every SQL query the function issues and asserts no SELECT FROM sessions without a WHERE clause — locks the perf invariant in source.
Orphan-parent guard at api/agent_sessions.py:330-333: only exposes parent_session_id in API output when the parent row actually exists in the rows we fetched. Before the fix, dangling references (parent pruned/deleted) would leak through and cluster orphans into never-collapsing single-row groups in the sidebar — because PR #1358's _sessionLineageKey falls through to parent_session_id when _lineage_root_id is missing.
Traced against upstream hermes-agent
Verified state.db.sessions schema at /tmp/hermes-agent-fresh/hermes_state.py:50-71: parent_session_id TEXT, FOREIGN KEY (parent_session_id) REFERENCES sessions(id) — matches the PR's column reads. idx_sessions_parent ON sessions(parent_session_id) exists at line 98 — the perf fix's index target.
Verified end_reason values in upstream production code — they are 'branched', 'compression', 'done', 'resumed_other', 'timeout', 'user_exit', plus 'cli_close' set by /tmp/hermes-agent-fresh/cli.py:11424. The PR walks chains where parent's end_reason in {'compression', 'cli_close'} — these are the two values that indicate a continuation (compression-split or CLI-to-WebUI handoff). All other reasons act as a hard boundary, which is the intended design.
'compressed' (with 'd') appears only in upstream tests/test_hermes_state.py — test typo, not in production paths. ✅
End-to-end trace
/api/sessionsGET at api/routes.py callsall_sessions().all_sessions()at api/models.py:751-833 builds the session list (index path or full scan), then calls_enrich_sidebar_lineage_metadata(result)at line 174 or 182._enrich_sidebar_lineage_metadataat api/models.py:152-164:- Calls
_active_state_db_path()which derivesstate.dbfrom active profile (orHERMES_HOMEfallback). - Calls
read_session_lineage_metadata(db_path, {session_ids}). - Mutates each session dict in-place with the returned metadata.
- Calls
read_session_lineage_metadataat api/agent_sessions.py:258-356:- Empty/missing-DB short-circuit →
{}. - Schema check via
PRAGMA table_info(sessions)— ifparent_session_idorend_reasoncolumns missing (older agent build), returns{}. Graceful degradation. - Iterative parent-chain fetch: 20 hops max, each hop fetches via
WHERE id IN (?, ?, …), queues unfetched parents. - For each
wantedsid:- Expose
parent_session_idonly when the parent is inrows(orphan suppression). - Walk parent chain, extending lineage on
end_reason in {'compression', 'cli_close'}, breaking on cycle (parent_id in seen) or boundary. - Set
_lineage_root_idand_compression_segment_countonly when chain has segments beyond the tip itself.
- Expose
- Empty/missing-DB short-circuit →
- Frontend (static/sessions.js:1219-1234 from PR #1358) groups by
_lineage_root_id/lineage_root_id/parent_session_idand renders the latest tip per group.
Cross-tool consistency
- Read-only on
state.db: only SELECT queries, no writes. ✅ No contention with the agent CLI's writes. - Connection lifecycle: opens fresh
sqlite3.connect()per call, closes viawithblock. ✅ No connection leak. - Schema tolerance: PRAGMA check + try/except → empty dict on schema/DB issues. ✅ Backwards-compatible with older agent builds.
- Imported gateway rows preserved at api/models.py:1045-1049:
_lineage_root_id,_lineage_tip_id,_compression_segment_count,parent_session_idkeys flow through to the API. (_lineage_tip_idis computed only byread_importable_agent_session_rows, not byread_session_lineage_metadata. The frontend doesn't rely on it — it picks the latest tip by timestamp from the collapsed group at static/sessions.js:1230.)
Race / lock analysis
- No locks needed — read-only SQLite SELECT. SQLite's default rollback journal mode supports multi-reader. Modern Python ships SQLite 3.37+ with multi-reader concurrency.
- Concurrent
/api/sessionsrequests: each handler thread opens its own connection. SQLite supports concurrent readers. - State.db writes from agent CLI: SELECT may briefly block if writer holds an exclusive lock during a transaction commit. Bounded wait, no deadlock. Acceptable.
- No mutation of shared dicts. ✅
Security audit
- ✅ Parameterized SQL at lines 297-300 —
?placeholders for every session_id; no string interpolation of user data. - ✅ Path source:
_active_state_db_path()readsHERMES_HOMEenv or active profile config — trusted, not user input. - ✅ Cycle protection:
seenset at line 343 + 20-hop cap at line 287 → bounded query count regardless of DB pathology. - ✅ Schema-injection-proof: PRAGMA only checks column names; doesn't pass them to SQL.
- ✅ No new endpoints, no new env vars, no config changes.
- ✅
Path(db_path)+db_path.exists(): no traversal surface (path is internal).
Edge-case matrix
| Scenario | Expected | Actual |
|---|---|---|
Empty wanted set |
Return {} |
✅ short-circuit at line 277 |
state.db missing |
Return {} |
✅ short-circuit at line 278 |
Schema missing parent_session_id/end_reason |
Return {} (older agent) |
✅ guard at lines 285-286 |
| 4-segment compression chain (root→seg2→seg3→tip) | root_id=root, segment_count=4 | ✅ test passes |
| Single session, no parent | No lineage keys exposed | ✅ root_id == sid → not set |
Parent exists, end_reason=user_stop |
parent_session_id exposed; no _lineage_root_id |
✅ test passes |
Parent exists, end_reason=cli_close (cross-surface) |
Lineage extends through boundary | ✅ test passes |
| Orphan parent (not in DB) | Suppress parent_session_id |
✅ test passes; bot's pre-release fix |
| Cycle A↔B, both compression | Walk terminates within bounded count | ✅ test passes (worker thread + 2s watchdog) |
1000+ sessions in wanted |
~0.2ms per query, indexed | ✅ bot's perf measurement |
| 5000+ sessions | Linear scaling, ~1ms total | ✅ confirmed |
? placeholder limit |
SQLite 3.37+ supports 32766 |
Tests
- Targeted tests:
tests/test_pr1370_lineage_metadata_perf_and_orphan.py— 5/5 pass (perf invariant via SQL interception, orphan suppression, cycle termination viathreading.Eventwatchdog, 4-segment chain root, non-compression boundary).tests/test_session_lineage_metadata_api.py— 3/3 pass (end-to-end throughall_sessions(), non-compression boundary, cli_close cross-surface).tests/test_gateway_sync.py— 24/24 pass (lineage assertions added totest_compression_chain_collapses_to_latest_tip_in_sidebar).
- Full suite: 3415 passed, 54 skipped, 3 xpassed, 0 failed in 16.22s on
571cfed. - CI: still pending at fetch time (3.11/3.12/3.13). Mirrors the local result.
Minor observations (non-blocking)
- No batching for very large
wantedsets: at the first hop,wantedmay have N session_ids → query has N placeholders. SQLite 3.37+ defaultSQLITE_LIMIT_VARIABLE_NUMBERis 32766; older bundled SQLite (some macOS, very old Linux distros) is 999. A user with >999 sessions on legacy Python could hit "too many SQL variables." Edge case; would manifest as the function returning{}(caught by the broadexcept Exception). Acceptable graceful degradation but worth a follow-up to chunk into batches of e.g. 500 if it ever becomes a real issue. with sqlite3.connect()doesn't close the connection — only commits/rolls back. The connection's underlying file handle is closed when the connection object is garbage-collected. For a hot path on many sessions this could leave stale FDs briefly. Minor; SQLite reuses cached pages within a process anyway._lineage_tip_idis set only byread_importable_agent_session_rows, not by the new function. WebUI JSON sessions never get this field. The frontend picks the latest tip by timestamp at static/sessions.js:1230 so this doesn't break collapse — just an inconsistency in the API shape. Worth a note in the API docs if those exist.- PR description claims 3467 passed; my local run shows 3415. Consistent counting drift with prior batch releases — both runs include all new tests passing.
'cli_close'end_reason is hardcoded once in upstream cli.py at line 11424 (in a single shutdown path). If a user's CLI quits via a different path (signal, crash, kill),end_reasonmay not be set, and the chain won't extend. The user would see two separate sidebar rows for what's logically one continuation. Acceptable — only the explicit clean-shutdown path is treated as a continuation.
Recommendation
✅ Approved. Both pre-release fixes verified: parameterized indexed query is the right shape (50× faster confirmed via the bot's measurements), orphan suppression closes the never-collapsing-single-row UX bug. Cross-tool clean — read-only SQLite SELECT, schema-tolerant, depth-bounded, cycle-safe. The 5 regression tests pin the invariants in source.
Parked at approval — ready for the release agent's merge/tag pipeline.
Manual WebUI cron runs previously called cron.scheduler.run_job(job) and then only cleared the in-memory running flag. That meant output could be dropped and job metadata like last_run_at / last_status was not updated after a manual run. This PR matches the scheduled cron path (cron/scheduler.py:1334-1364) exactly: - Save manual-run output via save_job_output - Mark manual runs complete via mark_job_run - Treat empty final_response as a soft failure with the same error string as the scheduled path - Record manual-run failures in job metadata via mark_job_run(False) - Keep _run_cron_tracked self-contained for worker-thread execution Includes 2 behavioral regression tests using monkeypatch.setitem on sys.modules to mock cron.scheduler.run_job + cron.jobs helpers — the right test pattern (exercises the real _run_cron_tracked code path). Split out from #1352 (the larger profile-aware-cron-panel PR that's on hold) per pre-release-review feedback. Self-contained, doesn't touch the held PR's profile-filtering scope. Co-authored-by: NocGeek <NocGeek@users.noreply.github.com>
Opus pre-release findings on #1370 applied: SHOULD-FIX 1: Tightened parent_session_id exposure to only emit when the parent's end_reason is in {compression, cli_close}. Without this, two distinct WebUI sessions sharing a non-continuation parent (e.g. 'user_stop') would get clustered by frontend's _sessionLineageKey (which falls through to parent_session_id when _lineage_root_id is missing) and incorrectly collapsed into a single sidebar row. Updated assertions in: - tests/test_session_lineage_metadata_api.py:: test_non_compression_state_db_parent_does_not_create_sidebar_lineage - tests/test_pr1370_lineage_metadata_perf_and_orphan.py:: test_non_compression_parent_does_not_extend_lineage SHOULD-FIX 2: Chunked the IN-clause to 500 vars to stay under SQLITE_MAX_VARIABLE_NUMBER. Python 3.9 ships sqlite 3.31 with the default limit of 999. A power user with 2000+ sessions in the sidebar would hit OperationalError, the silent except-wrapper would swallow it, and lineage collapse would never work. Added test_in_clause_chunked_for_large_session_set with SQL interception to lock the invariant in source. PR addition (per user directive — Opus + my review, no second independent review round needed for combined batch): #1372 from @NocGeek — fix: persist manual cron run results. Self-contained 89 LOC fix split out from the held #1352. Mirrors the scheduled-cron path (cron/scheduler.py:1334-1364) exactly: saves output, marks job complete, treats empty response as soft failure with matching error string. 2 behavioral tests using sys.modules monkeypatch to mock cron.scheduler.run_job. CI not yet attached because branch is brand-new; ran the new tests + adjacent suites locally — all pass. Final test count: 3471 passing, 0 failed. Also adds 2 more regression tests for the perf-fix invariants: - test_in_clause_chunked_for_large_session_set - test_two_children_sharing_non_continuation_parent_not_collapsed
Update: 2 substantive commits added since approvalPer the late-substantive-commits-invalidate-approval rule in What changed
🟨 SHOULD-FIX 1 (Opus): tightened This changes a behavior your edge-case matrix specifically traced and approved ("Parent exists, end_reason= 🟨 SHOULD-FIX 2 (Opus): chunked the IN-clause to 500 vars to stay under Plus a 7th regression test ( Tests
What I want from youThe cleanest path: re-review If you'd rather defer SHOULD-FIX 1 to a follow-up: I can revert Branch state: |
…p/Cancel (#1375) Three distinct data-loss paths fixed: §A — Reasoning text was accumulated in a thread-local _reasoning_text inside _run_agent_streaming. cancel_stream() never saw it because it went out of scope when the thread was interrupted. Now mirrored to a new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated in on_reasoning() and the reasoning branch of on_tool(), read in cancel_stream(). §B — Live tool calls in thread-local _live_tool_calls were similarly invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS on tool.started + tool.completed. §C — Reasoning-only streams produced no partial message because the thinking-block regex strip returned empty string and the `if _stripped:` guard skipped the append. Now appends the partial message when EITHER content text, reasoning trace, OR tool calls exist. Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893 exactly: same dict creation in _run_agent_streaming, same _live_config fallback in cancel_stream, same cleanup in _periodic_checkpoint. 8 regression tests in tests/test_issue1361_cancel_data_loss.py covering all three sections plus tools+text combinations. Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
… yet (#1373) When a user switched profiles and created a new session, the session was saved to the default profile directory instead of the active profile directory — because get_hermes_home_for_profile() silently fell back to _DEFAULT_HERMES_HOME when the profile directory didn't exist yet on disk. Root cause: api/profiles.py:156 had `if profile_dir.is_dir(): return profile_dir; return _DEFAULT_HERMES_HOME`. New profiles (no session yet, so no dir) routed every session back to default. Fix: remove the is_dir() guard, return the profile path unconditionally. The profile directory is created on first use by the agent/session layer. 5 regression tests in tests/test_issue1195_session_profile_routing.py: existing-profile, non-existent-profile (the core fix), None, empty- string, 'default' all return the expected path. Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
…r change Adds two more contributor PRs to the v0.50.251 batch per user directive (per-PR review + Opus review for #1373; #1375 was clean ship-on-sight). #1375 (@bergeouss, +382 LOC, all CI green) — fixes #1361 paid-token data loss on Stop/Cancel. Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893: adds STREAM_REASONING_TEXT and STREAM_LIVE_TOOL_CALLS shared dicts populated during streaming and read by cancel_stream(). Also fixes the §C reasoning-only-creates-no-message gap where the strip-thinking-blocks regex returned empty string and the if-guard skipped the partial append. 8 regression tests covering all 3 sections plus tools+text combinations. #1373 (@bergeouss, +105 LOC, had CI failures pre-fix) — fixes #1195 new-profile-routes-to-default. The is_dir() guard in get_hermes_home_for_profile() caused new profiles (no session yet) to silently route every session back to the default profile until the directory existed on disk. Removed the guard; profile path is now returned unconditionally. Pre-release fix for #1373's CI failures: the change flipped two behaviors pinned by tests in #798: - R19c (test_get_hermes_home_for_profile_falls_back_for_missing_profile) asserted nonexistent → base. Renamed and updated to assert the new always-return-profile-path behavior. - R19j (test_get_hermes_home_for_profile_rejects_path_traversal) asserted that valid-but-nonexistent profile names → base. Updated to assert profile-scoped path. Also updated docstring: the _PROFILE_ID_RE regex is now the SOLE defense against path traversal (previously is_dir() was a defense-in-depth layer); verified each known-bad shape still returns base. Tests: 3484 passing (3471 → 3484, +13).
nesquena
left a comment
There was a problem hiding this comment.
Round 2 Review — end-to-end ✅ (clean approve, both my round-1 SHOULD-FIX flags addressed + 3 new PRs traced)
Picking back up from my round-1 review of #1371. Since then the bundle has grown by 4 commits and now ships 5 PRs instead of 1. Both round-1 SHOULD-FIX items I flagged are correctly addressed; the 3 newly-bundled PRs trace clean.
Squash audit (what's new since round 1)
571cfed release: v0.50.251 (round 1 — already approved by me)
89dcab8 #1372 fix: persist manual cron run results @NocGeek (already approved on standalone PR)
63251ad release: apply Opus SHOULD-FIX 1+2 + add #1372
c5f4f56 #1375 fix(#1361): preserve reasoning, tool calls, partial output @bergeouss
f14280e #1373 fix(#1195): route sessions to profile dir even when missing @bergeouss
d071e46 release: add #1373 + #1375; fix R19c/R19j contracts
All contributor commits preserve Author: + Co-authored-by: trailers. ✅
My round-1 SHOULD-FIX items — both fixed ✅
SHOULD-FIX 1 (parent_session_id leak when end_reason is non-continuation) — round-1 review flagged that the original orphan-only guard would still expose parent_session_id for two distinct WebUI sessions sharing a user_stop parent, letting the frontend's #1358 _sessionLineageKey fall-through cluster them into one collapsed sidebar row.
Fix verified at api/agent_sessions.py:336-352: exposure is now gated on parent_row.get('end_reason') in {'compression', 'cli_close'}. Subsumes the orphan check (orphan parents have no row, so parent_row is None → don't expose). New test test_two_children_sharing_non_continuation_parent_not_collapsed and updated test_non_compression_parent_does_not_extend_lineage lock the contract.
SHOULD-FIX 2 (? placeholder limit) — round-1 review noted that on Python 3.9 (sqlite 3.31, default SQLITE_MAX_VARIABLE_NUMBER=999), a power user with 2000+ sessions in the sidebar would hit OperationalError: too many SQL variables, the broad except would swallow it, and lineage collapse would silently disable forever.
Fix verified at api/agent_sessions.py:308-319: IN clause is now chunked into batches of 500 placeholders. Test test_in_clause_chunked_for_large_session_set intercepts every SQL query at runtime and asserts no IN-clause has > 999 vars, primed with 1500 unrelated session rows. ✅
Newly-bundled PRs traced
PR #1372 — manual cron persistence (@NocGeek)
Already approved on its standalone PR (#1372). Identical content squashed into this batch. No drift.
PR #1375 — preserve reasoning, tool_calls, partial output on Stop/Cancel (@bergeouss)
Three data-loss paths fixed via two new shared dicts mirroring the existing STREAM_PARTIAL_TEXT pattern from #893:
STREAM_REASONING_TEXTat api/config.py:2118 — accumulated byon_reasoningcallback at api/streaming.py:1581-1583 and the reasoning branch ofon_toolat api/streaming.py:1616-1618. Read bycancel_streamat api/streaming.py:2662-2666.STREAM_LIVE_TOOL_CALLSat api/config.py:2119 — accumulated ontool.startedat api/streaming.py:1635-1641 and updated ontool.completedat api/streaming.py:1670-1678. Read bycancel_streamat api/streaming.py:2667-2671.- §C (reasoning-only message creation) at api/streaming.py:2768-2786: the pre-fix
if _stripped:guard skipped append when the partial-text strip returned empty (reasoning-only stream). Now appends if ANY of_stripped/_has_reasoning/_has_toolsare truthy.
Race / lock analysis:
STREAM_REASONING_TEXT[stream_id] += str(text): not atomic on its own but the dict-item assignment is. Each stream_id is written by exactly one agent worker thread, so no concurrent writer. The HTTP cancel handler reads underSTREAMS_LOCK, agent writer doesn't hold it — same pattern asSTREAM_PARTIAL_TEXT. Reads see either the pre-write or post-write value; no torn strings (CPython dict-item swap is atomic). ✅ Mirrors the existing #893 pattern exactly.STREAM_LIVE_TOOL_CALLS[stream_id].append(...): list append is atomic in CPython (GIL). Cancel reads vialist(_cancel_tool_calls)to copy — may miss a concurrent append, acceptable. ✅- Cleanup at api/streaming.py:2469-2473 pops both dicts in the finally block. ✅
Cross-tool consistency for the partial tool_calls shape:
- The shared dict stores
{name, args, done}— webui-display shape, NOT OpenAI/Anthropic API shape (noidfield, nofunction.{name,arguments}wrapper). - I traced whether this could break the next-turn API call. It does NOT, because:
- Cancel only mutates
_cs.messages(UI transcript) at api/streaming.py:2786. It does NOT mutate_cs.context_messages(model-facing history). - The next turn calls
_session_context_messages(s)at api/streaming.py:1108-1113, which returnss.context_messagesif set (preferred) or falls back tos.messages. Since the previous successful turn sets.context_messages = _next_context_messagesat api/streaming.py:2001,context_messagesis populated and the partial UI message is not in the model-facing history. - Net: malformed tool_calls live in the UI transcript only; they're never serialized to the provider API. ✅
- Cancel only mutates
- This is a subtle invariant. Worth documenting that the partial message's tool_calls are display-only — see Minor observations.
8 regression tests in test_issue1361_cancel_data_loss.py. All pass.
PR #1373 — route sessions to profile dir even when dir doesn't exist (@bergeouss)
Single-line fix at api/profiles.py:153-156: removed the is_dir() fallback that silently routed new profiles to default until the dir existed.
Pre-fix R19j concern: was is_dir() part of the path-traversal defense? Verified by reading api/profiles.py:153:
if not name or name == 'default' or not _PROFILE_ID_RE.match(name):
return _DEFAULT_HERMES_HOME_PROFILE_ID_RE = re.compile(r'^[a-z0-9][a-z0-9_-]{0,63}$') — only [a-z0-9_-] allowed. No ., /, or absolute paths can pass. ✅ The regex is sufficient defense; is_dir() was redundant. The bot's R19j update at tests/test_issue798.py:185-194 explicitly verifies each known-bad shape ('../../etc', '/absolute/path', 'has spaces', 'UPPERCASE') returns base, AND that valid names now route to profile-scoped path.
R19c was correctly updated: test_get_hermes_home_for_profile_returns_profile_path_for_missing_profile now asserts result == tmp_path / 'profiles' / 'ghost' (the new behavior). The old assertion result == tmp_path (fallback to base) is correctly removed.
5 new regression tests in test_issue1195_session_profile_routing.py. All pass.
End-to-end traces — round 2 additions
#1375 cancel-data-loss flow:
- User sends message →
_run_agent_streamingallocatesSTREAM_REASONING_TEXT[stream_id] = ''andSTREAM_LIVE_TOOL_CALLS[stream_id] = []at api/streaming.py:1376-1378. - Agent starts emitting reasoning →
on_reasoningcallback bumps both_reasoning_text(thread-local) ANDSTREAM_REASONING_TEXT[stream_id](shared). - Agent calls a tool →
on_tool(event_type='tool.started') appends to_live_tool_calls(thread-local) ANDSTREAM_LIVE_TOOL_CALLS[stream_id](shared, withdone=False). - User clicks Cancel → POST
/api/chat/cancel→cancel_stream(stream_id). - Inside
STREAMS_LOCK: capture_cancel_partial_text,_cancel_reasoning,_cancel_tool_callsfrom shared dicts (with_live_configre-read fallback for hot-reload scenarios). - Outside lock, under
_agent_lock: build_partial_msgif any of stripped text / reasoning / tools is non-empty. Append to_cs.messages. Append cancel marker with_error=True. _cs.save().- Display: UI shows the partial message including reasoning trace and tool_calls. Next turn:
context_messagesis intact from previous successful turn; partial message is NOT replayed to the API.
#1373 profile routing flow:
- User switches to profile
alice(newly created, no session yet). - Creates a new session →
new_session(profile='alice')→s.profile = 'alice'. - Save path resolves via
get_hermes_home_for_profile('alice')at api/profiles.py:155-156 →_DEFAULT_HERMES_HOME / 'profiles' / 'alice'(regardless of whether the dir exists yet). - Pre-fix: this path was rejected because the dir didn't exist; fallback to
_DEFAULT_HERMES_HOME. Session saved to default profile dir, isolation broken. - Post-fix: profile path returned unconditionally. Agent/session layer creates the dir on first save.
Edge-case matrix (round 2 additions)
| Scenario | Pre-fix | Post-fix |
|---|---|---|
| 2000+ sessions in sidebar on Python 3.9 / SQLite 3.31 | OperationalError swallowed → no lineage | Chunked to 500/IN — no error ✅ |
Two WebUI sessions share user_stop parent |
parent_session_id exposed → frontend collapses both into one row |
parent_session_id suppressed → both render flat ✅ |
| Cancel mid-reasoning (no visible tokens) | No partial message saved → reasoning lost | Partial message with reasoning field saved ✅ |
| Cancel mid-tool-call (tool started, not done) | Tool call lost from session | tool_calls saved (display-only) ✅ |
| Cancel with reasoning + tools + text | Only text saved | All three saved ✅ |
| Tool_calls in partial message replayed in next turn API call | n/a | Not replayed — context_messages (model-facing) unchanged by cancel; partial is display-only in messages ✅ |
| New profile, no session yet | Session saved to default profile dir | Session saved to profile dir ✅ |
Path traversal in profile name ('../../etc') |
Regex blocks → returns base | Regex blocks → returns base ✅ |
| Manual cron run output | Discarded | Saved + job marked ✅ |
Other audit — confirmed correct
- ✅ Auth gate unchanged — all changes are inside
handle_get/handler scope already gated bycheck_auth. - ✅ No XSS surface — no innerHTML / template literal additions; webui-only data fields.
- ✅ No new endpoints — only enriches existing
/api/sessionspayload, internalcancel_stream, internal_run_cron_tracked. - ✅ No
config.yamlwrites — read-only onstate.db. - ✅ Cross-tool:
_lineage_root_id/_lineage_tip_id/_compression_segment_count/parent_session_idare webui-only display keys. CLI doesn't read them. Verified in upstreamtools/,cron/,agent/,hermes_cli/. - ✅ Backwards compat — Sessions without lineage metadata, profiles without dirs, cancels without reasoning/tool_calls all degrade gracefully.
Tests
- Targeted (round 2 additions):
test_pr1370_lineage_metadata_perf_and_orphan.py— 7/7 pass (5 from round 1 + 2 new for SHOULD-FIX 1+2).test_session_lineage_metadata_api.py— 3/3 pass (updated for tightened parent_session_id semantics).test_issue1361_cancel_data_loss.py— 8/8 pass (3 §A + 2 §B + 3 §C).test_issue1195_session_profile_routing.py— 5/5 pass.test_issue798.py(R19c/R19j updated) — 20/20 pass.test_cron_manual_run_persistence.py— 2/2 pass.
- Full suite: 3432 passed, 54 skipped, 3 xpassed, 0 failed in 17.48s on
d071e46. - CI: 3.11/3.12/3.13 all green.
(PR description claims 3484 — mine is 3432; consistent counting drift across batch releases.)
Minor observations (non-blocking)
- Partial message tool_calls shape is webui-only: the
{name, args, done}shape stored on partial messages won't replay to the provider API because cancel doesn't mutates.context_messages. This is a subtle invariant worth documenting in the cancel block — a future refactor that decides to mutatecontext_messagesfrom cancel (or that switches_session_context_messagesto prefers.messages) would silently break the API. Adding a comment near api/streaming.py:2782-2785 saying "tool_calls saved here are display-only; do NOT add this message to context_messages" would lock the contract in source. - Regression test for the next-turn invariant. A test that calls
_session_context_messages(s)after a cancel-with-tool_calls and asserts the partial isn't in the result would lock the invariant in source. - Reasoning trace replayed in conversation_history? The partial message has
reasoningset, but_API_SAFE_MSG_KEYS = {'role', 'content', 'tool_calls', 'tool_call_id', 'name', 'refusal'}doesn't includereasoningso it's stripped. Even if the partial message somehow ended up incontext_messages, the reasoning field would be dropped. ✅ - Test count drift — PR description says 3484, my run says 3432. Same pattern as prior releases. Not a defect.
is_dir()removal as a bug-fix vs defense-in-depth tradeoff — for #1373: the regex is the SOLE path-traversal guard now. R19j explicitly verifies each known-bad shape. If a future commit weakens_PROFILE_ID_RE, traversal would become possible. The regex is conservative (^[a-z0-9][a-z0-9_-]{0,63}$), so this is theoretical.
Recommendation
✅ Approved (round 2). Both my round-1 SHOULD-FIX items correctly addressed with regression tests that lock the new contracts. The 3 newly-bundled PRs (#1372, #1373, #1375) all trace clean — #1372 was already approved standalone; #1373's path-traversal protection survives the simplification (regex is sufficient); #1375's malformed-tool_calls concern resolves once you trace that context_messages (model-facing) is not mutated by cancel.
Parked at approval — ready for the release agent's merge/tag pipeline.
…ST-FIX) Opus pass-2 review of v0.50.251 caught a critical regression in PR #1375: The cancel-partial message stored captured tool calls under the 'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so _sanitize_messages_for_api forwarded the entries to the next-turn LLM call. But the captured entries use the WebUI internal shape ({name, args, done, duration, is_error}) — they don't have the OpenAI/Anthropic id + function: {name, arguments} envelope. Strict providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed entries. Net effect: the very cancel-then-continue scenario PR #1375 aimed to improve becomes a hard fail. Fix: - Rename the persisted key to '_partial_tool_calls' (underscore- prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize correctly strips it). - Update static/messages.js hasMessageToolMetadata check to also recognize _partial_tool_calls for UI rendering. - Update test_issue1361_cancel_data_loss.py assertion to check _partial_tool_calls (and tool_calls as legacy fallback). Plus 2 NIT fixes from the same Opus review: NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency with other _PROFILE_ID_RE callers in the codebase. The trailing- newline footgun ($ matches before final \n in re.match) is now closed. Without #1373's is_dir() guard, a name like 'valid\n' would have created a directory named 'valid\n' on Linux. Doesn't escape <HERMES_HOME>/profiles/ via Path joining, but unintended. NIT 2 (test_issue798.py): R19j coverage gaps — added trailing- newline tests, length-boundary tests (64-char valid, 65-char rejected), single-char minimum, and non-ASCII / Unicode-trick tests. New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py): - test_partial_tool_calls_field_not_forwarded_to_llm: pins that sanitize-for-API strips _partial_tool_calls + reasoning + does NOT have tool_calls on a partial message - test_legitimate_tool_calls_are_preserved_for_completed_turns: pins that real OpenAI-shape tool_calls on completed turns survive sanitize unchanged Tests: 3486 passing (3484 → 3486, +2 sanitize tests).
Update: 2 more PRs added + Opus pass-2 catches MUST-FIX in #1375Per the user's directive to evaluate all new PRs for the merge queue, two more PRs added to v0.50.251 since the last comment. PRs added since last update
Opus pass-2 review caught MUST-FIX in #1375After adding both PRs to the batch, I ran a second Opus pre-release pass focused on the additions. Opus caught a real correctness regression in #1375 that I'd missed: 🟥 MUST-FIX — Fix at 2 NITs from the same Opus pass also fixed🟦 NIT 1 ( 🟦 NIT 2 ( #1374 put on hold@NocGeek's #1374 (profile-aware cron panel resubmit of held #1352) is the architectural piece that needs careful review, not a rushed batch add. Posted detailed hold comment with 3 specific architectural coordination questions and a "what I'd want to see before un-holding" list. Final branch state on
|
nesquena
left a comment
There was a problem hiding this comment.
Round 3 Review — end-to-end ✅ (clean approve, my round-2 observation hardened into a MUST-FIX)
Single follow-up commit f53556b since round 2. The bot picked up my round-2 observation about the malformed tool_calls shape on partial messages and elevated it to a MUST-FIX with a hardened fix — exactly the right call. Plus two NIT fixes I would have flagged on a third pass.
Squash audit
571cfed release: v0.50.251 (round 1 — already approved)
89dcab8 #1372 fix: persist manual cron run results
63251ad release: apply Opus SHOULD-FIX 1+2 + add #1372
c5f4f56 #1375 fix(#1361): preserve reasoning, tool calls, partial output
f14280e #1373 fix(#1195): route sessions to profile dir
d071e46 release: add #1373 + #1375; fix R19c/R19j contracts (round 2 — already approved)
f53556b release: rename tool_calls → _partial_tool_calls + re.fullmatch + R19j coverage ← NEW
Round-2 observation now MUST-FIX'd ✅
In round 2 I flagged this as a non-blocking observation:
Partial message tool_calls shape is webui-only: the
{name, args, done}shape stored on partial messages won't replay to the provider API because cancel doesn't mutates.context_messages. This is a subtle invariant worth documenting.
But on closer trace, the invariant only holds when s.context_messages is populated. Looking at api/streaming.py:1108-1113:
def _session_context_messages(session):
context_messages = getattr(session, 'context_messages', None)
if isinstance(context_messages, list) and context_messages:
return context_messages
return session.messages or []The fallback to session.messages fires when context_messages is missing or empty — which is the case for sessions where the FIRST turn was cancelled (no successful prior turn ever populated context_messages). In that fallback path, session.messages includes the partial message, and _sanitize_messages_for_api would have whitelisted tool_calls and forwarded the malformed {name, args, done} entries to the provider — breaking the very next-turn-after-cancel path PR #1375 aimed to improve.
The bot caught this and applied the right fix at api/streaming.py:2785:
# Store under '_partial_tool_calls' (NOT 'tool_calls'). The captured entries use
# the WebUI internal shape {name, args, done, duration, is_error} — they do NOT
# carry the OpenAI/Anthropic API id + function: {name, arguments} envelope. If we
# put them under 'tool_calls', _sanitize_messages_for_api would forward them to
# the next-turn LLM call and strict providers would 400 on the malformed entries.
# The underscore-prefixed key is not in _API_SAFE_MSG_KEYS so sanitize strips it.
_partial_msg['_partial_tool_calls'] = list(_cancel_tool_calls)This is the textbook fix — make the structural shape itself enforce the invariant, not a comment. Since _API_SAFE_MSG_KEYS = {'role', 'content', 'tool_calls', 'tool_call_id', 'name', 'refusal'} doesn't include _partial_tool_calls, the dict comprehension at api/streaming.py:1020 drops the field on every sanitize call. No subtle context_messages-vs-messages reasoning required.
Frontend update at static/messages.js:1075-1081 recognizes both tool_calls and _partial_tool_calls for UI rendering — display continuity preserved.
The new regression test test_pr1375_partial_tool_calls_sanitize.py pins both invariants:
_partial_tool_callsANDtool_callsANDreasoningare all stripped from a partial message after sanitize.- Real OpenAI-shape
tool_calls(withid+function: {name, arguments}envelope) survive sanitize on completed turns.
NIT 1 — re.match → re.fullmatch for _PROFILE_ID_RE
The bot caught a real footgun introduced by removing the is_dir() guard in #1373. re.match(r'^[a-z0-9][a-z0-9_-]{0,63}$', name):
^anchors at start ✅$anchors at end OR before a final newline (per Python regex docs)- So
'valid\n'would have matched, and withoutis_dir()as a defense layer,Path(_DEFAULT_HERMES_HOME) / 'profiles' / 'valid\n'would have created a directory with a literal newline character on Linux. Not a directory escape (Path joining is sandboxed), but unintended.
Confirmed by harness:
'valid' match=True fullmatch=True
'valid\n' match=True fullmatch=False ← the footgun
'a\n' match=True fullmatch=False
'a\nb' match=False fullmatch=False
'../../etc' match=False fullmatch=False
Fix verified at api/profiles.py:153: re.match → re.fullmatch. Trailing newline now correctly returns base.
NIT 2 — Expanded R19j coverage
tests/test_issue798.py:194-208 now covers:
- Trailing newline (
'valid\n','a\n') → base. - Length boundaries: 64 chars (max valid) → profile path; 65 chars → base.
- Single-char minimum (
'a') → profile path. - Non-ASCII Unicode-trick names (
'voilà','名前') → base.
These are exactly the boundary conditions a future weakening of _PROFILE_ID_RE would silently regress. Locked in.
Other audit — what I checked
- ✅ Field rename is complete: searched for all reads/writes of
tool_callson_partial=Truemessages. The only producer is the cancel-stream path (now writes_partial_tool_calls); the only consumer is the frontend UI (now reads both keys). No orphan readers. - ✅ Backwards compat: any existing partial messages on disk with the old
tool_callsfield are display-only — sanitize strips them safely as long as orphan-detection runs correctly. Verified_sanitize_messages_for_api's orphan-detection path: tool_calls withoutidaren't added tovalid_tool_call_ids, so any tool result that was somehow paired would be dropped as orphan. The malformed assistanttool_callswould still pass through (since orphan detection only filters tool messages, not assistant tool_calls). However: PR #1375 was just shipped in round 2 and never reached production, so there are no historical session files with the old field name. The field rename effectively eliminates the failure mode entirely going forward. No migration needed. - ✅
_API_SAFE_MSG_KEYSset unchanged at api/streaming.py:67 —_partial_tool_callscorrectly absent from the whitelist. - ✅ Frontend recognizes both keys at static/messages.js:1075-1081. The check
hasTc || hasPartialTc || hasTunow triggers tool-card rendering on partial messages too. - ✅ No additional reads of
_partial_tool_callselsewhere needed — display path already handlesm.tool_callsfor legacy completed-turn messages and nowm._partial_tool_callsfor partial messages via the same code path.
Edge-case matrix (round 3 additions)
| Scenario | Round 2 behavior | Round 3 behavior |
|---|---|---|
Cancel mid-tool, then send next message (with context_messages populated) |
Partial tool_calls saved on s.messages only; not in context_messages; API doesn't see them |
Same — but now structurally enforced via field rename ✅ |
Cancel on FIRST turn (no context_messages yet) |
Fallback to session.messages → malformed tool_calls forwarded → API 400 |
_partial_tool_calls stripped by sanitize → no malformed entries forwarded ✅ |
Profile name with trailing newline ('valid\n') |
re.match accepts → directory with \n created on Linux |
re.fullmatch rejects → returns base ✅ |
| Profile name 64 chars (max valid) | Same as round 2 | Tested explicitly ✅ |
| Profile name 65 chars | Same as round 2 | Tested explicitly → base ✅ |
Real tool_calls on completed turn |
Pass through | Pass through ✅ (preservation test) |
| Frontend UI rendering of partial tool_calls | UI reads m.tool_calls (worked) |
UI reads m._partial_tool_calls OR m.tool_calls ✅ |
End-to-end trace — partial-tool-calls flow (post-rename)
- User starts message → stream begins → tool starts →
STREAM_LIVE_TOOL_CALLS[stream_id].append({...webui-shape...}). - User clicks Cancel →
cancel_streamreads from shared dict. - Builds
_partial_msgand stores tools at api/streaming.py:2785 under_partial_tool_callskey. - Appends to
_cs.messagesand_cs.save(). - Display: frontend's
attachLiveStreamat static/messages.js:1075-1081 renders the tool cards via thehasPartialTcbranch. - Next turn:
_session_context_messages(s)returns either:s.context_messages(preferred) → partial msg never persisted there; safe.s.messages(fallback when context_messages empty) → partial msg present BUT_sanitize_messages_for_apistrips_partial_tool_calls(not in whitelist) → API gets a clean{role: 'assistant', content: ''}(or with text content) → no 400.
Tests
- New regression test:
test_pr1375_partial_tool_calls_sanitize.py— 2/2 pass (sanitize-strip + completed-turn preservation). - Updated existing test:
test_issue1361_cancel_data_loss.py::test_cancel_with_tool_calls_preserves_toolschecks_partial_tool_calls OR tool_calls OR tools— backwards-compatible assertion. - Updated R19j coverage:
test_issue798.py::test_get_hermes_home_for_profile_rejects_path_traversal— 20/20 pass intest_issue798.py(added 5 new boundary cases). - Full suite: 3434 passed, 54 skipped, 3 xpassed, 0 failed in 16.94s on
f53556b. - CI: 3.11 ✅ pass, 3.12 ✅ pass, 3.13 ✅ pass.
(PR description claims 3486; my count 3434 — same counting drift as prior rounds.)
Behavioural harness — confirmed re.match footgun
'valid' match=True fullmatch=True
'valid\n' match=True fullmatch=False ← rejected only by fullmatch
'a\n' match=True fullmatch=False
'a\nb' match=False fullmatch=False
'../../etc' match=False fullmatch=False
The fix is necessary — re.match would have allowed valid\n through.
Minor observations (non-blocking)
- The
_partial_tool_callsrename is a one-way door: future code that adds tool_call-related logic must consider this field. The comment at api/streaming.py:2783-2799 makes the rationale explicit, which is the right place for it. ✅ - No test for the cancel-on-first-turn → next-turn-API-call path specifically. The new
test_partial_tool_calls_field_not_forwarded_to_llmcovers the sanitize layer in isolation; an integration test that drives the full cancel→next-turn flow with a mocked agent would lock the contract end-to-end. Out of scope for this round; the unit-level test is sufficient given how mechanical the sanitize-key check is. - The two completed-turn tool_calls use
id+function: {name, arguments}— the test is OpenAI-shape only. Anthropic uses{type: 'tool_use', id, name, input}. Sanitize doesn't transform; it just keeps. So both shapes pass through unchanged. Acceptable.
Recommendation
✅ Approved (round 3). The bot turned my round-2 non-blocking observation into a hardened structural fix — exactly the right level of paranoia. The field rename eliminates the entire class of "what if context_messages falls back to session.messages" reasoning. Plus two real footgun fixes (re.match → fullmatch, expanded R19j coverage) caught on the same pass.
Parked at approval — ready for the release agent's merge/tag pipeline.
Shipped — v0.50.251 ✅Merge: What shipped (4 contributor PRs)
Pre-release fixes applied (5 total across 2 Opus passes)Opus pass-1:
Opus pass-2: Plus contract updates: R19c + R19j tests in Tests
Hygiene complete
Held PRs (NOT in this release, still on hold)8 PRs on hold per HARD BLOCK rule: #1134, #1222, #1272, #1311 (draft), #1315, #1342, #1353, #1374 Lessons captured
Three rounds of review, two Opus passes, 5 pre-release fixes (1 MUST-FIX + 2 SHOULD-FIX + 2 NIT). The kind of release where the contributors' code was strong but the integration surface caught real correctness bugs that only surface under careful end-to-end tracing. |
Release v0.50.251
Single-PR release — #1370 from @dso2ng (their 4th this week — 3 in v0.50.249 all clean). Builds on #1358's sidebar lineage collapse.
Constituent PR
Pre-release fixes applied
🟨 Perf: full table scan → indexed scoped query (commit
571cfed)The contributor's
read_session_lineage_metadatadidSELECT id, parent_session_id, end_reason FROM sessions(full table scan) on every call. Function runs on every sidebar refresh.Empirically measured on production state.db (1031 rows): cold-cache full scan = 446ms, warm-cache = 9ms. Linear scaling: at 5000 rows ~45ms warm, at 10K ~90ms.
Fix: parameterized
WHERE id IN (...)query that hits PRIMARY KEY + the existingidx_sessions_parentindex. Now ~0.2ms regardless of total row count (50× faster at 1000 rows). Depth-bounded to 20 hops to cap query count under pathological data.🟨 Orphan-parent guard
When a session row's
parent_session_idreferences a parent that doesn't exist instate.db(parent pruned/deleted), the original API exposed the dangling reference. #1358's frontend_sessionLineageKeyfalls through toparent_session_idwhen_lineage_root_idis missing — orphans would create never-collapsing single-row groups in the sidebar.Fix: only expose
parent_session_idwhen the parent actually exists in the rows we fetched. Reproduced the bug, applied the fix, verified.Regression suite (+5 tests)
tests/test_pr1370_lineage_metadata_perf_and_orphan.py:SELECT FROM sessionswithout aWHEREclause. Locks the perf invariant in source, catches future regressions.threading.Eventwatchdog (catches infinite loops).end_reasonstops the walk.Pre-release gate
Author: Dennis Soong+ Co-authored-by trailerproc_70cd56df11fb)Plan after approval
--merge(no squash — preserves contributor + release commits)v0.50.251, push tagwebui_qa_agent.sh/tmp/wt-v0.50.251Held PRs (NOT in this release)
Compatibility notes
/api/sessionsresponse shape — adds_lineage_root_id,_compression_segment_count,parent_session_idkeys to applicable rows. Existing consumers unaffected.