
v0.50.259 — SessionDB FD-leak hotfix (#1421) + LRU-eviction Opus follow-up #1427

Merged: nesquena-hermes merged 4 commits into master from stage-259 on May 1, 2026


@nesquena-hermes (Collaborator)

v0.50.259 Batch Release — SessionDB FD leak hotfix

Summary

A small, focused FD-leak fix on top of v0.50.258: one PR plus one Opus pre-release follow-up that extends the fix to a sibling leak path.

Constituent PR

| PR | Author | Summary | Size |
|---|---|---|---|
| #1421 | @wali-reheman | Close the previous SessionDB before replacing it on a cached agent; fixes a WAL FD leak that crashes the server with EMFILE after ~73 messages | +9/-0 (1 file) |

Pre-applied Opus follow-up

Same FD-leak shape on the LRU eviction path: SESSION_AGENT_CACHE.popitem(last=False) was discarding the evicted entry with evicted_sid, _ = ..., so the evicted agent's _session_db was only released by GC finalization, which on a long-running server may never happen. The follow-up now captures the evicted entry and calls _evicted_agent._session_db.close() explicitly, the same shape as #1421's fix.
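For illustration, here is a minimal sketch of the capture-and-close eviction pattern, assuming an OrderedDict-based LRU cache (which is what `popitem(last=False)` implies, since plain `dict.popitem()` takes no argument). `SessionDB`, `Agent`, `cache_agent`, and `MAX_CACHED_AGENTS` are hypothetical stand-ins, not the actual `api/streaming.py` code.

```python
from collections import OrderedDict


class SessionDB:
    """Hypothetical stand-in: records whether close() was called."""

    def __init__(self, sid):
        self.sid = sid
        self.closed = False

    def close(self):
        self.closed = True


class Agent:
    """Hypothetical cached agent that owns a per-session DB handle."""

    def __init__(self, sid):
        self._session_db = SessionDB(sid)


SESSION_AGENT_CACHE = OrderedDict()
MAX_CACHED_AGENTS = 2  # hypothetical cap, not the real streaming.py value


def cache_agent(sid, agent):
    """Insert or refresh `agent`, evicting least-recently-used entries over the cap."""
    SESSION_AGENT_CACHE[sid] = agent
    SESSION_AGENT_CACHE.move_to_end(sid)  # most recently used goes last
    while len(SESSION_AGENT_CACHE) > MAX_CACHED_AGENTS:
        # Leaky form: `evicted_sid, _ = SESSION_AGENT_CACHE.popitem(last=False)`
        # throws the agent away and leaves its DB handle to the GC.
        evicted_sid, evicted_agent = SESSION_AGENT_CACHE.popitem(last=False)
        db = getattr(evicted_agent, "_session_db", None)
        if db is not None:
            db.close()  # release the FDs now instead of at GC finalization
```

With a cap of 2, caching agents "a", "b", "c" in order evicts "a" and closes only its DB; refreshing "b" before adding "d" makes "c" the next eviction victim.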

5 regression tests in test_v050259_sessiondb_fd_leak.py:

  • Source-level: cached-agent reuse path closes before replace (negative-pattern guard)
  • Source-level: LRU eviction path captures + closes evicted agent (negative-pattern + positive)
  • Behavioral: SessionDB.close() is idempotent (3 calls safe)
  • Behavioral: cached-agent reuse with mock — close called exactly once
  • Behavioral: LRU eviction with mock — only the evicted agent's DB closes

Tests

  • 3615 passed, 0 failed (master 3610, +5 new)
  • Browser tests + Phase 2 API sanity → ALL CHECKS PASSED
  • CI all green

Why ship this fast

  • 9-LOC contributor fix targeting a real production crash (the empirically observed 73-handle leak at EMFILE matches the message count exactly)
  • SessionDB.close() is idempotent + thread-safe; double-close is a benign no-op
  • Opus pre-release pass extended the fix to the sibling leak path
  • 5 regression tests pin both source patterns and behavioral round-trips
  • Nathan explicitly authorized merging as a small release without an independent review gate

What's NOT in this batch

Wali Reheman and others added 4 commits May 1, 2026 13:51
SessionDB WAL handles leak when streaming.py creates a new SessionDB
instance per request and replaces the cached agent's _session_db without
closing the old one. Each orphaned connection holds 2 FDs (.db +
.db-wal), causing FD exhaustion and EMFILE crashes after ~73 messages.

Fix: close the previous _session_db before replacing it on cached
agents, mirroring the close-before-replace pattern used elsewhere in the
codebase.
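A minimal sketch of the close-before-replace pattern described above, using hypothetical `CachedAgent` and `replace_session_db` names (the real change lives in streaming.py and is not reproduced here). `unittest.mock.MagicMock` stands in for SessionDB so the usage lines can check that the old handle is closed exactly once, similar in spirit to the behavioral regression test.

```python
from unittest.mock import MagicMock


class CachedAgent:
    """Hypothetical cached agent; only the _session_db attribute matters here."""

    def __init__(self, session_db=None):
        self._session_db = session_db


def replace_session_db(agent, new_db):
    """Close the agent's previous SessionDB before pointing it at `new_db`."""
    old_db = getattr(agent, "_session_db", None)
    # The `is not new_db` guard is an extra assumption, not part of #1421: it
    # avoids closing a handle that is about to be reused.
    if old_db is not None and old_db is not new_db:
        old_db.close()  # frees the .db and .db-wal FDs right away
    agent._session_db = new_db


# Usage: MagicMock stand-ins let us check that close() runs exactly once.
old_db, new_db = MagicMock(), MagicMock()
agent = CachedAgent(old_db)
replace_session_db(agent, new_db)
old_db.close.assert_called_once_with()
new_db.close.assert_not_called()
```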
…tion + CHANGELOG + 5 regression tests

PR #1421 (SessionDB WAL handle leak fix on cached-agent reuse path) had a
sibling leak at the LRU eviction site that I caught during pre-review:

In api/streaming.py, SESSION_AGENT_CACHE.popitem(last=False) was discarding
the evicted entry with `evicted_sid, _ = ...`. The agent's _session_db
was dropped on the floor and only released when GC eventually finalized
the agent — which on a long-running server may be never (cyclic refs,
extension types holding C handles, etc.).

Same fix shape as #1421: capture the evicted entry, call
_evicted_agent._session_db.close() explicitly. SessionDB.close() is
idempotent + thread-safe (with self._lock: if self._conn:), so the
double-close-is-benign property still holds.
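The idempotent, thread-safe close() property can be sketched as follows. Only the `with self._lock: if self._conn:` guard comes from the commit message; the constructor, the in-memory SQLite connection, and the explicit `is not None` check are assumptions for illustration, not the real SessionDB.

```python
import sqlite3
import threading


class SessionDB:
    """Hypothetical sketch of a SessionDB with an idempotent, thread-safe close()."""

    def __init__(self, path=":memory:"):
        self._lock = threading.Lock()
        # check_same_thread=False so close() may be called from any thread.
        self._conn = sqlite3.connect(path, check_same_thread=False)

    def close(self):
        with self._lock:  # serialize concurrent close() calls
            if self._conn is not None:  # same role as `if self._conn:`
                self._conn.close()
                self._conn = None  # later calls see None and do nothing
```

Because the check and the reset happen under the same lock, a second close() from either path (cached-agent reuse or LRU eviction) is a no-op rather than an error.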

5 regression tests in test_v050259_sessiondb_fd_leak.py:
- Source-level: cached-agent reuse path closes before replace
- Source-level: LRU eviction path captures + closes evicted agent
- Behavioral: SessionDB.close() is idempotent (3 calls safe)
- Behavioral: cached-agent reuse with mock — close called exactly once
- Behavioral: LRU eviction with mock — only evicted agent's DB closes

Full suite: 3615 passed, 0 failed.

Nathan explicitly authorized 'just go ahead and merge it as a small release'
since the PR is 9 LOC, focused, has Opus pre-release follow-up + tests, and
matches the empirically-confirmed leak shape (73-handle leak at EMFILE).
…not on import path

CI-only failure: test_session_db_close_is_idempotent imported hermes_state
from /home/hermes/.hermes/hermes-agent, which exists locally but NOT on the
GH Actions runner (that runner only has the WebUI repo).

Use importlib.util.find_spec to detect availability and pytest.skip when
the agent repo isn't present. The source-level pin in
test_cached_agent_reuse_closes_old_session_db catches a revert of the close()
call; the runtime idempotency test is added confirmation when both repos
are co-located.
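A minimal sketch of the availability check, written with stdlib unittest (`skipTest`) in place of `pytest.skip` so it runs without pytest. `agent_repo_available` and `SessionDBIdempotencyTest` are hypothetical names; the real test body that exercises hermes_state's SessionDB is not shown in this PR, so it is elided.

```python
import importlib
import importlib.util
import unittest

# Module name from the commit message; importable only when the agent repo is
# on sys.path (locally), not on the WebUI-only CI runner.
AGENT_MODULE = "hermes_state"


def agent_repo_available(module_name=AGENT_MODULE):
    """Return True if the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


class SessionDBIdempotencyTest(unittest.TestCase):
    def test_session_db_close_is_idempotent(self):
        if not agent_repo_available():
            self.skipTest("hermes-agent repo not on import path")
        hermes_state = importlib.import_module(AGENT_MODULE)
        # The real test would open a SessionDB from hermes_state and call
        # close() three times; that API is not shown here, so it is elided.
        self.assertIsNotNone(hermes_state)
```

For a missing top-level module, `find_spec` returns None instead of raising, so the check has no import side effects and the test is reported as skipped rather than failed.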

Local: 5 passed. CI: 4 passed + 1 skipped (idempotency).
nesquena-hermes merged commit c0d50b3 into master on May 1, 2026
3 checks passed
nesquena-hermes deleted the stage-259 branch on May 1, 2026 22:46

2 participants