v0.50.259 — SessionDB FD-leak hotfix (#1421) + LRU-eviction Opus follow-up#1427
Merged
Conversation
SessionDB WAL handles leak when streaming.py creates a new SessionDB instance per request and replaces the cached agent's _session_db without closing the old one. Each orphaned connection holds 2 FDs (.db + .db-wal), causing FD exhaustion and EMFILE crashes after ~73 messages. Fix: close the previous _session_db before replacing it on cached agents, mirroring the close-before-replace pattern used elsewhere in the codebase.
…lacing on cached agent
…tion + CHANGELOG + 5 regression tests PR #1421 (SessionDB WAL handle leak fix on cached-agent reuse path) had a sibling leak at the LRU eviction site that I caught during pre-review: api/streaming.py SESSION_AGENT_CACHE.popitem(last=False) was discarding the evicted entry with `evicted_sid, _ = ...`. The agent's _session_db was dropped on the floor and only released when GC eventually finalized the agent — which on a long-running server may be never (cyclic refs, extension types holding C handles, etc.). Same fix shape as #1421: capture the evicted entry, call _evicted_agent._session_db.close() explicitly. SessionDB.close() is idempotent + thread-safe (with self._lock: if self._conn:), so the double-close-is-benign property still holds. 5 regression tests in test_v050259_sessiondb_fd_leak.py: - Source-level: cached-agent reuse path closes before replace - Source-level: LRU eviction path captures + closes evicted agent - Behavioral: SessionDB.close() is idempotent (3 calls safe) - Behavioral: cached-agent reuse with mock — close called exactly once - Behavioral: LRU eviction with mock — only evicted agent's DB closes Full suite: 3615 passed, 0 failed. Nathan explicitly authorized 'just go ahead and merge it as a small release' since the PR is 9 LOC, focused, has Opus pre-release follow-up + tests, and matches the empirically-confirmed leak shape (73-handle leak at EMFILE).
…not on import path CI-only failure: test_session_db_close_is_idempotent imported hermes_state from /home/hermes/.hermes/hermes-agent which exists locally but NOT on the GH Actions runner that only has the WebUI repo. Use importlib.util.find_spec to detect availability and pytest.skip when the agent repo isn't present. The source-level pin in test_cached_agent_reuse_closes_old_session_db catches revert of the close() call; the runtime idempotency test is added confirmation when both repos are co-located. Local: 5 passed. CI: 4 passed + 1 skipped (idempotency).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
v0.50.259 Batch Release — SessionDB FD leak hotfix
Summary
Tiny focused FD-leak fix on top of v0.50.258. 1 PR plus 1 Opus pre-release follow-up that extends the fix to a sibling leak path.
Constituent PR
SessionDBbefore replacing on cached agent — fixes WAL FD leak that crashes the server withEMFILEafter ~73 messagesPre-applied Opus follow-up
Same FD-leak shape on the LRU eviction path —
SESSION_AGENT_CACHE.popitem(last=False)was discarding the evicted entry withevicted_sid, _ = .... The evicted agent's_session_dbwaited on GC finalization which on a long-running server may be never. Now captures the evicted entry, calls_evicted_agent._session_db.close()explicitly. Same shape as #1421's fix.5 regression tests in
test_v050259_sessiondb_fd_leak.py:SessionDB.close()is idempotent (3 calls safe)Tests
Why ship this fast
SessionDB.close()is idempotent + thread-safe; double-close is a benign no-opWhat's NOT in this batch