fix: correct false non-interactive classification for large sessions #277
… sessions
Snippet-based enrichment reads 256KB (head+tail) from session files. When text user messages fall in the middle of a large file, outside this window, the classifier sees ≤1 text user message and marks the session as non-interactive, hiding it from the default sidebar view.
Add a fast byte-level scan fallback: when a truncated snippet yields `isNonInteractive`, scan the full file for the `"role":"user","content":"` pattern. If more than one match is found, override the classification.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
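The fallback described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `SnippetClassification`, `countUserTextMessages`, `resolveNonInteractive`, and the `loadFullFile` callback are hypothetical names, and the sketch works on an in-memory `Buffer` rather than reading the file in chunks.

```typescript
// Byte pattern from the PR: a user message whose content is a plain string.
// tool_result messages carry an array ("content":[...) and do not match.
const USER_TEXT_PATTERN = Buffer.from('"role":"user","content":"');

// Hypothetical result shape for the snippet-based classifier.
interface SnippetClassification {
  isNonInteractive: boolean;
  truncated: boolean; // true when the snippet did not cover the whole file
}

// Count pattern occurrences in `data`, stopping as soon as `limit` is reached.
function countUserTextMessages(data: Buffer, limit: number): number {
  let found = 0;
  let offset = 0;
  while (found < limit) {
    const idx = data.indexOf(USER_TEXT_PATTERN, offset);
    if (idx === -1) break;
    found++;
    offset = idx + USER_TEXT_PATTERN.length;
  }
  return found;
}

// Only consult the full file when the snippet was truncated AND it produced a
// non-interactive verdict; every other case keeps the snippet's answer.
function resolveNonInteractive(
  result: SnippetClassification,
  loadFullFile: () => Buffer, // hypothetical lazy loader, e.g. wrapping fs.readFileSync
): boolean {
  if (!result.isNonInteractive || !result.truncated) {
    return result.isNonInteractive;
  }
  // More than one text user message means the session is interactive.
  return countUserTextMessages(loadFullFile(), 2) <= 1;
}
```

Passing the loader as a callback keeps the common path free of extra I/O: the full file is only touched for truncated snippets that were about to be hidden.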
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3b5499f89
    if (found > 1) return true
    offset = idx + USER_TEXT_PATTERN.length
    }
    position += chunkSize
Avoid double-counting matches at chunk boundaries
The scanner reads `chunkSize + pattern.length` bytes but advances `position` by only `chunkSize`, so bytes at each 64 KiB boundary are scanned twice. If a single `"role":"user","content":"` sequence starts exactly at one of those boundaries, it is counted once in the previous window and again in the next, causing `found > 1` and incorrectly reclassifying a truly non-interactive session as interactive.
When scanning in 64KB chunks with pattern-length overlap, a match at exactly the chunk boundary was counted in both the previous and current chunk. Skip byte 0 on non-first chunks since it was already the last scannable position in the previous chunk.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
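A minimal sketch of the boundary handling described in this commit, assuming the scan loop has roughly the shape quoted in the review. `hasMultipleUserTextMessages` is a hypothetical name, and the sketch windows over an in-memory `Buffer` instead of reading from disk so the boundary behavior is easy to check.

```typescript
const USER_TEXT_PATTERN = Buffer.from('"role":"user","content":"');

// Returns true once more than one pattern match is found.
// Each window covers chunkSize + pattern.length bytes, so a match that
// straddles a chunk boundary is still seen whole by the earlier window.
function hasMultipleUserTextMessages(data: Buffer, chunkSize: number): boolean {
  const overlap = USER_TEXT_PATTERN.length;
  let found = 0;
  for (let position = 0; position < data.length; position += chunkSize) {
    const window = data.subarray(position, position + chunkSize + overlap);
    // A match starting at byte 0 of a non-first window started at index
    // chunkSize of the previous window and was already counted there.
    let offset = position === 0 ? 0 : 1;
    for (;;) {
      const idx = window.indexOf(USER_TEXT_PATTERN, offset);
      if (idx === -1) break;
      found++;
      if (found > 1) return true;
      offset = idx + USER_TEXT_PATTERN.length;
    }
  }
  return false;
}
```

An alternative with the same effect would be to overlap by `pattern.length - 1` bytes, so that no start position is ever covered by two windows; the sketch follows the approach named in the commit message instead.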
Summary
`"role":"user","content":"` patterns
Root cause
Session enrichment reads a 256KB snippet (128KB head + 128KB tail) for performance. The Claude provider counts user messages with text content (not tool_result arrays) and classifies sessions with ≤1 text user message as non-interactive. In sessions with heavy tool use, all text user messages can fall in the middle of the file outside the snippet window, causing false classification.
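To make the failure mode concrete, here is a small sketch of a head+tail snippet and a pattern-based count. It is an illustration under assumptions, not the provider's code: `takeSnippet`, `countTextUserMessages`, and `looksNonInteractive` are hypothetical names, and the real classifier may parse messages rather than match bytes.

```typescript
// Sizes assumed from the PR description: 128KB head + 128KB tail.
const HALF_SNIPPET_BYTES = 128 * 1024;
const USER_TEXT_PATTERN = Buffer.from('"role":"user","content":"');

// Keep only the first and last `half` bytes of a session file.
function takeSnippet(
  data: Buffer,
  half: number = HALF_SNIPPET_BYTES,
): { snippet: Buffer; truncated: boolean } {
  if (data.length <= half * 2) {
    return { snippet: data, truncated: false };
  }
  const head = data.subarray(0, half);
  const tail = data.subarray(data.length - half);
  // A newline between the halves keeps a match from forming across the join.
  return { snippet: Buffer.concat([head, Buffer.from("\n"), tail]), truncated: true };
}

// Stand-in for the provider's classifier: count text user messages by pattern.
function countTextUserMessages(data: Buffer): number {
  let count = 0;
  let offset = 0;
  for (;;) {
    const idx = data.indexOf(USER_TEXT_PATTERN, offset);
    if (idx === -1) return count;
    count++;
    offset = idx + USER_TEXT_PATTERN.length;
  }
}

// Sessions with at most one text user message are treated as non-interactive.
function looksNonInteractive(data: Buffer): boolean {
  return countTextUserMessages(data) <= 1;
}
```

When every text user message after the first sits between the two halves, `looksNonInteractive` returns true for the snippet and false for the full file, which is the false classification this PR addresses.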
Test plan
`scanFileForUserTextMessages` covering:
🤖 Generated with Claude Code