fix(parsing): auto-select thread pool on Windows MCP stdio (closes #46, #136) #400
Open
niveku wants to merge 2 commits into tirth8205:main
ProcessPoolExecutor workers spawned from a stdio MCP server inherit the parent's pipe handles and leak as zombies after the pool closes (tirth8205#46, tirth8205#136). The previous asyncio.to_thread wrapper in main.py only avoided event-loop deadlock; the worker leak was unaddressed.

Auto-detect Windows + non-TTY stdin (the MCP stdio profile) and switch to ThreadPoolExecutor. Tree-sitter parsers release the GIL during native parsing, so the speedup loss is small (<30% on full builds) and the trade is worth a stable, zombie-free MCP.

Linux/macOS unchanged. Windows TTY (CLI) unchanged. Override with CRG_PARSE_EXECUTOR=process|thread for any environment.

Closes tirth8205#46, tirth8205#136 (effectively: root cause now papered over).
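The auto-detect rule described in this commit message could be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper names `choose_executor_kind` and `make_executor` are hypothetical, and ignoring an unrecognized `CRG_PARSE_EXECUTOR` value is an assumption about how invalid overrides are handled.

```python
import os
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def choose_executor_kind(platform=None, stdin_is_tty=None, env=None):
    """Return "process" or "thread" (hypothetical helper, not the PR's code)."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    if stdin_is_tty is None:
        # sys.stdin can be None in some hosts; treat that as "not a TTY".
        stdin_is_tty = bool(sys.stdin and sys.stdin.isatty())

    # An explicit override wins in every environment.
    override = env.get("CRG_PARSE_EXECUTOR", "").strip().lower()
    if override in ("process", "thread"):
        return override  # assumption: any other value is silently ignored

    # Windows + piped stdin is the MCP stdio profile: avoid spawned workers.
    if platform == "win32" and not stdin_is_tty:
        return "thread"
    return "process"


def make_executor(max_workers=None, kind=None):
    """Build the executor matching the selected kind (hypothetical helper)."""
    kind = kind or choose_executor_kind()
    cls = ThreadPoolExecutor if kind == "thread" else ProcessPoolExecutor
    return cls(max_workers=max_workers)
```

Passing `platform`, `stdin_is_tty`, and `env` explicitly keeps the decision testable on any machine; in normal use all three are left at their defaults and read from the running process.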
On Windows MCP stdio mode, subprocess.run without explicit stdin inherits the parent's pipe handle. Git blocks waiting to read from that inherited pipe until the timeout fires (10s for context.py, _GIT_TIMEOUT for changes.py/incremental.py), causing every MCP tool call that touches git (get_minimal_context_tool, detect_changes_tool, etc.) to hang for 10+ seconds.

Fix: add stdin=subprocess.DEVNULL to:
- tools/context.py _has_git_changes (git diff, git status)
- changes.py parse_git_diff_ranges (git diff --unified=0)
- changes.py parse_svn_diff_ranges (svn diff)
- incremental.py _git_branch_info (git rev-parse x2)
- incremental.py get_changed_files (git diff --name-only x2)
- incremental.py get_staged_and_unstaged (git status --porcelain)
- incremental.py get_all_tracked_files (git ls-files)

Result: get_minimal_context_tool 10s -> 0.06s.

Companion to fix(parsing) auto-select thread pool (fe014a8).
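A minimal sketch of the stdin=subprocess.DEVNULL pattern from this commit message, for illustration. The wrapper name `run_detached_stdin` is hypothetical, and a small Python child stands in for git so the snippet runs without a repository: the child tries to read all of stdin, which would block on an inherited pipe but returns immediately when stdin is the null device.

```python
import subprocess
import sys


def run_detached_stdin(cmd, timeout=10.0):
    """Run cmd with stdin tied to the null device (hypothetical wrapper)."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # child sees EOF at once, never the host's pipe
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


# Stand-in for a git call: a child process that reads stdin to the end.
probe = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
result = run_detached_stdin(probe)
```

The same keyword argument applies unchanged to a real call such as `["git", "status", "--porcelain"]`.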
dpesch added a commit to 11com7/code-review-graph that referenced this pull request on May 3, 2026
- PR tirth8205#400 (niveku): auto-select ThreadPoolExecutor on Windows MCP stdio to avoid ProcessPool pipe-handle inheritance deadlock; stdin=DEVNULL on all git subprocess calls (closes upstream tirth8205#46, tirth8205#136, tirth8205#401)
- PR tirth8205#409 (suainam): encoding='utf-8' on subprocess.run() calls for Windows non-UTF-8 locales (GBK, Shift-JIS, etc.)

These are pending upstream review. Merged here for internal use.
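The two subprocess changes listed above can be combined in a single call. The sketch below is an assumption-laden illustration, not code from either PR: `run_utf8` is a hypothetical name, `errors="replace"` is an added safety choice that neither PR is known to make, and a Python child emitting raw UTF-8 bytes stands in for git output.

```python
import subprocess
import sys


def run_utf8(cmd, timeout=10.0):
    """Run cmd with detached stdin and UTF-8 decoding (hypothetical wrapper)."""
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # do not inherit the MCP host's pipe
        capture_output=True,
        text=True,
        encoding="utf-8",          # decode output as UTF-8 regardless of the locale code page
        errors="replace",          # assumption: substitute undecodable bytes instead of raising
        timeout=timeout,
        check=False,
    )


# Child writes the UTF-8 bytes for "h", "e with acute accent", "llo".
child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'h\\xc3\\xa9llo')"]
out = run_utf8(child).stdout
```

Without the explicit encoding, text mode decodes with the locale's preferred encoding, which on a GBK or Shift-JIS system would misread these bytes.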
Summary
ProcessPoolExecutor in incremental.py deadlocks / leaks zombie workers when the package runs under a Windows MCP stdio host (Claude Code, Cursor, etc.). The existing asyncio.to_thread wrapper in main.py only avoids event-loop deadlock; the worker leak in tear-down is unaddressed.

After repeated MCP graph queries, accumulated zombies (~20 across an hour-long session in my repro) eventually wedge stdio.
Root cause
spawn is the only multiprocessing start method on Windows. Workers spawned from a process whose stdin/stdout are pipes connected to a parent (the MCP host) inherit those handles. When the pool closes, the workers can't detach the inherited pipes cleanly and remain in a hung state. The CLI path (uvx code-review-graph update) is unaffected because stdin is a TTY.
Fix
Auto-select the executor at runtime:
- process (default, original behavior): Linux, macOS, Windows-TTY.
- thread: when sys.platform == "win32" AND not sys.stdin.isatty() (the MCP profile).
- CRG_PARSE_EXECUTOR={process,thread} for any environment.

Tree-sitter parsers release the GIL during native parsing, so ThreadPoolExecutor keeps a meaningful chunk of the parallelism. Measured on a 202-file Flutter+Dart+SQL repo: full build 1.9s under thread pool with full postprocess (FTS + flows + communities). Comparable to the prior Linux-fork numbers I saw on similar repos.
Repro environment
uvx code-review-graph serve
Before: every other MCP build call would hang for minutes, leaving `uvx`+python zombies. Status / read-only tools eventually became unresponsive too.
After: builds complete in seconds, no zombies, MCP stays healthy across full sessions.
Trade-offs
Backwards compatibility
No public API changes. New env var CRG_PARSE_EXECUTOR is optional. Default behavior preserved on every platform that already worked.

Happy to iterate on the auto-detect heuristic (e.g., detecting "running as MCP" more directly) if you have a preferred signal.