
Windows: semantic_search_nodes_tool hangs on stdio MCP transport (missing asyncio.to_thread + tokenizer deadlock) #385

@gfleg77


Summary

On Windows, semantic_search_nodes_tool silently hangs indefinitely when sentence-transformers embeddings are available. Three layered issues prevent semantic search from working on Windows.

Environment

  • OS: Windows 10 Enterprise
  • Python: 3.11
  • code-review-graph: 2.3.2
  • sentence-transformers: 3.4.1
  • FastMCP: 2.14.7
  • Transport: stdio (VS Code extension)

Root Causes

1. semantic_search_nodes_tool missing asyncio.to_thread (regression vs embed_graph_tool)

embed_graph_tool already uses asyncio.to_thread (per the fix in #46 / #136) to avoid blocking the WindowsSelectorEventLoop with sentence-transformers inference. semantic_search_nodes_tool does not — it is a plain def that calls semantic_search_nodes() synchronously on the event loop, producing the same deadlock.

Fix: convert to async def + asyncio.to_thread, identical to embed_graph_tool:

```python
@mcp.tool()
async def semantic_search_nodes_tool(
    query: str,
    ...
) -> dict:
    """..."""
    return await asyncio.to_thread(
        semantic_search_nodes,
        query=query, kind=kind, limit=limit,
        repo_root=_resolve_repo_root(repo_root),
        model=model, detail_level=detail_level,
    )
```

2. Rust tokenizer thread deadlock in subprocess context

Even with asyncio.to_thread, the Rust-based fast tokenizers bundled with sentence-transformers spawn worker threads on first load; in a Windows MCP server subprocess this deadlocks. Two env vars are required (not currently documented):

```json
"TOKENIZERS_PARALLELISM": "false",
"OMP_NUM_THREADS": "1"
```
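If you can't rely on the MCP client passing an env block, the same effect can be achieved programmatically — a minimal sketch, assuming it runs at the top of the server's entry module, before sentence-transformers (and the Rust tokenizers it bundles) is imported for the first time, since the settings are read at load time:

```python
import os

# Must execute before the first `import sentence_transformers`;
# setdefault leaves any value the client already configured intact.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
os.environ.setdefault("OMP_NUM_THREADS", "1")
```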

3. 12-second cold start on first tool call

The LocalEmbeddingProvider lazy-loads SentenceTransformer on the first inference call (~12s on first call, <1s subsequently). Even with fix #1 applied, the first semantic_search_nodes_tool call in a session still stalls for ~12 seconds while the model loads — the load merely moves to a worker thread instead of blocking the event loop.

Fix: pre-warm synchronously in main() before mcp.run() starts the event loop:

```python
def _prewarm_embeddings() -> None:
    try:
        from .embeddings import LocalEmbeddingProvider
        import os
        model_name = os.environ.get("CRG_EMBEDDING_MODEL", "all-MiniLM-L6-v2")
        provider = LocalEmbeddingProvider(model_name=model_name)
        provider._get_model()
    except Exception:
        pass  # embeddings are optional, never crash the server

def main(repo_root=None):
    ...
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    _prewarm_embeddings()  # load model before event loop starts
    mcp.run(transport="stdio")
```
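To see why the asyncio.to_thread change matters, here is a standalone sketch (not code-review-graph code; `blocking_inference` and `heartbeat` are illustrative stand-ins) showing that a slow synchronous call moved through asyncio.to_thread leaves the event loop free to service other work — which is exactly what a plain def tool on the stdio transport prevents:

```python
import asyncio
import time

def blocking_inference() -> str:
    # Stand-in for a slow synchronous call such as model.encode(...)
    time.sleep(0.5)
    return "done"

async def main() -> list[str]:
    events: list[str] = []

    async def heartbeat() -> None:
        # Can only fire while the event loop is unblocked.
        await asyncio.sleep(0.1)
        events.append("heartbeat")

    hb = asyncio.create_task(heartbeat())
    # Inference runs on a worker thread; the loop keeps scheduling tasks.
    events.append(await asyncio.to_thread(blocking_inference))
    await hb
    return events

print(asyncio.run(main()))  # prints ['heartbeat', 'done']
```

Had `blocking_inference()` been called directly on the loop, the heartbeat could not have run until the 0.5s call returned — on Windows stdio this is the hang the issue describes.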

Result after all three fixes

  • Semantic search responds in <1s (warm) on Windows via stdio MCP
  • No deadlocks, no hangs
