Summary
On Windows, semantic_search_nodes_tool silently hangs indefinitely when sentence-transformers embeddings are available. Three layered issues prevent semantic search from working on Windows.
Environment
- OS: Windows 10 Enterprise
- Python: 3.11
- code-review-graph: 2.3.2
- sentence-transformers: 3.4.1
- FastMCP: 2.14.7
- Transport: stdio (VS Code extension)
Root Causes
1. semantic_search_nodes_tool missing asyncio.to_thread (regression vs embed_graph_tool)
embed_graph_tool already uses asyncio.to_thread (per the fix in #46 / #136) to avoid blocking the WindowsSelectorEventLoop with sentence-transformers inference. semantic_search_nodes_tool does not — it is a plain def that calls semantic_search_nodes() synchronously on the event loop, producing the same deadlock.
Fix: convert to async def + asyncio.to_thread, identical to embed_graph_tool:
@mcp.tool()
async def semantic_search_nodes_tool(
    query: str,
    ...
) -> dict:
    """..."""
    return await asyncio.to_thread(
        semantic_search_nodes,
        query=query, kind=kind, limit=limit,
        repo_root=_resolve_repo_root(repo_root),
        model=model, detail_level=detail_level,
    )
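As a minimal standalone illustration of the pattern (independent of this codebase; `blocking_inference` is a stand-in for sentence-transformers inference), `asyncio.to_thread` moves the blocking call onto a worker thread so the event loop keeps servicing other tasks instead of deadlocking:

```python
import asyncio
import time

def blocking_inference(query: str) -> str:
    # Stand-in for model inference: a CPU-bound/blocking call that would
    # freeze the event loop if called directly from a coroutine.
    time.sleep(0.2)
    return f"results for {query!r}"

async def heartbeat() -> int:
    # Ticks while the blocking call runs; completing all ticks in parallel
    # proves the loop was not stalled.
    ticks = 0
    for _ in range(10):
        await asyncio.sleep(0.02)
        ticks += 1
    return ticks

async def main() -> tuple[str, int]:
    # Without to_thread, blocking_inference would run on the loop thread and
    # heartbeat() would make no progress until it finished.
    return await asyncio.gather(
        asyncio.to_thread(blocking_inference, "auth module"),
        heartbeat(),
    )

result, ticks = asyncio.run(main())
print(result, ticks)
```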
2. Rust tokenizer thread deadlock in subprocess context
Even with asyncio.to_thread, the Rust-based fast tokenizers inside sentence-transformers spawn parallel threads on first load. In a Windows MCP server subprocess this causes a deadlock. Required env vars (not currently documented):
"TOKENIZERS_PARALLELISM": "false",
"OMP_NUM_THREADS": "1"
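If setting them via the MCP client config is not an option, a minimal sketch is to set them in the server process itself. The key assumption is that this runs before the first sentence-transformers/tokenizers import anywhere in the process, since the Rust tokenizer thread pool is configured at load time:

```python
import os

# Must run before sentence-transformers (and its Rust tokenizers) is imported,
# e.g. at the very top of the server entry-point module.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["OMP_NUM_THREADS"] = "1"
```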
3. 12-second cold start on first tool call
The LocalEmbeddingProvider lazy-loads SentenceTransformer on the first inference call (~12s on first call, <1s subsequently). This means the first semantic_search_nodes_tool call in a session always blocks the event loop for 12 seconds even after fix #1.
Fix: pre-warm synchronously in main() before mcp.run() starts the event loop:
def _prewarm_embeddings() -> None:
    try:
        from .embeddings import LocalEmbeddingProvider
        import os
        model_name = os.environ.get("CRG_EMBEDDING_MODEL", "all-MiniLM-L6-v2")
        provider = LocalEmbeddingProvider(model_name=model_name)
        provider._get_model()
    except Exception:
        pass  # embeddings are optional, never crash the server

def main(repo_root=None):
    ...
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    _prewarm_embeddings()  # load model before event loop starts
    mcp.run(transport="stdio")
Result after all three fixes
- Semantic search responds in <1s (warm) on Windows via stdio MCP
- No deadlocks, no hangs
Notes
embed_graph_tool was also hanging for the same tokenizer reason (fix #2) even though it already had asyncio.to_thread. Adding TOKENIZERS_PARALLELISM=false + pre-warming fixes both tools. Pre-warming also makes embed_graph_tool's first call faster, since sentence-transformers is already initialized.