
Windows: semantic_search_nodes_tool hangs on stdio MCP transport (missing asyncio.to_thread + tokenizer deadlock) #385

@gfleg77


Summary

On Windows, semantic_search_nodes_tool silently hangs indefinitely when sentence-transformers embeddings are available. Three layered issues prevent semantic search from working on Windows.

Environment

  • OS: Windows 10 Enterprise
  • Python: 3.11
  • code-review-graph: 2.3.2
  • sentence-transformers: 3.4.1
  • FastMCP: 2.14.7
  • Transport: stdio (VS Code extension)

Root Causes

1. semantic_search_nodes_tool missing asyncio.to_thread (regression vs embed_graph_tool)

embed_graph_tool already uses asyncio.to_thread (per the fix in #46 / #136) to avoid blocking the WindowsSelectorEventLoop with sentence-transformers inference. semantic_search_nodes_tool does not — it is a plain def that calls semantic_search_nodes() synchronously on the event loop, producing the same deadlock.

Fix: convert to async def + asyncio.to_thread, identical to embed_graph_tool:

```python
@mcp.tool()
async def semantic_search_nodes_tool(
    query: str,
    ...
) -> dict:
    """..."""
    return await asyncio.to_thread(
        semantic_search_nodes,
        query=query, kind=kind, limit=limit,
        repo_root=_resolve_repo_root(repo_root),
        model=model, detail_level=detail_level,
    )
```

2. Rust tokenizer thread deadlock in subprocess context

Even with asyncio.to_thread, the Rust-based fast tokenizers bundled with sentence-transformers spawn worker threads on first load; in a Windows MCP server subprocess this deadlocks. Two env vars are required (not currently documented):

```json
"TOKENIZERS_PARALLELISM": "false",
"OMP_NUM_THREADS": "1"
```
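If you can't rely on the MCP client passing an env block, the same effect can be achieved programmatically — a minimal sketch, assuming it runs at the top of the server's entry module, before sentence-transformers (and the Rust tokenizers it bundles) is imported for the first time, since the settings are read at load time:

```python
import os

# Must execute before the first `import sentence_transformers`;
# setdefault leaves any value the client already configured intact.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
os.environ.setdefault("OMP_NUM_THREADS", "1")
```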

3. 12-second cold start on first tool call

The LocalEmbeddingProvider lazy-loads SentenceTransformer on the first inference call (~12s on first call, <1s subsequently). Even with fix #1 applied, the first semantic_search_nodes_tool call in a session still stalls for ~12 seconds while the model loads — the load merely moves to a worker thread instead of blocking the event loop.

Fix: pre-warm synchronously in main() before mcp.run() starts the event loop:

```python
def _prewarm_embeddings() -> None:
    try:
        from .embeddings import LocalEmbeddingProvider
        import os
        model_name = os.environ.get("CRG_EMBEDDING_MODEL", "all-MiniLM-L6-v2")
        provider = LocalEmbeddingProvider(model_name=model_name)
        provider._get_model()
    except Exception:
        pass  # embeddings are optional, never crash the server

def main(repo_root=None):
    ...
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    _prewarm_embeddings()  # load model before event loop starts
    mcp.run(transport="stdio")
```
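To see why the asyncio.to_thread change matters, here is a standalone sketch (not code-review-graph code; `blocking_inference` and `heartbeat` are illustrative stand-ins) showing that a slow synchronous call moved through asyncio.to_thread leaves the event loop free to service other work — which is exactly what a plain def tool on the stdio transport prevents:

```python
import asyncio
import time

def blocking_inference() -> str:
    # Stand-in for a slow synchronous call such as model.encode(...)
    time.sleep(0.5)
    return "done"

async def main() -> list[str]:
    events: list[str] = []

    async def heartbeat() -> None:
        # Can only fire while the event loop is unblocked.
        await asyncio.sleep(0.1)
        events.append("heartbeat")

    hb = asyncio.create_task(heartbeat())
    # Inference runs on a worker thread; the loop keeps scheduling tasks.
    events.append(await asyncio.to_thread(blocking_inference))
    await hb
    return events

print(asyncio.run(main()))  # prints ['heartbeat', 'done']
```

Had `blocking_inference()` been called directly on the loop, the heartbeat could not have run until the 0.5s call returned — on Windows stdio this is the hang the issue describes.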

Result after all three fixes

  • Semantic search responds in <1s (warm) on Windows via stdio MCP
  • No deadlocks, no hangs
