
Stabilize runtime and hybrid search #1

Draft
chidev wants to merge 7 commits into main from feature/stabilize_qmd

Conversation

chidev (Owner) commented Apr 2, 2026

Summary

Context

This branch starts from upstream main because the old local install was still on the c85889d lineage and was missing already-merged fixes such as the collection filter and the launcher/runtime patches.

Verification

  • bunx vitest run test/store.helpers.unit.test.ts test/structured-search.test.ts
  • bunx vitest run test/cli.test.ts -t "parallel startup regression"
  • bunx tsc -p tsconfig.build.json --noEmit

Notes

  • test/mcp.test.ts was not used as a verification gate here because it triggers a large model download path unrelated to the carried changes.

rymalia and others added 7 commits April 2, 2026 01:28
The validateSemanticQuery regex rejected any hyphen followed by a word
character, blocking common compound words (real-time, multi-client,
kebab-case identifiers like better-sqlite3). Tighten the check to only
match negation syntax at token boundaries (start of string or after
whitespace).

See tobi#383

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
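
To make the boundary rule concrete, here is a hedged sketch in TypeScript; the exact regex in `validateSemanticQuery` may differ, but the shape of the fix is the same:

```ts
// Before (assumed): any hyphen followed by a word character was treated as
// negation syntax, so compound words were rejected.
const tooBroad = /-\w/;

// After: only treat "-term" as negation at a token boundary, i.e. at the
// start of the string or right after whitespace.
const negationAtBoundary = /(?:^|\s)-\w/;

negationAtBoundary.test("-draft results");     // true  -> real negation syntax
negationAtBoundary.test("real-time search");   // false -> compound word passes
negationAtBoundary.test("better-sqlite3 api"); // false -> kebab-case passes
```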
On CPU-only servers, LLM reranking (0.6B model) takes ~2s per document, making the query tool unusable for any client with a timeout under 30s.

This commit:
- Adds `skipRerank` boolean parameter to the MCP `query` tool schema.
  When true, returns results scored by RRF fusion only (no LLM rerank).
- Passes `candidateLimit` through to structuredSearch (was declared in
  schema but never forwarded to the store).

Use case: automated RAG hooks with 1-2s timeouts on a GPU-less VPS. With
skipRerank=true, queries complete in 30-50ms instead of 30-40s.
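
For reference, a generic sketch of the Reciprocal Rank Fusion scoring that `skipRerank=true` falls back to. This is the textbook formulation with the conventional k = 60, not necessarily QMD's exact implementation:

```ts
// Fuse several ranked lists: each list contributes 1 / (k + rank) per document.
function rrfFuse(rankings: { id: string }[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach(({ id }, i) => {
      // i is 0-based, so the top-ranked document contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// e.g. fuse a BM25 ranking with a vector-search ranking:
const fused = rrfFuse([
  [{ id: "doc-a" }, { id: "doc-b" }], // BM25 order
  [{ id: "doc-b" }, { id: "doc-c" }], // vector order
]);
// doc-b ranks first because it appears in both lists.
```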
When OLLAMA_EMBED_URL is set, all embedding and tokenization operations
use the remote Ollama HTTP API instead of node-llama-cpp. This enables
QMD on platforms without local GPU/Vulkan support (ARM64 VPS, Docker
containers, CI runners) and with remote Ollama instances (Tailscale,
LAN, Docker networks).

Changes:
- Add ollamaEmbed() and ollamaEmbedBatch() helper functions using
  Ollama /api/embed endpoint
- Patch getEmbedding() to bypass node-llama-cpp when OLLAMA_EMBED_URL
  is set
- Patch generateEmbeddings() with dedicated Ollama fast-path that skips
  withLLMSessionForLlm entirely
- Patch expandQuery() to skip LLM-based HyDE query expansion (the raw
  query is passed straight to vector search)
- Patch chunkDocumentByTokens() to use char-based estimation instead of
  local tokenizer
- Patch vsearch and query CLI commands to skip withLLMSession wrapper

Environment variables:
  OLLAMA_EMBED_URL   - Ollama server URL (e.g. http://your-ollama:11434)
  OLLAMA_EMBED_MODEL - Model name (default: nomic-embed-text)

Tested on ARM64 Oracle Cloud VPS with qwen3-embedding:0.6b on remote
Ollama via Tailscale. 7,100+ documents indexed successfully.
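
For illustration, a minimal sketch of what the `ollamaEmbed()`/`ollamaEmbedBatch()` helpers could look like against Ollama's `/api/embed` endpoint; the commit's actual batching and error handling may differ:

```ts
const OLLAMA_EMBED_URL = process.env.OLLAMA_EMBED_URL; // e.g. http://your-ollama:11434
const OLLAMA_EMBED_MODEL = process.env.OLLAMA_EMBED_MODEL ?? "nomic-embed-text";

async function ollamaEmbedBatch(inputs: string[]): Promise<number[][]> {
  const res = await fetch(`${OLLAMA_EMBED_URL}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: OLLAMA_EMBED_MODEL, input: inputs }),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: HTTP ${res.status}`);
  // /api/embed returns { embeddings: number[][] } for array input.
  const { embeddings } = (await res.json()) as { embeddings: number[][] };
  return embeddings;
}

async function ollamaEmbed(text: string): Promise<number[]> {
  const [embedding] = await ollamaEmbedBatch([text]);
  return embedding;
}
```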