16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,22 @@

## [Unreleased]

### Changes

- `bin/qmd` gains an opt-in fast-path for `qmd search`. When
`QMD_DAEMON_URL` is set and points at a running
`qmd mcp --http --daemon`, `search` POSTs to the daemon's existing
`/search` REST endpoint instead of paying the per-call Node +
better-sqlite3 + sqlite-vec bootstrap. On large indexes this turns
a ~500–800 ms cold start into ~50–100 ms. Opt-in preserves the
default formatted-text output for interactive users; daemon mode
prints the JSON response for scripts/agents. Unrecognised flags
(`--json`, `--min-score`, …), `--index <name>`, `-n` values that are
not plain positive integers, and any curl error fall through to the
cold-start CLI silently. `qmd vsearch` is intentionally NOT
fast-pathed because its vector-only semantics aren't expressible on
the daemon's current REST surface.

### Fixes

- GPU: respect explicit `QMD_LLAMA_GPU=metal|vulkan|cuda` backend overrides instead of always using auto GPU selection. #529
41 changes: 41 additions & 0 deletions README.md
@@ -112,6 +112,47 @@ Or configure MCP manually in `~/.claude/settings.json`:
}
```

#### Fast CLI via daemon (scripts / agents)

For repeated scripted queries the per-call Node + SQLite + sqlite-vec
bootstrap (~500 ms on large indexes) is wasted overhead. Start a daemon
once and point `bin/qmd` at it:

```sh
qmd mcp --http --daemon --port 8181
export QMD_DAEMON_URL="http://127.0.0.1:8181"

qmd search "database migrations" -c engineering # ~50 ms, returns JSON
```

When `QMD_DAEMON_URL` is set, `qmd search` routes through the daemon's
existing `POST /search` endpoint and prints the daemon's structured JSON
response (`{results:[{docid,file,title,score,snippet,context}]}`), which is
convenient for agents and scripts parsing stdout. Interactive users who
want the formatted-text output should leave `QMD_DAEMON_URL` unset.
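
Because the fast-path falls back to the cold-start CLI silently, a script cannot assume that stdout is always JSON. The following is a minimal sketch of one way to tell the two apart; `qmd_output_is_json` is a hypothetical helper, not part of qmd, and the sample strings are placeholders rather than real qmd output.

```sh
# Hypothetical helper (not shipped with qmd): succeeds when the text looks
# like a JSON object, i.e. its first non-whitespace character is "{".
qmd_output_is_json() {
  case "$(printf '%s' "$1" | tr -d ' \t\n\r')" in
    '{'*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Placeholder strings standing in for captured `qmd search` output.
daemon_out='{"results":[]}'
cli_out='placeholder for formatted text output'

qmd_output_is_json "$daemon_out" && echo "daemon JSON"
qmd_output_is_json "$cli_out" || echo "formatted text"
```

In a real script the argument would be the captured output, for example `out=$(qmd search "auth")`. The check is only a heuristic and assumes the formatted-text output never starts with `{`.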

The fast-path handles only `qmd search`, which `bin/qmd` sends as a single
`lex` (BM25) entry in the daemon's structured-query shape. `qmd vsearch`
intentionally always runs the cold-start CLI: its vector-only pipeline,
`minScore` default, and rerank behavior are not currently expressible on
the daemon's `/search` endpoint, and routing it would silently change
result ordering and filtering.

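Fall-through cases (the cold-start CLI runs silently instead):

- `QMD_DAEMON_URL` is unset
- `curl` is not installed
- Daemon health check fails or does not answer within 1 s
- The `/search` request fails or returns a non-2xx status
- Invocation uses `--index <name>` (daemons serve one index at a time)
- Unrecognised flag (e.g. `--min-score`, `--json`); cold-start owns the full flag set
- `-n` or `-c` given without a value
- `-n` value that is not a plain positive integer (e.g. `0`, `-1`, `007`, `1e2`)
- Empty query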
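
The `-n` rule can be checked in isolation. The sketch below reuses the `case` pattern from `bin/qmd` inside a standalone function; the name `qmd_is_plain_positive_int` is hypothetical and exists only for illustration.

```sh
# Hypothetical standalone version of the -n check; the pattern itself is
# the one bin/qmd uses.
qmd_is_plain_positive_int() {
  case "$1" in
    # Reject: empty, zero, leading zero, or any non-digit character.
    ''|0|0[0-9]*|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

for n in 5 20 0 007 -1 +7 1e2 5abc; do
  if qmd_is_plain_positive_int "$n"; then
    echo "$n: fast-path"
  else
    echo "$n: cold-start"
  fi
done
```

Only `5` and `20` are reported as eligible for the fast-path; every other value is left to the cold-start CLI, which owns the edge-case semantics.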

To test the payload directly, use `curl`:

```sh
curl -s http://127.0.0.1:8181/search -H 'Content-Type: application/json' \
  -d '{"searches":[{"type":"lex","query":"auth"}],"collections":["engineering"],"limit":5}'
```
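
For hand-built requests with several collections or quoted query text, the payload can be assembled along the same lines as `bin/qmd` does. This is a rough sketch, not the shipped code: `json_escape` and `build_payload` are hypothetical names, and, like the `sed` expression in `bin/qmd`, the escaping covers only backslashes and double quotes (not control characters).

```sh
# Hypothetical helper: escape backslashes and double quotes for use inside
# a JSON string. Control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build_payload QUERY LIMIT [COLLECTION...]
build_payload() {
  bp_query=$(json_escape "$1")
  bp_limit=$2
  shift 2
  bp_collections="null"   # no collections given: send null
  if [ $# -gt 0 ]; then
    bp_collections=""
    for bp_c in "$@"; do
      # Append each escaped name, comma-separated after the first.
      bp_collections="${bp_collections:+$bp_collections,}\"$(json_escape "$bp_c")\""
    done
    bp_collections="[$bp_collections]"
  fi
  printf '{"searches":[{"type":"lex","query":"%s"}],"collections":%s,"limit":%s}\n' \
    "$bp_query" "$bp_collections" "$bp_limit"
}

build_payload 'auth "token"' 5 engineering docs
# prints {"searches":[{"type":"lex","query":"auth \"token\""}],"collections":["engineering","docs"],"limit":5}
```

The output can be passed to the `curl` command above with `-d "$(build_payload ...)"`.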

#### HTTP Transport

By default, QMD's MCP server uses stdio (launched as a subprocess by each client). For a shared, long-lived server that avoids repeated model loading, use the HTTP transport:
170 changes: 170 additions & 0 deletions bin/qmd
@@ -15,6 +15,176 @@ done
# to avoid native module ABI mismatches (e.g., better-sqlite3 compiled for bun vs node)
DIR="$(cd -P "$(dirname "$SOURCE")/.." && pwd)"

# ---------------------------------------------------------------------------
# Daemon fast-path (opt-in, search only)
#
# When `QMD_DAEMON_URL` is set and points at a running `qmd mcp --http` daemon,
# route `qmd search` through the daemon's existing REST endpoint instead of
# bootstrapping a fresh Node + better-sqlite3 + sqlite-vec process for every
# invocation. On large indexes this can be the difference between ~80 ms and
# ~800 ms per query.
#
# Only `qmd search` is eligible. `qmd vsearch` is deliberately NOT handled
# here: vsearch's cold-start path sets a different `minScore` default (0.3
# vs 0) and uses a vector-only pipeline without reranking, and the daemon's
# REST surface cannot currently express either choice. Routing vsearch
# through the fast-path would silently change result ordering and filtering.
# Cold-start remains the source of truth for vsearch semantics.
#
# Opt-in so the default interactive UX (formatted text output) is unchanged.
# Callers that set `QMD_DAEMON_URL` get the daemon's JSON response shape on
# stdout — useful for scripts and agents. Unset the variable (or pass
# `--index <name>`) to fall back to the normal cold-start CLI.
#
# The fast-path logic lives in a shell function so its argument parsing
# operates on a local copy of `$@`. If the POST ultimately fails, the outer
# shell's original argv is untouched and the cold-start `exec` receives the
# exact command the user typed.
#
# Bypass rules:
#   - Only `search` is eligible (vsearch deliberately excluded, see above).
#   - Any `--index` / `--index=...` forces fall-through: the daemon serves
#     whichever index it was started with, so secondary-index queries must go
#     through the cold-start CLI.
#   - An unrecognised flag, a missing flag value, a `-n` value that is not a
#     plain positive integer, or an empty query falls through.
#   - Any curl/daemon error (curl missing, health check fails, non-2xx
#     response) falls back to cold-start silently.
# ---------------------------------------------------------------------------
_qmd_try_daemon() {
  # $1 = subcommand ("search"); $2.. = subcommand args.
  # Returns 0 on a successful daemon response (stdout already emitted),
  # non-zero on any reason to fall through to cold-start.
  [ -n "${QMD_DAEMON_URL:-}" ] || return 1
  command -v curl >/dev/null 2>&1 || return 1

  case "$1" in
    search) _qmd_qtype="lex" ;;
    *) return 1 ;;
  esac
  shift

  # Detect --index before any destructive parsing.
  for _arg in "$@"; do
    case "$_arg" in
      --index|--index=*) return 1 ;;
    esac
  done

  curl -sf --max-time 1 "${QMD_DAEMON_URL}/health" >/dev/null 2>&1 || return 1

  # Match the interactive CLI's default result count so scripts that rely
  # on `qmd search <q>` without an explicit -n don't see cardinality change
  # based on whether QMD_DAEMON_URL is set. The `--files`/`--json` special
  # case (default 20) can't apply here: those flags fall through to
  # cold-start via the unknown-flag branch below.
  _qmd_limit=5
  _qmd_collections_json=""   # "" = unset; later serialised as null
  _qmd_query=""

  # Upstream CLI flag shapes: -n <N> for limit, -c <name> for collection
  # (collection supports `multiple: true`, so several -c flags must all be
  # forwarded). Unknown dashed flags force a fall-through to cold-start (see
  # the `-*` branch below), so they never leak into the query text.
  while [ $# -gt 0 ]; do
    case "$1" in
      -n|--n)
        # Value-taking flag; bail out if the user left off the value
        # (otherwise `shift 2` would abort the whole shell with
        # "can't shift that many" instead of falling through).
        [ $# -ge 2 ] || return 1
        _qmd_limit="$2"
        shift 2
        ;;
      --n=*) _qmd_limit="${1#*=}"; shift ;;
      -c|--collection)
        [ $# -ge 2 ] || return 1
        _qmd_c_escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
        if [ -z "$_qmd_collections_json" ]; then
          _qmd_collections_json="[\"${_qmd_c_escaped}\"]"
        else
          # Strip the closing `]`, append `,"<val>"]`. `${var%]}` is POSIX.
          _qmd_collections_json="${_qmd_collections_json%]},\"${_qmd_c_escaped}\"]"
        fi
        shift 2
        ;;
      --collection=*)
        _qmd_c_val="${1#*=}"
        _qmd_c_escaped=$(printf '%s' "$_qmd_c_val" | sed 's/\\/\\\\/g; s/"/\\"/g')
        if [ -z "$_qmd_collections_json" ]; then
          _qmd_collections_json="[\"${_qmd_c_escaped}\"]"
        else
          _qmd_collections_json="${_qmd_collections_json%]},\"${_qmd_c_escaped}\"]"
        fi
        shift
        ;;
      --)
        shift
        while [ $# -gt 0 ]; do
          if [ -z "$_qmd_query" ]; then _qmd_query="$1"
          else _qmd_query="$_qmd_query $1"; fi
          shift
        done
        ;;
      -*)
        # Unknown flag. The upstream CLI accepts several options this
        # parser can't safely decode (--min-score, --candidate-limit,
        # --intent, --full, --json, --files, --all, ...). Some take a
        # value, some are booleans. Rather than guess and risk eating
        # the next token as part of the query, fall through to the
        # cold-start CLI which knows the full flag set. The opt-in
        # fast-path stays a strict subset of the interactive CLI.
        return 1 ;;
      *)
        if [ -z "$_qmd_query" ]; then _qmd_query="$1"
        else _qmd_query="$_qmd_query $1"; fi
        shift ;;
    esac
  done

  [ -n "$_qmd_query" ] || return 1

  # -n policy in the fast-path: accept ONLY a plain positive integer with
  # no sign, no leading zeros, no trailing non-digits, no exponent, etc.
  # Any other shape (`-1`, `+7`, `0`, `1e2`, `007`, `5abc`, empty) falls
  # through to the cold-start CLI, which owns the edge-case semantics.
  # This keeps the fast-path a strict subset of interactive behaviour
  # without trying to emulate `parseInt(...) || default` in POSIX sh.
  #
  # An unset -n is the common case; we already defaulted `_qmd_limit=5`.
  # Skip the strict check on that value.
  if [ "$_qmd_limit" != "5" ]; then
    case "$_qmd_limit" in
      ''|0|0[0-9]*|*[!0-9]*) return 1 ;;
    esac
  fi

  _qmd_q_escaped=$(printf '%s' "$_qmd_query" | sed 's/\\/\\\\/g; s/"/\\"/g')
  if [ -z "$_qmd_collections_json" ]; then
    _qmd_collections_json="null"
  fi

  _qmd_payload="{\"searches\":[{\"type\":\"${_qmd_qtype}\",\"query\":\"${_qmd_q_escaped}\"}],\"collections\":${_qmd_collections_json},\"limit\":${_qmd_limit}}"

  # Buffer the daemon response so a mid-stream failure (connection reset,
  # --max-time elapsing after bytes have been written) can't leak a partial
  # payload before we fall through to the cold-start CLI. Script consumers
  # should see either the daemon's complete JSON response OR the cold-start
  # output, never both concatenated.
  _qmd_response=$(curl -sf --max-time 30 -X POST \
    -H "Content-Type: application/json" \
    -d "$_qmd_payload" \
    "${QMD_DAEMON_URL}/search") || return 1
  printf '%s\n' "$_qmd_response"
  return 0
}

case "${1:-}" in
  search)
    if _qmd_try_daemon "$@"; then
      exit 0
    fi
    ;;
esac

# Detect the package manager that installed dependencies by checking lockfiles.
# $BUN_INSTALL is intentionally NOT checked — it only indicates that bun exists
# on the system, not that it was used to install this package (see #361).
1 change: 1 addition & 0 deletions src/db.ts
@@ -69,6 +69,7 @@ export interface Database {
  exec(sql: string): void;
  prepare(sql: string): Statement;
  loadExtension(path: string): void;
  transaction<T extends (...args: any[]) => any>(fn: T): T;
  close(): void;
}
