
feat(llm): add optional OpenAI embedding backend#3

Open
lukeboyett wants to merge 3 commits into fix/db-transaction-type from feat/openai-embed-backend

Conversation

@lukeboyett
Owner

Summary

Adds an optional OpenAI embedding backend to QMD. When OPENAI_API_KEY is set in the environment, getDefaultLlamaCpp() returns a new OpenAILLM instead of the local node-llama-cpp implementation. When no key is present, behavior is unchanged.

Motivation

  • Not every user wants to download ~2 GB of local GGUF models on first run.
  • Some users already have an OpenAI budget and would prefer hosted embeddings + chat completion.
  • Makes qmd viable on memory/CPU-constrained machines, containers without GPU, and CI.

What's in it

Source:

  • New OpenAILLM class in src/llm.ts implementing the full LLM interface plus the duck-typed surface used by store.ts/cli/qmd.ts (embedBatch, tokenize/countTokens/detokenize via tiktoken cl100k_base, getDeviceInfo).
  • Selection rule in getDefaultLlamaCpp(): new OpenAILLM() when OPENAI_API_KEY is set, else new LlamaCpp().
  • New exported type QmdLLM = LlamaCpp | OpenAILLM; store.ts signatures accept QmdLLM so call sites work with either backend.
  • Model selection via env vars with sensible defaults:
    • QMD_OPENAI_EMBED_MODEL (default: text-embedding-3-small)
    • QMD_OPENAI_GENERATE_MODEL (default: gpt-4o-mini)
  • Transient failure handling inside OpenAILLM: exponential backoff + jitter on 408/409/425/429/5xx and transport errors (ECONNRESET, UND_ERR_*, etc.). A batch that still fails after retries falls back to per-document embed() so one flaky batch doesn't zero out an entire index run.
  • Rerank falls back to cosine similarity on OpenAI embeddings since OpenAI has no native rerank endpoint.
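The selection rule can be sketched as follows (class names are from this PR; the real constructors and module layout in src/llm.ts differ):

```typescript
// Stand-ins for the two concrete backends (the real classes live in src/llm.ts).
class LlamaCpp {}
class OpenAILLM {
  constructor() {
    // The real constructor rejects a missing key; mirrored here for the sketch.
    if (!process.env.OPENAI_API_KEY) throw new Error("OPENAI_API_KEY is required");
  }
}

type QmdLLM = LlamaCpp | OpenAILLM;

let defaultLlamaCpp: QmdLLM | null = null;

// Hosted backend when OPENAI_API_KEY is present, local llama.cpp otherwise.
function getDefaultLlamaCpp(): QmdLLM {
  if (!defaultLlamaCpp) {
    defaultLlamaCpp = process.env.OPENAI_API_KEY
      ? new OpenAILLM()
      : new LlamaCpp();
  }
  return defaultLlamaCpp;
}
```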

User-facing:

  • qmd --help now has an "Embedding backend" section listing the relevant env vars.
  • qmd status displays a Backend: line (llama.cpp (...) vs openai (...)) without instantiating an LLM.
  • README adds an "Embedding backend: OpenAI (hosted)" section; the Environment Variables table is extended.

Tests: test/openai-backend.test.ts mocks the openai module (no real API calls) and covers:

  • Backend selection by env var presence / absence
  • Constructor rejects missing key
  • options.model is ignored — configured OpenAI model name wins
  • Retry on transient 429 eventually succeeds
  • Gives up after maxAttempts on persistent ECONNRESET, returns null
  • No retry on 401
  • embedBatch falls back to per-item embed when a batch fails repeatedly

Dependencies:

  • openai@^6
  • tiktoken@^1

What's NOT changed

  • Local llama.cpp path is untouched when OPENAI_API_KEY is unset. Existing tests still cover it.
  • No migration behavior is required on the user's side beyond qmd embed -f if they toggle the env var (vector dimensions differ: 768 for embeddinggemma, 1536 for text-embedding-3-small). This is called out in the README note.

Test plan

  • npm run build passes
  • New test file passes in isolation (vitest run test/openai-backend.test.ts — 8 passed)
  • qmd status shows Backend: llama.cpp (...) without a key, Backend: openai (text-embedding-3-small) with a key (verified locally against a scratch index)
  • qmd --help renders the new Embedding backend block
  • Full CI suite (other tests unaffected; store.ts changes are type-only: signatures widened from LlamaCpp to QmdLLM)

Notes

Adds `OpenAILLM`, a second concrete `LLM` implementation, and routes
`getDefaultLlamaCpp()` to it when `OPENAI_API_KEY` is set in the
environment. Local node-llama-cpp remains the default when no key is
present, so existing installs are unchanged.

Why:
- Not every user wants to download ~2 GB of local GGUF models.
- Some environments already have an OpenAI budget and would prefer
  hosted embeddings + chat completion.
- A hosted backend makes qmd viable in memory/CPU-constrained setups
  and containers without GPU.

Implementation:
- `OpenAILLM` implements the `LLM` interface plus duck-typed
  `embedBatch`, `tokenize`/`countTokens`/`detokenize` (tiktoken
  cl100k_base), and `getDeviceInfo` (CPU-only) for parity with
  `LlamaCpp` at all call sites in `src/store.ts` and `src/cli/qmd.ts`.
- Model selection is env-driven with defaults `text-embedding-3-small`
  (embed) and `gpt-4o-mini` (generate). Overrides:
  `QMD_OPENAI_EMBED_MODEL`, `QMD_OPENAI_GENERATE_MODEL`.
- OpenAILLM ignores local GGUF model identifiers passed via
  `options.model` — it always uses the configured OpenAI model name.
- Retry + backoff on transient HTTP (408/409/425/429/5xx) and socket
  errors (ECONNRESET, UND_ERR_*, etc.). A failed batch falls back to
  per-document embedding so one flaky batch doesn't zero out every row.
- Rerank falls back to cosine similarity on OpenAI embeddings since
  OpenAI has no native rerank endpoint.
- Export `QmdLLM = LlamaCpp | OpenAILLM` from `llm.ts`. Store.ts
  signatures accept `QmdLLM` so either backend can flow through.
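
The retry-and-fallback behavior described above can be sketched like this (helper names are illustrative; the real implementation inside OpenAILLM differs in detail):

```typescript
// Illustrative sketch of exponential backoff with jitter plus the
// batch-to-per-item fallback; not the PR's exact code.
const RETRYABLE_STATUS = new Set([408, 409, 425, 429, 500, 502, 503, 504]);

// Full jitter: random delay in [0, baseMs * 2^attempt).
function backoffMs(attempt: number, baseMs = 500): number {
  return Math.random() * baseMs * 2 ** attempt;
}

async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      lastError = err;
      const retryable =
        RETRYABLE_STATUS.has(err?.status) || err?.code === "ECONNRESET";
      if (!retryable) throw err; // e.g. 401: fail immediately, no retry
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffMs(attempt)));
      }
    }
  }
  throw lastError;
}

// If a whole batch keeps failing, fall back to embedding one document at a
// time so a single flaky request doesn't zero out the entire index run.
async function embedBatchWithFallback(
  docs: string[],
  embedBatch: (d: string[]) => Promise<number[][]>,
  embedOne: (d: string) => Promise<number[] | null>,
): Promise<(number[] | null)[]> {
  try {
    return await withRetries(() => embedBatch(docs));
  } catch {
    return Promise.all(docs.map((d) => embedOne(d).catch(() => null)));
  }
}
```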

User-facing:
- `qmd --help` gains an "Embedding backend" section listing the
  relevant env vars.
- `qmd status` shows a `Backend:` line (`llama.cpp (...)` vs
  `openai (...)`) so the active path is obvious without
  instantiating an LLM.
- README gets an "Embedding backend: OpenAI (hosted)" subsection and
  the Environment Variables table is extended.

Tests:
- `test/openai-backend.test.ts` mocks the `openai` module (no real
  API calls) and covers:
  - backend selection by env var,
  - constructor error when key is absent,
  - model-name override behavior,
  - retry on transient 429,
  - give-up after maxAttempts on persistent ECONNRESET,
  - no retry on 401,
  - batch-to-single-item fallback after repeated 503s.
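
The cosine-similarity rerank fallback amounts to scoring each candidate embedding against the query embedding and sorting; a minimal sketch (not the PR's exact code):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rerank documents by similarity of their embeddings to the query embedding.
function rerankByCosine(
  queryEmbedding: number[],
  docs: { id: string; embedding: number[] }[],
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(queryEmbedding, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```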

Dependencies:
- openai@^6
- tiktoken@^1

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4be5422098

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/llm.ts Outdated
Comment on lines +2001 to +2004
  if (!defaultLlamaCpp) {
-   defaultLlamaCpp = new LlamaCpp();
+   if (process.env.OPENAI_API_KEY) {
+     defaultLlamaCpp = new OpenAILLM();
+   } else {

P1: Honor OPENAI_API_KEY when default LLM was pre-set

Update backend selection so getDefaultLlamaCpp() does not permanently stick to an earlier LlamaCpp singleton. Right now it only checks OPENAI_API_KEY when defaultLlamaCpp is null, but getStore() can pre-set the singleton to new LlamaCpp(...) from YAML model config (src/cli/qmd.ts), which means OpenAI never activates for those users even when the key is present. This breaks the new backend-switch behavior for a common configured setup.


Comment thread src/cli/qmd.ts Outdated
Comment on lines +349 to +351
  if (process.env.OPENAI_API_KEY) {
    const embedModel = process.env.QMD_OPENAI_EMBED_MODEL || "text-embedding-3-small";
    console.log(`Backend: openai (${embedModel})`);

P2: Report backend from active LLM instance, not env var

qmd status currently prints Backend: openai (...) whenever OPENAI_API_KEY is set, but the active singleton may already be a local LlamaCpp (for example when model config initializes it first). In that case status output is incorrect and can send users down the wrong debugging path; derive the backend from the instantiated default LLM/store instance instead of environment presence alone.


Codex round 1 on #3 flagged two linked concerns:

- P1 "Honor OPENAI_API_KEY when default LLM was pre-set": when
  `config.models` is present, `getStore()` in `src/cli/qmd.ts` calls
  `setDefaultLlamaCpp(new LlamaCpp(...))` before `getDefaultLlamaCpp()`
  has a chance to apply env-based selection. OpenAI never activates
  for that user even with a valid key.

- P2 "Report backend from active LLM instance, not env var":
  `qmd status` inferred `Backend: openai (...)` from
  `process.env.OPENAI_API_KEY`, which is a lie whenever the CLI
  already pre-set a local `LlamaCpp` singleton.

Fix (at the caller, not in the singleton):

- The YAML-models branch in `getStore()` now checks
  `OPENAI_API_KEY` first. If the key is set, it pre-sets an
  `OpenAILLM` and the configured GGUF URIs are intentionally
  ignored (the hosted backend doesn't understand them; it uses
  `QMD_OPENAI_EMBED_MODEL` / `QMD_OPENAI_GENERATE_MODEL` or the
  built-in defaults).
- `showStatus()` derives the backend label from the actual instance
  via `getDefaultLlamaCpp() instanceof OpenAILLM`, and reads
  `llm.embedModelName` (both concrete classes expose it without
  forcing a model load) so the status output always matches the
  real active backend.

Test:

- `a pre-set LlamaCpp singleton wins over env selection` asserts
  the singleton's stickiness so status-reporting code can rely on
  `getDefaultLlamaCpp()` reflecting the active instance. The
  behavior being flagged is a caller bug; the singleton itself is
  now documented as intentionally sticky once set.
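
The instance-derived status label can be sketched as follows (embedModelName is the accessor named in this PR; the surrounding classes are stand-ins):

```typescript
// Stand-in backend classes exposing the cheap embedModelName accessor
// (the real classes expose it without forcing a model load).
class LlamaCpp {
  embedModelName = "embeddinggemma";
}
class OpenAILLM {
  embedModelName = process.env.QMD_OPENAI_EMBED_MODEL || "text-embedding-3-small";
}

// Derive the label from the live instance, not from env-var presence.
function backendLabel(llm: LlamaCpp | OpenAILLM): string {
  return llm instanceof OpenAILLM
    ? `openai (${llm.embedModelName})`
    : `llama.cpp (${llm.embedModelName})`;
}
```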
@lukeboyett
Owner Author

Both concerns addressed in e88f653:

P1 "Honor OPENAI_API_KEY when default LLM was pre-set" — the fix is at the caller. The YAML-models branch in getStore() now checks OPENAI_API_KEY first and pre-sets an OpenAILLM instead of a LlamaCpp when the key is present. Local GGUF URIs in the YAML are intentionally ignored by the hosted backend; QMD_OPENAI_EMBED_MODEL / QMD_OPENAI_GENERATE_MODEL take over.

P2 "Report backend from active LLM instance, not env var": showStatus() now derives the backend label from getDefaultLlamaCpp() instanceof OpenAILLM and reads llm.embedModelName. Both concrete classes expose that without forcing a model load, so status stays cheap.

New test a pre-set LlamaCpp singleton wins over env selection documents the singleton's stickiness so status/reporting code can rely on getDefaultLlamaCpp() reflecting the real instance. Kept the fix in the caller rather than making getDefaultLlamaCpp() self-correcting because silently mutating a pre-set singleton from inside the getter felt surprising — happy to invert if you disagree.

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e88f653001


Comment thread package.json
Comment on lines +52 to +55
"openai": "^6.34.0",
"picomatch": "4.0.4",
"sqlite-vec": "0.1.9",
"tiktoken": "^1.0.22",

P1: Regenerate bun.lock after adding OpenAI dependencies

These new runtime dependencies were added to package.json without a corresponding lockfile update, which breaks the repo’s frozen-lockfile checks (bun install --frozen-lockfile in .github/workflows/ci.yml and release scripts). In this state, CI/publish will fail before tests run because the lockfile no longer matches declared dependencies.


Comment thread src/llm.ts Outdated
Comment on lines +676 to +680
  async expandQuery(query: string, options: { context?: string; includeLexical?: boolean } = {}): Promise<Queryable[]> {
    const includeLexical = options.includeLexical ?? true;

    try {
      const prompt = `Expand this search query into 3 variations:

P2: Include intent hint in OpenAI query-expansion prompt

The OpenAI expandQuery path ignores caller-provided expansion hints and builds its prompt from query alone, so intent-guided search degrades when the OpenAI backend is active. store.expandQuery passes an intent hint into llm.expandQuery(...), and the local LlamaCpp implementation uses it (Query intent: ...), but this implementation drops it entirely, changing ranking/recall behavior for users of --intent.


Codex round 2 on #3 flagged three issues:

- P1 "Honor OPENAI_API_KEY when default LLM was pre-set" (re-flagged):
  my previous caller-side fix only patched the one known call site
  (CLI YAML model-config). Any future caller or SDK user that pre-sets
  a `LlamaCpp` before `OPENAI_API_KEY` is observed would still strand
  on the wrong backend. Conceded — moved the fix into
  `getDefaultLlamaCpp()` itself. The getter now checks the env var on
  every call, discards a cached singleton whose type does not match
  the currently requested backend, and returns an exact-match
  singleton as-is so shared state is preserved across callers.
  `setDefaultLlamaCpp` remains useful for tests and for pinning a
  specific pre-configured instance when it matches the requested
  backend. The CLI pre-set is simplified accordingly.

- P1 "Regenerate bun.lock after adding OpenAI dependencies": the
  earlier commit added `openai` and `tiktoken` to package.json without
  touching the tracked lockfiles. Regenerated `bun.lock` (bun CI uses
  `--frozen-lockfile`) and `pnpm-lock.yaml` via `bun install` and
  `pnpm install`. `package-lock.json` is gitignored upstream so is
  not tracked.

- P2 "Include intent hint in OpenAI query-expansion prompt":
  `store.expandQuery` passes `{ intent }` to `llm.expandQuery(...)`,
  and the local `LlamaCpp` implementation inlines it as
  `Query intent: <intent>`. `OpenAILLM.expandQuery` previously
  ignored it, degrading intent-guided search when the hosted backend
  was active. Now it accepts `options.intent` and prepends a
  `Query intent: ...` line to the expansion prompt.
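
The intent-aware prompt construction amounts to something like the following (a minimal sketch; the real expansion prompt in src/llm.ts is longer):

```typescript
// Prepend an optional "Query intent:" line, matching LlamaCpp.expandQuery.
function buildExpansionPrompt(query: string, options: { intent?: string } = {}): string {
  const intentLine = options.intent ? `Query intent: ${options.intent}\n` : "";
  return `${intentLine}Expand this search query into 3 variations:\n${query}`;
}
```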

Test updates:

- Replace the "singleton is sticky" test with two new tests:
  - `env selection wins over a pre-set singleton of the wrong backend`
  - `matching pre-set singleton is reused`
- Existing retry / fallback / backend-selection tests still pass.
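
The self-correcting getter described in the first bullet can be sketched as follows (stand-in classes; the real logic lives in getDefaultLlamaCpp() in src/llm.ts):

```typescript
// Stand-ins for the two concrete backends.
class LlamaCpp {}
class OpenAILLM {}
type QmdLLM = LlamaCpp | OpenAILLM;

let singleton: QmdLLM | null = null;

// Re-check the env var on every call; discard a cached singleton whose type
// no longer matches the requested backend, and reuse an exact match as-is.
function getDefaultLlamaCpp(): QmdLLM {
  const wantOpenAI = Boolean(process.env.OPENAI_API_KEY);
  const matches = wantOpenAI
    ? singleton instanceof OpenAILLM
    : singleton instanceof LlamaCpp;
  if (!singleton || !matches) {
    singleton = wantOpenAI ? new OpenAILLM() : new LlamaCpp();
  }
  return singleton;
}

// Tests and pre-configured callers can still pin an instance; it is reused
// only while it matches the backend the environment selects.
function setDefaultLlamaCpp(llm: QmdLLM): void {
  singleton = llm;
}
```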
@lukeboyett
Owner Author

All three addressed in 7e49d29:

P1 "Honor OPENAI_API_KEY when default LLM was pre-set" — conceded the design point. Moved the fix into getDefaultLlamaCpp() itself: the getter now re-checks OPENAI_API_KEY on every call, discards a cached singleton whose type doesn't match the current env, and only reuses the singleton when it already matches. Covers not just the CLI YAML path but any SDK/downstream caller that pre-sets.

P1 "Regenerate bun.lock after adding OpenAI dependencies" — regenerated bun.lock and pnpm-lock.yaml via bun install and pnpm install. package-lock.json is gitignored upstream so not tracked.

P2 "Include intent hint in OpenAI query-expansion prompt": OpenAILLM.expandQuery now accepts options.intent and prepends Query intent: <intent> to the expansion prompt, matching LlamaCpp.expandQuery's behavior. store.expandQuery already passes { intent } to both.

Test updates:

  • Replaced the old "singleton is sticky" test with env selection wins over a pre-set singleton of the wrong backend and matching pre-set singleton is reused — documents the new behavior on both the mismatch and match paths.

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.

