feat(llm): add optional OpenAI embedding backend #3

lukeboyett wants to merge 3 commits into `fix/db-transaction-type` from
Conversation
Adds `OpenAILLM`, a second concrete `LLM` implementation, and routes `getDefaultLlamaCpp()` to it when `OPENAI_API_KEY` is set in the environment. Local node-llama-cpp remains the default when no key is present, so existing installs are unchanged.

Why:
- Not every user wants to download ~2 GB of local GGUF models.
- Some environments already have an OpenAI budget and would prefer hosted embeddings + chat completion.
- A hosted backend makes qmd viable in memory/CPU-constrained setups and containers without a GPU.

Implementation:
- `OpenAILLM` implements the `LLM` interface plus duck-typed `embedBatch`, `tokenize`/`countTokens`/`detokenize` (tiktoken `cl100k_base`), and `getDeviceInfo` (CPU-only) for parity with `LlamaCpp` at all call sites in `src/store.ts` and `src/cli/qmd.ts`.
- Model selection is env-driven with defaults `text-embedding-3-small` (embed) and `gpt-4o-mini` (generate). Overrides: `QMD_OPENAI_EMBED_MODEL`, `QMD_OPENAI_GENERATE_MODEL`.
- `OpenAILLM` ignores local GGUF model identifiers passed via `options.model` — it always uses the configured OpenAI model name.
- Retry + backoff on transient HTTP (408/409/425/429/5xx) and socket errors (`ECONNRESET`, `UND_ERR_*`, etc.). A failed batch falls back to per-document embedding so one flaky batch doesn't zero out every row.
- Rerank falls back to cosine similarity on OpenAI embeddings, since OpenAI has no native rerank endpoint.
- Export `QmdLLM = LlamaCpp | OpenAILLM` from `llm.ts`. `store.ts` signatures accept `QmdLLM` so either backend can flow through.

User-facing:
- `qmd --help` gains an "Embedding backend" section listing the relevant env vars.
- `qmd status` shows a `Backend:` line (`llama.cpp (...)` vs `openai (...)`) so the active path is obvious without instantiating an LLM.
- README gets an "Embedding backend: OpenAI (hosted)" subsection and the Environment Variables table is extended.
Tests:
- `test/openai-backend.test.ts` mocks the `openai` module (no real API calls) and covers:
  - backend selection by env var,
  - constructor error when key is absent,
  - model-name override behavior,
  - retry on transient 429,
  - give-up after maxAttempts on persistent ECONNRESET,
  - no retry on 401,
  - batch-to-single-item fallback after repeated 503s.

Dependencies:
- openai@^6
- tiktoken@^1
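The retry/backoff and batch-to-single-item fallback described above can be sketched roughly like this. The status codes and error codes mirror the PR description; the helper names `withRetries` and `embedBatchWithFallback` are illustrative, not the PR's actual identifiers:

```typescript
// Transient HTTP statuses the PR describes as retryable.
const RETRYABLE_STATUS = new Set([408, 409, 425, 429, 500, 502, 503, 504]);

async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      lastError = err;
      const status = err?.status as number | undefined;
      const code = err?.code as string | undefined;
      const transient =
        (status !== undefined && RETRYABLE_STATUS.has(status)) ||
        code === "ECONNRESET" ||
        (code?.startsWith("UND_ERR_") ?? false);
      if (!transient) throw err; // e.g. 401: fail fast, no retry
      // Exponential backoff with jitter before the next attempt.
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() / 2);
      await new Promise((r) => setTimeout(r, delay));
    }
  }
  throw lastError;
}

// If a whole batch keeps failing, fall back to embedding documents one
// at a time so a single flaky batch doesn't zero out every row.
async function embedBatchWithFallback(
  docs: string[],
  embed: (batch: string[]) => Promise<number[][]>,
): Promise<(number[] | null)[]> {
  try {
    return await withRetries(() => embed(docs));
  } catch {
    const out: (number[] | null)[] = [];
    for (const doc of docs) {
      try {
        out.push((await withRetries(() => embed([doc])))[0]);
      } catch {
        out.push(null); // only this document is lost, not the batch
      }
    }
    return out;
  }
}
```

The fallback loop is what the "batch-to-single-item fallback after repeated 503s" test exercises: the batch call exhausts its retries, then each document is retried individually.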
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4be5422098
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```diff
 if (!defaultLlamaCpp) {
-  defaultLlamaCpp = new LlamaCpp();
+  if (process.env.OPENAI_API_KEY) {
+    defaultLlamaCpp = new OpenAILLM();
+  } else {
```
Honor OPENAI_API_KEY when default LLM was pre-set
Update backend selection so getDefaultLlamaCpp() does not permanently stick to an earlier LlamaCpp singleton. Right now it only checks OPENAI_API_KEY when defaultLlamaCpp is null, but getStore() can pre-set the singleton to new LlamaCpp(...) from YAML model config (src/cli/qmd.ts), which means OpenAI never activates for those users even when the key is present. This breaks the new backend-switch behavior for a common configured setup.
```ts
if (process.env.OPENAI_API_KEY) {
  const embedModel = process.env.QMD_OPENAI_EMBED_MODEL || "text-embedding-3-small";
  console.log(`Backend: openai (${embedModel})`);
```
Report backend from active LLM instance, not env var
qmd status currently prints Backend: openai (...) whenever OPENAI_API_KEY is set, but the active singleton may already be a local LlamaCpp (for example when model config initializes it first). In that case status output is incorrect and can send users down the wrong debugging path; derive the backend from the instantiated default LLM/store instance instead of environment presence alone.
Codex round 1 on #3 flagged two linked concerns:
- P1 "Honor OPENAI_API_KEY when default LLM was pre-set": when `config.models` is present, `getStore()` in `src/cli/qmd.ts` calls `setDefaultLlamaCpp(new LlamaCpp(...))` before `getDefaultLlamaCpp()` has a chance to apply env-based selection. OpenAI never activates for that user even with a valid key.
- P2 "Report backend from active LLM instance, not env var": `qmd status` inferred `Backend: openai (...)` from `process.env.OPENAI_API_KEY`, which is a lie whenever the CLI already pre-set a local `LlamaCpp` singleton.

Fix (at the caller, not in the singleton):
- The YAML-models branch in `getStore()` now checks `OPENAI_API_KEY` first. If the key is set, it pre-sets an `OpenAILLM` and the configured GGUF URIs are intentionally ignored (the hosted backend doesn't understand them; it uses `QMD_OPENAI_EMBED_MODEL` / `QMD_OPENAI_GENERATE_MODEL` or the built-in defaults).
- `showStatus()` derives the backend label from the actual instance via `getDefaultLlamaCpp() instanceof OpenAILLM`, and reads `llm.embedModelName` (both concrete classes expose it without forcing a model load) so the status output always matches the real active backend.

Test:
- `a pre-set LlamaCpp singleton wins over env selection` asserts the singleton's stickiness so status-reporting code can rely on `getDefaultLlamaCpp()` reflecting the active instance. The behavior being flagged is a caller bug; the singleton itself is now documented as intentionally sticky once set.
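The instance-based status derivation can be sketched like so. The class names and the `embedModelName` property follow the PR text; the standalone wiring here is illustrative, not the actual `src/llm.ts` code:

```typescript
// Minimal stand-ins for the two concrete backends described in the PR.
class LlamaCpp {
  embedModelName = "embeddinggemma"; // assumed local default per the PR's README note
}
class OpenAILLM {
  embedModelName =
    process.env.QMD_OPENAI_EMBED_MODEL || "text-embedding-3-small";
}
type QmdLLM = LlamaCpp | OpenAILLM;

// Derive the status label from the instance actually in use, not from
// env-var presence, so it cannot disagree with the active singleton.
function backendLabel(llm: QmdLLM): string {
  const kind = llm instanceof OpenAILLM ? "openai" : "llama.cpp";
  return `Backend: ${kind} (${llm.embedModelName})`;
}
```

With this shape, a pre-set local singleton yields `Backend: llama.cpp (...)` even when `OPENAI_API_KEY` happens to be set, which is exactly the correction the review asked for.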
Both concerns addressed in e88f653. P1 "Honor OPENAI_API_KEY when default LLM was pre-set" — the fix is at the caller: the YAML-models branch in `getStore()` now checks the key first. P2 "Report backend from active LLM instance, not env var" — `showStatus()` now derives the label from the active instance. New test added.

@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e88f653001
```json
"openai": "^6.34.0",
"picomatch": "4.0.4",
"sqlite-vec": "0.1.9",
"tiktoken": "^1.0.22",
```
Regenerate bun.lock after adding OpenAI dependencies
These new runtime dependencies were added to package.json without a corresponding lockfile update, which breaks the repo’s frozen-lockfile checks (bun install --frozen-lockfile in .github/workflows/ci.yml and release scripts). In this state, CI/publish will fail before tests run because the lockfile no longer matches declared dependencies.
```ts
async expandQuery(query: string, options: { context?: string; includeLexical?: boolean } = {}): Promise<Queryable[]> {
  const includeLexical = options.includeLexical ?? true;

  try {
    const prompt = `Expand this search query into 3 variations:
```
Include intent hint in OpenAI query-expansion prompt
The OpenAI expandQuery path ignores caller-provided expansion hints and builds its prompt from query alone, so intent-guided search degrades when the OpenAI backend is active. store.expandQuery passes an intent hint into llm.expandQuery(...), and the local LlamaCpp implementation uses it (Query intent: ...), but this implementation drops it entirely, changing ranking/recall behavior for users of --intent.
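A minimal sketch of threading the hint into the prompt. The `Query intent: ...` line mirrors what the review says the local backend emits; the function shape and name are hypothetical, not the PR's actual code:

```typescript
// Build the expansion prompt, prepending the caller-provided intent hint
// when present so intent-guided search behaves the same on both backends.
function buildExpansionPrompt(query: string, intent?: string): string {
  const lines: string[] = [];
  if (intent) lines.push(`Query intent: ${intent}`);
  lines.push("Expand this search query into 3 variations:");
  lines.push(query);
  return lines.join("\n");
}
```

Without the hint, the hosted path would silently rank and recall differently from the local path whenever `--intent` is used, which is the regression being flagged.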
Codex round 2 on #3 flagged three issues:
- P1 "Honor OPENAI_API_KEY when default LLM was pre-set" (re-flagged): my previous caller-side fix only patched the one known call site (CLI YAML model-config). Any future caller or SDK user that pre-sets a `LlamaCpp` before `OPENAI_API_KEY` is observed would still strand on the wrong backend. Conceded — moved the fix into `getDefaultLlamaCpp()` itself. The getter now checks the env var on every call, discards a cached singleton whose type does not match the currently requested backend, and returns an exact-match singleton as-is so shared state is preserved across callers. `setDefaultLlamaCpp` remains useful for tests and for pinning a specific pre-configured instance when it matches the requested backend. The CLI pre-set is simplified accordingly.
- P1 "Regenerate bun.lock after adding OpenAI dependencies": the earlier commit added `openai` and `tiktoken` to package.json without touching the tracked lockfiles. Regenerated `bun.lock` (bun CI uses `--frozen-lockfile`) and `pnpm-lock.yaml` via `bun install` and `pnpm install`. `package-lock.json` is gitignored upstream, so it is not tracked.
- P2 "Include intent hint in OpenAI query-expansion prompt": `store.expandQuery` passes `{ intent }` to `llm.expandQuery(...)`, and the local `LlamaCpp` implementation inlines it as `Query intent: <intent>`. `OpenAILLM.expandQuery` previously ignored it, degrading intent-guided search when the hosted backend was active. Now it accepts `options.intent` and prepends a `Query intent: ...` line to the expansion prompt.

Test updates:
- Replace the "singleton is sticky" test with two new tests:
  - `env selection wins over a pre-set singleton of the wrong backend`
  - `matching pre-set singleton is reused`
- Existing retry / fallback / backend-selection tests still pass.
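The env-aware getter described in the first bullet can be sketched as follows. Class and function names follow the PR; the empty class bodies and module wiring are illustrative:

```typescript
// Stand-ins for the two concrete backends.
class LlamaCpp {}
class OpenAILLM {}
type QmdLLM = LlamaCpp | OpenAILLM;

let defaultLlm: QmdLLM | null = null;

// Still useful for tests and for pinning a pre-configured instance
// that matches the requested backend.
export function setDefaultLlamaCpp(llm: QmdLLM): void {
  defaultLlm = llm;
}

export function getDefaultLlamaCpp(): QmdLLM {
  // Check the env var on every call, not just on first construction.
  const wantOpenAI = Boolean(process.env.OPENAI_API_KEY);
  const matches =
    defaultLlm !== null &&
    (wantOpenAI
      ? defaultLlm instanceof OpenAILLM
      : defaultLlm instanceof LlamaCpp);
  if (!matches) {
    // Discard a cached singleton of the wrong backend type.
    defaultLlm = wantOpenAI ? new OpenAILLM() : new LlamaCpp();
  }
  // An exact-match singleton is returned as-is, preserving shared state.
  return defaultLlm!;
}
```

This shape makes both new tests pass by construction: a wrong-type pre-set is replaced, while a matching pre-set instance is reused identically across callers.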
All three addressed in 7e49d29. P1 "Honor OPENAI_API_KEY when default LLM was pre-set" — conceded the design point; moved the fix into `getDefaultLlamaCpp()` itself. P1 "Regenerate bun.lock after adding OpenAI dependencies" — regenerated the tracked lockfiles. P2 "Include intent hint in OpenAI query-expansion prompt" — `OpenAILLM.expandQuery` now accepts and inlines the intent hint. Test updates as described above.

@codex review
Codex Review: Didn't find any major issues. You're on a roll.
Summary
Adds an optional OpenAI embedding backend to QMD. When `OPENAI_API_KEY` is set in the environment, `getDefaultLlamaCpp()` returns a new `OpenAILLM` instead of the local `node-llama-cpp` implementation. When no key is present, behavior is unchanged.

Motivation

- Not every user wants to download ~2 GB of local GGUF models.
- Some environments already have an OpenAI budget and would prefer hosted embeddings + chat completion.
- A hosted backend makes qmd viable in memory/CPU-constrained setups and containers without a GPU.
What's in it
Source:
- `OpenAILLM` class in `src/llm.ts` implementing the full `LLM` interface plus the duck-typed surface used by `store.ts` / `cli/qmd.ts` (`embedBatch`, `tokenize`/`countTokens`/`detokenize` via tiktoken `cl100k_base`, `getDeviceInfo`).
- `getDefaultLlamaCpp()`: `new OpenAILLM()` when `OPENAI_API_KEY` is set, else `new LlamaCpp()`.
- `QmdLLM = LlamaCpp | OpenAILLM`; `store.ts` signatures accept `QmdLLM` so call sites work with either backend.
- `QMD_OPENAI_EMBED_MODEL` → `text-embedding-3-small`
- `QMD_OPENAI_GENERATE_MODEL` → `gpt-4o-mini`
- Retry in `OpenAILLM`: exponential backoff + jitter on `408/409/425/429/5xx` and transport errors (`ECONNRESET`, `UND_ERR_*`, etc.). A batch that still fails after retries falls back to per-document `embed()` so one flaky batch doesn't zero out an entire index run.

User-facing:
- `qmd --help` now has an "Embedding backend" section listing the relevant env vars.
- `qmd status` displays a `Backend:` line (`llama.cpp (...)` vs `openai (...)`) without instantiating an LLM.

Tests:
- `test/openai-backend.test.ts` mocks the `openai` module (no real API calls) and covers:
  - `options.model` is ignored — configured OpenAI model name wins
  - `embedBatch` falls back to per-item embed when a batch fails repeatedly

Dependencies:
- `openai@^6`
- `tiktoken@^1`

What's NOT changed
- Behavior when `OPENAI_API_KEY` is unset. Existing tests still cover it.
- Users must re-embed with `qmd embed -f` if they toggle the env var (vector dimensions differ: 768 for embeddinggemma, 1536 for text-embedding-3-small). This is called out in the README note.

Test plan
- `npm run build` passes
- `vitest run test/openai-backend.test.ts` — 8 passed
- `qmd status` shows `Backend: llama.cpp (...)` without a key, `Backend: openai (text-embedding-3-small)` with a key (verified locally against a scratch index)
- `qmd --help` renders the new Embedding backend block
- Changes in `store.ts` are type-only (`LlamaCpp` → `QmdLLM`)

Notes
- Based on `fix/db-transaction-type` because a clean `main` currently fails to build; that baseline build fix is the parent PR.
- Independent of the `-c` collection-filter change; either can merge first.
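The re-embed requirement called out under "What's NOT changed" amounts to a dimension check between the stored index and the active backend. A hypothetical guard (model names and dimensions taken from the README note above; `needsReembed` is not part of the PR):

```typescript
// Embedding dimensions per model, as stated in the PR's README note.
const BACKEND_DIMS: Record<string, number> = {
  "embeddinggemma": 768,            // local node-llama-cpp default
  "text-embedding-3-small": 1536,   // OpenAI default
};

// True when the vectors already in the index cannot have come from the
// currently active embed model, i.e. `qmd embed -f` is required.
function needsReembed(storedDim: number, activeModel: string): boolean {
  const dim = BACKEND_DIMS[activeModel];
  if (dim === undefined) return true; // unknown model: be conservative
  return dim !== storedDim;
}
```

A check like this would let `qmd status` warn about a stale index after an env-var toggle instead of leaving users to discover mismatched-dimension errors at query time.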