
feat(llm): add optional OpenAI embedding backend#3

Open
lukeboyett wants to merge 3 commits into fix/db-transaction-type from feat/openai-embed-backend

Conversation

@lukeboyett
Owner

Summary

Adds an optional OpenAI embedding backend to QMD. When OPENAI_API_KEY is set in the environment, getDefaultLlamaCpp() returns a new OpenAILLM instead of the local node-llama-cpp implementation. When no key is present, behavior is unchanged.

Motivation

  • Not every user wants to download ~2 GB of local GGUF models on first run.
  • Some users already have an OpenAI budget and would prefer hosted embeddings + chat completion.
  • Makes qmd viable on memory/CPU-constrained machines, containers without GPU, and CI.

What's in it

Source:

  • New OpenAILLM class in src/llm.ts implementing the full LLM interface plus the duck-typed surface used by store.ts/cli/qmd.ts (embedBatch, tokenize/countTokens/detokenize via tiktoken cl100k_base, getDeviceInfo).
  • Selection rule in getDefaultLlamaCpp(): new OpenAILLM() when OPENAI_API_KEY is set, else new LlamaCpp().
  • New exported type QmdLLM = LlamaCpp | OpenAILLM; store.ts signatures accept QmdLLM so call sites work with either backend.
  • Model selection via env vars with sensible defaults:
    • QMD_OPENAI_EMBED_MODEL (default: text-embedding-3-small)
    • QMD_OPENAI_GENERATE_MODEL (default: gpt-4o-mini)
  • Transient failure handling inside OpenAILLM: exponential backoff + jitter on 408/409/425/429/5xx and transport errors (ECONNRESET, UND_ERR_*, etc.). A batch that still fails after retries falls back to per-document embed() so one flaky batch doesn't zero out an entire index run.
  • Rerank falls back to cosine similarity on OpenAI embeddings since OpenAI has no native rerank endpoint.
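The selection rule can be sketched as follows (class names are from this PR; the real constructors and module layout in src/llm.ts differ):

```typescript
// Stand-ins for the two concrete backends (the real classes live in src/llm.ts).
class LlamaCpp {}
class OpenAILLM {
  constructor() {
    // The real constructor rejects a missing key; mirrored here for the sketch.
    if (!process.env.OPENAI_API_KEY) throw new Error("OPENAI_API_KEY is required");
  }
}

type QmdLLM = LlamaCpp | OpenAILLM;

let defaultLlamaCpp: QmdLLM | null = null;

// Hosted backend when OPENAI_API_KEY is present, local llama.cpp otherwise.
function getDefaultLlamaCpp(): QmdLLM {
  if (!defaultLlamaCpp) {
    defaultLlamaCpp = process.env.OPENAI_API_KEY
      ? new OpenAILLM()
      : new LlamaCpp();
  }
  return defaultLlamaCpp;
}
```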

User-facing:

  • qmd --help now has an "Embedding backend" section listing the relevant env vars.
  • qmd status displays a Backend: line (llama.cpp (...) vs openai (...)) without instantiating an LLM.
  • README adds an "Embedding backend: OpenAI (hosted)" section; the Environment Variables table is extended.

Tests: test/openai-backend.test.ts mocks the openai module (no real API calls) and covers:

  • Backend selection by env var presence / absence
  • Constructor rejects missing key
  • options.model is ignored — configured OpenAI model name wins
  • Retry on transient 429 eventually succeeds
  • Gives up after maxAttempts on persistent ECONNRESET, returns null
  • No retry on 401
  • embedBatch falls back to per-item embed when a batch fails repeatedly

Dependencies:

  • openai@^6
  • tiktoken@^1

What's NOT changed

  • Local llama.cpp path is untouched when OPENAI_API_KEY is unset. Existing tests still cover it.
  • No migration behavior is required on the user's side beyond qmd embed -f if they toggle the env var (vector dimensions differ: 768 for embeddinggemma, 1536 for text-embedding-3-small). This is called out in the README note.

Test plan

  • npm run build passes
  • New test file passes in isolation (vitest run test/openai-backend.test.ts — 8 passed)
  • qmd status shows Backend: llama.cpp (...) without a key, Backend: openai (text-embedding-3-small) with a key (verified locally against a scratch index)
  • qmd --help renders the new Embedding backend block
  • Full CI suite (other tests unaffected; store.ts changes are type-only: signatures widened from LlamaCpp to QmdLLM)

Notes

Adds `OpenAILLM`, a second concrete `LLM` implementation, and routes
`getDefaultLlamaCpp()` to it when `OPENAI_API_KEY` is set in the
environment. Local node-llama-cpp remains the default when no key is
present, so existing installs are unchanged.

Why:
- Not every user wants to download ~2 GB of local GGUF models.
- Some environments already have an OpenAI budget and would prefer
  hosted embeddings + chat completion.
- A hosted backend makes qmd viable in memory/CPU-constrained setups
  and containers without GPU.

Implementation:
- `OpenAILLM` implements the `LLM` interface plus duck-typed
  `embedBatch`, `tokenize`/`countTokens`/`detokenize` (tiktoken
  cl100k_base), and `getDeviceInfo` (CPU-only) for parity with
  `LlamaCpp` at all call sites in `src/store.ts` and `src/cli/qmd.ts`.
- Model selection is env-driven with defaults `text-embedding-3-small`
  (embed) and `gpt-4o-mini` (generate). Overrides:
  `QMD_OPENAI_EMBED_MODEL`, `QMD_OPENAI_GENERATE_MODEL`.
- OpenAILLM ignores local GGUF model identifiers passed via
  `options.model` — it always uses the configured OpenAI model name.
- Retry + backoff on transient HTTP (408/409/425/429/5xx) and socket
  errors (ECONNRESET, UND_ERR_*, etc.). A failed batch falls back to
  per-document embedding so one flaky batch doesn't zero out every row.
- Rerank falls back to cosine similarity on OpenAI embeddings since
  OpenAI has no native rerank endpoint.
- Export `QmdLLM = LlamaCpp | OpenAILLM` from `llm.ts`. Store.ts
  signatures accept `QmdLLM` so either backend can flow through.
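
The retry-and-fallback behavior described above can be sketched like this (helper names are illustrative; the real implementation inside OpenAILLM differs in detail):

```typescript
// Illustrative sketch of exponential backoff with jitter plus the
// batch-to-per-item fallback; not the PR's exact code.
const RETRYABLE_STATUS = new Set([408, 409, 425, 429, 500, 502, 503, 504]);

// Full jitter: random delay in [0, baseMs * 2^attempt).
function backoffMs(attempt: number, baseMs = 500): number {
  return Math.random() * baseMs * 2 ** attempt;
}

async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      lastError = err;
      const retryable =
        RETRYABLE_STATUS.has(err?.status) || err?.code === "ECONNRESET";
      if (!retryable) throw err; // e.g. 401: fail immediately, no retry
      if (attempt < maxAttempts - 1) {
        await new Promise((r) => setTimeout(r, backoffMs(attempt)));
      }
    }
  }
  throw lastError;
}

// If a whole batch keeps failing, fall back to embedding one document at a
// time so a single flaky request doesn't zero out the entire index run.
async function embedBatchWithFallback(
  docs: string[],
  embedBatch: (d: string[]) => Promise<number[][]>,
  embedOne: (d: string) => Promise<number[] | null>,
): Promise<(number[] | null)[]> {
  try {
    return await withRetries(() => embedBatch(docs));
  } catch {
    return Promise.all(docs.map((d) => embedOne(d).catch(() => null)));
  }
}
```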

User-facing:
- `qmd --help` gains an "Embedding backend" section listing the
  relevant env vars.
- `qmd status` shows a `Backend:` line (`llama.cpp (...)` vs
  `openai (...)`) so the active path is obvious without
  instantiating an LLM.
- README gets an "Embedding backend: OpenAI (hosted)" subsection and
  the Environment Variables table is extended.

Tests:
- `test/openai-backend.test.ts` mocks the `openai` module (no real
  API calls) and covers:
  - backend selection by env var,
  - constructor error when key is absent,
  - model-name override behavior,
  - retry on transient 429,
  - give-up after maxAttempts on persistent ECONNRESET,
  - no retry on 401,
  - batch-to-single-item fallback after repeated 503s.
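
The cosine-similarity rerank fallback amounts to scoring each candidate embedding against the query embedding and sorting; a minimal sketch (not the PR's exact code):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rerank documents by similarity of their embeddings to the query embedding.
function rerankByCosine(
  queryEmbedding: number[],
  docs: { id: string; embedding: number[] }[],
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(queryEmbedding, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```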

Dependencies:
- openai@^6
- tiktoken@^1

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4be5422098

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/llm.ts Outdated
Comment on lines +2001 to +2004
  if (!defaultLlamaCpp) {
-   defaultLlamaCpp = new LlamaCpp();
+   if (process.env.OPENAI_API_KEY) {
+     defaultLlamaCpp = new OpenAILLM();
+   } else {

P1: Honor OPENAI_API_KEY when default LLM was pre-set

Update backend selection so getDefaultLlamaCpp() does not permanently stick to an earlier LlamaCpp singleton. Right now it only checks OPENAI_API_KEY when defaultLlamaCpp is null, but getStore() can pre-set the singleton to new LlamaCpp(...) from YAML model config (src/cli/qmd.ts), which means OpenAI never activates for those users even when the key is present. This breaks the new backend-switch behavior for a common configured setup.


Comment thread src/cli/qmd.ts Outdated
Comment on lines +349 to +351
  if (process.env.OPENAI_API_KEY) {
    const embedModel = process.env.QMD_OPENAI_EMBED_MODEL || "text-embedding-3-small";
    console.log(`Backend: openai (${embedModel})`);

P2: Report backend from active LLM instance, not env var

qmd status currently prints Backend: openai (...) whenever OPENAI_API_KEY is set, but the active singleton may already be a local LlamaCpp (for example when model config initializes it first). In that case status output is incorrect and can send users down the wrong debugging path; derive the backend from the instantiated default LLM/store instance instead of environment presence alone.


Codex round 1 on #3 flagged two linked concerns:

- P1 "Honor OPENAI_API_KEY when default LLM was pre-set": when
  `config.models` is present, `getStore()` in `src/cli/qmd.ts` calls
  `setDefaultLlamaCpp(new LlamaCpp(...))` before `getDefaultLlamaCpp()`
  has a chance to apply env-based selection. OpenAI never activates
  for that user even with a valid key.

- P2 "Report backend from active LLM instance, not env var":
  `qmd status` inferred `Backend: openai (...)` from
  `process.env.OPENAI_API_KEY`, which is a lie whenever the CLI
  already pre-set a local `LlamaCpp` singleton.

Fix (at the caller, not in the singleton):

- The YAML-models branch in `getStore()` now checks
  `OPENAI_API_KEY` first. If the key is set, it pre-sets an
  `OpenAILLM` and the configured GGUF URIs are intentionally
  ignored (the hosted backend doesn't understand them; it uses
  `QMD_OPENAI_EMBED_MODEL` / `QMD_OPENAI_GENERATE_MODEL` or the
  built-in defaults).
- `showStatus()` derives the backend label from the actual instance
  via `getDefaultLlamaCpp() instanceof OpenAILLM`, and reads
  `llm.embedModelName` (both concrete classes expose it without
  forcing a model load) so the status output always matches the
  real active backend.

Test:

- `a pre-set LlamaCpp singleton wins over env selection` asserts
  the singleton's stickiness so status-reporting code can rely on
  `getDefaultLlamaCpp()` reflecting the active instance. The
  behavior being flagged is a caller bug; the singleton itself is
  now documented as intentionally sticky once set.
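
The instance-derived status label can be sketched as follows (embedModelName is the accessor named in this PR; the surrounding classes are stand-ins):

```typescript
// Stand-in backend classes exposing the cheap embedModelName accessor
// (the real classes expose it without forcing a model load).
class LlamaCpp {
  embedModelName = "embeddinggemma";
}
class OpenAILLM {
  embedModelName = process.env.QMD_OPENAI_EMBED_MODEL || "text-embedding-3-small";
}

// Derive the label from the live instance, not from env-var presence.
function backendLabel(llm: LlamaCpp | OpenAILLM): string {
  return llm instanceof OpenAILLM
    ? `openai (${llm.embedModelName})`
    : `llama.cpp (${llm.embedModelName})`;
}
```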
@lukeboyett
Owner Author

Both concerns addressed in e88f653:

P1 "Honor OPENAI_API_KEY when default LLM was pre-set" — the fix is at the caller. The YAML-models branch in getStore() now checks OPENAI_API_KEY first and pre-sets an OpenAILLM instead of a LlamaCpp when the key is present. Local GGUF URIs in the YAML are intentionally ignored by the hosted backend; QMD_OPENAI_EMBED_MODEL / QMD_OPENAI_GENERATE_MODEL take over.

P2 "Report backend from active LLM instance, not env var": showStatus() now derives the backend label from getDefaultLlamaCpp() instanceof OpenAILLM and reads llm.embedModelName. Both concrete classes expose that without forcing a model load, so status stays cheap.

New test a pre-set LlamaCpp singleton wins over env selection documents the singleton's stickiness so status/reporting code can rely on getDefaultLlamaCpp() reflecting the real instance. Kept the fix in the caller rather than making getDefaultLlamaCpp() self-correcting because silently mutating a pre-set singleton from inside the getter felt surprising — happy to invert if you disagree.

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e88f653001


Comment thread package.json
Comment on lines +52 to +55
"openai": "^6.34.0",
"picomatch": "4.0.4",
"sqlite-vec": "0.1.9",
"tiktoken": "^1.0.22",

P1: Regenerate bun.lock after adding OpenAI dependencies

These new runtime dependencies were added to package.json without a corresponding lockfile update, which breaks the repo’s frozen-lockfile checks (bun install --frozen-lockfile in .github/workflows/ci.yml and release scripts). In this state, CI/publish will fail before tests run because the lockfile no longer matches declared dependencies.


Comment thread src/llm.ts Outdated
Comment on lines +676 to +680
  async expandQuery(query: string, options: { context?: string; includeLexical?: boolean } = {}): Promise<Queryable[]> {
    const includeLexical = options.includeLexical ?? true;

    try {
      const prompt = `Expand this search query into 3 variations:

P2: Include intent hint in OpenAI query-expansion prompt

The OpenAI expandQuery path ignores caller-provided expansion hints and builds its prompt from query alone, so intent-guided search degrades when the OpenAI backend is active. store.expandQuery passes an intent hint into llm.expandQuery(...), and the local LlamaCpp implementation uses it (Query intent: ...), but this implementation drops it entirely, changing ranking/recall behavior for users of --intent.


Codex round 2 on #3 flagged three issues:

- P1 "Honor OPENAI_API_KEY when default LLM was pre-set" (re-flagged):
  my previous caller-side fix only patched the one known call site
  (CLI YAML model-config). Any future caller or SDK user that pre-sets
  a `LlamaCpp` before `OPENAI_API_KEY` is observed would still strand
  on the wrong backend. Conceded — moved the fix into
  `getDefaultLlamaCpp()` itself. The getter now checks the env var on
  every call, discards a cached singleton whose type does not match
  the currently requested backend, and returns an exact-match
  singleton as-is so shared state is preserved across callers.
  `setDefaultLlamaCpp` remains useful for tests and for pinning a
  specific pre-configured instance when it matches the requested
  backend. The CLI pre-set is simplified accordingly.

- P1 "Regenerate bun.lock after adding OpenAI dependencies": the
  earlier commit added `openai` and `tiktoken` to package.json without
  touching the tracked lockfiles. Regenerated `bun.lock` (bun CI uses
  `--frozen-lockfile`) and `pnpm-lock.yaml` via `bun install` and
  `pnpm install`. `package-lock.json` is gitignored upstream so is
  not tracked.

- P2 "Include intent hint in OpenAI query-expansion prompt":
  `store.expandQuery` passes `{ intent }` to `llm.expandQuery(...)`,
  and the local `LlamaCpp` implementation inlines it as
  `Query intent: <intent>`. `OpenAILLM.expandQuery` previously
  ignored it, degrading intent-guided search when the hosted backend
  was active. Now it accepts `options.intent` and prepends a
  `Query intent: ...` line to the expansion prompt.
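
The intent-aware prompt construction amounts to something like the following (a minimal sketch; the real expansion prompt in src/llm.ts is longer):

```typescript
// Prepend an optional "Query intent:" line, matching LlamaCpp.expandQuery.
function buildExpansionPrompt(query: string, options: { intent?: string } = {}): string {
  const intentLine = options.intent ? `Query intent: ${options.intent}\n` : "";
  return `${intentLine}Expand this search query into 3 variations:\n${query}`;
}
```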

Test updates:

- Replace the "singleton is sticky" test with two new tests:
  - `env selection wins over a pre-set singleton of the wrong backend`
  - `matching pre-set singleton is reused`
- Existing retry / fallback / backend-selection tests still pass.
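
The self-correcting getter described in the first bullet can be sketched as follows (stand-in classes; the real logic lives in getDefaultLlamaCpp() in src/llm.ts):

```typescript
// Stand-ins for the two concrete backends.
class LlamaCpp {}
class OpenAILLM {}
type QmdLLM = LlamaCpp | OpenAILLM;

let singleton: QmdLLM | null = null;

// Re-check the env var on every call; discard a cached singleton whose type
// no longer matches the requested backend, and reuse an exact match as-is.
function getDefaultLlamaCpp(): QmdLLM {
  const wantOpenAI = Boolean(process.env.OPENAI_API_KEY);
  const matches = wantOpenAI
    ? singleton instanceof OpenAILLM
    : singleton instanceof LlamaCpp;
  if (!singleton || !matches) {
    singleton = wantOpenAI ? new OpenAILLM() : new LlamaCpp();
  }
  return singleton;
}

// Tests and pre-configured callers can still pin an instance; it is reused
// only while it matches the backend the environment selects.
function setDefaultLlamaCpp(llm: QmdLLM): void {
  singleton = llm;
}
```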
@lukeboyett
Owner Author

All three addressed in 7e49d29:

P1 "Honor OPENAI_API_KEY when default LLM was pre-set" — conceded the design point. Moved the fix into getDefaultLlamaCpp() itself: the getter now re-checks OPENAI_API_KEY on every call, discards a cached singleton whose type doesn't match the current env, and only reuses the singleton when it already matches. Covers not just the CLI YAML path but any SDK/downstream caller that pre-sets.

P1 "Regenerate bun.lock after adding OpenAI dependencies" — regenerated bun.lock and pnpm-lock.yaml via bun install and pnpm install. package-lock.json is gitignored upstream so not tracked.

P2 "Include intent hint in OpenAI query-expansion prompt": OpenAILLM.expandQuery now accepts options.intent and prepends Query intent: <intent> to the expansion prompt, matching LlamaCpp.expandQuery's behavior. store.expandQuery already passes { intent } to both.

Test updates:

  • Replaced the old "singleton is sticky" test with env selection wins over a pre-set singleton of the wrong backend and matching pre-set singleton is reused — documents the new behavior on both the mismatch and match paths.

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.

