diff --git a/CHANGELOG.md b/CHANGELOG.md index e18179c3f..c0475f560 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,122 @@ All notable changes to GBrain will be documented in this file. +## [0.32.0] - 2026-05-10 + +**5 new embedding providers + the discoverability fix that closes the 17-PR dupe cluster.** +**`gbrain providers list` now shows 14 recipes; `gbrain doctor` tells you which alternatives are already wired.** + +A triage of 197 open issues + 289 open PRs surfaced a cluster of 17 community embedding-provider PRs filed within ~3 weeks (Ollama, Gemini, Voyage, Azure, MiniMax, Copilot, llama-server, Vertex, DashScope, Zhipu, etc.). Most were dupes of work already in master — gbrain has shipped a comprehensive AI SDK gateway + recipe pattern since v0.14, with 9 providers built in. Users just didn't know. + +v0.32.0 ships the recipes the existing pattern didn't yet cover, plus a documentation pass + doctor advisory + improved error hints that close the discoverability gap. Codex outside-voice review during plan-eng-review caught the discoverability framing — without it, the wave would have shipped 8 recipes plus an OAuth subsystem instead of the focused 5-recipe + docs delivery. + +### The numbers that matter + +``` +gbrain providers list → v0.31.1: 9 providers → v0.32.0: 14 providers +gbrain doctor → v0.31.1: 1 advisory → v0.32.0: 2 advisories (+ alternative_providers) +``` + +5 new recipes: + +| Recipe | Auth | Default dims | Notes | |---|---|---|---| | `azure-openai` | `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` + `AZURE_OPENAI_DEPLOYMENT` | 1536 | First recipe with `api-key:` custom header (not Bearer); first with templated URL + `?api-version=` query injection | | `minimax` | `MINIMAX_API_KEY` | 1536 | China-region; embo-01 model; type='db' asymmetric retrieval field plumbed via dims.ts | | `dashscope` | `DASHSCOPE_API_KEY` | 1024 | Alibaba; international endpoint default; CJK-aware batching (chars_per_token=2) | | `zhipu` | `ZHIPUAI_API_KEY` | 1024 | BigModel; embedding-3 with Matryoshka up to 2048 (HNSW falls back to exact-scan past 2000 dims) | | `llama-server` | (none) | user-set | llama.cpp's `llama-server --embeddings`; user_provided_models recipe | + +### What this means for new users + +`gbrain init` keeps OpenAI as the zero-config default. Users with API keys for any of the other 13 providers see them surfaced via `gbrain doctor` ("Detected 2 alternative embedding providers ready to use: voyage, dashscope. Run `gbrain providers list` to switch."). Users on Azure tenancies, China-region, or local-only setups have first-class recipes instead of "find a workaround." Users who need a provider gbrain doesn't ship can route through LiteLLM proxy (the universal escape hatch) without writing custom code. + +For agents: every recipe is registered in the same `listRecipes()` registry, so `gbrain providers list/test/env/explain` automatically picks up new recipes without code changes. The recipe contract test (`test/ai/recipes-contract.test.ts`) keeps the registry honest. + +### To take advantage of v0.32.0 + +`gbrain upgrade` should do this automatically. If it didn't: + +1. **Confirm the new recipes show:** + ```bash + gbrain providers list + ``` + Should show 14 entries including `azure-openai`, `minimax`, `dashscope`, `zhipu`, `llama-server`. + +2. **Try the doctor advisory:** + ```bash + gbrain doctor + ``` + Look for the `alternative_providers` row. If env vars for unconfigured providers are present, it'll name them. + +3. 
**Read the new docs** at [`docs/integrations/embedding-providers.md`](docs/integrations/embedding-providers.md) — capability matrix, decision tree, per-recipe setup, "my provider isn't listed" path. + +4. **No breaking changes**: the existing 9 recipes (openai, anthropic, google, deepseek, groq, ollama, litellm-proxy, together, voyage) keep working unchanged. The internal auth refactor (D12=A unified resolveAuth seam) is pinned by `test/ai/recipes-existing-regression.test.ts` so the next refactor can't silently break them. + +5. **If anything breaks**, file an issue at https://github.com/garrytan/gbrain/issues with `gbrain doctor` output. The only behavior change for existing recipes: Ollama expansion + chat now read `OLLAMA_API_KEY` when set (embedding already did; the unification aligns all three touchpoints). + +### Itemized changes + +#### Architectural foundations + +- **Recipe.resolveAuth(env) seam (D12=A)**: unified the openai-compatible auth path, which was duplicated 3 times across `instantiateEmbedding`, `instantiateExpansion`, `instantiateChat` with subtle drift. Default impl (used by all existing recipes unchanged) returns `{headerName: 'Authorization', token: 'Bearer <key>'}`. Deviating recipes override; Azure is the first. +- **Recipe.resolveOpenAICompatConfig(env) seam**: env-templated baseURL + optional fetch wrapper for recipes whose URL shape doesn't fit a static `base_url_default`. Azure uses both seams. +- **Recipe.probe() seam (D13=A)**: recipe-owned readiness check for local-server providers. Replaces the hardcoded `recipe.id === 'ollama'` special case in `runExplain()`. llama-server declares its own probe; future local providers self-register. +- **EmbeddingTouchpoint.user_provided_models?: true (D8=A)**: explicit signal for recipes that ship without a fixed model list (litellm, llama-server). Replaces the legacy `recipe.id === 'litellm'` hardcode in gateway.ts:223; refusal in `init.ts:resolveAIOptions` for shorthand `--model` with a setup hint pointing at the explicit form. +- **EmbeddingTouchpoint.no_batch_cap?: true**: silences the missing-max_batch_tokens startup warning for recipes with genuinely dynamic batch capacity (Ollama, LiteLLM proxy, llama-server). Pre-fix: 3 stderr warnings on every `configureGateway()` call. Post-fix: only `google` warns. + +#### Discoverability + +- New `docs/integrations/embedding-providers.md` (one-pager: capability table, decision tree, per-recipe setup, "my provider isn't listed" path to LiteLLM). +- README embedding-providers callout near the top of the install section. +- `gbrain doctor` adds an `alternative_providers` check that surfaces recipes whose env vars are already set but aren't the configured provider. +- `gbrain init --model litellm` (or any user_provided_models recipe) now refuses with a structured setup hint instead of throwing "no embedding models listed." + +#### Codex review fixes (pre-merge) + +- **dimsProviderOptions on openai-compatible**: text-embedding-3-* (Azure), text-embedding-v3 (DashScope), and embedding-3 (Zhipu) now thread `dimensions` to the wire. Without this, Azure-default 3072d would mismatch a 1536d brain on the first embed; DashScope/Zhipu Matryoshka requests would be silently ignored. +- **`gbrain init --embedding-model llama-server:foo` (verbose path)**: now refuses without `--embedding-dimensions`. Pre-fix, the verbose path fell through to the gateway's 1536d default and silently created the wrong-width schema (only the shorthand `--model` was guarded). 
+- **MiniMax host correction**: `api.minimax.chat` → `api.minimaxi.com` (matches MiniMax's current OpenAI-compatible docs). +- **`LLAMA_SERVER_BASE_URL` reaches the gateway**: `buildGatewayConfig` now threads `LLAMA_SERVER_BASE_URL`, `OLLAMA_BASE_URL`, `LMSTUDIO_BASE_URL`, `LITELLM_BASE_URL` env into `cfg.base_urls` so embed traffic actually hits the configured port. Pre-fix, the env-only setup let probe pass on a custom port while traffic still hit `localhost:8080`. +- **`Recipe.probe(baseURL?)` accepts the resolved URL**: probe and gateway can no longer disagree when only `provider_base_urls` is set in config (no env). Callers with cfg pass the URL; legacy callers fall back to env. + +#### Adjacent fixes + +- **#779 (alexandreroumieu-codeapprentice) reworked**: `EmbeddingTouchpoint.no_batch_cap?: true` opt-out for dynamic-cap recipes. +- **#121 (vinsew) reworked**: `~/.gbrain/config.json` API keys now propagate to the gateway env. Pre-fix, `openai_api_key` / `anthropic_api_key` config-file values were ignored (the gateway only saw `process.env`). Common bite: launchd-spawned daemons or agent subprocess tools without `~/.zshrc` propagation. Process env still wins on conflict. +- `loadConfig()` now merges `ANTHROPIC_API_KEY` env var into the file-config result (was silently dropped). +- IRON RULE regression test (`test/ai/recipes-existing-regression.test.ts`): pins that the v0.32 resolveAuth refactor preserves auth behavior for the existing 9 recipes. + +### Closed as superseded + +The following community PRs are closed because their work is now covered by the recipe system + LiteLLM proxy escape hatch + the recipes shipped in this wave: + +- #49, #58, #73, #100, #112, #134, #137, #150, #172, #178, #255, #327, #420, #482, #516, #780, #89 — pluggable embedding adapter / Ollama / Gemini / E5 / Azure-via-LiteLLM / etc. + +Each contributor identified a real gap; the patterns they prototyped converged on the recipe system that was shipped in v0.14. Thank you for the early signal. + +### Deferred to v0.32.x (with TODOS.md entries) + +- **#729 Vertex AI ADC** (lucha0404): proper ADC chain (metadata server, gcloud creds, service-account JSON) is a real product surface, not the single-source-JSON path the original PR proposed. +- **#691 GitHub Copilot** (tonyxu-io): outbound OAuth is a new product surface (login flow, browser/device flow, refresh, UX), not a sidecar recipe. Needs its own design pass. +- **#698 OpenAI Codex OAuth** (perlantir): same OAuth-product-surface argument; chat-only. +- **#765 Hunyuan PGLite + CJK keyword fallback** (313094319-sudo): the CJK PGLite branch is ~150 lines of new SQL + scoring logic that deserves its own focused PR rather than being folded into a 9-commit wave. +- **Interactive provider chooser in `gbrain init`**: the wizard piece of the discoverability lane. v0.32.0 ships the doctor advisory + cleaner refusal that close the 80% case; the full wizard is a v0.32.x follow-up. +- **Real-credentials per-recipe smoke fixtures**: opt-in CI matrix gated on API-key budget approval. + +### Contributors + +Reworked from / inspired by: +- @cacity (#148 MiniMax) +- @JamesJZhang (#459 Azure OpenAI) +- @Magicray1217 (#59 DashScope + Zhipu) +- @SiyaoZheng (#702 llama-server) +- @alexandreroumieu-codeapprentice (#779) +- @vinsew (#121) +- @100yenadmin / Eva (Voyage 4 Large 2048d HNSW policy, shipped earlier via 3004a87) + +Codex outside-voice review during plan-eng-review drove the scope reduction (D11=C) from 8 recipes + OAuth subsystem to 5 recipes + docs. 
+ ## [0.31.12] - 2026-05-10 **The chat default no longer 404s, and every Claude call gbrain makes is now one config key away from your preferred model.** diff --git a/README.md b/README.md index 3de35dc52..9b33b3a0c 100644 --- a/README.md +++ b/README.md @@ -16,6 +16,8 @@ GBrain is those patterns, generalized. 34 skills. Install in 30 minutes. Your ag > **LLMs:** fetch [`llms.txt`](llms.txt) for the documentation map, or [`llms-full.txt`](llms-full.txt) for the same map with core docs inlined in one fetch. **Agents:** start with [`AGENTS.md`](AGENTS.md) (or [`CLAUDE.md`](CLAUDE.md) if you're Claude Code). +> **Embedding providers:** OpenAI is the default, but gbrain ships with **14 recipes** covering Voyage, Google Gemini, Azure OpenAI, MiniMax, Alibaba DashScope, Zhipu, Ollama (local), llama.cpp llama-server (local), LiteLLM proxy (universal), and 5 more. Run `gbrain providers list` to see them, or read [`docs/integrations/embedding-providers.md`](docs/integrations/embedding-providers.md) for setup, pricing, and a decision tree. `gbrain doctor` will surface alternative providers whose env vars you already have set. + ## Install ### On an agent platform (recommended) diff --git a/TODOS.md b/TODOS.md index 140b53211..2c2002b39 100644 --- a/TODOS.md +++ b/TODOS.md @@ -1,5 +1,71 @@ # TODOS +## Embedding-provider follow-ups (v0.32.0) + +- [ ] **v0.32.x: Vertex AI ADC embedding provider (#729 originally).** lucha0404 + prototyped this with single-source-JSON via `GOOGLE_APPLICATION_CREDENTIALS`. + Real ADC is the full chain (metadata server, gcloud creds, service-account + JSON). The recipe needs to either use `@ai-sdk/google-vertex` (one new + dep, native fit) or implement the chain via Bun.crypto.subtle for RS256 + JWT signing (zero dep, ~150 lines + RS256 spike). Original Q3 chose + zero-dep; revisit the dep budget when scoping. + +- [ ] **v0.32.x: GitHub Copilot embeddings (#691 originally).** tonyxu-io + proposed adding Copilot's Metis embedding endpoint as a sidecar recipe. + Codex review caught that this is not a recipe-add — it's an outbound OAuth + product surface (login flow, browser/device flow, refresh, UX). Needs its + own design pass: where does the token live? `~/.gbrain/oauth/copilot.json` + mode 0600 was the v0.32 plan; revisit + write `gbrain auth login copilot`. + +- [ ] **v0.32.x: OpenAI Codex OAuth chat provider (#698 originally).** perlantir + proposed a chat-only provider that reuses ChatGPT subscription auth instead + of API keys. Same OAuth-product-surface argument as #691. Same shared + infra: `~/.gbrain/oauth/.json` + `gbrain auth login `. + Build alongside #691 in one OAuth-subsystem wave. + +- [ ] **v0.32.x: CJK PGLite keyword fallback (#765 extracted).** 313094319-sudo + hit a real gap: PGLite's FTS doesn't tokenize CJK well, so Chinese queries + return empty results even with proper embeddings. Their PR added a + hasCJK detection branch in `searchKeyword` that switches to LIKE-based + fuzzy matching with a custom scoring function. ~150 lines of new SQL + + scoring + tests. Worth its own focused PR rather than folded into the + v0.32 wave's adjacent-fix lane. Extract `extractSearchTokens`, + `normalizeSearchText`, `hasCJK` helpers + the CJK branch in + `pglite-engine.ts:searchKeyword`. Includes tests for romaji + Korean + Hangul + traditional/simplified Chinese. + +- [ ] **v0.32.x: interactive provider chooser in `gbrain init`.** The full + wizard piece of the v0.32 discoverability lane was deferred. 
Today + `gbrain init` (no flags, TTY) silently uses OpenAI default. Plan: hook + into `init.ts:resolveAIOptions`, when no `--model` AND TTY AND not + `--non-interactive`, call `runExplain([])` (non-JSON path) from + `providers.ts:233-350` to print the provider matrix, then prompt with + readline (mirror `supabaseWizard()` at `init.ts:108`). Suggest a + recommended provider based on env detection. Refuse `user_provided_models` + shorthand (already done in v0.32.0). Tests: + `test/init-provider-wizard.test.ts` (TTY → prompt fires; non-TTY → + falls through; invalid choice → re-prompts). + +- [ ] **v0.32.x: real-credentials per-recipe smoke-test CI matrix.** Codex + finding #6 noted that unit tests via `__setEmbedTransportForTests` prove + routing but not contract correctness with the actual provider HTTP + shape. Provider APIs change quietly (Voyage encoding-format, MiniMax + type field, Azure header). One real-call per recipe per month catches + drift before users do; <$1/run estimated. Requires API-key budget + approval + repo secrets. + +- [ ] **v0.32.x: MiniMax asymmetric retrieval support.** v0.32 ships + `embo-01` with `type: 'db'` for both indexing and queries (symmetric + retrieval). True asymmetric needs a query/document signal threaded + through the embed seam. Worth it for MiniMax users who care about + retrieval quality on Chinese content; defer until users complain. + +- [ ] **v0.32.x: un-hardcode the multimodal dispatch at gateway.ts:583.** + Currently `recipe.id !== 'voyage'` is hardcoded — harmless until a + second multimodal recipe lands. Make it table-driven via + `Recipe.touchpoints.embedding.supports_multimodal` + + `multimodal_models`. ~10 lines + a contract test. + ## v0.31.2 follow-ups ### Investigate: `gbrain query ` infinite loop diff --git a/VERSION b/VERSION index 3112f8a4c..8a0d6d408 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.31.12 +0.32.0 \ No newline at end of file diff --git a/docs/integrations/embedding-providers.md b/docs/integrations/embedding-providers.md new file mode 100644 index 000000000..943728e96 --- /dev/null +++ b/docs/integrations/embedding-providers.md @@ -0,0 +1,130 @@ +# Embedding providers + +GBrain ships with 14 embedding-provider recipes covering OpenAI, the major hosted alternatives, three local options, and a universal escape hatch (LiteLLM proxy). Run `gbrain providers list` to see the live registry; `gbrain providers explain --json` emits a machine-readable matrix for agents. + +This page is the human-readable counterpart: capability per provider, env-var setup, dimensions, cost, and known constraints. + +## Quick start + +``` +gbrain providers list # see all providers +gbrain providers env # see required env vars +gbrain providers test --model openai:text-embedding-3-large # smoke-test +gbrain init --pglite --model voyage # use a non-default provider +```
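+
+For agents, the JSON matrix is the machine-readable route into the same registry. A minimal consumption sketch — the `.[].id` field path is an assumption, so check the actual `--json` output on your install first:
+
+```bash
+# dump the provider matrix, then list recipe ids (field path assumed)
+gbrain providers explain --json | jq -r '.[].id'
+```
+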
+## TL;DR table + +| Provider | env vars | default dims | cost ($/1M tokens) | local? | multimodal? | |---|---|---|---|---|---| | `openai` | `OPENAI_API_KEY` | 1536 | 0.13 | no | no | | `voyage` | `VOYAGE_API_KEY` | 1024 | 0.18 | no | yes (`voyage-multimodal-3`) | | `google` | `GOOGLE_GENERATIVE_AI_API_KEY` | 768 | 0.025 | no | no | | `azure-openai` | `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_DEPLOYMENT` | 1536 | 0.13 | no | no | | `minimax` | `MINIMAX_API_KEY` | 1536 | 0.07 | no | no | | `dashscope` | `DASHSCOPE_API_KEY` | 1024 | varies | no | no | | `zhipu` | `ZHIPUAI_API_KEY` | 1024 | varies | no | no | | `ollama` | (none — runs locally) | 768 | 0 | yes | no | | `llama-server` | (none — runs locally) | user-set | 0 | yes | no | | `litellm` | `LITELLM_API_KEY` (optional) | user-set | varies | yes (proxy) | no | | `together` | `TOGETHER_API_KEY` | 768 | varies | no | no | | `anthropic` | (no embedding model — chat only) | — | — | — | — | | `deepseek` | (no embedding model — chat only) | — | — | — | — | | `groq` | (no embedding model — chat only) | — | — | — | — | + +## Decision tree + +- **Cost-sensitive, English-only**: Ollama (free, local) or Voyage (paid, best quality per dollar). +- **Quality-first**: Voyage `voyage-4-large` (1024-2048 dims; tokenization ~3-4× denser than OpenAI tiktoken). +- **Reranking pair**: Voyage (their reranker `rerank-2.5` pairs cleanly with Voyage embeddings). +- **Enterprise compliance**: Azure OpenAI (data residency + private endpoints) or self-hosted via llama-server / Ollama. +- **China region**: DashScope (Alibaba) or Zhipu (BigModel). DashScope defaults to the international endpoint at `dashscope-intl.aliyuncs.com`; override `provider_base_urls.dashscope` for the China endpoint. +- **OSS local, full control**: llama-server (`llama.cpp`) for any GGUF model; Ollama for the curated catalog. +- **Anything else**: LiteLLM proxy. Run LiteLLM in front of any provider (Bedrock, Vertex, Cohere, Jina, Fireworks, etc.) and point gbrain at it via `LITELLM_BASE_URL`. + +## Per-provider details + +### OpenAI + +Default. Set `OPENAI_API_KEY`. Models: `text-embedding-3-large` (3072 max, 1536 default), `text-embedding-3-small` (1536). Matryoshka via the `dimensions` field — gbrain pins it from `embedding_dimensions` config so existing 1536-dim brains stay aligned across SDK upgrades. + +### Voyage AI + +Best-in-class quality on the Voyage 4 family (Jan 2026 release). Set `VOYAGE_API_KEY`. Models: `voyage-4-large`, `voyage-4`, `voyage-4-lite`, `voyage-4-nano`, `voyage-3.5`, `voyage-code-3` (code-tuned), `voyage-finance-2`, `voyage-law-2`, `voyage-multimodal-3` (text + image). + +Voyage 4 family shares an embedding space across all variants, so you can index with `voyage-4-large` and query with `voyage-4-lite` without reindexing. Dims: 256, 512, 1024, 2048. **2048 exceeds pgvector's HNSW cap of 2000** — those brains fall back to exact vector scans (still correct, just slower). + +### Google Gemini + +Set `GOOGLE_GENERATIVE_AI_API_KEY` (the AI Studio public API key). Model: `gemini-embedding-001`. Default 768 dims; Matryoshka up to 3072. Cheap. + +For GCP service-account / Vertex AI auth (production deployments), see the v0.32.x follow-up — Vertex ADC is on the roadmap. + +### Azure OpenAI + +Enterprise OpenAI behind an Azure tenancy. Required env: `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT` (e.g. `https://my-resource.openai.azure.com`), `AZURE_OPENAI_DEPLOYMENT` (the deployment name from your Azure portal). Optional: `AZURE_OPENAI_API_VERSION` (defaults to `2024-10-21`). 
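+
+A typical wiring, with placeholder values — the endpoint and deployment name come from your Azure portal:
+
+```bash
+export AZURE_OPENAI_API_KEY="<key from Keys and Endpoint>"
+export AZURE_OPENAI_ENDPOINT="https://my-resource.openai.azure.com"
+export AZURE_OPENAI_DEPLOYMENT="my-embedding-deployment"   # hypothetical deployment name
+gbrain providers test --model azure-openai:text-embedding-3-small   # smoke-test the wiring
+gbrain init --pglite --model azure-openai                           # then init against it
+```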
+ +Unlike vanilla OpenAI, Azure uses an `api-key:` header (not `Authorization: Bearer`) and a templated URL with `?api-version=` query param — gbrain handles both via the recipe's resolveAuth + resolveOpenAICompatConfig overrides. + +Models: `text-embedding-3-large`, `text-embedding-3-small`, `text-embedding-ada-002` (your Azure deployment must serve the requested model). + +### MiniMax (海螺AI) + +Set `MINIMAX_API_KEY`. Optional `MINIMAX_GROUP_ID` for org-scoped accounts. Model: `embo-01` (1536 dims). + +MiniMax's API takes a `type: 'db' | 'query'` field for asymmetric retrieval. v0.32 routes everything as `type='db'` (symmetric retrieval — same vector space for indexing and queries). Asymmetric query support is a v0.32.x follow-up. + +### DashScope (Alibaba) + +Set `DASHSCOPE_API_KEY`. International endpoint at `dashscope-intl.aliyuncs.com` by default; override `provider_base_urls.dashscope` for the China endpoint. Models: `text-embedding-v3` (current; Matryoshka 64-1024 dims), `text-embedding-v2`. + +CJK-dominant content tokenizes denser than OpenAI tiktoken; gbrain declares `chars_per_token: 2` so the batch pre-split leaves headroom. + +### Zhipu AI (BigModel) + +Set `ZHIPUAI_API_KEY`. Models: `embedding-3` (current; Matryoshka 256-2048 dims), `embedding-2`. v0.32 default is 1024 (HNSW-compatible). The 2048-dim option works but falls into the exact-scan branch (see Voyage 4 Large note above). + +### Ollama (local) + +No env required — Ollama runs unauthenticated locally. Optional `OLLAMA_BASE_URL` (default `http://localhost:11434/v1`) and `OLLAMA_API_KEY` (for auth-enabled deployments). + +Recipe ships with `nomic-embed-text` (768d, recommended), `mxbai-embed-large` (1024d), `all-minilm` (384d). `gbrain providers test --model ollama:nomic-embed-text` smoke-tests the local install. + +### llama-server (local, llama.cpp) + +`llama.cpp`'s `llama-server --embeddings` endpoint. No env required. Optional `LLAMA_SERVER_BASE_URL` (default `http://localhost:8080/v1`) and `LLAMA_SERVER_API_KEY`. + +User-driven models: launch llama-server with `--model <path-to-gguf> --embeddings`, then run `gbrain init --embedding-model llama-server:<model> --embedding-dimensions <dims>`. The recipe refuses the implicit shorthand `--model llama-server` because there's no canonical first model. + +### LiteLLM proxy (universal escape hatch) + +Run [LiteLLM](https://docs.litellm.ai/docs/proxy/quick_start) in front of any provider — Bedrock, Vertex, Cohere, Jina, Fireworks, OctoAI, etc. The proxy normalizes everything to the OpenAI-compatible API; gbrain points at the proxy via `LITELLM_BASE_URL` and routes every call through it. + +This is the catch-all for "my provider isn't in the list above." Set up LiteLLM, then `gbrain init --embedding-model litellm:<model> --embedding-dimensions <dims>`. + +## Choosing dimensions + +Three numbers matter: +1. **Provider's native dims**: each model has a "true" output dim (e.g. OpenAI `text-embedding-3-large` is 3072 native). +2. **Matryoshka reductions**: most modern providers let you request a smaller vector via the `dimensions` field. +3. **HNSW cap**: pgvector's HNSW index supports up to 2000 dims. Brains above that fall back to exact vector scans (slower but correct; gbrain handles the SQL automatically via `chunkEmbeddingIndexSql` in `src/core/vector-index.ts`). + +For most users: **stay at 1024 or 1536**. Bigger isn't better below the noise floor; smaller saves disk + RAM with marginal recall loss on Matryoshka providers. 
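+
+For example (hypothetical choices; flags as used throughout this page), pinning Zhipu's `embedding-3` at init time:
+
+```bash
+# 1024 stays under pgvector's 2000-dim HNSW cap
+gbrain init --pglite --embedding-model zhipu:embedding-3 --embedding-dimensions 1024
+# 2048 keeps full Matryoshka fidelity but drops to exact vector scans
+gbrain init --pglite --embedding-model zhipu:embedding-3 --embedding-dimensions 2048
+```
+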
+## My provider isn't listed + +Three options: + +1. **Use LiteLLM proxy** (above) — the universal escape hatch. Works for 100+ providers. +2. **Open a feature request** at [github.com/garrytan/gbrain/issues](https://github.com/garrytan/gbrain/issues) with the provider's API docs URL and a setup snippet. Recipes are ~30-40 lines of TypeScript. +3. **Submit a recipe**: clone, copy `src/core/ai/recipes/voyage.ts` as the gold-standard openai-compat template, register in `src/core/ai/recipes/index.ts`, add a per-recipe smoke test under `test/ai/recipe-<id>.test.ts`. The recipe contract test (`test/ai/recipes-contract.test.ts`) and IRON RULE regression test pin the structural invariants. + +## Switching providers on an existing brain + +Embedding dimensions are baked into the schema at `gbrain init` time. To change providers post-init, you usually need to re-embed: + +1. Update config: `gbrain config set embedding_model <provider>:<model>` and `embedding_dimensions <dims>`. +2. Reindex schema if dims changed: `gbrain doctor` will detect the mismatch and print the exact `ALTER TABLE` recipe. +3. Re-embed: `gbrain embed --all` (or `--stale` for incremental). + +`gbrain doctor`'s `alternative_providers` check surfaces unconfigured providers whose env is already set — useful when you've configured OpenAI but also have e.g. `VOYAGE_API_KEY` exported and want to know you can switch without extra setup. diff --git a/llms-full.txt b/llms-full.txt index e823a5029..bbc1d39d1 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -1836,6 +1836,8 @@ GBrain is those patterns, generalized. 34 skills. Install in 30 minutes. Your ag > **LLMs:** fetch [`llms.txt`](llms.txt) for the documentation map, or [`llms-full.txt`](llms-full.txt) for the same map with core docs inlined in one fetch. **Agents:** start with [`AGENTS.md`](AGENTS.md) (or [`CLAUDE.md`](CLAUDE.md) if you're Claude Code). +> **Embedding providers:** OpenAI is the default, but gbrain ships with **14 recipes** covering Voyage, Google Gemini, Azure OpenAI, MiniMax, Alibaba DashScope, Zhipu, Ollama (local), llama.cpp llama-server (local), LiteLLM proxy (universal), and 5 more. Run `gbrain providers list` to see them, or read [`docs/integrations/embedding-providers.md`](docs/integrations/embedding-providers.md) for setup, pricing, and a decision tree. `gbrain doctor` will surface alternative providers whose env vars you already have set. + ## Install ### On an agent platform (recommended) diff --git a/package.json b/package.json index 8e2c3fa0b..b4beb23f2 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gbrain", - "version": "0.31.12", + "version": "0.32.0", "description": "Postgres-native personal knowledge brain with hybrid RAG search", "type": "module", "main": "src/core/index.ts", diff --git a/scripts/run-serial-tests.sh b/scripts/run-serial-tests.sh index f54321827..a9ee62a49 100755 --- a/scripts/run-serial-tests.sh +++ b/scripts/run-serial-tests.sh @@ -30,5 +30,29 @@ if [ "${1:-}" = "--dry-run-list" ]; then exit 0 fi -echo "[serial-tests] running ${#files[@]} file(s) with --max-concurrency=1" -exec bun test --max-concurrency=1 --timeout=60000 "${files[@]}" +echo "[serial-tests] running ${#files[@]} file(s), one bun process per file" + +# Each serial file gets its OWN bun process. `--max-concurrency=1` was not +# enough: files in the same process share the module registry, so a top-level +# `mock.module(...)` in one file leaks into the next file's imports +# (eval-takes-quality-runner mocks gateway.ts and the next file fails on +# `import { resetGateway }` because the mock factory didn't export it). 
+# Per-file processes give true isolation; cost is ~100ms startup × N files. +fail_count=0 +failed_files=() +for f in "${files[@]}"; do + if ! bun test --max-concurrency=1 --timeout=60000 "$f"; then + fail_count=$((fail_count + 1)) + failed_files+=("$f") + fi +done + +if [ "$fail_count" -gt 0 ]; then + echo "" >&2 + echo "[serial-tests] $fail_count file(s) failed:" >&2 + for f in "${failed_files[@]}"; do + echo " - $f" >&2 + done + exit 1 +fi +echo "[serial-tests] all ${#files[@]} file(s) passed" diff --git a/src/cli.ts b/src/cli.ts index ec96a46e9..5b82a5c2f 100755 --- a/src/cli.ts +++ b/src/cli.ts @@ -1219,6 +1219,27 @@ async function handleCliOnly(command: string, args: string[]) { // but not the other previously required remembering to mirror the change; // the helper makes that structural. function buildGatewayConfig(c: GBrainConfig): AIGatewayConfig { + // v0.32 (#121 reworked): when ~/.gbrain/config.json declares + // openai_api_key / anthropic_api_key, fold them into the gateway env so + // recipes that read OPENAI_API_KEY / ANTHROPIC_API_KEY find them. Process + // env still wins (it's loaded last) — this is a fallback for daemons / + // launchd-spawned subprocesses that don't propagate ~/.zshrc-sourced keys. + const envFromConfig: Record<string, string> = {}; + if (c.openai_api_key) envFromConfig.OPENAI_API_KEY = c.openai_api_key; + if (c.anthropic_api_key) envFromConfig.ANTHROPIC_API_KEY = c.anthropic_api_key; + + // v0.32 codex finding #4+#5 fix: thread local-server _BASE_URL env vars + // into base_urls so the gateway hits the user's configured port. Without + // this, `LLAMA_SERVER_BASE_URL=http://localhost:9000` would let the probe + // succeed against :9000 but the actual embed call would still go to the + // recipe's base_url_default (localhost:8080). Same fix applies to + // OLLAMA_BASE_URL. Caller-provided cfg.provider_base_urls wins. + const envBaseUrls: Record<string, string> = {}; + if (process.env.LLAMA_SERVER_BASE_URL) envBaseUrls['llama-server'] = process.env.LLAMA_SERVER_BASE_URL; + if (process.env.OLLAMA_BASE_URL) envBaseUrls['ollama'] = process.env.OLLAMA_BASE_URL; + if (process.env.LMSTUDIO_BASE_URL) envBaseUrls['lmstudio'] = process.env.LMSTUDIO_BASE_URL; + if (process.env.LITELLM_BASE_URL) envBaseUrls['litellm'] = process.env.LITELLM_BASE_URL; + return { embedding_model: c.embedding_model, embedding_dimensions: c.embedding_dimensions, @@ -1226,8 +1247,8 @@ expansion_model: c.expansion_model, chat_model: c.chat_model, chat_fallback_chain: c.chat_fallback_chain, - base_urls: c.provider_base_urls, - env: { ...process.env }, + base_urls: { ...envBaseUrls, ...(c.provider_base_urls ?? {}) }, // config wins over env + env: { ...envFromConfig, ...process.env }, // process.env wins }; } diff --git a/src/commands/doctor.ts b/src/commands/doctor.ts index e94c303ca..d021b35b7 100644 --- a/src/commands/doctor.ts +++ b/src/commands/doctor.ts @@ -1170,6 +1170,39 @@ export async function runDoctor(engine: BrainEngine | null, args: string[], dbSo }); } + // 8c. Alternative provider advisory (v0.32 D11=C / Codex finding #2 wire-through). + // Walks listRecipes() and surfaces any recipe whose required env vars are ALL + // set in the process env and that is not the currently configured provider. Helps + // users discover that, e.g., OPENAI_API_KEY=x DASHSCOPE_API_KEY=y means they + // have a Chinese-region alternative ready to go without setup. 
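+  // Surfaced with status 'ok' — an advisory, not a warning: spare provider
+  // keys are an opportunity to switch, never a fault to fix.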
+ progress.heartbeat('alternative_providers'); + try { + const { listRecipes } = await import('../core/ai/recipes/index.ts'); + const { getEmbeddingModel } = await import('../core/ai/gateway.ts'); + const configuredId = (getEmbeddingModel() || '').split(':')[0]; + const alternatives: string[] = []; + for (const r of listRecipes()) { + if (r.id === configuredId) continue; + const required = r.auth_env?.required ?? []; + // Skip recipes with no required env (they're "always available" — not a + // useful signal) and recipes that require env we don't have. + if (required.length === 0) continue; + const allPresent = required.every(k => !!process.env[k]); + if (!allPresent) continue; + // Skip recipes without an embedding touchpoint (chat-only — not an + // embedding alternative). + if (!r.touchpoints.embedding) continue; + alternatives.push(r.id); + } + if (alternatives.length > 0) { + checks.push({ + name: 'alternative_providers', + status: 'ok', + message: `Detected ${alternatives.length} alternative embedding provider${alternatives.length > 1 ? 's' : ''} ready to use: ${alternatives.join(', ')}. Run \`gbrain providers list\` to switch.`, + }); + } + } catch { /* listRecipes / gateway not available — silent */ } + // 9. Graph health (link + timeline coverage on entity pages). // dead_links removed in v0.10.1: ON DELETE CASCADE on link FKs makes it always 0. // diff --git a/src/commands/init.ts b/src/commands/init.ts index 5fa08bbfa..737b5b0e9 100644 --- a/src/commands/init.ts +++ b/src/commands/init.ts @@ -134,6 +134,18 @@ async function resolveAIOptions( console.error(`Unknown provider: ${shorthand}. Run \`gbrain providers list\` to see known providers.`); process.exit(1); } + // v0.32 D8=A: recipes flagged user_provided_models (litellm, llama-server) + // refuse implicit "first model" pick with a setup hint pointing the user + // at the explicit form. The shorthand --model is meaningless for these + // recipes because there's no canonical first model. + if (recipe.touchpoints.embedding?.user_provided_models === true) { + console.error( + `Provider ${shorthand} requires you to specify the model + dimensions explicitly:\n` + + ` gbrain init --embedding-model ${shorthand}:<model> --embedding-dimensions <dims>\n` + + (recipe.setup_hint ? `\nSetup: ${recipe.setup_hint}` : '') + ); + process.exit(1); + } const firstModel = recipe.touchpoints.embedding?.models[0]; if (!firstModel) { console.error(`Provider ${shorthand} has no embedding models listed. Use --embedding-model provider:model.`); @@ -150,6 +162,20 @@ const { getRecipe } = await import('../core/ai/recipes/index.ts'); const providerId = out.embedding_model.split(':')[0]; const recipe = getRecipe(providerId); + // v0.32: user_provided_models recipes (litellm, llama-server) have + // default_dims=0 and ship with `models: []` — there's no sensible + // fallback. Refuse explicitly here too. Without this, the verbose path + // `--embedding-model llama-server:foo` (no --embedding-dimensions) would + // fall through to configureGateway's default (1536), creating a + // wrong-width schema that explodes only at first embed. + if (recipe?.touchpoints.embedding?.user_provided_models === true) { + console.error( + `Provider ${providerId} requires --embedding-dimensions when using --embedding-model ${out.embedding_model}.\n` + + `User-driven-model recipes (litellm, llama-server) have no default dimension.\n` + + (recipe.setup_hint ? 
`\nSetup: ${recipe.setup_hint}` : '') + ); + process.exit(1); + } if (recipe?.touchpoints.embedding?.default_dims) { out.embedding_dimensions = recipe.touchpoints.embedding.default_dims; } diff --git a/src/core/ai/dims.ts b/src/core/ai/dims.ts index d08c46053..2b3bb030a 100644 --- a/src/core/ai/dims.ts +++ b/src/core/ai/dims.ts @@ -58,6 +58,29 @@ export function dimsProviderOptions( if (VOYAGE_OUTPUT_DIMENSION_MODELS.has(modelId)) { return { openaiCompatible: { output_dimension: dims } }; } + // OpenAI text-embedding-3 family on the openai-compatible adapter + // (Azure OpenAI hosts these via its OpenAI-compatible /embeddings + // endpoint). The provider defaults to the model's native size (3072 + // for `-large`, 1536 for `-small`); without `dimensions`, brains + // configured for a smaller width (e.g. 1536) hard-fail at first embed. + if (modelId.startsWith('text-embedding-3')) { + return { openaiCompatible: { dimensions: dims } }; + } + // DashScope text-embedding-v3 (Matryoshka 64-1024) and Zhipu + // embedding-3 (Matryoshka 256-2048) both accept `dimensions` on the + // OpenAI-compat path. Without this, user-selected non-default dims are + // silently ignored and the provider returns its default size. + if (modelId === 'text-embedding-v3' || modelId === 'embedding-3') { + return { openaiCompatible: { dimensions: dims } }; + } + // MiniMax embo-01 takes a `type: 'db' | 'query'` field for asymmetric + // retrieval. Default to 'db' (the indexing path) so embed() works for + // import. Queries also embed with type:'db', making retrieval + // symmetric. Asymmetric query support is a follow-up TODO that needs + // a query/document signal threaded through the embed seam. + if (modelId === 'embo-01') { + return { openaiCompatible: { type: 'db' } }; + } return undefined; } } diff --git a/src/core/ai/gateway.ts b/src/core/ai/gateway.ts index c7b9df419..619932c7e 100644 --- a/src/core/ai/gateway.ts +++ b/src/core/ai/gateway.ts @@ -139,6 +139,116 @@ const DEFAULT_SAFETY_FACTOR = 0.8; */ const MAX_VOYAGE_RESPONSE_BYTES = 256 * 1024 * 1024; +// ---- Unified auth resolution (D12=A) ---- +// +// Pre-v0.32, openai-compatible auth was duplicated across instantiateEmbedding, +// instantiateExpansion, and instantiateChat with subtle drift (embedding had a +// `${recipe.id.toUpperCase()}_API_KEY` fallback the other two lacked). D12=A +// unifies all three through `Recipe.resolveAuth?(env)` with a sensible default +// so existing recipes need zero code changes; only deviating recipes (Azure +// with `api-key:` instead of `Authorization: Bearer`) override. + +/** + * Default auth resolver: returns `{headerName: 'Authorization', token: 'Bearer + * <key>'}` where `<key>` is the first present env var from `auth_env.required`, + * falling back to the first `auth_env.optional` entry, or 'unauthenticated' + * for fully no-auth recipes (Ollama). Throws AIConfigError when required env + * is missing. + * + * `touchpoint` is included in the error message so users know which call path + * triggered the missing-env error. + * + * @internal exported for tests; not part of the public gateway API. + */ +export function defaultResolveAuth( + recipe: Recipe, + env: Record<string, string>, + touchpoint: 'embedding' | 'expansion' | 'chat', +): { headerName: string; token: string } { + const required = recipe.auth_env?.required ?? []; + const optional = recipe.auth_env?.optional ?? []; + + if (required.length === 0) { + // No-auth or optional-auth recipe (e.g. Ollama, llama-server). 
Read first + // present optional API-key env (ignoring URL-shaped names like + // OLLAMA_BASE_URL, which belong in cfg.base_urls, not auth). If none + // present, use 'unauthenticated' so createOpenAICompatible has something + // to put in Authorization (servers like Ollama / llama-server ignore it). + const optKey = optional.find( + k => !!env[k] && !/_(BASE_)?URL$/.test(k), + ); + const token = optKey ? env[optKey]! : 'unauthenticated'; + return { headerName: 'Authorization', token: `Bearer ${token}` }; + } + + const key = env[required[0]]; + if (!key) { + throw new AIConfigError( + `${recipe.name} ${touchpoint} requires ${required[0]}.`, + recipe.setup_hint, + ); + } + return { headerName: 'Authorization', token: `Bearer ${key}` }; +} + +/** + * Apply the recipe's auth resolver (or default) and translate the result into + * `createOpenAICompatible` options. Authorization-Bearer style returns + * `{apiKey}` (the SDK's native path); custom-header style returns `{headers}` + * with NO apiKey to avoid double-auth. + * + * @internal exported for tests; not part of the public gateway API. + */ +export function applyResolveAuth( + recipe: Recipe, + cfg: AIGatewayConfig, + touchpoint: 'embedding' | 'expansion' | 'chat', +): { apiKey?: string; headers?: Record<string, string> } { + const resolved = recipe.resolveAuth + ? recipe.resolveAuth(cfg.env) + : defaultResolveAuth(recipe, cfg.env, touchpoint); + + // Bearer-via-Authorization: use the SDK's native apiKey path (which sets + // Authorization: Bearer internally). Strip the 'Bearer ' prefix the + // resolver returned. + if ( + resolved.headerName === 'Authorization' && + resolved.token.startsWith('Bearer ') + ) { + return { apiKey: resolved.token.slice('Bearer '.length) }; + } + + // Custom header (Azure: api-key). Use headers; do NOT pass apiKey, or the + // SDK will also set Authorization and the server may reject double-auth. + return { headers: { [resolved.headerName]: resolved.token } }; +} + +/** + * Resolve the openai-compatible URL + optional fetch wrapper. Defaults to + * `cfg.base_urls?.[recipe.id] ?? recipe.base_url_default` (the pre-v0.32 + * behavior). Recipes whose URL is env-templated (Azure: needs endpoint + + * deployment + api-version) override `recipe.resolveOpenAICompatConfig` to + * build the URL and inject custom fetch behavior. + * + * @internal exported for tests. + */ +export function applyOpenAICompatConfig( + recipe: Recipe, + cfg: AIGatewayConfig, +): { baseURL: string; fetch?: typeof fetch } { + if (recipe.resolveOpenAICompatConfig) { + return recipe.resolveOpenAICompatConfig(cfg.env); + } + const baseURL = cfg.base_urls?.[recipe.id] ?? recipe.base_url_default; + if (!baseURL) { + throw new AIConfigError( + `${recipe.name} requires a base URL.`, + recipe.setup_hint, + ); + } + return { baseURL }; +} + /** Configure the gateway. Called by cli.ts#connectEngine. Clears cached models. */ export function configureGateway(config: AIGatewayConfig): void { _config = { @@ -259,6 +369,10 @@ function warnRecipesMissingBatchTokens(): void { // recipe; suppress the warning for it. Every other recipe missing the // field is suspicious. if (recipe.id === 'openai') continue; + // v0.32 (#779): explicit opt-out for dynamic-cap recipes (Ollama, + // LiteLLM proxy, llama-server) — they ship without a static cap because + // the cap depends on a user-launched server. Warning is noise for them. 
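+    // (Opting out skips only the startup warning; oversized batches still get
+    // recursively halved at runtime when the server rejects them.)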
+ if (embedding.no_batch_cap === true) continue; if (_warnedRecipes.has(recipe.id)) continue; _warnedRecipes.add(recipe.id); // eslint-disable-next-line no-console @@ -380,8 +494,19 @@ export function isAvailable(touchpoint: TouchpointKind): boolean { // embedding from an anthropic-configured brain is unavailable regardless of auth. const touchpointConfig = recipe.touchpoints[touchpoint as 'embedding' | 'expansion' | 'chat']; if (!touchpointConfig) return false; - // Openai-compat recipes with empty models list (e.g. litellm template) require user-provided model - if (Array.isArray(touchpointConfig.models) && touchpointConfig.models.length === 0 && recipe.id === 'litellm') return false; + // Openai-compat recipes with empty models list require a user-provided + // model. Either the recipe explicitly opts in via + // EmbeddingTouchpoint.user_provided_models (D8=A), or the legacy + // `recipe.id === 'litellm'` heuristic (back-compat for pre-v0.32 builds + // where the field hadn't been declared yet). + const isUserProvided = + touchpoint === 'embedding' && + (touchpointConfig as any).user_provided_models === true; + if ( + Array.isArray(touchpointConfig.models) && + touchpointConfig.models.length === 0 && + (recipe.id === 'litellm' || isUserProvided) + ) return false; // For openai-compatible without auth requirements (Ollama local), treat as always-available. const required = recipe.auth_env?.required ?? []; @@ -571,32 +696,20 @@ function instantiateEmbedding(recipe: Recipe, modelId: string, cfg: AIGatewayCon `Anthropic has no embedding model. Use openai or google for embeddings.`, ); case 'openai-compatible': { - const baseUrl = cfg.base_urls?.[recipe.id] ?? recipe.base_url_default; - if (!baseUrl) throw new AIConfigError( - `${recipe.name} requires a base URL.`, - recipe.setup_hint, - ); - // For openai-compatible, auth is optional (ollama local) but pass a dummy key if unauthenticated. - const apiKey = recipe.auth_env?.required[0] - ? cfg.env[recipe.auth_env.required[0]] - : (cfg.env[`${recipe.id.toUpperCase()}_API_KEY`] ?? 'unauthenticated'); - if (recipe.auth_env?.required.length && !apiKey) { - throw new AIConfigError( - `${recipe.name} requires ${recipe.auth_env.required[0]}.`, - recipe.setup_hint, - ); - } + // D12=A: unified auth via Recipe.resolveAuth (or default). + const auth = applyResolveAuth(recipe, cfg, 'embedding'); + // v0.32: env-templated base URL + optional fetch wrapper for Azure. + const compat = applyOpenAICompatConfig(recipe, cfg); + // Voyage's openai-compat path needs voyageCompatFetch (translates + // request/response shape) when the recipe doesn't ship its own fetch + // wrapper via resolveOpenAICompatConfig. Azure recipes ship their own + // fetch (api-version splice); voyage doesn't — use voyageCompatFetch. + const fetchWrapper = compat.fetch ?? (recipe.id === 'voyage' ? voyageCompatFetch : undefined); const client = createOpenAICompatible({ name: recipe.id, - baseURL: baseUrl, - apiKey: apiKey ?? 'unauthenticated', - // Voyage AI's `/v1/embeddings` endpoint is "OpenAI-compatible" only in URL - // shape; it rejects `encoding_format=float` (only `base64` is accepted) and - // ignores OpenAI's `dimensions` parameter (Voyage uses `output_dimension`). - // The default openai-compatible client sends `encoding_format=float`, which - // makes Voyage respond with HTTP 400 "Bad Request". Strip those fields - // before forwarding when targeting Voyage. - fetch: recipe.id === 'voyage' ? voyageCompatFetch : undefined, + baseURL: compat.baseURL, + ...(fetchWrapper ? 
{ fetch: fetchWrapper } : {}), + ...auth, + }); return client.textEmbeddingModel(modelId); } @@ -1026,15 +1139,15 @@ function instantiateExpansion(recipe: Recipe, modelId: string, cfg: AIGatewayCon return createAnthropic({ apiKey }).languageModel(modelId); } case 'openai-compatible': { - const baseUrl = cfg.base_urls?.[recipe.id] ?? recipe.base_url_default; - if (!baseUrl) throw new AIConfigError(`${recipe.name} requires a base URL.`, recipe.setup_hint); - const apiKey = recipe.auth_env?.required[0] - ? cfg.env[recipe.auth_env.required[0]] - : 'unauthenticated'; + // D12=A: unified auth via Recipe.resolveAuth (or default). + const auth = applyResolveAuth(recipe, cfg, 'expansion'); + // v0.32: env-templated base URL + optional fetch wrapper. + const compat = applyOpenAICompatConfig(recipe, cfg); return createOpenAICompatible({ name: recipe.id, - baseURL: baseUrl, - apiKey: apiKey ?? 'unauthenticated', + baseURL: compat.baseURL, + ...(compat.fetch ? { fetch: compat.fetch } : {}), + ...auth, }).languageModel(modelId); } } @@ -1229,17 +1342,15 @@ function instantiateChat(recipe: Recipe, modelId: string, cfg: AIGatewayConfig): return createAnthropic({ apiKey }).languageModel(modelId); } case 'openai-compatible': { - const baseUrl = cfg.base_urls?.[recipe.id] ?? recipe.base_url_default; - if (!baseUrl) throw new AIConfigError(`${recipe.name} requires a base URL.`, recipe.setup_hint); - const required = recipe.auth_env?.required ?? []; - const apiKey = required[0] ? cfg.env[required[0]] : 'unauthenticated'; - if (required.length > 0 && !apiKey) { - throw new AIConfigError(`${recipe.name} requires ${required[0]}.`, recipe.setup_hint); - } + // D12=A: unified auth via Recipe.resolveAuth (or default). + const auth = applyResolveAuth(recipe, cfg, 'chat'); + // v0.32: env-templated base URL + optional fetch wrapper. + const compat = applyOpenAICompatConfig(recipe, cfg); return createOpenAICompatible({ name: recipe.id, - baseURL: baseUrl, - apiKey: apiKey ?? 'unauthenticated', + baseURL: compat.baseURL, + ...(compat.fetch ? { fetch: compat.fetch } : {}), + ...auth, }).languageModel(modelId); } default: diff --git a/src/core/ai/probes.ts b/src/core/ai/probes.ts index 81097517b..ba17b6769 100644 --- a/src/core/ai/probes.ts +++ b/src/core/ai/probes.ts @@ -45,3 +45,15 @@ export async function probeLMStudio(): Promise<ProbeResult> { const url = process.env.LMSTUDIO_BASE_URL ?? 'http://localhost:1234/v1'; return probeOpenAICompat(url); } + +/** + * Probe llama.cpp's `llama-server --embeddings` endpoint. Defaults to port + * 8080 (llama-server's default; distinct from Ollama's 11434 and LM Studio's + * 1234). Override via `LLAMA_SERVER_BASE_URL` env, or pass `baseURL` directly + * (callers with access to `cfg.base_urls['llama-server']` should pass it so + * probe agrees with what the gateway will actually call). + */ +export async function probeLlamaServer(baseURL?: string): Promise<ProbeResult> { + const url = baseURL ?? process.env.LLAMA_SERVER_BASE_URL ?? 'http://localhost:8080/v1'; + return probeOpenAICompat(url); +} diff --git a/src/core/ai/recipes/azure-openai.ts b/src/core/ai/recipes/azure-openai.ts new file mode 100644 index 000000000..23aa7ab47 --- /dev/null +++ b/src/core/ai/recipes/azure-openai.ts @@ -0,0 +1,111 @@ +import type { Recipe } from '../types.ts'; +import { AIConfigError } from '../errors.ts'; + +const DEFAULT_API_VERSION = '2024-10-21'; // stable Azure OpenAI version as of 2026-05 + +/** + * Azure OpenAI. 
The first recipe in v0.32 to exercise both seams: + * - resolveAuth returns `{headerName: 'api-key', token: }` instead of + * Authorization: Bearer (Azure's API explicitly requires `api-key:` and + * rejects double-auth). + * - resolveOpenAICompatConfig templates the URL from env + injects an + * `?api-version=` query param via a custom fetch wrapper. + * + * Azure's URL shape: + * {ENDPOINT}/openai/deployments/{DEPLOYMENT}/embeddings?api-version=... + * + * The AI SDK's openai-compatible adapter appends `/embeddings` to the + * baseURL, so we set baseURL to `{ENDPOINT}/openai/deployments/{DEPLOYMENT}` + * and let the SDK's path-suffix handle the rest. The api-version query is + * spliced via the fetch wrapper because the SDK has no native query-param + * option. + * + * Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/ + */ +export const azureOpenAI: Recipe = { + id: 'azure-openai', + name: 'Azure OpenAI', + tier: 'openai-compat', + implementation: 'openai-compatible', + // base_url_default omitted: Azure URLs are env-templated only. + auth_env: { + required: [ + 'AZURE_OPENAI_API_KEY', + 'AZURE_OPENAI_ENDPOINT', + 'AZURE_OPENAI_DEPLOYMENT', + ], + optional: ['AZURE_OPENAI_API_VERSION'], + setup_url: + 'https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart', + }, + touchpoints: { + embedding: { + models: [ + 'text-embedding-3-large', + 'text-embedding-3-small', + 'text-embedding-ada-002', + ], + default_dims: 1536, + // Matryoshka via text-embedding-3-*; ada-002 is fixed at 1536. + dims_options: [256, 512, 768, 1024, 1536, 3072], + cost_per_1m_tokens_usd: 0.13, + price_last_verified: '2026-05-10', + max_batch_tokens: 8192, + }, + }, + resolveAuth(env) { + const key = env.AZURE_OPENAI_API_KEY; + if (!key) { + throw new AIConfigError( + `Azure OpenAI requires AZURE_OPENAI_API_KEY.`, + 'Get a key from your Azure portal: https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart', + ); + } + // Azure uses `api-key:` (no Bearer); the unified seam routes this + // through `headers` instead of the SDK's apiKey field to avoid any + // double-auth Authorization header sneaking in. + return { headerName: 'api-key', token: key }; + }, + resolveOpenAICompatConfig(env) { + const endpoint = env.AZURE_OPENAI_ENDPOINT?.replace(/\/+$/, ''); + const deployment = env.AZURE_OPENAI_DEPLOYMENT; + if (!endpoint) { + throw new AIConfigError( + `Azure OpenAI requires AZURE_OPENAI_ENDPOINT.`, + 'Find your endpoint at portal.azure.com → Azure OpenAI resource → Keys and Endpoint.', + ); + } + if (!deployment) { + throw new AIConfigError( + `Azure OpenAI requires AZURE_OPENAI_DEPLOYMENT.`, + 'Each Azure OpenAI deployment has its own URL path. Set AZURE_OPENAI_DEPLOYMENT to the deployment name from your Azure portal.', + ); + } + const apiVersion = env.AZURE_OPENAI_API_VERSION ?? DEFAULT_API_VERSION; + const baseURL = `${endpoint}/openai/deployments/${deployment}`; + // Custom fetch wrapper splices ?api-version=... onto every request. + // Azure rejects requests without it. + // Cast through `any` because TS's `typeof fetch` includes a `preconnect` + // method that wrappers don't need (the AI SDK never calls it). + const wrappedFetch = (async (input: any, init: any) => { + const url = + typeof input === 'string' + ? input + : input instanceof URL + ? input.toString() + : (input as Request).url; + const sep = url.includes('?') ? '&' : '?'; + const finalUrl = url.includes('api-version=') + ? 
url + : `${url}${sep}api-version=${encodeURIComponent(apiVersion)}`; + const finalInput = + typeof input === 'string' || input instanceof URL + ? finalUrl + : new Request(finalUrl, input as Request); + return fetch(finalInput, init); + }) as unknown as typeof fetch; + return { baseURL, fetch: wrappedFetch }; + }, + setup_hint: + 'Azure portal → Azure OpenAI resource. Set AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT. Optionally AZURE_OPENAI_API_VERSION (default 2024-10-21).', +}; diff --git a/src/core/ai/recipes/dashscope.ts b/src/core/ai/recipes/dashscope.ts new file mode 100644 index 000000000..7aa2c273f --- /dev/null +++ b/src/core/ai/recipes/dashscope.ts @@ -0,0 +1,42 @@ +import type { Recipe } from '../types.ts'; + +/** + * Alibaba DashScope (灵积). OpenAI-compatible /embeddings endpoint at + * dashscope-intl.aliyuncs.com. Hosts text-embedding-v2 (older) and + * text-embedding-v3 (current; Matryoshka-aware up to 1024 dims). + * + * Reference: https://help.aliyun.com/zh/model-studio/getting-started/ + * + * Note: the international endpoint requires a region-aware DASHSCOPE_API_KEY. + * China-region users typically point at https://dashscope.aliyuncs.com/... + * via cfg.base_urls['dashscope']. v0.32 ships with the international + * default; users override per the recipe convention. + */ +export const dashscope: Recipe = { + id: 'dashscope', + name: 'Alibaba DashScope (灵积)', + tier: 'openai-compat', + implementation: 'openai-compatible', + base_url_default: 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1', + auth_env: { + required: ['DASHSCOPE_API_KEY'], + setup_url: 'https://help.aliyun.com/zh/model-studio/getting-started/', + }, + touchpoints: { + embedding: { + models: ['text-embedding-v3', 'text-embedding-v2'], + default_dims: 1024, + dims_options: [64, 128, 256, 512, 768, 1024], + // Alibaba doesn't publish a hard batch-token cap for the OpenAI-compat + // path. Conservative declaration so the gateway pre-splits before + // hitting whatever undocumented server-side limit exists. + max_batch_tokens: 8192, + // text-embedding-v3 mixes English + CJK heavily; the tokenizer is + // closer to Voyage density than OpenAI tiktoken for CJK-dominant + // content. Conservative chars_per_token=2 leaves headroom. + chars_per_token: 2, + }, + }, + setup_hint: + 'Get an API key at https://help.aliyun.com/zh/model-studio/getting-started/, then `export DASHSCOPE_API_KEY=...`', +}; diff --git a/src/core/ai/recipes/index.ts b/src/core/ai/recipes/index.ts index 21a1506d6..1682fa115 100644 --- a/src/core/ai/recipes/index.ts +++ b/src/core/ai/recipes/index.ts @@ -15,6 +15,11 @@ import { litellmProxy } from './litellm-proxy.ts'; import { deepseek } from './deepseek.ts'; import { groq } from './groq.ts'; import { together } from './together.ts'; +import { llamaServer } from './llama-server.ts'; +import { minimax } from './minimax.ts'; +import { dashscope } from './dashscope.ts'; +import { zhipu } from './zhipu.ts'; +import { azureOpenAI } from './azure-openai.ts'; const ALL: Recipe[] = [ openai, @@ -26,6 +31,11 @@ const ALL: Recipe[] = [ deepseek, groq, together, + llamaServer, + minimax, + dashscope, + zhipu, + azureOpenAI, ]; /** Map from `provider:id` key to recipe. 
*/ diff --git a/src/core/ai/recipes/litellm-proxy.ts b/src/core/ai/recipes/litellm-proxy.ts index 8f7da2dea..d7ee4e946 100644 --- a/src/core/ai/recipes/litellm-proxy.ts +++ b/src/core/ai/recipes/litellm-proxy.ts @@ -23,9 +23,13 @@ export const litellmProxy: Recipe = { embedding: { // Models depend on the proxy's config; declare empties so wizard prompts user. models: [], + user_provided_models: true, // v0.32 D8=A wire-through for the litellm hardcode default_dims: 0, // user must declare --embedding-dimensions explicitly cost_per_1m_tokens_usd: undefined, price_last_verified: '2026-04-20', + // LiteLLM's batch capacity is determined by the backend it proxies; + // no static cap to declare here. v0.32 (#779). + no_batch_cap: true, }, }, setup_hint: 'Run LiteLLM (https://docs.litellm.ai) in front of any provider; set LITELLM_BASE_URL + pass --embedding-model litellm:<model> and --embedding-dimensions <dims>.', diff --git a/src/core/ai/recipes/llama-server.ts b/src/core/ai/recipes/llama-server.ts new file mode 100644 index 000000000..1bc8b97cc --- /dev/null +++ b/src/core/ai/recipes/llama-server.ts @@ -0,0 +1,67 @@ +import type { Recipe } from '../types.ts'; +import { probeLlamaServer } from '../probes.ts'; + +/** + * llama.cpp's `llama-server --embeddings` (also published as + * `@llama.cpp/llama-server`). Exposes an OpenAI-compatible /v1/embeddings + * endpoint. Distinct from Ollama: different default port (8080), different + * model-management story (you launch it with `--model <path>`; the server + * serves whatever model was passed). + * + * Like LiteLLM, this recipe ships with `models: []` because the model + * identity is whatever the user launched llama-server with. They MUST + * pass `--embedding-model llama-server:<model>` and `--embedding-dimensions + * <dims>`. The wizard refuses to pick implicit defaults. + * + * Reference: https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md + */ +export const llamaServer: Recipe = { + id: 'llama-server', + name: 'llama.cpp llama-server (local)', + tier: 'openai-compat', + implementation: 'openai-compatible', + base_url_default: 'http://localhost:8080/v1', + auth_env: { + required: [], + optional: ['LLAMA_SERVER_BASE_URL', 'LLAMA_SERVER_API_KEY'], + setup_url: + 'https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md', + }, + touchpoints: { + embedding: { + models: [], // user-driven; whatever model the server was launched with + user_provided_models: true, + default_dims: 0, // forces explicit --embedding-dimensions + cost_per_1m_tokens_usd: 0, + price_last_verified: '2026-05-10', + // llama-server's batch capacity is set by `--ctx-size` at launch + // time; no static cap to declare. v0.32 (#779). + no_batch_cap: true, + }, + }, + /** + * Probe via the OpenAI-compatible /v1/models endpoint. Caller passes the + * resolved baseURL (from cfg.base_urls['llama-server'] or env), so the + * probe agrees with what the gateway will actually call. Falls back to + * env / localhost:8080 when called without an argument. + */ + async probe(baseURL?: string) { + const url = baseURL ?? process.env.LLAMA_SERVER_BASE_URL ?? 'http://localhost:8080/v1'; + const result = await probeLlamaServer(url); + if (!result.reachable) { + return { + ready: false, + hint: `llama-server not reachable at ${url}. Start it with \`./llama-server --model <path-to-gguf> --embeddings\` or set LLAMA_SERVER_BASE_URL.`, + }; + } + if (!result.models_endpoint_valid) { + return { + ready: false, + hint: `llama-server reached but /v1/models returned an unexpected shape: ${result.error ?? 
diff --git a/src/core/ai/recipes/minimax.ts b/src/core/ai/recipes/minimax.ts
new file mode 100644
index 000000000..49a756057
--- /dev/null
+++ b/src/core/ai/recipes/minimax.ts
@@ -0,0 +1,44 @@
+import type { Recipe } from '../types.ts';
+
+/**
+ * MiniMax (海螺AI). OpenAI-compatible /embeddings endpoint at
+ * api.minimaxi.com. The flagship embedding model is `embo-01` (1536 dims).
+ *
+ * MiniMax's API takes an extra `type: 'db' | 'query'` field for asymmetric
+ * retrieval. gbrain currently has no notion of "this is a document vs. a
+ * query" at the embed-call site (embed() takes only texts), so we default
+ * to `type: 'db'` for the indexing path. Queries also embed with `type:
+ * 'db'`, making retrieval symmetric. This sacrifices some retrieval
+ * quality vs. a true asymmetric setup but works correctly. A follow-up
+ * TODO will thread query/document context through the embed seam for
+ * full asymmetric support.
+ *
+ * Reference: https://www.minimaxi.com/document/guides/embeddings
+ */
+export const minimax: Recipe = {
+  id: 'minimax',
+  name: 'MiniMax (海螺AI)',
+  tier: 'openai-compat',
+  implementation: 'openai-compatible',
+  base_url_default: 'https://api.minimaxi.com/v1',
+  auth_env: {
+    required: ['MINIMAX_API_KEY'],
+    optional: ['MINIMAX_GROUP_ID'],
+    setup_url: 'https://www.minimaxi.com/document/guides/embeddings',
+  },
+  touchpoints: {
+    embedding: {
+      models: ['embo-01'],
+      default_dims: 1536,
+      cost_per_1m_tokens_usd: 0.07,
+      price_last_verified: '2026-05-09',
+      // MiniMax docs don't publish a hard batch-token cap; declare a
+      // conservative 4096-token budget so the gateway pre-splits before
+      // hitting whatever undocumented server-side limit exists. Recursive
+      // halving in the gateway catches token-limit errors at runtime.
+      max_batch_tokens: 4096,
+    },
+  },
+  setup_hint:
+    'Get an API key at https://www.minimaxi.com, then `export MINIMAX_API_KEY=...`',
+};
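The `type: 'db'` default described in the minimax docstring is enforced in dims.ts, not here. The sketch below is NOT the real dims.ts source — it restates only the observable contract that the recipe tests later in this diff pin:

```ts
// Sketch of the per-model providerOptions branching the tests pin:
// embo-01 gets MiniMax's asymmetric-retrieval field; Matryoshka models
// get a `dimensions` passthrough; fixed-dim models get nothing.
type ProviderOpts = { openaiCompatible: Record<string, unknown> } | undefined;

function dimsProviderOptionsSketch(impl: string, model: string, dims: number): ProviderOpts {
  if (impl !== 'openai-compatible') return undefined;
  if (model === 'embo-01') return { openaiCompatible: { type: 'db' } };
  const matryoshka = [
    'text-embedding-3-large', 'text-embedding-3-small', // Azure
    'text-embedding-v3',                                // DashScope
    'embedding-3',                                      // Zhipu
  ];
  if (matryoshka.includes(model)) return { openaiCompatible: { dimensions: dims } };
  return undefined; // ada-002, text-embedding-v2, embedding-2: fixed-dim
}
```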
diff --git a/src/core/ai/recipes/ollama.ts b/src/core/ai/recipes/ollama.ts
index 31fac0ae6..0f355cdae 100644
--- a/src/core/ai/recipes/ollama.ts
+++ b/src/core/ai/recipes/ollama.ts
@@ -17,6 +17,9 @@ export const ollama: Recipe = {
       default_dims: 768, // nomic-embed-text native dim
       cost_per_1m_tokens_usd: 0,
       price_last_verified: '2026-04-20',
+      // Ollama's batch capacity depends on the locally loaded model + the
+      // OLLAMA_NUM_PARALLEL config; no static cap to declare. v0.32 (#779).
+      no_batch_cap: true,
     },
   },
   setup_hint: 'Install Ollama from https://ollama.ai, then `ollama pull nomic-embed-text` and `ollama serve`.',
diff --git a/src/core/ai/recipes/zhipu.ts b/src/core/ai/recipes/zhipu.ts
new file mode 100644
index 000000000..758c5c9cb
--- /dev/null
+++ b/src/core/ai/recipes/zhipu.ts
@@ -0,0 +1,40 @@
+import type { Recipe } from '../types.ts';
+
+/**
+ * Zhipu AI (智谱AI) BigModel Open Platform. OpenAI-compatible /embeddings
+ * endpoint at open.bigmodel.cn. Hosts embedding-2 (1024d) and embedding-3
+ * (Matryoshka up to 2048d).
+ *
+ * embedding-3 at 2048 dims exceeds pgvector's HNSW cap of 2000 — those
+ * brains fall back to exact vector scans (see
+ * src/core/vector-index.ts:PGVECTOR_HNSW_VECTOR_MAX_DIMS). v0.32 ships
+ * with `default_dims: 1024` (HNSW-compatible) and exposes 2048 via
+ * dims_options for users who want the full embedding fidelity at the
+ * cost of slower retrieval.
+ *
+ * Reference: https://open.bigmodel.cn/
+ */
+export const zhipu: Recipe = {
+  id: 'zhipu',
+  name: 'Zhipu AI (智谱AI BigModel)',
+  tier: 'openai-compat',
+  implementation: 'openai-compatible',
+  base_url_default: 'https://open.bigmodel.cn/api/paas/v4',
+  auth_env: {
+    required: ['ZHIPUAI_API_KEY'],
+    setup_url: 'https://open.bigmodel.cn/',
+  },
+  touchpoints: {
+    embedding: {
+      models: ['embedding-3', 'embedding-2'],
+      default_dims: 1024,
+      // 2048 exposed but breaks HNSW (exact-scan fallback). 1024/512/256
+      // stay HNSW-compatible.
+      dims_options: [256, 512, 1024, 2048],
+      max_batch_tokens: 8192,
+      chars_per_token: 2,
+    },
+  },
+  setup_hint:
+    'Get an API key at https://open.bigmodel.cn/, then `export ZHIPUAI_API_KEY=...`',
+};
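For context on the HNSW remarks above: pgvector refuses HNSW indexes above 2000 dims, so the index DDL has to branch on the configured width. An illustrative sketch of that branch (not the real `src/core/vector-index.ts`; only the contract the zhipu tests later in this diff pin):

```ts
// Sketch: ≤ 2000 dims → HNSW index DDL; > 2000 → a comment marking the
// index as skipped, so retrieval falls back to exact vector scans.
const PGVECTOR_HNSW_VECTOR_MAX_DIMS_SKETCH = 2000;

function chunkEmbeddingIndexSqlSketch(dims: number): string {
  if (dims > PGVECTOR_HNSW_VECTOR_MAX_DIMS_SKETCH) {
    return `-- hnsw index skipped: ${dims} dims exceeds pgvector's ` +
      `${PGVECTOR_HNSW_VECTOR_MAX_DIMS_SKETCH}-dim cap (exact scan)`;
  }
  return 'CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw ' +
    'ON chunks USING hnsw (embedding vector_cosine_ops);';
}
```

Users picking 2048 for embedding-3 trade ANN speed for full embedding fidelity; the default of 1024 keeps the fast path.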
diff --git a/src/core/ai/types.ts b/src/core/ai/types.ts
index b9125e5f0..e88867993 100644
--- a/src/core/ai/types.ts
+++ b/src/core/ai/types.ts
@@ -72,6 +72,33 @@ export interface EmbeddingTouchpoint {
    * text embedding paths ignore it.
    */
   multimodal_models?: string[];
+  /**
+   * v0.32: when true, the recipe ships without a fixed model list and users
+   * MUST provide `--embedding-model provider:model` and
+   * `--embedding-dimensions N` explicitly. Used by litellm-proxy and
+   * llama-server (and any future "bring your own backend" recipe).
+   *
+   * Consumers:
+   *   - `recipes-contract.test.ts` permits `models.length === 0` only when
+   *     this flag is true.
+   *   - `gateway.ts` skips the model-list-must-include-modelId check.
+   *   - `init.ts:resolveAIOptions` refuses the implicit "first model" pick
+   *     for shorthand `--model <provider>` and prints a setup hint.
+   */
+  user_provided_models?: true;
+  /**
+   * v0.32 (#779 reworked): explicit opt-out of the missing-max_batch_tokens
+   * startup warning. Set to `true` for recipes whose batch capacity is
+   * genuinely dynamic (Ollama: depends on user-loaded model; LiteLLM proxy:
+   * depends on backend; llama.cpp: depends on `--ctx-size` at server launch).
+   *
+   * Without this flag, missing `max_batch_tokens` triggers a once-per-process
+   * stderr warning so future recipes that forget the cap (and would
+   * silently rely on recursive-halving) don't ship unnoticed. Recipes that
+   * declare `no_batch_cap: true` are explicitly opting out — the warning is
+   * noise for them.
+   */
+  no_batch_cap?: true;
 }
 
 /**
@@ -150,6 +177,63 @@ export interface Recipe {
   aliases?: Record<string, string>;
   /** One-line description of setup (shown in wizard + env subcommand). */
   setup_hint?: string;
+  /**
+   * v0.32 (D12=A): unified auth resolver across embed / expansion / chat
+   * touchpoints. Returns the header name (`Authorization`, `api-key`, etc.)
+   * and the full header value (for Bearer-style providers, include the
+   * `Bearer ` prefix). Throws AIConfigError when required env is missing,
+   * with a hint pointing at the recipe's setup_url.
+   *
+   * When omitted, the gateway applies a default that returns
+   * `{headerName: 'Authorization', token: 'Bearer ' + env[auth_env.required[0]]}`.
+   * The default is the right behavior for OpenAI-compatible providers with a
+   * single API key. Recipes deviating (Azure uses `api-key`; future OAuth
+   * providers fetch dynamic tokens) override this.
+   *
+   * IMPORTANT: this runs at gateway-configure time (NOT at embed-call time),
+   * so the env snapshot in `cfg.env` is consulted, never `process.env`.
+   */
+  resolveAuth?(env: Record<string, string | undefined>): {
+    headerName: string;
+    token: string;
+  };
+  /**
+   * v0.32: templated openai-compatible config for recipes whose URL shape
+   * doesn't fit a static `base_url_default`. Returns the resolved baseURL
+   * and an optional fetch wrapper for cases like Azure OpenAI that need a
+   * query parameter (?api-version=) injected on every request.
+   *
+   * Default behavior (when undefined): use `base_urls[recipe.id]` from
+   * config or `recipe.base_url_default`. Throws `AIConfigError` when both
+   * are missing.
+   *
+   * Currently only Azure OpenAI overrides this — the URL is templated
+   * from `AZURE_OPENAI_ENDPOINT` + `AZURE_OPENAI_DEPLOYMENT` and the fetch
+   * wrapper splices `api-version` into every request URL.
+   */
+  resolveOpenAICompatConfig?(env: Record<string, string | undefined>): {
+    baseURL: string;
+    fetch?: typeof fetch;
+  };
+  /**
+   * v0.32 (D13=A): optional runtime readiness check for local-server
+   * recipes (ollama, llama-server, future lmstudio-recipe). Returns
+   * `ready: false` when the local endpoint isn't reachable, with a `hint`
+   * the wizard / doctor can surface.
+   *
+   * Defaults to env-only readiness (`auth_env.required` all set) when
+   * absent. Consumed by `runExplain()` in `src/commands/providers.ts` and
+   * by the doctor's embedding probe; both wrap the call in
+   * `Promise.allSettled` with a 200ms timeout so a hung local server does
+   * not block the provider matrix.
+   *
+   * `baseURL`: optional resolved URL the gateway will actually call (from
+   * `cfg.base_urls[recipe.id]` or recipe defaults). Pass it so the probe
+   * checks the same endpoint as live traffic. Without it, the probe falls
+   * back to recipe defaults / env, which can disagree with config-only
+   * URL overrides (codex finding #5).
+   */
+  probe?(baseURL?: string): Promise<{ ready: boolean; hint?: string }>;
 }
 
 export interface AIGatewayConfig {
diff --git a/src/core/config.ts b/src/core/config.ts
index ae84916fa..9b920c409 100644
--- a/src/core/config.ts
+++ b/src/core/config.ts
@@ -151,6 +151,7 @@ export function loadConfig(): GBrainConfig | null {
     ...(dbUrl ? { database_url: dbUrl } : {}),
     ...(dbUrl ? { database_path: undefined } : {}),
     ...(process.env.OPENAI_API_KEY ? { openai_api_key: process.env.OPENAI_API_KEY } : {}),
+    ...(process.env.ANTHROPIC_API_KEY ? { anthropic_api_key: process.env.ANTHROPIC_API_KEY } : {}),
     ...(process.env.GBRAIN_EMBEDDING_MODEL ? { embedding_model: process.env.GBRAIN_EMBEDDING_MODEL } : {}),
     ...(process.env.GBRAIN_EMBEDDING_DIMENSIONS ? { embedding_dimensions: parseInt(process.env.GBRAIN_EMBEDDING_DIMENSIONS, 10) } : {}),
     ...(process.env.GBRAIN_EXPANSION_MODEL ? { expansion_model: process.env.GBRAIN_EXPANSION_MODEL } : {}),
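To make the `resolveAuth` seam concrete: a hypothetical recipe (NOT shipped in v0.32 — the provider name, env var, and header are invented for illustration) that overrides the default for a non-Bearer header, the same pattern Azure uses with `api-key`:

```ts
import type { Recipe } from './src/core/ai/types.ts';
import { AIConfigError } from './src/core/ai/errors.ts';

// Hypothetical custom-header provider. The override replaces the default
// `Authorization: Bearer <key>` with a bare token in a vendor header.
export const exampleCustomHeader: Recipe = {
  id: 'example-custom-header',
  name: 'Example (custom header)',
  tier: 'openai-compat',
  implementation: 'openai-compatible',
  base_url_default: 'https://api.example.invalid/v1',
  auth_env: { required: ['EXAMPLE_API_KEY'] },
  touchpoints: {},
  resolveAuth(env) {
    const key = env.EXAMPLE_API_KEY;
    if (!key) throw new AIConfigError('Example requires EXAMPLE_API_KEY.');
    return { headerName: 'x-api-token', token: key }; // no Bearer prefix
  },
};
```

Per the IRON RULE test at the end of this diff, any such override beyond Azure should be reviewed for double-auth and back-compat regressions.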
diff --git a/test/ai/no-batch-cap-suppression.serial.test.ts b/test/ai/no-batch-cap-suppression.serial.test.ts
new file mode 100644
index 000000000..9bd3e69b7
--- /dev/null
+++ b/test/ai/no-batch-cap-suppression.serial.test.ts
@@ -0,0 +1,79 @@
+/**
+ * #779 + #121 adjacent fixes (Commit 9 of the v0.32 wave).
+ *
+ * Coverage:
+ *   - Recipes with `embedding.no_batch_cap: true` suppress the
+ *     missing-max_batch_tokens startup warning (#779)
+ *   - Real-provider recipes without the flag still warn (regression guard)
+ *   - listRecipes returns expected dynamic-cap recipes (ollama, litellm,
+ *     llama-server) all flagged
+ */
+
+import { afterAll, beforeAll, describe, expect, mock, test } from 'bun:test';
+import { configureGateway, resetGateway } from '../../src/core/ai/gateway.ts';
+import { listRecipes, getRecipe } from '../../src/core/ai/recipes/index.ts';
+
+describe('v0.32 #779: no_batch_cap suppresses the missing-max_batch_tokens warning', () => {
+  let warnSpy: ReturnType<typeof mock>;
+  let realWarn: typeof console.warn;
+
+  beforeAll(() => {
+    realWarn = console.warn;
+    warnSpy = mock(() => {});
+    console.warn = warnSpy as any;
+  });
+
+  afterAll(() => {
+    console.warn = realWarn;
+    resetGateway();
+  });
+
+  test('Ollama, LiteLLM, llama-server all declare no_batch_cap: true', () => {
+    for (const id of ['ollama', 'litellm', 'llama-server']) {
+      const r = getRecipe(id);
+      expect(r, `${id} not registered`).toBeDefined();
+      expect(
+        r!.touchpoints.embedding?.no_batch_cap,
+        `${id} should declare no_batch_cap: true`,
+      ).toBe(true);
+    }
+  });
+
+  test('configureGateway does NOT warn for ollama/litellm/llama-server', () => {
+    warnSpy.mockClear();
+    resetGateway();
+    configureGateway({ env: {} });
+    const messages = warnSpy.mock.calls.map(c => String(c[0] ?? ''));
+    for (const id of ['ollama', 'litellm', 'llama-server']) {
+      expect(
+        messages.some(m => m.includes(`"${id}"`)),
+        `should NOT warn for ${id}`,
+      ).toBe(false);
+    }
+  });
+
+  test('configureGateway STILL warns for google (real provider, no cap declared)', () => {
+    warnSpy.mockClear();
+    resetGateway();
+    configureGateway({ env: {} });
+    const messages = warnSpy.mock.calls.map(c => String(c[0] ?? ''));
+    expect(
+      messages.some(m => m.includes('"google"') && m.includes('without max_batch_tokens')),
+      'google should warn (it has fixed-cap models)',
+    ).toBe(true);
+  });
+
+  test('every recipe with empty models[] declares user_provided_models OR has openai-fast-path', () => {
+    // Cross-cutting invariant: contracts should not silently disagree.
+    for (const r of listRecipes()) {
+      const e = r.touchpoints.embedding;
+      if (!e) continue;
+      if (e.models.length === 0) {
+        expect(
+          e.user_provided_models === true || r.id === 'litellm',
+          `${r.id} has empty models[] — must declare user_provided_models: true`,
+        ).toBe(true);
+      }
+    }
+  });
+});
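The suppression test above pins behavior, not implementation. A sketch of a once-per-process guard consistent with what it observes — NOT the real gateway.ts, just one plausible shape of the rule:

```ts
// Warn once per process for recipes that declare embedding models but
// neither max_batch_tokens nor the explicit no_batch_cap opt-out.
const warned = new Set<string>();

interface EmbeddingCapShape {
  max_batch_tokens?: number;
  no_batch_cap?: true;
}

function maybeWarnMissingBatchCap(id: string, embedding?: EmbeddingCapShape): void {
  if (!embedding || embedding.max_batch_tokens || embedding.no_batch_cap) return;
  if (warned.has(id)) return; // once-per-process
  warned.add(id);
  console.warn(`recipe "${id}" ships without max_batch_tokens; relying on recursive halving`);
}
```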
diff --git a/test/ai/recipe-azure-openai.test.ts b/test/ai/recipe-azure-openai.test.ts
new file mode 100644
index 000000000..1bd4dcd07
--- /dev/null
+++ b/test/ai/recipe-azure-openai.test.ts
@@ -0,0 +1,202 @@
+/**
+ * Azure OpenAI recipe smoke (Commit 8 of the v0.32 wave).
+ *
+ * Azure is the first recipe to exercise BOTH new seams:
+ *   - resolveAuth → custom header (api-key, NOT Authorization Bearer)
+ *   - resolveOpenAICompatConfig → templated baseURL + fetch wrapper that
+ *     splices `?api-version=` onto every request
+ *
+ * Coverage:
+ *   - Recipe registered with expected shape
+ *   - resolveAuth returns api-key header; missing key → AIConfigError
+ *   - resolveOpenAICompatConfig templates baseURL from endpoint + deployment
+ *   - resolveOpenAICompatConfig throws when endpoint or deployment missing
+ *   - fetch wrapper splices api-version query param (default + override)
+ *   - applyResolveAuth puts the key in headers (NOT apiKey, no double-auth)
+ *   - applyOpenAICompatConfig honors the recipe override
+ */
+
+import { describe, expect, test } from 'bun:test';
+import { getRecipe } from '../../src/core/ai/recipes/index.ts';
+import {
+  applyResolveAuth,
+  applyOpenAICompatConfig,
+} from '../../src/core/ai/gateway.ts';
+import { AIConfigError } from '../../src/core/ai/errors.ts';
+
+const FULL_ENV = {
+  AZURE_OPENAI_API_KEY: 'az-fake-key',
+  AZURE_OPENAI_ENDPOINT: 'https://my-resource.openai.azure.com',
+  AZURE_OPENAI_DEPLOYMENT: 'embed-deployment',
+};
+
+describe('recipe: azure-openai', () => {
+  test('registered with expected shape', () => {
+    const r = getRecipe('azure-openai');
+    expect(r).toBeDefined();
+    expect(r!.id).toBe('azure-openai');
+    expect(r!.tier).toBe('openai-compat');
+    expect(r!.implementation).toBe('openai-compatible');
+    expect(r!.base_url_default).toBeUndefined(); // env-templated only
+    expect(r!.auth_env?.required).toEqual([
+      'AZURE_OPENAI_API_KEY',
+      'AZURE_OPENAI_ENDPOINT',
+      'AZURE_OPENAI_DEPLOYMENT',
+    ]);
+    expect(r!.auth_env?.optional).toContain('AZURE_OPENAI_API_VERSION');
+  });
+
+  test('embedding touchpoint declares 3 models + 1536 default + Matryoshka options', () => {
+    const r = getRecipe('azure-openai')!;
+    expect(r.touchpoints.embedding).toBeDefined();
+    expect(r.touchpoints.embedding!.models).toEqual([
+      'text-embedding-3-large',
+      'text-embedding-3-small',
+      'text-embedding-ada-002',
+    ]);
+    expect(r.touchpoints.embedding!.default_dims).toBe(1536);
+    expect(r.touchpoints.embedding!.dims_options).toContain(3072);
+  });
+
+  test('resolveAuth returns api-key header (NOT Authorization Bearer)', () => {
+    const r = getRecipe('azure-openai')!;
+    const auth = r.resolveAuth!({ AZURE_OPENAI_API_KEY: 'az-fake-key' });
+    expect(auth.headerName).toBe('api-key');
+    expect(auth.token).toBe('az-fake-key');
+    expect(auth.token).not.toContain('Bearer'); // critical: no Bearer prefix
+  });
+
+  test('resolveAuth throws AIConfigError when AZURE_OPENAI_API_KEY missing', () => {
+    const r = getRecipe('azure-openai')!;
+    expect(() => r.resolveAuth!({})).toThrow(AIConfigError);
+  });
+
+  test('applyResolveAuth puts the key in headers (NOT apiKey) — no double-auth', () => {
+    const r = getRecipe('azure-openai')!;
+    const result = applyResolveAuth(r, { env: FULL_ENV } as any, 'embedding');
+    expect(result.apiKey, 'apiKey must be undefined to avoid double-auth').toBeUndefined();
+    expect(result.headers).toEqual({ 'api-key': 'az-fake-key' });
+  });
+
+  test('resolveOpenAICompatConfig templates baseURL from endpoint + deployment', () => {
+    const r = getRecipe('azure-openai')!;
+    const cfg = r.resolveOpenAICompatConfig!(FULL_ENV);
+    expect(cfg.baseURL).toBe(
+      'https://my-resource.openai.azure.com/openai/deployments/embed-deployment',
+    );
+    expect(typeof cfg.fetch).toBe('function');
+  });
+
+  test('resolveOpenAICompatConfig strips trailing slash from endpoint', () => {
+    const r = getRecipe('azure-openai')!;
+    const cfg = r.resolveOpenAICompatConfig!({
+      ...FULL_ENV,
+      AZURE_OPENAI_ENDPOINT: 'https://my-resource.openai.azure.com/',
+    });
+    expect(cfg.baseURL).toBe(
+      'https://my-resource.openai.azure.com/openai/deployments/embed-deployment',
+    );
+  });
+
+  test('resolveOpenAICompatConfig throws when endpoint or deployment missing', () => {
+    const r = getRecipe('azure-openai')!;
+    expect(() =>
+      r.resolveOpenAICompatConfig!({
+        AZURE_OPENAI_API_KEY: 'k',
+        AZURE_OPENAI_DEPLOYMENT: 'd',
+      }),
+    ).toThrow(AIConfigError);
+    expect(() =>
+      r.resolveOpenAICompatConfig!({
+        AZURE_OPENAI_API_KEY: 'k',
+        AZURE_OPENAI_ENDPOINT: 'https://x.openai.azure.com',
+      }),
+    ).toThrow(AIConfigError);
+  });
+
+  test('fetch wrapper splices ?api-version=... onto every request URL (default version)', async () => {
+    const r = getRecipe('azure-openai')!;
+    const cfg = r.resolveOpenAICompatConfig!(FULL_ENV);
+    const wrapped = cfg.fetch!;
+    // Stub global fetch to capture the URL the wrapper hands off.
+    const captured: string[] = [];
+    const realFetch = globalThis.fetch;
+    globalThis.fetch = ((input: any, _init?: any) => {
+      const url = typeof input === 'string' ? input : input instanceof URL ? input.toString() : input.url;
+      captured.push(url);
+      return Promise.resolve(new Response('{}', { status: 200 }));
+    }) as typeof fetch;
+    try {
+      await wrapped('https://my-resource.openai.azure.com/openai/deployments/embed-deployment/embeddings');
+      expect(captured).toHaveLength(1);
+      expect(captured[0]).toContain('api-version=');
+      expect(captured[0]).toContain('2024-10-21'); // DEFAULT_API_VERSION
+    } finally {
+      globalThis.fetch = realFetch;
+    }
+  });
+
+  test('fetch wrapper honors AZURE_OPENAI_API_VERSION override', async () => {
+    const r = getRecipe('azure-openai')!;
+    const cfg = r.resolveOpenAICompatConfig!({
+      ...FULL_ENV,
+      AZURE_OPENAI_API_VERSION: '2025-04-01',
+    });
+    const captured: string[] = [];
+    const realFetch = globalThis.fetch;
+    globalThis.fetch = ((input: any) => {
+      const url = typeof input === 'string' ? input : input instanceof URL ? input.toString() : input.url;
+      captured.push(url);
+      return Promise.resolve(new Response('{}', { status: 200 }));
+    }) as typeof fetch;
+    try {
+      await cfg.fetch!('https://my-resource.openai.azure.com/openai/deployments/embed-deployment/embeddings');
+      expect(captured[0]).toContain('api-version=2025-04-01');
+    } finally {
+      globalThis.fetch = realFetch;
+    }
+  });
+
+  test('fetch wrapper does NOT double-add api-version when caller already set it', async () => {
+    const r = getRecipe('azure-openai')!;
+    const cfg = r.resolveOpenAICompatConfig!(FULL_ENV);
+    const captured: string[] = [];
+    const realFetch = globalThis.fetch;
+    globalThis.fetch = ((input: any) => {
+      const url = typeof input === 'string' ? input : input instanceof URL ? input.toString() : input.url;
+      captured.push(url);
+      return Promise.resolve(new Response('{}', { status: 200 }));
+    }) as typeof fetch;
+    try {
+      await cfg.fetch!('https://my-resource.openai.azure.com/openai/deployments/embed-deployment/embeddings?api-version=2025-01-01');
+      expect(captured[0]).toBe(
+        'https://my-resource.openai.azure.com/openai/deployments/embed-deployment/embeddings?api-version=2025-01-01',
+      );
+    } finally {
+      globalThis.fetch = realFetch;
+    }
+  });
+
+  test('applyOpenAICompatConfig honors the recipe override (templated URL)', () => {
+    const r = getRecipe('azure-openai')!;
+    const result = applyOpenAICompatConfig(r, { env: FULL_ENV } as any);
+    expect(result.baseURL).toBe(
+      'https://my-resource.openai.azure.com/openai/deployments/embed-deployment',
+    );
+    expect(typeof result.fetch).toBe('function');
+  });
+
+  test('dimsProviderOptions threads dimensions for text-embedding-3-* via openai-compat', async () => {
+    // Codex finding #1: Azure (openai-compatible) was missing dim
+    // passthrough for text-embedding-3-large. Without `dimensions`, Azure
+    // returns 3072d; gbrain config expects 1536d → first embed hard-fails.
+    const { dimsProviderOptions } = await import('../../src/core/ai/dims.ts');
+    expect(dimsProviderOptions('openai-compatible', 'text-embedding-3-large', 1536))
+      .toEqual({ openaiCompatible: { dimensions: 1536 } });
+    expect(dimsProviderOptions('openai-compatible', 'text-embedding-3-small', 512))
+      .toEqual({ openaiCompatible: { dimensions: 512 } });
+    // ada-002 has no dimensions knob; recipe must accept the native 1536.
+    expect(dimsProviderOptions('openai-compatible', 'text-embedding-ada-002', 1536))
+      .toBeUndefined();
+  });
+});
diff --git a/test/ai/recipe-dashscope.test.ts b/test/ai/recipe-dashscope.test.ts
new file mode 100644
index 000000000..2d1dac288
--- /dev/null
+++ b/test/ai/recipe-dashscope.test.ts
@@ -0,0 +1,71 @@
+/**
+ * DashScope (Alibaba) recipe smoke (Commit 6 of the v0.32 wave).
+ */
+
+import { describe, expect, test } from 'bun:test';
+import { getRecipe } from '../../src/core/ai/recipes/index.ts';
+import { defaultResolveAuth } from '../../src/core/ai/gateway.ts';
+import { AIConfigError } from '../../src/core/ai/errors.ts';
+
+describe('recipe: dashscope', () => {
+  test('registered with expected shape', () => {
+    const r = getRecipe('dashscope');
+    expect(r).toBeDefined();
+    expect(r!.id).toBe('dashscope');
+    expect(r!.tier).toBe('openai-compat');
+    expect(r!.implementation).toBe('openai-compatible');
+    expect(r!.base_url_default).toBe(
+      'https://dashscope-intl.aliyuncs.com/compatible-mode/v1',
+    );
+    expect(r!.auth_env?.required).toEqual(['DASHSCOPE_API_KEY']);
+  });
+
+  test('embedding touchpoint declares text-embedding-v3 first + 1024 dims', () => {
+    const r = getRecipe('dashscope')!;
+    expect(r.touchpoints.embedding).toBeDefined();
+    expect(r.touchpoints.embedding!.models[0]).toBe('text-embedding-v3');
+    expect(r.touchpoints.embedding!.models).toContain('text-embedding-v2');
+    expect(r.touchpoints.embedding!.default_dims).toBe(1024);
+    expect(r.touchpoints.embedding!.dims_options).toEqual([64, 128, 256, 512, 768, 1024]);
+    // Matryoshka: every dims option ≤ 2000 (HNSW-compatible).
+    for (const d of r.touchpoints.embedding!.dims_options ?? []) {
+      expect(d).toBeLessThanOrEqual(2000);
+    }
+  });
+
+  test('default auth: DASHSCOPE_API_KEY set → "Bearer <key>"', () => {
+    const r = getRecipe('dashscope')!;
+    const auth = defaultResolveAuth(
+      r,
+      { DASHSCOPE_API_KEY: 'sk-dashscope-fake' },
+      'embedding',
+    );
+    expect(auth.headerName).toBe('Authorization');
+    expect(auth.token).toBe('Bearer sk-dashscope-fake');
+  });
+
+  test('default auth: missing DASHSCOPE_API_KEY → AIConfigError', () => {
+    const r = getRecipe('dashscope')!;
+    expect(() => defaultResolveAuth(r, {}, 'embedding')).toThrow(AIConfigError);
+  });
+
+  test('declares chars_per_token + max_batch_tokens for safer batching', () => {
+    const r = getRecipe('dashscope')!;
+    expect(r.touchpoints.embedding!.max_batch_tokens).toBeGreaterThan(0);
+    expect(r.touchpoints.embedding!.chars_per_token).toBeGreaterThan(0);
+  });
+
+  test('dimsProviderOptions threads dimensions for text-embedding-v3 (Matryoshka)', async () => {
+    // Codex finding #1: DashScope text-embedding-v3 is Matryoshka 64-1024.
+    // Without `dimensions` on the wire, user-selected non-default dims are
+    // silently ignored and the provider returns its default size.
+    const { dimsProviderOptions } = await import('../../src/core/ai/dims.ts');
+    expect(dimsProviderOptions('openai-compatible', 'text-embedding-v3', 512))
+      .toEqual({ openaiCompatible: { dimensions: 512 } });
+    expect(dimsProviderOptions('openai-compatible', 'text-embedding-v3', 1024))
+      .toEqual({ openaiCompatible: { dimensions: 1024 } });
+    // text-embedding-v2 is fixed-dim; no passthrough.
+    expect(dimsProviderOptions('openai-compatible', 'text-embedding-v2', 1024))
+      .toBeUndefined();
+  });
+});
diff --git a/test/ai/recipe-llama-server.test.ts b/test/ai/recipe-llama-server.test.ts
new file mode 100644
index 000000000..58775142c
--- /dev/null
+++ b/test/ai/recipe-llama-server.test.ts
@@ -0,0 +1,82 @@
+/**
+ * llama-server recipe smoke (Commit 4 of the v0.32 wave).
+ *
+ * llama-server is the second user-driven-models recipe (alongside
+ * litellm-proxy). It declares `models: []`, `user_provided_models: true`,
+ * and a `probe()` that consults LLAMA_SERVER_BASE_URL.
+ *
+ * Coverage:
+ *   - Recipe registered + has expected fields
+ *   - user_provided_models is the explicit signal (not the legacy id heuristic)
+ *   - probe is callable and reports `ready: false` with a setup hint when no server is listening
+ *   - default auth resolves to "Bearer unauthenticated" (or the API key if set)
+ */
+
+import { describe, expect, test } from 'bun:test';
+import { getRecipe } from '../../src/core/ai/recipes/index.ts';
+import { defaultResolveAuth } from '../../src/core/ai/gateway.ts';
+import { withEnv } from '../helpers/with-env.ts';
+
+describe('recipe: llama-server', () => {
+  test('registered with expected shape', () => {
+    const r = getRecipe('llama-server');
+    expect(r).toBeDefined();
+    expect(r!.id).toBe('llama-server');
+    expect(r!.tier).toBe('openai-compat');
+    expect(r!.implementation).toBe('openai-compatible');
+    expect(r!.base_url_default).toBe('http://localhost:8080/v1');
+    expect(r!.auth_env?.required ?? []).toEqual([]);
+    expect(r!.auth_env?.optional ?? []).toContain('LLAMA_SERVER_BASE_URL');
+    expect(r!.auth_env?.optional ?? []).toContain('LLAMA_SERVER_API_KEY');
+  });
+
+  test('embedding touchpoint declares user_provided_models', () => {
+    const r = getRecipe('llama-server')!;
+    expect(r.touchpoints.embedding).toBeDefined();
+    expect(r.touchpoints.embedding!.models).toEqual([]);
+    expect(r.touchpoints.embedding!.user_provided_models).toBe(true);
+    expect(r.touchpoints.embedding!.default_dims).toBe(0);
+  });
+
+  test('declares a probe function', () => {
+    const r = getRecipe('llama-server')!;
+    expect(typeof r.probe).toBe('function');
+  });
+
+  test('probe returns ready=false with hint when no server listening on default port', async () => {
+    // Use a guaranteed-unreachable port. withEnv ensures the prior value
+    // (if any) is restored after the test, including across the
+    // shared-process parallel test runner.
+    await withEnv({ LLAMA_SERVER_BASE_URL: 'http://127.0.0.1:1/v1' }, async () => {
+      const r = getRecipe('llama-server')!;
+      const result = await r.probe!();
+      expect(result.ready).toBe(false);
+      expect(result.hint).toBeDefined();
+      expect(result.hint!.toLowerCase()).toContain('llama-server');
+    });
+  });
+
+  test('default auth: no env → "Bearer unauthenticated"', () => {
+    const r = getRecipe('llama-server')!;
+    const auth = defaultResolveAuth(r, {}, 'embedding');
+    expect(auth.headerName).toBe('Authorization');
+    expect(auth.token).toBe('Bearer unauthenticated');
+  });
+
+  test('default auth: LLAMA_SERVER_API_KEY set → "Bearer <key>"', () => {
+    const r = getRecipe('llama-server')!;
+    const auth = defaultResolveAuth(r, { LLAMA_SERVER_API_KEY: 'sk-llama-fake' }, 'embedding');
+    expect(auth.headerName).toBe('Authorization');
+    expect(auth.token).toBe('Bearer sk-llama-fake');
+  });
+
+  test('default auth: LLAMA_SERVER_BASE_URL alone does NOT become the Bearer (URL-shaped optional)', () => {
+    const r = getRecipe('llama-server')!;
+    const auth = defaultResolveAuth(
+      r,
+      { LLAMA_SERVER_BASE_URL: 'http://my-llama:8080/v1' },
+      'embedding',
+    );
+    expect(auth.token).toBe('Bearer unauthenticated');
+  });
+});
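The `withEnv` helper imported above lives in `test/helpers/with-env.ts`, which isn't part of this diff. A minimal sketch consistent with its call site (an assumed implementation, shown for readers following along, not the shipped helper):

```ts
// Set vars, run the body, then restore prior values — including deleting
// vars that were previously unset — even if the body throws.
export async function withEnv(
  vars: Record<string, string>,
  body: () => Promise<void>,
): Promise<void> {
  const saved = new Map<string, string | undefined>();
  for (const [k, v] of Object.entries(vars)) {
    saved.set(k, process.env[k]);
    process.env[k] = v;
  }
  try {
    await body();
  } finally {
    for (const [k, v] of saved) {
      if (v === undefined) delete process.env[k];
      else process.env[k] = v;
    }
  }
}
```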
diff --git a/test/ai/recipe-minimax.test.ts b/test/ai/recipe-minimax.test.ts
new file mode 100644
index 000000000..96f6c9eda
--- /dev/null
+++ b/test/ai/recipe-minimax.test.ts
@@ -0,0 +1,59 @@
+/**
+ * MiniMax recipe smoke (Commit 5 of the v0.32 wave).
+ *
+ * Coverage:
+ *   - Recipe registered with expected shape
+ *   - default auth: MINIMAX_API_KEY → "Bearer <key>"; missing → AIConfigError
+ *   - dimsProviderOptions threads `type: 'db'` for embo-01 (the asymmetric
+ *     retrieval field default) — pins the v1 indexing-only behavior
+ */
+
+import { describe, expect, test } from 'bun:test';
+import { getRecipe } from '../../src/core/ai/recipes/index.ts';
+import { defaultResolveAuth } from '../../src/core/ai/gateway.ts';
+import { dimsProviderOptions } from '../../src/core/ai/dims.ts';
+import { AIConfigError } from '../../src/core/ai/errors.ts';
+
+describe('recipe: minimax', () => {
+  test('registered with expected shape', () => {
+    const r = getRecipe('minimax');
+    expect(r).toBeDefined();
+    expect(r!.id).toBe('minimax');
+    expect(r!.tier).toBe('openai-compat');
+    expect(r!.implementation).toBe('openai-compatible');
+    expect(r!.base_url_default).toBe('https://api.minimaxi.com/v1');
+    expect(r!.auth_env?.required).toEqual(['MINIMAX_API_KEY']);
+    expect(r!.auth_env?.optional).toContain('MINIMAX_GROUP_ID');
+  });
+
+  test('embedding touchpoint declares embo-01 + 1536 dims', () => {
+    const r = getRecipe('minimax')!;
+    expect(r.touchpoints.embedding).toBeDefined();
+    expect(r.touchpoints.embedding!.models).toEqual(['embo-01']);
+    expect(r.touchpoints.embedding!.default_dims).toBe(1536);
+    expect(r.touchpoints.embedding!.user_provided_models ?? false).toBe(false);
+    expect(r.touchpoints.embedding!.max_batch_tokens).toBe(4096);
+  });
+
+  test('default auth: MINIMAX_API_KEY set → "Bearer <key>"', () => {
+    const r = getRecipe('minimax')!;
+    const auth = defaultResolveAuth(r, { MINIMAX_API_KEY: 'fake-mm-key' }, 'embedding');
+    expect(auth.headerName).toBe('Authorization');
+    expect(auth.token).toBe('Bearer fake-mm-key');
+  });
+
+  test('default auth: missing MINIMAX_API_KEY → AIConfigError', () => {
+    const r = getRecipe('minimax')!;
+    expect(() => defaultResolveAuth(r, {}, 'embedding')).toThrow(AIConfigError);
+  });
+
+  test('dimsProviderOptions threads type:db for embo-01', () => {
+    const opts = dimsProviderOptions('openai-compatible', 'embo-01', 1536);
+    expect(opts).toEqual({ openaiCompatible: { type: 'db' } });
+  });
+
+  test('dimsProviderOptions returns undefined for non-MiniMax openai-compat models', () => {
+    expect(dimsProviderOptions('openai-compatible', 'voyage-3-lite', 512)).toBeUndefined();
+    expect(dimsProviderOptions('openai-compatible', 'nomic-embed-text', 768)).toBeUndefined();
+  });
+});
diff --git a/test/ai/recipe-zhipu.test.ts b/test/ai/recipe-zhipu.test.ts
new file mode 100644
index 000000000..dfb5f112a
--- /dev/null
+++ b/test/ai/recipe-zhipu.test.ts
@@ -0,0 +1,85 @@
+/**
+ * Zhipu AI (BigModel) recipe smoke (Commit 7 of the v0.32 wave).
+ *
+ * Coverage:
+ *   - Recipe registered with expected shape
+ *   - default auth: ZHIPUAI_API_KEY → "Bearer <key>"; missing → AIConfigError
+ *   - dims_options exposes [256, 512, 1024, 2048]; default 1024 (HNSW-compatible)
+ *   - 2048-dim path falls into exact-scan branch via chunkEmbeddingIndexSql
+ *     from src/core/vector-index.ts
+ */
+
+import { describe, expect, test } from 'bun:test';
+import { getRecipe } from '../../src/core/ai/recipes/index.ts';
+import { defaultResolveAuth } from '../../src/core/ai/gateway.ts';
+import { AIConfigError } from '../../src/core/ai/errors.ts';
+import {
+  PGVECTOR_HNSW_VECTOR_MAX_DIMS,
+  chunkEmbeddingIndexSql,
+} from '../../src/core/vector-index.ts';
+
+describe('recipe: zhipu', () => {
+  test('registered with expected shape', () => {
+    const r = getRecipe('zhipu');
+    expect(r).toBeDefined();
+    expect(r!.id).toBe('zhipu');
+    expect(r!.tier).toBe('openai-compat');
+    expect(r!.implementation).toBe('openai-compatible');
+    expect(r!.base_url_default).toBe('https://open.bigmodel.cn/api/paas/v4');
+    expect(r!.auth_env?.required).toEqual(['ZHIPUAI_API_KEY']);
+  });
+
+  test('embedding touchpoint declares embedding-3 first + 1024 dims (HNSW-compatible default)', () => {
+    const r = getRecipe('zhipu')!;
+    expect(r.touchpoints.embedding).toBeDefined();
+    expect(r.touchpoints.embedding!.models[0]).toBe('embedding-3');
+    expect(r.touchpoints.embedding!.models).toContain('embedding-2');
+    expect(r.touchpoints.embedding!.default_dims).toBe(1024);
+    expect(r.touchpoints.embedding!.dims_options).toEqual([256, 512, 1024, 2048]);
+    // The default must stay HNSW-compatible.
+    expect(r.touchpoints.embedding!.default_dims).toBeLessThanOrEqual(
+      PGVECTOR_HNSW_VECTOR_MAX_DIMS,
+    );
+  });
+
+  test('default auth: ZHIPUAI_API_KEY set → "Bearer <key>"', () => {
+    const r = getRecipe('zhipu')!;
+    const auth = defaultResolveAuth(r, { ZHIPUAI_API_KEY: 'fake-zhipu-key' }, 'embedding');
+    expect(auth.headerName).toBe('Authorization');
+    expect(auth.token).toBe('Bearer fake-zhipu-key');
+  });
+
+  test('default auth: missing ZHIPUAI_API_KEY → AIConfigError', () => {
+    const r = getRecipe('zhipu')!;
+    expect(() => defaultResolveAuth(r, {}, 'embedding')).toThrow(AIConfigError);
+  });
+
+  test('2048-dim option from dims_options falls into exact-scan branch', () => {
+    // 2048d exceeds the HNSW cap, so chunkEmbeddingIndexSql returns the
+    // exact-scan-skip-index path. Users picking 2048 trade ANN speed for
+    // full embedding fidelity.
+    const sql = chunkEmbeddingIndexSql(2048);
+    expect(sql.toLowerCase()).toContain('skipped');
+    expect(sql.toLowerCase()).toContain('hnsw');
+  });
+
+  test('1024-dim default returns the HNSW index SQL (fast path)', () => {
+    const sql = chunkEmbeddingIndexSql(1024);
+    expect(sql.toLowerCase()).toContain('create index');
+    expect(sql.toLowerCase()).toContain('hnsw');
+  });
+
+  test('dimsProviderOptions threads dimensions for embedding-3 (Matryoshka)', async () => {
+    // Codex finding #1: Zhipu embedding-3 is Matryoshka 256-2048. Without
+    // `dimensions` on the wire, user-selected non-default dims are
+    // silently ignored.
+    const { dimsProviderOptions } = await import('../../src/core/ai/dims.ts');
+    expect(dimsProviderOptions('openai-compatible', 'embedding-3', 1024))
+      .toEqual({ openaiCompatible: { dimensions: 1024 } });
+    expect(dimsProviderOptions('openai-compatible', 'embedding-3', 2048))
+      .toEqual({ openaiCompatible: { dimensions: 2048 } });
+    // embedding-2 is fixed-dim; no passthrough.
+    expect(dimsProviderOptions('openai-compatible', 'embedding-2', 1024))
+      .toBeUndefined();
+  });
+});
diff --git a/test/ai/recipes-existing-regression.test.ts b/test/ai/recipes-existing-regression.test.ts
new file mode 100644
index 000000000..567c1c94e
--- /dev/null
+++ b/test/ai/recipes-existing-regression.test.ts
@@ -0,0 +1,196 @@
+/**
+ * IRON RULE regression test (D2/D12=A): the v0.32 resolveAuth refactor
+ * MUST NOT change auth behavior for any of the 9 existing recipes
+ * (openai, anthropic, google, deepseek, groq, ollama, litellm-proxy,
+ * together, voyage).
+ *
+ * Pre-v0.32, openai-compatible auth was duplicated 3 times in gateway.ts
+ * with subtle drift; D12=A unified all three through Recipe.resolveAuth?
+ * with a default that covers existing recipes unchanged. This test pins
+ * the contract so the next refactor can't silently regress it.
+ *
+ * Coverage:
+ *   - defaultResolveAuth returns Authorization Bearer <key> when required[0] is set
+ *   - throws AIConfigError when required env is missing (with recipe name + touchpoint in message)
+ *   - falls back to first present optional env when required is empty (Ollama-style)
+ *   - falls back to 'unauthenticated' when neither required nor optional present
+ *   - applyResolveAuth converts Authorization Bearer <key> to {apiKey} (SDK native)
+ *   - applyResolveAuth converts custom headers to {headers} WITHOUT apiKey (no double-auth)
+ *   - all 3 touchpoints (embedding, expansion, chat) produce identical auth shape for the same recipe+env
+ *   - native recipes (openai, anthropic, google) are not consulted via resolveAuth (they use their AI-SDK adapters directly)
+ */
+
+import { describe, expect, test } from 'bun:test';
+import { defaultResolveAuth, applyResolveAuth } from '../../src/core/ai/gateway.ts';
+import { listRecipes, getRecipe } from '../../src/core/ai/recipes/index.ts';
+import { AIConfigError } from '../../src/core/ai/errors.ts';
+import type { Recipe } from '../../src/core/ai/types.ts';
+
+const TOUCHPOINTS: Array<'embedding' | 'expansion' | 'chat'> = ['embedding', 'expansion', 'chat'];
+
+describe('IRON RULE: existing 9 recipes survive the v0.32 resolveAuth refactor', () => {
+  test('all 9 baseline recipes are still registered (subset, allows post-v0.32 additions)', () => {
+    const ids = new Set(listRecipes().map(r => r.id));
+    for (const baseline of [
+      'anthropic',
+      'deepseek',
+      'google',
+      'groq',
+      'litellm',
+      'ollama',
+      'openai',
+      'together',
+      'voyage',
+    ]) {
+      expect(ids.has(baseline), `baseline recipe ${baseline} missing post-refactor`).toBe(true);
+    }
+  });
+
+  test('every recipe with a non-empty required[] returns Authorization Bearer <key>', () => {
+    for (const r of listRecipes()) {
+      const required = r.auth_env?.required ?? [];
+      if (required.length === 0) continue;
+      const env = { [required[0]]: `fake-${r.id}-key` };
+      const auth = defaultResolveAuth(r, env, 'embedding');
+      expect(auth.headerName).toBe('Authorization');
+      expect(auth.token).toBe(`Bearer fake-${r.id}-key`);
+    }
+  });
+
+  test('missing required env throws AIConfigError naming the recipe + touchpoint', () => {
+    const recipesWithRequired = listRecipes().filter(r => (r.auth_env?.required ?? []).length > 0);
+    expect(recipesWithRequired.length).toBeGreaterThan(0);
+    for (const r of recipesWithRequired) {
+      for (const tp of TOUCHPOINTS) {
+        let caught: unknown;
+        try {
+          defaultResolveAuth(r, {}, tp);
+        } catch (e) {
+          caught = e;
+        }
+        expect(caught, `${r.id} ${tp} should throw on missing env`).toBeInstanceOf(AIConfigError);
+        const msg = (caught as Error).message;
+        expect(msg).toContain(r.name);
+        expect(msg).toContain(tp);
+        expect(msg).toContain(r.auth_env!.required[0]);
+      }
+    }
+  });
+
+  test('Ollama (empty required, OLLAMA_API_KEY set) reads it as the Bearer token', () => {
+    const ollama = getRecipe('ollama');
+    expect(ollama).toBeDefined();
+    expect(ollama!.auth_env?.required ?? []).toEqual([]);
+    const optional = ollama!.auth_env?.optional ?? [];
+    expect(optional).toContain('OLLAMA_API_KEY');
+    // OLLAMA_API_KEY (a non-URL-shaped optional) becomes the Bearer.
+    const auth = defaultResolveAuth(ollama!, { OLLAMA_API_KEY: 'fake-token' }, 'embedding');
+    expect(auth.headerName).toBe('Authorization');
+    expect(auth.token).toBe('Bearer fake-token');
+  });
+
+  test('Ollama (no env at all) falls back to "Bearer unauthenticated"', () => {
+    const ollama = getRecipe('ollama');
+    const auth = defaultResolveAuth(ollama!, {}, 'embedding');
+    expect(auth.headerName).toBe('Authorization');
+    expect(auth.token).toBe('Bearer unauthenticated');
+  });
+
+  test('URL-shaped optional env (OLLAMA_BASE_URL, LLAMA_SERVER_BASE_URL) does NOT become the Bearer token', () => {
+    // Regression for the v0.32 default-fallback design: optional entries
+    // ending in _URL or _BASE_URL are config (cfg.base_urls), not auth.
+    // The fallback must skip them and consult the next optional API-key entry.
+    const ollama = getRecipe('ollama');
+    const auth1 = defaultResolveAuth(
+      ollama!,
+      { OLLAMA_BASE_URL: 'http://my-ollama/v1' },
+      'embedding',
+    );
+    expect(auth1.token, 'OLLAMA_BASE_URL must not become Bearer token').toBe('Bearer unauthenticated');
+
+    // When BOTH BASE_URL and API_KEY are set, the API_KEY wins.
+    const auth2 = defaultResolveAuth(
+      ollama!,
+      { OLLAMA_BASE_URL: 'http://my-ollama/v1', OLLAMA_API_KEY: 'real-key' },
+      'embedding',
+    );
+    expect(auth2.token).toBe('Bearer real-key');
+  });
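The URL-shaped-optional rule those two Ollama tests pin can be summarized in a few lines. A sketch of the selection rule only (NOT the real `defaultResolveAuth`; missing-required error handling and the Bearer/header assembly are elided):

```ts
// required[0] wins when present; otherwise the first optional entry that
// is present AND not URL-shaped (…_URL / …_BASE_URL) supplies the token;
// otherwise fall back to the 'unauthenticated' placeholder.
function pickBearerSource(
  required: string[],
  optional: string[],
  env: Record<string, string | undefined>,
): string {
  if (required[0] && env[required[0]]) return env[required[0]]!;
  const key = optional.find(k => !/_(BASE_)?URL$/.test(k) && env[k]);
  return key ? env[key]! : 'unauthenticated';
}

// pickBearerSource([], ['OLLAMA_BASE_URL', 'OLLAMA_API_KEY'],
//   { OLLAMA_BASE_URL: 'http://my-ollama/v1' }) → 'unauthenticated'
```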
+  test('all 3 touchpoints produce identical auth for the same recipe + env', () => {
+    // Critical regression: pre-v0.32, embedding had a fallback to
+    // ${recipe.id.toUpperCase()}_API_KEY that expansion and chat lacked.
+    // Post-D12=A unification, all 3 touchpoints go through the same
+    // resolver, so the auth shape MUST match.
+    for (const r of listRecipes()) {
+      if (r.implementation !== 'openai-compatible') continue;
+      const required = r.auth_env?.required ?? [];
+      const env: Record<string, string> = {};
+      if (required.length > 0) env[required[0]] = `fake-${r.id}-key`;
+
+      const embeddingAuth = applyResolveAuth(r, { env } as any, 'embedding');
+      const expansionAuth = applyResolveAuth(r, { env } as any, 'expansion');
+      const chatAuth = applyResolveAuth(r, { env } as any, 'chat');
+
+      expect(embeddingAuth, `${r.id} embed=expand`).toEqual(expansionAuth);
+      expect(expansionAuth, `${r.id} expand=chat`).toEqual(chatAuth);
+    }
+  });
+
+  test('applyResolveAuth converts Authorization Bearer <key> to {apiKey} (SDK-native path)', () => {
+    const voyage = getRecipe('voyage')!;
+    const env = { VOYAGE_API_KEY: 'fake-voyage-key' };
+    const auth = applyResolveAuth(voyage, { env } as any, 'embedding');
+    expect(auth.apiKey).toBe('fake-voyage-key');
+    expect(auth.headers).toBeUndefined();
+  });
+
+  test('applyResolveAuth respects a recipe.resolveAuth override that returns a custom header', () => {
+    // Synthetic recipe with a custom-header resolveAuth (Azure-style preview;
+    // the actual Azure recipe lands in commit 8). Ensures the seam works.
+    const fakeAzure: Recipe = {
+      id: 'fake-azure',
+      name: 'Fake Azure',
+      tier: 'openai-compat',
+      implementation: 'openai-compatible',
+      auth_env: { required: ['FAKE_AZURE_API_KEY'] },
+      touchpoints: {},
+      resolveAuth(env) {
+        const k = env.FAKE_AZURE_API_KEY;
+        if (!k) throw new AIConfigError('Fake Azure requires FAKE_AZURE_API_KEY.');
+        return { headerName: 'api-key', token: k };
+      },
+    };
+    const env = { FAKE_AZURE_API_KEY: 'fake-key' };
+    const auth = applyResolveAuth(fakeAzure, { env } as any, 'embedding');
+    expect(auth.apiKey, 'custom-header path must NOT set apiKey').toBeUndefined();
+    expect(auth.headers).toEqual({ 'api-key': 'fake-key' });
+  });
+
+  test('native recipes have no resolveAuth declared; they take native SDK paths', () => {
+    // Confirms the architectural invariant: resolveAuth is only consulted by
+    // the openai-compatible branches in instantiate{Embedding,Expansion,Chat}.
+    // Native recipes (openai, anthropic, google) use createOpenAI /
+    // createAnthropic / createGoogleGenerativeAI directly with the SDK's
+    // own apiKey field. This test pins that resolveAuth is intentionally
+    // absent on the native recipes — a future drift that adds it without
+    // wiring it through the native branches would silently fail this assert.
+    for (const id of ['openai', 'anthropic', 'google']) {
+      const r = getRecipe(id);
+      expect(r, `recipe ${id} missing`).toBeDefined();
+      expect(r!.tier).toBe('native');
+      expect(r!.resolveAuth, `${id} should NOT declare resolveAuth in v0.32`).toBeUndefined();
+    }
+  });
+
+  test('only Azure overrides resolveAuth in v0.32 (default applies elsewhere)', () => {
+    // The default resolver covers every openai-compatible recipe except
+    // Azure, which uses the api-key custom-header path. The IRON RULE
+    // contract: any new override beyond Azure must be reviewed for
+    // double-auth + back-compat regression.
+    const overrides = listRecipes().filter(
+      r => r.implementation === 'openai-compatible' && r.resolveAuth,
+    );
+    expect(overrides.map(r => r.id).sort()).toEqual(['azure-openai']);
+  });
+});
diff --git a/test/brain-registry.serial.test.ts b/test/brain-registry.serial.test.ts
index 83eff6b60..fae4e6154 100644
--- a/test/brain-registry.serial.test.ts
+++ b/test/brain-registry.serial.test.ts
@@ -270,12 +270,29 @@ describe('BrainRegistry — lazy init', () => {
     // verify the routing logic by observing the default-branch path. This
     // test proves the fall-through to HOST_BRAIN_ID happens before any
     // lookup, not that host init actually succeeds.
-    const reg = new BrainRegistry([]);
-    // Expect the host-init path to be attempted (it'll fail on missing
-    // ~/.gbrain/config.json in test env, but the error will come from
-    // initHostBrain, not UnknownBrainError — proving routing hit host).
-    await expect(reg.getBrain(null)).rejects.not.toBeInstanceOf(UnknownBrainError);
-    await expect(reg.getBrain(undefined)).rejects.not.toBeInstanceOf(UnknownBrainError);
-    await expect(reg.getBrain('')).rejects.not.toBeInstanceOf(UnknownBrainError);
+    //
+    // Hermeticity: dev machines often have a real ~/.gbrain/config.json
+    // (the maintainer's own brain). Without GBRAIN_HOME isolation, the
+    // host-init path RESOLVES successfully on those machines instead of
+    // rejecting, breaking the `rejects.not.toBeInstanceOf` assertion. Pin
+    // GBRAIN_HOME to a guaranteed-empty tempdir so host-init has nothing
+    // to find and fails loudly (which is exactly the error the assertion
+    // wants — not UnknownBrainError, but ALSO not a successful resolve).
+    const isolatedHome = mkdtempSync(join(tmpdir(), 'brain-registry-home-'));
+    track(isolatedHome);
+    const savedHome = process.env.GBRAIN_HOME;
+    process.env.GBRAIN_HOME = isolatedHome;
+    try {
+      const reg = new BrainRegistry([]);
+      // Expect the host-init path to be attempted (it'll fail on missing
+      // <GBRAIN_HOME>/.gbrain/config.json, but the error will come from
+      // initHostBrain, not UnknownBrainError — proving routing hit host).
+      await expect(reg.getBrain(null)).rejects.not.toBeInstanceOf(UnknownBrainError);
+      await expect(reg.getBrain(undefined)).rejects.not.toBeInstanceOf(UnknownBrainError);
+      await expect(reg.getBrain('')).rejects.not.toBeInstanceOf(UnknownBrainError);
+    } finally {
+      if (savedHome !== undefined) process.env.GBRAIN_HOME = savedHome;
+      else delete process.env.GBRAIN_HOME;
+    }
   });
 });