Merged (20 commits)
- `464dbb2` feat(ai/types): add resolveAuth + probe + user_provided_models fields (garrytan, May 10, 2026)
- `55d9b0c` feat(ai/gateway): unify openai-compatible auth via Recipe.resolveAuth… (garrytan, May 10, 2026)
- `014d4e8` test(ai): IRON RULE regression test for v0.32 resolveAuth refactor (garrytan, May 10, 2026)
- `74c849f` feat(ai): add llama-server recipe (#702 reworked) (garrytan, May 10, 2026)
- `1abbe8d` feat(ai): add MiniMax recipe (#148 reworked) (garrytan, May 10, 2026)
- `ce05aef` feat(ai): add Alibaba DashScope recipe (#59 split, part 1/2) (garrytan, May 10, 2026)
- `ead9643` feat(ai): add Zhipu AI (BigModel) recipe (#59 split, part 2/2) (garrytan, May 10, 2026)
- `4ae4a98` feat(ai): add Azure OpenAI recipe (#459 reworked) (garrytan, May 10, 2026)
- `180f4fc` feat(ai): adjacent fixes — no_batch_cap (#779) + config-key fallbacks… (garrytan, May 10, 2026)
- `c42b885` feat(discoverability): doctor alt-provider advisory + init user_provi… (garrytan, May 10, 2026)
- `c384fad` docs(v0.32.0): embedding-providers.md + README callout + CHANGELOG + … (garrytan, May 10, 2026)
- `50970ab` docs: regenerate llms.txt + llms-full.txt for v0.32.0 (garrytan, May 10, 2026)
- `4da8ff9` Merge remote-tracking branch 'origin/master' into garrytan/santo-domi… (garrytan, May 10, 2026)
- `a988004` test: hermetic GBRAIN_HOME for brain-registry serial flake + withEnv … (garrytan, May 10, 2026)
- `8dbd02a` fix: address 5 codex pre-merge findings (dim passthrough + URL routin… (garrytan, May 10, 2026)
- `4defa92` fix(ci): isolate v0.32 no-batch-cap test from mock.module leak (close… (garrytan, May 10, 2026)
- `85e5f6d` Merge remote-tracking branch 'origin/master' into garrytan/santo-domi… (garrytan, May 10, 2026)
- `4030589` Merge remote-tracking branch 'origin/master' into garrytan/santo-domi… (garrytan, May 10, 2026)
- `95d06e1` Merge remote-tracking branch 'origin/master' into garrytan/santo-domi… (garrytan, May 11, 2026)
- `0adc258` Merge remote-tracking branch 'origin/master' into garrytan/santo-domi… (garrytan, May 11, 2026)
116 changes: 116 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,122 @@

All notable changes to GBrain will be documented in this file.

## [0.32.0] - 2026-05-10

**5 new embedding providers + the discoverability fix that closes the 17-PR dupe cluster.**
**`gbrain providers list` now shows 14 recipes; `gbrain doctor` tells you which alternatives are already wired.**

A triage of 197 open issues + 289 open PRs surfaced a 17-PR cluster of community embedding-provider PRs filed within ~3 weeks (Ollama, Gemini, Voyage, Azure, MiniMax, Copilot, llama-server, Vertex, DashScope, Zhipu, etc.). Most were dupes of work already in master — gbrain has shipped a comprehensive AI SDK gateway + recipe pattern since v0.14, with 9 providers built in. Users just didn't know.

v0.32.0 ships the missing recipes that aren't covered by the existing pattern, plus a documentation pass + doctor advisory + improved error hints that close the discoverability gap. Codex outside-voice review during plan-eng-review caught the discoverability framing — without it, the wave would have shipped 8 recipes plus an OAuth subsystem instead of the focused 5-recipe + docs delivery.

### The numbers that matter

```
gbrain providers list → v0.31.1: 9 providers → v0.32.0: 14 providers
gbrain doctor → v0.31.1: 1 advisory → v0.32.0: 2 advisories (+ alternative_providers)
```

5 new recipes:

| Recipe | Auth | Default dims | Notes |
|---|---|---|---|
| `azure-openai` | `AZURE_OPENAI_API_KEY` + `AZURE_OPENAI_ENDPOINT` + `AZURE_OPENAI_DEPLOYMENT` | 1536 | First recipe with `api-key:` custom header (not Bearer); first with templated URL + `?api-version=` query injection |
| `minimax` | `MINIMAX_API_KEY` | 1536 | China-region; embo-01 model; type='db' asymmetric retrieval field plumbed via dims.ts |
| `dashscope` | `DASHSCOPE_API_KEY` | 1024 | Alibaba; international endpoint default; CJK-aware batching (chars_per_token=2) |
| `zhipu` | `ZHIPUAI_API_KEY` | 1024 | BigModel; embedding-3 with Matryoshka up to 2048 (HNSW falls back to exact-scan past 2000 dims) |
| `llama-server` | (none) | user-set | llama.cpp's `llama-server --embeddings`; user_provided_models recipe |

### What this means for new users

`gbrain init` keeps OpenAI as the zero-config default. Users with API keys for any of the other 13 providers see them surfaced via `gbrain doctor` ("Detected 2 alternative embedding providers ready to use: voyage, dashscope. Run `gbrain providers list` to switch."). Users on Azure tenancies, China-region, or local-only setups have first-class recipes instead of "find a workaround." Users whose provider gbrain doesn't ship a recipe for can route through the LiteLLM proxy (the universal escape hatch) without writing custom code.

For agents: every recipe is registered in the same `listRecipes()` registry, so `gbrain providers list/test/env/explain` automatically picks up new recipes without code changes. The recipe contract test (`test/ai/recipes-contract.test.ts`) keeps the registry honest.

### To take advantage of v0.32.0

`gbrain upgrade` should do this automatically. If it didn't:

1. **Confirm the new recipes show:**
```bash
gbrain providers list
```
Should show 14 entries including `azure-openai`, `minimax`, `dashscope`, `zhipu`, `llama-server`.

2. **Try the doctor advisory:**
```bash
gbrain doctor
```
Look for the `alternative_providers` row. If env vars for unconfigured providers are present, it'll name them.

3. **Read the new docs** at [`docs/integrations/embedding-providers.md`](docs/integrations/embedding-providers.md) — capability matrix, decision tree, per-recipe setup, "my provider isn't listed" path.

4. **No breaking changes**: the existing 9 recipes (openai, anthropic, google, deepseek, groq, ollama, litellm-proxy, together, voyage) keep working unchanged. The internal auth refactor (D12=A unified resolveAuth seam) is pinned by `test/ai/recipes-existing-regression.test.ts` so the next refactor can't silently break them.

5. **If anything breaks**, file an issue at https://github.com/garrytan/gbrain/issues with `gbrain doctor` output. The only behavior change for existing recipes: Ollama expansion + chat now read `OLLAMA_API_KEY` when set (embedding already did; the unification aligns all three touchpoints).

### Itemized changes

#### Architectural foundations

- **Recipe.resolveAuth(env) seam (D12=A)**: unified the openai-compatible auth path, which was duplicated 3 times across `instantiateEmbedding`, `instantiateExpansion`, and `instantiateChat` with subtle drift. The default impl (used by all existing recipes unchanged) returns `{headerName: 'Authorization', token: 'Bearer <key>'}`. Recipes that deviate override it; Azure is the first.
- **Recipe.resolveOpenAICompatConfig(env) seam**: env-templated baseURL + optional fetch wrapper for recipes whose URL shape doesn't fit a static `base_url_default`. Azure uses both seams.
- **Recipe.probe() seam (D13=A)**: recipe-owned readiness check for local-server providers. Replaces the hardcoded `recipe.id === 'ollama'` special case in `runExplain()`. llama-server declares its own probe; future local providers self-register.
- **EmbeddingTouchpoint.user_provided_models?: true (D8=A)**: explicit signal for recipes that ship without a fixed model list (litellm, llama-server). Replaces the legacy `recipe.id === 'litellm'` hardcode in gateway.ts:223; adds a refusal in `init.ts:resolveAIOptions` for shorthand `--model`, with a setup hint pointing at the explicit form.
- **EmbeddingTouchpoint.no_batch_cap?: true**: silences the missing-max_batch_tokens startup warning for recipes with genuinely dynamic batch capacity (Ollama, LiteLLM proxy, llama-server). Pre-fix: 3 stderr warnings on every `configureGateway()` call. Post-fix: only `google` warns.

#### Discoverability

- New `docs/integrations/embedding-providers.md` (one-pager: capability table, decision tree, per-recipe setup, "my provider isn't listed" path to LiteLLM).
- README embedding-providers callout near the top of the install section.
- `gbrain doctor` adds an `alternative_providers` check that surfaces recipes whose env vars are already set but aren't the configured provider.
- `gbrain init --model litellm` (or any user_provided_models recipe) now refuses with a structured setup hint instead of throwing "no embedding models listed."

#### Codex review fixes (pre-merge)

- **dimsProviderOptions on openai-compatible**: text-embedding-3-* (Azure), text-embedding-v3 (DashScope), and embedding-3 (Zhipu) now thread `dimensions` to the wire. Without this, Azure-default 3072d would mismatch a 1536d brain on the first embed; DashScope/Zhipu Matryoshka requests would be silently ignored.
- **`gbrain init --embedding-model llama-server:foo` (verbose path)**: now refuses without `--embedding-dimensions`. Pre-fix, the verbose path fell through to the gateway's 1536d default and silently created the wrong-width schema (only the shorthand `--model` was guarded).
- **MiniMax host correction**: `api.minimax.chat` → `api.minimaxi.com` (matches MiniMax's current OpenAI-compatible docs).
- **`LLAMA_SERVER_BASE_URL` reaches the gateway**: `buildGatewayConfig` now threads `LLAMA_SERVER_BASE_URL`, `OLLAMA_BASE_URL`, `LMSTUDIO_BASE_URL`, `LITELLM_BASE_URL` env into `cfg.base_urls` so embed traffic actually hits the configured port. Pre-fix, the env-only setup let probe pass on a custom port while traffic still hit `localhost:8080`.
- **`Recipe.probe(baseURL?)` accepts the resolved URL**: probe and gateway can no longer disagree when only `provider_base_urls` is set in config (no env). Callers with cfg pass the URL; legacy callers fall back to env.

#### Adjacent fixes

- **#779 (alexandreroumieu-codeapprentice) reworked**: `EmbeddingTouchpoint.no_batch_cap?: true` opt-out for dynamic-cap recipes.
- **#121 (vinsew) reworked**: `~/.gbrain/config.json` API keys now propagate to the gateway env. Pre-fix, `openai_api_key` / `anthropic_api_key` config-file values were ignored (the gateway only saw `process.env`). Common bite: launchd-spawned daemons or agent subprocess tools without `~/.zshrc` propagation. Process env still wins on conflict.
- `loadConfig()` now merges `ANTHROPIC_API_KEY` env var into the file-config result (was silently dropped).
- IRON RULE regression test (`test/ai/recipes-existing-regression.test.ts`): pins that the v0.32 resolveAuth refactor preserves auth behavior for the existing 9 recipes.

### Closed as superseded

The following community PRs are closed because their work is now covered by the recipe system + LiteLLM proxy escape hatch + the recipes shipped in this wave:

- #49, #58, #73, #100, #112, #134, #137, #150, #172, #178, #255, #327, #420, #482, #516, #780, #89 — pluggable embedding adapter / Ollama / Gemini / E5 / Azure-via-LiteLLM / etc.

Each contributor identified a real gap; the patterns they prototyped converged on the recipe system that was shipped in v0.14. Thank you for the early signal.

### Deferred to v0.32.x (with TODOS.md entries)

- **#729 Vertex AI ADC** (lucha0404): proper ADC chain (metadata server, gcloud creds, service-account JSON) is a real product surface, not the single-source-JSON path the original PR proposed.
- **#691 GitHub Copilot** (tonyxu-io): outbound OAuth is a new product surface (login flow, browser/device flow, refresh, UX), not a sidecar recipe. Needs its own design pass.
- **#698 OpenAI Codex OAuth** (perlantir): same OAuth-product-surface argument; chat-only.
- **#765 Hunyuan PGLite + CJK keyword fallback** (313094319-sudo): the CJK PGLite branch is ~150 lines of new SQL + scoring logic that deserves its own focused PR rather than being folded into a 9-commit wave.
- **Interactive provider chooser in `gbrain init`**: the wizard piece of the discoverability lane. v0.32.0 ships the doctor advisory + cleaner refusal that close the 80% case; the full wizard is a v0.32.x follow-up.
- **Real-credentials per-recipe smoke fixtures**: opt-in CI matrix gated on API-key budget approval.

### Contributors

Reworked from / inspired by:
- @cacity (#148 MiniMax)
- @JamesJZhang (#459 Azure OpenAI)
- @Magicray1217 (#59 DashScope + Zhipu)
- @SiyaoZheng (#702 llama-server)
- @alexandreroumieu-codeapprentice (#779)
- @vinsew (#121)
- @100yenadmin / Eva (Voyage 4 Large 2048d HNSW policy, shipped earlier via 3004a87)

Codex outside-voice review during plan-eng-review drove the scope reduction (D11=C) from 8 recipes + OAuth subsystem to 5 recipes + docs.

## [0.31.12] - 2026-05-10

**The chat default no longer 404s, and every Claude call gbrain makes is now one config key away from your preferred model.**
2 changes: 2 additions & 0 deletions README.md
@@ -16,6 +16,8 @@ GBrain is those patterns, generalized. 34 skills. Install in 30 minutes. Your ag

> **LLMs:** fetch [`llms.txt`](llms.txt) for the documentation map, or [`llms-full.txt`](llms-full.txt) for the same map with core docs inlined in one fetch. **Agents:** start with [`AGENTS.md`](AGENTS.md) (or [`CLAUDE.md`](CLAUDE.md) if you're Claude Code).

> **Embedding providers:** OpenAI is the default, but gbrain ships with **14 recipes** covering Voyage, Google Gemini, Azure OpenAI, MiniMax, Alibaba DashScope, Zhipu, Ollama (local), llama.cpp llama-server (local), LiteLLM proxy (universal), and 5 more. Run `gbrain providers list` to see them, or read [`docs/integrations/embedding-providers.md`](docs/integrations/embedding-providers.md) for setup, pricing, and a decision tree. `gbrain doctor` will surface alternative providers whose env vars you already have set.

## Install

### On an agent platform (recommended)
66 changes: 66 additions & 0 deletions TODOS.md
@@ -1,5 +1,71 @@
# TODOS

## Embedding-provider follow-ups (v0.32.0)

- [ ] **v0.32.x: Vertex AI ADC embedding provider (#729 originally).** lucha0404
prototyped this with single-source-JSON via `GOOGLE_APPLICATION_CREDENTIALS`.
Real ADC is the full chain (metadata server, gcloud creds, service-account
JSON). The recipe needs to either use `@ai-sdk/google-vertex` (one new
dep, native fit) or implement the chain via Bun.crypto.subtle for RS256
JWT signing (zero dep, ~150 lines + RS256 spike). Original Q3 chose
zero-dep; revisit the dep budget when scoping.

- [ ] **v0.32.x: GitHub Copilot embeddings (#691 originally).** tonyxu-io
proposed adding Copilot's Metis embedding endpoint as a sidecar recipe.
Codex review caught that this is not a recipe-add — it's an outbound OAuth
product surface (login flow, browser/device flow, refresh, UX). Needs its
own design pass: where does the token live? `~/.gbrain/oauth/copilot.json`
mode 0600 was the v0.32 plan; revisit + write `gbrain auth login copilot`.

- [ ] **v0.32.x: OpenAI Codex OAuth chat provider (#698 originally).** perlantir
proposed a chat-only provider that reuses ChatGPT subscription auth instead
of API keys. Same OAuth-product-surface argument as #691. Same shared
infra: `~/.gbrain/oauth/<provider>.json` + `gbrain auth login <provider>`.
Build alongside #691 in one OAuth-subsystem wave.

- [ ] **v0.32.x: CJK PGLite keyword fallback (#765 extracted).** 313094319-sudo
hit a real gap: PGLite's FTS doesn't tokenize CJK well, so Chinese queries
return empty results even with proper embeddings. Their PR added a
hasCJK detection branch in `searchKeyword` that switches to LIKE-based
fuzzy matching with a custom scoring function. ~150 lines of new SQL +
scoring + tests. Worth its own focused PR rather than folded into the
v0.32 wave's adjacent-fix lane. Extract `extractSearchTokens`,
`normalizeSearchText`, `hasCJK` helpers + the CJK branch in
`pglite-engine.ts:searchKeyword`. Includes tests for romaji + Korean
Hangul + traditional/simplified Chinese.

- [ ] **v0.32.x: interactive provider chooser in `gbrain init`.** The full
wizard piece of the v0.32 discoverability lane was deferred. Today
`gbrain init` (no flags, TTY) silently uses OpenAI default. Plan: hook
into `init.ts:resolveAIOptions`, when no `--model` AND TTY AND not
`--non-interactive`, call `runExplain([])` (non-JSON path) from
`providers.ts:233-350` to print the provider matrix, then prompt with
readline (mirror `supabaseWizard()` at `init.ts:108`). Suggest
recommended based on env detection. Refuse `user_provided_models`
shorthand (already done in v0.32.0). Tests:
`test/init-provider-wizard.test.ts` (TTY → prompt fires; non-TTY →
falls through; invalid choice → re-prompts).

- [ ] **v0.32.x: real-credentials per-recipe smoke-test CI matrix.** Codex
finding #6 noted that unit tests via `__setEmbedTransportForTests` prove
routing but not contract correctness with the actual provider HTTP
shape. Provider APIs change quietly (Voyage encoding-format, MiniMax
type field, Azure header). One real-call per recipe per month catches
drift before users do; <$1/run estimated. Requires API-key budget
approval + repo secrets.

- [ ] **v0.32.x: MiniMax asymmetric retrieval support.** v0.32 ships
`embo-01` with `type: 'db'` for both indexing and queries (symmetric
retrieval). True asymmetric needs a query/document signal threaded
through the embed seam. Worth it for MiniMax users who care about
retrieval quality on Chinese content; defer until users complain.

- [ ] **v0.32.x: un-hardcode the multimodal dispatch at gateway.ts:583.**
Currently `recipe.id !== 'voyage'` is hardcoded — harmless until a
second multimodal recipe lands. Make it table-driven via
`Recipe.touchpoints.embedding.supports_multimodal` +
`multimodal_models`. ~10 lines + a contract test.

## v0.31.2 follow-ups

### Investigate: `gbrain query <common-keyword>` infinite loop
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-0.31.12
+0.32.0