fix(ai/recipes): declare max_batch_tokens on google embedding recipe by ChenyqThu · Pull Request #1016 · garrytan/gbrain

ChenyqThu · 2026-05-15T08:01:04Z

Why

google is the only first-party embedding recipe still missing max_batch_tokens after v0.32 (#779) landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) see the warning on every gbrain query, every kos-compat-api /ingest response, and every cron gbrain invocation:

[ai.gateway] recipe "google" declares an embedding touchpoint without max_batch_tokens; recursion is the only safety net for batch caps.

For CJK-dense or large-payload batches the absent field also forces the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting — one wasted HTTP round-trip per oversized batch.
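The difference between reactive halving and pre-splitting can be illustrated with a toy model. This is a minimal sketch, not the gateway's code: `makeProvider`, `estimateTokens`, `embedWithHalving`, and `preSplit` are hypothetical names, the 100-token cap is a toy value, and the fake provider simply counts calls and rejects oversized batches.

```typescript
// Hypothetical sketch: none of these names come from gbrain.
type EmbedFn = (batch: string[]) => number[];

const CHARS_PER_TOKEN = 2; // assumption for the sketch

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Fake provider that rejects any request whose estimated tokens exceed `cap`
// and counts how many requests it received.
function makeProvider(cap: number): { embed: EmbedFn; callCount: () => number } {
  let calls = 0;
  const embed: EmbedFn = (batch) => {
    calls++;
    const total = batch.reduce((n, t) => n + estimateTokens(t), 0);
    if (total > cap) throw new Error("batch too large");
    return batch.map((t) => t.length); // stand-in for real embeddings
  };
  return { embed, callCount: () => calls };
}

// Reactive: send the whole batch, and on rejection halve it and recurse.
function embedWithHalving(batch: string[], embed: EmbedFn): number[] {
  try {
    return embed(batch);
  } catch (err) {
    if (batch.length <= 1) throw err; // a single text cannot be halved further
    const mid = Math.ceil(batch.length / 2);
    return [
      ...embedWithHalving(batch.slice(0, mid), embed),
      ...embedWithHalving(batch.slice(mid), embed),
    ];
  }
}

// Proactive: greedily pack texts into sub-batches that fit a known budget.
function preSplit(batch: string[], budgetTokens: number): string[][] {
  const out: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const text of batch) {
    const cost = estimateTokens(text);
    if (current.length > 0 && used + cost > budgetTokens) {
      out.push(current);
      current = [];
      used = 0;
    }
    current.push(text);
    used += cost;
  }
  if (current.length > 0) out.push(current);
  return out;
}

function embedWithPreSplit(batch: string[], budgetTokens: number, embed: EmbedFn): number[] {
  return preSplit(batch, budgetTokens).flatMap((sub) => embed(sub));
}
```

With four 100-character texts and a 100-token cap, the halving path makes three requests (one rejected, two accepted), while the pre-split path makes two: the one wasted round-trip described above.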

The existing test test/ai/no-batch-cap-suppression.serial.test.ts even explicitly pins this state with the comment "google should warn (it has fixed-cap models)", framing it as a known TODO.

Patch

Two field additions to google.touchpoints.embedding, with safety_factor deliberately left unset:

max_batch_tokens: 20_000,
chars_per_token: 2,
// safety_factor left at gateway default 0.8

Rationale

max_batch_tokens: 20_000 — Google's documented per-text cap is 2 048 tokens; ~20 k tokens per request is the soft cap before gemini-embedding-001 starts emitting 429s.
chars_per_token: 2 — English averages ~4 chars/token (OpenAI default), CJK ~1.5. Picking 2 reflects realistic density on mixed-corpora brains so pre-split stays safe on CJK-heavy payloads without over-shrinking on English.
safety_factor — left at gateway default 0.8. Pre-split lands at 20 000 × 0.8 / 2 = 8 000 chars/batch.
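The budget arithmetic above can be written out as a small helper. This is a sketch that reproduces the calculation exactly as stated in this PR; the gateway's actual formula is not shown here, and `EmbeddingCap`, `DEFAULT_SAFETY_FACTOR`, and `preSplitCharBudget` are hypothetical names.

```typescript
// Hypothetical shape: only the three field names come from the PR text.
interface EmbeddingCap {
  max_batch_tokens: number;
  chars_per_token: number;
  safety_factor?: number;
}

// The PR states the gateway default is 0.8.
const DEFAULT_SAFETY_FACTOR = 0.8;

// Mirrors the PR's stated arithmetic:
// max_batch_tokens × safety_factor / chars_per_token.
// Assumption: this is illustrative, not the gateway's implementation.
function preSplitCharBudget(cap: EmbeddingCap): number {
  const safety = cap.safety_factor ?? DEFAULT_SAFETY_FACTOR;
  // Rounded to the nearest whole number to absorb floating-point error.
  return Math.round((cap.max_batch_tokens * safety) / cap.chars_per_token);
}

const googleBudget = preSplitCharBudget({
  max_batch_tokens: 20_000,
  chars_per_token: 2,
});
console.log(googleBudget); // 8000
```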

Test follow-through

Two regression tests pinned google as the canary "real provider with no cap declared":

test/ai/no-batch-cap-suppression.serial.test.ts: assumed google STILL warned. With google capped, the test flips to assert the stronger invariant — no first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
test/ai/adaptive-embed-batch.test.ts: checked contractMatch.length > 0. After this patch the canary set is empty → toBe(0). The once-per-process suppression mechanism is still exercised by the firstCallCount stability check earlier in the same test.
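The stronger invariant amounts to a filter over the recipe list that must come back empty. The sketch below assumes a simplified recipe shape; only `touchpoints.embedding`, `max_batch_tokens`, and `no_batch_cap` come from this PR, and the sample recipes and their values are made up for illustration.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface EmbeddingTouchpoint {
  max_batch_tokens?: number;
  no_batch_cap?: boolean;
}

interface Recipe {
  id: string;
  touchpoints: { embedding?: EmbeddingTouchpoint };
}

// A recipe would warn if it declares an embedding touchpoint
// with neither max_batch_tokens nor no_batch_cap.
function recipesThatWouldWarn(recipes: Recipe[]): string[] {
  return recipes
    .filter((r) => {
      const e = r.touchpoints.embedding;
      return e !== undefined && e.max_batch_tokens === undefined && e.no_batch_cap !== true;
    })
    .map((r) => r.id);
}

// Made-up sample data: one capped recipe, one opted out, one with no embedding.
const others: Recipe[] = [
  { id: "capped-example", touchpoints: { embedding: { max_batch_tokens: 1000 } } },
  { id: "uncapped-by-design", touchpoints: { embedding: { no_batch_cap: true } } },
  { id: "chat-only", touchpoints: {} },
];

const googleBefore: Recipe = { id: "google", touchpoints: { embedding: {} } };
const googleAfter: Recipe = {
  id: "google",
  touchpoints: { embedding: { max_batch_tokens: 20_000 } },
};

const warnBefore = recipesThatWouldWarn([...others, googleBefore]);
const warnAfter = recipesThatWouldWarn([...others, googleAfter]);
```

Before the patch `warnBefore` contains only `"google"`; after it `warnAfter` is empty, which is the condition the flipped tests assert.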

Both tests still gate the suppression machinery; they just no longer require any first-party recipe to be the canary. If maintainers prefer a synthetic test-only uncapped recipe to keep a positive "warning fires when expected" assertion, happy to add one in a follow-up.
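For reference, a positive assertion against a synthetic recipe could look roughly like the following. This is a sketch of a generic once-per-process guard, not the gateway's implementation; `warnOnce`, `warned`, and the `synthetic-uncapped` recipe id are hypothetical, and only the message text is taken from the warning quoted earlier.

```typescript
// Hypothetical once-per-process guard keyed by recipe id.
const warned = new Set<string>();

// Returns true if the warning was emitted, false if it was suppressed.
function warnOnce(recipeId: string, log: (msg: string) => void): boolean {
  if (warned.has(recipeId)) return false;
  warned.add(recipeId);
  log(
    `[ai.gateway] recipe "${recipeId}" declares an embedding touchpoint ` +
      `without max_batch_tokens; recursion is the only safety net for batch caps.`,
  );
  return true;
}

// Capture log output instead of writing to stderr.
const messages: string[] = [];
const capture = (msg: string): void => {
  messages.push(msg);
};

const firstCall = warnOnce("synthetic-uncapped", capture);
const secondCall = warnOnce("synthetic-uncapped", capture);
```

The first call emits the warning and the second is suppressed, so `messages` holds exactly one entry.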

Validation

bun run typecheck clean
bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)

Field repro

Filed downstream from the Jarvis KOS v2 fork (ChenyqThu/jarvis-knowledge-os-v2), where the google native gateway has been the default embedding provider since the v0.27 M3 cutover. The warning has been persistent log noise for ~6 weeks across kos-compat-api responses, cron logs, and operator queries. Same pattern as #627 / upstream fixwave #682+#741 (forward-bootstrap mcp_request_log): a small fork-local patch carried while upstream evaluates.

🤖 Generated with Claude Code

google was the only first-party embedding recipe still missing max_batch_tokens after v0.32 garrytan#779 landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) saw the warning on every `gbrain query`, every kos-compat-api / MCP `/ingest` response, and every cron `gbrain` invocation. For CJK-dense or large-payload batches the absent field also forced the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting.

Declared:
- max_batch_tokens: 20_000. Google's per-text cap is 2048 tokens; ~20k tokens/request is the soft cap before gemini-embedding-001 starts emitting 429s.
- chars_per_token: 2. CJK density on mixed corpora (English averages ~4, CJK ~1.5; 2 keeps pre-split safe for both).
- safety_factor left at gateway default 0.8 → pre-split lands at ~8 000 chars/batch, well under any per-request floor Google publishes.

Two existing regression tests pinned google as the canary "real provider with no cap declared":
- test/ai/no-batch-cap-suppression.serial.test.ts assumed google STILL warned (the comment explicitly called it a fixed-cap model waiting for someone to cap it). With this patch google joins the capped set, so the test flips to assert the strong invariant: NO first-party recipe warns, because every native and openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
- test/ai/adaptive-embed-batch.test.ts checked `contractMatch.length > 0`. After this patch the canary set is empty, so `toBe(0)`. The once-per-process suppression mechanism is still exercised by the `firstCallCount` stability check earlier in the same test.

Validation:
- bun run typecheck clean
- bun test test/ai/: 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)

PR garrytan#1016 (upstream-fix/google-recipe-max-batch-tokens) also touches two regression tests that pinned google as the canary "real provider with no cap declared". Without backporting them to fork master, `bun test test/ai/` would 2-fail on this tree even though google.ts has been capped here since af2a806.

- no-batch-cap-suppression.serial.test.ts: replaces the "STILL warns for google" test with a stronger invariant: NO first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
- adaptive-embed-batch.test.ts: inverts `contractMatch.length > 0` to `=== 0` and adds google to the "must be absent from warnings" assertion list (sibling to voyage + openai).

Validation: `bun test test/ai/` → 144 pass / 0 fail.

Also: TODO.md updates link PR-1016 + PR-1017 from the related entries; PR-2 oauth-bootstrap entry flips to FILED with the same patch metadata. On upstream merge of either PR, the next sync auto-drops these local edits via clean text merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

JARVIS-ARCHITECTURE.md §6.25 records the 2026-05-15 follow-up session:
- diagnostic correction (the historical 117k stderr "fire" that wasn't)
- two upstream PRs filed (garrytan#1016 google max_batch_tokens, garrytan#1017 oauth_clients forward-bootstrap)
- CJK keyword-only eval verdict
- overlap-matrix verdict
- M2-B + M2-C verdicts
- mechanical cleanup (48 sync_failures ack'd, bun-test hang root-caused)
- the new [facts:absorb] sub-process DB-connection latent bug

8 TODO items closed, 1 new entry filed.

CLAUDE.md "Chinese-first knowledge base" rule tightened per the CJK-eval verdict: the strict invariant is *compound CJK (4+ Han chars without whitespace) requires vector*; English and 2-3 char standalone CJK match fine on keyword. Operational guidance unchanged: the modal operator query on this brain is a compound CJK phrase that depends on vector being live.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

ChenyqThu mentioned this pull request May 17, 2026

feat(dream): --archive-dir flag persists CycleReport JSON + atomic latest.json symlink #1133

Open

4 tasks

ChenyqThu wants to merge 1 commit into garrytan:master from ChenyqThu:upstream-fix/google-recipe-max-batch-tokens
