fix(ai/recipes): declare max_batch_tokens on google embedding recipe #1016
Open
ChenyqThu wants to merge 1 commit into
google was the only first-party embedding recipe still missing max_batch_tokens after v0.32 garrytan#779 landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) saw the warning on every `gbrain query`, every kos-compat-api / MCP `/ingest` response, and every cron `gbrain` invocation. For CJK-dense or large-payload batches the absent field also forced the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting.

Declared:

- max_batch_tokens: 20_000. Google's per-text cap is 2048 tokens; ~20k tokens/request is the soft cap before gemini-embedding-001 starts emitting 429s.
- chars_per_token: 2. CJK density on mixed corpora (English averages ~4, CJK ~1.5; 2 keeps pre-split safe for both).
- safety_factor left at gateway default 0.8, so pre-split lands at ~8 000 chars/batch, well under any per-request floor Google publishes.

Two existing regression tests pinned google as the canary "real provider with no cap declared":

- test/ai/no-batch-cap-suppression.serial.test.ts assumed google STILL warned (the comment explicitly called it a fixed-cap model waiting for someone to cap it). With this patch google joins the capped set, so the test flips to assert the strong invariant: NO first-party recipe warns, because every native and openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
- test/ai/adaptive-embed-batch.test.ts checked `contractMatch.length > 0`. After this patch the canary set is empty, so `toBe(0)`. The once-per-process suppression mechanism is still exercised by the `firstCallCount` stability check earlier in the same test.

Validation:

- bun run typecheck clean
- bun test test/ai/: 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)
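The pre-split arithmetic quoted in the commit message can be checked with a short sketch. Only the three field names and their values come from the PR text; the `BatchCapFields` interface, `charBudgetPerBatch`, and `preSplit` are hypothetical names used for illustration, and the gateway's real splitting logic may differ.

```typescript
// Hypothetical shape: only the field names and values come from the PR text.
interface BatchCapFields {
  max_batch_tokens: number;
  chars_per_token: number;
  safety_factor?: number;
}

// Gateway default per the PR text.
const DEFAULT_SAFETY_FACTOR = 0.8;

// Character budget per request: tokens * safety margin / tokens-per-char density.
function charBudgetPerBatch(cap: BatchCapFields): number {
  const safety = cap.safety_factor ?? DEFAULT_SAFETY_FACTOR;
  return Math.floor((cap.max_batch_tokens * safety) / cap.chars_per_token);
}

// Greedy pre-split: start a new batch before the running total exceeds the budget.
// Note: string.length counts UTF-16 code units, which is one per BMP CJK character.
function preSplit(texts: string[], budget: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let used = 0;
  for (const text of texts) {
    if (current.length > 0 && used + text.length > budget) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(text);
    used += text.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const googleCap: BatchCapFields = { max_batch_tokens: 20_000, chars_per_token: 2 };
const googleBudget = charBudgetPerBatch(googleCap);
const sampleBatches = preSplit(
  ["a".repeat(5000), "b".repeat(5000), "c".repeat(1000)],
  googleBudget,
);
console.log(googleBudget, sampleBatches.map((b) => b.length));
```

With the declared values the budget works out to 20 000 × 0.8 / 2 = 8 000 characters, so the three sample texts split into two requests instead of one oversized one.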
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 15, 2026
PR garrytan#1016 (upstream-fix/google-recipe-max-batch-tokens) also touches two regression tests that pinned google as the canary "real provider with no cap declared". Without backporting them to fork master, `bun test test/ai/` would report 2 failures on this tree even though google.ts has been capped here since af2a806.

- no-batch-cap-suppression.serial.test.ts: replaces the "STILL warns for google" test with a stronger invariant: NO first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
- adaptive-embed-batch.test.ts: inverts `contractMatch.length > 0` to `=== 0` and adds google to the "must be absent from warnings" assertion list (sibling to voyage + openai).

Validation: `bun test test/ai/` → 144 pass / 0 fail.

Also: TODO.md updates link PR-1016 + PR-1017 from the related entries; PR-2 oauth-bootstrap entry flips to FILED with the same patch metadata. On upstream merge of either PR, the next sync auto-drops these local edits via clean text merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 15, 2026
JARVIS-ARCHITECTURE.md §6.25 records the 2026-05-15 follow-up session: diagnostic correction (the historical 117k stderr "fire" that wasn't), two upstream PRs filed (garrytan#1016 google max_batch_tokens, garrytan#1017 oauth_clients forward-bootstrap), CJK keyword-only eval verdict, overlap-matrix verdict, M2-B + M2-C verdicts, mechanical cleanup (48 sync_failures ack'd, bun-test hang root-caused), and the new [facts:absorb] sub-process DB-connection latent bug. 8 TODO items closed, 1 new entry filed.

CLAUDE.md "Chinese-first knowledge base" rule tightened per the CJK-eval verdict: the strict invariant is *compound CJK (4+ Han chars without whitespace) requires vector*; English and 2-3 char standalone CJK match fine on keyword. Operational guidance unchanged: the modal operator query on this brain is a compound CJK phrase that depends on vector being live.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Why

`google` is the only first-party embedding recipe still missing `max_batch_tokens` after v0.32 (#779) landed the once-per-process startup warning. Operators routing through `google:gemini-embedding-001` (the default-provider path after v0.27 native gateway) see the warning on every `gbrain query`, every kos-compat-api `/ingest` response, and every cron `gbrain` invocation.

For CJK-dense or large-payload batches the absent field also forces the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting: one wasted HTTP round-trip per oversized batch.
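The cost of reactive discovery can be illustrated with a minimal sketch of recursive halving. This is not the gateway's code: `SendResult`, `Sender`, `embedWithHalving`, and the fake sender are hypothetical, the call is synchronous for brevity (a real provider call would be async), and the cap is counted in characters rather than tokens.

```typescript
// Hypothetical result type standing in for "provider accepted" vs "request too large".
type SendResult =
  | { ok: true; vectors: number[][] }
  | { ok: false; reason: "too_large" };
type Sender = (batch: string[]) => SendResult;

// Reactive strategy: try the whole batch, and on rejection retry each half.
function embedWithHalving(batch: string[], send: Sender): number[][] {
  const result = send(batch);
  if (result.ok) return result.vectors;
  if (batch.length <= 1) {
    throw new Error("a single text exceeds the provider cap; halving cannot help");
  }
  const mid = Math.ceil(batch.length / 2);
  return [
    ...embedWithHalving(batch.slice(0, mid), send),
    ...embedWithHalving(batch.slice(mid), send),
  ];
}

// Fake provider that rejects any request above maxChars and counts round-trips.
function makeFakeSender(maxChars: number): { send: Sender; calls: () => number } {
  let calls = 0;
  const send: Sender = (batch) => {
    calls += 1;
    const chars = batch.reduce((total, text) => total + text.length, 0);
    if (chars > maxChars) return { ok: false, reason: "too_large" };
    // Placeholder "embedding": one number per text.
    return { ok: true, vectors: batch.map((text) => [text.length]) };
  };
  return { send, calls: () => calls };
}

const fake = makeFakeSender(8);
const halvingVectors = embedWithHalving(["aaaa", "bbbb", "cccc", "dddd"], fake.send);
const halvingCalls = fake.calls();
console.log(halvingVectors.length, halvingCalls);
```

In this toy run the 16-character batch is rejected once and then succeeds as two 8-character halves, so three requests are sent where a pre-split with a known cap would have sent two.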
The existing test `test/ai/no-batch-cap-suppression.serial.test.ts` even explicitly pins this state with the comment "google should warn (it has fixed-cap models)", framing it as a known TODO.

Patch
The patch makes three field additions to `google.touchpoints.embedding`.

Rationale

- `max_batch_tokens: 20_000`. Google's documented per-text cap is 2 048 tokens; ~20 k tokens per request is the soft cap before `gemini-embedding-001` starts emitting 429s.
- `chars_per_token: 2`. English averages ~4 chars/token (OpenAI default), CJK ~1.5. Picking 2 reflects realistic density on mixed-corpora brains so pre-split stays safe on CJK-heavy payloads without over-shrinking on English.
- `safety_factor`. Left at gateway default 0.8. Pre-split lands at `20 000 × 0.8 / 2 = 8 000` chars/batch.

Test follow-through
Two regression tests pinned `google` as the canary "real provider with no cap declared":

- `test/ai/no-batch-cap-suppression.serial.test.ts`: assumed google STILL warned. With google capped, the test flips to assert the stronger invariant: no first-party recipe warns, because every native/openai-compat recipe now declares either `max_batch_tokens` or `no_batch_cap`.
- `test/ai/adaptive-embed-batch.test.ts`: checked `contractMatch.length > 0`. After this patch the canary set is empty, so `toBe(0)`. The once-per-process suppression mechanism is still exercised by the `firstCallCount` stability check earlier in the same test.

Both tests still gate the suppression machinery; they just no longer require any first-party recipe to be the canary. If maintainers prefer a synthetic test-only uncapped recipe to keep a positive "warning fires when expected" assertion, happy to add one in a follow-up.
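The invariant and the once-per-process suppression described above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's implementation: the `Recipe` shape, `declaresBatchPolicy`, `warnOnceIfUncapped`, the warning text, and the `synthetic-uncapped` recipe are all hypothetical; only the `max_batch_tokens` and `no_batch_cap` field names and google's 20_000 value come from the PR.

```typescript
// Hypothetical recipe shape; only the two field names are taken from the PR text.
interface EmbeddingTouchpoint {
  max_batch_tokens?: number;
  no_batch_cap?: boolean;
}
interface Recipe {
  id: string;
  embedding: EmbeddingTouchpoint;
}

// Invariant from the PR: a recipe must declare a cap or explicitly opt out of one.
function declaresBatchPolicy(recipe: Recipe): boolean {
  return (
    typeof recipe.embedding.max_batch_tokens === "number" ||
    recipe.embedding.no_batch_cap === true
  );
}

// Once-per-process suppression: remember which recipe ids have already warned.
const alreadyWarned = new Set<string>();
function warnOnceIfUncapped(recipe: Recipe, log: (message: string) => void): void {
  if (declaresBatchPolicy(recipe) || alreadyWarned.has(recipe.id)) return;
  alreadyWarned.add(recipe.id);
  // Hypothetical wording; the real warning text is not shown in the PR.
  log(`recipe ${recipe.id} declares neither max_batch_tokens nor no_batch_cap`);
}

const google: Recipe = { id: "google", embedding: { max_batch_tokens: 20_000 } };
// Synthetic test-only recipe, as floated at the end of the section above.
const syntheticUncapped: Recipe = { id: "synthetic-uncapped", embedding: {} };

const warnings: string[] = [];
const record = (message: string): void => {
  warnings.push(message);
};
warnOnceIfUncapped(google, record);
warnOnceIfUncapped(syntheticUncapped, record);
warnOnceIfUncapped(syntheticUncapped, record); // suppressed on the second call
console.log(warnings);
```

A synthetic recipe like this would keep a positive "warning fires when expected" assertion available without requiring any first-party recipe to stay uncapped.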
Validation

- `bun run typecheck` clean
- `bun test test/ai/`: 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)

Field repro
Filed downstream from the Jarvis KOS v2 fork (`ChenyqThu/jarvis-knowledge-os-v2`), where the google native gateway has been the default embedding provider since the v0.27 M3 cutover. The warning has been persistent log noise for ~6 weeks across kos-compat-api responses, cron logs, and operator queries. Same pattern as #627 / upstream fixwave #682+#741 (forward-bootstrap mcp_request_log): a small fork-local patch carried while upstream evaluates.

🤖 Generated with Claude Code