fix(ai/recipes): declare max_batch_tokens on google embedding recipe #1016

Open
ChenyqThu wants to merge 1 commit into garrytan:master from ChenyqThu:upstream-fix/google-recipe-max-batch-tokens

@ChenyqThu

Why

google is the only first-party embedding recipe still missing max_batch_tokens after v0.32 (#779) landed the once-per-process startup warning. Operators routing through google:gemini-embedding-001 (the default-provider path after v0.27 native gateway) see the warning on every gbrain query, every kos-compat-api /ingest response, and every cron gbrain invocation:

[ai.gateway] recipe "google" declares an embedding touchpoint without max_batch_tokens; recursion is the only safety net for batch caps.
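
A minimal sketch of why a once-per-process warning still shows up on every invocation: the suppression state lives in process memory, so each new CLI or cron process starts with it empty. The `warnOnce` helper and its `sink` parameter are hypothetical names for illustration, not the gateway's actual API.

```typescript
// Hypothetical once-per-process guard: the Set lives in module scope,
// so it is reset whenever a fresh process starts.
const warned = new Set<string>();

function warnOnce(
  key: string,
  message: string,
  sink: (m: string) => void = console.warn,
): boolean {
  if (warned.has(key)) return false; // already emitted in this process
  warned.add(key);
  sink(message);
  return true;
}

// Within one process the second call is suppressed.
const captured: string[] = [];
const msg = 'recipe "google" declares an embedding touchpoint without max_batch_tokens';
const first = warnOnce("google", msg, (m) => captured.push(m));
const second = warnOnce("google", msg, (m) => captured.push(m));
```

Under this model a long-lived server logs the line once, while short-lived commands such as a cron-driven `gbrain` run log it every time, which matches the noise pattern described above.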

For CJK-dense or large-payload batches the absent field also forces the gateway to discover Google's per-request token cap reactively via recursive halving instead of pre-splitting — one wasted HTTP round-trip per oversized batch.
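
The cost of reactive discovery can be illustrated with a small sketch. Everything here is an assumption made for illustration: the fake provider, its character-based cap, and the `embedWithHalving` name are not taken from the gateway source.

```typescript
type EmbedFn = (batch: string[]) => number[][];

function batchTooLarge(size: number, cap: number): Error {
  const err = new Error(`batch of ${size} chars exceeds cap ${cap}`);
  err.name = "BatchTooLargeError";
  return err;
}

// Hypothetical stand-in for a provider: rejects any batch whose total
// length exceeds capChars and counts every request it receives.
function makeFakeProvider(capChars: number) {
  let requests = 0;
  const embed: EmbedFn = (batch) => {
    requests += 1;
    const size = batch.reduce((n, t) => n + t.length, 0);
    if (size > capChars) throw batchTooLarge(size, capChars);
    return batch.map((t) => [t.length]); // dummy one-dimensional "embedding"
  };
  return { embed, requestCount: () => requests };
}

// Reactive strategy: send the batch as-is, then halve and retry on rejection.
function embedWithHalving(batch: string[], embed: EmbedFn): number[][] {
  try {
    return embed(batch);
  } catch (err) {
    const tooLarge = err instanceof Error && err.name === "BatchTooLargeError";
    if (!tooLarge || batch.length <= 1) throw err;
    const mid = Math.ceil(batch.length / 2);
    return [
      ...embedWithHalving(batch.slice(0, mid), embed),
      ...embedWithHalving(batch.slice(mid), embed),
    ];
  }
}

const provider = makeFakeProvider(100);
const texts = ["a".repeat(60), "b".repeat(60)]; // 120 chars total, over the cap
const vectors = embedWithHalving(texts, provider.embed);
const totalRequests = provider.requestCount();
```

In this toy run the first request is rejected and two follow-up requests succeed, so three requests are made where a pre-split would have needed two.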

The existing test test/ai/no-batch-cap-suppression.serial.test.ts even explicitly pins this state with the comment "google should warn (it has fixed-cap models)", framing it as a known TODO.

Patch

Two field additions to google.touchpoints.embedding (safety_factor is left unset, so the gateway default applies):

max_batch_tokens: 20_000,
chars_per_token: 2,
// safety_factor left at gateway default 0.8

Rationale

  • max_batch_tokens: 20_000 — Google's documented per-text cap is 2 048 tokens; ~20 k tokens per request is the soft cap before gemini-embedding-001 starts emitting 429s.
  • chars_per_token: 2 — English averages ~4 chars/token (OpenAI default), CJK ~1.5. Picking 2 reflects realistic density on mixed-corpora brains so pre-split stays safe on CJK-heavy payloads without over-shrinking on English.
  • safety_factor — left at gateway default 0.8. Pre-split lands at 20 000 × 0.8 / 2 = 8 000 chars/batch.
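
A rough sketch of what pre-splitting against a character budget could look like. The `preSplit` function is hypothetical and takes the budget as an input; the 8 000 figure is simply the value quoted in the rationale above, and how the gateway actually derives and applies its budget is not shown in this PR.

```typescript
// Greedily pack texts into batches whose total length stays within charBudget.
// A single text longer than the budget gets a batch of its own; this sketch
// does not truncate or chunk individual texts.
function preSplit(texts: string[], charBudget: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentChars = 0;
  for (const text of texts) {
    // Flush the open batch if adding this text would exceed the budget.
    if (current.length > 0 && currentChars + text.length > charBudget) {
      batches.push(current);
      current = [];
      currentChars = 0;
    }
    current.push(text);
    currentChars += text.length; // UTF-16 code units, not tokens
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

const CHAR_BUDGET = 8_000; // figure quoted in the PR rationale
const docs = [3_000, 3_000, 3_000, 9_000, 1_000].map((n) => "x".repeat(n));
const batchSizes = preSplit(docs, CHAR_BUDGET).map((b) => b.length);
```

Note that `String.prototype.length` counts UTF-16 code units, so a Han character in the Basic Multilingual Plane counts as one unit; that is part of why a conservative chars-per-token ratio matters for CJK-heavy text.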

Test follow-through

Two regression tests pinned google as the canary "real provider with no cap declared":

  • test/ai/no-batch-cap-suppression.serial.test.ts: assumed google STILL warned. With google capped, the test flips to assert the stronger invariant — no first-party recipe warns, because every native/openai-compat recipe now declares either max_batch_tokens or no_batch_cap.
  • test/ai/adaptive-embed-batch.test.ts: checked contractMatch.length > 0. After this patch the canary set is empty → toBe(0). The once-per-process suppression mechanism is still exercised by the firstCallCount stability check earlier in the same test.

Both tests still gate the suppression machinery; they just no longer require any first-party recipe to be the canary. If maintainers prefer a synthetic test-only uncapped recipe to keep a positive "warning fires when expected" assertion, happy to add one in a follow-up.
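
For readers who want the invariant in code form, here is a sketch of the check the flipped test expresses, including the synthetic uncapped recipe idea mentioned above. The `Recipe` and `EmbeddingTouchpoint` shapes are assumptions (only the field names `max_batch_tokens` and `no_batch_cap` come from this PR), and `recipesThatWouldWarn` is a hypothetical helper, not the repository's test code.

```typescript
// Assumed shapes for illustration; the real recipe type may differ.
interface EmbeddingTouchpoint {
  max_batch_tokens?: number;
  no_batch_cap?: boolean; // assumed to be a boolean opt-out flag
}
interface Recipe {
  id: string;
  touchpoints: { embedding?: EmbeddingTouchpoint };
}

// A recipe would warn if it has an embedding touchpoint that declares
// neither a token cap nor an explicit "no cap" opt-out.
function recipesThatWouldWarn(recipes: Recipe[]): string[] {
  return recipes
    .filter((r) => {
      const e = r.touchpoints.embedding;
      return e !== undefined && e.max_batch_tokens === undefined && e.no_batch_cap !== true;
    })
    .map((r) => r.id);
}

const firstParty: Recipe[] = [
  { id: "google", touchpoints: { embedding: { max_batch_tokens: 20_000 } } },
  { id: "uncapped-by-design", touchpoints: { embedding: { no_batch_cap: true } } },
  { id: "chat-only", touchpoints: {} },
];
// Synthetic test-only recipe that keeps a positive "warning fires" assertion.
const synthetic: Recipe = { id: "synthetic-uncapped", touchpoints: { embedding: {} } };

const firstPartyWarners = recipesThatWouldWarn(firstParty);
const withSynthetic = recipesThatWouldWarn([...firstParty, synthetic]);
```

The first result corresponds to the "no first-party recipe warns" assertion; the second shows how a synthetic recipe would preserve the positive case without depending on any real provider.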

Validation

  • bun run typecheck clean
  • bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail pre-patch, expected: the two tests above)

Field repro

Filed downstream from the Jarvis KOS v2 fork (ChenyqThu/jarvis-knowledge-os-v2), where the google native gateway has been the default embedding provider since the v0.27 M3 cutover. The warning has been persistent log noise for ~6 weeks across kos-compat-api responses, cron logs, and operator queries. Same pattern as #627 / upstream fixwave #682+#741 (forward-bootstrap mcp_request_log): a small fork-local patch carried while upstream evaluates.

🤖 Generated with Claude Code

google was the only first-party embedding recipe still missing
max_batch_tokens after v0.32 garrytan#779 landed the once-per-process startup
warning. Operators routing through google:gemini-embedding-001 (the
default-provider path after v0.27 native gateway) saw the warning on
every `gbrain query`, every kos-compat-api / MCP `/ingest` response,
and every cron `gbrain` invocation. For CJK-dense or large-payload
batches the absent field also forced the gateway to discover Google's
per-request token cap reactively via recursive halving instead of
pre-splitting.

Declared:
- max_batch_tokens: 20_000 — Google's per-text cap is 2048 tokens;
  ~20k tokens/request is the soft cap before gemini-embedding-001
  starts emitting 429s.
- chars_per_token: 2 — CJK density on mixed corpora (English averages
  ~4, CJK ~1.5; 2 keeps pre-split safe for both).
- safety_factor left at gateway default 0.8 → pre-split lands at
  ~8 000 chars/batch, well under any per-request floor Google
  publishes.

Two existing regression tests pinned google as the canary "real
provider with no cap declared":

- test/ai/no-batch-cap-suppression.serial.test.ts assumed google
  STILL warned (the comment explicitly called it a fixed-cap model
  waiting for someone to cap it). With this patch google joins the
  capped set, so the test flips to assert the strong invariant: NO
  first-party recipe warns, because every native and openai-compat
  recipe now declares either max_batch_tokens or no_batch_cap.

- test/ai/adaptive-embed-batch.test.ts checked
  `contractMatch.length > 0`. After this patch the canary set is
  empty, so `toBe(0)`. The once-per-process suppression mechanism is
  still exercised by the `firstCallCount` stability check earlier in
  the same test.

Validation:
- bun run typecheck clean
- bun test test/ai/ — 144 pass / 0 fail (was 142 pass / 2 fail
  pre-patch, expected: the two tests above)
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 15, 2026
PR garrytan#1016 (upstream-fix/google-recipe-max-batch-tokens)
also touches two regression tests that pinned google as the canary
"real provider with no cap declared". Without backporting them to
fork master, `bun test test/ai/` would 2-fail on this tree even
though google.ts has been capped here since af2a806.

- no-batch-cap-suppression.serial.test.ts: replaces the "STILL warns
  for google" test with a stronger invariant — NO first-party recipe
  warns, because every native/openai-compat recipe now declares
  either max_batch_tokens or no_batch_cap.
- adaptive-embed-batch.test.ts: inverts `contractMatch.length > 0`
  to `=== 0` and adds google to the "must be absent from warnings"
  assertion list (sibling to voyage + openai).

Validation: `bun test test/ai/` → 144 pass / 0 fail.

Also: TODO.md updates link PR-1016 + PR-1017 from the related
entries; PR-2 oauth-bootstrap entry flips to FILED with the same
patch metadata.

On upstream merge of either PR, the next sync auto-drops these
local edits via clean text merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
ChenyqThu pushed a commit to ChenyqThu/jarvis-knowledge-os-v2 that referenced this pull request May 15, 2026
JARVIS-ARCHITECTURE.md §6.25 records the 2026-05-15 follow-up session:
diagnostic correction (the historical 117k stderr "fire" that wasn't),
two upstream PRs filed (garrytan#1016 google max_batch_tokens, garrytan#1017
oauth_clients forward-bootstrap), CJK keyword-only eval verdict,
overlap-matrix verdict, M2-B + M2-C verdicts, mechanical cleanup
(48 sync_failures ack'd, bun-test hang root-caused), and the new
[facts:absorb] sub-process DB-connection latent bug. 8 TODO items
closed, 1 new entry filed.

CLAUDE.md "Chinese-first knowledge base" rule tightened per the
CJK-eval verdict: the strict invariant is *compound CJK (4+ Han chars
without whitespace) requires vector*; English and 2-3 char standalone
CJK match fine on keyword. Operational guidance unchanged — the modal
operator query on this brain is a compound CJK phrase that depends on
vector being live.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>