v0.33.2.0 feat(search-lite): token budget + semantic query cache + intent weighting #897

Open

garrytan-agents wants to merge 14 commits into garrytan:master from garrytan-agents:feat/search-lite-mode

Conversation

@garrytan-agents (Contributor) commented May 12, 2026

What

Three additive search features inspired by brain-kit. All backward-compatible: existing callers see identical behavior unless they opt in to the new options.

1. Token Budget Enforcement (src/core/search/token-budget.ts)

Cap the cumulative token cost of returned results so search payloads fit downstream context windows. Greedy top-down walk; preserves caller ordering; no re-rank. Token counting uses a char/4 heuristic (no tokenizer dependency — keeps the bun --compile bundle small).

  • SearchOpts.tokenBudget — numeric cap. Default undefined = no-op.
  • HybridSearchMeta.token_budget = { budget, used, kept, dropped }
  • HTTP query op: pass token_budget param.
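The budget walk described above can be sketched as follows. This is a minimal illustration assuming a take-while-style greedy walk; `enforceTokenBudget`, `estimateTokens`, and the result shape are illustrative names, not the PR's actual exports:

```typescript
// Sketch of the greedy top-down budget walk: keep results in caller
// order until the next one would overflow the budget. char/4 heuristic,
// matching the no-tokenizer design note. Names are assumptions.
interface BudgetResult { text: string }

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // char/4 heuristic, no tokenizer dep
}

function enforceTokenBudget<T extends BudgetResult>(
  results: T[],
  budget?: number,
): { kept: T[]; meta: { budget?: number; used: number; kept: number; dropped: number } } {
  if (budget === undefined) {
    // Default undefined = no-op: existing callers see identical behavior.
    const used = results.reduce((n, r) => n + estimateTokens(r.text), 0);
    return { kept: results, meta: { budget, used, kept: results.length, dropped: 0 } };
  }
  const kept: T[] = [];
  let used = 0;
  for (const r of results) { // preserves caller ordering; no re-rank
    const cost = estimateTokens(r.text);
    if (used + cost > budget) break; // greedy: stop at first overflow
    kept.push(r);
    used += cost;
  }
  return { kept, meta: { budget, used, kept: kept.length, dropped: results.length - kept.length } };
}
```

The `meta` object mirrors the `HybridSearchMeta.token_budget` shape above.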

2. Semantic Query Cache (src/core/search/query-cache.ts + migration v52)

Cache search results keyed by query embedding similarity. HNSW lookup: embedding <=> $1 < 0.08 (cosine similarity ≥ 0.92). Per-source isolation. Per-row TTL (default 3600s). Best-effort writes; all errors swallowed so the cache never breaks the search hot path.

  • Migration v52 creates query_cache table. HALFVEC where pgvector ≥ 0.7; VECTOR otherwise. Embedding dim resolved from config.embedding_dimensions so non-OpenAI brains work.
  • New gbrain cache CLI: stats / clear --yes / prune.
  • Config keys: search.cache.enabled / search.cache.similarity_threshold / search.cache.ttl_seconds.
  • HybridSearchMeta.cache = { status, similarity?, age_seconds? }
  • Routed through new hybridSearchCached(engine, query, opts) wrapper. The query op in operations.ts now uses this wrapper so MCP/CLI calls benefit automatically. Skipped for two-pass walks + non-default embedding columns where cache semantics don't hold.
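The cache-hit decision can be sketched like this. It assumes `pgvector`'s `<=>` operator returns cosine distance (so distance < 0.08 corresponds to similarity ≥ 0.92); the row shape and function name are illustrative, not the PR's API:

```typescript
// Sketch of the lookup decision: a cached row qualifies only if its
// embedding is within the distance threshold AND its per-row TTL has
// not elapsed. Shapes/names are assumptions for illustration.
interface CacheRow { distance: number; ageSeconds: number; ttlSeconds: number }

function cacheStatus(
  row: CacheRow | null,
  similarityThreshold = 0.92, // search.cache.similarity_threshold default
): "hit" | "miss" | "expired" {
  if (!row) return "miss";
  if (row.ageSeconds > row.ttlSeconds) return "expired"; // per-row TTL, default 3600s
  // cosine distance < (1 - threshold)  ⇔  cosine similarity ≥ threshold
  return row.distance < 1 - similarityThreshold ? "hit" : "miss";
}
```

The returned status maps onto the `HybridSearchMeta.cache.status` field.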

3. Zero-LLM Intent Weighting (src/core/search/intent-weights.ts)

Builds on the existing 4-intent classifier (src/core/search/query-intent.ts). New weight-adjustment layer applies subtle per-intent nudges:

| Intent   | Adjustment                                             |
|----------|--------------------------------------------------------|
| entity   | boost keyword RRF + exact slug/title match boost       |
| temporal | default recency='on' when caller left it unset         |
| event    | boost keyword RRF (rare named entities) + soft recency |
| general  | no-op (1.0 multipliers)                                |

All adjustments are subtle (max 1.25×). Caller-explicit options always win — intent weighting never silently overrides explicit recency / salience. Default ON; opt out via opts.intentWeighting = false. The LLM query expansion path is still available and opt-in via opts.expansion = true — it just isn't the default anymore.

HybridSearchMeta.intent now surfaces classifier output for debugging.
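The nudge layer can be sketched as below. The exact multipliers and option names are assumptions (the PR only guarantees a 1.25× cap and caller-wins semantics); this is not the real `intent-weights.ts` code:

```typescript
// Sketch of per-intent weight nudges: multiplicative adjustments capped
// at 1.25×, and caller-explicit options always win over intent defaults.
// Multiplier values and shapes here are illustrative assumptions.
type Intent = "entity" | "temporal" | "event" | "general";

interface Weights { keywordRrf: number; recency: "on" | "off" }
interface CallerOpts { recency?: "on" | "off" }

const MAX_NUDGE = 1.25; // hard cap on any adjustment

function applyIntentWeights(intent: Intent, base: Weights, opts: CallerOpts): Weights {
  const clamp = (m: number) => Math.min(m, MAX_NUDGE);
  const out = { ...base };
  if (intent === "entity" || intent === "event") {
    out.keywordRrf = base.keywordRrf * clamp(1.25); // boost keyword RRF
  }
  if ((intent === "temporal" || intent === "event") && opts.recency === undefined) {
    out.recency = "on"; // default only when the caller left it unset
  }
  if (opts.recency !== undefined) out.recency = opts.recency; // caller always wins
  return out; // general intent: no-op, 1.0 multipliers
}
```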

Files changed (13)

 src/cli.ts                                |  +13 -1
 src/commands/cache.ts                     | +106 NEW
 src/core/migrate.ts                       | +104 NEW (migration v52)
 src/core/operations.ts                    |  +11 -2
 src/core/search/hybrid.ts                 | +281 -7
 src/core/search/intent-weights.ts         | +132 NEW
 src/core/search/query-cache.ts            | +321 NEW
 src/core/search/token-budget.ts           | +110 NEW
 src/core/types.ts                         |  +57 NEW
 test/hybrid-search-lite.serial.test.ts    | +163 NEW
 test/intent-weights.test.ts               | +123 NEW
 test/query-cache.test.ts                  | +247 NEW
 test/token-budget.test.ts                 | +125 NEW

Test results

  • 44 new tests pass across the 4 new test files
    • test/token-budget.test.ts — 10 tests (pure module)
    • test/intent-weights.test.ts — 13 tests (pure module)
    • test/query-cache.test.ts — 12 tests (PGLite, including migration v52 schema verification + TTL expiration + source isolation)
    • test/hybrid-search-lite.serial.test.ts — 9 tests (PGLite e2e through hybridSearchCached)
  • 105 existing search tests still pass (test/search.test.ts, test/query-intent*.test.ts, test/hybrid-meta.test.ts, test/search-limit.test.ts, test/search-lang-symbol-kind.test.ts, test/search-image-column.test.ts)
  • 162 tests total pass when run together (no cross-file isolation issues)
  • bun run verify clean (privacy + jsonb + progress + test-isolation + wasm + admin-build + admin-scope-drift + cli-exec + system-of-record + tsc --noEmit)
  • bun run src/cli.ts doctor --fast runs successfully — pre-existing resolver_health / minions_migration failures on master are unrelated to this PR.

Design notes

  • Backward compat is the contract. Token budget defaults to off. Cache defaults on but cache miss = normal search. Intent weighting defaults on but adjusts weights at most 1.25× and never overrides caller-explicit options.
  • No new dependencies. Token counting uses char/4 (js-tiktoken would balloon the bun --compile bundle). Migration uses the same HALFVEC/VECTOR pgvector-version probe pattern as v45 (facts) and v40, so it works on both Postgres and PGLite without extra config.
  • System-of-record clean. The query_cache table is a derived cache, not authoritative state. gbrain cache clear is the documented invalidation surface; the system-of-record check passes.

Co-authored-by: Wintermute <[email protected]>
@garrytan changed the title twice on May 12, 2026, adding a version prefix: first v0.33.1.0, then v0.33.2.0.
garrytan added 13 commits May 11, 2026 23:25
Resolved a single conflict in src/core/migrate.ts: master claimed v52/53/54
(eval_contradictions_cache + eval_contradictions_runs + cjk_wave), so PR garrytan#897's
v52 query_cache migration was renumbered to v55 to sit after them.

Typecheck clean.
…ybridSearch

Three named modes (conservative / balanced / tokenmax) that bundle the
search-lite knobs from PR garrytan#897 into a single config key. Mode resolution
lives in bare hybridSearch (NOT just the cached wrapper) so eval-replay
and eval-longmemeval — which call bare hybridSearch — test the same
mode-affected behavior as production. See [CDX-5+6] in the plan.

The mode bundle supplies DEFAULTS for intentWeighting, tokenBudget,
expansion, and searchLimit when the caller leaves those undefined.
Per-call SearchOpts and per-key config overrides still win (matches the
v0.31.12 model-tier resolution chain at model-config.ts:resolveModel).

knobsHash() exposes a stable SHA-256 of the resolved knob set; the cache
contamination hotfix (next commit) consumes it to prevent a tokenmax
write from being served to a conservative read.
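A stable knobs hash can be sketched as follows, assuming the real `knobsHash()` also canonicalizes key order before hashing (the canonicalization details here are an assumption):

```typescript
// Sketch of a deterministic knobs hash: sort keys before serializing so
// the same resolved knob set always yields the same SHA-256, regardless
// of insertion order. Canonicalization strategy is an assumption.
import { createHash } from "node:crypto";

function knobsHash(knobs: Record<string, unknown>): string {
  const sorted = Object.fromEntries(
    Object.entries(knobs).sort(([a], [b]) => a.localeCompare(b)),
  );
  return createHash("sha256").update(JSON.stringify(sorted)).digest("hex");
}
```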

Three new fields on HybridSearchMeta:
  - mode (resolved mode name)
  - existing token_budget meta now fires from bare hybridSearch too

Bare hybridSearch now applies tokenBudget at all three return paths
(no-embedding-provider, keyword-only-fallback, main). Previously only
hybridSearchCached enforced budget; eval commands missed it.

Tests: 37 unit cases pin the 3x7 bundle table cell-by-cell, the
resolution chain semantics, knobs hash determinism + cross-mode
separation, and the config-table parser. All 72 search-lite tests pass.

Bisect-friendly: this commit ONLY adds mode resolution. The cache-key
contamination hotfix [CDX-4] is a separate atomic commit (next).
PR garrytan#897's query_cache keyed rows on sha256(source_id::query_text) only.
A tokenmax search (expansion=on, limit=50) populated a row that a
subsequent conservative call (no expansion, limit=10) read back, serving
the wrong-shape results. This is a real bug in PR garrytan#897 today, regardless
of the v0.32.3 mode picker work — Codex caught it in plan review.

Fix:
- Migration v56 adds query_cache.knobs_hash TEXT column + composite
  (source_id, knobs_hash, created_at) index. Existing rows have NULL
  knobs_hash and are excluded from lookups (silently re-populated with
  the right hash on first hit — no orphan data, no destructive migration).
- cacheRowId(query, source, knobsHash) — knobsHash now part of the PK so
  a tokenmax write and a conservative write for the same (query, source)
  land in distinct rows.
- SemanticQueryCache.lookup({knobsHash}) filters WHERE knobs_hash = $.
- SemanticQueryCache.store({knobsHash}) writes the resolved hash.
- hybridSearchCached threads knobsHash from resolveSearchMode through
  every cache call. Cache config (enabled/threshold/TTL) now reads from
  the resolved mode bundle, not directly from the config table.
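The widened cache key can be sketched like this. The delimiter and exact composition are assumptions; the point is that knobsHash joins (source, query) in the row id:

```typescript
// Sketch of cacheRowId with knobsHash in the key: a tokenmax write and a
// conservative write for the same (query, source) land in distinct rows.
// Delimiter choice is an illustrative assumption.
import { createHash } from "node:crypto";

function cacheRowId(query: string, sourceId: string, knobsHash: string): string {
  return createHash("sha256")
    .update(`${sourceId}\u0000${query}\u0000${knobsHash}`) // NUL-delimited to avoid concat collisions
    .digest("hex");
}
```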

Tests (test/query-cache-knobs-hash.test.ts, 11 cases):
- cacheRowId bifurcates by knobsHash
- Tokenmax write does NOT contaminate conservative lookup
- Three modes coexist as distinct rows for same query
- Legacy NULL-knobs_hash rows are excluded from lookup
- Same-mode write updates in place (no duplicate rows)

All 58 cache + mode tests pass. Migration v56 applies cleanly on a fresh
PGLite brain.

Bisect-friendly: this commit is the cache-key hotfix alone. Mode
resolution wiring lives in the previous commit.
…able

Migration v57 creates search_telemetry (date, mode, intent, count,
sum_results, sum_tokens, sum_budget_dropped, cache_hit, cache_miss,
first_seen, last_seen). PK (date, mode, intent) caps growth at ~4380
rows/year. Sums + counts only — averages derive at read time so
concurrent ON CONFLICT writes from multiple gbrain processes accumulate
correctly [CDX-17].

In-memory bucket flushed periodically (60s OR 100 calls) + on process
beforeExit/SIGINT/SIGTERM with a 2-second cap. The search hot path NEVER
waits on this write [D2, CDX-19].

Date-bucketed cache_hit / cache_miss columns make hit rate over --days N
derivable [CDX-18]. query_cache.hit_count is a lifetime counter and
can't be sliced by window.

Wired into bare hybridSearch via emitMeta: every search call sync-bumps
a bucket. flush() drains atomically by swapping the map before SQL writes
so a record() during flush lands in the new map.
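The swap-before-write flush can be sketched as below. The SQL write is stubbed as a callback and the bucket shape is simplified; only the swap pattern itself is taken from the description above:

```typescript
// Sketch of the atomic drain: flush() swaps in a fresh map BEFORE doing
// the (slow) writes, so a record() arriving mid-flush lands in the new
// map and is never lost. Shapes are illustrative.
type BucketKey = string; // e.g. `${date}|${mode}|${intent}`
interface Bucket { count: number; sumTokens: number }

class TelemetrySink {
  private buckets = new Map<BucketKey, Bucket>();

  record(key: BucketKey, tokens: number): void {
    const b = this.buckets.get(key) ?? { count: 0, sumTokens: 0 };
    b.count += 1;
    b.sumTokens += tokens;
    this.buckets.set(key, b); // sync bump; the search hot path never awaits
  }

  flush(write: (drained: Map<BucketKey, Bucket>) => void): void {
    const drained = this.buckets;
    this.buckets = new Map(); // swap first: concurrent record() hits the new map
    if (drained.size > 0) write(drained); // then do the ON CONFLICT accumulation
  }
}
```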

readSearchStats(engine, {days}) returns the StatsWindow shape that
gbrain search stats consumes (next commit).

Tests: 16 unit cases pin record/flush/read semantics including
ON-CONFLICT-adds-raw-values, concurrent-flush coalescing, cache hit-rate
math, missing-table graceful degradation, and window clamping.

53 migrations apply on a fresh PGLite brain.
…+8+9]

CDX-8: gbrain config has no unset path today. Required before
`gbrain search modes --reset` can clear search.* overrides.

  - BrainEngine.unsetConfig(key) → returns rows deleted (0|1)
  - BrainEngine.listConfigKeys(prefix) → exact-literal prefix match
    with LIKE-escape on user-supplied % / _ / \ characters
  - PGLiteEngine + PostgresEngine implementations
  - `gbrain config unset <key>` and `gbrain config unset --pattern <prefix>`
    sub-subcommands
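The exact-literal prefix matcher can be sketched as follows; the helper name is illustrative, but the escaping requirement (user-supplied `%`, `_`, `\` must match literally in SQL LIKE) is as described above:

```typescript
// Sketch of LIKE-safe prefix matching: escape backslash first, then the
// LIKE metacharacters, so only the trailing % acts as a wildcard.
// Helper name is an assumption.
function likePrefixPattern(prefix: string): string {
  const escaped = prefix
    .replace(/\\/g, "\\\\")          // literal backslashes
    .replace(/[%_]/g, (m) => `\\${m}`); // literal % and _
  return `${escaped}%`; // trailing % is the only real wildcard
}
```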

CDX-9: readLine has no EOF detection or timeout. Mode-picker plan calls
out "TTY closes mid-prompt → defaults to balanced" but the raw helper
hangs forever. New readLineSafe(prompt, defaultValue, timeoutMs=60s):

  - Returns defaultValue on stdin 'end' event
  - Returns defaultValue on timeout
  - Returns defaultValue on empty Enter
  - Non-TTY stdin returns defaultValue immediately (e2e safe)
  - Returns trimmed user input otherwise

Exported so install picker (next task) can use it.

Tests: 9 cases pin unset semantics + prefix matcher edge cases
(glob-wildcard escape, sort order, idempotent loop, search.* sweep).
All 53 migrations apply on a fresh PGLite brain.
Install picker (src/commands/init-mode-picker.ts):
  - Runs as a phase inside `gbrain init` AFTER engine.initSchema() so DB
    config writes work [CDX-7].
  - Idempotent: skipped on re-init if search.mode is already set.
  - Smart auto-suggestion via recommendModeFor() reads
    models.tier.subagent / models.default / OPENAI_API_KEY:
      * Opus default/subagent → tokenmax (quality ceiling)
      * Haiku subagent → conservative (4K budget keeps cost down)
      * No OpenAI key → conservative (no LLM expansion possible)
      * Sonnet / unknown → balanced (safe default)
  - TTY shows menu via readLineSafe (60s timeout, defaults on EOF/empty).
  - Non-TTY auto-selects + emits operator hint:
      [gbrain] search mode: X (auto-selected — reason)
      [gbrain] To change: gbrain config set search.mode <...>
  - --json mode emits structured `{phase: 'search_mode_picker', ...}` event.
  - Wired into both initPGLite and initPostgres flows.
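The auto-suggestion heuristic above can be sketched like this. The input shape and tier-string matching are assumptions about the real `recommendModeFor()`; the branch order follows the list above:

```typescript
// Sketch of the mode recommendation heuristic: Opus → tokenmax,
// Haiku subagent → conservative, no OpenAI key → conservative,
// otherwise balanced. Input shape is an illustrative assumption.
type Mode = "conservative" | "balanced" | "tokenmax";

interface BrainHints { subagentTier?: string; defaultTier?: string; hasOpenAiKey: boolean }

function recommendModeFor(h: BrainHints): { mode: Mode; reason: string } {
  const tier = (h.subagentTier ?? h.defaultTier ?? "").toLowerCase();
  if (tier.includes("opus")) return { mode: "tokenmax", reason: "Opus default/subagent: quality ceiling" };
  if ((h.subagentTier ?? "").toLowerCase().includes("haiku"))
    return { mode: "conservative", reason: "Haiku subagent: 4K budget keeps cost down" };
  if (!h.hasOpenAiKey) return { mode: "conservative", reason: "no OpenAI key: no LLM expansion possible" };
  return { mode: "balanced", reason: "Sonnet / unknown: safe default" };
}
```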

Upgrade banner (src/commands/upgrade.ts):
  - One-shot stderr banner in runPostUpgrade.
  - State persisted via config key `search.mode_upgrade_notice_shown=true`
    — fires at most once per install.
  - Copy corrected per [CDX-1+2+3]: production query op STILL defaults
    expand=true and limit=20. The banner reframes from "behavior is
    regressing" to "named modes available + here's how to preserve
    exact current shape."

Tests (test/init-mode-picker.test.ts, 16 cases):
  - recommendModeFor heuristic for all 4 input shapes
  - parseModeInput accepts numeric/named/case-insensitive, rejects garbage
  - runModePicker non-TTY auto-selects + writes config
  - Idempotent + --force re-prompt + JSON output
  - Opus → tokenmax, Haiku → conservative real wiring through engine
Three sub-subcommands mirroring the gbrain models (v0.31.12) shape:

  gbrain search modes [--json]
    Read-only routing dashboard. Shows the three mode bundles, the active
    mode, and the source of every resolved knob:
      cache_enabled = true   [override: search.cache.enabled]
      tokenBudget   = 4000   [mode: conservative]
    Plus knob descriptions for legibility.

  gbrain search modes --reset [--source <mode>]
    Clears every search.* override (NOT search.mode itself). Preserves
    the upgrade-notice state key. --source <mode> is a dry-run that
    lists what --reset would change without writing — the paved path
    [CDX-8] flagged as missing.

  gbrain search stats [--days N] [--json]
    Observability. Reads the search_telemetry rollup over the window
    (clamps to [1, 365]). Prints cache hit rate, mode mix, intent mix,
    budget drops, avg results/tokens. JSON output includes
    _meta.metric_glossary block per [CDX-25].

  gbrain search tune [--apply] [--json]
    Recommendation engine. 5 rules cover the bug class:
      - Insufficient data → "no_recommendations" status
      - Conservative + high budget-drop rate → suggest balanced
      - High cache hit rate (>85%) → suggest similarity threshold bump
      - Tokenmax + Haiku subagent → suggest balanced (cost mismatch)
      - Cache disabled but stats show usage → suggest re-enabling
    --apply mutates config via setConfig / unsetConfig with a paste-ready
    revert command printed at the end.

Registered in src/cli.ts dispatch table. 17 unit cases pin:
  - Dashboard report shape + per-knob source attribution
  - --reset preserves search.mode + notice key
  - --source dry-run never writes
  - stats reads telemetry rollup; --days clamps
  - tune recommendation rules fire on real telemetry data
  - --apply mutates config
  - --help + unknown subcommand exit codes
… guard

Single source of truth at src/core/eval/metric-glossary.ts. Every entry
carries 3 fields:
  - industry_term (canonical IR/NLP literature name, preserved verbatim)
  - eli10 (plain-English a 16-year-old can follow)
  - range (numeric range + interpretation)

Covers 4 metric families:
  - Retrieval: P@k, R@k, MRR, nDCG@k
  - Stability: Jaccard@k, top-1 stability
  - Statistical: p-value (paired bootstrap + Bonferroni), 95% CI
  - Operational: cache hit rate, avg results/tokens, cost per query, p99 latency

Public surface:
  - getMetricGloss(metric) → full entry or null
  - eli10For(metric) → plain-English string or null
  - buildMetricGlossaryMeta(metrics[]) → {metric → eli10} record for
    JSON `_meta.metric_glossary` blocks per [CDX-25]. ONE block per
    response, NOT sibling `_gloss` fields on every metric.
  - renderMetricGlossaryMarkdown() → deterministic Markdown for the doc

Auto-generation:
  scripts/generate-metric-glossary.ts emits docs/eval/METRIC_GLOSSARY.md.
  Deterministic (same input → same bytes) so the CI guard can diff.

CI guard:
  scripts/check-eval-glossary-fresh.sh regenerates into a temp file and
  diffs against the committed doc. Out-of-date doc fails the build.
  Wired into `bun run verify` (and therefore `bun run test:full`).

Tests (test/metric-glossary.test.ts, 18 cases):
  - Every documented metric is present
  - Every entry has all 3 required fields
  - Accessors return null on unknown metrics (no throw)
  - buildMetricGlossaryMeta silently drops unknown metrics
  - renderer output is deterministic across calls
  - Renderer groups metrics into 4 sections

docs/eval/METRIC_GLOSSARY.md: 5491 bytes, 124 lines, fresh.
src/core/eval/drift-watch.ts — curated retrieval watch-list [CDX-6].
Five patterns covering the surface that actually affects retrieval quality:
  - src/core/search/      (search pipeline)
  - src/core/embedding.ts (embedding shape)
  - src/core/chunkers/    (chunk granularity)
  - src/core/ai/recipes/anthropic.ts + openai.ts (expansion + embed routing)
  - src/core/operations.ts (the query op definition)

Adding to the list is a deliberate act — requires a CHANGELOG line so
coverage grows on purpose, not by accident. Pure functions:
  - matchesWatchPattern(path) — trailing-slash = prefix, bare = equality
  - filesDriftedSince(repoRoot, sha?) — git diff --name-only wrapper
  - watchedFilesDrifted(repoRoot, sha?) — composite
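The matcher semantics (trailing-slash = prefix, bare = equality) can be sketched directly; the pattern list mirrors the five entries above, and the function shape is an assumption about the real module:

```typescript
// Sketch of the watch-list matcher: a pattern ending in "/" matches any
// path under that directory; a bare pattern must match the path exactly.
const WATCH_PATTERNS = [
  "src/core/search/",
  "src/core/embedding.ts",
  "src/core/chunkers/",
  "src/core/ai/recipes/anthropic.ts",
  "src/core/ai/recipes/openai.ts",
  "src/core/operations.ts",
];

function matchesWatchPattern(path: string): boolean {
  return WATCH_PATTERNS.some((p) =>
    p.endsWith("/") ? path.startsWith(p) : path === p, // trailing-slash = prefix, bare = equality
  );
}
```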

src/commands/doctor.ts — two new checks.

checkSearchMode [CDX-20]: status stays 'ok' (never warns, never docks
health score). Hint in message field. Three branches:
  - unset → "search.mode is unset (using balanced fallback). Run
    `gbrain search modes` to see what is running and pick a mode."
  - mode + no overrides → "Mode: X (no per-key overrides — mode bundle
    is canonical)."
  - mode + overrides → "Mode: X with N per-key override(s) (k1, k2, …).
    To consolidate to the pure mode bundle: gbrain search modes --reset"
Upgrade-notice state key (search.mode_upgrade_notice_shown) is excluded
from the override roster — it's not a knob.

checkEvalDrift [CDX-6]: surfaces uncommitted changes to retrieval-watched
files. Always 'ok'; operator-facing reminder. Names up to 3 drifted files
in the message + paste-ready re-eval command.

Both helpers exported (was: file-private) so tests can pin behavior
without walking the full runDoctor pipeline.

Tests: 12 drift-watch cases + 7 doctor-check cases. Pin watch-list shape,
prefix-vs-equality matcher semantics, missing-repo graceful failure, and
all three search_mode branches.
Per-mode --mode flag plumbed into:
  - gbrain eval longmemeval --mode <conservative|balanced|tokenmax>
    Sets search.mode in the benchmark brain's config table; config is
    in PRESERVE_TABLES so resetTables doesn't wipe it between questions.
    Mode surfaces in the per-question NDJSON row.
  - gbrain eval replay --mode <m> + --compare-limit N
    --compare-limit forces a constant K across modes [CDX-13]; without
    it, Jaccard@k against the captured baseline measures K-drift, not
    quality. Mode is set once before the replay loop.
  - NOT cross-modal per [CDX-11]: cross-modal scores OUTPUT against
    TASK; it doesn't retrieve. Adding --mode there is theater.

New: gbrain eval run-all orchestrator (src/commands/eval-run-all.ts):
  - Sweeps every requested mode × suite combination
  - Sequential default per D9; --parallel N opt-in (clamped to mode count)
  - Cost guard with split caps [CDX-15+16]:
      --budget-usd-retrieval N (default $5)
      --budget-usd-answer N (default $20)
    Non-TTY refuses with exit 2 unless --yes AND explicit --budget-usd-*
    flags pass. TTY refuses without --yes (defense against agent loops).
  - estimateRunCost computes per-(suite,mode) breakdown including the
    expansion-Haiku surcharge for tokenmax.
  - Audit trail: appends to <repo>/.gbrain-evals/eval-results.jsonl
    [CDX-23]. Personal brain (~/.gbrain) NEVER touched.
  - v0.32.3 ships orchestrator + argv + guard + persist hook.
    In-process per-suite invocation is a v0.32.4 follow-up (operator
    runs the per-suite CLIs with the documented --mode flag for now;
    each completion calls persistRunRecord to log).
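The guard's four (over/under × tty/non-tty) shapes can be sketched as a decision function. Exit codes and the exact refusal shape are assumptions beyond the "non-TTY exit 2" case stated above:

```typescript
// Sketch of the cost-guard decision: over-budget always refuses;
// non-TTY additionally requires --yes AND explicit --budget-usd-* flags
// (defense against agent loops); TTY requires --yes. Exit codes for the
// TTY branches are assumptions.
interface GuardInput {
  isTty: boolean;
  yes: boolean;             // --yes
  explicitBudgets: boolean; // both --budget-usd-* flags passed
  estimatedUsd: number;
  capUsd: number;
}

function guardDecision(g: GuardInput): { allow: boolean; exitCode: number } {
  if (g.estimatedUsd > g.capUsd) return { allow: false, exitCode: 2 }; // over cap: refuse
  if (!g.isTty) {
    const ok = g.yes && g.explicitBudgets;
    return ok ? { allow: true, exitCode: 0 } : { allow: false, exitCode: 2 };
  }
  return g.yes ? { allow: true, exitCode: 0 } : { allow: false, exitCode: 2 };
}
```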

New: gbrain eval compare report (src/commands/eval-compare.ts):
  - Reads eval-results.jsonl, groups by (suite, mode), renders MD or JSON
  - Most-recent (suite, mode, commit) wins when duplicates exist
  - JSON output has schema_version=2 + _meta.metric_glossary block per
    [CDX-25] (ONE block per response, not sibling _gloss fields)
  - _meta.methodology field names the paired-bootstrap + Bonferroni
    discipline per [CDX-14] so haters can reproduce
  - Missing file → friendly hint pointing at `gbrain eval run-all`

Wired into eval dispatch table in src/commands/eval.ts.

Metric glossary fuzzy fallback: `recall@10` → `recall@k` lookup
(the glossary documents the family; report rows carry specific K
values). Routes through getMetricGloss for every call site.
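The fuzzy fallback can be sketched as below. The glossary contents are placeholders; only the `@N` → `@k` family rewrite is taken from the description above:

```typescript
// Sketch of the fuzzy glossary lookup: try the metric verbatim, then
// rewrite a trailing @<number> to @k to hit the documented family entry.
// Glossary text here is illustrative, not the real doc.
const GLOSSARY = new Map<string, string>([
  ["recall@k", "Of all relevant docs, the fraction that showed up in the top k."],
  ["mrr", "How high the first relevant result ranks, averaged over queries."],
]);

function getMetricGloss(metric: string): string | null {
  const key = metric.toLowerCase();
  const direct = GLOSSARY.get(key);
  if (direct) return direct;
  const family = key.replace(/@\d+$/, "@k"); // recall@10 → recall@k
  return GLOSSARY.get(family) ?? null;
}
```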

Tests (42 cases total — all green):
  - eval-run-all.test.ts (19): argv parser, cost estimate, guard
    semantics for all 4 (over/under × tty/non-tty) shapes, persist hook
    NDJSON shape.
  - eval-compare.test.ts (5): JSON + MD output shapes, glossary
    integration, missing-file graceful, mode filter, most-recent-wins.
  - metric-glossary.test.ts (18): unchanged but updated assertions to
    cover the fuzzy `@N` → `@k` fallback.

Pre-existing eval-replay / eval-longmemeval / eval-export / eval-prune
tests (42 cases) still pass — --mode + --compare-limit are additive.
docs/eval/SEARCH_MODE_METHODOLOGY.md — haters-immune 8-section template.
Documents what the eval measures + does NOT measure, datasets + sizes
(LongMemEval n=500, Replay n=200, BrainBench n=1240 docs / 350 qrels),
random seed 42, run procedure verbatim, threats to validity (LongMemEval
English+technical skew, char/4 heuristic ~5-10% off, expansion ~97.6%
relative lift on this corpus), per-question raw outputs, pre-registered
expectations (tokenmax wins R@10 by 5-15pp, conservative wins cost by
5-15x, balanced lands within 3pp), re-run cadence anchored to the
src/core/eval/drift-watch.ts watch-list.

Statistical-significance section pins paired bootstrap with 10,000
resamples + Bonferroni correction across 3 modes × 4 metrics [CDX-14].

CLAUDE.md gets two new sections: ## Search Mode (3-mode table + resolution
chain + [CDX-4] cache contamination fix note + CLI commands) and ## Eval
discipline (single-source-of-truth glossary, methodology doc, eval_results
in repo NOT personal brain per [CDX-23]).

README.md Quick Start gets a paragraph naming the install picker, mode
heuristic, and the methodology link.

skills/conventions/search-modes.md NEW — convention file consumed by
brain-ops + query + signal-detector skills via the existing
`> **Convention:**` callout pattern. Routes "what mode" / "tune
retrieval" / "compare modes" queries to the right CLI surface.

skills/RESOLVER.md gets two new trigger rows pointing at
gbrain search * and gbrain eval compare.
bun run build:llms — picks up the new CLAUDE.md sections (Search Mode +
Eval discipline) and the docs/eval/SEARCH_MODE_METHODOLOGY.md addition.
build-llms.test.ts gate now passes.
… flow

The v0.32.3 search_mode + eval_drift helpers were inserted into the
DB-checks sub-helper at runDbChecks (line 345-355), but runDoctor itself
maintains its own check list and only calls the helpers' subset. Push
the two checks into the main runDoctor path (after the existing
sync_freshness check at line 2347) so they actually appear in
`gbrain doctor --json` output.

Both checks gated on engine !== null. Progress reporter heartbeat fires
for each. Both still return status 'ok' per [CDX-20] so health score is
preserved.

Verified end-to-end on a real Postgres brain: gbrain doctor --json now
includes 'search_mode' and 'eval_drift' in the checks array.