
Sync upstream garrytan/gbrain v0.28.12 → v0.33.0 (28 commits)#3

Merged
chapter37haptics merged 29 commits into master from upstream-sync/v0.28.11-to-v0.33.0 on May 12, 2026



@chapter37haptics
Owner

Summary

Merge 28 upstream commits from garrytan/gbrain into our fork, catching up from v0.28.11 → v0.33.0. Clean merge with no conflicts.

490 files changed, +71,771 / −1,901 lines


Upstream Commits (v0.28.12 → v0.33.0)

Version Commit Description
v0.33.0 17b190e gbrain recall morning pulse + thin-client routing fix (9 commands)
v0.32.8 e493d5f Multi-source bug class extermination — embed, extract, takes, patterns, integrity, migrate-engine
v0.32.7 c965244 CJK fix wave — 6 layers from one root cause
v0.32.6 9a5606a Brain-consistency probe + doctor + MCP + dream-cycle wire-up
v0.32.5 bd2fe8a gbrain-context OpenClaw context engine — deterministic temporal/spatial injection
v0.32.4 59d077f Add sync_freshness check to gbrain doctor
v0.32.3 7be1726 Functional-area-resolver — pattern for compressing routing tables
v0.32.2 a73108b Facts join system-of-record + 3-layer privacy + CI invariant gate
v0.32.0 71ed8d0 5 new embedding recipes + discoverability pass (closes 17-PR cluster)
v0.31.12 2996181 Canonical Anthropic model IDs + tier routing surface + gbrain models CLI
v0.31.11 0410dc4 Thin-client auto-upgrade prompt
v0.31.10 cb5bf1d Add cold-start and ask-user skills
v0.31.8 182900d Multi-source threading + doctor wedge hint + voyage cap (P2 follow-ups)
v0.31.7 8784034 Doctor stops crying wolf — 5 community PRs adapted
v0.31.6 200a741 Extract facts during sync (real-time hot memory)
v0.31.4.1 943e7b9 Align VERSION + package.json with 4-segment versions
v0.31.4 7267462 Takes v2 — lessons from 100K-take production extraction
v0.31.3 9c60b3a Stdio MCP graceful cleanup + engine-aware auth/admin SQL
v0.31.2 eec2d2b Sync --strategy code no longer hangs on big symlink-rich repos
v0.31.1.1 ff53a4c 22 community fixes (auth-code P0, upgrade-path, sync, multi-source, privacy)
v0.31.1 b2fd264 Thin-client mode actually works
v0.31.0 89ae720 Hot memory — facts hook + recall CLI + MCP _meta + consolidate phase
v0.30.2 410c697 Dream synthesize stops dropping fat transcripts
v0.30.1 dffb607 Operational hardening — make upgrades just work on Supabase
v0.30.0 1399e51 Calibration scorecards (Slice A1 of v0.30 wave)
v0.29.2 8392d43 Thin-client mode (gbrain init --mcp-only + remote ping/doctor + topologies)
v0.29.0/1 b8e0a0e Salience + anomaly detection — brain surfaces what's hot
v0.28.12 bca993e LongMemEval benchmark harness

Impact Analysis: What This Means for Our Custom Modifications

Our planned custom modifications cover a deterministic devcontainer setup with PGLite init, brain repo + session branches, MCP-only agent integration, trusted MCP mode, and OTel instrumentation. Here's how each major upstream change affects that work:

✅ Directly Beneficial (Unblocks or Simplifies Our Work)

v0.31.3 — Stdio MCP graceful cleanup (9c60b3a)

v0.31.7 — Doctor stops crying wolf (8784034)

  • Fixes 5 community issues including false-positive doctor warnings
  • Directly improves our gbrain doctor --json verification step in entrypoint

v0.31.1.1 — 22 community fixes (ff53a4c)

  • Auth-code P0 fix, upgrade-path fixes, sync fixes, multi-source fixes
  • The sync fixes are critical — we depend on gbrain sync --repo working cleanly

v0.31.2 — Sync no longer hangs on symlink-rich repos (eec2d2b)

  • Fixes a potential hang in gbrain sync --strategy code
  • Makes our sync step more robust in devcontainer environments

⚠️ Requires Re-evaluation of Our Specs

v0.31.0 + v0.31.6 — Hot memory / facts extraction (89ae720, 200a741)

  • New facts system that auto-extracts entities during sync
  • Our spec's OQ1 said "don't build gbrain skill-split, use Claude Code hooks for signal-detector"
  • Upstream solved this differently: facts are extracted deterministically during sync, not via LLM hooks
  • Action needed: Our Part 2 entrypoint runs gbrain sync. If hot memory is on by default, facts will be extracted automatically. Review whether this replaces our planned signal-detector hook approach

v0.32.2 — Facts join system-of-record + 3-layer privacy (a73108b)

  • Facts now have a privacy model (public/internal/private)
  • Action needed: Review whether GBRAIN_MCP_TRUSTED=true (our D10) interacts with the privacy layer. Trusted mode should still respect privacy gates

v0.31.4 — Takes v2 (7267462)

  • Major rewrite of the "takes" system (lessons from 100K-take production)
  • Our D10 spec mentions takesHoldersAllowList: ['world'] for remote callers
  • Action needed: Verify that the Takes v2 API still uses takesHoldersAllowList in dispatchToolCall. If the parameter changed, our trusted-mode patch (Part 3) needs updating

v0.32.0 — 5 new embedding recipes (71ed8d0)

  • New embedding providers/recipes beyond OpenAI
  • Our spec hardcodes OpenAI text-embedding-3-large
  • Action needed: Check if the default embedding provider changed. If so, our OPENAI_API_KEY gating in the entrypoint may need adjustment (could now support other providers)

v0.29.2/v0.31.1/v0.31.11 — Thin-client mode (8392d43, b2fd264, 0410dc4)

  • gbrain init --mcp-only for remote-only setups, auto-upgrade prompts
  • Our spec uses PGLite (local mode), not thin-client, but the auto-upgrade prompt (v0.31.11) could fire during init
  • Action needed: Verify gbrain init --pglite --json still suppresses all prompts. The auto-upgrade check (our D7 says skip) might now be more aggressive

🔍 Informational (No Direct Impact, Good to Know)

v0.32.5 — gbrain-context OpenClaw engine (bd2fe8a)

  • Deterministic temporal/spatial context injection for OpenClaw
  • Doesn't affect Claude Code integration, but shows upstream is building deterministic context pipelines (aligned with our goals)

v0.32.6 — Brain-consistency probe (9a5606a)

  • Doctor-level consistency checks for brain integrity
  • Could replace some of our planned OTel convention-compliance checks (OQ2) — check if upstream now catches drift natively

v0.32.7 — CJK fix wave (c965244)

  • Fixes CJK text handling across 6 layers
  • Relevant if brain content includes CJK text

v0.32.8 — Multi-source bug extermination (e493d5f)

v0.33.0 — Recall morning pulse (17b190e)

  • New gbrain recall command for daily knowledge summaries
  • A potential addition to our devcontainer startup (run recall on session start to show the agent what's important)

v0.30.0 — Calibration scorecards (1399e51)

  • Embedding quality calibration
  • Useful for tuning our embedding step quality

🚨 Known Risk: NI1 (Sync Without OPENAI_API_KEY)

Our spec flagged NI1: v0.32 sync hard-errors without OPENAI_API_KEY even with --no-embed. This PR brings in v0.32+ code. Before merging, we should verify that gbrain sync --repo /path --no-embed works without OPENAI_API_KEY on v0.33.0. If it doesn't, our entrypoint needs to set a dummy key or we need to pin specific commands.
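
The NI1 check above can be scripted. The following is a minimal sketch, assuming a Node-compatible runtime and that `gbrain` is on `PATH`; the helper names `envWithout` and `runsWithoutOpenAiKey` are hypothetical, and the repo path in the usage comment is a placeholder. Only the `sync`, `--repo`, and `--no-embed` arguments come from the text above.

```typescript
import { spawnSync } from "node:child_process";

// Return a copy of the environment with one variable removed.
// The input object is left untouched.
export function envWithout(
  env: Record<string, string | undefined>,
  key: string,
): Record<string, string | undefined> {
  const copy = { ...env };
  delete copy[key];
  return copy;
}

// Run a command with OPENAI_API_KEY stripped from its environment and
// report whether it exited with status 0. A missing binary yields
// status === null, which is reported as failure.
export function runsWithoutOpenAiKey(cmd: string, args: string[]): boolean {
  const result = spawnSync(cmd, args, {
    env: envWithout(process.env, "OPENAI_API_KEY"),
    encoding: "utf8",
  });
  return result.status === 0;
}

// Usage (placeholder path):
// runsWithoutOpenAiKey("gbrain", ["sync", "--repo", "/path/to/brain", "--no-embed"]);
```

If the call returns false on v0.33.0, the fallback options named above (a dummy key in the entrypoint, or pinning specific commands) still apply.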

Summary of Required Actions Before Building on This

  1. Test gbrain sync --no-embed without OPENAI_API_KEY on v0.33.0 (NI1 regression check)
  2. Check if hot memory/facts extraction fires by default during sync — may replace our signal-detector hook plan
  3. Verify takesHoldersAllowList parameter still exists in dispatch.ts for our trusted-mode patch
  4. Check if gbrain init --pglite --json still suppresses all prompts (thin-client auto-upgrade might interfere)
  5. Update GBRAIN_VERSION build arg from v0.28.11 to v0.33.0 in Dockerfile spec

Test plan

  • Verify clean merge — no conflicts ✅ (done)
  • Run bun test on the merged branch
  • Test gbrain init --pglite --json still works non-interactively
  • Test gbrain sync --repo <path> --no-embed without OPENAI_API_KEY
  • Verify gbrain doctor --json returns healthy status
  • Check src/mcp/server.ts dispatch interface for Part 3 compatibility

🤖 Generated with Claude Code

garrytan and others added 29 commits May 7, 2026 19:49
* v0.28 schema: takes + synthesis_evidence (v31) + access_tokens.permissions (v32)

Migration v31 adds the takes table (typed/weighted/attributed claims) and
synthesis_evidence (provenance for `gbrain think` outputs). Page-scoped via
page_id FK (slug isn't unique alone in v0.18+ multi-source). HNSW partial
index on embedding for active rows. ON DELETE CASCADE on synthesis_evidence
so deleting a source take cascades the provenance row.

Migration v32 adds access_tokens.permissions JSONB with safe-default
backfill (`{"takes_holders":["world"]}`). Default keeps non-world holders
hidden from MCP-bound tokens until the operator explicitly grants access
via the v0.28 auth permissions CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 engine: addTakesBatch, listTakes, searchTakes/Vector, supersede, resolve, synthesis_evidence

Extends BrainEngine with the takes domain object. Both engines implement the
same surface; PGLite uses manual `$N` placeholders, Postgres uses postgres-js
unnest() — same shape as addLinksBatch and addTimelineEntriesBatch.

Methods:
- addTakesBatch (upsert via ON CONFLICT (page_id, row_num) DO UPDATE)
- listTakes (filter by holder/kind/active/resolved, takesHoldersAllowList
  for MCP-bound calls, sortBy weight/since_date/created_at)
- searchTakes / searchTakesVector (pg_trgm + cosine; honor allow-list)
- countStaleTakes / listStaleTakes (mirror countStaleChunks pattern;
  embedding column intentionally omitted from listStale payload)
- updateTake (mutable fields only; throws TAKE_ROW_NOT_FOUND)
- supersedeTake (transactional: insert new at next row_num, mark old
  active=false, set superseded_by; throws TAKE_RESOLVED_IMMUTABLE on
  resolved bets)
- resolveTake (sets resolved_*; throws TAKE_ALREADY_RESOLVED on re-resolve;
  resolution is immutable per Codex P1 #13 fold)
- addSynthesisEvidence (provenance persist; ON CONFLICT DO NOTHING)
- getTakeEmbeddings (parallel to getEmbeddingsByChunkIds)

Types live in src/core/engine.ts adjacent to LinkBatchInput. Page-scoped
via page_id (slug not unique in v0.18+ multi-source). PageType gains
'synthesis'. takeRowToTake mapper in utils.ts handles Date → ISO string
normalization.

Tests: test/takes-engine.test.ts — 16 cases against PGLite covering
upsert/list/filter/search happy paths, takesHoldersAllowList isolation,
the four invariant errors (TAKE_ROW_NOT_FOUND, TAKES_WEIGHT_CLAMPED,
TAKE_RESOLVED_IMMUTABLE, TAKE_ALREADY_RESOLVED), supersede flow, resolve
metadata round-trip, FK CASCADE on synthesis_evidence when source take
deletes. All pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 model-config: unified resolveModel with 6-tier precedence + alias resolution

Replaces every hardcoded `claude-*-X` and per-phase `dream.<phase>.model`
config key with a single resolver. Hierarchy:

  1. CLI flag (--model)
  2. New-key config (e.g. models.dream.synthesize)
  3. Old-key config (deprecated dream.synthesize.model, dream.patterns.model)
     — read with stderr deprecation warning, one-per-process
  4. Global default (models.default)
  5. Env var (GBRAIN_MODEL or caller-supplied)
  6. Hardcoded fallback

Aliases (`opus`, `sonnet`, `haiku`, `gemini`, `gpt`) resolve at the end so
any tier can use a short name. User-defined `models.aliases.<name>` config
overrides built-ins. Cycle-safe (depth 2 break). Unknown alias passes
through unchanged so users can pass full provider IDs without registering.
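
As an illustration of the precedence and alias rules above, here is a minimal sketch. It is not the code in this commit: the `ModelSources` shape, the function signatures, and the built-in alias targets are hypothetical placeholders, and "depth 2 break" is read here as "follow at most two alias hops".

```typescript
// Hypothetical input shape: one optional value per tier, highest priority first.
type ModelSources = {
  cliFlag?: string;       // tier 1: --model
  newKey?: string;        // tier 2: e.g. models.dream.synthesize
  oldKey?: string;        // tier 3: deprecated dream.<phase>.model
  globalDefault?: string; // tier 4: models.default
  envVar?: string;        // tier 5: GBRAIN_MODEL
  fallback: string;       // tier 6: hardcoded fallback
};

type Aliases = Record<string, string>;

// Placeholder targets; the real provider IDs are not given in this message.
export const BUILTIN_ALIASES: Aliases = {
  opus: "example-provider/opus-model",
  sonnet: "example-provider/sonnet-model",
  haiku: "example-provider/haiku-model",
};

// Follow alias links for at most two hops so a cycle cannot loop forever.
// User-defined aliases take priority over built-ins; unknown names pass through.
export function resolveAlias(
  name: string,
  userAliases: Aliases = {},
  builtins: Aliases = BUILTIN_ALIASES,
): string {
  let current = name;
  for (let depth = 0; depth < 2; depth++) {
    const next = userAliases[current] ?? builtins[current];
    if (next === undefined) break;
    current = next;
  }
  return current;
}

// Pick the first tier that is set, then resolve aliases at the very end.
export function resolveModel(sources: ModelSources, userAliases: Aliases = {}): string {
  const picked =
    sources.cliFlag ??
    sources.newKey ??
    sources.oldKey ??
    sources.globalDefault ??
    sources.envVar ??
    sources.fallback;
  return resolveAlias(picked, userAliases);
}
```

Resolving aliases after the tier is chosen is what lets any tier use a short name, as described above.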

When new-key + old-key are BOTH set (Codex P1 #11 fix), new-key wins and
stderr warns "deprecated config X ignored; Y is set and wins". When only
old-key is set, it's honored with a softer "rename to Y before v0.30"
warning. Both warnings emit once per (key, process) — a Set memo prevents
log spam in long-running daemons.

Migrated call sites: synthesize.ts (model + verdictModel), patterns.ts
(model). subagent.ts and search/expansion.ts to be migrated later in v0.28
(staying compatible until then).

Tests: test/model-config.test.ts — 11 cases pinning the 6-tier ordering,
alias resolution + cycle break, deprecated-key warning emit-once, and
unknown-alias pass-through. All pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 takes-fence: parser/renderer/upserter + chunker strip (privacy P0 fix)

src/core/takes-fence.ts — pure functions for the fenced markdown surface:
- parseTakesFence(body) — extracts ParsedTake[] from `<!--- gbrain:takes:begin/end -->`
  blocks. Strict on canonical form, lenient on hand-edits with warnings
  (TAKES_FENCE_UNBALANCED, TAKES_TABLE_MALFORMED, TAKES_ROW_NUM_COLLISION).
  Strikethrough `~~claim~~` → active=false; date ranges `since → until`
  split into sinceDate/untilDate.
- renderTakesFence(takes) — round-trip safe with parseTakesFence.
- upsertTakeRow(body, row) — append-only per CEO-D6 + eng-D9. Creates a
  fresh `## Takes` section if no fence present. row_num is monotonic
  (max + 1, never gap-filled — keeps cross-page refs and synthesis_evidence
  stable forever).
- supersedeRow(body, oldRow, replacement) — strikes through old row's claim
  AND appends the new row at end. Both rows preserved in markdown for
  git-blame archaeology.
- stripTakesFence(body) — removes the fenced block entirely. Used by the
  chunker so takes content lives ONLY in the takes table.

Codex P0 #3 fix: src/core/chunkers/recursive.ts now calls stripTakesFence()
before computing chunk boundaries. Without this, page chunks would contain
the rendered takes table and the per-token MCP allow-list would be
bypassed at the index layer (token bound to takes_holders=['world'] would
see garry's hunches via page hits). Doctor's takes_fence_chunk_leak check
(plan-side) asserts no chunk contains the begin marker.
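
A minimal sketch of the strip step, for illustration only. The exact marker strings are an assumption expanded from the `begin/end` shorthand above, and dropping everything after an unbalanced begin marker is a choice made for this sketch (fail closed), not something this commit states.

```typescript
// Assumed marker spellings, expanded from `<!--- gbrain:takes:begin/end -->`.
const TAKES_FENCE_BEGIN = "<!--- gbrain:takes:begin -->";
const TAKES_FENCE_END = "<!--- gbrain:takes:end -->";

// Remove every fenced takes block from a page body, keeping surrounding prose.
export function stripTakesFence(body: string): string {
  let out = "";
  let cursor = 0;
  while (cursor < body.length) {
    const begin = body.indexOf(TAKES_FENCE_BEGIN, cursor);
    if (begin === -1) break; // no more fences: keep the rest as-is
    out += body.slice(cursor, begin);
    const end = body.indexOf(TAKES_FENCE_END, begin + TAKES_FENCE_BEGIN.length);
    if (end === -1) {
      // Unbalanced fence: drop the remainder rather than risk leaking takes.
      cursor = body.length;
      break;
    }
    cursor = end + TAKES_FENCE_END.length;
  }
  return out + body.slice(cursor);
}

// Usage: strip first, then hand the result to the chunker.
// const chunks = chunkText(stripTakesFence(pageBody)); // chunkText as named in this PR's tests
```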

Tests: 15 cases covering canonical parse, strikethrough, date range, fence
unbalanced detection, malformed-row skip + warning, row_num collision
detection, round-trip render, append-only upsert into existing fence,
fresh-section creation, monotonic row_num under hand-edit gaps, supersede
flow, stripTakesFence verifying takes content removed AND surrounding
prose preserved. Existing chunker tests still pass (15 + 15 = 30).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 page-lock: PID-liveness file lock for atomic markdown read-modify-write

src/core/page-lock.ts — per-page file lock at
~/.gbrain/page-locks/<sha256-of-slug>.lock so two concurrent `gbrain takes
add` calls or `takes seed --refresh` from autopilot can't race on the
same `<slug>.md` read-modify-write. Eng-review fold: reuses the v0.17
cycle.lock pattern (mtime + PID liveness) but per-slug.

Differences from cycle.ts's lock:
- SHA-256 of slug for safe filenames (slashes, unicode, etc.)
- Same-pid + fresh mtime = LIVE (cycle.ts assumes one lock per process and
  reclaims same-pid; page-lock allows concurrent locks for DIFFERENT slugs
  in one process). mtime expiry still rescues post-crash leftovers.
- 5-min TTL (vs cycle's 30 min — page edits are short)
- `withPageLock(slug, fn)` convenience wrapper with default 30s timeout

API:
- acquirePageLock(slug, opts) → handle | null (poll-with-timeout)
- handle.refresh() / handle.release() (idempotent — only releases if pid matches)
- withPageLock(slug, fn, opts) — acquire + run + release-in-finally
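
The lock rules above (SHA-256 filename, same-pid + fresh mtime = LIVE, 5-min TTL, pid-checked release) can be sketched as follows. This is a simplified illustration assuming a Node-compatible runtime; `lockPathFor`, `tryAcquire`, and `release` are hypothetical names, there is no polling or `refresh()`, and the stale-lock reclaim is not race-free between processes.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

const TTL_MS = 5 * 60 * 1000; // 5-minute TTL, as stated above

// Hash the slug so slashes and unicode never reach the filesystem.
export function lockPathFor(dir: string, slug: string): string {
  const digest = createHash("sha256").update(slug).digest("hex");
  return path.join(dir, `${digest}.lock`);
}

// Signal 0 probes for existence without delivering a signal.
function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// A holder is live only if its PID is running AND the lock's mtime is fresh.
function holderIsLive(file: string): boolean {
  try {
    const pid = Number(fs.readFileSync(file, "utf8"));
    const ageMs = Date.now() - fs.statSync(file).mtimeMs;
    return Number.isInteger(pid) && pid > 0 && ageMs <= TTL_MS && pidAlive(pid);
  } catch {
    return false; // unreadable or vanished lock file counts as stale
  }
}

// Returns the lock file path on success, or null if a live holder exists.
export function tryAcquire(dir: string, slug: string): string | null {
  fs.mkdirSync(dir, { recursive: true });
  const file = lockPathFor(dir, slug);
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" fails with EEXIST if the file exists, so creation is atomic.
      fs.writeFileSync(file, String(process.pid), { flag: "wx" });
      return file;
    } catch (err) {
      if ((err as { code?: string }).code !== "EEXIST") throw err;
    }
    if (holderIsLive(file)) return null;
    fs.rmSync(file, { force: true }); // reclaim a stale lock, then retry once
  }
  return null;
}

// Only the process that wrote the lock may remove it.
export function release(file: string): void {
  try {
    if (Number(fs.readFileSync(file, "utf8")) === process.pid) fs.rmSync(file);
  } catch {
    // lock already gone
  }
}
```

Because a same-pid lock with a fresh mtime counts as live, a second acquire for the same slug in one process returns null, while a different slug hashes to a different file and succeeds.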

Tests: 10 cases — fresh acquire, live holder returns null, stale-mtime
reclaim, dead-PID reclaim, refresh updates timestamp, foreign-pid release
is no-op, withPageLock callback runs and releases on success/failure,
timeout-throws when held, SHA-256 filename safety for slashes/unicode.
All pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 extract-takes: dual-path phase (fs|db) + since/until_date as TEXT

src/core/cycle/extract-takes.ts — new phase that materializes the takes
table from fenced markdown blocks. Two paths mirror src/commands/extract.ts:

- extractTakesFromFs: walk *.md under repoPath, parse fences, batch upsert
- extractTakesFromDb: iterate engine.getAllSlugs(), parse each page's
  compiled_truth+timeline, batch upsert (mutation-immune snapshot iteration)

Single dispatcher extractTakes(opts) routes by source. Honors:
- slugs filter for incremental re-extract (pipes from sync→extract)
- dryRun: count would-be upserts, write nothing
- rebuild: DELETE FROM takes WHERE page_id = $1 before re-insert (clean
  slate when markdown is canonical and DB has drifted)

Schema fix: since_date/until_date were DATE in the original v31 migration.
Spec uses partial dates ('2017-01', '2026-04-29 → 2026-06') that Postgres
DATE rejects. Changed to TEXT in both the Postgres and PGLite blocks so
parser-rendered ranges round-trip cleanly. Loses the ability to do
date-range arithmetic in SQL, but date math on opinion timelines is
out of scope for v0.28 anyway. utils.ts dateOrNull now annotated as
v0.28 TEXT-aware.

Migration v31 has not been deployed yet (this branch is the v0.28 release
candidate), so the type swap is free. No data migration needed.

Tests: test/extract-takes.test.ts — 5 cases against PGLite covering full
walk + fence-skip on no-fence pages, takes-table populated post-extract,
incremental slugs filter, dry-run no-write, rebuild=true clears + re-inserts
ad-hoc rows. test/takes-engine.test.ts (16), test/takes-fence.test.ts (15)
all still pass — 36/36 takes tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 takes CLI: list, search, add, update, supersede, resolve

src/commands/takes.ts — surfaces the engine methods + takes-fence library
through a single `gbrain takes <subcommand>` entrypoint:

  takes <slug>                          list with filters + sort
  takes search "<query>"                pg_trgm keyword search across all takes
  takes add <slug> --claim ... ...      append (markdown + DB, atomic via lock)
  takes update <slug> --row N ...       mutable-fields update (markdown + DB)
  takes supersede <slug> --row N ...    strikethrough old + append new
  takes resolve <slug> --row N --outcome  record bet resolution (immutable)

Markdown is canonical. Every mutate command:
  1. acquires the per-page file lock (withPageLock)
  2. re-reads the .md file
  3. applies the edit via takes-fence (upsertTakeRow / supersedeRow)
  4. writes the .md file back
  5. mirrors to the DB via the engine method
  6. releases the lock (auto via finally)

Resolve currently writes only to DB — surfacing resolved_* in the markdown
table is deferred to v0.29 (the takes-fence renderer's column set is
fixed at # | claim | kind | who | weight | since | source per spec).

Wired into src/cli.ts dispatch + CLI_ONLY allowlist. Help text follows the
project convention (orphans/embed/extract pattern). --dir flag overrides
sync.repo_path config when working outside the configured brain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 MCP + auth: takes_list / takes_search / think ops + per-token allow-list

OperationContext gains takesHoldersAllowList — server-side filter for
takes.holder field threaded from access_tokens.permissions through dispatch
into the engine SQL. Closes Codex P0 #3 at the dispatch layer (chunker
strip already closed the page-content side in the previous commit).

src/core/operations.ts — three new ops:
- takes_list: lists takes with holder/kind/active/resolved filters; honors
  ctx.takesHoldersAllowList for MCP-bound calls
- takes_search: pg_trgm keyword search; honors allow-list
- think: op surface registered (returns not_implemented envelope until
  Lane D's pipeline lands). Remote callers cannot save/take per Codex P1 #7.

src/mcp/dispatch.ts — DispatchOpts.takesHoldersAllowList threads into
buildOperationContext.

src/mcp/http-transport.ts — validateToken now reads
access_tokens.permissions.takes_holders, defaults to ['world'] when the
column is absent or malformed (default-deny on private hunches).
auth.takesHoldersAllowList passed to dispatchToolCall.
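
The default-deny parsing and the downstream filter can be sketched as below. This is an illustration, not the code in http-transport.ts: `parseTakesHolders` and `filterByAllowList` are hypothetical names, and treating an empty array as a valid (match-nothing) list is an assumption of the sketch.

```typescript
// Read takes_holders from a permissions value of unknown shape.
// Anything absent or malformed falls back to ['world'] (default-deny).
export function parseTakesHolders(permissions: unknown): string[] {
  const fallback = ["world"];
  if (typeof permissions !== "object" || permissions === null) return fallback;
  const holders = (permissions as { takes_holders?: unknown }).takes_holders;
  if (!Array.isArray(holders)) return fallback;
  if (!holders.every((h) => typeof h === "string" && h.length > 0)) return fallback;
  return [...holders] as string[];
}

// Apply the allow-list to a list of takes. An undefined list models the
// local-CLI path (no filter); a list with no overlap returns nothing.
export function filterByAllowList<T extends { holder: string }>(
  takes: T[],
  allowList?: string[],
): T[] {
  if (allowList === undefined) return takes;
  return takes.filter((take) => allowList.includes(take.holder));
}
```

In the real change the filter runs in engine SQL rather than in memory; the in-memory version here only shows the intended semantics.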

src/mcp/server.ts (stdio) — defaults to takesHoldersAllowList: ['world']
since stdio has no per-token auth. Operators wanting full visibility use
`gbrain call <op>` directly (sets remote=false).

src/commands/auth.ts — `gbrain auth create <name> --takes-holders w,g,b`
flag persists the per-token list; new `auth permissions <name>
set-takes-holders <list>` updates an existing token.

Tests: test/takes-mcp-allowlist.test.ts — 8 cases against PGLite proving
the threading: local-CLI sees all holders, ['world'] returns only public,
['world','garry'] returns 2/3, no-overlap returns empty (no fallback),
search honors allow-list, remote save/take on think rejected with
not_implemented envelope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28.0: ship-prep — VERSION, CHANGELOG, migration orchestrator, skill

Closes the v0.28 ship-prep cycle. Bumps VERSION + package.json + bun.lock
to 0.28.0. v0_28_0 migration orchestrator runs three idempotent phases on
upgrade:

- Schema verify: asserts schema_version >= 32 (migrations v31 + v32 already
  applied by the schema runner during gbrain upgrade); fails clean if not.
- Backfill takes: runs `extractTakes(engine, { source: 'db' })` inline so
  any pre-existing fenced takes tables in markdown populate the takes
  index. Idempotent; ON CONFLICT DO UPDATE keeps the table in sync.
- Re-chunk TODO: queues a pending-host-work entry asking the host agent
  to re-import pages with takes content so the v0.28 chunker-strip rule
  (Codex P0 #3 fix) applies retroactively. Pages imported under v0.28+
  already have takes content stripped from chunks at index time; this
  TODO catches up legacy pages.

skills/migrations/v0.28.0.md — agent-readable upgrade guide. Walks
through doctor verification, deprecated-key migration, MCP token
visibility configuration, and a "try the takes layer" smoke test.

CHANGELOG.md — v0.28.0 release-summary in the GStack voice (no AI
vocabulary, no em dashes, real numbers from git diff stat) + the
mandatory "To take advantage of v0.28.0" block + itemized changes by
subsystem (schema, engine, markdown surface, model config, MCP+auth,
CLI, tests, accepted risks).

Final test sweep: 65/65 v0.28 tests pass across 6 files. typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 think pipeline: gather → sanitize → synthesize → cite-render → CLI

src/core/think/sanitize.ts — prompt-injection defense for take claims:
14 jailbreak patterns (ignore-prior, role-jailbreak, close-take tag,
DAN, system-prompt overrides, eval-shell hooks) plus structural framing
(takes wrapped in <take id="..."> tags the model is told to treat as
DATA). Length-cap at 500 chars. Renders evidence blocks for the prompt.

src/core/think/prompt.ts — system prompt + structured-output schema.
Hard rules: cite every claim, mark hunches/low-weight explicitly,
surface conflicts (never silently pick), surface gaps. JSON schema
with answer + citations[] + gaps[]. Prompt adapts to anchor / time
window / save flag.

src/core/think/cite-render.ts — structured citations + regex fallback
(Codex P1 #4 fold). normalizeStructuredCitations validates the model's
structured output; parseInlineCitations is the body-scan fallback when
the model omits the structured field. resolveCitations dispatches and
records CITATIONS_REGEX_FALLBACK warning when used.

src/core/think/gather.ts — 4-stream parallel retrieval:
  1. hybridSearch (pages, existing primitive)
  2. searchTakes (keyword, pg_trgm)
  3. searchTakesVector (vector, when embedQuestion fn supplied)
  4. traversePaths (graph, when --anchor set)
RRF fusion (k=60). Each stream wrapped in try/catch — partial gather
beats no synthesis. Honors takesHoldersAllowList for MCP-bound calls.
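
For reference, reciprocal rank fusion with k=60 gives each item a score of the sum over streams of 1 / (k + rank), with rank starting at 1. A minimal sketch follows; `rrfFuse` is a hypothetical name, and it assumes each stream is an already-ranked list of unique IDs.

```typescript
// Fuse several ranked ID lists into one list ordered by RRF score.
export function rrfFuse(
  streams: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const stream of streams) {
    stream.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Usage: a stream that failed contributes an empty list, so the
// remaining streams still produce a fused ranking.
// rrfFuse([pageHits, keywordHits, [], graphHits]);
```

An item that appears in several streams accumulates score from each, which is why it can outrank an item that is first in only one stream.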

src/core/think/index.ts — runThink orchestrator + persistSynthesis:
INTENT (regex classify) → GATHER → render evidence blocks → resolveModel
('models.think' → 'models.default' → GBRAIN_MODEL → opus) → LLM call
(injectable client) → JSON parse with code-fence + fallback strip →
resolveCitations → ThinkResult. persistSynthesis writes a synthesis
page + synthesis_evidence rows (page_id resolved per slug; page-level
citations skip evidence). Degrades gracefully without ANTHROPIC_API_KEY.
Round-loop scaffolding in place (rounds=1 only path exercised in v0.28).

src/commands/think.ts — `gbrain think "<question>"` CLI. Flag parsing
strips --anchor, --rounds, --save, --take, --model, --since, --until,
--json. Local CLI = remote=false, so save/take honored. Human-readable
output by default; --json for agent consumption.

operations.ts — `think` op now calls runThink (was a not_implemented
stub). Remote callers can't save/take per Codex P1 #7. Returns full
ThinkResult plus saved_slug + evidence_inserted.

cli.ts — wired into dispatch + CLI_ONLY allowlist.

Tests: test/think-pipeline.test.ts — 18 cases against PGLite covering
sanitize patterns, structural rendering, citation parsing (structured +
regex fallback + dedup + invalid-slug rejection), gather streams +
allow-list filter, full pipeline with stub client, malformed-LLM
fallback path, no-API-key graceful degradation, persistSynthesis writes
page + evidence rows. All pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 dream phases: auto-think + drift + budget meter (Codex P1 #10 fold)

src/core/anthropic-pricing.ts — USD/1M-tokens map for Claude 4.7 family
plus older aliases. estimateMaxCostUsd returns null on unpriced models so
the meter caller can warn-once and bypass the gate.

src/core/cycle/budget-meter.ts — cumulative cost ledger. Each submit
estimates max-cost from (model + estimatedInputTokens + maxOutputTokens),
accumulates per-cycle, refuses next submit when projected > cap. Codex
P1 #10 fold: non-Anthropic models (gemini, gpt) bypass with one stderr
warn per process and `unpriced=true` on the result. Budget=0 disables
the gate. Audit trail at ~/.gbrain/audit/dream-budget-YYYY-Www.jsonl.
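
The gate logic described above can be sketched like this. It is an illustration under stated assumptions: the `BudgetMeterSketch` class, its method names, and the pricing shape are hypothetical, prices are supplied by the caller rather than taken from any real price list, and the audit ledger and warn-once behavior are omitted.

```typescript
type Pricing = { inputPerMTokUsd: number; outputPerMTokUsd: number };
type SubmitDecision = { allowed: boolean; unpriced: boolean; projectedUsd: number };

export class BudgetMeterSketch {
  private spentUsd = 0;
  private readonly capUsd: number;
  private readonly pricing: Record<string, Pricing>;

  constructor(capUsd: number, pricing: Record<string, Pricing>) {
    this.capUsd = capUsd;
    this.pricing = pricing;
  }

  // Worst-case cost of one call, or null when the model has no price entry.
  estimateMaxCostUsd(model: string, inputTokens: number, maxOutputTokens: number): number | null {
    const price = this.pricing[model];
    if (price === undefined) return null;
    return (
      (inputTokens / 1_000_000) * price.inputPerMTokUsd +
      (maxOutputTokens / 1_000_000) * price.outputPerMTokUsd
    );
  }

  // Pre-check a submit: allow and accumulate, or refuse if the cap would be exceeded.
  submit(model: string, inputTokens: number, maxOutputTokens: number): SubmitDecision {
    const estimate = this.estimateMaxCostUsd(model, inputTokens, maxOutputTokens);
    if (estimate === null) {
      // Unpriced model: bypass the gate and flag the result.
      return { allowed: true, unpriced: true, projectedUsd: this.spentUsd };
    }
    const projectedUsd = this.spentUsd + estimate;
    if (this.capUsd !== 0 && projectedUsd > this.capUsd) {
      return { allowed: false, unpriced: false, projectedUsd };
    }
    this.spentUsd = projectedUsd; // budget 0 disables the gate but still tracks spend
    return { allowed: true, unpriced: false, projectedUsd };
  }
}
```

Estimating from `maxOutputTokens` rather than actual usage makes the meter conservative: it can refuse a submit that would have fit, but it should not overshoot the cap.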

src/core/cycle/auto-think.ts — auto_think dream phase. Reads
dream.auto_think.{enabled,questions,max_per_cycle,budget,cooldown_days,
auto_commit}. Iterates configured questions through runThink with the
BudgetMeter pre-checking each submit. Cooldown timestamp written ONLY on
success (matches v0.23 synthesize pattern — retries after partial
failures pick back up). When auto_commit=true, persists synthesis pages
via persistSynthesis. Default-disabled.

src/core/cycle/drift.ts — drift dream phase scaffold. Reads
dream.drift.{enabled,lookback_days,budget,auto_update}. Surfaces takes
in the soft band (weight 0.3-0.85, unresolved) that have recent timeline
evidence on the same page. v0.28 ships the orchestration; the LLM judge
that proposes weight adjustments lands in v0.29. modelId + meter wired
now so the ledger captures gate state for callers that opt in.

Tests:
- test/budget-meter.test.ts (7 cases) — pricing-map coverage, allow path,
  cumulative-deny, budget=0 disabled, unpriced bypass+warn-once, ledger
  captures all events, ISO-week filename branch.
- test/auto-think-phase.test.ts (9 cases) — auto_think enable/skip,
  questions empty, success → cooldown ts written, cooldown blocks rerun,
  budget exhausted → partial. drift not_enabled, soft-band candidate
  detection, complete + dry-run paths.

All pass. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 e2e Postgres: takes engine + extract + MCP allow-list (12 cases)

test/e2e/takes-postgres.test.ts — full v0.28 takes pipeline against real
Postgres (gated on DATABASE_URL). 12 cases:
- addTakesBatch upsert via unnest() bind path (Postgres-specific)
- listTakes filters: holder, kind, sort=weight, takesHoldersAllowList
- searchTakes pg_trgm + allow-list filter
- supersedeTake transactional path (BEGIN/COMMIT semantics)
- resolveTake immutability — second resolve throws TAKE_ALREADY_RESOLVED
- synthesis_evidence FK CASCADE on take delete
- countStaleTakes + listStaleTakes filter active+null
- extractTakesFromDb populates takes from fenced markdown
- MCP dispatch with takesHoldersAllowList=['world'] returns only world
- MCP dispatch local-CLI path returns all holders
- MCP dispatch takes_search honors allow-list
- think op forces remote_persisted_blocked even for save+take

postgres-engine.ts: addTakesBatch boolean[] serialization fix.
postgres-js auto-detects element type from JS arrays; for booleans it
mis-detects as scalar. Cast through text[] (`'true' | 'false'`) then
SQL-cast to boolean[] — same pattern other batch methods rely on for
type-stable bind shapes.

test/e2e/helpers.ts: setupDB now (a) tolerates non-existent tables in
TRUNCATE (for fresh DBs where v31 hasn't yet created takes/synthesis_evidence)
and (b) calls engine.initSchema() to actually run migrations.

test/takes-mcp-allowlist.test.ts: updated 2 think-op cases to match
Lane D's landed pipeline. They previously asserted not_implemented
envelopes; now they assert remote_persisted_blocked + NO_ANTHROPIC_API_KEY
graceful-degrade behavior.

Run: DATABASE_URL=postgres://localhost:5435/gbrain_test bun test test/e2e/takes-postgres.test.ts
Result: 12/12 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 dream phases: local DreamPhaseResult type (avoid premature CyclePhase enum extension)

cycle.ts's PhaseResult is shaped {phase, status, summary, details} with a
narrow PhaseStatus enum ('ok'|'warn'|'fail'|'skipped') and CyclePhase enum
that doesn't yet include 'auto_think'/'drift'. The phases ship standalone
in v0.28 (cycle.ts dispatcher integration is v0.28.x); using PhaseResult
forced premature enum extension.

Introduces DreamPhaseResult exported from auto-think.ts:
  { name: 'auto_think'|'drift'; status: 'complete'|'partial'|'failed'|'skipped';
    detail: string; totals?: Record<string,number>; duration_ms: number }

drift.ts re-exports the same type. When v0.28.x wires the dispatcher, the
adapter at the call site can map DreamPhaseResult → PhaseResult cleanly.
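
One possible shape for that adapter, as a sketch only. The `DreamPhaseResult` fields are copied from above; the `PhaseResult` field types and the status mapping (complete to ok, partial to warn, failed to fail) are assumptions, since this message does not specify them.

```typescript
type DreamPhaseResult = {
  name: "auto_think" | "drift";
  status: "complete" | "partial" | "failed" | "skipped";
  detail: string;
  totals?: Record<string, number>;
  duration_ms: number;
};

type PhaseStatus = "ok" | "warn" | "fail" | "skipped";

// Field names from cycle.ts as described above; field types are assumed.
type PhaseResult = {
  phase: string;
  status: PhaseStatus;
  summary: string;
  details: Record<string, unknown>;
};

// Assumed mapping between the two status vocabularies.
const STATUS_MAP: Record<DreamPhaseResult["status"], PhaseStatus> = {
  complete: "ok",
  partial: "warn",
  failed: "fail",
  skipped: "skipped",
};

export function toPhaseResult(result: DreamPhaseResult): PhaseResult {
  return {
    phase: result.name,
    status: STATUS_MAP[result.status],
    summary: result.detail,
    details: { totals: result.totals ?? {}, duration_ms: result.duration_ms },
  };
}
```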

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 e2e: access_tokens.permissions JSONB end-to-end (5 cases)

test/e2e/auth-permissions.test.ts — closes the v0.28 token-allow-list
verification loop against real Postgres. Exercises:

- Migration v32 default backfill: new tokens created without a permissions
  column get {takes_holders: ["world"]} via the schema DEFAULT clause.
- Explicit ["world","garry"] → dispatch.takes_list filters to those
  holders only; brain hunches stay hidden from this token.
- ["world"] default-deny token → takes_search hits filtered to public claims.
- {} permissions row (operator tampered) gracefully defaults to ["world"]
  via the HTTP transport's validateToken parsing.
- revoked_at IS NOT NULL → token excluded from active token query.

Avoids the postgres-js JSONB double-encode trap (CLAUDE.md memory): pass
the object directly to executeRaw, no JSON.stringify, no ::jsonb cast.

All 5 pass against pgvector/pgvector:pg16 on port 5435. Combined v0.28
test sweep: 116/116 across 11 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28 e2e: chunker takes-strip integration test (Codex P0 #3 verification)

test/e2e/chunker-takes-strip.test.ts — verifies the chunker actually
strips fenced takes content end-to-end through the import pipeline.
This is the Codex P0 #3 fix's verification path: takes content lives
ONLY in the takes table for retrieval, never duplicated in
content_chunks where the per-token MCP allow-list cannot reach.

5 cases:
- chunkText (unit) output never contains TAKES_FENCE_BEGIN/END markers
- chunkText output never contains fenced claim text
- chunkText output retains non-fence prose (no over-stripping)
- importFromContent end-to-end: imported page has chunks but none
  contain fenced content
- takes_fence_chunk_leak doctor invariant: zero rows globally where
  chunk_text matches `<!--- gbrain:takes:%`
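
A rough sketch of the kind of fence stripping these cases pin down. The marker strings below are placeholders (the actual TAKES_FENCE_BEGIN/END values are not shown in this log), and the function name is hypothetical; dropping the tail of an unterminated fence is an assumption chosen to fail closed.

```typescript
// Placeholder markers; the real TAKES_FENCE_BEGIN/END constants may differ.
const FENCE_BEGIN = "<!-- takes:begin -->";
const FENCE_END = "<!-- takes:end -->";

function stripFencedBlocks(
  text: string,
  begin: string = FENCE_BEGIN,
  end: string = FENCE_END,
): string {
  let out = "";
  let cursor = 0;
  while (cursor < text.length) {
    const start = text.indexOf(begin, cursor);
    if (start === -1) {
      // No more fences: keep the remaining prose untouched.
      out += text.slice(cursor);
      break;
    }
    out += text.slice(cursor, start);
    const stop = text.indexOf(end, start + begin.length);
    // Assumption: an unterminated fence drops everything after the begin marker.
    if (stop === -1) break;
    cursor = stop + end.length;
  }
  return out;
}
```

The three unit-level properties above map directly onto this: no markers in the output, no fenced text in the output, and prose outside the fence preserved.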

Final v0.28 test sweep:
  121 pass, 0 fail, 336 expect() calls, 12 files
  Coverage: schema migrations, engine methods (PGLite + Postgres),
  takes-fence parser, page-lock, extract phase, takes CLI engine
  surface, model config 6-tier resolver, MCP+auth allow-list,
  think pipeline (gather + sanitize + cite-render + synthesize),
  auto-think + drift + budget meter, JSONB end-to-end, chunker
  strip integration. ~95% of v0.28 surface area covered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix CI: apply-migrations skippedFuture arrays + http-transport SQL mock

Two CI failures from PR #563:

test/apply-migrations.test.ts (2 fails) — `buildPlan` tests assert exact
skippedFuture arrays at fixed installed-version stamps. Adding v0.28.0 to
the migration registry means it shows up in skippedFuture when the test
runs at installed=0.11.1 / installed=0.12.0. Append '0.28.0' to both
hardcoded arrays.

test/http-transport.test.ts (8 fails) — the FakeEngine mock string-prefix
matches `SELECT id, name FROM access_tokens` to return a row. v0.28's
validateToken now selects `SELECT id, name, permissions FROM access_tokens`
to read the per-token takes_holders allow-list. Mock returned [] on the
new query → validateToken treated every token as invalid → 401.

Fix: mock now matches both query shapes. validTokens row gets a default
`{takes_holders: ['world']}` permission injected when caller didn't
supply one (mirrors the migration v33 column DEFAULT). Updated
FakeEngineConfig type to allow tests to pass explicit permissions.

Verification:
  bun test test/apply-migrations.test.ts → 18/18 pass
  bun test test/http-transport.test.ts   → 24/24 pass
  bun run typecheck                       → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix CI: add scope annotations to v0.28 ops (takes_list/takes_search/think)

test/oauth.test.ts enforces an invariant from master's v0.26 OAuth landing:
every Operation must have `scope: 'read' | 'write' | 'admin'`, and any op
flagged `mutating: true` must be 'write' or 'admin'. My v0.28 ops were added
before master shipped v0.26 + the new invariant; the merge surfaced the gap.

Annotations:
- takes_list   → read
- takes_search → read
- think        → write (mutating: true; --save persists synthesis page)
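
For illustration, the invariant described above could be checked with something like the sketch below. The `OperationMeta` shape and the `scopeViolations` name are assumptions; the real check lives in test/oauth.test.ts and may look different.

```typescript
type Scope = "read" | "write" | "admin";

// Hypothetical minimal view of an Operation; only the fields the invariant needs.
interface OperationMeta {
  name: string;
  scope?: Scope;
  mutating?: boolean;
}

function scopeViolations(ops: OperationMeta[]): string[] {
  const problems: string[] = [];
  for (const op of ops) {
    if (op.scope === undefined) {
      problems.push(`${op.name}: missing scope`);
      continue;
    }
    // A mutating op must carry 'write' or 'admin'.
    if (op.mutating === true && op.scope === "read") {
      problems.push(`${op.name}: mutating op must be write or admin`);
    }
  }
  return problems;
}
```

Applied to the three annotations listed above (takes_list and takes_search as read, think as write with mutating set), the function returns an empty list.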

Verification:
  bun test test/oauth.test.ts → 42/42 pass
  bun run typecheck            → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.28.1): export INJECTION_PATTERNS for shared sanitization

The same pattern set protects takes from prompt-injection (think/sanitize.ts)
and now retrieved chat content in the LongMemEval harness. One source of
truth for both surfaces; adding a new pattern in this file automatically
covers benchmarks too.

Existing consumers (sanitizeTakeForPrompt, renderTakesBlock) keep working
unchanged. Verified via test/think-pipeline.test.ts (18 pass, 0 fail).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.28.1): longmemeval harness — reset-in-place over in-memory PGLite

One in-memory PGLiteEngine per benchmark run; TRUNCATE between questions
with runtime-enumerated tables via pg_tables so future schema migrations
don't silently leak across questions. Infrastructure tables (sources,
config, gbrain_cycle_locks, subagent_rate_leases) preserved across resets
so initSchema-seeded rows like sources.'default' survive (FK target for
pages.source_id).
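
A minimal sketch of the reset-in-place idea, assuming the table names come from a query such as `SELECT tablename FROM pg_tables WHERE schemaname = 'public'`. The function name and the exact TRUNCATE form (including CASCADE) are assumptions; only the preserved-table list is taken from the description above.

```typescript
// Infrastructure tables that survive a reset, per the commit description.
const PRESERVED_TABLES = new Set<string>([
  "sources",
  "config",
  "gbrain_cycle_locks",
  "subagent_rate_leases",
]);

// Builds one TRUNCATE statement from tables enumerated at runtime,
// so tables added by future migrations are covered automatically.
function buildResetSql(allTables: string[]): string | null {
  const targets = allTables
    .filter((t) => !PRESERVED_TABLES.has(t))
    // Quote identifiers defensively; embedded quotes are doubled.
    .map((t) => `"${t.replace(/"/g, '""')}"`);
  if (targets.length === 0) return null;
  return `TRUNCATE TABLE ${targets.join(", ")} CASCADE`;
}
```

Because the list is enumerated at call time rather than hardcoded, a newly migrated table is truncated without touching the harness.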

Files:
- src/eval/longmemeval/harness.ts: createBenchmarkBrain + resetTables +
  withBenchmarkBrain. ~50 lines, no class wrapper.
- src/eval/longmemeval/adapter.ts: pure haystackToPages() converter.
  Slug prefix `chat/` (verified non-matching against DEFAULT_SOURCE_BOOSTS).
- src/eval/longmemeval/sanitize.ts: re-uses INJECTION_PATTERNS from
  think/sanitize.ts; wraps each session in <chat_session id date> tags;
  4000-char cap.
- test/longmemeval-sanitize.test.ts: 12 cases pinning the F8 contract.

Hermetic: no DATABASE_URL, no API keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(v0.28.1): gbrain eval longmemeval CLI command

Run the LongMemEval public benchmark against gbrain's hybrid retrieval.
Dataset is a positional path (download from xiaowu0162/longmemeval on HF).
Per-question loop wraps everything in try/catch; one bad question doesn't
kill the run, error JSONL line emitted instead.
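
The per-question isolation can be sketched as below. The `Question` shape, the `Answerer` callback, and the output field names (`question_id`, `hypothesis`, `error`) are illustrative assumptions, not the command's confirmed wire format.

```typescript
interface Question {
  question_id: string;
  question: string;
}

// Stand-in for the retrieval + LLM pipeline; injected so the loop stays testable.
type Answerer = (q: Question) => Promise<string>;

async function runQuestions(
  questions: Question[],
  answer: Answerer,
): Promise<string[]> {
  const lines: string[] = [];
  for (const q of questions) {
    try {
      const hypothesis = await answer(q);
      lines.push(JSON.stringify({ question_id: q.question_id, hypothesis }));
    } catch (err) {
      // One bad question emits an error line instead of aborting the run.
      const message = err instanceof Error ? err.message : String(err);
      lines.push(JSON.stringify({ question_id: q.question_id, error: message }));
    }
  }
  return lines;
}
```

Each returned string is one JSONL line; joining with "\n" gives the LF-delimited output.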

Wiring:
- src/cli.ts: pre-dispatch bypass for `eval longmemeval` so the user's
  ~/.gbrain brain is never opened. Hermeticity gate verified: --help works
  on machines with no gbrain config.
- src/commands/eval-longmemeval.ts: arg parsing, JSONL emit (LF + UTF-8
  pinned), hybridSearch with optional expandQuery from search/expansion.ts,
  resolveModel from model-config.ts (6-tier chain), ThinkLLMClient injection
  seam from think/index.ts, structural <chat_session> framing.
- test/eval-longmemeval.test.ts: 12 cases covering harness lifecycle,
  reset clears all tables, schema-migration robustness, p50/p99 speed gate
  (warm reset+import+search target <500ms), adapter shape, source-boost
  regression guard, end-to-end with stubbed LLM, JSONL format guard,
  per-question failure handling.
- test/fixtures/longmemeval-mini.jsonl: 5 hand-authored questions with
  keyword-friendly overlap so --keyword-only works in CI.

Speed: warm reset+import 5 pages+search p50=25.9ms p99=30.3ms locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(v0.28.1): bump VERSION + CHANGELOG

VERSION + package.json synchronized at 0.28.1. CHANGELOG entry uses the
release-summary voice + "To take advantage of v0.28.1" block per CLAUDE.md.

Sequential release on garrytan/v0.28-release; lands after v0.28.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: surface v0.28.1 LongMemEval CLI across project docs

- README.md: add EVAL section to Commands reference (eval --qrels, export,
  prune, replay, longmemeval); add v0.28.1 announce paragraph next to the
  v0.25.0 BrainBench-Real intro.
- CLAUDE.md: add Key files entry for src/eval/longmemeval/ +
  src/commands/eval-longmemeval.ts; add "Key commands added in v0.28.1"
  subsection (mirrors the v0.26.5 / v0.25.0 pattern); inventory
  test/eval-longmemeval.test.ts + test/longmemeval-sanitize.test.ts under
  the unit-test list.
- docs/eval-bench.md: cross-link from the "What it actually does" section
  to LongMemEval as the third evaluation axis (public benchmark,
  ground-truth labels, full QA pipeline); append "Public benchmarks:
  LongMemEval (v0.28.1)" section with architecture, flags table, and
  perf numbers.
- CONTRIBUTING.md: append a paragraph after the eval-replay block pointing
  contributors at gbrain eval longmemeval for public-benchmark coverage.
- AGENTS.md: extend the existing eval-retrieval bullet with a one-line
  mention of gbrain eval longmemeval.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.28.2 feat: remote-source MCP + scope hierarchy + whoami (#690)

* refactor(core): extract SSRF helpers from integrations.ts to core/url-safety.ts

src/core/git-remote.ts (next commit) needs isInternalUrl etc. but importing
from src/commands/ would invert the layering boundary (no existing
src/core/ file imports from src/commands/). Extract the SSRF helpers
(parseOctet, hostnameToOctets, isPrivateIpv4, isInternalUrl) into a new
src/core/url-safety.ts and have integrations.ts re-export for backward
compat. test/integrations.test.ts continues to pass without changes (110
existing tests, 214 expects).

Why this matters for v0.28: the upcoming sources --url feature reuses
this SSRF gate for git-clone URL validation. Codex review caught that
re-rolling weaker URL classification would regress on the IPv6/v4-mapped/
metadata/CGNAT bypass forms that integrations.ts already handles.
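
To make the classification concrete, here is a deliberately simplified sketch of a private-IPv4 check that includes the CGNAT range. It handles plain dotted-decimal only; the hex, octal, single-integer, and IPv6 forms mentioned above are out of scope here, and the function names are hypothetical rather than the ones in url-safety.ts.

```typescript
// Parses "a.b.c.d" with decimal octets; returns null for anything else.
function parseDottedQuad(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets: number[] = [];
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p)) return null;
    const n = Number(p);
    if (n > 255) return null;
    octets.push(n);
  }
  return octets;
}

function isPrivateIpv4Sketch(host: string): boolean {
  const o = parseDottedQuad(host);
  if (o === null) return false;
  const [a = 0, b = 0] = o;
  if (a === 0 || a === 10 || a === 127) return true; // 0/8, 10/8, loopback
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16/12
  if (a === 192 && b === 168) return true; // 192.168/16
  if (a === 169 && b === 254) return true; // link-local, cloud metadata
  if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT 100.64/10
  return false;
}
```

The CGNAT branch is the one that matters for Tailscale-style setups: `100.64.0.1` is classified as internal while `100.128.0.1` is not.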

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): add git-remote module — SSRF-defensive clone/pull + state probe

New src/core/git-remote.ts (~210 lines) for v0.28's remote-source feature:

- GIT_SSRF_FLAGS exported const: -c http.followRedirects=false,
  -c protocol.file.allow=never, -c protocol.ext.allow=never,
  --no-recurse-submodules. Single source of truth shared by cloneRepo
  and pullRepo so a future flag added to one path lands on both.
  Closes the SSRF surfaces codex flagged: DNS rebinding via redirects,
  .gitmodules as a second-fetch surface, file:// scheme in remotes.

- parseRemoteUrl: https-only, rejects embedded credentials and path
  traversal, delegates internal-target classification to isInternalUrl
  from url-safety.ts (covers RFC1918, link-local, loopback, IPv6, CGNAT
  100.64/10, metadata hostnames, hex/octal/single-int bypass forms).
  GBRAIN_ALLOW_PRIVATE_REMOTES=1 escape hatch with stderr warning is
  needed for self-hosted git over Tailscale (CGNAT trips the gate).

- cloneRepo: --depth=1 default (full clone via depth: 0); refuses
  non-empty destDirs; spawns git via execFileSync (no shell injection)
  with GIT_TERMINAL_PROMPT=0 + askpass=/bin/false to prevent credential
  prompts. timeoutMs default 600s.

- pullRepo: -C path + GIT_SSRF_FLAGS + pull --ff-only, same env confine.

- validateRepoState: 6-state decision tree (missing | not-a-dir |
  no-git | corrupted | url-drift | healthy). Used by performSync's
  re-clone branch to recover from rm'd clone dirs and refuse syncs on
  url-drift or corruption.
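
A sketch of how one shared flag set can feed both argv builders. The flag values come from the description above; the split into config flags and subcommand flags, the argument ordering, and the function names are assumptions for illustration and may not match git-remote.ts.

```typescript
// `-c key=value` pairs go before the subcommand.
const SSRF_CONFIG_FLAGS: readonly string[] = [
  "-c", "http.followRedirects=false",
  "-c", "protocol.file.allow=never",
  "-c", "protocol.ext.allow=never",
];
// Subcommand-level flag shared by clone and pull.
const SSRF_SUBCOMMAND_FLAGS: readonly string[] = ["--no-recurse-submodules"];

function cloneArgv(url: string, destDir: string, depth: number = 1): string[] {
  // depth 0 means a full clone, so no --depth argument is emitted.
  const depthArgs = depth > 0 ? [`--depth=${depth}`] : [];
  return [
    ...SSRF_CONFIG_FLAGS,
    "clone",
    ...SSRF_SUBCOMMAND_FLAGS,
    ...depthArgs,
    "--", // stop option parsing before caller-supplied values
    url,
    destDir,
  ];
}

function pullArgv(repoDir: string): string[] {
  return [
    "-C", repoDir,
    ...SSRF_CONFIG_FLAGS,
    "pull",
    ...SSRF_SUBCOMMAND_FLAGS,
    "--ff-only",
  ];
}
```

Passing the resulting array to an exec-style API that takes argv (rather than a shell string) keeps the URL from being interpreted by a shell.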

test/git-remote.test.ts (304 lines, 32 tests): GIT_SSRF_FLAGS exact
shape, all parseRemoteUrl rejection cases including dedicated CGNAT
100.64/10 with/without GBRAIN_ALLOW_PRIVATE_REMOTES (codex T3 case),
fake-git harness for argv assertions on cloneRepo/pullRepo, all 6
validateRepoState branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): add scope hierarchy + ALLOWED_SCOPES allowlist

New src/core/scope.ts (~120 lines) for v0.28's scoped MCP feature.

Hierarchy:
  - admin implies all (escape hatch)
  - write implies read
  - sources_admin and users_admin are siblings (different axes —
    sources-mgmt vs user-account-mgmt; neither implies the other)

Exported:
  - hasScope(grantedScopes, requiredScope): the canonical scope check.
    Replaces exact-string-match at three call sites in upcoming commits
    (serve-http.ts:673, oauth-provider.ts:365 F3 refresh, oauth-provider.ts:498
    token issuance). Without this rewrite, an admin-grant token would
    fail to refresh down to sources_admin (codex finding).
  - ALLOWED_SCOPES set + ALLOWED_SCOPES_LIST sorted array (deterministic
    for OAuth metadata wire format and drift-check output).
  - assertAllowedScopes / InvalidScopeError: registration-time gate so
    tokens with bogus scope strings (read flying-unicorn) get rejected
    with RFC 6749 §5.2 invalid_scope at auth.ts:296 + DCR /register +
    registerClientManual. Today's behavior accepts any string silently.
  - parseScopeString: space-separated wire format → array.

Forward-compat: hasScope ignores unknown granted scopes rather than
throwing, so pre-allowlist tokens with weird scope strings continue
working without crashes (registration is the gate, runtime is best-effort).
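
The hierarchy above can be sketched as a small implication table. This is an illustration under stated assumptions, not the contents of src/core/scope.ts: the names `IMPLIED`, `hasScopeSketch`, and `parseScopeStringSketch` are hypothetical, and the table encodes only the three rules listed (admin implies all, write implies read, the two `*_admin` scopes imply nothing else).

```typescript
// Each granted scope maps to the set of scopes it satisfies.
const IMPLIED = new Map<string, readonly string[]>([
  ["admin", ["admin", "read", "write", "sources_admin", "users_admin"]],
  ["write", ["write", "read"]],
  ["read", ["read"]],
  ["sources_admin", ["sources_admin"]],
  ["users_admin", ["users_admin"]],
]);

function hasScopeSketch(granted: readonly string[], required: string): boolean {
  // Unknown granted scopes are ignored rather than throwing.
  return granted.some((g) => (IMPLIED.get(g) ?? []).includes(required));
}

// Space-separated wire format to array; undefined, null, and empty give [].
function parseScopeStringSketch(raw: string | null | undefined): string[] {
  if (raw === null || raw === undefined) return [];
  return raw.split(/\s+/).filter((s) => s.length > 0);
}
```

A Map is used instead of a plain object so that a granted scope such as `constructor` cannot resolve to an inherited property.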

test/scope.test.ts (178 lines, 35 tests): hierarchy table including
all-implies for admin, sibling non-implication of *_admin scopes,
write→read but not the reverse, F3 refresh-token subset semantics
under hasScope, ALLOWED_SCOPES_LIST sorted-pinning, allowlist
rejection cases, parseScopeString edge cases (undefined/null/empty).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build(admin): scope-constants mirror + drift CI for src/core/scope.ts

The admin React SPA's tsconfig.json scopes include: ['src'] to admin/src/,
so it cannot directly import ../../src/core/scope.ts. The plan considered
widening the include or generating a single source of truth; both options
either couple the SPA to the gbrain monorepo or add a build step. Eng
review picked the boring choice: hand-maintained mirror at
admin/src/lib/scope-constants.ts plus a CI drift check.

Files:
  - admin/src/lib/scope-constants.ts: hand-maintained ALLOWED_SCOPES_LIST
    duplicate, sorted alphabetically to match src/core/scope.ts.
  - scripts/check-admin-scope-drift.sh: extracts the list from each file
    via awk, normalizes via tr/sort, diffs. Exits 0 on match, 1 on drift
    (with full breakdown of which scopes diverged), 2 on internal error.
    Tested both passing and corrupted paths.
  - package.json: wires check:admin-scope-drift into both `verify` and
    `check:all` so any update to src/core/scope.ts that forgets the
    admin-side mirror fails the build.

The Agents.tsx scope-checkbox sites (5 hardcoded locations) get updated
in a later commit to import from this constants file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(oauth): hasScope hierarchy + ALLOWED_SCOPES allowlist at registration

Switch three call sites in oauth-provider.ts from exact-string-match to
hasScope() so the v0.28 sources_admin and users_admin scopes — and the
admin-implies-all + write-implies-read hierarchy in src/core/scope.ts —
work end to end:

- F3 refresh-token subset enforcement at line 365: previously rejected
  admin → sources_admin refresh because exact-match treated them as
  unrelated scopes. gstack /setup-gbrain Path 4 needs admin tokens to
  refresh down to least-privilege sources_admin scope; this fix lands
  that path.

- Token issuance intersection at line 498 (client_credentials grant):
  same hasScope swap so a client whose stored grant is `admin` can mint
  tokens including any implied scope.

- registerClient (DCR /register) and registerClientManual: validate
  every scope string against ALLOWED_SCOPES via assertAllowedScopes.
  Pre-fix the system silently accepted `--scopes "read flying-unicorn"`
  and persisted the bogus string in oauth_clients.scope. Post-fix the
  caller gets RFC 6749 §5.2 invalid_scope. Existing rows with
  pre-allowlist scopes keep working (allowlist gates registration only).

Tests amended in test/oauth.test.ts:
- T1 (eng-review): admin grant CAN refresh down to sources_admin
- T1 sibling: write grant CANNOT refresh up to sources_admin
- ALLOWED_SCOPES allowlist coverage (manual + DCR paths, all 5 valid)
- Scope-annotation contract tests widened to accept the v0.28 union

62 OAuth tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(serve-http): hasScope at /mcp + advertise full ALLOWED_SCOPES

Two changes against src/commands/serve-http.ts:

- Line 195: scopesSupported on the mcpAuthRouter options switches from the
  hardcoded ['read','write','admin'] to Array.from(ALLOWED_SCOPES_LIST).
  Without this, /.well-known/oauth-authorization-server keeps reporting
  the old triple, so MCP clients (Claude Desktop, ChatGPT, Perplexity)
  cannot discover the v0.28 sources_admin and users_admin scopes via
  standard discovery — they would have to be pre-configured out of band.

- Line 673: request-time scope check on /mcp swaps
  authInfo.scopes.includes(requiredScope) for hasScope(...). This was
  the most-cited codex finding: without it, sources_admin tokens could
  not even satisfy a `read`-scoped op (sources_admin doesn't include
  the literal string "read"). hasScope routes through the hierarchy
  table in src/core/scope.ts so admin implies all and write implies
  read at the gate too.

T2 amendment in test/e2e/serve-http-oauth.test.ts: assert
/.well-known/oauth-authorization-server includes all 5 scopes in
scopes_supported. Pre-v0.28 the list was hardcoded to ['read','write',
'admin'] and this assertion would have failed. (The test is
Postgres-gated; runs under bun run test:e2e with DATABASE_URL set.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(core): sources-ops module — atomic clone + symlink-safe cleanup

src/core/sources-ops.ts (~470 lines): pure async functions extracted from
src/commands/sources.ts so the CLI handlers and the new MCP ops share
one implementation.

addSource: D3 atomicity contract from the eng review.
  1. Validate id (matches existing SOURCE_ID_RE).
  2. Q4 pre-flight SELECT — fail loudly with structured `source_id_taken`
     before any clone work. Pre-fix the existing CLI used INSERT…ON
     CONFLICT DO NOTHING which silently no-op'd; with clone-first that
     would orphan the temp dir.
  3. parseRemoteUrl gate (delegates to isInternalUrl from url-safety.ts).
  4. Clone into $GBRAIN_HOME/clones/.tmp/<id>-<rand>/ via the new
     git-remote helpers.
  5. INSERT row with local_path=<final clone dir>, config.remote_url=<url>.
  6. fs.renameSync(tmp/, final/). Rollback on either-side failure unlinks
     the temp dir; rename-failed path also DELETEs the just-INSERTed row
     best-effort.

removeSource: clone-cleanup with realpath+lstat confinement matching
validateUploadPath() shape at src/core/operations.ts:61. String startsWith
is symlink-unsafe and would let $GBRAIN_HOME/clones/<id> → /etc resolve
out of the confine. Two defenses layered:
  - isPathContained (realpath-resolves both sides + parent-with-sep
    string check) rejects symlinks whose target falls outside the
    confine.
  - lstat-then-isSymbolicLink check refuses symlinks whose realpath
    happens to land back inside the confine (defense in depth).
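
A sketch of the realpath-based containment check, with the resolver injected so the logic can be exercised without touching the filesystem. The signature is an assumption (the real isPathContained may take different arguments); in production the resolver would be something like Node's `fs.realpathSync`.

```typescript
type Realpath = (p: string) => string;

function isPathContainedSketch(
  candidate: string,
  confine: string,
  realpath: Realpath,
  sep: string = "/",
): boolean {
  let realCandidate: string;
  let realConfine: string;
  try {
    // Resolve both sides so symlinks cannot smuggle the target elsewhere.
    realCandidate = realpath(candidate);
    realConfine = realpath(confine);
  } catch {
    return false; // fail closed when either path cannot be resolved
  }
  // Parent-with-separator check: "/x/clones-other" must not match "/x/clones".
  const prefix = realConfine.endsWith(sep) ? realConfine : realConfine + sep;
  return realCandidate.startsWith(prefix);
}
```

The trailing separator is what distinguishes this from a bare `startsWith` on the unresolved strings.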

getSourceStatus: returns clone_state via validateRepoState (the 6-state
decision tree from git-remote.ts). Lets a remote MCP caller diagnose
"healthy | missing | not-a-dir | no-git | url-drift | corrupted" without
SSH access to the brain host. listSources additionally exposes
remote_url so callers can see which sources are auto-managed.

recloneIfMissing: T4 follow-up for `gbrain sources restore` after the
clone dir was autopurged — re-clones via the same temp + rename
atomicity contract. Idempotent (returns false when clone is already
healthy).

test/sources-ops.test.ts (~470 lines, 24 tests): pre-flight collision
(Q4), happy paths for both --path and --url, all four D3 rollback paths
(clone-fail before INSERT, INSERT-fail after clone, rename-fail
post-INSERT, atomic temp-dir cleanup), symlink-target-OUTSIDE-clones
(realpath confinement), symlink-target-INSIDE-clones (lstat-check),
removeSource refuses to delete user-supplied paths, refuses "default"
source, getSourceStatus clone_state branches, T4 recloneIfMissing
recovery + idempotent + no-op for path-only sources, isPathContained
unit tests covering subtree / outside / symlink-escape / fail-closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(operations): whoami + sources_{add,list,remove,status} MCP ops

Five new ops in src/core/operations.ts auto-flow through src/mcp/tool-defs.ts
so MCP clients (Claude Desktop, ChatGPT, Perplexity, OpenClaw) get them via
standard tools/list discovery — no SDK or transport code changes needed.

Operation.scope union widened to add 'sources_admin' and 'users_admin' (the
v0.28 hierarchy from src/core/scope.ts).

whoami (scope: read): introspect calling identity over MCP.
  - Returns `{transport: 'oauth', client_id, client_name, scopes, expires_at}`
    for OAuth clients (clientId starts with gbrain_cl_).
  - Returns `{transport: 'legacy', token_name, scopes, expires_at: null}`
    for grandfathered access_tokens.
  - Returns `{transport: 'local', scopes: []}` when ctx.remote === false.
    Empty scopes (NOT ['read','write','admin']) is the D2 decision —
    returning OAuth-shaped scopes for local callers would resurrect the
    v0.26.9 footgun where code conditioned trust on
    `auth.scopes.includes('admin')` instead of `ctx.remote === false`.
  - Q3 fail-closed: throws unknown_transport when remote=true AND auth is
    missing OR ctx.remote is the literal `undefined` (cast bypass guard).
    A future transport that forgets to thread auth doesn't get a free
    pass.
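
The transport detection described above might look roughly like this. The `Ctx` and `AuthInfo` shapes are assumptions made for the sketch; only the three return shapes, the `gbrain_cl_` prefix, and the fail-closed `unknown_transport` behavior come from the description.

```typescript
// Hypothetical context shapes; the real OperationContext carries more fields.
interface AuthInfo {
  clientId?: string;
  clientName?: string;
  tokenName?: string;
  scopes: string[];
  expiresAt?: number;
}
interface Ctx {
  remote?: boolean;
  auth?: AuthInfo;
}

type WhoAmI =
  | { transport: "oauth"; client_id: string; client_name: string | null; scopes: string[]; expires_at: number | null }
  | { transport: "legacy"; token_name: string | null; scopes: string[]; expires_at: null }
  | { transport: "local"; scopes: string[] };

function whoamiSketch(ctx: Ctx): WhoAmI {
  // Local callers get empty scopes so nothing can key trust off scope strings.
  if (ctx.remote === false) return { transport: "local", scopes: [] };
  // Fail closed: remote must be literally true AND auth must be present.
  if (ctx.remote !== true || ctx.auth === undefined) {
    throw new Error("unknown_transport");
  }
  const auth = ctx.auth;
  if (auth.clientId !== undefined && auth.clientId.startsWith("gbrain_cl_")) {
    return {
      transport: "oauth",
      client_id: auth.clientId,
      client_name: auth.clientName ?? null,
      scopes: auth.scopes,
      expires_at: auth.expiresAt ?? null,
    };
  }
  return {
    transport: "legacy",
    token_name: auth.tokenName ?? null,
    scopes: auth.scopes,
    expires_at: null,
  };
}
```

Checking `ctx.remote === false` first, with strict equality, is what keeps an `undefined` value from being mistaken for a local caller.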

sources_add (sources_admin, mutating): register a source by --path
  (existing v0.17 behavior) or --url (v0.28 federated remote-clone path).
  Calls into addSource from sources-ops.ts which owns the temp-dir +
  rename atomicity.

sources_list (read): list registered sources with page counts, federated
  flag, and remote_url. The remote_url field is new — lets a remote MCP
  caller see which sources are auto-managed.

sources_remove (sources_admin, mutating): cascade-delete a source +
  symlink-safe clone cleanup. Requires confirm_destructive: true when the
  source has data.

sources_status (read): per-source diagnostic returning clone_state
  ('healthy' | 'missing' | 'not-a-dir' | 'no-git' | 'url-drift' |
  'corrupted' | 'not-applicable') — lets a remote MCP caller diagnose a
  busted clone without SSH access to the brain host.

test/whoami.test.ts (9 tests): pinned transport-detection for all four
return shapes including Q3 fail-closed throw under both auth=undefined
and remote=undefined cast-bypass paths.

test/sources-mcp.test.ts (16 tests): op-metadata pins (scope, mutating,
localOnly), functional handler shape against PGLite, hasScope-driven
scope-enforcement smoke test simulating the serve-http.ts:673 gate
(read-only token rejected for sources_add; sources_admin token allowed;
admin token allowed for everything; gstack /setup-gbrain Path 4 token
covers all 4 ops), SSRF gate at the op layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(sync): re-clone fallback when clone is missing/no-git/corrupted

src/commands/sync.ts gets a v0.28-aware front-half. When the source has
config.remote_url, performSync calls validateRepoState before the existing
fast-forward pull path:

  - 'healthy'    → fall through to existing pull (unchanged)
  - 'missing'    → loud stderr "auto-recovery: re-cloning <id>", then
  'no-git'         recloneIfMissing handles the temp-dir + rename. Sync
  'not-a-dir'      continues from the freshly-cloned head.
  - 'corrupted'  → throw with structured hint pointing at sources remove
                   + add (no syncing wrong state).
  - 'url-drift'  → throw with hint pointing at the (deferred) sources
                   rebase-clone command.
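
The state-to-action mapping above can be written as an exhaustive switch. This is a sketch only: the `SyncAction` type, the function name, and the hint wording are assumptions; the six states and their outcomes follow the table above.

```typescript
type CloneState =
  | "healthy"
  | "missing"
  | "no-git"
  | "not-a-dir"
  | "corrupted"
  | "url-drift";

type SyncAction =
  | { kind: "pull" }
  | { kind: "reclone" }
  | { kind: "refuse"; hint: string };

function planSync(state: CloneState): SyncAction {
  switch (state) {
    case "healthy":
      return { kind: "pull" };
    case "missing":
    case "no-git":
    case "not-a-dir":
      // Auto-recovery: re-clone from the recorded remote URL.
      return { kind: "reclone" };
    case "corrupted":
      return { kind: "refuse", hint: "remove the source and add it again" };
    case "url-drift":
      return { kind: "refuse", hint: "recorded URL differs from the clone's remote" };
  }
}
```

Because the switch is exhaustive over the union, adding a seventh state to `CloneState` makes the compiler flag this function until the new case is handled.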

Closes the operator-confidence gap: rm -rf $GBRAIN_HOME/clones/<id>/ no
longer breaks future syncs. The next sync sees the missing dir and
recovers via the recorded URL.

src/core/operations.ts: extend ErrorCode with 'unknown_transport' so
whoami's Q3 fail-closed path types check.

test/sources-resync-recovery.test.ts (12 tests): full validateRepoState
state matrix exercised under fake-git, recloneIfMissing recovery from
each degraded state, idempotent on healthy clones, the sync.ts:320
integration path that drives the recovery.

test/sources-ops.test.ts + test/sources-mcp.test.ts: drop the
GBRAIN_PGLITE_SNAPSHOT-disable line so these tests stop forcing cold
init across the parallel-shard runner. With snapshot allowed, init time
drops from 6+s to ~50ms and parallel runs stay under the 5s hook
timeout.

test/sources-mcp.test.ts: tighten scope literal-type so tsc keeps the
union narrow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): sources add --url + restore re-clone, thin-wrapper refactor

src/commands/sources.ts now delegates the data-mutation work to
src/core/sources-ops.ts (added in the previous commit). The CLI handler
parses argv, calls into addSource, and formats output.

Two new flags on `gbrain sources add`:
  - `--url <https-url>` : federated remote-clone path (clone + INSERT +
    rename, atomic rollback on failure).
  - `--clone-dir <path>` : override the default
    $GBRAIN_HOME/clones/<id>/ destination.

Validation rejects mutually-exclusive `--url` + `--path`. Errors from
the ops layer (SourceOpError) propagate through the CLI's standard
error wrapper in src/cli.ts so existing tests that assert throw shape
keep passing.

`gbrain sources restore <id>` (T4 from eng review): if the source has a
remote_url AND the on-disk clone was autopurged, call recloneIfMissing
before declaring success. Clone errors print a WARN with recovery
hints rather than failing the restore — the DB row is what restore
guarantees; the clone is best-effort.

54 sources-related tests pass (existing test/sources.test.ts +
sources-ops + sources-mcp).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor,cycle): orphan-clones surface + autopilot purge phase (P1)

addSource's atomicity contract uses a temp dir that gets renamed to the
final clone path. If the process is SIGKILL'd between clone-finish and
rename, the temp dir orphans on disk. Without sweeping these, a brain
server accumulates gigabytes over months of failed `sources add --url`
attempts.

Two layers:

1. `gbrain doctor` now surfaces stale entries. A new orphan_clones check
   walks $GBRAIN_HOME/clones/.tmp/, names anything older than 24h, and
   prints a warn with disk-byte estimate. Operators see the leak before
   `df` complains.

2. The autopilot cycle's existing `purge` phase grows a substep that
   nukes .tmp/ entries past the same 72h TTL the page-soft-delete purge
   uses. Operator behavior stays uniform across all soft-delete-style
   surfaces.

Both layers are filesystem-only (no DB). On a brain that never used
--url cloning, both are no-ops.
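
Both layers reduce to the same age filter with a different threshold. A minimal sketch, assuming the directory listing has already been read into plain records (the `TmpEntry` shape and function names are hypothetical):

```typescript
interface TmpEntry {
  name: string;
  mtimeMs: number; // last-modified time in epoch milliseconds
  sizeBytes: number;
}

const HOUR_MS = 60 * 60 * 1000;

// Returns entries strictly older than the TTL.
function staleEntries(entries: TmpEntry[], nowMs: number, ttlHours: number): TmpEntry[] {
  const cutoff = nowMs - ttlHours * HOUR_MS;
  return entries.filter((e) => e.mtimeMs < cutoff);
}

// Sum used for the doctor warning's disk estimate.
function totalBytes(entries: TmpEntry[]): number {
  return entries.reduce((sum, e) => sum + e.sizeBytes, 0);
}
```

Under this sketch the doctor check would call `staleEntries(entries, Date.now(), 24)` to report, and the purge substep would call it with `72` to decide what to delete.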

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* build(admin): scope checkboxes source from scope-constants mirror + dist

admin/src/pages/Agents.tsx Register Client modal:
  - useState default sources from ALLOWED_SCOPES_LIST (defaulting `read`
    to true, others false; unchanged UX for the common case).
  - Scope checkbox map iterates ALLOWED_SCOPES_LIST instead of the old
    hardcoded ['read','write','admin'].

Without this commit, even with the v0.28.1 server-side scope hierarchy,
operators registering an OAuth client from the admin UI cannot tick the
new sources_admin / users_admin scopes — defeats the whole gstack
/setup-gbrain Path 4 unblock.

The drift-check CI gate (scripts/check-admin-scope-drift.sh) ensures
this list stays in sync with src/core/scope.ts going forward.

admin/dist/* rebuilt via `cd admin && bun run build`. Old hash bundle
removed; new bundle (224.96 kB / 68.70 kB gzip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: v0.28.1 — remote-source MCP + scope hierarchy + whoami

VERSION + package.json: bump to 0.28.1 (per CLAUDE.md branch-scoped
versioning rule — this branch adds substantial new features on top of
v0.28.0).

CHANGELOG.md: new top-level entry for v0.28.1 in the gstack/Garry voice
(no AI vocabulary, no em dashes, real numbers + commands). Lead
paragraph names what the user can now do that they couldn't before.
"Numbers that matter" table calls out the +5 MCP ops, +2 OAuth scopes,
and the 4-to-0 SSH-step number for gstack /setup-gbrain Path 4. "What
this means for you" closer ties the work to the operator workflow shift.
"To take advantage of v0.28.1" block has paste-ready upgrade commands
including the admin SPA rebuild step. Itemized changes section
describes the architecture cleanly without exposing scope-string
internals to public attack-surface enumeration (per CLAUDE.md
responsible-disclosure rule).

TODOS.md: file 6 follow-ups under a new "Remote-source MCP follow-ups
(v0.28.1)" section: token rotation, migration introspection in
get_health, Accept-header friendliness, sources rebase-clone for
URL-drift recovery, --filter=blob:none partial-clone option, and the
chunker_version PGLite-schema parity codex caught.

README.md: short subsection under the existing sources CLI listing
that names the new --url flag and what auto-recovery does. Capability
framing (no scope-string enumeration).

llms.txt + llms-full.txt: regenerated via `bun run build:llms` so the
documentation bundle reflects the v0.28.1 entry. The build-llms
generator's drift check passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): sources-remote-mcp — full gstack /setup-gbrain Path 4 round-trip

Spins up `gbrain serve --http` against real Postgres with a fake-git binary
in PATH (so `git clone` is exercised end-to-end without network), registers
two OAuth clients (sources_admin + read-only), mints tokens, calls the new
v0.28.1 MCP ops via /mcp, and asserts the gstack /setup-gbrain Path 4 flow
works end to end.

12 tests cover the full lifecycle:
- whoami over HTTP MCP returns transport=oauth + the right scopes
- /.well-known/oauth-authorization-server advertises all 5 scopes
- sources_add: clone fires, INSERT lands, row carries config.remote_url
- sources_status: clone_state=healthy after add
- sources_list: surfaces remote_url for the new source
- SSRF rejection: sources_add with RFC1918 URL fails at parseRemoteUrl gate
- Scope enforcement: read-only token gets insufficient_scope on sources_add
- Read-only token CAN call sources_list (read-scoped op)
- ALLOWED_SCOPES allowlist: CLI register-client rejects bogus scope
- Recovery: rm clone dir + sources_status reports clone_state=missing
- sources_remove: cascades + cleans up the auto-managed clone dir

Subprocess env threading replicates the v0.26.2 bun execSync inheritance
pattern — bun does NOT inherit process.env mutations, so every CLI
subprocess call passes env: { ...process.env } explicitly.

Cleanup contract mirrors test/e2e/serve-http-oauth.test.ts: revoke any
clients we registered, force-kill the server subprocess on SIGTERM
timeout, surface cleanup failures to stderr without throwing so real
test failures aren't masked.

The base table list in helpers.ts (ALL_TABLES) doesn't include sources
or oauth_clients, so this test explicitly truncates them in beforeAll
to avoid Q4 pre-flight collisions on re-run.

Skipped gracefully when DATABASE_URL is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: codex adversarial review — confine remote sources_admin + close SSRF gaps

Pre-ship adversarial review (codex exec) caught five issues. Four ship in
this commit; the fifth (DNS rebinding) is filed as v0.28.x follow-up.

CRITICAL — `sources_admin` tokens over HTTP MCP could plant content at any
host path. The MCP op exposed `path` and `clone_dir` to remote callers; the
op layer trusted them verbatim, then auto-recovery's rm -rf on degraded
state turned that into arbitrary delete primitives. src/core/operations.ts
sources_add handler now drops both fields when ctx.remote !== false. Local
CLI keeps the override (operator trust). Loud logger.warn when a remote
caller tries — visible in the SSE feed without leaking values.
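
The remote/local asymmetry can be sketched as a small parameter filter. The `SourcesAddParams` shape and the function name are assumptions; the rule itself (drop `path` and `clone_dir` unless `ctx.remote === false`, and warn without echoing values) follows the description above.

```typescript
interface SourcesAddParams {
  id: string;
  url?: string;
  path?: string;
  clone_dir?: string;
}

function confineRemoteParams(
  params: SourcesAddParams,
  remote: boolean | undefined,
  warn: (msg: string) => void,
): SourcesAddParams {
  // Only a literal `false` counts as a trusted local caller.
  if (remote === false) return params;
  const { path, clone_dir, ...rest } = params;
  if (path !== undefined || clone_dir !== undefined) {
    // Log the attempt, never the attacker-supplied values.
    warn("sources_add: ignoring path/clone_dir from remote caller");
  }
  return rest;
}
```

Treating `undefined` the same as `true` keeps a transport that forgets to set the flag on the restrictive side.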

HIGH — Steady-state `git pull --ff-only` bypassed GIT_SSRF_FLAGS entirely.
The legacy helper at src/commands/sync.ts:192 spawned git without the
-c http.followRedirects=false -c protocol.{file,ext}.allow=never
--no-recurse-submodules set that cloneRepo applies. Every recurring sync
was reopening the redirect/submodule/protocol bypass. Routed the call site
at sync.ts:381 through pullRepo from git-remote.ts so initial clone and
ongoing pull share one defensive flag set.

MEDIUM — listSources ignored its `include_archived` flag. The op
advertised the param but the function destructured it as `_opts` and
queried every row. Archived sources' ids, local_paths, and remote_urls
were leaking to read-scoped MCP callers by default. Filter in SQL
(`WHERE archived IS NOT TRUE` unless the flag is set) so archived rows
never reach the wire.
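
A sketch of filtering in SQL rather than after the fetch. The selected column list and the ORDER BY are assumptions; the `archived IS NOT TRUE` predicate is the one named above.

```typescript
// Builds the listSources query; archived rows are excluded unless requested.
function listSourcesSql(includeArchived: boolean): string {
  const base = "SELECT id, local_path, config FROM sources";
  const where = includeArchived ? "" : " WHERE archived IS NOT TRUE";
  return `${base}${where} ORDER BY id`;
}
```

`IS NOT TRUE` also matches rows where `archived` is NULL, so sources created before the column existed stay visible.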

PARTIAL HIGH — IPv6 ULA fc00::/7 and link-local fe80::/10 were not in
the isInternalUrl block list. Only ::1/:: and IPv4-mapped IPv6 were
blocked. Added regex-based ULA + link-local rejection to url-safety.ts.
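
One way the regex-based rejection could look, as a sketch. The function name is hypothetical, IPv4-mapped forms are deliberately left out, and the patterns assume the first hextet is written with all four hex digits (which is how ULA and link-local addresses normally appear).

```typescript
function isInternalIpv6Sketch(host: string): boolean {
  // URL hostnames wrap IPv6 literals in brackets; strip them first.
  const h = host.replace(/^\[|\]$/g, "").toLowerCase();
  if (h === "::1" || h === "::") return true; // loopback, unspecified
  // ULA fc00::/7: the first byte is 0xfc or 0xfd.
  if (/^f[cd][0-9a-f]{2}:/.test(h)) return true;
  // Link-local fe80::/10: first hextet ranges from fe80 through febf.
  if (/^fe[89ab][0-9a-f]:/.test(h)) return true;
  return false;
}
```

A public address such as `2001:db8::1` falls through every branch and is still allowed, matching the fourth test case listed below.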

Test coverage:
- test/git-remote.test.ts: 4 new IPv6 cases (ULA fc-prefix + fd-prefix,
  link-local fe80::, public IPv6 still allowed).
- test/sources-mcp.test.ts: 3 new cases pinning the remote/local
  asymmetry (clone_dir override silently ignored over MCP, path nulled,
  local CLI keeps the override).
- test/sources-mcp.test.ts: 2 new cases for include_archived honored.

DNS rebinding (codex finding #3): the current gate is lexical only.
A deliberate attacker who controls a hostname's A/AAAA records can still
resolve to an internal IP. Closing this requires async DNS resolution +
revalidation; filed as v0.28.x follow-up in TODOS.md so the API change
surface (parseRemoteUrl becomes async, every caller updates) lands in
its own PR.

323 tests pass (9 files); 4071 unit tests pass (full suite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: rebump v0.28.1 → v0.28.2 (master collision)

Caught after PR creation. master is at v0.28.1 already; this branch
forked from garrytan/v0.28-release at v0.28.0 and naively bumped to
v0.28.1 without checking the master queue. CI version-gate would have
rejected at merge time (requires VERSION strictly greater than
master's).

Root cause: I bumped VERSION mechanically during plan implementation
(echo "0.28.1" > VERSION) without consulting the queue-aware allocator
at bin/gstack-next-version. /ship Step 12's idempotency check then
classified state as ALREADY_BUMPED and the workflow's "queue drift"
comparison was the safety net I should have hit — but I skipped it.

Files updated:
- VERSION + package.json: 0.28.1 → 0.28.2
- CHANGELOG.md: header + "To take advantage of v0.28.2" subsection
- README.md: sources --url note version reference
- TODOS.md: 7 follow-up entries' version references
- llms.txt + llms-full.txt: regenerated

PR title rewrite via gstack-pr-title-rewrite.sh handled in a separate
gh pr edit call; CI version-gate now passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(todos): close longmemeval-publication, file 4 follow-up TODOs

Full 500-question 4-adapter LongMemEval _s benchmark landed at
github.com/garrytan/gbrain-evals#main:ced01f0. gbrain-hybrid 97.60% R@5,
+1.0pt over MemPal raw 96.6%. Replacing the now-stale "needs full run"
TODO with closure + 4 grounded follow-ups:

  1. Timeline-aware retrieval signal for temporal-reasoning questions
     (P2 — closes the only category we lose to MemPal-raw)
  2. Per-question batch consolidation for ~10x cold-cache speedup
     (P3 — makes daily benchmark CI gate practical)
  3. LongMemEval _m split run (P3 — differentiated, not yet published
     by MemPal)
  4. Cheaper-embedding-model recipe (P4 — recall-cost tradeoff curve)

Each TODO has the standard What/Why/Pros/Cons/Context/Depends-on shape per
the gbrain TODOS-format convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(llms): regenerate llms-full.txt to match merged CLAUDE.md

CI test/build-llms.test.ts asserts the committed llms.txt/llms-full.txt
are byte-for-byte identical to what scripts/build-llms.ts produces. The
master merge brought in v0.28.9/v0.28.10/v0.28.11 + multimodal embedding
notes that updated CLAUDE.md; the bundle was stale.

No content changes. Pure regeneration via `bun run build:llms`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): rewrite v0.28.12 entry — lead with the LongMemEval result

Old entry buried the headline ("LongMemEval lands in the box…") under
process detail (hermetic CI test count, 25.9ms p50, schema-table
runtime enumeration). The reader cares what gbrain DOES — not how we
plumbed the harness.

New entry leads with the actual number — 97.60% R@5 on the public
LongMemEval _s split, beating MemPalace raw by 1.0pt — followed by
the per-category win table that proves gbrain ties or beats MemPal in
5 of 6 question types and shows the +7.1pt assistant-voice lift.

Links to the full gbrain-evals report (97.60% headline + full
methodology + reproducible runner) so curious readers can dig deeper.

Two honest findings published in plain text: vector-only is
essentially tied with hybrid at K=5, and query expansion via Haiku is
a clean null result on this dataset. Better to publish the null than
hide it.

Reproduction block updated to match the actual gbrain-evals workflow
(clone + bun install + dataset download + bash batch runner). The
prior "download / run / hand to evaluate_qa.py" block stayed for the
in-tree CLI path.

Regenerated llms-full.txt to keep the build-llms regen-drift guard
green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… what's hot without being asked (garrytan#730)

* v0.29 foundation: emotional_weight column + formula + anomaly stats

Migration v34 adds pages.emotional_weight REAL DEFAULT 0.0 (column-only,
no index — salience query orders by computed score, not raw weight).
Embedded DDL (schema.sql + pglite-schema.ts + schema-embedded.ts)
mirrors the column so fresh installs don't need migration replay.

types.ts gains: PageFilters.sort enum + PAGE_SORT_SQL whitelist (engines
hardcoded ORDER BY updated_at DESC; threading lands in the next commit);
SalienceOpts/SalienceResult, AnomaliesOpts/AnomalyResult,
EmotionalWeightInputRow/EmotionalWeightWriteRow contracts.

cycle/emotional-weight.ts: pure-function score in [0..1] from tags +
takes (anglocentric default seed list; user-overridable via config key
emotional_weight.high_tags). cycle/anomaly.ts: meanStddev + cohort
threshold helpers with zero-stddev fallback (count > mean + 1) so rare
cohorts don't produce NaN sigmas.

Test coverage: migrate v34 structural assertions + 14-case formula
unit + 13-case anomaly stats unit. Codex review fixes baked in:
formula clamped to [0,1]; per-take weight clamped to [0,1] before
averaging; zero-stddev fallback finite, never NaN.
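
For illustration, a sketch of the stats helpers with the zero-stddev fallback. It assumes population (not sample) standard deviation and a hypothetical `isAnomalous` helper name; cycle/anomaly.ts may differ on both.

```typescript
interface Stats { mean: number; stddev: number }

// Population mean and standard deviation of daily counts (assumption:
// population rather than sample variance).
function meanStddev(xs: number[]): Stats {
  if (xs.length === 0) return { mean: 0, stddev: 0 };
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length;
  return { mean, stddev: Math.sqrt(variance) };
}

// True when `count` is an outlier for the cohort. With stddev === 0 the
// sigma distance would divide by zero, so fall back to count > mean + 1.
function isAnomalous(count: number, s: Stats, sigma: number): boolean {
  if (s.stddev === 0) return count > s.mean + 1;
  return (count - s.mean) / s.stddev > sigma;
}
```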

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 engine: batch emotional-weight methods + listPages sort

BrainEngine adds 4 methods, both engines implement:

- batchLoadEmotionalInputs(slugs?): CTE-shaped read with per-table
  pre-aggregates. A page with N tags + M takes never produces N×M rows
  (codex C4#4) — page_tags + page_takes CTEs aggregate independently,
  then LEFT JOIN to pages.

- setEmotionalWeightBatch(rows): UPDATE FROM unnest($1::text[],
  $2::text[], $3::real[]) composite-keyed on (slug, source_id). Multi-
  source brains can't fan out (codex C4#3) — pages.slug is unique only
  within source_id. Same shape that v0.18 link batches use.

- getRecentSalience: time boundary computed in JS, bound as TIMESTAMPTZ.
  SQL identical across engines (codex C5/D5 — avoids dialect drift on
  $1::interval binding which has zero current uses on PGLite).

- findAnomalies: tag + type cohort baselines via generate_series-
  densified daily-count CTEs (codex C4#6). Sparse-day rare cohorts get
  correct (mean, stddev) instead of biased upward by zero-omission.
  Year cohort deferred to v0.30.

listPages threads the new PageFilters.sort enum through both engines.
Was hardcoded ORDER BY updated_at DESC; now PAGE_SORT_SQL whitelist
maps the 4 enum values to literal SQL fragments — no injection surface.
postgres.js uses sql.unsafe; PGLite splices the fragment directly.
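
A sketch of the whitelist pattern. The four enum values are the ones the regression test below exercises; the SQL fragments and the `orderByFragment` helper are assumptions for illustration.

```typescript
type PageSort = "updated_desc" | "updated_asc" | "created_desc" | "slug";

// Enum value -> literal ORDER BY fragment. Caller input never reaches SQL.
const PAGE_SORT_SQL: Record<PageSort, string> = {
  updated_desc: "updated_at DESC",
  updated_asc: "updated_at ASC",
  created_desc: "created_at DESC",
  slug: "slug ASC",
};

function orderByFragment(sort?: string): string {
  // Own-property check so keys like "constructor" never match.
  if (
    sort !== undefined &&
    Object.prototype.hasOwnProperty.call(PAGE_SORT_SQL, sort)
  ) {
    return PAGE_SORT_SQL[sort as PageSort];
  }
  return PAGE_SORT_SQL.updated_desc; // unsupported value: fall back to default
}
```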

Regression tests (PGLite, no DATABASE_URL needed):

- multi-source-emotional-weight: same slug under two source_ids,
  setEmotionalWeightBatch on one of them, asserts the other survives
  untouched. Direct codex C4#3 guard.

- list-pages-regression (IRON RULE): old call shape (type, tag, limit)
  still returns updated_desc default; new sort=updated_asc reverses;
  sort=created_desc orders by created_at; sort=slug alphabetical;
  unsupported sort enum falls back to default (defense in depth).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 cycle: new recompute_emotional_weight phase

Adds a 9th cycle phase between extract and embed. Sees the union of
syncPagesAffected + synthesizeWrittenSlugs for incremental mode (so
synthesize-written pages get their weight computed too — codex C2 caught
that the prior plan threaded only sync). Full mode (no incremental
anchors) walks every page; users hit this path on first upgrade via
gbrain dream --phase recompute_emotional_weight.

Phase orchestrator (cycle/recompute-emotional-weight.ts) makes two SQL
round-trips total regardless of brain size (steps 1 and 3 below; step 2
is in-memory):
  1. batchLoadEmotionalInputs(slugs?) → per-page tag/take inputs.
  2. computeEmotionalWeight in memory (pure function).
  3. setEmotionalWeightBatch(rows) → composite-keyed UPDATE FROM unnest.

Empty affectedSlugs short-circuits (no DB read, no write). Dry-run
computes weights and reports the would-write count without touching
the DB. Engine throw bubbles into status:fail with code
RECOMPUTE_EMOTIONAL_WEIGHT_FAIL — cycle continues to the next phase.

Plumbing:
- CyclePhase type adds 'recompute_emotional_weight'.
- ALL_PHASES + NEEDS_LOCK_PHASES include it.
- CycleReport.totals adds pages_emotional_weight_recomputed (additive,
  schema_version stays "1").
- runCycle's totals rollup + status derivation honor the new field.
- synthesize.ts emits writtenSlugs in details so cycle.ts can union
  with syncPagesAffected for incremental backfill.

Tests: 7-case unit (fake-engine), 3-case PGLite e2e (full mode + dry-
run + ALL_PHASES position), 1000-page perf budget (<5s on PGLite).

Codex C2 → A: clean separation. Phase doesn't modify runExtractCore;
runs on its own seam after the existing 8 phases plus synthesize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 ops: get_recent_salience + find_anomalies + get_recent_transcripts

Three new MCP operations + a transcripts library:

- get_recent_salience: pages ranked by emotional + activity salience.
  Subagent-allow-listed. params: days (default 14), limit (default 20,
  capped 100), slugPrefix (renamed from `kind` per codex C4#10 to
  avoid collision with PageKind/TakeKind).

- find_anomalies: cohort-level activity outliers (tag + type).
  Subagent-allow-listed. Year cohort deferred to v0.30.

- get_recent_transcripts: raw .txt transcripts from the dream-cycle
  corpus dirs. LOCAL-ONLY: rejects ctx.remote === true with
  permission_denied (codex C3). NOT in the subagent allow-list — all
  subagent calls run with remote=true, would always reject (footgun if
  visible). Cycle's synthesize phase calls discoverTranscripts
  directly, so subagents that need transcripts go through the library
  function, not the op.

Tool descriptions extracted to src/core/operations-descriptions.ts so
they're pinnable in tests and stable for the Tier-2 LLM routing eval.
Redirects on query/search/list_pages: personal/emotional questions
should reach the new ops, not semantic search. Anti-flattery hint on
query: "Do NOT assume words like crazy, notable, or big mean
impressive — they often mean difficult or emotionally charged."

list_pages gains updated_after (string ISO) and sort enum params,
surfacing the engine threading from the prior commit.

src/core/transcripts.ts: filesystem walk shared by the gated MCP op
and the (commit 5) CLI command. Reuses discoverTranscripts corpus-dir
resolution + isDreamOutput from cycle/transcript-discovery.ts. Trust
gate lives in the op handler, not the library — the library is
trusted by both the gated op and the local CLI.

Allow-list: 11 → 13 (add salience + anomalies; transcripts excluded
per codex C3, with a comment explaining why).

Tests: 21-case description pin (catches accidental edits that change
LLM-facing surface); 11-case transcripts unit covering trust gate,
mtime window, dream-output skip, summary truncation, no corpus_dir;
2-case salience type-contract smoke (full Garry-test fixture in commit
6's e2e suite).

Codex C1: routing-eval fixtures (skills/<x>/routing-eval.jsonl)
deliberately NOT shipped — routing-eval.ts is substring-match on
resolver triggers, not MCP tool routing. Real coverage lands as
test/e2e/salience-llm-routing.test.ts in commit 6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 CLI: gbrain salience / anomalies / transcripts

Three new CLI commands wired into src/cli.ts dispatch + CLI_ONLY set +
help text:

- gbrain salience [--days N] [--limit N] [--kind PREFIX] [--json]
- gbrain anomalies [--since YYYY-MM-DD] [--lookback-days N] [--sigma N] [--json]
- gbrain transcripts recent [--days N] [--full] [--json]

Each command file mirrors src/commands/orphans.ts shape: pure data fn
+ JSON formatter + human formatter. Calls into engine.getRecentSalience
/ findAnomalies (already shipped) and src/core/transcripts.ts.

salience and anomalies show ranked rows with per-cohort
mean/stddev/sigma. transcripts honors `--full` (caps at 100KB/file)
vs default summary (first non-empty line + ~250 chars). All three
emit JSON with --json for agent consumption.

`--kind` is accepted as a slug-prefix shorthand on `gbrain salience`
even though the underlying op param is `slugPrefix` (kept the CLI
flag short; the MCP-facing param uses the more-explicit name to
align with PageKind/TakeKind/slugPrefix vocabulary).

CLI_ONLY set in src/cli.ts gains the three new command names so
they don't get forwarded to MCP-only routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 e2e: Garry-test fixtures + Postgres parity + LLM routing eval

PGLite e2e (no DATABASE_URL needed):

- salience-pglite: the Garry test. 7 wedding-tagged pages updated today
  + 100 background pages backdated across 30 days via raw SQL UPDATE
  (codex C4#7 — engine.putPage stamps updated_at = now(), so seeding
  via the engine alone can't reproduce historical recency windows).
  Asserts wedding pages outrank random-tag noise in the 7-day window;
  slugPrefix filter narrows correctly; days=0 boundary case; limit cap.

- anomalies-pglite: same fixture shape (7 wedding pages today, 100
  background backdated). findAnomalies with sigma=3 returns the
  wedding-tag cohort with sigma_observed > 3 vs near-zero baseline;
  page_slugs sample carries the wedding pages; date with no activity
  returns []; high sigma threshold suppresses borderline cohorts
  (zero-stddev fallback stays finite — no NaN sigma).

Postgres-gated e2e:

- engine-parity-salience: PGLite ↔ Postgres parity for getRecentSalience
  and findAnomalies. Same fixture into both engines; top-result and
  cohort-set match. Closes the v0.22.0-style parity gap for the new
  v0.29 SQL idioms (EXTRACT(EPOCH ...), generate_series, CTE chain).

Tier-2 LLM routing eval (ANTHROPIC_API_KEY-gated):

- salience-llm-routing: calls Claude with v0.29 tool descriptions and
  12 personal-query phrasings ("anything crazy lately", "what's been
  going on with me", etc.). Asserts the chosen tool is in the v0.29
  set, not query() / search(). ~$0.10 per CI run on Haiku. Tests the
  ACTUAL ship criterion — replaces the discarded fake-coverage
  routing-eval.jsonl fixtures (codex C1 → B).

This is the only test that proves the description edits drive routing.
Without it, we'd ship description changes and only learn from
production behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.0: ship-prep — VERSION + CHANGELOG + CLAUDE Key Files

VERSION + package.json bump 0.28.0 → 0.29.0.

CHANGELOG.md adds a v0.29.0 release-summary in the GStack/Garry voice
plus the "To take advantage of v0.29.0" block. Headline two-liner:
"The brain tells you what's hot without being asked. Salience +
anomaly detection ship. Search rewards hypotheses; salience surfaces
them." Numbers-that-matter table covers engine surface delta, MCP op
delta, allow-list delta, cycle-phase delta, schema migration, list_pages
param surface, and test count. Itemized changes section lists the
schema migration + new cycle phase + new MCP ops + redirect
descriptions + subagent allow-list rules + new tests + a contributor
note clarifying that routing-eval.ts is not the right surface for
testing MCP tool routing (use the Tier-2 LLM eval pattern instead).

CLAUDE.md Key Files updated for the v0.29 surface:

- src/core/engine.ts: notes the 4 new methods + PageFilters.sort threading.
- src/core/migrate.ts: v34 (pages_emotional_weight) entry.
- src/core/cycle.ts: 8 → 9 phases, recompute_emotional_weight inserted
  between patterns and embed; totals.pages_emotional_weight_recomputed.
- src/core/cycle/emotional-weight.ts (NEW): formula + override path.
- src/core/cycle/anomaly.ts (NEW): stats helpers + zero-stddev fallback.
- src/core/cycle/recompute-emotional-weight.ts (NEW): phase orchestrator.
- src/core/transcripts.ts (NEW): library shared by gated MCP op + CLI.
- src/core/operations-descriptions.ts (NEW): pinned tool descriptions.
- src/core/minions/tools/brain-allowlist.ts: 11 → 13 entries; comment
  on why get_recent_transcripts is excluded.
- src/commands/salience.ts / anomalies.ts / transcripts.ts (NEW): CLI surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1 feat: recency + salience as two orthogonal options on query op (garrytan#696)

* feat: recency boost for search (v0.27.0) — temporal intent auto-detection, date filters, configurable decay

New search pipeline stage: keyword + vector → RRF → cosine re-score → backlink boost → recency boost → dedup

- applyRecencyBoost: hyperbolic decay, two strengths (moderate 30-day halflife, aggressive 7-day halflife)
- Auto-enabled when intent.ts detects temporal/event queries (detail='high')
- Manual override via SearchOpts.recencyBoost (0/1/2)
- Date filtering: afterDate/beforeDate on all three search paths (keyword, keywordChunks, vector)
- getPageTimestamps on both Postgres and PGLite engines
- 15 tests passing (boost math + intent classification)

* v0.29.1 schema: pages.{effective_date, effective_date_source, import_filename, salience_touched_at} + expression index

Migration v38 adds 4 nullable columns to pages and an expression index on
COALESCE(effective_date, updated_at) to support the new since/until date
filters. All additive — no behavior change in the default search path; only
consulted when callers opt into the new salience='on' / recency='on' axes
or pass since/until.

  effective_date         — content date (event_date / date / published /
                           filename-date / fallback). Read by recency boost
                           and date-filter paths only. Auto-link doesn't
                           touch it (immune to updated_at churn).
  effective_date_source  — sentinel for the doctor's effective_date_health
                           check ('event_date' | 'date' | 'published' |
                           'filename' | 'fallback').
  import_filename        — basename without extension, captured at import.
                           Used for filename-date precedence on daily/,
                           meetings/. Older rows leave it NULL.
  salience_touched_at    — bumped by recompute_emotional_weight when
                           emotional_weight changes. Salience window uses
                           GREATEST(updated_at, salience_touched_at) so
                           newly-salient old pages enter the recent salience
                           query.

Index strategy: a partial index on effective_date alone wouldn't help the
COALESCE expression in since/until filters (planner can't use it for the
negative side). The expression index ((COALESCE(effective_date, updated_at)))
is what actually accelerates the filter.

Postgres uses CONCURRENTLY + v14-style pg_index.indisvalid pre-drop guard
for prior failed CONCURRENTLY runs; PGLite uses plain CREATE INDEX. Mirror
of v34's pattern.

src/schema.sql + src/core/pglite-schema.ts updated for fresh installs;
src/core/schema-embedded.ts regenerated via bun run build:schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: computeEffectiveDate helper + putPage integration

Pure helper computing a page's effective_date from frontmatter precedence:
  1. event_date (meeting/event pages)
  2. date (dated essays)
  3. published (writing/)
  4. filename-date (leading YYYY-MM-DD in basename)
  5. updated_at (fallback)
  6. created_at (last resort)

Per-prefix override: for daily/ and meetings/ slugs, filename-date jumps
to position 1 — the filename is the user's primary signal there.

Returns {date, source}. The source label powers the doctor's
effective_date_health check to detect "fell back to updated_at" rows that
look populated but are functionally a NULL.

Range validation: parsed value must be in [1990-01-01, NOW + 1 year].
Out-of-range values drop to the next chain element.
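
A sketch of the precedence chain, per-prefix override, and range check. The `PageInput` shape and the strict leading-YYYY-MM-DD parser are assumptions (the real helper uses parseDateLoose), and the updated_at / created_at tail is collapsed into one fallback here.

```typescript
type DateSource = "event_date" | "date" | "published" | "filename" | "fallback";
interface EffectiveDate { date: Date; source: DateSource }
interface PageInput {
  slug: string;
  frontmatter: Record<string, unknown>;
  filename?: string; // basename without extension
  updatedAt: Date;
}

const MIN_MS = Date.UTC(1990, 0, 1);

// Valid range is [1990-01-01, now + 1 year].
function inRange(d: Date, now: Date): boolean {
  const max = Date.UTC(now.getUTCFullYear() + 1, now.getUTCMonth(), now.getUTCDate());
  const t = d.getTime();
  return !Number.isNaN(t) && t >= MIN_MS && t <= max;
}

// Accepts only a leading YYYY-MM-DD; anything else is a parse failure.
function parseIsoDay(v: unknown): Date | null {
  if (typeof v !== "string") return null;
  const m = /^(\d{4})-(\d{2})-(\d{2})/.exec(v);
  if (!m) return null;
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
}

function computeEffectiveDate(p: PageInput, now: Date = new Date()): EffectiveDate {
  const fm: Array<[DateSource, unknown]> = [
    ["event_date", p.frontmatter["event_date"]],
    ["date", p.frontmatter["date"]],
    ["published", p.frontmatter["published"]],
  ];
  // Fall back to the slug tail when no filename was captured.
  const tail = p.filename ?? p.slug.split("/").pop() ?? "";
  const file: [DateSource, unknown] = ["filename", tail];
  const filenameFirst = p.slug.startsWith("daily/") || p.slug.startsWith("meetings/");
  const chain = filenameFirst ? [file, ...fm] : [...fm, file];
  for (const [source, raw] of chain) {
    const d = parseIsoDay(raw);
    if (d && inRange(d, now)) return { date: d, source }; // else try next element
  }
  return { date: p.updatedAt, source: "fallback" };
}
```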

Wired into importFromContent + importFromFile. The put_page MCP op derives
filename from slug-tail when no caller-supplied filename is available.

putPage SQL on both engines extended to write the new columns. ON CONFLICT
uses COALESCE(EXCLUDED.x, pages.x) so callers that don't know about the
new columns (auto-link, code reindex) preserve existing values rather than
blanking them. SELECT projection extended to return them; rowToPage threads
them through.

21 unit tests covering: precedence chain default order, per-prefix override,
parse failure fall-through, range validation [1990, NOW+1y], parseDateLoose
shape variants. All pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: backfill orchestrator + library function for existing pages

src/core/backfill-effective-date.ts is the shared library function. Walks
pages in keyset-paginated batches (id > last_id ORDER BY id LIMIT 1000),
runs computeEffectiveDate per row, UPDATEs effective_date +
effective_date_source. Resumable via the `backfill.effective_date.last_id`
checkpoint key in the config table — a killed process can re-run and pick
up without re-doing rows. Idempotent: a full re-walk produces the same
writes.
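
The keyset walk with a resumable checkpoint can be sketched as follows. The in-memory `rows` array and `checkpoint` object are hypothetical stand-ins for the pages table and the config-table key; the real library issues SQL.

```typescript
interface Row { id: number }

// Equivalent in spirit to: WHERE id > $lastId ORDER BY id LIMIT $limit
function fetchBatch(rows: Row[], lastId: number, limit: number): Row[] {
  return rows
    .filter((r) => r.id > lastId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
}

// Walks all rows past the checkpoint; returns how many were processed.
function backfill(
  rows: Row[],
  checkpoint: { lastId: number },
  processRow: (r: Row) => void,
  limit = 1000,
): number {
  let processed = 0;
  for (;;) {
    const batch = fetchBatch(rows, checkpoint.lastId, limit);
    if (batch.length === 0) return processed;
    let last = checkpoint.lastId;
    for (const r of batch) {
      processRow(r);
      last = r.id;
    }
    processed += batch.length;
    // Persist once per batch so a killed run resumes from here.
    checkpoint.lastId = last;
  }
}
```

Keyset pagination keeps each batch query cheap regardless of how deep the walk is, which OFFSET-based paging would not.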

Postgres-only: SET LOCAL statement_timeout = '600s' per batch. Doesn't
refuse the migration on low session settings (codex pass-2 garrytan#16).

src/commands/migrations/v0_29_1.ts is the orchestrator (4 phases mirroring
v0_12_2). Phase A schema (gbrain init --migrate-only), Phase B backfill
(via the library function), Phase C verify (count NULL effective_date),
Phase D record (handled by runner). The library function is reusable from
the gbrain reindex-frontmatter CLI command in the next commit.

import_filename stays NULL for backfilled rows — pre-v0.29.1 imports
didn't capture it. computeEffectiveDate uses the slug-tail when filename
is NULL; daily/2024-03-15 backfilled gets effective_date from the slug.

Registered in src/commands/migrations/index.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: gbrain reindex-frontmatter CLI command

Recovery / explicit-rebuild path for pages.effective_date. Used when:
  - User edited frontmatter dates after import
  - Post-upgrade backfill orchestrator finished but the user wants to
    re-walk a subset (e.g. just meetings/) after fixing some frontmatter
  - Precedence rules change between releases

Thin wrapper over backfillEffectiveDate from commit 3 — same code path
the v0_29_1 orchestrator uses; one source of truth.

Flags mirror reindex-code:
  --source <id>      Scope to one sources row (placeholder; the library
                     doesn't filter by source today, tracked v0.30+)
  --slug-prefix P    Scope to slugs starting with P (e.g. 'meetings/')
  --dry-run          Print what WOULD change, no DB writes
  --yes              Skip confirmation prompt (required for non-TTY non-JSON)
  --json             Machine-readable result envelope
  --force            Re-apply even when computed value matches existing

Wired into src/cli.ts. CLI handles its own engine lifecycle (creates +
disconnects).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: recency-decay map + buildRecencyComponentSql (pure, unused)

src/core/search/recency-decay.ts mirrors source-boost.ts in shape but
drives RECENCY ONLY (per D9 codex resolution). Salience is a separate
orthogonal axis; this map does not feed it.

DEFAULT_RECENCY_DECAY: 10 generic prefixes (no fork-specific names).
  - concepts/      evergreen (halflifeDays=0)
  - originals/     180d × 0.5 (long-tail decay; new essays nudged)
  - writing/       365d × 0.4
  - daily/         14d × 1.5  (aggressive — freshness IS the signal)
  - meetings/      60d × 1.0
  - chat/          7d × 1.0
  - media/x/       7d × 1.5
  - media/articles/ 90d × 0.5
  - people/companies/ 365d × 0.3
  - deals/         180d × 0.5

DEFAULT_FALLBACK: 90d × 0.5 for unmatched slugs.

Override priority: defaults < gbrain.yml recency: < env (GBRAIN_RECENCY_DECAY)
< per-call SearchOpts.recency_decay.

parseRecencyDecayEnv format: comma-separated prefix:halflifeDays:coefficient
triples. Refuses LOUD on parse error (RecencyDecayParseError) — codex
pass-2 #M3 finding. No silent fallback like source-boost's parser.
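
A sketch of a loud parser for the triple format. The validation rules (non-negative finite numbers, empty input yields an empty map) and the return type are assumptions; only the format and the error class name come from the text above.

```typescript
interface DecayEntry { halflifeDays: number; coefficient: number }

class RecencyDecayParseError extends Error {}

// Parses "prefix:halflifeDays:coefficient,prefix:halflifeDays:coefficient".
// Throws on any malformed triple instead of silently skipping it.
function parseRecencyDecayEnv(raw: string): Map<string, DecayEntry> {
  const out = new Map<string, DecayEntry>();
  if (raw.trim() === "") return out;
  for (const part of raw.split(",")) {
    const triple = part.trim();
    const fields = triple.split(":");
    if (fields.length !== 3) {
      throw new RecencyDecayParseError(
        `expected prefix:halflifeDays:coefficient, got "${triple}"`,
      );
    }
    const [prefix, h, c] = fields as [string, string, string];
    const halflifeDays = Number(h);
    const coefficient = Number(c);
    const bad =
      prefix === "" || h === "" || c === "" ||
      !Number.isFinite(halflifeDays) || halflifeDays < 0 ||
      !Number.isFinite(coefficient) || coefficient < 0;
    if (bad) throw new RecencyDecayParseError(`invalid triple "${triple}"`);
    out.set(prefix, { halflifeDays, coefficient });
  }
  return out;
}
```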

parseRecencyDecayYaml takes already-parsed YAML; throws on bad shape.

buildRecencyComponentSql in sql-ranking.ts emits a CASE expression with
longest-prefix-first ordering, evergreen short-circuit (literal 0 when
halflifeDays=0 or coefficient=0), and EXTRACT(EPOCH ...) for non-zero
branches. Output: ((CASE WHEN p.slug LIKE 'daily/%' THEN 1.5 * 14.0 /
(14.0 + EXTRACT(EPOCH FROM (NOW() - <dateExpr>))/86400.0) ... END))

Typed NowExpr enum prevents SQL injection (codex pass-1 #5). Tests pass
{ kind: 'fixed', isoUtc } for deterministic output; production NOW().
The 'fixed' branch escapes single quotes via escapeSqlLiteral.
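
The same CASE logic, evaluated in plain TypeScript rather than emitted as SQL, to show the arithmetic. This is an illustrative numeric equivalent, not buildRecencyComponentSql itself; `recencyComponent` is a hypothetical name.

```typescript
interface DecayEntry { halflifeDays: number; coefficient: number }

// coefficient * halflife / (halflife + days_old), chosen by longest
// matching slug prefix, with 0 for evergreen entries.
function recencyComponent(
  slug: string,
  daysOld: number,
  decayMap: Record<string, DecayEntry>,
  fallback: DecayEntry,
): number {
  // Longest prefix first, so 'media/articles/' beats a shorter 'media/'.
  const prefixes = Object.keys(decayMap).sort((a, b) => b.length - a.length);
  const hit = prefixes.find((p) => slug.startsWith(p));
  const entry = hit !== undefined ? (decayMap[hit] ?? fallback) : fallback;
  const { halflifeDays, coefficient } = entry;
  if (halflifeDays === 0 || coefficient === 0) return 0; // evergreen short-circuit
  return (coefficient * halflifeDays) / (halflifeDays + daysOld);
}
```

At `daysOld === halflifeDays` the component is exactly half its day-zero value, which is what makes the first number a half-life.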

25 unit tests covering: env parser shape, env error cases, yaml parser
shape, merge precedence (defaults < yaml < env < caller), CASE longest-
prefix-first ordering, evergreen short-circuit, NowExpr fixed/now,
single-quote injection defense, empty decayMap fallback path, default
map composition (no fork names, concepts/ evergreen, daily/ aggressive).

Pure module. Zero consumers in this commit; commit 6 wires it into
getRecentSalience, commit 10 wires it into the post-fusion stage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: refactor getRecentSalience to consume buildRecencyComponentSql

Both engines (Postgres + PGLite) now build the salience formula's third
term via buildRecencyComponentSql instead of inlining 1.0 / (1 + days_old).
Parameters: empty decayMap + fallback { halflifeDays: 1, coefficient: 1.0 }.
Math expands to 1 * 1.0 / (1.0 + days_old) = 1 / (1 + days_old) — same
numeric output as v0.29.0.

This is a no-behavior-change refactor preparing for commit 7's recency_bias
param. recency_bias='flat' (default) reproduces v0.29.0 exactly; 'on'
swaps in DEFAULT_RECENCY_DECAY for per-prefix decay.

Single source of truth for the recency math: same builder feeds the
salience query AND (in commit 10) the post-fusion applyRecencyBoost stage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: get_recent_salience gains recency_bias param (default 'flat')

SalienceOpts.recency_bias: 'flat' | 'on' added; default 'flat' preserves
v0.29.0 ranking verbatim. Pass 'on' to opt into per-prefix decay map
(concepts/ evergreen; originals/ and writing/ slow decay; daily/,
media/x/, chat/ aggressive decay).

When recency_bias='on', the salience query reads
COALESCE(p.effective_date, p.updated_at) instead of bare p.updated_at, so
the recency component is immune to auto-link updated_at churn — old
concepts/ pages just-touched by auto-link don't suddenly look fresh.

Both engines (Postgres + PGLite) wire the param through. resolveRecencyDecayMap()
honors gbrain.yml + GBRAIN_RECENCY_DECAY env at runtime.

MCP op surface: get_recent_salience gains the param with a load-bearing
description teaching the agent when to use 'on' vs 'flat' (current state →
on; mattering across all time → flat).

No silent v0.29.0 behavior change — opt-in only (per D11 codex resolution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: recompute_emotional_weight writes salience_touched_at; window picks up newly-salient pages

setEmotionalWeightBatch on both engines now bumps salience_touched_at to
NOW() ONLY when the new emotional_weight differs from the existing one
(IS DISTINCT FROM, NULL-safe). No-op writes (same weight) leave the
column alone — preserves "actual change" semantics.

getRecentSalience window changes from
  WHERE p.updated_at >= boundary
to
  WHERE GREATEST(p.updated_at, COALESCE(p.salience_touched_at, p.updated_at)) >= boundary

Closes codex pass-1 finding #4: pages whose emotional_weight just changed
in the dream cycle (because tags or takes shifted) but whose updated_at
is older than the salience window now correctly enter the recent-salience
results. Without this, "Garry just added a take to a 6-month-old page"
stayed invisible to get_recent_salience until the next content edit.

COALESCE(salience_touched_at, p.updated_at) handles pre-v0.29.1 rows
where salience_touched_at is NULL — they fall back to p.updated_at and
behave identically to v0.29.0.
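
The window predicate, restated as a TypeScript sketch to make the NULL handling explicit. `inSalienceWindow` is a hypothetical name; the real check lives in SQL.

```typescript
// Mirrors: GREATEST(updated_at, COALESCE(salience_touched_at, updated_at)) >= boundary
function inSalienceWindow(
  updatedAt: Date,
  salienceTouchedAt: Date | null,
  boundary: Date,
): boolean {
  const touched = salienceTouchedAt ?? updatedAt; // COALESCE
  const latest = Math.max(updatedAt.getTime(), touched.getTime()); // GREATEST
  return latest >= boundary.getTime();
}
```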

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: merge intent.ts → query-intent.ts; emit 3 suggestions per query

D1 + D4 + D6 + D8: single regex-pass classifier returning
{intent, suggestedDetail, suggestedSalience, suggestedRecency}.

intent + suggestedDetail are v0.29.0 behavior verbatim (legacy intent.ts
deleted; classifyQueryIntent + autoDetectDetail compat shims preserved).

NEW for v0.29.1 — two orthogonal recency-axis suggestions:

  suggestedSalience: 'off' | 'on' | 'strong'
  suggestedRecency:  'off' | 'on' | 'strong'

Resolution rules (per D6 narrow temporal-bound exception):
  - CANONICAL patterns (who is X / what is Y / code / graph) → both off
  - UNLESS an EXPLICIT_TEMPORAL_BOUND also matches (today / right now /
    this week / since X / last N days), in which case temporal-bound wins
  - STRONG_RECENCY (today / right now / this morning / just now) → strong
  - RECENCY_ON (latest / recent / this week / meeting prep / catch up
    / remind me / status update) → on
  - SALIENCE_ON (catch up / remind me / status update / prep me /
    what's going on / what matters) → on
  - default → off for both axes (v0.29.1 prime-directive: pure opt-in)

Salience and recency are TRULY orthogonal (per D9). A query like
"latest news on AI" → recency='on' but salience='off' (the user wants
fresh, not emotionally-weighted). "What's going on with widget-co" →
both on. "Who is X right now" → both 'strong'/'on' (temporal bound
beats canonical 'who is').

intent.ts deleted; test/intent.test.ts renamed → test/query-intent-legacy.test.ts
(unchanged behavior coverage). New test/query-intent.test.ts adds 21
cases covering all three axes' interactions: canonical wins on bare
'who is', temporal bound overrides, "catch me up" matches with up to 15
chars between, "today" → strong, intent vs recency independence.

Updated callers:
  - src/core/search/hybrid.ts (autoDetectDetail import)
  - test/recency-boost.test.ts (classifyQueryIntent import)
  - test/benchmark-search-quality.ts (autoDetectDetail import)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: applySalienceBoost + applyRecencyBoost + runPostFusionStages wrapper

D9 + codex pass-1 #2 + #3 + pass-2 #4: salience and recency are TRULY
ORTHOGONAL post-fusion stages, both running from ALL THREE hybridSearch
return paths (keyword-only, embed-failure-fallback, full-hybrid).

NEW src/core/search/hybrid.ts exports:
  - applySalienceBoost(results, scores, strength)
      score *= 1 + k * log(1 + score) where k = 0.15 (on) or 0.30 (strong)
      No time component. Pure mattering signal.
  - applyRecencyBoost(results, dates, strength, decayMap, fallback, nowMs?)
      Per-prefix decay factor: 1 + strengthMul * coefficient * halflife / (halflife + days_old)
      strengthMul: 1.0 (on) or 1.5 (strong)
      Evergreen prefixes (halflifeDays=0) skipped (factor 1.0).
      Pure recency signal. Independent of mattering.
  - runPostFusionStages(engine, results, opts)
      Wraps backlink + salience + recency. Called from EACH return path so
      keyless installs and embed failures get the same boost surface as
      the full hybrid path.
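
The two boost formulas above can be sketched as pure functions. This is a minimal illustration, not the code in src/core/search/hybrid.ts: the function names and signatures are hypothetical, the inner `score` of the salience formula is assumed to be the page's salience score, and `log` is assumed to be the natural logarithm.

```typescript
type Strength = 'off' | 'on' | 'strong';

// Salience multiplier: 1 + k * log(1 + salience), k = 0.15 (on) or 0.30 (strong).
// Assumption: the logged value is the page's salience score, not the search score.
function salienceFactor(salience: number, strength: Strength): number {
  if (strength === 'off') return 1;
  const k = strength === 'strong' ? 0.3 : 0.15;
  return 1 + k * Math.log(1 + salience);
}

// Recency multiplier: 1 + strengthMul * coefficient * halflife / (halflife + days_old).
// Evergreen prefixes (halflifeDays = 0) are skipped and keep a factor of 1.0.
function recencyFactor(
  daysOld: number,
  halflifeDays: number,
  coefficient: number,
  strength: Strength,
): number {
  if (strength === 'off' || halflifeDays === 0) return 1;
  const strengthMul = strength === 'strong' ? 1.5 : 1.0;
  return 1 + (strengthMul * coefficient * halflifeDays) / (halflifeDays + daysOld);
}
```

With a hypothetical coefficient of 0.5 and a 30-day half-life, a page dated today gets a factor of 1.5 and a 30-day-old page gets 1.25. The boost decays toward 1.0 and never drops below it.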

NEW engine methods (composite-keyed for multi-source isolation):
  - getEffectiveDates(refs: Array<{slug, source_id}>): Map<key, Date>
      Returns COALESCE(effective_date, updated_at, created_at). Key format:
      `${source_id}::${slug}`. Mirror of getBacklinkCounts shape.
  - getSalienceScores(refs: Array<{slug, source_id}>): Map<key, number>
      Returns emotional_weight × 5 + ln(1 + take_count). Composite key.
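
A minimal sketch of the composite key and the salience formula described above. The helper names and the sample slugs and source ids are hypothetical; only the key format and the formula come from this commit message.

```typescript
interface PageRef {
  slug: string;
  source_id: string;
}

// Composite key format from the commit message: `${source_id}::${slug}`.
function compositeKey(ref: PageRef): string {
  return `${ref.source_id}::${ref.slug}`;
}

// Salience score: emotional_weight * 5 + ln(1 + take_count).
function salienceScore(emotionalWeight: number, takeCount: number): number {
  return emotionalWeight * 5 + Math.log(1 + takeCount);
}

// Hypothetical in-memory lookup showing why the key must include source_id:
// two sources may each hold a page with the same slug.
const scores = new Map<string, number>();
scores.set(compositeKey({ slug: 'people/alice', source_id: 'work' }), salienceScore(0.5, 0));
scores.set(compositeKey({ slug: 'people/alice', source_id: 'personal' }), salienceScore(0, 0));
```

Keying by slug alone would collapse the two entries into one, which is the cross-source leak the composite key is meant to prevent.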

Deprecated (kept for back-compat through v0.29.x):
  - SearchOpts.afterDate / beforeDate (alias for since/until)
  - SearchOpts.recencyBoost: 0|1|2 (alias for recency: 'off'|'on'|'strong')
  - getPageTimestamps (use getEffectiveDates instead)

NEW SearchOpts fields:
  - salience: 'off' | 'on' | 'strong'
  - recency:  'off' | 'on' | 'strong'
  - since:    string (ISO-8601 or relative, replaces afterDate)
  - until:    string (replaces beforeDate)

Resolution: caller-explicit > legacy alias (recencyBoost) > heuristic
(classifyQuery's suggestedSalience / suggestedRecency).
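
The resolution order above can be sketched as a small pure function. The names `AxisInputs` and `resolveAxis` are hypothetical, and falling back to 'off' when nothing is supplied is an assumption based on the pure opt-in default.

```typescript
type Axis = 'off' | 'on' | 'strong';

// Deprecated numeric alias: recencyBoost 0|1|2 maps to 'off'|'on'|'strong'.
const LEGACY_RECENCY: Record<0 | 1 | 2, Axis> = { 0: 'off', 1: 'on', 2: 'strong' };

interface AxisInputs {
  explicit?: Axis;         // caller passed recency / salience directly
  legacyBoost?: 0 | 1 | 2; // caller passed the deprecated recencyBoost
  suggested?: Axis;        // heuristic suggestion from query classification
}

// Precedence: caller-explicit > legacy alias > heuristic > 'off' (assumed default).
function resolveAxis(inputs: AxisInputs): Axis {
  if (inputs.explicit !== undefined) return inputs.explicit;
  if (inputs.legacyBoost !== undefined) return LEGACY_RECENCY[inputs.legacyBoost];
  return inputs.suggested ?? 'off';
}
```

An explicit 'off' from the caller beats both the legacy alias and the heuristic, which keeps the agent in charge.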

Deleted: src/core/search/recency.ts (PR garrytan#618's, replaced) +
test/recency-boost.test.ts (its scope is replaced by query-intent.test.ts +
future post-fusion tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Wintermute <wintermute@garrytan.com>

* v0.29.1: query op gains salience + recency + since + until params; PGLite since/until parity

Combines commits 12 + 13 of the plan.

Query op surface (src/core/operations.ts):
  - salience: 'off' | 'on' | 'strong' (with load-bearing description)
  - recency:  'off' | 'on' | 'strong'
  - since:    string (ISO-8601 or relative; replaces deprecated afterDate)
  - until:    string (replaces deprecated beforeDate)

Tool descriptions teach the calling agent:
  - salience axis = mattering, no time component
  - recency axis = age decay, no mattering signal
  - omit either to let gbrain auto-detect from query text via classifyQuery

hybrid.ts maps since/until → afterDate/beforeDate at the engine call
boundary so PR garrytan#618's existing engine plumbing keeps working without
rename. Codex pass-1 garrytan#10 finding closed.

PGLite engine (codex pass-1 garrytan#10): since/until parity added to all three
search methods (searchKeyword, searchKeywordChunks, searchVector). SQL
filter against COALESCE(p.effective_date, p.updated_at, p.created_at)
so date filtering matches user content-date intent (a meeting was on
event_date, not when it got reimported). Filter is applied INSIDE the
HNSW inner CTE in searchVector so HNSW's candidate pool already
excludes out-of-range pages — preserves pagination contract.
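
The COALESCE-based filter can be mirrored in TypeScript to show the intent. This is an illustrative sketch only: the real filter runs in SQL, the type and function names are hypothetical, and inclusive bounds are an assumption.

```typescript
interface PageDates {
  effective_date: Date | null;
  updated_at: Date | null;
  created_at: Date;
}

// Mirrors COALESCE(p.effective_date, p.updated_at, p.created_at).
function effectiveDate(p: PageDates): Date {
  return p.effective_date ?? p.updated_at ?? p.created_at;
}

// Range check; inclusive bounds are an assumption of this sketch.
function inDateRange(p: PageDates, since?: Date, until?: Date): boolean {
  const t = effectiveDate(p).getTime();
  if (since !== undefined && t < since.getTime()) return false;
  if (until !== undefined && t > until.getTime()) return false;
  return true;
}
```

A meeting held in March 2024 and reimported in January 2025 matches `until: 2024-12-31` and is excluded by `since: 2025-01-01`, because the content date wins over the reimport timestamp.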

This also closes existing cross-engine drift: pre-v0.29.1 Postgres had
afterDate/beforeDate from PR garrytan#618; PGLite had nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: migration v39 — eval_candidates capture columns for replay reproducibility

D11 codex pass-2 resolution: extend eval_candidates with 7 new nullable
columns so `gbrain eval replay` can reproduce captured runs of agent-explicit
salience + recency choices.

Without these columns, replays of the new axis params drift. The live
behavior depends on the resolved {salience, recency} values; v0.29.0's
schema doesn't capture them.

  as_of_ts            TIMESTAMPTZ  — brain's logical NOW at capture
                                     (replay uses this instead of wall-clock)
  salience_param      TEXT         — what the caller passed (NULL if omitted)
  recency_param       TEXT         — same
  salience_resolved   TEXT         — final value applied
  recency_resolved    TEXT         — same
  salience_source     TEXT         — 'caller' or 'auto_heuristic'
  recency_source      TEXT         — same

All nullable + additive. Pre-v0.29.1 rows stay valid. NDJSON
schema_version STAYS at 1 — consumers ignore unknown fields (codex
pass-1 #C2 dissolves; no cross-repo coordination needed).

ADD COLUMN with no DEFAULT is metadata-only on PG 11+ and PGLite —
instant on tables of any size.
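
A sketch of how the seven capture columns might be filled and consumed. The column names come from the list above; the helper names, the TypeScript row shape, and the wall-clock fallback for rows without as_of_ts are assumptions.

```typescript
type Axis = 'off' | 'on' | 'strong';
type AxisSource = 'caller' | 'auto_heuristic';

// The seven nullable capture columns, as a row shape.
interface AxisCapture {
  as_of_ts: string | null;        // ISO timestamp: brain's logical NOW at capture
  salience_param: Axis | null;    // what the caller passed (null if omitted)
  recency_param: Axis | null;
  salience_resolved: Axis | null; // final value applied
  recency_resolved: Axis | null;
  salience_source: AxisSource | null;
  recency_source: AxisSource | null;
}

// Fill the param/resolved/source triple for one axis.
function captureAxis(param: Axis | undefined, heuristic: Axis) {
  return param !== undefined
    ? { param, resolved: param, source: 'caller' as AxisSource }
    : { param: null, resolved: heuristic, source: 'auto_heuristic' as AxisSource };
}

// Replay clock: the captured as_of_ts when present, otherwise the wall clock
// (assumed behavior for rows captured before these columns existed).
function replayNowMs(row: Pick<AxisCapture, 'as_of_ts'>, wallClockMs: number): number {
  return row.as_of_ts !== null ? Date.parse(row.as_of_ts) : wallClockMs;
}
```

Recording both the param and the resolved value lets a replay tell "caller asked for strong" apart from "heuristic picked strong", which the resolved value alone cannot do.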

src/schema.sql + src/core/pglite-schema.ts mirror the additions for fresh
installs; src/core/schema-embedded.ts regenerated. eval_capture.ts
populates the new fields in commit 16 (docs + ship).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: doctor checks — effective_date_health + salience_health

effective_date_health: sample-1000 scan detects three classes of
problems (codex pass-1 #5 resolution via the effective_date_source
sentinel column added in commit 1):

  fallback_with_fm_date  — page fell back to updated_at even though
                           frontmatter has parseable event_date / date /
                           published. The "wrong but populated" residual
                           that earlier review iterations missed.
  future_dated            — effective_date > NOW() + 1 year (corrupt
                            or typo'd century).
  pre_1990                — effective_date < 1990-01-01 (epoch math gone
                            wrong, bad parse).

Sample of last 1000 pages by default — fast on 200K-page brains. Fix
hint: gbrain reindex-frontmatter.
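
The three problem classes can be sketched as a classifier over one sampled page. The names below are hypothetical, and the 'fallback' sentinel value for effective_date_source is an assumption; the thresholds (NOW() + 1 year, 1990-01-01) are from the description above.

```typescript
type DateProblem = 'fallback_with_fm_date' | 'future_dated' | 'pre_1990';

interface SampledPage {
  effective_date: Date;
  effective_date_source: string;  // sentinel column; the 'fallback' value is assumed
  has_parseable_fm_date: boolean; // frontmatter has event_date / date / published
}

function classifyEffectiveDate(p: SampledPage, now: Date): DateProblem | null {
  const oneYearOut = new Date(now.getTime());
  oneYearOut.setUTCFullYear(oneYearOut.getUTCFullYear() + 1);
  if (p.effective_date.getTime() > oneYearOut.getTime()) return 'future_dated';
  if (p.effective_date.getTime() < Date.UTC(1990, 0, 1)) return 'pre_1990';
  if (p.effective_date_source === 'fallback' && p.has_parseable_fm_date) {
    return 'fallback_with_fm_date';
  }
  return null; // healthy page
}
```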

salience_health: detects pages with active takes whose emotional_weight
is still 0 (recompute_emotional_weight phase hasn't run since the
take landed). Reports the brain's non-zero emotional_weight count as
an informational baseline. Fix hint: gbrain dream --phase
recompute_emotional_weight.

Both checks gracefully skip on pre-v0.29.1 brains (column doesn't
exist → 42703) without surfacing as warnings.
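
The graceful skip can be sketched as below. SQLSTATE 42703 is PostgreSQL's undefined_column code; the assumption here is that the driver exposes it on `err.code`. The helper names are hypothetical, and the body is synchronous only to keep the sketch short.

```typescript
// PostgreSQL SQLSTATE 42703 is undefined_column.
const UNDEFINED_COLUMN = '42703';

function isUndefinedColumn(err: unknown): boolean {
  return (
    typeof err === 'object' &&
    err !== null &&
    'code' in err &&
    (err as { code: unknown }).code === UNDEFINED_COLUMN
  );
}

type CheckOutcome = { status: 'ok' | 'warn'; message: string } | { status: 'skipped' };

// Run a check body; a missing column means a pre-migration brain, so skip quietly.
function runOrSkip(body: () => { status: 'ok' | 'warn'; message: string }): CheckOutcome {
  try {
    return body();
  } catch (err) {
    if (isUndefinedColumn(err)) return { status: 'skipped' };
    throw err; // any other failure is a real error
  }
}
```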

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29.1: docs + skills convention + CHANGELOG + version bump

- VERSION 0.29.0 → 0.29.1
- package.json version bump
- CHANGELOG.md: full release-summary + itemized + "To take advantage"
  block per the project's voice rules. Two-line headline + concrete
  pathology framing (existing callers unchanged; new axes opt-in;
  agent in charge per the prime directive).
- skills/conventions/salience-and-recency.md: agent-readable decision
  rules. "Current state → on. Canonical truth → off." plus the narrow
  temporal-bound exception. Cross-cutting convention propagates to
  brain skills via RESOLVER.md.
- skills/migrations/v0.29.1.md: agent-readable upgrade instructions.
  Verify steps + behavior-change reference + recovery commands.

The build-time tool-description generator from D2 (extract decision
tables from skills/conventions/salience-and-recency.md, embed into
operations.ts at build time) is deferred to a follow-up commit. The
tool descriptions on the query op + get_recent_salience are inline in
operations.ts for v0.29.1; the auto-gen + CI staleness gate land in
v0.29.2 if drift becomes a problem in practice.

148 unit tests pass across the v0.29.1 surface (effective-date,
recency-decay, query-intent, migrate, salience, recompute-emotional-weight).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Wintermute <wintermute@garrytan.com>

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 master-rebase fixups: renumber + drift cleanup

- v0.29.1 migrations renumber v38/v39 → v41/v42 (master shipped takes_table at
  v37 + access_tokens_permissions at v38; v0.27.1 took v39). My v0.29.0
  emotional_weight slots in at v40; v0.29.1's pages_recency_columns lands at
  v41 and eval_candidates_recency_capture at v42.
- src/core/utils.ts comment refs updated v37 → v40 (emotional_weight) and
  v38 → v41 (effective_date/etc).
- test/brain-allowlist.test.ts: size assertion 11 → 13 + the new
  get_recent_salience / find_anomalies positive checks + the explicit
  get_recent_transcripts negative check (v0.29 added the salience pair to
  the allow-list; transcripts are deliberately excluded because all
  subagent calls have remote=true and the v0.29 trust gate rejects them —
  visibility would be a footgun).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 CI fixups: privacy allow-list + cycle phase count + migration plan

Three CI test failures on PR garrytan#730, all caused by master-side state the
v0.29 cherry-picks didn't yet account for:

1. scripts/check-privacy.sh allow-lists test/recency-decay.test.ts
   The v0.29.1 recency-decay test asserts that DEFAULT_RECENCY_DECAY's
   keys do NOT include fork-specific path prefixes. Because the assertion
   has to name the banned tokens to assert their absence, the privacy
   guard flagged the literal occurrence. Same exception class as
   CHANGELOG.md, CLAUDE.md, and scripts/check-privacy.sh itself —
   meta-rule enforcement requires mentioning what the rule forbids.

2. test/core/cycle.serial.test.ts: 9 → 10 phases.
   The yieldBetweenPhases test was written for v0.26.5 (9 phases incl.
   purge). v0.29 added a 10th phase (recompute_emotional_weight)
   between patterns and embed; the test's expected hookCalls and
   report.phases.length needed bumping.

3. test/apply-migrations.test.ts: append '0.29.1' to skippedFuture lists.
   v0.29.1 added a new entry to src/commands/migrations/index.ts; the
   buildPlan test snapshots the exact ordered list of versions, so it
   needs the new entry in both the fresh-install case and the Codex H9
   regression case.

All three verified locally:
  - bash scripts/check-privacy.sh → exit 0
  - bun test test/apply-migrations.test.ts → 18/18 pass
  - bun test test/core/cycle.serial.test.ts → 28/28 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 CI fixup: regenerate llms-full.txt to match CLAUDE.md state

build-llms test asserts the committed llms.txt + llms-full.txt match
what the generator produces from the current source tree. CLAUDE.md
got new v0.29 Key Files entries (recompute_emotional_weight phase,
emotional-weight formula, anomaly stats, transcripts library, salience
ops, etc.) without a corresponding regen. `bun run build:llms` brings
llms-full.txt back in sync; llms.txt is byte-for-byte identical so
only the larger inline bundle changed.

Verified locally: bun test test/build-llms.test.ts → 7/7 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 e2e: cover tool-surfaces + MCP dispatch path

Two gaps were uncovered when reviewing v0.29 coverage against the new
contracts the cherry-picks landed onto master.

1. test/v0_29-tool-surfaces.test.ts (unit, 9 cases)

   Existing tests pin the description constants module and the
   BRAIN_TOOL_ALLOWLIST set membership, but nothing checked the two
   filters that ACT on those constants:

   - serve-http.ts:745 filters operations by !op.localOnly to build the
     HTTP MCP tool list. Without a test, anyone removing `localOnly: true`
     from get_recent_transcripts would silently list it for remote
     callers, leaving the in-handler ctx.remote check as the only
     guard. Now pinned: get_recent_transcripts is hidden, salience +
     anomalies stay visible.

   - buildBrainTools surfaces the v0.29 ops as `brain_get_recent_salience`
     and `brain_find_anomalies`, and EXCLUDES `brain_get_recent_transcripts`
     (codex C3 footgun gate — all subagent calls are remote=true, the op
     would always reject). Now pinned.

   Both filters are pure functions; no DB / engine.connect needed.

2. test/e2e/v0_29-mcp-dispatch-pglite.test.ts (e2e, 5 cases)

   Existing v0.29 e2e tests call engine methods directly. None went
   through the full dispatchToolCall pipeline that stdio MCP and HTTP
   MCP both use. The new file covers:

   - get_recent_salience returns ranked rows via dispatch (top result
     is the wedding-tagged page from the seeded fixture).
   - find_anomalies returns the AnomalyResult shape via dispatch.
   - get_recent_transcripts rejects with permission_denied when
     ctx.remote === true (the in-handler trust gate is the last line of
     defense if localOnly is ever dropped).
   - get_recent_transcripts succeeds with ctx.remote === false (CLI
     path) and returns [] when no corpus dir is configured.
   - Unknown tool name returns the standard isError + "Unknown tool"
     envelope (regression guard for dispatch shape).

Verified locally — all 14 cases pass:
  bun test test/v0_29-tool-surfaces.test.ts                          → 9 pass
  bun test test/e2e/v0_29-mcp-dispatch-pglite.test.ts                → 5 pass

Re-ran the full v0.29 PGLite e2e suite to confirm no regressions:
  salience-pglite.test.ts                       5 pass
  anomalies-pglite.test.ts                      4 pass
  cycle-recompute-emotional-weight-pglite.test  3 pass
  list-pages-regression.test.ts                 6 pass
  multi-source-emotional-weight-pglite.test     4 pass
  backfill-perf-pglite.test.ts                  1 pass
  v0_29-mcp-dispatch-pglite.test.ts             5 pass
  -----
  Total: 28 pass / 0 fail
  Postgres parity test (DATABASE_URL gated)     7 skip (correct)
  LLM routing eval (ANTHROPIC_API_KEY gated)   12 skip (correct)
  bun run typecheck                             clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.29 CI fixup: drop unused PGLiteEngine in tool-surfaces test

scripts/check-test-isolation.sh's R3 + R4 lints flagged the new
test/v0_29-tool-surfaces.test.ts for instantiating PGLiteEngine outside
a beforeAll() block (R3) and lacking the matching afterAll(disconnect)
(R4). The intent of those rules is to prevent engine leaks across the
shard process — every PGLiteEngine must follow the canonical
beforeAll(connect+initSchema) / afterAll(disconnect) pattern.

The fix here is upstream of the rule, not a workaround: this test never
needed an engine. buildBrainTools doesn't issue any SQL at registry-build
time — it only reads `engine.kind` for the put_page namespace-wrap
branch. A `{ kind: 'pglite' } as unknown as BrainEngine` fake-engine
literal keeps the test pure-function: no WASM cold-start, no connect
lifecycle, no test-isolation rule fired.

Verified locally:
  bash scripts/check-test-isolation.sh → OK (257 non-serial unit files)
  bun test test/v0_29-tool-surfaces.test.ts → 9 pass
  bun run typecheck → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Wintermute <wintermute@garrytan.com>
…e ping/doctor + topologies) (garrytan#732)

* feat(config): add remote_mcp field + isThinClient() helper

Adds a top-level optional remote_mcp config block to GBrainConfig
(issuer_url, mcp_url, oauth_client_id, oauth_client_secret) for
thin-client installs that consume a remote `gbrain serve --http` over
MCP instead of running a local engine.

isThinClient(config) returns true when remote_mcp is set; used by the
CLI dispatch guard, doctor branch, and init re-run guard. The engine
field stays as today (postgres|pglite); thin-client mode is a separate
config field, NOT an engine kind extension (codex outside-voice review
flagged the engine='remote' extension as overreach).

GBRAIN_REMOTE_CLIENT_SECRET env var overrides the config-file value at
load time so the secret can stay off disk for headless agents.
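
A minimal sketch of the config shape and the two behaviors described above. `GBrainConfigSketch` and `applySecretOverride` are hypothetical names, and the field optionality is an assumption; the remote_mcp field names and the env var name are from this commit message.

```typescript
interface RemoteMcpConfig {
  issuer_url: string;
  mcp_url: string;
  oauth_client_id: string;
  oauth_client_secret?: string; // may be absent when supplied via env
}

interface GBrainConfigSketch {
  engine: 'postgres' | 'pglite';
  remote_mcp?: RemoteMcpConfig;
}

// Thin-client mode is signaled by the presence of remote_mcp, not by engine kind.
function isThinClient(config: GBrainConfigSketch): boolean {
  return config.remote_mcp !== undefined;
}

// The env var wins over the config-file value at load time.
function applySecretOverride(
  config: GBrainConfigSketch,
  env: Record<string, string | undefined>,
): GBrainConfigSketch {
  const secret = env.GBRAIN_REMOTE_CLIENT_SECRET;
  if (!config.remote_mcp || !secret) return config;
  return { ...config, remote_mcp: { ...config.remote_mcp, oauth_client_secret: secret } };
}
```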

Foundation commit for multi-topology v1; no behavior change yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(probe): outbound OAuth + MCP smoke probes

Adds three pure async functions over the standard fetch API:
  - discoverOAuth(issuerUrl): GET /.well-known/oauth-authorization-server
  - mintClientCredentialsToken(tokenEndpoint, id, secret): POST /token
  - smokeTestMcp(mcpUrl, accessToken): POST /mcp initialize

Discriminated 'ok=true' / 'ok=false + reason' return shapes so callers
render error messages consistently. No SDK dependency to keep init's
setup-flow scope tight; Lane B's mcp-client.ts will pull in the
official @modelcontextprotocol/sdk Client for full session semantics.
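
The discriminated return shape can be sketched as follows. The `ProbeResult` type, the reason strings, and both helper functions are hypothetical; `token_endpoint` is the standard field of the OAuth authorization server metadata document.

```typescript
type ProbeResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: string; detail?: string };

// Hypothetical parser for the document returned by
// GET /.well-known/oauth-authorization-server.
function parseDiscovery(status: number, body: string): ProbeResult<{ token_endpoint: string }> {
  if (status !== 200) return { ok: false, reason: 'http', detail: `status ${status}` };
  let doc: unknown;
  try {
    doc = JSON.parse(body);
  } catch {
    return { ok: false, reason: 'parse' };
  }
  const endpoint = (doc as { token_endpoint?: unknown } | null)?.token_endpoint;
  if (typeof endpoint !== 'string') return { ok: false, reason: 'parse', detail: 'no token_endpoint' };
  return { ok: true, value: { token_endpoint: endpoint } };
}

// One renderer serves every probe because the failure shape is shared.
function renderProbe<T>(name: string, r: ProbeResult<T>): string {
  return r.ok ? `${name}: ok` : `${name}: failed (${r.reason}${r.detail ? `: ${r.detail}` : ''})`;
}
```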

Used by both 'gbrain init --mcp-only' (Lane A's setup smoke) and
runRemoteDoctor (Lane A's thin-client doctor checks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(init): --mcp-only branch + re-run guard

Adds 'gbrain init --mcp-only' for thin-client setup. Required flags
(or env vars):
  --issuer-url     OAuth root (e.g. https://host:3001)
  --mcp-url        MCP tool dispatch path (e.g. https://host:3001/mcp)
  --oauth-client-id, --oauth-client-secret

Pre-flight runs three smoke probes (discovery, token round-trip, MCP
initialize) BEFORE writing the config — fail-fast on bad URL beats
fail-late on bad credentials. On success, writes ~/.gbrain/config.json
with remote_mcp set and NO local DB created.

Re-run guard (A8): when ~/.gbrain/config.json already has remote_mcp,
'gbrain init' (any flag set) refuses without --force. Catches the
scripted-setup-loop friction from the user-reported scenario where
re-running setup-gbrain on a thin-client machine kept trying to
re-create a local DB.

Two URLs in config (issuer + mcp) instead of one because OAuth
discovery + /token live at the issuer root while tool dispatch is at
/mcp — they compose from a common base in practice but reverse-proxy
setups need them explicit (codex review #2).

Tests: 15 cases covering happy path, env-var-supplied secret stays
off disk, all four required-flag missing-error paths, three
smoke-failure paths, network-unreachable path, and the four re-run
guard variants (default/--pglite/--mcp-only without --force / with
--force). Uses async Bun.spawn (NOT execFileSync) — sync exec
deadlocks against in-process HTTP fixtures because the parent's
event loop can't accept connections while sync-blocked on a child.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): runRemoteDoctor for thin-client mode

Replaces every DB-bound check from runDoctor() with a tighter set
scoped to 'is the remote MCP we configured actually reachable?'.
Five checks:
  - config_integrity (URL fields well-formed)
  - oauth_credentials (secret resolvable from env or config file)
  - oauth_discovery (GET /.well-known/oauth-authorization-server)
  - oauth_token (POST /token client_credentials)
  - mcp_smoke (POST /mcp initialize)

Output shape matches the local doctor's Check surface so JSON
consumers can union the two without conditional logic. schema_version
is 2 (matches local doctor).

collectRemoteDoctorReport() is the pure data collector;
runRemoteDoctor() is the print/exit wrapper. Tests pin the data
collector so we don't have to intercept stdout / process.exit.

Tests: 12 cases over a tiny in-process HTTP fixture covering happy
path, every probe failure mode (404/parse/auth/network/server-error),
malformed-URL config integrity, missing-secret short-circuit, and
the env-var-overrides-config-file secret resolution. withEnv() helper
used for env mutations to satisfy the test-isolation lint.

Module is added but not yet wired into the CLI doctor branch; the
wiring lands in the next commit (cli dispatch guard + doctor routing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): thin-client dispatch guard + doctor routing

Adds a single canonical refusal at the top of handleCliOnly() for the
9 DB-bound commands when ~/.gbrain/config.json has remote_mcp set:
  sync, embed, extract, migrate, apply-migrations, repair-jsonb,
  orphans, integrity, serve

Single dispatch check (not 9 sprinkled assertLocalEngine calls per
codex review #1) — avoids the blast radius of letting commands enter
connectEngine before the check fires. Refused commands exit 1 with a
canonical error naming the remote mcp_url.
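
A sketch of the single dispatch check, assuming it runs before any engine connection is attempted. The function name and the error wording are hypothetical; the refused command list is the one given above.

```typescript
const THIN_CLIENT_REFUSED = new Set([
  'sync', 'embed', 'extract', 'migrate', 'apply-migrations',
  'repair-jsonb', 'orphans', 'integrity', 'serve',
]);

// Returns an error message when the command must be refused, or null to proceed.
// The wording is illustrative; the source only says the error names the remote mcp_url.
function thinClientRefusal(command: string, remoteMcpUrl: string | undefined): string | null {
  if (remoteMcpUrl === undefined) return null;        // local install: no guard
  if (!THIN_CLIENT_REFUSED.has(command)) return null; // safe command
  return `'gbrain ${command}' needs a local engine, but this install is a thin client of ${remoteMcpUrl}`;
}
```

Keeping the list in one set means a new DB-bound command is refused by adding one entry, rather than by remembering to add an assertion inside the command itself.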

doctor branch routes to runRemoteDoctor when isThinClient(config)
returns true; falls through to the existing local-doctor flow
otherwise. Wires the module added in the previous commit into the
user-facing CLI surface.

Safe commands (init, auth, --version, --help, etc.) still work in
thin-client mode and are NOT in the refused set.

Tests: 14 cases — 9 refused commands × 1 each, 2 safe commands, 1
doctor-routing assertion (fingerprints the thin-client output by
'mode:"thin-client"' in JSON), 2 regression tests asserting local
config still passes through normally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(topologies): multi-topology architecture guide + setup skill Phase A.5

New docs/architecture/topologies.md covering three deployment shapes:
  1. Single brain (today's default)
  2. Cross-machine thin client (consume a remote brain over MCP)
  3. Split-engine per-worktree (Conductor users with per-worktree
     code engines + shared remote artifacts brain)

Each topology gets an ASCII diagram, when-it-fits guidance, and
concrete setup recipes. Topology 3's alias-level routing footgun
(wrong alias = silent wrong-brain writes) is called out explicitly
per codex review garrytan#6.

Topology 3 needs zero gbrain code changes — GBRAIN_HOME already
overrides ~/.gbrain and 'gbrain serve --http --port N' already runs
on any port. gstack composes these primitives on its side.

skills/setup/SKILL.md gets Phase A.5 BEFORE the local-engine phases.
Asks the user which topology fits, walks thin-client setup through
'gbrain init --mcp-only', skips Phases B/C/C.5/H entirely for thin
clients (host's autopilot handles sync/extract/embed).

README.md gets a one-line link to the topology doc from the
Architecture section.

llms-full.txt regenerated to include the new doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): thin-client end-to-end skeleton

Spins up 'gbrain serve --http' against real Postgres, registers a
client with read,write,admin scope, runs 'gbrain init --mcp-only'
from a separate tempdir GBRAIN_HOME, exercises the canonical
thin-client flows:

  - init --mcp-only succeeds against the live host
  - doctor reports mode: thin-client + all checks green
  - sync is refused with the canonical thin-client error
  - re-running init refuses without --force

Tier B flows (gbrain remote ping / doctor) will be added alongside
their Lane B implementation. Skips when DATABASE_URL unset (matches
the e2e gate convention used across the suite).

Async Bun.spawn (NOT execFileSync) so the test event loop stays
responsive — execFileSync deadlocks against in-process HTTP fixtures
because the parent's event loop can't accept connections while
sync-blocked on a child process.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): doctorReportRemote core for thin-client + run_doctor op

Adds three new exports to src/commands/doctor.ts that the run_doctor MCP
op + gbrain remote doctor CLI both consume:

  - DoctorReport interface       schema_version=2 stable shape
  - computeDoctorReport(checks)  status + health_score math
  - doctorReportRemote(engine)   focused 5-check thin-client surface

doctorReportRemote runs:
  1. connection      (engine reachable + page count via getStats)
  2. schema_version  (engine.getConfig('version') vs LATEST_VERSION)
  3. brain_score     (the 5-component composite)
  4. sync_failures   (file-plane JSONL count from gbrainPath('sync-failures.jsonl'))
  5. queue_health    (Postgres-only: stalled active jobs > 1h)

Engine-agnostic: works on both Postgres and PGLite via engine.executeRaw +
engine.getConfig + engine.getHealth — no reliance on db.getConnection()
which is Postgres-only.

Deliberately a focused subset of the local doctor surface, NOT a full
mirror. Generalizing to lint/integrity/orphans is filed as follow-up
pending demand. Local doctor (runDoctor) is unchanged; operators on the
host machine still get the full check set.

schema_version=2 matches the local doctor's --json output schema, so JSON
consumers can union the two without conditional logic.

Tests: 11 unit cases against PGLite covering the 5-check happy path,
schema version reporting (latest), PGLite-specific queue_health
informational message, and the score+status math via computeDoctorReport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp-client): outbound HTTP MCP client over @modelcontextprotocol/sdk

New src/core/mcp-client.ts wraps the official SDK's Client +
StreamableHTTPClientTransport with OAuth client_credentials minting,
in-process token caching with expires_at, and refresh-on-401 retry.

Public surface:
  - callRemoteTool(config, toolName, args)   tool call w/ auto-refresh
  - unpackToolResult(res)                    parse content[0].text JSON
  - RemoteMcpError                           discriminated by `reason`

Token cache: module-level Map keyed by mcp_url. CLI processes are
short-lived; the cache amortizes when one invocation makes multiple
calls (gbrain remote ping submits then polls). Persisting to disk would
be a credential-on-disk surface for marginal benefit since /token
round-trip is sub-100ms.
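
The in-process cache can be sketched as below. This is not the code in src/core/mcp-client.ts: the function names are hypothetical and the expiry safety margin is an assumption added for illustration.

```typescript
interface CachedToken {
  access_token: string;
  expires_at: number; // epoch milliseconds
}

// Module-level cache keyed by mcp_url; lives only as long as the process.
const tokenCache = new Map<string, CachedToken>();

// Safety margin so a token is not used right at its expiry (an assumption of this sketch).
const EXPIRY_SKEW_MS = 5000;

function getCachedToken(mcpUrl: string, nowMs: number): string | null {
  const hit = tokenCache.get(mcpUrl);
  if (!hit || hit.expires_at - EXPIRY_SKEW_MS <= nowMs) return null;
  return hit.access_token;
}

function putToken(mcpUrl: string, accessToken: string, expiresInSec: number, nowMs: number): void {
  tokenCache.set(mcpUrl, { access_token: accessToken, expires_at: nowMs + expiresInSec * 1000 });
}

// Dropping the entry forces a fresh mint on the next call.
function invalidateToken(mcpUrl: string): void {
  tokenCache.delete(mcpUrl);
}
```

Because the map is never written to disk, the token disappears with the process, at the cost of one /token round-trip per CLI invocation.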

401 retry: ONLY for mid-session token rotation (initial good token →
stale → 401). If the FIRST mint fails auth, surface immediately as
RemoteMcpError(auth) — retry won't help when credentials are wrong from
the start. If a fresh-mint-after-401 still 401s, surface as
RemoteMcpError(auth_after_refresh) which the CLI renders with a hint
pointing the operator at gbrain auth register-client.

Used by gbrain remote ping (submit_job + get_job poll) and gbrain
remote doctor (run_doctor). Test-only _clearMcpClientTokenCache export
for fixture isolation.

Tests: 13 unit cases over an in-process HTTP fixture mimicking gbrain
serve --http (OAuth discovery + /token + /mcp JSON-RPC handshake).
Covers happy path, token cache reuse + force-refresh, args passthrough,
config-error paths (no remote_mcp / no secret), token mint 401, network
unreachable, tool isError envelope, and unpackToolResult parse failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(operations): add run_doctor MCP op (admin scope, HTTP-reachable)

New op in src/core/operations.ts wraps doctorReportRemote() and returns
the structured DoctorReport JSON over MCP.

  scope:     'admin'       (system-state read; not for routine consumers)
  localOnly: false         (reachable over HTTP)
  mutating:  false         (safe to call repeatedly)
  params:    {}            (no caller arguments needed)

First read-only diagnostic op exposed over HTTP MCP. Used by gbrain
remote doctor — the matching client-side renderer lives in
src/commands/remote.ts.

Precedent: doctor only. Generalizing run_lint / run_integrity /
run_orphans to MCP is filed as follow-up work pending demand. Local
doctor stays unchanged; this op is the operator-friendly subset for
remote callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(remote): gbrain remote ping + gbrain remote doctor

Two thin-client convenience commands that round-trip through the host's
HTTP MCP endpoint:

  - gbrain remote ping     submit_job(autopilot-cycle) → poll get_job →
                           exit when terminal. The "I just wrote markdown,
                           tell the host to re-index" affordance.
  - gbrain remote doctor   run_doctor MCP op → render the host's
                           DoctorReport → exit 0/1 based on status.

Both require a thin-client install (~/.gbrain/config.json with
remote_mcp). Local installs get a clear error pointing at the local
equivalents.

Polling backoff (ping): poll every 1s for the first 30s, then every 5s
for 5min, then every 10s. Default cap 15min, configurable via
`--timeout`. Without backoff, a 5-min cycle polled at 1s would burn 300
round-trips against the host's rate limiter.
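
The backoff schedule can be sketched as a function of elapsed time. The names are hypothetical, and the sketch assumes the 5s phase lasts five minutes after the initial 30s phase.

```typescript
const SECOND = 1000;
const MINUTE = 60 * SECOND;

// Interval before the next poll, given time elapsed since submit.
// Assumption: the 5s phase lasts 5 minutes after the initial 30s phase.
function pollIntervalMs(elapsedMs: number): number {
  if (elapsedMs < 30 * SECOND) return 1 * SECOND;
  if (elapsedMs < 30 * SECOND + 5 * MINUTE) return 5 * SECOND;
  return 10 * SECOND;
}

// Count the polls issued while a job of the given duration is running.
function pollsFor(durationMs: number): number {
  let elapsed = 0;
  let polls = 0;
  while (elapsed < durationMs) {
    elapsed += pollIntervalMs(elapsed);
    polls++;
  }
  return polls;
}
```

Under these assumptions a 5-minute cycle costs 84 polls (30 at 1s, then 54 at 5s) rather than 300 at a fixed 1s interval.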

Payload uses `data: {phases: [...]}`, NOT `params:` — the submit_job op
shape takes `data`. Codex review garrytan#8 catch.

NO `repo` arg passed to autopilot-cycle — uses the server's configured
brain repo. This sidesteps TODO garrytan#1144 (sync_brain repo-path validation
for caller-controlled paths) entirely.

src/cli.ts wires the `remote` subcommand into CLI_ONLY + the dispatch.
Help (`gbrain remote --help`) and unknown-subcommand handling included.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): thin-client Tier B + scope-mismatch regression

Extends the existing test/e2e/thin-client.test.ts with three new cases:

  1. gbrain remote doctor returns the host's DoctorReport — pins the
     run_doctor MCP op round-trip. Asserts schema_version=2, all 5
     check names present, connection + schema_version ok against a
     fresh host.
  2. gbrain remote ping triggers autopilot-cycle and returns terminal
     state — pins the submit_job → poll → terminal wire path. Accepts
     any terminal state (success / failed / dead / cancelled / timeout)
     because autopilot on an empty no-repo brain may fail-fast in the
     sync phase. What this test pins is the JSON shape (job_id present,
     state populated), NOT cycle success on a no-repo fixture.
  3. read+write client cannot call run_doctor — codex review garrytan#7
     regression guard. Registers a separate client with
     `--scopes "read write"` (no admin), runs `gbrain remote doctor`
     against it, asserts exit 1 with auth/auth_after_refresh/tool_error
     reason. Keeps the verification flow honest: the canonical setup
     MUST require admin scope.

`gbrain auth register-client` doesn't have --json, so the test parses
the human output for "Client ID:" and "Client Secret:" lines via a
helper.

Test-level timeout bumped 60s → 120s for the ping wait + auth/init
overhead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.29.2)

v0.29.2 ships thin-client mode: gbrain init --mcp-only, gbrain remote
ping/doctor, run_doctor MCP op, and the docs/architecture/topologies.md
deployment guide.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…an#731)

* feat(schema): migration v40 — takes_resolved_quality + drift_decisions

Slice A1 of the v0.30 wave. Bundles all wave schema in one migration so
A2/B1/C1 carry no schema of their own (codex F6 schema-first ordering).

- takes.resolved_quality TEXT with CHECK (correct/incorrect/partial).
- takes_resolution_consistency CHECK enforces (quality, outcome) tuple
  consistency at the DB layer. partial → outcome=NULL.
- One-shot backfill maps legacy resolved_outcome → resolved_quality so
  v0.28 brains keep working with no manual reclassification.
- idx_takes_scorecard partial index on (holder, kind, resolved_quality)
  WHERE resolved_quality IS NOT NULL — scorecard hot path.
- drift_decisions audit table (consumed by Slice C1 in v0.30.3).
- PGLite branch via sqlFor.pglite mirrors the same shape; RLS DO-block
  is Postgres-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(takes-fence): extend ParsedTake + parser + conditional renderer (codex F3)

A codex consult on the v0.30 plan caught a real bug: the v0.28 parser had
no concept of resolution columns, so every cmdUpdate after a cmdResolve
silently deleted resolution data on the next render. This commit kills
that data-loss path.

ParsedTake gains optional resolvedAt, resolvedQuality, resolvedOutcome,
resolvedEvidence, resolvedValue, resolvedUnit, resolvedBy. parseTakesFence
detects v0.30-shape headers and reads resolution cells when present;
v0.28 7-column fences round-trip byte-identical. renderTakesFence emits
the resolution columns ONLY when at least one row on the page has
resolvedQuality set — pages with no resolved rows keep the narrow shape
exactly as before.
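
The conditional-rendering rule can be sketched as below. The type, the function names, and the column labels are hypothetical; the real fence has more columns than shown.

```typescript
interface ParsedTakeSketch {
  claim: string;
  resolvedQuality?: 'correct' | 'incorrect' | 'partial';
}

// Wide shape only when at least one row on the page carries a resolution.
function needsResolutionColumns(rows: ParsedTakeSketch[]): boolean {
  return rows.some((r) => r.resolvedQuality !== undefined);
}

// Hypothetical header builder illustrating the narrow vs. wide shape.
function fenceHeader(rows: ParsedTakeSketch[]): string[] {
  const base = ['claim'];
  return needsResolutionColumns(rows) ? [...base, 'resolved_quality'] : base;
}
```

Deciding per page rather than per row keeps every row in a fence the same width, and leaves pages without resolved rows untouched.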

11 new test cases including the round-trip preservation regression gate.
Without those tests, the silent-delete bug returns the moment the parser
shape drifts. Tests cover: parsing v0.30 + v0.28 shapes, conditional
rendering, partial quality round-trip, upsertTakeRow + supersedeRow
preservation when a page already has resolved rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(engine): getScorecard + getCalibrationCurve + 3-state TakeResolution

Adds the calibration aggregate methods on BrainEngine. Both engines
implement them with SQL-level allow-list filtering inside the GROUP BY
(D4 fail-closed): hidden-holder rows contribute zero to aggregates.

TakeResolution gains optional `quality` (correct|incorrect|partial). When
both quality and outcome are supplied AND inconsistent, the engine throws
TAKE_RESOLUTION_INVALID rather than silently overwriting. resolveTake
writes both columns: quality directly, outcome derived (correct→true,
incorrect→false, partial→NULL). Schema CHECK is the defense-in-depth
backstop.
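
A minimal sketch of the quality-to-outcome derivation described above. The function name `deriveTuple`, the input shape, and the error class are illustrative assumptions, not the actual `deriveResolutionTuple` helper:

```typescript
type Quality = "correct" | "incorrect" | "partial";

interface ResolutionInput {
  quality?: Quality;
  outcome?: boolean | null;
}

interface ResolutionTuple {
  quality: Quality;
  outcome: boolean | null;
}

// Hypothetical stand-in for the TAKE_RESOLUTION_INVALID error named in the commit.
class TakeResolutionInvalid extends Error {
  readonly code = "TAKE_RESOLUTION_INVALID";
}

// correct -> true, incorrect -> false, partial -> null (no binary outcome).
function outcomeFor(quality: Quality): boolean | null {
  if (quality === "correct") return true;
  if (quality === "incorrect") return false;
  return null;
}

function deriveTuple(input: ResolutionInput): ResolutionTuple {
  if (input.quality !== undefined) {
    const derived = outcomeFor(input.quality);
    // Reject contradictory input instead of silently overwriting one side.
    if (input.outcome !== undefined && input.outcome !== derived) {
      throw new TakeResolutionInvalid("quality and outcome disagree");
    }
    return { quality: input.quality, outcome: derived };
  }
  // Back-compat path (assumed): a bare boolean outcome maps onto a quality.
  if (typeof input.outcome === "boolean") {
    return { quality: input.outcome ? "correct" : "incorrect", outcome: input.outcome };
  }
  throw new TakeResolutionInvalid("either quality or outcome is required");
}
```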

Brier scope (D5 + D11): the SQL aggregation excludes partial rows from
the Brier denominator — partial isn't a binary outcome to compare a
probability against. partial_rate is reported alongside as a separate
counter so hedging behavior stays visible. The 20% threshold lives in
src/core/takes-resolution.ts and the CLI surfaces it in v0.30.0's
cmdScorecard.
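
The Brier scoping above can be illustrated in memory. This sketch assumes `partialRate` is partial divided by all resolved rows and that an empty denominator yields `null`; the real aggregation runs in SQL and its exact field names may differ:

```typescript
interface ResolvedBet {
  weight: number; // stated probability in [0, 1]
  quality: "correct" | "incorrect" | "partial";
}

interface ScorecardSketch {
  resolved: number;
  brier: number | null;       // null when there are no binary rows
  partialRate: number | null; // null when nothing is resolved
}

function scorecardSketch(bets: ResolvedBet[]): ScorecardSketch {
  let binary = 0;
  let partial = 0;
  let squaredError = 0;
  for (const bet of bets) {
    if (bet.quality === "partial") {
      partial++; // counted separately, excluded from the Brier denominator
      continue;
    }
    const outcome = bet.quality === "correct" ? 1 : 0;
    squaredError += (bet.weight - outcome) ** 2;
    binary++;
  }
  const resolved = binary + partial;
  return {
    resolved,
    brier: binary === 0 ? null : squaredError / binary,
    partialRate: resolved === 0 ? null : partial / resolved,
  };
}
```

A bettor who always states 0.5 scores exactly 0.25 under this definition, which is the baseline the CLI output refers to.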

New module src/core/takes-resolution.ts holds shared pure helpers
(deriveResolutionTuple, finalizeScorecard) consumed by both engines so
the math stays identical across backends. takeRowToTake (utils.ts) reads
resolved_quality through to the Take row shape.

23 new test cases: 16 for the helpers (Brier hand-calc against a 4-bet
reference at 0.205, n=0 no-divide, contradictory-input rejection,
partial-exclusion contract, threshold constant); 7 against PGLite for
the engine path (3-state quality writes, contradictory throws, scorecard
hand-calc, n=0, SQL-level allow-list privacy filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(cli): gbrain takes resolve --quality, takes scorecard, takes calibration

cmdResolve widened: --quality correct|incorrect|partial is the new primary
input. --outcome true|false stays as a back-compat alias auto-mapping to
quality, with a stderr deprecation warning on use. Mutually exclusive
with --quality. --evidence is a semantic alias for --source on the
resolve subcommand.

cmdResolve mirrors resolution metadata into the takes-fence on disk via
the page-lock-aware path. Round-trip preservation through parseTakesFence
+ renderTakesFence keeps resolution data intact across unrelated edits to
other rows on the same page. Removes the v0.28 deferred-rendering warning.

cmdScorecard prints `correct | incorrect | partial`, accuracy, Brier
(correct ∨ incorrect only; lower is better; 0.25 = always-50% baseline),
and partial_rate. When partial_rate > 20% the CLI prints
"[!] partial_rate is high — calibration may be optimistic" so hedging
behavior stays visible even though it doesn't enter the math (D11). Small-N
note when resolved < 100. JSON output via --json.

cmdCalibration bins resolved correct/incorrect bets by stated weight
(--bucket-size, default 0.1) and prints observed vs predicted vs delta
per bucket. Diagonal alignment = perfect calibration.
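
As a rough illustration of the binning, the sketch below groups binary bets by stated weight and reports predicted (mean weight), observed (hit rate), and delta per bucket. The field names and the sign convention `delta = observed - predicted` are assumptions:

```typescript
interface BinaryBet {
  weight: number;
  correct: boolean;
}

interface CalibrationBucket {
  index: number;     // bucket number, e.g. 7 covers [0.7, 0.8) at size 0.1
  n: number;
  predicted: number; // mean stated weight in the bucket
  observed: number;  // fraction of bets that resolved correct
  delta: number;     // observed - predicted (assumed sign convention)
}

function calibrationCurveSketch(bets: BinaryBet[], bucketSize = 0.1): CalibrationBucket[] {
  const SCALE = 1e6; // bucket on scaled integers to avoid float division drift
  const size = Math.round(bucketSize * SCALE);
  const acc = new Map<number, { n: number; sumWeight: number; hits: number }>();
  for (const bet of bets) {
    const index = Math.floor(Math.round(bet.weight * SCALE) / size);
    const cur = acc.get(index) ?? { n: 0, sumWeight: 0, hits: 0 };
    cur.n++;
    cur.sumWeight += bet.weight;
    if (bet.correct) cur.hits++;
    acc.set(index, cur);
  }
  return Array.from(acc.entries())
    .sort((a, b) => a[0] - b[0])
    .map(([index, v]) => {
      const predicted = v.sumWeight / v.n;
      const observed = v.hits / v.n;
      return { index, n: v.n, predicted, observed, delta: observed - predicted };
    });
}
```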

Both new subcommands wire allow-list as undefined for local CLI callers
(trusted); MCP path will thread it from access_tokens.permissions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): register takes_scorecard + takes_calibration ops

Both ops are read-scope, MCP-callable, allow-list-honoring. Handlers
thread ctx.takesHoldersAllowList into the engine method's required
allowList parameter, which applies WHERE holder = ANY at SQL aggregation
level (D4 fail-closed). Local CLI callers leave the allow-list
undefined and see all holders.

Updates the OperationContext.takesHoldersAllowList contract comment to
list the new aggregate ops alongside takes_list, takes_search, query.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.30.0 release: calibration core (Slice A1 of v0.30 wave)

VERSION + package.json bump. CHANGELOG entry covers the release-summary
(headline + math table + privacy note + data-loss-bug-killed note),
"## To take advantage of v0.30.0" upgrade path, and itemized changes.

llms-full.txt regenerated to capture the v0.28.x annotations that had
been merged but not yet rolled into the docs bundle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): scorecard + calibration parity on real Postgres + NUMERIC fix

Adds end-to-end coverage for v0.30.0 (Slice A1) against real Postgres:

- test/e2e/takes-scorecard-parity.test.ts (new): seeds the same 6-bet
  fixture (4 binary garry + 1 partial garry + 1 binary harj) into both
  Postgres and PGLite, asserts getScorecard + getCalibrationCurve return
  byte-identical results across engines, runs the 4-bet hand-calc Brier
  reference (0.205) on real PG, and verifies the SQL-level allow-list
  filter strictly subtracts hidden-holder rows on both engines.

- test/e2e/takes-postgres.test.ts: extended with 8 v0.30 cases — quality
  semantics (correct/partial/back-compat) writes the expected (quality,
  outcome) tuple on real PG; the takes_resolution_consistency CHECK
  constraint actually fires on a contradictory raw UPDATE; getScorecard
  + getCalibrationCurve coherent shape + ordered-bucket invariants;
  PRIVACY allow-list filter on real PG; MCP dispatch path for
  takes_scorecard + takes_calibration with allow-list threading.

While writing the parity test, the e2e harness caught a real bug PGLite
tolerated: postgres.js sends scalar `${bucketSize}` params as text by
default, so `FLOOR(weight / $N)` tried to coerce '0.1' to integer and
threw `invalid input syntax for type integer: "0.1"`. The NUMERIC fix
also kills a separate FP-precision divergence — `FLOOR(0.7 / 0.1)`
returns 6 on real PG (IEEE 754 rounds 0.7/0.1 to 6.9999...) and 7 on
PGLite. Both engines now bucket via `weight::numeric / $N::numeric`
which is exact decimal arithmetic and engine-agnostic.
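
The same floating-point divergence is easy to reproduce outside the database. The sketch below contrasts naive double-precision bucketing with a scaled-integer variant; it is only an analogy for the `::numeric` cast, not the SQL the engines run:

```typescript
// Naive bucketing on IEEE 754 doubles: 0.7 / 0.1 evaluates just below 7.
function naiveBucket(weight: number, size: number): number {
  return Math.floor(weight / size);
}

// Scaled-integer bucketing: round both operands to a fixed number of
// decimals first, so the division happens on exact integers.
function decimalBucket(weight: number, size: number, decimals = 6): number {
  const scale = 10 ** decimals;
  return Math.floor(Math.round(weight * scale) / Math.round(size * scale));
}
```

With these definitions `naiveBucket(0.7, 0.1)` lands in bucket 6 while `decimalBucket(0.7, 0.1)` lands in bucket 7, which is the kind of disagreement a cross-engine parity test surfaces.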

This is the v0.30.0 wave's first cross-engine parity test. Same shape
will guard A2's getTrajectory + getAnnualReview when those land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): correct v40→v43 reference and expand v0.30.0 test note

Two fixes to the v0.30.0 entry after the master merge renumbered the
migration:

- The "#### Added" bullet still said "Schema migration v40"; bumped to v43.
- The "#### Tests" section only enumerated unit tests. The PR also ships
  19 E2E cases (11 in takes-scorecard-parity, 8 extending takes-postgres)
  that exercise the calibration math against real Postgres and the
  PG↔PGLite engine parity. Added the count + a note about the two real
  bugs the parity test caught (postgres.js string-typed scalar params
  and IEEE 754 bucketing divergence) that PGLite tolerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…base (garrytan#750)

* v0.30.1 Lane A: connection-manager foundation + X1 initSchema routing

Routes Postgres queries by query type:
  - read() goes to the Supabase pooler (port 6543, fast)
  - ddl() and bulk() go to direct (port 5432, 30min statement timeout,
    maintenance_work_mem 256MB)

Auto-detects Supabase via hostname pooler.supabase.com or port 6543.
Override with GBRAIN_DIRECT_DATABASE_URL. Kill-switch via
GBRAIN_DISABLE_DIRECT_POOL=1 falls back to single-pool legacy path.
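
A minimal sketch of the routing decision, assuming the detection rule and environment variables named above. The function names, the `"primary"`/`"direct"` labels, and the way the override enables dual-pool mode are illustrative; how the direct URL is derived is left out:

```typescript
type QueryKind = "read" | "ddl" | "bulk";
type PoolName = "primary" | "direct";

// Detection rule from the commit: pooler hostname or port 6543.
function looksLikeSupabasePooler(databaseUrl: string): boolean {
  const u = new URL(databaseUrl);
  return u.hostname.endsWith("pooler.supabase.com") || u.port === "6543";
}

function pickPool(
  kind: QueryKind,
  databaseUrl: string,
  env: Record<string, string | undefined>,
): PoolName {
  // Kill-switch: fall back to the single-pool legacy path.
  if (env["GBRAIN_DISABLE_DIRECT_POOL"] === "1") return "primary";
  // Assumption: an explicit override also turns dual-pool mode on.
  const dualPool =
    env["GBRAIN_DIRECT_DATABASE_URL"] !== undefined || looksLikeSupabasePooler(databaseUrl);
  if (!dualPool) return "primary";
  return kind === "read" ? "primary" : "direct";
}
```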

Foundation modules (Lane A scope):
- src/core/connection-manager.ts: read/ddl/bulk/healthCheck, parent-CM
  inheritance (T5/X1), cached Promise<Sql> lazy init (A1), kill-switch
  inheritance (A2), Supabase URL auto-derivation
- src/core/url-redact.ts: redactPgUrl + redactDeep (F3)
- src/core/retry-matcher.ts: typed predicates for stmt-timeout / lock /
  conn errors (C4)
- src/core/connection-audit.ts: ~/.gbrain/audit/connection-events JSONL
  with ISO-week rotation; doctor tail-reads last 5 errors (F8)
- scripts/check-pg-url-redaction.sh: CI grep guard against unredacted
  postgresql:// URL leaks (F3)
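
For illustration, URL redaction along the lines of `redactPgUrl` and `redactDeep` could look like the sketch below. The regex, the `***` placeholder, and both function names are assumptions rather than the module's actual behavior:

```typescript
// Replace the password in any postgres:// or postgresql:// URL found in text.
function redactPgUrlSketch(text: string): string {
  return text.replace(
    /(postgres(?:ql)?:\/\/)([^\s:@\/]+):([^\s@\/]+)@/g,
    "$1$2:***@",
  );
}

// Walk strings, arrays, and plain objects, redacting every string leaf.
function redactDeepSketch(value: unknown): unknown {
  if (typeof value === "string") return redactPgUrlSketch(value);
  if (Array.isArray(value)) return value.map(redactDeepSketch);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) out[key] = redactDeepSketch(source[key]);
    return out;
  }
  return value;
}
```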

Engine integration:
- PostgresEngine.connect: instantiates instance-owned ConnectionManager,
  inherits from parentConnectionManager when set (worker engines, sync,
  cycle), shares pool with module-singleton path
- PostgresEngine.disconnect: tears down direct pool first
- PostgresEngine.initSchema: routes DDL through connectionManager.ddl()
  when dual-pool active (X1 part 1; lock semantics replacement is Lane B)
- cli.ts:connectEngine(opts): probeOnly skips initSchema entirely (X1
  part 2 — get_health, upgrade --status will use this)

Tests added (51 new cases):
- test/url-redact.test.ts: 11 cases
- test/retry-matcher.test.ts: 13 cases
- test/connection-manager.test.ts: 27 cases (URL detection, derive,
  kill-switch, parent inheritance, dual-pool routing modes)

Foundation for Lanes B-E. Sequential lane work continues.

Plan: ~/.claude/plans/system-instruction-you-are-working-stateless-wadler.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.30.1 Lane B: migration runner retry + verify hooks + namespaced --force flags

Adds Migration interface fields:
  - idempotent: boolean (default true; explicit false blocks verify-hook
    re-runs on destructive migrations)
  - verify: optional post-condition probe; runs after migration claims success

Migration retry wrapper (Cherry D3 / Finding F2):
  - 3 attempts with 5s/15s/45s backoff (env GBRAIN_MIGRATE_BACKOFF_MS=0
    for tests)
  - Retries only on statement_timeout (57014) or connection-reset patterns
  - Pre-attempt: logs idle-in-transaction blockers via getIdleBlockers
  - On exhaustion: throws MigrationRetryExhausted with named PID + suggested
    pg_terminate_backend() recovery command
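
The retry policy can be sketched as a pure decision function. The commit does not spell out how attempts map onto the three delays, so this sketch simply retries until the schedule is exhausted; the connection-reset regex and all names are assumptions:

```typescript
interface PgLikeError {
  code?: string;    // SQLSTATE, e.g. "57014" for a cancelled statement
  message?: string;
}

type RetryPlan =
  | { action: "retry"; delayMs: number }
  | { action: "give_up"; reason: "not_retryable" | "exhausted" };

const BACKOFF_MS = [5000, 15000, 45000];

function isRetryable(err: PgLikeError): boolean {
  if (err.code === "57014") return true; // statement_timeout surfaces as 57014
  return /ECONNRESET|connection (reset|terminated)/i.test(err.message ?? "");
}

// failedAttempts counts failures so far, starting at 1.
function planRetry(
  err: PgLikeError,
  failedAttempts: number,
  schedule: number[] = BACKOFF_MS,
): RetryPlan {
  if (!isRetryable(err)) return { action: "give_up", reason: "not_retryable" };
  if (failedAttempts > schedule.length) return { action: "give_up", reason: "exhausted" };
  return { action: "retry", delayMs: schedule[failedAttempts - 1] };
}
```

Passing `[0, 0, 0]` as the schedule mirrors what a zero-backoff test setting would achieve.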

Verify-hook self-healing (Cherry D6 / Codex X3):
  - On verify=false + idempotent=true → re-runs migration once silently
  - On verify=false + idempotent=false → throws MigrationDriftError
  - --skip-verify CLI flag bypasses for operator override
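
The verify-hook decision table above, written out as a pure function. The action names and the `alreadyReran` flag (which stops a second silent re-run) are assumptions for the sketch:

```typescript
type VerifyAction = "ok" | "skipped" | "rerun_once" | "drift_error";

interface VerifyState {
  verifyPassed: boolean;
  idempotent?: boolean;   // defaults to true, as in the Migration interface
  skipVerify?: boolean;   // operator override (--skip-verify)
  alreadyReran?: boolean; // assumed guard: only one self-healing re-run
}

function verifyAction(state: VerifyState): VerifyAction {
  if (state.skipVerify) return "skipped";
  if (state.verifyPassed) return "ok";
  const idempotent = state.idempotent ?? true;
  if (idempotent && !state.alreadyReran) return "rerun_once";
  // Non-idempotent migrations, or a failed re-run, surface as drift.
  return "drift_error";
}
```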

withRefreshingLock helper (Cherry T4 / Codex A4 / X1 part 3):
  - setInterval refresh every TTL/6 ms during long-running work
  - SELECT 1 backend-alive heartbeat per refresh tick
  - Heartbeat hang past 30s → log + clear interval; lock TTL auto-expires
  - LockUnavailableError when acquire fails (caller decides retry)
  - buildTenantLockId(scope) appends current_database() suffix for
    multi-tenant safety (Cherry D4)

Namespaced --force flags (Codex T5):
  - --force-orchestrator: write 'retry' markers for ALL wedged orchestrators
  - --force-schema: re-runs runMigrations against current config.version
  - --force / --force-all: both
  - --force-retry vX.Y.Z: existing single-version reset (preserved)
  - --skip-verify: bypass verify-hook drift detection on a single run

Test additions:
  - test/migrate-extensions.test.ts: 14 cases (idempotent default,
    error envelopes, MIGRATIONS contract)
  - test/db-lock-refresh.test.ts: 10 cases (LockUnavailableError,
    buildTenantLockId multi-tenant, opts shape)
  - test/migrate.test.ts: updated 2 existing cases (PR garrytan#356 retry shape +
    function-name anchor) for v0.30.1 retry-wrapper semantics

156 unit tests passing across the v0.30.1 surface so far.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.30.1 Lane C: backfill primitive + registry + X4 + X5

First-class generic backfill runner (Fix 3). Generalizes the
keyset+checkpoint+adaptive-batch pattern from
src/core/backfill-effective-date.ts so future backfills (embedding_voyage
in v0.30.2, etc.) reuse one tested runner.

NEW src/core/backfill-base.ts:
  - runBackfill() with keyset pagination, config-table checkpoint, adaptive
    batch halving on stmt timeout, conn-drop reconnect, max-errors bail
  - ensureBackfillIndex() verifies/creates partial index CONCURRENTLY (P2/X4)
  - clearBackfillCheckpoint() for --fresh path
  - T3 fix: writes go through engine.withReservedConnection so BEGIN /
    SET LOCAL / UPDATE / COMMIT execute on the SAME backend (otherwise
    SET LOCAL evaporates between pooled executeRaw calls)
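
An in-memory sketch of the keyset plus adaptive-batch pattern, assuming rows sorted by a numeric `id` and a timeout signalled by SQLSTATE 57014. The real runner persists the checkpoint in a config table and talks to the database; none of the names below are its API:

```typescript
interface Row {
  id: number;
}

interface BackfillResult {
  processed: number;
  lastId: number;        // the checkpoint: highest id fully processed
  finalBatchSize: number;
}

function runKeysetBackfillSketch(
  table: Row[], // must be sorted by id ascending
  processBatch: (batch: Row[]) => void,
  initialBatchSize: number,
  resumeAfterId = 0,
): BackfillResult {
  let lastId = resumeAfterId;
  let batchSize = initialBatchSize;
  let processed = 0;
  for (;;) {
    // Keyset pagination: "WHERE id > lastId ORDER BY id LIMIT batchSize".
    const batch = table.filter((r) => r.id > lastId).slice(0, batchSize);
    if (batch.length === 0) break;
    try {
      processBatch(batch);
    } catch (err) {
      const code = (err as { code?: string }).code;
      if (code === "57014" && batchSize > 1) {
        batchSize = Math.floor(batchSize / 2); // halve and retry the same keyset window
        continue;
      }
      throw err;
    }
    processed += batch.length;
    lastId = batch[batch.length - 1].id;
  }
  return { processed, lastId, finalBatchSize: batchSize };
}
```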

NEW src/core/backfill-registry.ts:
  - effective_date: implemented (wraps existing computeEffectiveDate)
  - emotional_weight: implemented (wraps computeEmotionalWeight + stamps
    new emotional_weight_recomputed_at column)
  - embedding_voyage: declared-only in v0.30.1 (multi-column embedding
    schema lands in v0.30.2)

NEW src/commands/backfill.ts:
  - gbrain backfill <kind> [--batch-size N] [--concurrency N] [--resume]
                          [--fresh] [--dry-run] [--keep-index] [--max-errors N]
  - gbrain backfill list — shows registered backfills + status
  - X5 admission control: clampConcurrency() forces --concurrency to
    GBRAIN_DIRECT_POOL_SIZE - 1 ceiling (always reserves 1 conn for HNSW
    + heartbeat + doctor probes). Loud-warns when user requests above.
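
A sketch of the X5 admission-control clamp, assuming the pool size is passed in as a number and that concurrency never drops below 1. The signature and warning text are illustrative, not the actual `clampConcurrency()`:

```typescript
function clampConcurrencySketch(
  requested: number,
  directPoolSize: number,
  warn: (message: string) => void = console.warn,
): number {
  // Always keep one connection in reserve for index builds, heartbeat, and probes.
  const ceiling = Math.max(1, directPoolSize - 1);
  if (requested > ceiling) {
    warn(`--concurrency ${requested} exceeds ceiling ${ceiling}; clamping`);
    return ceiling;
  }
  return Math.max(1, requested);
}
```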

Schema migration v44 (X4 / Codex C8 fix):
  - pages.emotional_weight_recomputed_at TIMESTAMPTZ
  - emotional_weight = 0 is a VALID steady-state value per migration v40,
    so the original P2 predicate ("WHERE emotional_weight = 0") would have
    been a permanent large index over normal data. The corrected backlog
    predicate is "emotional_weight_recomputed_at IS NULL"; the partial
    index drops naturally as the cycle phase + this backfill stamp the
    column over time.
  - idempotent: true (ADD COLUMN ... NULL is metadata-only)

CLI integration:
  - src/cli.ts: registers `backfill` subcommand
  - reindex-frontmatter stays as thin alias for v0.30.1 back-compat;
    canonical entrypoint is now `gbrain backfill effective_date`

Test additions:
  - test/backfill-base.test.ts: 11 cases (keyset, checkpoint, dry-run,
    resume/fresh, maxRows cap, withReservedConnection routing, error
    paths, clearCheckpoint, ensureBackfillIndex)
  - test/backfill-concurrency-clamp.test.ts: 6 cases (X5 admission control)

173 unit tests passing across Lanes A+B+C of v0.30.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.30.1 Lane D: HNSW lifecycle manager + A3 atomic-swap

Extends src/core/vector-index.ts with the v0.30.1 lifecycle layer.
The original chunkEmbeddingIndexSql / applyChunkEmbeddingIndexPolicy
contract is preserved unchanged.

New surfaces:
  - checkActiveBuild(engine, indexName): probes pg_stat_activity for an
    active CREATE INDEX or REINDEX on the named index. Used as pre-op
    guard so dropAndRebuild doesn't compete with a build already in
    flight (Supabase auto-maintenance, parallel gbrain procs).

  - dropZombieIndexes(engine, tableNames): startup sweep of
    indisvalid=false rows on gbrain tables. Drops them with
    DROP INDEX IF EXISTS, BUT skips any zombie that has an active build
    still in pg_stat_activity (codex Fix-5 in-progress-build guard).
    Wired into PostgresEngine.initSchema() — runs after migrations +
    verifySchema, best-effort, never blocks engine.connect().

  - dropAndRebuild(engine, spec, opts): A3 atomic-swap pattern:
      1. checkActiveBuild → bail if another build is active (--force overrides)
      2. CREATE INDEX CONCURRENTLY <name>_rebuild_<unix-ms> via
         engine.withReservedConnection (CONCURRENTLY can't run in a txn)
      3. Atomic swap inside engine.transaction:
           DROP INDEX <old-name>
           ALTER INDEX <temp-name> RENAME TO <old-name>
      4. If step 2 fails (OOM, timeout, conn drop), the OLD index stays
         intact and search keeps serving queries. This is the headline
         A3 win — no production-degraded silent failure mode.

  - monitorBuild(engine, indexName, onProgress, opts): poll
    pg_stat_activity every 30s; emit elapsed_ms + size_bytes (via
    pg_relation_size) + pid. Used by gbrain backfill embedding_voyage
    when batch > 1000 triggers a rebuild.

  - isSupabaseAutoMaintenance(active): predicate on application_name
    (matches "supabase" / "postgres-meta"). Used by dropAndRebuild to
    log + back off when Supabase auto-maintenance is doing the rebuild.
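
The A3 swap in dropAndRebuild reduces to a small plan of statements. The sketch below only builds the SQL strings; the caller-supplied `createSql` callback and the index definition used in the usage note are hypothetical:

```typescript
interface SwapPlan {
  tempName: string;
  build: string;  // runs outside a transaction (CONCURRENTLY cannot run inside one)
  swap: string[]; // runs inside a single transaction
}

function planAtomicSwap(
  indexName: string,
  createSql: (name: string) => string,
  nowMs: number,
): SwapPlan {
  const tempName = `${indexName}_rebuild_${nowMs}`;
  return {
    tempName,
    build: createSql(tempName),
    swap: [
      `DROP INDEX ${indexName}`,
      `ALTER INDEX ${tempName} RENAME TO ${indexName}`,
    ],
  };
}
```

Because the old index is only dropped in the swap step, a failure while running `build` leaves the existing index in place.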

Engine integration:
  - PostgresEngine.initSchema() calls dropZombieIndexes after verifySchema.
    Surfaces zombie counts via console.log.
  - Best-effort wrapped in try/catch: pg_stat_activity / pg_index access
    can be restricted on managed Postgres tiers; gbrain shouldn't fail
    engine.connect() over diagnostic queries.

Test additions (18 cases):
  - test/vector-index-lifecycle.test.ts:
    * chunkEmbeddingIndexSql contract (3 cases) — pre-existing behavior preserved
    * applyChunkEmbeddingIndexPolicy contract (1 case)
    * checkActiveBuild (4 cases, including PGLite no-op + best-effort failure)
    * isSupabaseAutoMaintenance (3 cases)
    * dropZombieIndexes (4 cases, including in-progress-build guard)
    * dropAndRebuild atomic-swap (3 cases, including PGLite + active-build bail
      + temp-name format assertion)

191 unit tests passing across Lanes A+B+C+D of v0.30.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.30.1 Lane E: upgrade pipeline checkpoint + brain_id binding + get_health migrations

NEW src/core/upgrade-checkpoint.ts:
  - Cherry D5: persists step-by-step progress through gbrain post-upgrade
    so partial failures can be resumed via gbrain upgrade --resume.
    Steps: pull → install → schema → features → backfills → verify.
  - Codex X2: checkpoint binds to brain identity via sha256(database_url)
    (userinfo stripped before hashing so cred rotations don't invalidate).
    PGLite uses sha256(database_path). Cross-brain checkpoint application
    is now refused with reason='brain_mismatch'.
  - F4 fall-through: validateCheckpoint returns reason='no_checkpoint'
    when none exists, enabling silent fall-through to a full upgrade.
  - All-complete detection: stale checkpoints (every step done) return
    reason='all_complete' so the next run clears + re-runs from scratch.
  - markStepComplete + markStepFailed maintain the partial-state shape.
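
A rough sketch of the brain-identity binding and the three refusal reasons listed above. The checkpoint shape, the userinfo-stripping regex, and the function names are assumptions; only the step order and the reason strings come from the commit:

```typescript
import { createHash } from "node:crypto";

const STEPS = ["pull", "install", "schema", "features", "backfills", "verify"] as const;
type Step = (typeof STEPS)[number];

interface CheckpointSketch {
  brainId: string;
  completed: Step[];
}

type Validation =
  | { ok: true; resumeAt: Step }
  | { ok: false; reason: "no_checkpoint" | "brain_mismatch" | "all_complete" };

// Strip user:password@ before hashing so credential rotation keeps the same id.
function computeBrainIdSketch(databaseUrl: string): string {
  const stripped = databaseUrl.replace(/^([a-z][a-z0-9+.-]*:\/\/)[^@\/]*@/i, "$1");
  return createHash("sha256").update(stripped).digest("hex");
}

function validateCheckpointSketch(cp: CheckpointSketch | null, currentBrainId: string): Validation {
  if (cp === null) return { ok: false, reason: "no_checkpoint" };
  if (cp.brainId !== currentBrainId) return { ok: false, reason: "brain_mismatch" };
  for (const step of STEPS) {
    if (cp.completed.indexOf(step) === -1) return { ok: true, resumeAt: step };
  }
  return { ok: false, reason: "all_complete" };
}
```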

T2 preserved: upgrade.ts still re-execs `gbrain post-upgrade` so the NEW
binary's migration registry runs (the existing re-exec pattern is correct
per codex round 1's plan-breaking finding). The checkpoint module is the
substrate that Lane E's --resume / --status surfaces will plumb through
in v0.30.2.

D7 + C3 contract committed:
  - BrainHealth.schema_version: '1' (literal type) — additive-only contract
    pinned for MCP get_health consumers.
  - BrainHealth.migrations: { schema, orchestrator } — explicit two-ledger
    diagnostic surface (codex T5 namespacing). Both fields are OPTIONAL
    in v0.30.1 — engines can populate them in v0.30.2 without a contract
    bump. Backwards/forwards compat: clients default-handle missing fields.

VERSION: 0.30.0 → 0.30.1
package.json: synced

Test additions (18 cases):
  - test/upgrade-checkpoint.test.ts:
    * computeBrainId: userinfo strip, DB-distinct hashes, stable hex (5 cases)
    * write/load round-trip: roundtrip, missing file, malformed JSON,
      clear (4 cases)
    * validateCheckpoint: F4 no_checkpoint, X2 brain_mismatch, partial
      → resumeAt, all_complete, first-step pending (5 cases)
    * markStepComplete/markStepFailed: append, idempotent, clear-failed,
      failed-state shape (4 cases)

209 unit tests passing across all 5 lanes of v0.30.1 (Lanes A-E core
foundations). Plumbing into upgrade.ts CLI + doctor checks +
get_health() implementation is layered in via follow-up commits within
this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.30.1 e2e + test isolation: integration smoke + serial quarantine

NEW test/e2e/v030_1-integration-pglite.test.ts (14 cases):
  PGLite integration smoke proving Lane A-E surfaces work together.
    Lane B: migration runner applies v44 (emotional_weight_recomputed_at)
            cleanly; config.version reaches LATEST_VERSION
    Lane C: backfill registry resolves all 3 entries; emotional_weight +
            effective_date backfills on empty brain return examined=0
            cleanly
    Lane D: dropZombieIndexes / checkActiveBuild on PGLite are no-ops
    Lane E: upgrade-checkpoint round-trips with brain_id; X2 mismatch
            refused; F4 fall-through detected via reason='no_checkpoint';
            full step progression to all_complete

Test isolation hygiene (scripts/check-test-isolation.sh):
  - test/connection-manager.test.ts → connection-manager.serial.test.ts
  - test/backfill-concurrency-clamp.test.ts → .serial.test.ts
  - test/upgrade-checkpoint.test.ts → .serial.test.ts
  All three files mutate process.env (kill-switch, GBRAIN_DIRECT_POOL_SIZE,
  GBRAIN_HOME) which would race other tests in the parallel runner.
  *.serial.test.ts quarantine ensures they run at --max-concurrency=1.
  Serial quarantine was chosen over a withEnv() refactor to preserve the
  existing well-formed test code.

E2E coverage status:
  - v030_1-integration-pglite.test.ts (this commit): 14 cases, all green
  - backfill-perf-pglite.test.ts: 1 case, green (no regression)
  - cycle-recompute-emotional-weight-pglite.test.ts: green (no regression)
  - multi-source-emotional-weight-pglite.test.ts: green (no regression)
  - dream-synthesize-pglite.test.ts: 14 cases, green (no regression)
  - anomalies-pglite.test.ts + salience-pglite.test.ts: 6 cases, green

Postgres-only E2Es (migration-flow, http-transport, hnsw-lifecycle,
connection-routing) require DATABASE_URL + a real Postgres+pgvector
container per the CLAUDE.md E2E lifecycle. They land as separate
DATABASE_URL-gated work — not regressed by v0.30.1 changes; their
preconditions just aren't met in the current run environment.

`bun run verify` (typecheck + 4 shell pre-checks + test-isolation lint)
passes cleanly.

Final v0.30.1 unit + integration test count: 4547 pass, 0 regressions.
Two pre-existing flaky failures (BrainRegistry serial test + warm-create
perf gate under shard contention) confirmed unrelated to this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.30.1)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…an#754)

* feat: classify Anthropic prompt-too-long as UnrecoverableError

The subagent handler now detects 400 "prompt is too long" responses
from the Anthropic SDK and rethrows as UnrecoverableError. The worker
already routes UnrecoverableError straight to `dead`, so doomed jobs
fail terminally on first attempt instead of stalling 3x with the same
oversized prompt.

isPromptTooLongError matches the production message verbatim
("prompt is too long: N tokens > N maximum"), case-insensitive, on
both the outer message and inner error.message paths. Defensive
secondary match for status=400 + invalid_request_error/request_too_large
with the words "too long"/"exceed"/"maximum".

9 unit cases pin the detection: production wording, case folding,
nested SDK shape, defensive 400 paths, unrelated 400s, transient
errors, null/empty inputs.
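
A minimal sketch of the detection predicate described above. The error shape (`status`, `message`, `error.type`, `error.message`) is an assumption about how the SDK error is laid out, and the function is not the actual `isPromptTooLongError`:

```typescript
interface SdkLikeError {
  status?: number;
  message?: string;
  error?: { type?: string; message?: string };
}

const PRIMARY = /prompt is too long/i;
const SECONDARY_TYPES = ["invalid_request_error", "request_too_large"];
const SECONDARY_WORDS = /too long|exceed|maximum/i;

function isPromptTooLongSketch(err: unknown): boolean {
  if (err === null || typeof err !== "object") return false;
  const e = err as SdkLikeError;
  // Check both the outer message and the nested error.message.
  const messages = [e.message, e.error?.message].filter(
    (m): m is string => typeof m === "string",
  );
  if (messages.some((m) => PRIMARY.test(m))) return true;
  // Defensive secondary match: 400 + known error type + size-related wording.
  const typed = e.status === 400 && SECONDARY_TYPES.indexOf(e.error?.type ?? "") !== -1;
  return typed && messages.some((m) => SECONDARY_WORDS.test(m));
}
```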

* feat: model-aware chunking + slug-rewrite for dream synthesize

The synthesize phase now chunks oversized transcripts at paragraph
boundaries instead of submitting one giant prompt that 400s on
Anthropic. Closes the v0.30 dream-cycle queue clog where 1.7M-token
transcripts dead-lettered after 3 stalls and re-discovered every
cycle.

D1: per-chunk budget = floor(model_context_tokens × 0.9 × 3.5).
MODEL_CONTEXT_TOKENS keys on resolved Anthropic ids (Opus 4.7 = 1M,
Sonnet 4.6 = 200K, Haiku = 200K). Non-Anthropic models fall back to
180K-token safe default with a once-per-process stderr warning.
dream.synthesize.max_prompt_tokens overrides the model lookup
(token-shaped, name from PR garrytan#748, floor 100K).
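
The D1 formula as arithmetic. This sketch takes the context size as a plain number and falls back to the 180K default when none is known; the model lookup table and the config override are left out, and the function name is illustrative:

```typescript
const SAFE_DEFAULT_CONTEXT_TOKENS = 180000;
const HEADROOM = 0.9;        // keep 10% of the context window free
const CHARS_PER_TOKEN = 3.5; // estimator credited to the earlier PR

// Returns the per-chunk budget in characters.
function chunkBudgetChars(modelContextTokens?: number): number {
  const tokens = modelContextTokens ?? SAFE_DEFAULT_CONTEXT_TOKENS;
  return Math.floor(tokens * HEADROOM * CHARS_PER_TOKEN);
}
```

For a 200K-token model this gives 630,000 characters per chunk, and for a 1M-token model 3,150,000.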

D5: on max_chunks_per_transcript cap hit, log + skip; do NOT write to
dream_verdicts. Closes the cache-poisoning class — next cycle
re-attempts under whatever budget is then current.

D6: orchestrator-side deterministic slug rewrite, zero Sonnet trust.
collectChildPutPageSlugs raw-fetches every (job_id, slug) pair (no
SELECT DISTINCT — that erased the collision evidence the audit
claimed to detect) and rewrites bare-hash6 slugs to <hash6>-c<idx>
for chunked children.

D8: pre-fan-out lookup of completed legacy `dream:synth:<filePath>:
<hash16>` jobs. Transcripts already synthesized under the
single-chunk shape skip submission with `already_synthesized_legacy_
single_chunk` instead of resubmitting under chunked keys.

D9: hash-deterministic chunk boundaries. The 3-tier ladder lifted
from PR garrytan#748 (## Topic: > --- > nearest \\n) is fed a back-half
search-window offset derived from contentHash. Same content always
chunks identically across runs; chunk N of a previously-failed
transcript produces byte-identical content on retry.
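
A simplified sketch of the three-tier boundary ladder. It searches the back half of each chunk window for the strongest marker and omits the contentHash-derived offset, whose derivation the commit does not spell out; "nearest newline" is approximated by the last newline in the window:

```typescript
// Strongest boundary first: topic heading, then horizontal rule, then any newline.
const LADDER = ["\n## Topic:", "\n---", "\n"];

function findBoundary(text: string, start: number, budget: number): number {
  const hardEnd = Math.min(start + budget, text.length);
  if (hardEnd === text.length) return hardEnd; // remainder fits in one chunk
  const windowStart = start + Math.floor(budget / 2); // back half of the window
  for (const marker of LADDER) {
    const idx = text.lastIndexOf(marker, hardEnd - 1);
    // Cut just after the newline so the next chunk starts at the marker.
    if (idx >= windowStart) return idx + 1;
  }
  return hardEnd; // hard fallback: no boundary found in the window
}

function chunkTextSketch(text: string, budget: number): string[] {
  if (budget < 1) throw new RangeError("budget must be at least 1");
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = findBoundary(text, start, budget);
    chunks.push(text.slice(start, end));
    start = end;
  }
  return chunks;
}
```

Since the function is pure, the same content and budget always produce the same chunks, which is the property the retry path relies on.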

D10: 24-chunk default cap, operator-configurable via
dream.synthesize.max_chunks_per_transcript.

18 unit cases pin the chunker (boundary ladder, hash determinism,
hard fallback, slug rewrite all 7 shapes). 4 PGLite E2E cases pin
fan-out shape (single-chunk legacy key parity, multi-chunk chunked
key shape) + skip paths (D5 cap hit no verdict-cache write, D8
legacy-key skip).

Credits PR garrytan#748 (Wintermute) for the boundary ladder, config key
naming, and 3.5 chars/token estimator. This branch supersedes garrytan#748
with the structural safeguards (model-aware budget, terminal-error
classify, slug rewrite, hash-determinism, doctor surfacing).

* feat: surface dead-lettered prompt_too_long jobs in doctor queue_health

queue_health gains a 4th subcheck counting dead `subagent` jobs in
the last 24h whose error_text starts with `prompt_too_long:`. When
present, prints a fix hint pointing at
`gbrain dream --phase synthesize --dry-run --json` to identify the
fat transcripts and naming the two operator escape hatches
(`dream.synthesize.max_prompt_tokens` for budget tuning,
larger-context model for capacity).

Operators now see the chunking failure mode without grepping
minion_jobs by hand.

* chore: bump version and changelog (v0.30.2)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update README + CLAUDE.md for v0.30.2

- README dream help: 8-phase → 9-phase, mention v0.30.2 chunking + config keys
- CLAUDE.md synthesize.ts: chunker + per-chunk idempotency + D6 slug rewrite + D7 scope + D8 legacy-key
- CLAUDE.md subagent.ts: prompt_too_long terminal classification
- CLAUDE.md doctor.ts: queue_health subcheck 4 (dead-lettered prompt_too_long)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: regenerate llms-full.txt after v0.30.2 CLAUDE.md updates

The docs/ pass extended three Key Files entries in CLAUDE.md
(synthesize.ts, subagent.ts, doctor.ts). The auto-derived
llms-full.txt bundle picks up those CLAUDE.md changes via
build-llms; the build-llms test caught the drift in CI.

Generated by: bun run build:llms

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olidate phase (garrytan#785)

* v0.31 feat(migrate): facts hot memory schema (migration v40)

Phase 1 of v0.31 hot-memory.

- New facts table with source_id (TEXT FK to sources, per-source isolation),
  kind CHECK (event/preference/commitment/belief/fact), visibility CHECK
  (private/world for takes-style ACL parity), valid_from/valid_until/
  expired_at/superseded_by for temporal + supersession audit, and
  consolidated_at/consolidated_into pointing at takes(id) for the dream-
  cycle hot→cold bridge.
- Embedding column dim resolved at migration time from
  config.embedding_dimensions so non-OpenAI brains (Voyage etc) work
  out-of-the-box. HALFVEC where pgvector >= 0.7; falls back to VECTOR
  with stderr warn on older versions. Matching opclass per column type
  (halfvec_cosine_ops vs vector_cosine_ops).
- 5 partial indexes leading on source_id so every read uses the trust
  boundary as part of the index, not a callback. HNSW partial index
  excludes expired/null rows so footprint stays proportional to active
  fact count.
- RLS DO-block matches takes pattern (Postgres BYPASSRLS gate; PGLite
  no-op).
- v0_31_0.ts orchestrator follows v0_28_0.ts pattern — phase A asserts
  schema version >= 40 + facts table presence; runner owns ledger.

All 87 existing migrate.test.ts cases pass. PGLite smoke test confirms
table + indexes + CHECK constraints + ON DELETE CASCADE all behave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 chore(version): bump VERSION + package.json to 0.31.0

Phase 1 closer. CHANGELOG entry written when Phase 7 lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 feat(engine): facts hot memory engine API (Phase 2)

Phase 2 of v0.31 hot-memory.

Adds 8 facts methods to BrainEngine implemented on both PGLite and
Postgres engines:

- insertFact(input, ctx) — INSERT with optional supersedeId; expires the
  named row in the same transaction. Per-entity advisory lock on Postgres
  (`pg_advisory_xact_lock(hashtextextended(source_id::text || ':' ||
  entity_slug, 0))`) for the dedup window. PGLite is single-process so
  the lock is a no-op.
- expireFact(id, opts) — sets expired_at + optional superseded_by.
  Idempotent-as-false (already-expired returns false).
- listFactsByEntity / listFactsSince / listFactsBySession — list surfaces
  with FactListOpts filters (activeOnly, kinds, visibility, limit/offset).
  Every query starts WHERE source_id = $X so the trust boundary is part
  of the index path.
- listSupersessions — audit log; activeOnly:false + expired_at IS NOT NULL
  + superseded_by IS NOT NULL.
- findCandidateDuplicates(source_id, entity_slug, factText, k) —
  entity-prefiltered (mandatory), k=5 default, hard cap 20. Embedding-
  cosine ordering when caller supplies an embedding, recency fallback
  otherwise. Bounds the contradiction-classifier blast radius.
- consolidateFact(id, takeId) — sets consolidated_at + consolidated_into.
  Never DELETE; facts stay as audit trail for the resulting take.
- getFactsHealth(source_id) — per-source counters consumed by `gbrain
  doctor` facts_health check.

Public types in engine.ts: FactKind (5-value union), FactVisibility,
FactInsertStatus, FactRow, NewFact, FactListOpts, FactsHealth.

PGLite + Postgres helpers: rowToFact / rowToFactPg parse the
text-format pgvector embedding back into Float32Array; toPgVectorLiteral
encodes for the supersede-path INSERT (postgres-js can't bind Float32Array
directly to a vector column without an explicit literal cast).

Smoke test confirms every method end-to-end on PGLite. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 feat(facts): extraction code path (Phase 3)

Phase 3 of v0.31 hot-memory.

Five new modules under src/core/facts/ + src/core/entities/:

- src/core/facts/decay.ts — pure helper. effectiveConfidence(fact, now)
  applies confidence × exp(-age/halflife) with per-kind halflife table
  (event 7d, commitment 90d, preference 90d, belief 365d, fact 365d).
  Returns 0 for expired or past-valid_until rows. Single source of truth
  consumed by recall, supersession audit, facts_health, and the MCP _meta
  injector (eD8 DRY).

- src/core/facts/queue.ts — bounded in-memory queue. Cap 100 default,
  drop-oldest on overflow with counter. Per-session in-flight=1 serializes
  burst chat. AbortSignal threading from server SIGTERM (mirrors minion
  worker pattern per eD7): 5s grace for in-flight, then drop pending with
  counter. getFactsQueue() process-singleton; __resetFactsQueueForTests
  for hermetic tests.

- src/core/facts/classify.ts — contradiction classifier with cosine
  fast-path (D13: ≥0.95 → duplicate, skip LLM) and classifier-failure
  fallback (D12: cosine ≥0.92 → duplicate, else INSERT). Pure cosine
  helper exported. JSON-strict output with 4-strategy parse fallback;
  refusal stop-reason maps to fallback path. Caller-provided abort
  signal propagated to the gateway chat call.

- src/core/facts/extract.ts — Haiku turn-extractor. Reuses
  INJECTION_PATTERNS from src/core/think/sanitize.ts on the way IN
  (turn_text) AND on the way OUT (each fact). Tight system prompt with
  5-kind taxonomy, 0..1 confidence scoring, entity slug or display name.
  Anti-loop check on isDreamGenerated (reuses v0.23.2 marker semantics).
  Synchronous embedOne() per fact via the gateway so classifier paths
  have embeddings available; AbortError re-thrown explicitly so SIGTERM
  during embed never writes a NULL-embedding row meant to be cancelled
  (eE8 distinction).

- src/core/entities/resolve.ts — slug canonicalization shared by
  signal-detector AND facts. Resolution order: exact slug match →
  pg_trgm fuzzy match (similarity ≥0.4) → deterministic slugify
  fallback. slugify exported standalone for tests + callers that want
  the floor.
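
A minimal sketch of the decay helper described above, assuming a simplified fact shape. The field names (`validFrom`, `validUntil`, `expiredAt`) are hypothetical; only the formula and the per-kind table come from this commit message. Note that with exp(-age/halflife) the confidence falls to about 1/e, not 1/2, at age = halflife, which matches the ~1/e expectation pinned in the decay tests.

```typescript
// Hypothetical shapes: the commit message does not show the real Fact type.
type FactKind = "event" | "commitment" | "preference" | "belief" | "fact";

interface FactLike {
  kind: FactKind;
  confidence: number;        // 0..1 score from the extractor
  validFrom: Date;           // assumed start of the decay clock
  validUntil?: Date | null;  // optional hard expiry
  expiredAt?: Date | null;   // assumed marker set when a fact is forgotten
}

// Per-kind table from the commit message, in days.
const HALFLIFE_DAYS: Record<FactKind, number> = {
  event: 7,
  commitment: 90,
  preference: 90,
  belief: 365,
  fact: 365,
};

const MS_PER_DAY = 86_400_000;

function effectiveConfidence(fact: FactLike, now: Date): number {
  const t = now.getTime();
  // Expired or past-valid_until rows contribute nothing.
  if (fact.expiredAt && fact.expiredAt.getTime() <= t) return 0;
  if (fact.validUntil && fact.validUntil.getTime() <= t) return 0;
  const ageDays = Math.max(0, (t - fact.validFrom.getTime()) / MS_PER_DAY);
  // confidence * exp(-age / halflife), as stated in the commit message.
  return fact.confidence * Math.exp(-ageDays / HALFLIFE_DAYS[fact.kind]);
}
```

Keeping the helper pure (the clock is passed in as `now`) is what lets recall, the audit paths, and tests share one implementation.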

Smoke tests confirm decay table, cosine math, slugify rules, queue
drop-oldest under overflow, and shutdown grace + drop-pending semantics.
Typecheck clean.
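
The cosine fast-path and the classifier-failure fallback can be sketched as below. The thresholds (0.95 and 0.92) come from the classify.ts description; the function name `preClassify`, the `Verdict` type, and the `classifierAvailable` flag are illustrative assumptions rather than the module's real API.

```typescript
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length === 0 || a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical verdict names for this sketch.
type Verdict = "duplicate" | "independent" | "needs_llm";

const FAST_PATH_THRESHOLD = 0.95; // D13: skip the LLM entirely
const FALLBACK_THRESHOLD = 0.92;  // D12: used only when the classifier failed

function preClassify(
  candidate: ArrayLike<number>,
  existing: ArrayLike<number>[],
  classifierAvailable: boolean,
): Verdict {
  if (existing.length === 0) return "independent";
  let best = 0;
  for (const e of existing) best = Math.max(best, cosineSimilarity(candidate, e));
  if (best >= FAST_PATH_THRESHOLD) return "duplicate";
  if (!classifierAvailable) {
    // Classifier-failure fallback: near-duplicates are dropped, the rest INSERT.
    return best >= FALLBACK_THRESHOLD ? "duplicate" : "independent";
  }
  return "needs_llm";
}
```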

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 feat(mcp+cli): MCP ops + recall CLI + _meta + transport refactor (Phase 4)

Phase 4 of v0.31 hot-memory.

Three new MCP ops on the contract-first surface:

- `extract_facts` (write scope, localOnly:false): extracts facts from a
  conversation turn via the Haiku extractor, runs the cosine fast-path
  dedup, INSERTs into per-source hot memory. Returns counts +
  fact_ids[]. Skips on is_dream_generated:true (anti-loop).
- `recall` (read scope): query the per-source hot memory by
  entity / since / session / supersessions / grep filter. Visibility-
  aware: remote callers see visibility='world' rows only (takes-style
  ACL parity, eD21). Returns most-recent first; pagination via limit.
- `forget_fact` (write scope): expireFact wrapper. Idempotent-as-error
  on unknown id; uses the new 'fact_not_found' ErrorCode.

ErrorCode union opened (eD6 / eE7): TS forward-compat via the
`(string & {})` autocomplete-friendly hack so downstream consumers
(gbrain-evals etc) don't break their typecheck on every new code.
Three new codes: 'rate_limited', 'extraction_failed', 'fact_not_found'.
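
For readers unfamiliar with the open-union trick, a small sketch follows. The three literal codes are the ones named above; `KnownErrorCode` and `describeError` are hypothetical names used only for illustration, and the real union contains more members than shown here.

```typescript
// Only the three codes added in this commit are listed; the real union is larger.
type KnownErrorCode = "rate_limited" | "extraction_failed" | "fact_not_found";

// `(string & {})` keeps the literal members visible to editor autocomplete
// while still accepting any string, so new server codes do not break consumers.
type ErrorCode = KnownErrorCode | (string & {});

function describeError(code: ErrorCode): string {
  switch (code) {
    case "rate_limited":
      return "slow down and retry";
    case "extraction_failed":
      return "extractor could not produce facts";
    case "fact_not_found":
      return "no fact with that id";
    default:
      // Codes added by a newer server land here instead of failing typecheck.
      return `unrecognized error code: ${code}`;
  }
}
```

The trade-off is that a `switch` over an open union can no longer be checked for exhaustiveness, so a `default` branch is required.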

OperationContext gains source_id?:string (eD4 / eE2 — TEXT not INTEGER
per schema reality). Resolved once in buildOperationContext from
DispatchOpts.sourceId. Stdio MCP defaults to GBRAIN_SOURCE env or
'default'; HTTP MCP reads it from the per-token sources scope (eE3).

ToolResult gains _meta?: Record<string, unknown> (eD3). Dispatcher
calls a configurable metaHook AFTER op.handler succeeds, wrapped in
its own try/catch so a DB blip degrades to no-_meta rather than
flipping the whole tool call to error (eE4).
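
The best-effort hook call can be pictured roughly as follows. This is a sketch under assumptions: `ToolResultLike`, `MetaHook`, and `dispatchWithMeta` are hypothetical stand-ins, not the dispatcher's actual signatures.

```typescript
interface ToolResultLike {
  content: string;
  isError?: boolean;
  _meta?: Record<string, unknown>;
}

// Hypothetical hook signature: returns a payload, or undefined to skip injection.
type MetaHook = (opName: string) => Promise<Record<string, unknown> | undefined>;

async function dispatchWithMeta(
  opName: string,
  handler: () => Promise<ToolResultLike>,
  metaHook?: MetaHook,
): Promise<ToolResultLike> {
  const result = await handler();
  // The hook runs only after the handler succeeded.
  if (!metaHook || result.isError) return result;
  try {
    const meta = await metaHook(opName);
    return meta ? { ...result, _meta: meta } : result;
  } catch {
    // A failing hook degrades to "no _meta"; the tool call itself stays a success.
    return result;
  }
}
```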

New module src/core/facts/meta-hook.ts:
- getBrainHotMemoryMeta(name, ctx) builds the _meta.brain_hot_memory
  payload. Cache key (source_id, session_id, hash(takesHoldersAllowList
  sorted)) (eD10 / eE5). 30s TTL per session. Visibility filter applies:
  remote → world only; local → all. Top-K=10 ranked by effective
  confidence (decay). Skips injection on recall/extract_facts/forget_fact
  themselves. bumpHotMemoryCache() invalidates per (source_id,
  session_id) on extraction event.

D12 (eE1) accepted: serve-http.ts:801 inlined dispatch path REFACTORED
to call dispatchToolCall. HTTP MCP now inherits source_id, _meta
injection, error envelope unification, and OperationContext shape from
the same code path stdio uses. Scope check + mcp_request_log + SSE
broadcast stay in serve-http.ts (HTTP-specific concerns); the dispatcher
returns ToolResult and the HTTP handler reads isError + content + _meta
to fan into the audit + broadcast paths.

put_page compliance backstop (D23): when a conversation-shape page is
written (note/meeting/slack/email/calendar-event/source/writing) with
a substantive body (>=80 chars) on a non-subagent slug AND no
dream_generated:true marker, fire-and-forget enqueue an extraction job
into the bounded queue. Never blocks the put_page response. Skipped
reasons (no_parsed_page / subagent_namespace / dream_generated /
kind:* / too_short / queue_shutdown / backstop_error) are stable
strings consumed by tests.
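
The eligibility gate might look something like the sketch below. The kind list, the 80-character floor, and the skip-reason strings come from the paragraph above; the check order, the `ParsedPageLike` shape, trimming the body before measuring it, and the caller-supplied `isSubagentSlug` predicate are assumptions.

```typescript
const CONVERSATION_KINDS = new Set([
  "note", "meeting", "slack", "email", "calendar-event", "source", "writing",
]);
const MIN_BODY_CHARS = 80;

// Hypothetical page shape for this sketch.
interface ParsedPageLike {
  slug: string;
  kind: string;
  body: string;
  dreamGenerated: boolean;
}

// Returns a stable skip reason, or null when the page should be enqueued.
function backstopSkipReason(
  page: ParsedPageLike | null,
  isSubagentSlug: (slug: string) => boolean, // how subagent slugs are detected is not shown in the source
): string | null {
  if (!page) return "no_parsed_page";
  if (isSubagentSlug(page.slug)) return "subagent_namespace";
  if (page.dreamGenerated) return "dream_generated";
  if (!CONVERSATION_KINDS.has(page.kind)) return `kind:${page.kind}`;
  if (page.body.trim().length < MIN_BODY_CHARS) return "too_short";
  return null;
}
```

Returning the reason as a plain string keeps the gate trivially assertable in tests, which is how the commit describes the strings being consumed.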

`gbrain recall` + `gbrain forget` CLI commands (src/commands/recall.ts):
- recall <entity> | --since DUR | --session ID | --today (markdown
  with kind icons 📅🎯🤝💭📌) | --grep TEXT | --supersessions |
  --include-expired | --as-context (prompt-injection-ready) | --json
- forget <fact-id> shorthand for expireFact

Wired into src/cli.ts dispatch table next to takes / think.

Smoke tests confirm: dispatch surfaces (extract_facts → ops →
listFactsByEntity), forget_fact + idempotent re-call, _meta visibility
filter (remote sees world only, local sees all), CLI markdown render
with kind icons + age strings + decayed confidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 feat(cycle): consolidate phase — facts → takes promotion (Phase 5)

Phase 5 of v0.31 hot-memory.

New 10th cycle phase `consolidate` between `patterns` and `embed`:

- src/core/cycle.ts:
  * CyclePhase union extended with 'consolidate'
  * ALL_PHASES gets 'consolidate' between patterns and embed (graph-fresh
    after patterns; embed runs after so the new takes get embedded
    same-cycle)
  * NEEDS_LOCK_PHASES gets 'consolidate' (writes takes + UPDATEs facts)
  * CycleReport.totals gains facts_consolidated + consolidate_takes_written
  * runCycle dispatches the new phase via dynamic import

- src/core/cycle/phases/consolidate.ts (new):
  * Scans (source_id, entity_slug) buckets where COUNT(unconsolidated
    facts) >= 3 (uses idx_facts_unconsolidated partial index)
  * Skips buckets where the OLDEST fact is < 24h old (gives signal time
    to settle before locking it into cold memory)
  * Greedy cosine clustering at threshold 0.85; head-element centroid
    keeps it deterministic + cheap. Singletons (no embedding) stay
    unconsolidated this cycle.
  * For each cluster size >= 2: picks the highest-confidence fact's text
    as the take claim (v0.31 deterministic; v0.32 swaps to Sonnet
    synthesis pass). avg confidence → take weight, earliest valid_from →
    take since_date, concatenated source_sessions → take.source.
  * Resolves entity_slug → page_id via pages.slug (per source). Skips
    cluster if page is missing in this source — no auto-page-creation
    in v0.31.
  * INSERT into takes(kind='fact', holder='self') with row_num =
    MAX(existing) + 1.
  * UPDATE contributing facts: consolidated_at = now() +
    consolidated_into = takes.id. NEVER DELETE — facts are the audit
    trail for the resulting take.
  * dryRun honored: pretends the writes happened; counters still tick
    so operators can preview load before the first real run.
  * yieldDuringPhase keepalive between buckets so the Minions worker
    job lock + cycle-lock TTL don't drift on long runs.
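
The clustering and take-selection steps above can be sketched as follows, assuming a simplified `FactRow` shape (hypothetical). The 0.85 threshold, the head-element comparison, the size >= 2 cutoff, and the highest-confidence / average-confidence rules come from the list above; everything else is illustrative.

```typescript
interface FactRow {
  id: number;
  text: string;
  confidence: number;
  embedding: number[] | null;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy pass: each fact joins the first cluster whose HEAD element is within
// the threshold, otherwise it starts a new cluster. Comparing against the head
// only (no recomputed centroid) keeps the result deterministic for a fixed order.
function greedyCluster(facts: FactRow[], threshold = 0.85): FactRow[][] {
  const clusters: { head: number[]; members: FactRow[] }[] = [];
  for (const fact of facts) {
    const emb = fact.embedding;
    if (!emb) continue; // no embedding: stays unconsolidated this cycle
    const home = clusters.find((c) => cosine(c.head, emb) >= threshold);
    if (home) home.members.push(fact);
    else clusters.push({ head: emb, members: [fact] });
  }
  return clusters.map((c) => c.members).filter((m) => m.length >= 2);
}

// Take claim = highest-confidence fact's text; take weight = average confidence.
// Expects a non-empty cluster, which greedyCluster guarantees.
function summarizeCluster(members: FactRow[]): { claim: string; weight: number } {
  const best = members.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  const weight = members.reduce((sum, f) => sum + f.confidence, 0) / members.length;
  return { claim: best.text, weight };
}
```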

Smoke test on PGLite confirms: 4 unconsolidated facts → clustered
(cosine 1.0 since same vector) → 1 take row created → all 4 facts
marked consolidated_into. runCycle({phases:['consolidate']}) wires
through to the report totals. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 test: 18 facts test files (Phase 6)

Phase 6 of v0.31 hot-memory: comprehensive coverage across the new
substrate. 110 unit tests pass; 5 E2E test files added (skip gracefully
without DATABASE_URL).

Unit tests (PGLite in-memory, no DATABASE_URL):
- test/facts-decay.test.ts (12 cases) — HALFLIFE_DAYS pinned per kind,
  effectiveConfidence math: age=0 / age=halflife (~1/e) / age=2×halflife
  (~1/e²) / expired returns 0 / valid_until past returns 0 /
  preference-vs-event slower decay / belief-vs-commitment crossover.
- test/facts-queue.test.ts (10 cases) — FIFO within session, drop-oldest
  on overflow, per-session in-flight=1 serializes, different sessions
  parallelize, failed jobs counter, shutdown grace + drop_pending +
  external AbortController triggers shutdown.
- test/facts-classify.test.ts (8 cases) — cosineSimilarity edge cases,
  empty candidates → independent, cheap fast-path ≥0.95 → duplicate
  no LLM, threshold-configurable cosine_fallback path.
- test/facts-engine.test.ts (13 cases) — every BrainEngine fact method
  end-to-end: insertFact (insert/supersede), expireFact idempotency,
  list*, findCandidateDuplicates entity-prefiltered + k cap + cosine
  ordering, consolidateFact never DELETE, getFactsHealth shape +
  total_today ⊆ total_week.
- test/facts-multi-tenant.test.ts (6 cases) — cross-source isolation
  on every list method + CASCADE delete on sources.
- test/facts-visibility.test.ts (6 cases) — visibility column private/
  world; remote=true filters to world-only via dispatchToolCall;
  remote=false sees all.
- test/facts-canonicality.test.ts (10 cases) — slugify rules including
  NFKD diacritic strip ("Crème Brûlée" → "creme-brulee"), exact slug
  match, fallback to slugify when no fuzzy match.
- test/facts-extract.test.ts (4 cases) — empty turn returns [], dream-
  generated short-circuit, graceful no-API-key return.
- test/facts-backstop-gating.test.ts (5 cases) — put_page backstop:
  too_short, subagent_namespace, dream_generated, eligible note path,
  non-eligible kind:guide.
- test/facts-anti-loop.test.ts (4 cases) — extractor + put_page both
  respect dream_generated:true marker.
- test/facts-doctor-shape.test.ts (4 cases) — facts_health JSON shape
  pinned for downstream consumers.
- test/facts-mcp-allowlist.serial.test.ts (5 cases) — extract_facts
  write-scope, recall read-scope, forget_fact write-scope, forget_fact
  fact_not_found error code, extract_facts no-API-key zero counts.
- test/facts-context-injection.serial.test.ts (6 cases) — _meta
  injection on success, world-only filter under remote=true, anti-loop
  on facts ops themselves, best-effort degrade on hook error,
  cache-key includes allow-list hash.
- test/facts-separation-pglite.test.ts (2 cases) — Garry's Separation
  Test as primary ship gate, plus expired hidden-by-default contract.
- test/facts-recall-render.test.ts (3 cases) — --today markdown render
  with all 5 kind icons, --json shape with effective_confidence,
  --as-context emits comment-wrapped block.
- test/facts-migration-dim.test.ts (4 cases) — embedding column type
  is HALFVEC/VECTOR (not arbitrary), dim matches gateway-configured
  embedding_dimensions, HNSW opclass agrees with column type, idempotent
  re-init.
- test/cycle-consolidate.test.ts (5 cases) — below-count + below-age
  thresholds skip, happy path 4 facts → 1 take + all consolidated never
  DELETE, dryRun honored, missing page → bucket skipped.

E2E tests (skip gracefully when DATABASE_URL is unset; required gates per the
CLAUDE.md test policy):
- test/e2e/facts-separation-postgres.test.ts — Postgres parity for the
  ship gate.
- test/e2e/facts-cross-source-isolation.test.ts — cross-source ACL on PG
  + CASCADE delete.
- test/e2e/facts-forget.test.ts — full forget_fact MCP roundtrip.
- test/e2e/facts-context-injection-postgres.test.ts — _meta injection
  end-to-end on PG.
- test/e2e/facts-recall-render.test.ts — recall --today markdown on PG.
- test/e2e/serve-http-meta.test.ts — eE1 regression: HTTP MCP transport
  inherits _meta + sourceId + scope correctness via dispatchToolCall.

Side-effect: src/core/entities/resolve.ts NFKD post-decompose strips
combining marks (U+0300..U+036F) before hyphenating non-alphanumerics,
so "Crème" → "creme", not "cre-me-".
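
A minimal sketch of that normalization order, assuming a simple lowercase-and-hyphenate floor. This is not the actual resolve.ts code, and the real slugify may treat path separators such as `/` differently.

```typescript
function slugify(input: string): string {
  return input
    .normalize("NFKD")                // decompose "è" into "e" + U+0300
    .replace(/[\u0300-\u036f]/g, "")  // strip combining marks BEFORE hyphenating
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse every other run into one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
}
```

If the combining-mark strip were skipped, each decomposed accent would count as a non-alphanumeric character and turn into a hyphen, which is the failure the commit message describes.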

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 feat(operational): kill switch + doctor check + CHANGELOG + README (Phase 7)

Phase 7 of v0.31 hot-memory.

- src/core/facts/extract.ts: new isFactsExtractionEnabled(engine) helper
  reads `facts.extraction_enabled` config row. Defaults to TRUE; flip to
  'false'/'0'/'no'/'off' (case-insensitive) via `gbrain config set
  facts.extraction_enabled false` to kill extraction across the brain
  without binary downgrade.
- extract_facts MCP op short-circuits with zero-counts envelope + a
  'skipped: extraction_disabled' field when the flag is off (clean
  success, not permission_denied).
- put_page facts backstop respects the same flag — eligibility check now
  returns 'extraction_disabled' as the skipped reason.
- src/commands/doctor.ts: new facts_health check (runs after queue_health,
  before index_audit). Probes for the facts table existence (post-v40
  guard), then surfaces total_active / total_today / total_week /
  total_consolidated + top-3 entities for the default source. Pre-v0.31
  brains report "facts table not present (pre-v0.31 brain or migration
  pending)".
- CHANGELOG.md: full v0.31.0 entry in the GStack release-summary voice.
  Headline + numbers-table + what-it-ships + itemized changes + "To take
  advantage of v0.31" upgrade block + out-of-scope. Honest about the
  HALFVEC + serve-http refactor + ErrorCode-open-union complications.
- README.md: cycle phase list updated 8 → 10 (consolidate + purge). New
  "v0.31 Hot Memory" command block under Commands with recall + forget
  variants, kind icons, --as-context surface for headless agents.
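
The flag parsing behind the kill switch can be sketched as a pure function. The default-TRUE behavior and the four case-insensitive off values come from the first bullet above; the helper name `isEnabledFlag` and the whitespace trimming are assumptions, and the real helper reads the value from the engine's config row.

```typescript
const OFF_VALUES = new Set(["false", "0", "no", "off"]);

// raw is the stored value of facts.extraction_enabled, or null/undefined when unset.
function isEnabledFlag(raw: string | null | undefined): boolean {
  if (raw == null) return true; // unset means extraction stays enabled
  return !OFF_VALUES.has(raw.trim().toLowerCase());
}
```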

Test gates: 28 facts unit tests pass after the kill-switch wiring + doctor
check ride-along. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 fix(migrate): add facts→sources FK explicitly via ALTER TABLE

The inline column-level FK declaration on facts.source_id worked on
PGLite but silently got dropped on Postgres in the v0.31 e2e run —
the migration handler ran via postgres-js's `unsafe()` multi-statement
path and the resulting facts table came back without the
`facts_source_id_fkey` constraint. Same psql input run directly
against the same database produced the FK; the difference was the
unsafe() pipeline, not the SQL itself.

Splitting the FK into a separate ALTER TABLE inside a DO block makes
the constraint declaration explicit and idempotent: the named
constraint either exists or it doesn't, the ALTER is a no-op on
re-runs, and the failure mode is loud rather than silently leaving
a CASCADE-less foreign key behind.
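
One way such an idempotent constraint block is commonly written is sketched below as a TypeScript string constant. The constraint name is from this commit message; the `sources(id)` target, the `pg_constraint` lookup, and the constant name are assumptions, not the migration's actual text.

```typescript
// Sketch only: the real migration SQL is not shown in the commit message.
const ADD_FACTS_SOURCE_FK = `
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint WHERE conname = 'facts_source_id_fkey'
  ) THEN
    ALTER TABLE facts
      ADD CONSTRAINT facts_source_id_fkey
      FOREIGN KEY (source_id) REFERENCES sources(id) ON DELETE CASCADE;
  END IF;
END
$$;
`;
```

Because the guard checks for the named constraint rather than the column, a re-run is a no-op, while a table that has the column but no constraint gets the constraint added.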

Without this fix, deleting a source row leaves orphaned facts rows
(test/e2e/facts-cross-source-isolation.test.ts CASCADE-on-sources-
delete case caught it). With this fix the constraint is in place,
the cascade fires, and both PG + PGLite e2e suites stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 test: update phase-count assertions for the new consolidate phase

Three e2e/unit tests pinned the cycle phase count or order, all now
updated to reflect v0.31's 10-phase cycle:

- test/e2e/dream-cycle-eight-phase-pglite.test.ts:
  describe rename "8-phase cycle" → "10-phase cycle"; ALL_PHASES
  expectation extended to include 'consolidate' (between patterns +
  embed) and 'purge' (the v0.26.5 addition that was already in
  ALL_PHASES but missing from the test's assertion list). totals
  match adds the new facts_consolidated + consolidate_takes_written
  fields plus the pre-existing purged_sources_count + purged_pages_count
  that should have been added when v0.26.5 landed.

- test/e2e/cycle.test.ts: dry-run full cycle now expects
  report.phases.length === 10 (was 9).

- test/core/cycle.serial.test.ts: yieldBetweenPhases hook count + full
  cycle phases.length both updated 9 → 10. Comments call out the
  v0.31 addition lineage so the next person to add a phase sees the
  precedent.

These are mechanical assertion bumps. The tests pass against the
updated assertions on PGLite and Postgres.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 fix(test): truncate facts table between e2e describe blocks

setupDB() truncates ALL_TABLES between every describe block's
beforeAll() hook. The list missed the new v0.31 facts table, so
facts seeded by an earlier describe block leaked into Garry's
Separation Test on Postgres — listFactsByEntity('travel') returned
2 rows instead of 1 because a prior facts-context-injection test had
also seeded a 'travel' fact.

Adding 'facts' to the truncate list (before 'pages' to respect FK
ordering) makes every describe-block start from an empty facts table.

Pinned by re-running the e2e file ordering that originally caught it
(facts-recall-render → cross-source-isolation → serve-http-meta →
context-injection → separation-postgres → facts-forget) — 13 pass /
0 fail after the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 test: meta-hook cache + Postgres consolidate phase coverage

Two net-new test files filling real coverage gaps the earlier sweep missed:

- test/facts-meta-cache.test.ts (5 cases) — pins the eD3/eD10 cache
  contract that the dispatcher relies on. 30s TTL hit path, post-bump
  fresh-query, scoped invalidation (bump for sess-A leaves sess-B cache
  warm — closes the cross-source leak risk codex F5 originally surfaced
  on the recall payload), facts-self ops skip injection (anti-loop on
  recall / extract_facts / forget_fact), distinct allow-lists produce
  distinct cache entries.

- test/e2e/cycle-consolidate-postgres.test.ts (3 cases) — Postgres
  parity for the dream-cycle consolidate phase. Mirrors the PGLite
  unit test but exercises the real postgres-engine codepaths: sql.begin
  transactions, advisory locks on insertFact's entity-slug dedup window,
  unsafe('::vector') casts on findCandidateDuplicates ordering,
  addTakesBatch postgres-js unnest path. Happy path (4 facts → 1 take +
  all consolidated_into set), age-threshold skip, dry-run no-write.

All 5 unit + 3 e2e tests pass. Closes the unit-only gap on the
consolidate phase (was only PGLite-tested) and pins meta-cache
invariants the dispatcher depends on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 fix: thread auth + sourceId, JSON-shape every error envelope

Three bugs surfaced during the full e2e sweep that all trace back to my
v0.31 dispatch refactor (D12/eE1) silently dropping auth threading +
non-OperationError exceptions emitting plain strings:

1. **HTTP MCP transport lost ctx.auth.** Refactoring serve-http.ts to call
   dispatchToolCall meant auth had to come through DispatchOpts, but the
   field didn't exist yet. Every HTTP whoami call returned
   `unknown_transport` because ctx.auth was undefined. Added `auth?:
   AuthInfo` to DispatchOpts, plumbed it through buildOperationContext,
   and updated serve-http.ts:816 to pass `auth: authInfo` alongside
   sourceId/takesHoldersAllowList. Pinned by sources-remote-mcp e2e
   `whoami reports oauth transport + sources_admin scope`.

2. **Non-OperationError exceptions emitted plain strings, not JSON.**
   The pre-v0.31 serve-http.ts always wrapped errors in JSON envelope
   `{error, message}`; my dispatch refactor missed the unknown-tool +
   uncaught-throw paths and emitted `Error: ${msg}` text content. Every
   caller that did `JSON.parse(content)` (sources-remote-mcp callMcp
   helper at line 104) crashed with `Unexpected identifier "Error"`.
   Both error paths in dispatchToolCall now return JSON-shaped content
   matching the OperationError pattern.

3. **Files→sources FK silently lost on rewound bootstrap path.**
   test/e2e/postgres-bootstrap.test.ts simulates a pre-v0.21 brain by
   `DROP TABLE IF EXISTS sources CASCADE` which removes
   files_source_id_fkey while leaving files.source_id intact. The v23
   migration's `ALTER TABLE files ADD COLUMN IF NOT EXISTS source_id ...
   REFERENCES sources(id) ON DELETE CASCADE` is a no-op when the column
   exists, so the FK never came back on upgrade — and any sources-remove
   afterward stopped cascading to files. Added a defensive
   `IF NOT EXISTS files_source_id_fkey ... ALTER TABLE ADD CONSTRAINT`
   block inside v23's handler. Pinned by `multi-source — cascade delete
   covers every dependent row` after running postgres-bootstrap.
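
For bug 2, the shape of the fix can be illustrated with a small helper that always emits a JSON envelope. The `{error, message}` envelope is from the description above; `errorResult`, `ToolResultLike`, and the `internal_error` / `unknown_tool` codes are hypothetical.

```typescript
interface ToolResultLike {
  isError: boolean;
  content: string; // always JSON, so callers can JSON.parse it unconditionally
}

function errorResult(err: unknown, fallbackCode = "internal_error"): ToolResultLike {
  // Prefer a string `code` property when the thrown value carries one.
  const maybeCode = (err as { code?: unknown } | null)?.code;
  const error = typeof maybeCode === "string" ? maybeCode : fallbackCode;
  const message = err instanceof Error ? err.message : String(err);
  return { isError: true, content: JSON.stringify({ error, message }) };
}
```

Routing both the unknown-tool path and the uncaught-throw path through one helper is what keeps `JSON.parse(content)` safe for every caller.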

Plus: src/core/preferences.ts now honors GBRAIN_HOME for
`~/.gbrain/migrations/completed.jsonl`. Without this, the doctor
exits-0 mechanical test inherits the developer machine's stale
partial-migration ledger entries (0.21.0, 0.22.4, 0.28.0, 0.29.1
prior dev work) and surfaces them as the [FAIL] minions_migration check.
GBRAIN_HOME-scoped tempdir per test now isolates this state cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 chore: scrub personal references from public artifacts

Per the CLAUDE.md privacy rule on `Garry's Separation Test`, replace
personally-coded references in v0.31 artifacts with neutral examples:

- CHANGELOG.md v0.31 entry: rename "Garry's Separation Test" header to
  "The cross-session test" + drop the "topic-2659/topic-1941, 7 AM/2 PM,
  flying to Tokyo" narrative.
- src/commands/migrations/v0_31_0.ts feature pitch: same scrub.
- test/facts-separation-pglite.test.ts + test/e2e/facts-separation-postgres.test.ts:
  rename describe blocks; replace specific topic-NNNN session ids with
  session-A / session-B; replace personal sample fact with
  "sample event Tuesday".
- src/core/facts/extract.ts extractor system prompt example slugs:
  people/sam-altman → people/alice-example; companies/anthropic → companies/acme.
- src/core/entities/resolve.ts comment: Sam Altman → Alice Example.
- All v0.31 test fixtures: people/sam → people/alice-example,
  Sam Altman → Alice Example, sam-the-cofounder → alice-the-cofounder.
  Test names referencing real-world entities replaced with neutral slugs.

Pre-existing references to "Garry" elsewhere in CHANGELOG (v0.17, v0.19,
v0.21+ entries) are untouched — that's a separate scope from this v0.31
ship.

Plus: a fix for the Bun-script-induced syntax error in
test/e2e/mechanical.test.ts. The bulk-add-timeouts script had tacked
", 30_000)" onto the closing brace of the cliEnv arrow function; it is now
repaired to a clean function definition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 fix(test): bump E2E phase-count assertions for 11-phase cycle

Two E2E tests still asserted the v0.31 pre-merge 10-phase shape
(consolidate inserted, but recompute_emotional_weight from v0.29 not yet
absorbed). With master's v0.29 work merged in, the cycle is now 11 phases:
lint → backlinks → sync → synthesize → extract → patterns →
recompute_emotional_weight → consolidate → embed → orphans → purge.

- test/e2e/cycle.test.ts: 10 → 11
- test/e2e/dream-cycle-eight-phase-pglite.test.ts: ALL_PHASES + dry-run order

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 fix(merge): close brace between v44 and v45 migration objects

The v0.30.2 merge resolution stitched master's v40-v44 migrations onto
HEAD's v45 (facts hot memory) migration but lost the closing `},` between
v44 and v45. tsc caught it as TS1136 Property assignment expected at
migrate.ts:2188.

This is a one-line bracket fix; the rest of the merge resolution is
correct and tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 fix: put_page cliHints + buildPlan v0.31.0 in skippedFuture

Two unit-test failures surfaced after the v0.30.2 merge:

1. operations.ts: put_page had `cliHints: { name: 'put', positional: ['stdin'] }`
   from earlier v0.31 development. The parity test enforces that every name
   in `positional` is a real param. Restored master's correct shape:
   `{ name: 'put', positional: ['slug'], stdin: 'content' }`.

2. test/apply-migrations.test.ts: the H9 regression tests pin the exact
   skippedFuture list. Adding v0.31.0 to the registry meant the list grew
   by one. Updated both `expect(...).toEqual([...])` assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31 docs: clarify consolidate is 11th phase + regen llms-full.txt

CHANGELOG.md narrative said "new 10th phase consolidate"; with v0.29's
recompute_emotional_weight already on master, consolidate is the 11th phase
(between recompute and embed). Schema migration is v45, not v40, after the
merge resolution renumbered it to clear master's v40-v44.

llms-full.txt regenerated to reflect the README's 11-phase dream-cycle
phrasing (the build-llms test enforces commit-time parity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arrytan#772)

* v0.31.1 feat: get_brain_identity MCP op (Issue garrytan#734 prep)

Lightweight read-scope op that returns {version, engine, page_count,
chunk_count, last_sync_iso} for the thin-client identity banner.
Reuses engine.getStats() — banner's 60s TTL cache (next commit) bounds
frequency to ≤1/60s per CLI process. Banner-only op, no cliHints.

Pinned by 9 tests in test/get-brain-identity.test.ts.

Part of v0.31.1 fix for garrytan#734 (thin-client mode silently routing
~25 CLI commands to empty local PGLite). See plan at
~/.claude/plans/how-to-make-mcp-iterative-liskov.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 feat: harden callRemoteTool error normalization + abort/timeout

CDX-4 (Codex outside-voice finding): the previous callRemoteTool let
plain Error escape — undici network errors, AbortError, JSON parse
failures all bubbled untyped. Plan called for an exhaustive switch on
RemoteMcpError.reason at the dispatcher; that contract was unsound.

Hardening:
- New CallRemoteToolOptions {timeoutMs?, signal?} (4th arg, optional).
- buildAbortController composes external signal with timeout into a
  single signal threaded through the SDK transport's requestInit.
- toRemoteMcpError funnel converts ANY thrown value to RemoteMcpError
  before re-raising; the outermost try/catch guarantees the contract.
- RemoteMcpErrorReason exported as a stable union type.
- RemoteMcpErrorDetail.kind ('timeout'|'aborted'|'unreachable') sub-tags
  network errors so the dispatcher can render the right hint.
- RemoteMcpErrorDetail.code carries server-supplied error codes on
  tool_error (e.g. 'missing_scope') for pinpoint refusal hints.
- extractToolErrorCode parses JSON envelopes first, falls back to
  substring detection for legacy server messages.
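
Composing an external signal with a timeout could look roughly like this. The name buildAbortController is from the list above, but the return shape (`controller`, `kind`, `dispose`) is an assumption made for this sketch.

```typescript
type AbortKind = "timeout" | "aborted";

function buildAbortController(opts: { timeoutMs?: number; signal?: AbortSignal }): {
  controller: AbortController;
  kind: () => AbortKind | undefined;
  dispose: () => void;
} {
  const controller = new AbortController();
  let kind: AbortKind | undefined;
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Records which source fired first, then aborts the composed signal.
  const fire = (k: AbortKind) => {
    if (kind === undefined) kind = k;
    controller.abort();
  };
  const onExternalAbort = () => fire("aborted");

  if (opts.signal) {
    if (opts.signal.aborted) fire("aborted");
    else opts.signal.addEventListener("abort", onExternalAbort, { once: true });
  }
  if (opts.timeoutMs !== undefined && opts.timeoutMs > 0) {
    timer = setTimeout(() => fire("timeout"), opts.timeoutMs);
  }

  // Callers should dispose once the request settles so the timer cannot leak.
  const dispose = () => {
    if (timer !== undefined) clearTimeout(timer);
    opts.signal?.removeEventListener("abort", onExternalAbort);
  };
  return { controller, kind, dispose };
}
```

Tracking `kind` alongside the signal is one way a caller could later tell a timeout from a user abort when building the error detail.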

All 13 existing mcp-client tests still pass. Typecheck clean.

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 feat: --timeout=Ns CLI flag for thin-client routed calls (ENG-4)

New global flag --timeout that accepts bare-millisecond, s-, m-, and
ms-suffixed values ("500", "30s", "2m", "500ms"). Default null = per-command default
(30s for most ops, 180s for `think` per ENG-4). Plumbs through to
callRemoteTool's AbortController via cliOpts.timeoutMs.

Rejection cases (timeoutMs stays null, flag falls through):
- --timeout=0 (must be positive)
- --timeout=garbage (no parse)
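
A possible parser for these forms is sketched below. It assumes a bare number means milliseconds and that only whole numbers are accepted; `parseTimeoutMs` is a hypothetical name, not necessarily the helper in cli-options.ts.

```typescript
// Returns the timeout in milliseconds, or null when the value is rejected.
function parseTimeoutMs(raw: string): number | null {
  const match = /^(\d+)(ms|s|m)?$/.exec(raw.trim());
  if (!match) return null; // e.g. "garbage"
  const value = Number(match[1]);
  const unit = match[2] ?? "ms"; // assumption: no suffix means milliseconds
  const ms = unit === "m" ? value * 60_000 : unit === "s" ? value * 1_000 : value;
  return ms > 0 ? ms : null; // zero is rejected: the timeout must be positive
}
```

Returning null for both rejection cases mirrors the behavior described above, where the flag falls through to the per-command default.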

Pinned by 8 new tests in test/cli-options.test.ts (total 28 pass).

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 feat: thin-client routing seam in cli.ts (CDX-1)

The keystone fix for Issue garrytan#734. Inserts the routing seam INSIDE the
existing op-dispatch path in cli.ts:78-138 (per Codex finding CDX-1) —
no parallel `src/core/thin-client/` module. Routing is a ~80-line
conditional that runs BEFORE connectEngine() so thin-client installs
never open the empty local PGLite.

Architecture (CDX-1, CDX-4, ENG-2, ENG-4):
- Existing arg parser, image-to-base64 transform, stdin handler, and
  required-param check run UNCHANGED before the routing branch. Zero
  duplicated parsers.
- New runThinClientRouted(op, params, cfg, cliOpts) calls callRemoteTool
  with {timeoutMs, signal}; default 180s for `think`, 30s otherwise;
  --timeout flag overrides.
- SIGINT abort threaded into AbortController → exit 130.
- Exhaustive TS `never` switch on RemoteMcpError.reason produces canned,
  actionable user messages per failure mode (ENG-4 contract).
- ENG-2 renderer parity: local-engine path runs JSON.parse(JSON.stringify())
  on the result before formatResult, killing the Date/bigint/Buffer drift
  class without per-command renderer audit.
- THIN_CLIENT_REFUSE_HINTS table replaces the generic refusal message
  with pinpoint hints (CDX-5 / cherry-pick A). Adds dream/transcripts/storage
  to the refused set with their own hints.
- localOnly ops on thin-client refuse via refuseThinClient (with hint).
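
The exhaustive `never` switch mentioned above follows a standard TypeScript pattern, sketched here. The reason names and messages are hypothetical placeholders; the commit message does not enumerate the real RemoteMcpErrorReason members.

```typescript
// Hypothetical reason union for illustration only.
type ReasonSketch = "auth" | "network" | "tool_error" | "parse";

function hintFor(reason: ReasonSketch): string {
  switch (reason) {
    case "auth":
      return "Authentication failed; check the client credentials.";
    case "network":
      return "Could not reach the remote brain; check the host and try again.";
    case "tool_error":
      return "The remote tool reported an error.";
    case "parse":
      return "The remote response could not be parsed.";
    default: {
      // If a new member is added to ReasonSketch and not handled above,
      // this assignment stops compiling, which is the point of the pattern.
      const unhandled: never = reason;
      throw new Error(`unhandled reason: ${String(unhandled)}`);
    }
  }
}
```

This only works because the reason type is a closed union; an open union such as the ErrorCode type cannot be checked this way.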

Pinned by 14 cli-dispatch-thin-client tests (all pass). Typecheck clean.

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 feat: thin-client identity banner (cherry-pick B)

Prints "[thin-client → wintermute.fly.dev:3131 · brain: 102k pages,
265k chunks · v0.31.1]" to stderr before each routed command. Kills the
"am I empty?" confusion that drove the original Hermes/Neuromancer
report against wintermute (102k pages → empty CLI search results).

Cache: 60s TTL, in-memory Map keyed by mcp_url so switching hosts via
`gbrain init` invalidates cleanly. Cross-process file cache deferred.
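
A per-process TTL cache keyed by URL can be sketched in a few lines. The 60s TTL and the mcp_url key are from the paragraph above; the `TtlCache` class and its injectable clock are illustrative assumptions.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // stale: drop it so the caller refetches
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  clear(): void {
    this.entries.clear(); // test escape hatch
  }
}
```

Keying by URL means a different host simply misses the cache, so no explicit invalidation is needed when the configured host changes.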

Suppression: --quiet, GBRAIN_NO_BANNER=1, non-TTY default suppresses
unless GBRAIN_BANNER=1 explicitly opts in (clean pipes for shell flows).

Failure mode: banner fetch errors swallowed; underlying command runs
normally. Banner is observability, never load-bearing. The hardened
callRemoteTool will surface the same error class on the actual call
if the host is genuinely unreachable.

Inline in cli.ts per CDX-1 (no parallel module). _clearIdentityCacheForTest
exported as test escape hatch.

Backed by the new `get_brain_identity` MCP op (read-scope, banner-only).

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 feat: route CLI-only commands with MCP equivalents (salience/anomalies/graph-query/think)

These four CLI commands bypass the operation-layer dispatch and call
engine methods directly today, so the cli.ts routing seam doesn't catch
them. Each gets a thin per-command branch: when isThinClient(cfg),
callRemoteTool against the corresponding op; otherwise existing engine
path runs unchanged.

Mappings:
- gbrain salience    → get_recent_salience  (read scope, 30s timeout)
- gbrain anomalies   → find_anomalies       (read scope, 30s timeout)
- gbrain graph-query → traverse_graph       (read scope, 30s timeout)
- gbrain think       → think                (write scope, 180s timeout)

`think` is a special case: the server's think op intentionally disables
--save/--take for remote callers (operations.ts:1103-1135 trust-boundary
gate per CLAUDE.md subagent-isolation policy). Thin-client think prints a
loud warning when those flags are set, so users know what they lose
instead of having the flags silently ignored. Recorded in the plan as a v0.31.x policy review.

Output format unchanged on both paths — the MCP op handler IS the engine
method, so the unpacked tool result has identical shape.

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 feat: oauth_client_scopes_probe doctor check (CDX-5)

`gbrain remote doctor` gains a 5th check that probes the read + admin
scope tiers via two harmless read-only MCP calls (get_brain_identity
and get_health). Surfaces v0.29.2/v0.30.0 thin-client installs that
registered with read+write only and now hit `gbrain stats` /
`gbrain history` and fail mid-flight. Instead of failing
mid-command, doctor names the exact remediation:

  On the host: gbrain auth register-client <name> --grant-types
    client_credentials --scopes read,write,admin

Status semantics (informational by default):
- read.missing_scope  → fail (broken setup)
- admin.missing_scope → warn + pinpoint hint (the load-bearing case)
- both succeed        → ok
- non-scope probe errors (parse/network/timeout) → ok with
  detail.inconclusive=true (doctor's overall status doesn't flap)
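
The status table above maps naturally onto a pure function. The sketch below assumes a three-valued probe outcome and a read-before-admin precedence; `scopeCheckStatus` and its types are hypothetical and are not the real buildScopeCheck signature.

```typescript
type ProbeOutcome = "ok" | "missing_scope" | "inconclusive";

interface ScopeCheckSketch {
  status: "ok" | "warn" | "fail";
  inconclusive: boolean;
  hint?: string;
}

function scopeCheckStatus(read: ProbeOutcome, admin: ProbeOutcome): ScopeCheckSketch {
  // A client that cannot even read is a broken setup.
  if (read === "missing_scope") return { status: "fail", inconclusive: false };
  // Missing admin is the common upgrade case: warn and name the remediation.
  if (admin === "missing_scope") {
    return {
      status: "warn",
      inconclusive: false,
      hint:
        "On the host: gbrain auth register-client <name> --grant-types " +
        "client_credentials --scopes read,write,admin",
    };
  }
  // Parse, network, and timeout errors never flip the overall doctor status.
  const inconclusive = read === "inconclusive" || admin === "inconclusive";
  return { status: "ok", inconclusive };
}
```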

GBRAIN_DOCTOR_SKIP_SCOPE_PROBE=1 env-flag for test fixtures that mock
/mcp at JSON-RPC initialize level only (MCP SDK Client hangs on shape
mismatch and doesn't always honor AbortSignal — adversarial test
behavior we don't want to bake into doctor).

Pinned by 8 cases in test/oauth-scope-probe.test.ts (pure-function
buildScopeCheck); all 23 doctor-remote tests still pass unchanged.

CDX-5 from the codex outside-voice review. Keeps host-side
`gbrain auth register-client` default at `read` (no breaking change
for existing scrapers); puts the migration burden on the THIN-CLIENT
side where it belongs.

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 feat: refuse `takes`/`sources` on thin-client with MCP-tool hints (CDX-2)

Per the CDX-2 op-coverage audit: takes and sources are multi-subcommand
CLIs with mixed local/routable surface. Their READ subcommands
(takes_list, takes_search, sources_list, sources_status) have MCP
equivalents — those land in v0.31.x with per-subcommand splits.

For v0.31.1, refuse both at the top level with hints naming the MCP
tools so agents know exactly which tools to invoke directly. Honest
framing per CDX-2: "thin-client gbrain routes the read+write+admin op
surface; multi-subcommand CLIs land incrementally."

Per-subcommand routing recorded as v0.31.x TODO in the plan.

Storage is also refused (filesystem-bound; no remote equivalent).

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 docs + version: bump VERSION/package.json, CHANGELOG, TODOS, CLAUDE.md

Cross-cut for v0.31.1 ship:
- VERSION: 0.30.0 → 0.31.1
- package.json: "version": "0.31.1" (bun install refreshed bun.lock)
- CHANGELOG.md: full release-summary entry per CLAUDE.md voice contract
  (numbers-that-matter table with before/after comparison, what-this-means
  closer, take-advantage block with exact remediation commands, itemized
  changes by surface, contributor section with plan/decision-history pointer)
- TODOS.md: 7 follow-up entries for v0.31.x (timing telemetry, job-routing,
  per-subcommand takes/sources split, transcripts privacy decision,
  trust-boundary policy review, register-client default flip, cross-process
  token cache, parity test backfill)
- CLAUDE.md: new "Thin-client routing" section under "Key files" annotating
  every changed/new file with its v0.31.1 contract — src/cli.ts routing
  seam, src/core/mcp-client.ts hardening, src/core/cli-options.ts --timeout,
  src/core/doctor-remote.ts scope-probe, get_brain_identity op, per-command
  routing in salience/anomalies/graph-query/think.

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 fix: collectRemoteDoctorReport opts.skipScopeProbe + regen llms.txt

Replaces the env-var GBRAIN_DOCTOR_SKIP_SCOPE_PROBE module-mutation in
test/doctor-remote.test.ts with an explicit opts arg threaded through
collectRemoteDoctorReport(config, opts). Satisfies the test-isolation
lint (rule R1: no process.env.X = ... in non-serial unit files).

Production callers still honor the env-flag for ops bypass; opts wins
when both are set.
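
The precedence rule ("opts wins when both are set") can be written as a small resolver. This is a sketch only: the function name `shouldSkipScopeProbe` and the injected `env` parameter are assumptions for illustration, not the code in `collectRemoteDoctorReport`.

```typescript
interface RemoteDoctorOpts {
  skipScopeProbe?: boolean;
}

// Hypothetical helper: an explicit opt overrides the env flag in either direction.
function shouldSkipScopeProbe(
  opts: RemoteDoctorOpts,
  env: Record<string, string | undefined>,
): boolean {
  if (opts.skipScopeProbe !== undefined) return opts.skipScopeProbe;
  return env["GBRAIN_DOCTOR_SKIP_SCOPE_PROBE"] === "1";
}
```

Passing `env` in, rather than reading `process.env` inside the function, is what keeps unit tests from mutating process-wide state.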

Also regenerates llms.txt + llms-full.txt to match the v0.31.1 CLAUDE.md
additions (build:llms drift check passes).

Part of v0.31.1 fix for garrytan#734.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 test: close coverage gaps — issue garrytan#734 e2e regression + CDX-4 hardening unit tests

Two real gaps the prior coverage missed:

1. **Issue garrytan#734 regression e2e** (test/e2e/thin-client.test.ts +6 cases):
   Existing e2e covered init/doctor/sync-refusal/remote-ping/no-admin but
   never exercised the actual bug — `gbrain search` against a populated
   host. Added the load-bearing regression: seed two pages on the host,
   run thin-client `gbrain search "<unique-token>"`, assert non-zero rows
   AND seeded slug present in stdout. If this assertion ever fails, garrytan#734
   has regressed.

   Plus: routed identity banner verification (GBRAIN_BANNER=1 path),
   --quiet suppression check, routed put round-trip (write reaches host,
   visible from host's local engine), routed admin stats (page_count > 0
   not 0/0), and pinpoint refuse-hint format for `gbrain sync`.

2. **CDX-4 hardening unit tests** (test/mcp-client-hardening.test.ts +31
   cases): pre-fix the hardening pass had ZERO direct unit coverage. The
   "exhaustive switch on RemoteMcpError.reason" promise depended on
   toRemoteMcpError actually normalizing every thrown value, but nothing
   verified that contract. Added:
   - toRemoteMcpError: passthrough for RemoteMcpError, AbortError →
     network/aborted, plain Error → network/unreachable, string/object/null
     non-Error throwables → network/unreachable, mcp_url always populated,
     contract test that EVERY output has a recognized reason
   - extractToolErrorCode: JSON envelope (error.code + top-level code),
     substring fallback for missing-scope-shaped messages, defensive
     handling of non-string code field, malformed-JSON fallthrough
   - buildAbortController: timeout fires on schedule, external signal
     propagates immediately when pre-aborted and lazily when aborted later,
     timeout + external compose (whichever fires first wins), cleanup is
     idempotent and removes external listener (no leak)
   - RemoteMcpError class shape (instanceof Error, reason/detail readonly,
     name="RemoteMcpError", detail optional)
   - CallRemoteToolOptions type contract

Internal helpers (toRemoteMcpError, extractToolErrorCode,
buildAbortController) gain @internal export tags so the test file can
import them without going through the SDK transport.
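
To make the normalization contract concrete, here is a hedged reconstruction of what `toRemoteMcpError` is described as doing. The `Reason` union, constructor signature, and field layout are assumptions. Only the reason/detail pairs (network/aborted, network/unreachable), the passthrough, and the always-populated `mcp_url` come from the text above.

```typescript
// Assumed reason set; the real union in src/core/mcp-client.ts may be larger.
type Reason = "network" | "auth" | "tool_error";

class RemoteMcpError extends Error {
  readonly reason: Reason;
  readonly detail?: string;
  readonly mcp_url: string;

  constructor(reason: Reason, mcpUrl: string, message: string, detail?: string) {
    super(message);
    this.name = "RemoteMcpError";
    this.reason = reason;
    this.mcp_url = mcpUrl;
    this.detail = detail;
  }
}

function toRemoteMcpError(thrown: unknown, mcpUrl: string): RemoteMcpError {
  // Already normalized: pass through untouched.
  if (thrown instanceof RemoteMcpError) return thrown;
  // Aborts (timeout or external signal) map to network/aborted.
  if (thrown instanceof Error && thrown.name === "AbortError") {
    return new RemoteMcpError("network", mcpUrl, thrown.message, "aborted");
  }
  // Everything else, including string/object/null throwables, is network/unreachable.
  const message = thrown instanceof Error ? thrown.message : String(thrown);
  return new RemoteMcpError("network", mcpUrl, message, "unreachable");
}
```

Because every branch returns a `RemoteMcpError` with a known `reason`, callers can switch exhaustively on it.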

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 test: move routing tests before remote-ping; fix pre-existing assertion for new refusal format

The newly-added routing tests were running AFTER `gbrain remote ping`,
which submits a 60s autopilot-cycle and can leave the server in a
state where subsequent OAuth probes fail. Moving them before Tier B
so they exercise a healthy server.

Also updated the existing `sync is refused with canonical thin-client error`
test assertion: v0.31.1 changed the refusal format from generic
`thin client` (with space) to the pinpoint `thin-client of <url>`
(with hyphen) plus `not routable` prefix. The test now asserts both
the new format and the pinpoint hint.

E2E result: 10 pass / 3 fail. The 3 failures are pre-existing on master
(remote-ping timeout, client-without-admin OAuth discovery flake) and
not in my diff scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.1 fix: scrub banned fork name from new test fixtures (CI privacy gate)

CI's check-privacy.sh rejected the v0.31.1 test additions because the
unique-token fixture string used the private OpenClaw fork name as a
prefix. Replaced with neutral names per CLAUDE.md privacy rule:

- test/e2e/thin-client.test.ts: `wintermute_routing_proof` →
  `host_routing_proof` (the unique-token marker that proves search
  results came from the remote brain, not the empty local PGLite).
  All 6 references updated.

- test/mcp-client-hardening.test.ts: `https://wintermute.fly.dev/mcp` →
  `https://brain-host.example/mcp` (the synthetic MCP URL used as the
  toRemoteMcpError second arg). Matches the convention used in the
  existing test/cli-dispatch-thin-client.test.ts fixture.

bun run verify passes; 31/31 hardening tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-path, sync, multi-source, privacy) (garrytan#776)

* fix: bootstrap forward-references for v39-v41 schema replay

Three column-with-index forward references in the embedded schema blob were
missing from applyForwardReferenceBootstrap, so any brain at config.version
< 39 (Postgres) or < 41 (PGLite) wedges before the migration runner can
advance. Reproduced end-to-end on a PlanetScale Postgres brain stuck at
config.version=34 trying to upgrade to v0.30.0:

  ERROR: column "effective_date" does not exist
  ERROR: column cc.modality does not exist

(After upgrading, gbrain search and gbrain reindex-frontmatter both fail.)

The schema-blob references that crash before migrations run:

- v39 (multimodal_dual_column_v0_27_1):
    CREATE INDEX idx_chunks_embedding_image
      ON content_chunks USING hnsw (embedding_image vector_cosine_ops)
      WHERE embedding_image IS NOT NULL;
- v41 (pages_recency_columns):
    CREATE INDEX pages_coalesce_date_idx
      ON pages ((COALESCE(effective_date, updated_at)));

PGLite already covered v39 (lines 273+, 308+, 382-392). Postgres and PGLite
both lacked v40+v41 coverage. This commit adds:

- Postgres engine probe + branch for v39 (modality, embedding_image) — was
  entirely missing on Postgres, so Postgres brains < v39 hit the wedge that
  PGLite already protected against.
- Both engines: probe + branch for v40+v41. Bootstraps all five additive
  pages columns (emotional_weight, effective_date, effective_date_source,
  import_filename, salience_touched_at) gated on `effective_date_exists`
  as the proxy.
- test/schema-bootstrap-coverage.test.ts: extends REQUIRED_BOOTSTRAP_COVERAGE
  with the six new columns AND the pre-test DROP block so both the per-target
  assertion test and the end-to-end "bootstrap + SCHEMA_SQL replay" test
  exercise the new coverage.

All 5 tests in schema-bootstrap-coverage pass. typecheck clean.

Bootstrap stays additive-columns-only. Indexes are left to schema replay /
migrations as before.
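
The gating logic for the v40+v41 branch can be sketched as a function that turns a column probe into additive `ALTER TABLE` statements. The column names come from this commit message. The SQL types and the function name `bootstrapStatements` are placeholders invented for the example and are not taken from the schema.

```typescript
// Column names from the commit text; SQL types are illustrative guesses only.
const PAGES_V40_V41_COLUMNS: ReadonlyArray<readonly [string, string]> = [
  ["emotional_weight", "REAL"],
  ["effective_date", "TIMESTAMPTZ"],
  ["effective_date_source", "TEXT"],
  ["import_filename", "TEXT"],
  ["salience_touched_at", "TIMESTAMPTZ"],
];

function bootstrapStatements(existingPagesColumns: ReadonlySet<string>): string[] {
  // effective_date is the proxy: if it exists, assume the whole group was applied.
  if (existingPagesColumns.has("effective_date")) return [];
  // Additive columns only; indexes are left to schema replay / migrations.
  return PAGES_V40_V41_COLUMNS.map(
    ([name, sqlType]) => `ALTER TABLE pages ADD COLUMN IF NOT EXISTS ${name} ${sqlType}`,
  );
}
```

Running these before the schema blob replays means `CREATE INDEX ... (COALESCE(effective_date, updated_at))` no longer references a column that does not exist yet.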

* fix(deps): declare @jsquash/png and heic-decode

Both packages are direct imports in src/core/import-file.ts (decodeIfNeeded
for HEIC/AVIF → PNG) but only @jsquash/avif was declared. bun --compile
fails on a fresh install:

  error: Could not resolve: "@jsquash/png/encode.js"
  error: Could not resolve: "heic-decode"

Adds the missing declarations so npm install / bun install bring them in.

Versions chosen as latest at time of fix:
  @jsquash/png  ^3.1.1
  heic-decode   ^2.1.0

* fix(backfill-effective-date): replace bare BEGIN/COMMIT with engine.transaction()

postgres.js refuses bare BEGIN/COMMIT on pooled connections with
UNSAFE_TRANSACTION. The migration runner and other call sites already
use engine.transaction() (which routes through sql.begin() with a
reserved backend) — backfill-effective-date.ts was the holdout.

Reproduces on PlanetScale Postgres (us-east-4.pg.psdb.cloud) running
the v0.29.1 orchestrator's Phase B against a brain that has any rows
needing backfill:

  Reindex ok ... UNSAFE_TRANSACTION: Only use sql.begin, sql.reserved or max: 1

Switches the per-batch transaction to engine.transaction(async tx => …).
The SET LOCAL statement_timeout still scopes to the transaction; UPDATE
runs through the tx-scoped engine. ROLLBACK on error happens
automatically via sql.begin's contract.

Equivalent fix shape to existing usages in src/core/postgres-engine.ts
(lines 703, 806, 925) and the migration runner in src/core/migrate.ts
(line 2147).

* fix(v0_29_1): connect engine before use in Phase B and Phase C

phaseBBackfill() and phaseCVerify() build their own engine via
createEngine(toEngineConfig(cfg)) but never call engine.connect().
This worked accidentally before because executeRaw lazily falls back
to db.getConnection(), but engine.transaction() (added in the
companion backfill fix) requires a connected backend and surfaces
the missing-connect with:

  No database connection: connect() has not been called.
  Fix: Run gbrain init --supabase or gbrain init --url <connection_string>

Other orchestrators in the same directory get this right —
v0_28_0.ts:181 already does `await engine.connect(engineConfig)`
right after createEngine. Aligning v0_29_1 with that pattern.

After this + the backfill fix, v0.29.1 orchestrator runs to
'complete' on a fresh upgrade with backfill-needed rows, instead
of wedging at 'partial' status.

Note: anyone hitting the wedged state after the prior failures will
need `gbrain apply-migrations --force-retry 0.29.1` once before the
next apply-migrations --yes succeeds (the 3-consecutive-partials
guard in apply-migrations.ts is still active).

* fix: connect engine in v0.29.1 migration

* fix(upgrade): detectBunLink fails because bun resolves symlinks in argv[1]

bun resolves the entire symlink chain before setting process.argv[1],
so lstatSync(argv1).isSymbolicLink() always returns false for bun-link
installs, short-circuiting the git-config walk that would correctly
identify the repo. Remove the symlink gate — argv[1] is already the
real path inside the checkout, which is what the walk needs.

Also: return { repoRoot } so the upgrade path can auto-execute
git pull + bun install via execFileSync (no shell injection surface).
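
A minimal sketch of the walk without the symlink gate follows. It assumes the walk looks for a `.git/config` file in each ancestor directory. The real `detectBunLink` presumably also inspects the config contents, which this example does not. `findCheckoutRoot` and the injectable `exists` predicate are hypothetical names.

```typescript
import { existsSync } from "node:fs";
import { dirname, join } from "node:path";

// argv1 is already the resolved real path under bun, so no lstat/symlink check is needed.
function findCheckoutRoot(
  argv1: string,
  exists: (p: string) => boolean = existsSync,
): { repoRoot: string } | null {
  let dir = dirname(argv1);
  for (;;) {
    if (exists(join(dir, ".git", "config"))) return { repoRoot: dir };
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}
```

Returning `{ repoRoot }` rather than a boolean is what lets the caller run `git pull` and `bun install` with that directory as `cwd`.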

Fixes garrytan#368, supersedes incomplete v0.28.5 fix for garrytan#656.

* fix(oauth): clamp authorize() requested scopes against client.scope (RFC 6749 §3.3)

The MCP SDK's authorize handler (`@modelcontextprotocol/sdk/.../auth/handlers/authorize.js`)
splits `?scope=...` verbatim and forwards the parsed list to the provider, so the
provider has to clamp against the client's registered grant. v0.28.11
`authorize()` (src/core/oauth-provider.ts:235-259) inserted `params.scopes || []`
raw into `oauth_codes`, so a `read`-registered client requesting
`?scope=admin` had `['admin']` stored and `exchangeAuthorizationCode` issued
a fully-admin access token at /token exchange.

The asymmetry is the bug: the other two grant entry points already clamp.
`exchangeClientCredentials` (line 513-515) filters requested scopes through
`hasScope(allowedScopes, s)`, and `exchangeRefreshToken`'s F3 (line 372-380)
enforces RFC 6749 §6 subset against the original grant. authorize() lined up
with neither.

Fix mirrors the client_credentials filter shape so all three grant entry
points clamp consistently:

    const allowedScopes = parseScopeString(client.scope);
    const grantedScopes = (params.scopes || []).filter(s => hasScope(allowedScopes, s));

Empty/omitted requested scope keeps storing `[]` (existing shape, not a
security boundary). The clamped subset is what the client sees in the
`scope` field of the token response, which is the spec-compliant signal
that the grant was reduced.
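
A self-contained version of the clamp, for illustration. `parseScopeString` and `hasScope` are named in the commit, but their bodies here are assumptions: exact-match membership is used, and the real `hasScope` may treat scopes hierarchically. `clampScopes` is a hypothetical wrapper.

```typescript
// Assumed behavior: split a space-delimited scope string (RFC 6749 section 3.3 format).
function parseScopeString(scope: string | undefined): string[] {
  return (scope ?? "").split(/\s+/).filter(Boolean);
}

// Assumption: exact-match membership only.
function hasScope(allowed: readonly string[], requested: string): boolean {
  return allowed.includes(requested);
}

// Hypothetical wrapper around the two-line fix shown above.
function clampScopes(clientScope: string | undefined, requested: string[] | undefined): string[] {
  const allowedScopes = parseScopeString(clientScope);
  return (requested || []).filter((s) => hasScope(allowedScopes, s));
}
```

With this shape, a `read`-registered client asking for `['read', 'write', 'admin']` ends up with `['read']`, and an omitted scope still yields `[]`.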

Test coverage:
- New: authorize clamps requested scopes against client.scope (RFC 6749 §3.3)
  — read-only client requests ['read','write','admin'] and the issued token
  carries only ['read'].
- New: authorize subset request returns subset — 'read write' client
  requesting ['read'] gets ['read'] (regression guard against over-clamping).

The existing v0.26.9 oauth.test.ts pins F3 (refresh clamp) but had no
authorize-side coverage, which is why the regression survived.

* fix(sync): handle detached HEAD by skipping pull and ingesting local working tree

* fix(sync): --skip-failed acks pre-existing unacked failures up-front

The recovery flow that doctor + printSyncResult both advertise was broken:

1. User has files with bad YAML → they hit the failure log + sync stays
   blocked at last_commit.
2. User fixes the YAML.
3. User re-runs `gbrain sync` — sync succeeds, advances last_commit.
4. `gbrain doctor` still reports N unacked failures from step 1 because
   sync-failures.jsonl is append-only history, never auto-cleared.
5. doctor message says: "use 'gbrain sync --skip-failed' to acknowledge".
6. User runs `gbrain sync --skip-failed` → "Already up to date." → log
   unchanged.

The bug: --skip-failed only acknowledges failures from the CURRENT run.
performSync's ack path is gated on `failedFiles.length > 0` after sync —
it never fires when the diff is empty (because the user already fixed
the bad files) or when the sync is up to date. So the documented recovery
sequence is a no-op exactly when the user needs it.

The fix: at the top of runSync, when --skip-failed is set, eagerly ack
any pre-existing unacked failures before any sync work runs. Now the flag
means "acknowledge whatever is currently flagged and move on" regardless
of whether the current run produces new failures or finds nothing to do.

The inner per-run ack path stays. It still handles new failures from
the CURRENT run, i.e. the case where (a) syncing now produces failures and
(b) the caller wants to ack them. The two paths compose: `gbrain sync
--skip-failed` clears stale failures and advances past anything new, all in
one command, matching what the doctor message promises.
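
How the eager ack and the per-run ack compose can be sketched over an in-memory failure log. The `SyncFailure` shape, `acknowledgeAll`, and `runSyncSketch` are hypothetical. The real code operates on sync-failures.jsonl through `acknowledgeSyncFailures`.

```typescript
interface SyncFailure {
  path: string;
  error: string;
  acknowledged: boolean;
}

// Marks every unacknowledged entry as acknowledged and reports how many changed.
function acknowledgeAll(log: SyncFailure[]): { log: SyncFailure[]; acked: number } {
  let acked = 0;
  const next = log.map((f) => {
    if (f.acknowledged) return f;
    acked += 1;
    return { ...f, acknowledged: true };
  });
  return { log: next, acked };
}

function runSyncSketch(
  log: SyncFailure[],
  skipFailed: boolean,
  sync: () => SyncFailure[], // returns failures produced by the current run
): { log: SyncFailure[]; messages: string[] } {
  const messages: string[] = [];
  let current = log;
  // Eager path: ack pre-existing failures before any sync work runs.
  if (skipFailed) {
    const pre = acknowledgeAll(current);
    current = pre.log;
    if (pre.acked > 0) messages.push(`Acknowledged ${pre.acked} pre-existing failure(s).`);
  }
  const fresh = sync();
  current = current.concat(fresh);
  // Inner path: ack failures from the current run, if any.
  if (skipFailed && fresh.length > 0) current = acknowledgeAll(current).log;
  return { log: current, messages };
}
```

The eager branch does not depend on `fresh.length`, so it still fires when the diff is empty.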

Tests: 2 added in test/sync-failures.test.ts. One source-string pin on
the new gate (the file's existing pattern for CLI-flag tests). One
behavioral test on the underlying acknowledgeSyncFailures path.

Repro:
  $ gbrain doctor
  [WARN] sync_failures: 27 unacknowledged sync failure(s)...
         Fix the file(s) and re-run 'gbrain sync', or use
         'gbrain sync --skip-failed' to acknowledge.
  $ # ... fix the YAML ...
  $ gbrain sync
  Already up to date.
  $ gbrain sync --skip-failed
  Already up to date.   # before this PR
  $ gbrain doctor
  [WARN] sync_failures: 27 unacknowledged sync failure(s)...   # still!

After:
  $ gbrain sync --skip-failed
  Acknowledged 27 pre-existing failure(s).
  Already up to date.
  $ gbrain doctor
  [OK] sync_failures: N historical sync failure(s), all acknowledged

* fix(extract): default --dir to configured brain dir, not cwd

`gbrain extract links` (and timeline / all) defaulted --dir to '.' when
not explicitly passed (src/commands/extract.ts:357). Combined with a
walker that skips dotfiles but NOT node_modules/dist/build/vendor, this
turned a no-arg invocation into a footgun.

Repro:
  $ cd ~/Documents/some-project   # has a node_modules/ tree
  $ gbrain extract links
  [extract.links_fs] 28989/28989 (100%) done
  Links: created 0 from 28989 pages
  Done: 0 links, 0 timeline entries from 28989 pages

The "28989 pages" is `walkMarkdownFiles('.')` recursively eating package
READMEs, dependency docs, fixture content. Their from_slug doesn't match
any row in the pages table, so addLinksBatch rejects every insert and
returns 0. The output looks like a healthy idempotent no-op, but it was
actually a wasteful junk walk that wrote nothing.

Fix: when --dir is not passed AND source is fs, resolve from
sources(local_path) via getDefaultSourcePath — same helper sync uses
(src/commands/sync.ts:1089). The default behavior now matches `sync`:
"work on the configured brain". Falls back to a clear error when no
source is configured, telling the user to either pass --dir, register
a source, or use --source db.

Behavior matrix:
  --dir explicit     → use that path (unchanged)
  --dir absent + cfg → resolve from sources(local_path)
  --dir absent + no  → error with actionable hint (was: walk cwd silently)
  --dir .            → cwd (user opted in explicitly — unchanged)
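
The behavior matrix for the `fs` source maps onto a small resolver. This sketch assumes the configured path is looked up beforehand, standing in for what `getDefaultSourcePath` returns. `resolveExtractDir` and the error wording are hypothetical.

```typescript
type DirResolution = { ok: true; dir: string } | { ok: false; error: string };

function resolveExtractDir(
  dirFlag: string | undefined,   // value of --dir, if the user passed one
  configuredPath: string | null, // stand-in for the configured source's local_path
): DirResolution {
  // An explicit --dir always wins, including "." for an opted-in cwd walk.
  if (dirFlag !== undefined) return { ok: true, dir: dirFlag };
  if (configuredPath !== null) return { ok: true, dir: configuredPath };
  return {
    ok: false,
    error: "No source configured: pass --dir, register a source, or use --source db.",
  };
}
```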

Tests: three added in test/extract-fs.test.ts:
  1. configured source → no-arg invocation extracts from that path
  2. no source configured → exit 1 + actionable error message
  3. explicit --dir wins over a configured (decoy) source path

* fix(extract): normalize slugs to lowercase via pathToSlug() (T-OBS-1)

The extractor was generating from_slug and the allSlugs lookup set from
`relPath.replace('.md', '')` in 5 places, producing CAPS slugs for files
named ETHOS.md, AGENTS.md, ROADMAP.md, etc.

Pages persist in the DB with lowercase slug (core/sync.ts pathToSlug()
applies .toLowerCase()). The CAPS extractor output mismatched the DB rows,
so INSERT ... JOIN pages ON pages.slug = v.from_slug silently dropped
links from CAPS-named source files. The link batch returned 'inserted'
counts that were lower than the wikilinks actually present, with no error.

Reproduction (in a brain with CAPS-named canonical docs):
  1. echo 'See [agents](agents.md).' > ETHOS.md
  2. gbrain put ethos < ETHOS.md  # page row: slug='ethos'
  3. gbrain extract links --source fs
  4. gbrain backlinks agents → []  (expected: contains 'ethos')

Fix: import pathToSlug from core/sync.ts and use it in all 5 sites:
  - extractLinksFromFile (line 200): from_slug derivation
  - runIncrementalExtractInternal (line 456): allSlugs set
  - extractLinksFromDir (line 552): allSlugs set
  - timeline loop (line 643): from_slug for timeline entries
  - extractLinksForSlugs (line 673): allSlugs set used by sync hook

This single-line-per-site change keeps the extractor consistent with the
sync layer's slug normalization and doesn't introduce any new behavior
for already-lowercase paths (idempotent).
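
The mismatch is easy to see side by side. The `pathToSlug` below is a simplified stand-in written for this example. The real helper in core/sync.ts is only known from this commit to apply `.toLowerCase()` and may do more.

```typescript
// Old derivation used at the five call sites: preserves the file's casing.
function legacySlug(relPath: string): string {
  return relPath.replace(".md", "");
}

// Simplified stand-in for core/sync.ts pathToSlug (assumed behavior).
function pathToSlug(relPath: string): string {
  return relPath.replace(/\\/g, "/").replace(/\.md$/i, "").toLowerCase();
}

const dbSlugs = new Set(["ethos", "agents"]); // pages persist with lowercase slugs

const legacyMatches = dbSlugs.has(legacySlug("ETHOS.md")); // "ETHOS" is not in the set
const fixedMatches = dbSlugs.has(pathToSlug("ETHOS.md"));  // "ethos" is in the set
```

Because the join on `pages.slug = v.from_slug` is an exact match, the legacy slug drops the link silently.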

Tests: added 'extractLinksFromFile — slug normalization (T-OBS-1
regression)' suite with 4 cases covering CAPS, mixed-case, idempotent
lowercase, and nested path. Full extract suite (54 → 58 tests) passes.

Reported by Claude Code (Opus 4.7) during Obsidian PKM integration on
the gstack-plan Living Repo, where ~111 wikilinks pointing to ETHOS,
AGENTS, ROADMAP, etc. failed to count toward brain_score (54/100 vs
expected 75+/100). Documented as T-OBS-1 in the consumer's blocked.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cli): CLI_ONLY commands should short-circuit on --help instead of executing

* fix(doctor): correct command syntax in graph_coverage warn message

graph_coverage warn directs users to run `gbrain link-extract &&
gbrain timeline-extract`, but no commands by those names are
registered in cli.ts. The actual commands are `gbrain extract links`
and `gbrain extract timeline` (registered as the 'extract'
subcommand at src/cli.ts:525, with the kind argument 'links' /
'timeline' / 'all' parsed inside src/commands/extract.ts).

A user who runs the suggested command gets:
  $ gbrain link-extract
  Unknown command: link-extract

This is the only place in src/ with the wrong syntax — the rest of
the docs (init.ts:221, init.ts:331, features.ts:120,
v0_13_0.ts:67, sync.ts:752 comment) all already say 'extract links'.
This patch just brings doctor.ts in line.

* fix(doctor): use autoDetectSkillsDir so OpenClaw workspaces are reachable

`gbrain doctor` was the only consumer of `findRepoRoot` from
`core/repo-root.ts`. Every other consumer (check-resolvable.ts:145,
skillify.ts, etc.) uses `autoDetectSkillsDir`, which has the full
detection chain:
  1. $OPENCLAW_WORKSPACE
  2. ~/.openclaw/workspace
  3. findRepoRoot() walk from cwd
  4. ./skills

`findRepoRoot` only does step 3. Result: when the user runs `gbrain
doctor` from any directory outside the gbrain repo or the OpenClaw
workspace tree (e.g., a project's checkout), `resolver_health` reports
"Could not find skills directory" even though the dispatcher exists at
~/.openclaw/workspace/skills/RESOLVER.md.

Reproduces in any directory other than ~/gbrain or its descendants on
a system with ~/.openclaw/workspace/skills/RESOLVER.md present:

    $ cd ~/Documents
    $ gbrain doctor
    [WARN] resolver_health: Could not find skills directory   # before
    [WARN] resolver_health: 5 issue(s): 0 error(s), 5 warning(s)  # after

Switching doctor to `autoDetectSkillsDir` brings it in line with the rest
of the codebase. The detected dir is also passed to
`checkSkillConformance` (step 2 of the resolver_health block), which
previously rebuilt the path from `repoRoot` — now uses the same
detected path for consistency.
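
The four-step detection chain can be sketched as an ordered candidate list. This is an illustration under assumptions: that each step resolves to a `skills` subdirectory, and that the environment is passed in explicitly. `DetectEnv` and `autoDetectSkillsDirSketch` are hypothetical names, not the actual `autoDetectSkillsDir` signature.

```typescript
interface DetectEnv {
  openclawWorkspace?: string; // value of $OPENCLAW_WORKSPACE, if set
  home: string;
  cwd: string;
  repoRoot: string | null;    // result of a findRepoRoot()-style walk from cwd
}

function autoDetectSkillsDirSketch(
  env: DetectEnv,
  isDir: (p: string) => boolean,
): string | null {
  const candidates: Array<string | null> = [
    env.openclawWorkspace ? `${env.openclawWorkspace}/skills` : null, // step 1
    `${env.home}/.openclaw/workspace/skills`,                         // step 2
    env.repoRoot ? `${env.repoRoot}/skills` : null,                   // step 3
    `${env.cwd}/skills`,                                              // step 4
  ];
  // First existing candidate wins.
  for (const c of candidates) {
    if (c !== null && isDir(c)) return c;
  }
  return null;
}
```

A doctor that only performs step 3 returns `null` from `~/Documents` even when step 2 would have succeeded.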

All 15 existing tests in test/doctor.test.ts continue to pass.

* fix(mcp): exit serve process on stdin-close/SIGTERM

MCP stdio server was keeping the bun process alive indefinitely after
the client disconnected. Over days this accumulated 20+ orphaned
gbrain serve processes, all holding the PGLite directory open.
Since PGLite is single-writer, this caused write-lock contention that
made email-sync fail its 15s per-put timeout: 114 puts x 15s = 28.5min
runs with 0 emails written.

Now listens for stdin end/close, transport close, and SIGTERM/SIGINT/
SIGHUP; calls engine.disconnect() and exits cleanly.
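
A minimal sketch of the shutdown wiring, with the event sources injected so it can be exercised without a real process. `installShutdownHandlers` is a hypothetical name. The sketch treats disconnect as synchronous (the real `engine.disconnect()` may be async) and omits the transport-close hook.

```typescript
import { EventEmitter } from "node:events";

function installShutdownHandlers(
  stdin: EventEmitter,          // e.g. process.stdin
  proc: EventEmitter,           // e.g. process
  disconnect: () => void,       // stand-in for engine.disconnect()
  exit: (code: number) => void, // e.g. process.exit
): void {
  let done = false;
  const shutdown = (): void => {
    if (done) return; // several triggers can fire for one disconnect
    done = true;
    disconnect();
    exit(0);
  };
  stdin.on("end", shutdown);
  stdin.on("close", shutdown);
  for (const sig of ["SIGTERM", "SIGINT", "SIGHUP"]) {
    proc.on(sig, shutdown);
  }
}
```

The `done` guard matters because a client disconnect typically produces both `end` and `close` on stdin.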

Root cause for the no-gbrain-run-in-50h alert.

* fix(skills): broaden RESOLVER triggers + 1 ambiguity flag (37 misses → 0, 100% top-1 accuracy)

`bun run src/cli.ts routing-eval` was reporting 37 ROUTING_MISS entries
across 10 skills whose RESOLVER.md trigger phrases didn't match any of
their own routing-eval.jsonl fixture intents. Two distinct causes, plus one
inherently ambiguous fixture:

1. Single-phrase triggers in 9 skills under '## Uncategorized' didn't
   cover the paraphrased fixture variations they're supposed to route.
   Broadened each trigger cell to a quoted-phrase list that covers the
   fixtures (5 fixtures per skill on average).

2. The media-ingest row used unquoted prose
   ('Video, audio, PDF, book, YouTube, screenshot') which
   extractTriggerPhrases() collapses into one impossible long phrase
   ('video audio pdf book youtube screenshot') under normalizeText —
   no fixture intent will ever contain that exact substring. Converted
   to a quoted phrase list.

3. One fixture ('web research pass on this person') legitimately
   matches both `perplexity-research` and `data-research`
   (data-research's trigger row contains "Research"). Marked the
   fixture `ambiguous_with: ["data-research"]` since the overlap
   on the keyword 'research' is inherent and expected.
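
Why unquoted prose can never match is clearer with a toy version of the matcher. Both functions below are simplified assumptions about how `extractTriggerPhrases` and `normalizeText` behave, reconstructed only from the collapse described in cause 2. `triggerMatches` is a hypothetical helper.

```typescript
// Assumed normalization: lowercase, punctuation to spaces, trimmed.
function normalizeText(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Assumed extraction: each quoted phrase is a trigger; with no quotes,
// the whole cell collapses into a single phrase.
function extractTriggerPhrases(cell: string): string[] {
  const quoted = [...cell.matchAll(/"([^"]+)"/g)].map((m) => normalizeText(m[1] ?? ""));
  return quoted.length > 0 ? quoted : [normalizeText(cell)];
}

function triggerMatches(cell: string, intent: string): boolean {
  const normalizedIntent = normalizeText(intent);
  return extractTriggerPhrases(cell).some((p) => normalizedIntent.includes(p));
}
```

Under these assumptions, `'Video, audio, PDF, book, YouTube, screenshot'` becomes the single phrase `video audio pdf book youtube screenshot`, while `'"video", "youtube", "pdf"'` yields three independently matchable phrases.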

Skills with broadened triggers:
  - voice-note-ingest, article-enrichment, book-mirror,
    archive-crawler, brain-pdf, academic-verify, concept-synthesis,
    perplexity-research, strategic-reading, media-ingest

Before: 58 cases, 37 misses, ~36% top-1 accuracy
After:  58 cases, 0 misses, 100% top-1 accuracy

This also clears `gbrain doctor`'s `resolver_health: 37 issue(s)` warning.

* fix(multi-source): thread source_id through per-page tx surface

Multi-source brains crashed mid-import with Postgres 21000 ("more than one
row returned by a subquery used as an expression"). Root cause: putPage's
INSERT column list omitted source_id, so writes intended for a non-default
source (e.g. 'jarvis-memory') silently fabricated a duplicate row at
(default, slug). The schema has UNIQUE(source_id, slug) but DEFAULT 'default'
for source_id; calling putPage(slug, page) without source_id landed at
(default, slug) and ON CONFLICT updated the wrong row, leaving the intended
source row stale. Subsequent bare-slug subqueries inside the same tx —
(SELECT id FROM pages WHERE slug = $1) in getTags / removeTag / deleteChunks
/ removeLink / addLink (cross-product) — then matched 2 rows and crashed
with 21000, rolling back the entire import. Observed: 18 sync failures
against a 'jarvis-memory'-sourced brain.

Fix:
- putPage adds source_id to the INSERT column list (defaults 'default' for
  back-compat).
- Every bare-slug page-id subquery becomes source-qualified
  (AND source_id = $X) in both engines: createVersion, upsertChunks,
  getChunks, addTag, removeTag, getTags, deleteChunks, removeLink,
  addTimelineEntry, deletePage, updateSlug.
- addLink rewritten away from FROM pages f, pages t cross-product into a
  VALUES + JOIN-on-(slug, source_id) shape mirroring addLinksBatch.
- engine.ts interface: 11 method signatures gain optional opts.sourceId
  (or opts.{from,to,origin}SourceId for addLink/removeLink). All optional;
  existing callers default to source='default' and behave identically.
- import-file.ts: importFromContent / importFromFile / importCodeFile take
  opts.sourceId and thread txOpts = { sourceId } through every per-page tx
  call. engine.getPage callsite source-scoped for accurate idempotency.
- commands/sync.ts: thread opts.sourceId at importFile (line 581 + 641),
  un-syncable cleanup (487-498), delete phase (557), rename phase (574),
  and post-sync extract phase (815-816).
- commands/reindex-code.ts: thread opts.sourceId at importCodeFile call.
- commands/extract.ts: extractLinksForSlugs / extractTimelineForSlugs accept
  opts.sourceId and propagate via linkOpts / entryOpts.
- commands/reconcile-links.ts: ReconcileLinksOpts.sourceId was declared but
  ignored end-to-end; now wired through getPage + addLink calls.
- commands/migrate-engine.ts: --force wipe switched to executeRaw('DELETE
  FROM pages') to preserve the pre-PR all-sources semantic after deletePage
  became default-source-scoped.
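
The failure mode can be reproduced in memory. The sketch below models a scalar subquery over a `pages` array: the bare-slug lookup raises the equivalent of SQLSTATE 21000 once two sources share a slug, while the source-qualified lookup does not. All names here are illustrative and are not the engine's API.

```typescript
interface PageRow {
  id: number;
  source_id: string;
  slug: string;
}

// Mirrors scalar-subquery semantics: more than one match is a cardinality error.
function scalarId(rows: PageRow[]): number | null {
  if (rows.length > 1) {
    throw new Error("21000: more than one row returned by a subquery used as an expression");
  }
  const first = rows[0];
  return first === undefined ? null : first.id;
}

// (SELECT id FROM pages WHERE slug = $1)
function pageIdBare(pages: PageRow[], slug: string): number | null {
  return scalarId(pages.filter((p) => p.slug === slug));
}

// (SELECT id FROM pages WHERE slug = $1 AND source_id = $2)
function pageIdScoped(pages: PageRow[], slug: string, sourceId = "default"): number | null {
  return scalarId(pages.filter((p) => p.slug === slug && p.source_id === sourceId));
}
```

Defaulting `sourceId` to `'default'` mirrors the back-compat behavior described for callers that pass no opts.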

Regression test: test/source-id-tx-regression.test.ts (19 tests). Validates
two sources × same slug coexist; getTags/addTag/removeTag/deleteChunks/
upsertChunks/createVersion/addLink/addTimelineEntry/deletePage/updateSlug
source-scoped writes don't 21000; back-compat without opts targets
source='default'; addLink fail-fast on missing source-qualified endpoint;
importFromContent end-to-end tx thread without fabricating duplicate.

Adversarial review: Codex (gpt-5.5 reviewer) + Grok (xAI flagship reviewer)
3-round crew loop. Round 1: 2 HIGH (addTimelineEntry + extract.ts thread)
+ 2 MED. Round 2: 1 CRITICAL + 1 HIGH (deletePage + updateSlug bare-slug)
+ 2 MED. Round 3: 2 HIGH (getChunks + migrate-engine semantic regression
introduced by R2 fix). Round 4: both reviewers CLEAR.

Deferred to follow-up PRs (noted as TODO):
- src/commands/embed.ts source-aware threading (auto-embed at sync.ts:823
  has a TODO; try/catch swallows the failure as best-effort).
- src/core/postgres-engine.ts:1511 / pglite-engine.ts:1446 putRawData
  bare-slug (lower-impact metadata path).
- Read-surface bare-slug consistency cleanup (getLinks/getBacklinks/
  getTimeline/getRawData/getVersions): non-mutating, won't 21000.
- reconcile-links.ts CLI --source flag exposure (internal opt is wired;
  CLI parser is a UX feature for later).

Existing rows in production written under (default, slug) by the old
putPage when caller meant another source remain misrouted. Backfill
heuristics need install-specific knowledge of intended source and are
outside this PR's scope; surface as a deployment-side cleanup task.

bun run typecheck clean, bun run build clean, 19/19 regression tests pass,
4082 unit pass / 1 pre-existing fail (BrainRegistry test depending on
test-env ~/.gbrain/ absence — fails on untouched main, unrelated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(multi-source): plumb sourceId through performFullSync (PR garrytan#707 gap)

PR garrytan#707 fixed source_id routing for sync's incremental loop (lines 581/641)
but performFullSync (line 922) calls runImport without threading sourceId.
Result: full syncs route pages to default even with --source <id>. Verified
on v0.30.1 by direct PGLite probe after `gbrain sync --source X --full`:
all pages landed in default, not the named source.

Fix:
- runImport accepts sourceId in opts (programmatic only — no CLI flag,
  preserving PR garrytan#707's design intent of `gbrain import` being default-only).
- runImport threads sourceId to importFile + importImageFile.
- performFullSync passes opts.sourceId to runImport.
- ImportImageOptions type accepts sourceId for runImport branch (importImageFile
  body wiring deferred — image imports out of scope for current use case;
  TS error fix only).

Verified: real sync test against /tmp/test-sync routes 1 page to "testsync"
source, 0 to default (post-fix). 19/19 source-id regression tests still pass.
Typecheck clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: regression test for performFullSync sourceId threading

PR garrytan#707's existing 19-test suite at test/source-id-tx-regression.test.ts
covers the engine-layer transaction surface (putPage / addTag / etc.)
but does NOT exercise commands/sync.ts:performFullSync. Verified via
`grep -c 'performFullSync' test/source-id-tx-regression.test.ts → 0`.

This means the +18/-4 fix at sync.ts:892 (performFullSync passing
sourceId to runImport) had no automated coverage.

Adds 2 PGLite-only regression tests:

1. `performFullSync with --source routes pages to named source (not default)`
   — fixture: temp git repo with 2 markdown files. Calls performSync with
   { full: true, sourceId: 'testsrc-pfs', noPull: true, noEmbed: true }.
   Asserts pages.source_id = 'testsrc-pfs', not 'default'. Pre-fix: FAILS
   (verified by checking out 46cd197 — rebased PR garrytan#707 only, without my
   gap-fix — and running this test). Post-fix: PASSES.

2. `performFullSync WITHOUT --source still targets default (back-compat)`
   — same fixture, no sourceId opt. Asserts pages.source_id = 'default'.
   Both pre-fix and post-fix: PASSES (back-compat preserved by the fix).

Verified: 21/21 tests pass on this branch (19 from PR garrytan#707 + 2 new).
`bun run typecheck` clean. `bun run verify` clean (8 guard checks pass).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(privacy): strip takes fence from get_page / get_versions when token carries an allow-list

v0.28.6 (garrytan#563) introduced the per-token takes-holder allow-list: an OAuth token
carries `permissions.takes_holders` and `takes_list` / `takes_search` /
`think.gather` filter take rows server-side via `WHERE t.holder = ANY($allowList)`
in both engines.

But take rows are stored in two places per the explicit contract in
`extract-takes.ts:5-13` ("markdown is canonical, the takes table is a derived
index"): the structured `takes` table AND inline in `pages.compiled_truth`
between `<!--- gbrain:takes:begin -->` markers as a markdown table whose `who`
column IS the holder. A read-only token whose `takes_holders` is `["world"]`
(the documented default-deny posture from migrate.ts:1221) can call
`get_page <slug>` and recover every non-`world` claim verbatim from the body —
private hunches, founder bets, non-public sourcing notes. `get_versions` has
the same shape: snapshots persist historical compiled_truth verbatim, so a
caller blocked at `get_page` falls through to /history.

The team already shipped a complementary fix in `chunkers/recursive.ts:49`
(stripTakesFence applied before the body is chunked, so `query` results don't
leak fence content). Migration v38 documents this as a "complementary fix" —
the page-CRUD surface was missed.

Fix strips the fence at the op layer when `ctx.takesHoldersAllowList` is set
(i.e. the remote MCP path). Local CLI callers leave the field unset and keep
seeing the full fence.

    const visibleBody = ctx.takesHoldersAllowList
      ? { ...page, compiled_truth: stripTakesFence(page.compiled_truth) }
      : page;

Same shape on `get_versions` over every snapshot in the array. Re-rendering
the fence with allow-list-filtered rows would require joining the takes table
per version_id and inverts the markdown-canonical contract; whole-fence strip
is the conservative posture that closes the leak. A future allow-list-aware
re-render is an additive change that won't break the contract pinned by these
tests.
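
A minimal sketch of the whole-fence strip described above, not the shipped `stripTakesFence`. The begin marker is the one quoted in this message; the end marker, the `PageSketch` / `CtxSketch` types, and the choice to drop everything after an unterminated fence are assumptions made for illustration.

```typescript
// Begin marker as quoted in the commit message; the end marker is ASSUMED to
// mirror it.
const TAKES_BEGIN = "<!--- gbrain:takes:begin -->";
const TAKES_END = "<!--- gbrain:takes:end -->";

// Remove every begin..end fence, markers included. An unterminated fence is
// stripped to the end of the body so a malformed page cannot leak rows
// (an assumption of this sketch, not confirmed behavior).
function stripTakesFenceSketch(body: string): string {
  let out = "";
  let cursor = 0;
  while (true) {
    const start = body.indexOf(TAKES_BEGIN, cursor);
    if (start === -1) {
      out += body.slice(cursor);
      break;
    }
    out += body.slice(cursor, start);
    const end = body.indexOf(TAKES_END, start + TAKES_BEGIN.length);
    if (end === -1) break; // unterminated fence: drop the remainder
    cursor = end + TAKES_END.length;
  }
  return out;
}

// Hypothetical shapes standing in for the real page and operation context.
interface PageSketch {
  slug: string;
  compiled_truth: string;
}
interface CtxSketch {
  takesHoldersAllowList?: string[];
}

// Presence of the allow-list (not its contents) decides whether to strip.
function visiblePage(page: PageSketch, ctx: CtxSketch): PageSketch {
  return ctx.takesHoldersAllowList
    ? { ...page, compiled_truth: stripTakesFenceSketch(page.compiled_truth) }
    : page;
}
```

Because the check is on the field being set, even a permissive allow-list takes the stripped path, while a local CLI context with no allow-list gets the page back untouched.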

Test coverage in `test/takes-mcp-allowlist.serial.test.ts`:
- get_page with allow-list strips fence; surrounding body kept.
- get_page without allow-list (local CLI) keeps fence (back-compat).
- get_page fuzzy resolution path also strips for remote tokens.
- get_versions with allow-list strips fence on every snapshot.
- get_versions without allow-list returns historical content intact.

The pre-fix R12 PoC reported `LEAKED garry hidden take? YES` and
`LEAKED brain hidden take? YES`; post-fix the same PoC reports `no` for both
holders and "bypass did not reproduce".

* Fix double-encoded jsonb in subagent_tool_executions breaking slug lookup

persistToolExecPending/Failed/Complete called JSON.stringify(input) before
passing to a $N::jsonb parameter. When input is already an object, this
produces a JSON string which ::jsonb stores as a jsonb scalar -- not a
jsonb object. Downstream queries like input->>'slug' then return NULL
because the ->> operator does not look inside a jsonb string scalar.

Root cause fix: skip JSON.stringify when input is already a string.

Query fix: use COALESCE with a (input #>> '{}')::jsonb->>'slug' fallback
to handle both old double-encoded rows and new properly-encoded rows.

Affects: dream cycle synthesize phase (pages_written always 0) and
patterns phase (same slug collection query).
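
The two row shapes can be reproduced without a database. The sketch below models the stored jsonb as JSON text and uses a hypothetical `readSlug` helper to mirror the COALESCE fallback: read the object directly, or unwrap a string scalar once and read again. It illustrates the bug shape only; it is not the query the commit ships.

```typescript
const input = { slug: "people/alice" };

// Bug shape: the stored value is a JSON *string* whose content is itself JSON
// (the jsonb_typeof = 'string' case).
const doubleEncoded = JSON.stringify(JSON.stringify(input));
// Fixed shape: the stored value is a JSON object.
const properlyEncoded = JSON.stringify(input);

// Hypothetical reader, tolerant of both shapes.
function readSlug(jsonbText: string): string | null {
  let value: unknown = JSON.parse(jsonbText);
  if (typeof value === "string") {
    // Unwrap once, analogous to (input #>> '{}')::jsonb in the query fix.
    try {
      value = JSON.parse(value);
    } catch {
      return null;
    }
  }
  if (typeof value === "object" && value !== null && "slug" in value) {
    const slug = (value as { slug: unknown }).slug;
    return typeof slug === "string" ? slug : null;
  }
  return null;
}
```

Reading the double-encoded value without the unwrap step yields a string, which is why a plain key lookup found nothing and the slug collection came back empty.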

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(adapter/voyage): translate request/response between OpenAI-compat SDK and Voyage's actual contract

The @ai-sdk/openai-compatible package treats Voyage as if it were
OpenAI-shaped, but Voyage's /v1/embeddings endpoint diverges in three places
that combine into a hard-blocking incompatibility:

OUTBOUND request:
  - 'encoding_format=float' (SDK default) is rejected; Voyage only accepts 'base64'
  - 'dimensions' parameter (OpenAI name) is rejected; Voyage uses 'output_dimension'

INBOUND response:
  - With encoding_format=base64, 'embedding' is returned as a base64 string,
    but the SDK's Zod schema (openaiTextEmbeddingResponseSchema) expects an
    'array of number'. The schema fails with 'Invalid JSON response' even
    though the JSON is well-formed.
  - 'usage' lacks 'prompt_tokens'; the schema requires it when usage is present.

Without this patch, ALL embedding requests to Voyage fail. Reproducible by
running 'gbrain put <slug> < text' with embedding_model=voyage:voyage-* and
any current voyage model (voyage-3-large, voyage-3, voyage-4-large).

Solution: pass a custom 'fetch' to createOpenAICompatible only when
recipe.id === 'voyage'. The fetch wrapper:
  1. Forces encoding_format='base64' on outbound (Voyage's only accepted value)
  2. Translates dimensions -> output_dimension on outbound
  3. Drops Content-Length so the runtime recomputes from the mutated body
  4. Decodes base64 embeddings to Float32 arrays on inbound (so the Zod schema
     sees what it expects)
  5. Synthesizes prompt_tokens from total_tokens when missing

This is a minimal, targeted fix. It only activates for Voyage and falls
through cleanly for all other providers. No public API changes.
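
The translation steps can be sketched as three pure helpers, leaving out the fetch plumbing. The helper names are hypothetical, and the decoder assumes the base64 payload is packed little-endian float32, which this message does not state.

```typescript
type EmbeddingRequest = Record<string, unknown>;

// Outbound: force base64 and rename `dimensions` to `output_dimension`.
function toVoyageRequest(body: EmbeddingRequest): EmbeddingRequest {
  const { dimensions, ...rest } = body;
  const out: EmbeddingRequest = { ...rest, encoding_format: "base64" };
  if (dimensions !== undefined) out.output_dimension = dimensions;
  return out;
}

// Inbound: decode a base64 embedding into plain numbers.
// ASSUMPTION: the bytes are little-endian float32 values.
function decodeBase64Embedding(b64: string): number[] {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const view = new DataView(bytes.buffer);
  const values: number[] = [];
  for (let offset = 0; offset + 4 <= bytes.length; offset += 4) {
    values.push(view.getFloat32(offset, true));
  }
  return values;
}

// Inbound: synthesize prompt_tokens from total_tokens when it is missing.
function withPromptTokens(usage: { total_tokens?: number; prompt_tokens?: number }): {
  total_tokens?: number;
  prompt_tokens?: number;
} {
  return usage.prompt_tokens === undefined && usage.total_tokens !== undefined
    ? { ...usage, prompt_tokens: usage.total_tokens }
    : usage;
}
```

A wrapper built on these would also need to drop `Content-Length` after mutating the body, as step 3 notes, since the serialized length changes.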

* feat(dream): support .md files in transcript discovery

Transcript discovery only accepted .txt files. Many brain repos store
meeting transcripts and conversation logs as .md (markdown), which is
the natural format for brain content.

Changes:
- listTextFiles() now accepts both .txt and .md
- basename extraction handles both extensions for date inference
- readSingleTranscript() handles both extensions

No behavior change for existing .txt-only setups.

* fix(test): cast exitCode to unknown for TS strict-narrowing

TS narrows exitCode to null between declaration and assertion because
the mocked process.exit is behind `(process as any).exit`. The cast
preserves test intent without weakening the variable's type annotation.

Wave-side merge fix; ships alongside garrytan#688 (extract --dir default).

* fix(cli): add frontmatter + check-resolvable to CLI_ONLY_SELF_HELP

Companion to garrytan#634. Both commands have their own --help logic that prints
detailed usage with command-specific flags (e.g., --json, --fix, --strict
for check-resolvable). Without this, pr-634's generic short-circuit prints
"Usage: gbrain <cmd> - run gbrain --help for the full command list." and
the existing --help integration tests fail.

Verified: `gbrain frontmatter --help` and `gbrain check-resolvable --help`
now route to their handlers, which print full per-command usage and exit 0.

* fix(test): update discoverTranscripts test expectation for .md support

Companion to garrytan#708. The pre-garrytan#708 test asserted that .md files in the
session-corpus directory were skipped. Post-garrytan#708 they are discovered
alongside .txt. Renamed the test to 'skips non-txt non-md files' (uses
.pdf as the negative case) and added a positive .md discovery test that
pins garrytan#708's intended behavior.

* fix(skills): declare missing RESOLVER triggers in skill frontmatter

Companion to garrytan#718. The RESOLVER round-trip test (test/resolver.test.ts)
fuzzy-matches every RESOLVER.md trigger phrase against the target skill's
frontmatter triggers list. pr-718 added six new RESOLVER routings without
declaring matching triggers:

- media-ingest: 'PDF book', 'summarize this book', 'ingest it into my brain'
- article-enrichment: 'enriching the article', 'enrich the article', 'enrich pass'
- concept-synthesis: 'canon vs riff'
- perplexity-research: 'perplexity-research', 'surface new developments'
- academic-verify: 'Retraction Watch'
- voice-note-ingest: 'audio message'

Adds the missing triggers verbatim to each skill's frontmatter so the
round-trip invariant holds.

* chore: regenerate llms.txt + llms-full.txt after wave skill updates

* v0.30.3 release: bump VERSION + CHANGELOG entry

22-PR community fix wave with one P0 security upgrade (auth-code scope
escalation closed). 19 PRs landed across 5 lanes; 3 superseded by master
during cherry-pick; 1 deferred per E2 protocol (garrytan#681 architectural
conflict with v0.28 takes-holders); follow-up filed.

Headline fixes: garrytan#727 (auth-code scope-clamp, RFC 6749 §3.3 compliance),
garrytan#740/garrytan#751 (v0.29.1 PGLite migration connect), garrytan#741 (v39-v41 forward-
reference bootstrap), garrytan#757 (multi-source sourceId threading, closes
Postgres 21000), garrytan#728 (takes-fence redaction on remote reads).

See CHANGELOG.md for full per-PR attribution and decision history.

Co-Authored-By: lanceretter <lance@csatlanta.com>
Co-Authored-By: alexandreroumieu-codeapprentice <agency.aubergine.code@gmail.com>
Co-Authored-By: brandonlipman <brandon@offdeck.com>
Co-Authored-By: gus <gustavoraularagon@gmail.com>
Co-Authored-By: jeremyknows <jeremyknows@protonmail.com>
Co-Authored-By: Trevin Chow <trevin@trevinchow.com>
Co-Authored-By: WD <wd@WDdeMacBook-Pro.local>
Co-Authored-By: Federico Cachero <federicocachero.tango@gmail.com>
Co-Authored-By: Brandon Lipman <brandon@offdeck.com>
Co-Authored-By: joshsteinvc <josh@stein.vc>
Co-Authored-By: mgunnin <michael.gunnin@gmail.com>
Co-Authored-By: NineClaws Brain <joel@5nine64.com>
Co-Authored-By: joelwp <joel.phillips@gmail.com>
Co-Authored-By: Oscar <oscar@Mac-mini-de-Oscar.local>

* test(C6): regression test for garrytan#745 collectChildPutPageSlugs

Codex-mandated test gate (C6 from /codex review of v0.30.3 plan).

Pins behavior of collectChildPutPageSlugs() under both jsonb shapes:
- jsonb_typeof='object' (post-garrytan#745, normal write path)
- jsonb_typeof='string' (pre-garrytan#745 double-encoded, the bug shape)

Without this guard, a future regression of garrytan#745 would silently drop slugs:
child jobs finish, queue looks healthy, orchestrator writes nothing.
Worst on-call shape — silent failure with no alerting surface.

Adds an `__testing` namespace to src/core/cycle/synthesize.ts re-exporting
collectChildPutPageSlugs at unit-test granularity. Not part of the runtime
contract; matches the v0_29_1.ts `__testing` precedent for engine-internal
helpers.

* test(C8): garrytan#708 .md transcript discovery + self-consumption guard

Codex-mandated test gate (C8 from /codex review of v0.30.3 plan).

Pins three invariants for garrytan#708's broadening of transcript discovery:

  1. .md files ARE discovered alongside .txt (the feature works).
  2. Other extensions (.pdf, .doc, .json) are still SKIPPED.
  3. v0.30.2's dream_generated frontmatter marker MUST guard .md files
     against self-consumption — without this, every dream cycle would
     loop on its own output indefinitely.

Adversarial cases: BOM + CRLF tolerance on .md frontmatter; the
--unsafe-bypass-dream-guard escape hatch for .md output; mixed .txt + .md
corpus dedup behavior pinned.
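
The first and third invariants can be illustrated with two small predicates. This is a sketch under assumptions: the key name `dream_generated` comes from this message, but the `true` value, the frontmatter parsing rules, and both function names are hypothetical.

```typescript
// Discovery predicate: .txt and .md are candidates, everything else is skipped.
function isTranscriptCandidate(fileName: string): boolean {
  return /\.(txt|md)$/.test(fileName);
}

// Self-consumption guard: true when the frontmatter marks the file as dream
// output. Tolerates a UTF-8 BOM and CRLF line endings.
// ASSUMPTION: the marker is written as `dream_generated: true`.
function isDreamGenerated(content: string): boolean {
  const text = content.replace(/^\uFEFF/, "").replace(/\r\n/g, "\n");
  if (!text.startsWith("---\n")) return false;
  // Search from index 3 so an empty frontmatter block ("---\n---") closes too.
  const close = text.indexOf("\n---", 3);
  if (close === -1) return false;
  const frontmatter = text.slice(4, close);
  return frontmatter
    .split("\n")
    .some((line) => /^dream_generated:\s*true\s*$/.test(line));
}
```

A discovery loop would keep a file only when `isTranscriptCandidate` is true and `isDreamGenerated` is false, unless the bypass flag is set.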

* test(C4): takes-fence redaction regression on get_page + get_versions

Codex-mandated test gate (C4 from /codex review of v0.30.3 plan).

Pins three privacy invariants for garrytan#728's fence-stripping in operations.ts:

  1. Local CLI caller (no allow-list) sees full takes fence — operator
     reads should preserve everything.
  2. MCP-bound caller (allow-list set) sees compiled_truth with fence
     STRIPPED on get_page AND get_versions.
  3. Allow-list PRESENCE (not contents) flags MCP-bound identity. Even
     a permissive ['world','garry','brain'] still strips, because the
     typed read surface for takes is takes_list / takes_search, not
     get_page or get_versions.

Lane 4 (garrytan#757 + garrytan#728) was the high-risk merge surface for this privacy
invariant. The test runs through dispatchToolCall to exercise the full
threading path (auth → context → handler → engine read → stripTakesFence)
so a future bad merge fails loudly at the conflict seam in operations.ts.

* test(C3): rewound-brain E2E for v39-v41 forward-reference bootstrap

Codex-mandated test gate (C3 from /codex review of v0.30.3 plan).

Pins the upgrade-path claim in the v0.30.3 release notes: brains stuck
at config.version < 39 (Postgres) or < 41 (PGLite) walk forward cleanly
through garrytan#741's bootstrap additions. Without this, the release note's
"old PGLite brains upgrade cleanly through v39-v41" was unproven.

Four cases:
  1. pre-v39 (missing modality + embedding_image)
  2. pre-v40 (missing emotional_weight + effective_date + effective_date_source)
  3. pre-v41 (missing import_filename + salience_touched_at)
  4. compounded pre-v34 wedge (v0.20 + v0.26.3 + v39-v41 all dropped at once)

Pattern follows test/e2e/v0_28_5-fix-wave.test.ts: build a fresh LATEST
brain, surgically rewind via DROP COLUMN CASCADE + UPDATE config.version,
then re-call initSchema and assert advancement to LATEST_VERSION with
the rewound columns restored. PGLite-only — Postgres-side bootstrap is
covered separately by test/e2e/postgres-bootstrap.test.ts.

* fix(test): rename migration-v0-29-1 to .serial.test.ts (CI lint)

CI's check-test-isolation lint flags the test for direct process.env.GBRAIN_HOME
mutation in beforeEach (rule R1: parallel-test-unsafe). The test is genuinely
env-coupled — it sets GBRAIN_HOME so loadConfig() inside the migration phases
finds the test fixture. Per CLAUDE.md ("When to quarantine instead of fix")
and the lint's own fix hint, env-coupled tests get renamed to *.serial.test.ts
to run in the serial bucket.

Verified: bash scripts/check-test-isolation.sh now reports OK; the renamed
test still runs green (1 pass / 0 fail, ~1.5s).

* fix(types): voyageCompatFetch — cast through unknown for Bun typeof fetch

CI's tsc --noEmit failed:
  src/core/ai/gateway.ts(249,7): error TS2741: Property 'preconnect' is
  missing in type '(input: RequestInfo | URL, init: RequestInit | ...) =>
  Promise<Response>' but required in type 'typeof fetch'.

Bun's @types/bun extends the standard fetch type with a preconnect method
that arrow functions can't satisfy. The AI SDK only invokes the call
signature; the Bun extension surface is irrelevant to voyageCompatFetch's
behavior.

Cast through `unknown` (TS2352-safe pattern for cross-type-family casts)
with explicit param types on the arrow function. Comment names the exact
TS2741 the cast suppresses so a future maintainer can audit the choice.

Companion to garrytan#735 (Voyage encoding-format adapter) — the original PR
introduced voyageCompatFetch typed against typeof fetch; the wave-side
typecheck error was caught by CI on the assembled branch.

* fix(test/e2e): rename + update dream-cycle phase-order test

The test file said "v0.23 8-phase cycle" but ALL_PHASES has been 9
since v0.26.5 (added `purge`) and 10 since v0.29 (added
`recompute_emotional_weight` between patterns and embed). The
hardcoded 8-element array assertion was stale documentation.

Renamed the file from dream-cycle-eight-phase-pglite.test.ts to
dream-cycle-phase-order-pglite.test.ts to make the maintenance
contract explicit: this test pins the canonical phase sequence,
whatever its current length, against unintended reorderings or
removals.

Extracted EXPECTED_PHASES as a typed const so the assertion lives in
one place and TypeScript's CyclePhase narrowing catches typos in the
phase names.

* fix(test/e2e): cycle.test.ts expects 10 phases (v0.29 added recompute_emotional_weight)

Same root cause as dream-cycle-phase-order-pglite.test.ts: hardcoded
phase count assertion drifted behind ALL_PHASES growth.

Phase history:
  v0.23  = 8 phases
  v0.26.5 = 9 (added `purge` last)
  v0.29  = 10 (added `recompute_emotional_weight` between patterns and embed)

* fix(test/e2e): scope GBRAIN_HOME to tmpdir for Doctor Command tests

`gbrain doctor`'s minions_migration check reads
`~/.gbrain/migrations/completed.jsonl` to detect half-installed
migrations. Pre-fix the test inherited the developer's local
$HOME, so stale partial entries from in-flight workspaces (e.g.
v0.31.0 in santiago) made the check fail and the test exit 1 —
masking real DB-health failures.

Added per-describe-block `gbrainHome` tmpdir, threaded through
`cliEnv()` so all spawned gbrain CLI calls in this block read a
hermetic, empty migrations ledger. Cleanup in afterAll.

* fix(claw-test): pass --dir explicitly to extract phase (companion to garrytan#688)

Pre-garrytan#688 `gbrain extract` defaulted to cwd. Post-garrytan#688 it requires
either a configured fs source or explicit --dir, otherwise it errors
out: "No brain directory configured."

The claw-test scripted scenarios run `gbrain init --pglite` in their
install_brain phase, which doesn't register a fs source. So the
extract phase needs --dir <brainDir> explicitly. Skip the extract
phase entirely when the scenario has no brain dir.

Captured brainDir at the import-phase site so it's reusable by extract.

* fix(preferences): route migration ledger paths through gbrainPath()

Pre-fix, preferences.ts used `$HOME/.gbrain` directly via its own
`home()` helper. Tests that set `process.env.HOME = tmpdir`
expecting hermetic isolation worked — but tests that set
`GBRAIN_HOME = tmpdir` (the documented override per
`src/core/config.ts`) didn't, because preferences ignored it.

Routed prefsDir(), prefsPath(), migrationsDir(), and
completedJsonlPath() through gbrainPath() (which honors
GBRAIN_HOME, falling back to homedir() when unset). The legacy
home() helper stays for any future code path that wants $HOME
specifically.
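
The precedence rule can be sketched with the environment and home directory injected, so it runs without touching real process state. This is an illustration only: the function names are hypothetical, and it assumes `GBRAIN_HOME` replaces the home directory with `.gbrain` still appended, which this message does not spell out. Paths are joined POSIX-style for brevity.

```typescript
type EnvMap = Record<string, string | undefined>;

// Hypothetical resolver: GBRAIN_HOME wins when set and non-empty, otherwise
// fall back to the caller-supplied home directory.
function gbrainPathSketch(env: EnvMap, homeDir: string, ...segments: string[]): string {
  const override = env.GBRAIN_HOME;
  const base = override !== undefined && override.length > 0 ? override : homeDir;
  // ASSUMPTION: `.gbrain` is appended under either base.
  return [base, ".gbrain", ...segments].join("/");
}

// Ledger path routed through the resolver instead of reading $HOME directly.
function completedJsonlPathSketch(env: EnvMap, homeDir: string): string {
  return gbrainPathSketch(env, homeDir, "migrations", "completed.jsonl");
}
```

Once every ledger helper goes through one resolver, a test that sets only `GBRAIN_HOME` gets the same isolation as one that sets `HOME`.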

Updated three tests that mutated process.env.HOME to also mutate
GBRAIN_HOME so the same test body works against the new contract:
test/preferences.test.ts, test/migration-resume.test.ts,
test/e2e/migration-flow.test.ts.

* release: rename version slot to 0.31.1.1-fixwave

Originally bumped to 0.31.2 during the master merge to stay strictly
monotonic. Garry called the slot back to `0.31.1.1-fixwave` to
communicate intent: this is a fix wave on top of v0.31.1, not a new
minor or patch slot. The next regular release slot (v0.31.2) stays
free for in-flight feature work.

Format check:
- bun install accepts the literal version (verified)
- compareVersions() in src/commands/migrations/index.ts splits on
  '.' and parseInt's each segment, taking only the first 3. So
  '0.31.1.1-fixwave' compares as [0,31,1] = equal to '0.31.1' for
  migration-ordering purposes. Wave has no new schema migrations,
  so equality is fine.
- Compares equal to 0.31.1 in the migration runner; later versions
  (0.31.2, 0.32.x, etc.) sort strictly above as normal.
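
The comparison behavior described in the format check can be reproduced with a short sketch. It follows the stated rule (split on '.', parseInt each segment, keep the first three) but is not the code in src/commands/migrations/index.ts; the function name and the treat-NaN-as-0 choice are assumptions.

```typescript
// Compare two version strings on their first three numeric segments only.
function compareVersionsSketch(a: string, b: string): number {
  const parse = (v: string): number[] =>
    v
      .split(".")
      .slice(0, 3)
      .map((segment) => {
        const n = parseInt(segment, 10);
        return Number.isNaN(n) ? 0 : n; // ASSUMPTION: unparseable segment counts as 0
      });
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff < 0 ? -1 : 1;
  }
  return 0;
}
```

Under this rule the fourth segment and the `-fixwave` suffix never reach the comparison, so `0.31.1.1-fixwave` and `0.31.1` tie while `0.31.2` sorts above both.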

Updated:
- VERSION
- package.json (with bun.lock refresh)
- CHANGELOG.md entry header + 'To take advantage of' block + 'For
  contributors' reference
- llms.txt + llms-full.txt regenerated to match

---------

Co-authored-by: lanceretter <lance@csatlanta.com>
Co-authored-by: Oscar <oscar@Mac-mini-de-Oscar.local>
Co-authored-by: WD <wd@WDdeMacBook-Pro.local>
Co-authored-by: gus <gustavoraularagon@gmail.com>
Co-authored-by: Trevin Chow <trevin@trevinchow.com>
Co-authored-by: Brandon Lipman <brandon@offdeck.com>
Co-authored-by: Federico Cachero <federicocachero.tango@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Josh Stein <josh@threshold.vc>
Co-authored-by: Matt Gunnin <mgunnin@esports.one>
Co-authored-by: Michael Dela Cruz <adobobro@mac.lan>
Co-authored-by: Jeremy Knows <jeremy@veefriends.com>
Co-authored-by: joelwp <joel.phillips@gmail.com>
Co-authored-by: NineClaws Brain <joel@5nine64.com>
Co-authored-by: alexandreroumieu-codeapprentice <agency.aubergine.code@gmail.com>
Co-authored-by: jeremyknows <jeremyknows@protonmail.com>
Co-authored-by: joshsteinvc <josh@stein.vc>
Co-authored-by: mgunnin <michael.gunnin@gmail.com>
…nk-rich repos (garrytan#773)

* fix: bound tree-sitter chunker + harden walker + plumb strategy

`gbrain sync --strategy code` against a 1500-file repo could pin one
thread at 99% CPU for hours with zero disk writes and a `page_count`
that stayed at 0. Three real defects, all closed in one commit:

1. **Tree-sitter chunker had no wall-clock cap.** A single pathological
   file could wedge the whole sync inside WASM. New `parseWithTimeout`
   helper in src/core/chunkers/code.ts wraps `parser.parse()` with
   `setTimeoutMicros(timeoutMs * 1000)`, throws `ChunkerTimeoutError`
   on null, and the caller's try/finally reaps parser+tree (closes the
   leak codex flagged where the catch block returned without delete()).
   Default 30s, override via `GBRAIN_CHUNKER_TIMEOUT_MS`. Falls back to
   recursive chunks on timeout — degrades search quality on that one
   file, doesn't wedge sync.

2. **Code-strategy first-sync silently no-op'd on code files.**
   `performFullSync` called `runImport(repoPath)` with no strategy;
   `runImport` only ever walked `.md`/`.mdx`. Now `opts.strategy`
   threads end-to-end (full-sync write path AND dry-run). Code files
   actually reach the dispatcher, which already routes them to
   `importCodeFile` correctly.

3. **Walker was thrice-redundant.** `collectMarkdownFiles` (lstat-safe,
   import path) and `walkSyncableFiles` (statSync, cost-preview path,
   weaker for no good reason) collapsed into one hardened
   `collectSyncableFiles` in src/commands/import.ts: lstat + symlink-
   skip with canonical log line; inode-cycle Map keyed on
   `${st_dev}:${st_ino}` (defense-in-depth for non-symlink loops);
   `MAX_WALK_DEPTH=32` structural backstop with `GBRAIN_MAX_WALK_DEPTH`
   override; `.sort()` output (codex C8: `runImport`'s checkpoint
   resume is index-based against a sorted list). Walker-context
   multimodal carve-out preserved at one site (codex C5).
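
The wall-clock cap in item 1 can be sketched against a stand-in parser interface. `ParserLike` and `TreeLike` are hypothetical shapes, not the tree-sitter types, and the AST chunking is replaced by a placeholder; the sketch only shows the timeout, fallback, and cleanup contract.

```typescript
// Stand-in shapes for illustration; the real code uses a tree-sitter WASM parser.
interface TreeLike {
  delete(): void;
}
interface ParserLike {
  setTimeoutMicros?: (micros: number) => void;
  parse(source: string): TreeLike | null;
}

class ChunkerTimeoutError extends Error {
  readonly timeoutMs: number;
  constructor(timeoutMs: number) {
    super(`chunker parse exceeded ${timeoutMs}ms`);
    this.name = "ChunkerTimeoutError";
    this.timeoutMs = timeoutMs;
  }
}

function parseWithTimeoutSketch(parser: ParserLike, source: string, timeoutMs: number): TreeLike {
  if (typeof parser.setTimeoutMicros !== "function") {
    // Fail loud rather than silently parsing with no cap.
    throw new Error("parser does not expose setTimeoutMicros");
  }
  parser.setTimeoutMicros(timeoutMs * 1000);
  const tree = parser.parse(source);
  if (tree === null) throw new ChunkerTimeoutError(timeoutMs);
  return tree;
}

// Caller: fall back on timeout, and always release the tree.
function chunkOrFallback(
  parser: ParserLike,
  source: string,
  timeoutMs: number,
  fallback: (source: string) => string[],
): string[] {
  let tree: TreeLike | null = null;
  try {
    tree = parseWithTimeoutSketch(parser, source, timeoutMs);
    return [source]; // placeholder for real AST-driven chunking
  } catch (err) {
    if (err instanceof ChunkerTimeoutError) return fallback(source);
    throw err;
  } finally {
    if (tree) tree.delete();
  }
}
```

Only the timeout error triggers the fallback; any other failure propagates, and the `finally` block releases the tree on every path that produced one.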

Plus structured `[gbrain phase] <name> start/done` stderr lines on
git_pull, fullsync.import, collect_files, and per-file slow path
(>5s). When the next hang lands, log says which phase wedged.
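
A phase wrapper along these lines would produce the start/done pairs. The `[gbrain phase] <name> start/done` prefix is taken from this message; the elapsed-time suffix, the function name, and the injected `log` and `now` parameters are assumptions added so the sketch can be tested.

```typescript
// Emit a start line, run the phase, then emit a done line with elapsed time.
// If the phase throws or hangs, the start line is left without a matching
// done line, which is the signal that identifies the wedged phase.
function runPhase<T>(
  name: string,
  fn: () => T,
  log: (line: string) => void,
  now: () => number = Date.now,
): T {
  log(`[gbrain phase] ${name} start`);
  const startedAt = now();
  const result = fn();
  // ASSUMPTION: the "(Nms)" suffix is illustrative, not the shipped format.
  log(`[gbrain phase] ${name} done (${now() - startedAt}ms)`);
  return result;
}
```

To match the stderr behavior described above, a caller could pass `(line) => console.error(line)` as `log`.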

Tests:
- `test/sync-walker-symlink.test.ts` — 7 cases (self-symlink loop,
  symlink-chain inode cycle, max-depth bailout, strategy filter,
  dot-dir skip, multimodal preservation, deterministic ordering)
- `test/chunker-timeout.test.ts` — 7 cases (parser-stub seam,
  ChunkerTimeoutError shape, env wiring, fallback behavior, fail-loud
  if setTimeoutMicros API missing, cleanup contract under exception)

Smoke against the user's actual amarillo-v2 repo: 494 code files
walked in 22ms, 2 symlinks skipped with the canonical log line.
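
The walker hardening from item 3 (symlink skip, inode-cycle detection, depth backstop, sorted output) can be sketched over an injected filesystem interface. `FsLike` and `StatLike` are hypothetical stand-ins for the lstat and readdir calls, a `Set` is used where the commit describes a Map, and the strategy filter and multimodal carve-out are omitted.

```typescript
interface StatLike {
  dev: number;
  ino: number;
  isDirectory: boolean;
  isSymbolicLink: boolean;
}
interface FsLike {
  lstat(path: string): StatLike;
  readdir(path: string): string[];
}

function collectSyncableFilesSketch(fs: FsLike, root: string, maxDepth = 32): string[] {
  const seenDirs = new Set<string>(); // keyed on `${dev}:${ino}`
  const files: string[] = [];

  const walk = (dir: string, depth: number): void => {
    if (depth > maxDepth) return; // structural backstop
    for (const name of fs.readdir(dir)) {
      const full = `${dir}/${name}`;
      const st = fs.lstat(full);
      if (st.isSymbolicLink) continue; // never follow symlinks
      if (st.isDirectory) {
        if (name.startsWith(".")) continue; // dot-dir skip
        const key = `${st.dev}:${st.ino}`;
        if (seenDirs.has(key)) continue; // non-symlink loop, already visited
        seenDirs.add(key);
        walk(full, depth + 1);
      } else {
        files.push(full);
      }
    }
  };

  walk(root, 0);
  // Sorted so index-based checkpoint resume sees a stable order.
  return files.sort();
}
```

The sort matters for the reason codex C8 gives: a resume index is only meaningful if two walks of the same tree yield the same order.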

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version 0.30.1 → 0.31.2 + CHANGELOG + TODOS

VERSION 0.31.2, package.json synced. CHANGELOG entry under [0.31.2]
with full release-summary + numbers + upgrader-cost note + To take
advantage block. v0.30.2 entry preserved below from master. TODOS.md
files the gbrain query <common-keyword> 7-day-zombie investigation
(PIDs 39429, 46624) and the deferred amarillo-shape PGLite + Postgres
E2E as v0.31.3 follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): use withEnv() helper instead of direct process.env mutation

CI's check-test-isolation lint (rule R1) flagged the two new test files
for mutating process.env directly. The repo-wide convention is to wrap
env mutations in withEnv() (test/helpers/with-env.ts), which saves +
restores prior values via try/finally even when the callback throws.
Direct process.env writes leak across files in the same bun test
process (parallel runner loads multiple files into one shard process).
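
The save-and-restore contract can be sketched as follows. This is not the helper in test/helpers/with-env.ts: it is synchronous and takes the environment map as an explicit argument (both assumptions, made so the sketch can run without touching the real `process.env`).

```typescript
type EnvMap = Record<string, string | undefined>;

// Apply overrides, run the callback, and restore prior values even if the
// callback throws. An override of `undefined` unsets the key for the duration.
function withEnvSketch<T>(env: EnvMap, overrides: EnvMap, fn: () => T): T {
  const prior = new Map<string, string | undefined>();
  for (const key of Object.keys(overrides)) {
    prior.set(key, env[key]);
    const next = overrides[key];
    if (next === undefined) delete env[key];
    else env[key] = next;
  }
  try {
    return fn();
  } finally {
    prior.forEach((value, key) => {
      // Keys that did not exist before are removed, not set to "undefined".
      if (value === undefined) delete env[key];
      else env[key] = value;
    });
  }
}
```

The `finally` block is the point of the helper: a failing assertion inside the callback still leaves the environment as it was for the next test file in the shard.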

Both files refactored:
- test/sync-walker-symlink.test.ts (GBRAIN_EMBEDDING_MULTIMODAL)
- test/chunker-timeout.test.ts (GBRAIN_CHUNKER_TIMEOUT_MS)

All 14 cases still pass. `bun run verify` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…closes garrytan#413, garrytan#446) (garrytan#801)

* fix(serve): clean up stdio MCP server on client disconnect

The PGLite write lock leaked indefinitely when the parent of `gbrain serve`
disconnected. Three root causes: serve.ts never called engine.disconnect()
after startMcpServer() resolved; cli.ts short-circuited with a "serve doesn't
disconnect" comment; and the MCP SDK's StdioServerTransport only listens for
'data'/'error' on stdin, never 'end'/'close', so even a clean stdin EOF never
reached the SDK.

Net effect: the next `gbrain serve` waited for the in-process 5-minute stale-
lock check or hung indefinitely.

stdio path now installs a unified lifecycle:
- SIGTERM/SIGINT/SIGHUP all funnel into one idempotent shutdown path
  (SIGHUP coverage matters for Claude Desktop on macOS / MCP gateway
  restarts; SIGINT for Ctrl-C; SIGTERM for daemon shutdown).
- stdin 'end' (clean EOF) and 'close' (parent SIGKILL with pipe still
  open) both trigger the same graceful path. TTY stdin skips the watchers
  so interactive `gbrain serve` is unaffected.
- Parent-process watchdog polls the live kernel parent PID via spawnSync
  ('ps','-o','ppid=','-p',PID) every 5s. process.ppid is cached at process
  creation by Bun (and Node) and never refreshes on re-parent — empirical
  evidence on macOS shows ps reports the new parent within one tick while
  process.ppid stays at the original PID indefinitely (oven-sh/bun#30305).
- Watchdog fires on `getParentPid() !== initialParentPid` (any reparent),
  not just `=== 1`. Catches launchd / systemd / tmux / parent-shell-with-
  PR_SET_CHILD_SUBREAPER cases where the kernel re-anchors us to a non-1
  subreaper PID. Codex review caught the original `=== 1` was incomplete.
- One-shot startup probe verifies `spawnSync('ps')` actually works on this
  host. If the probe fails (stripped containers / busybox without procps),
  we skip installing the watchdog interval entirely AND emit a loud stderr
  line — the operator sees "watchdog disabled" instead of an installed-
  but-never-fires phantom that silently falls back to cached process.ppid.
- 5-second cleanup deadline: if engine.disconnect() wedges (PGLite WASM
  stall, etc.), the process still calls process.exit(0). The abandoned
  lock dir is reclaimed on the next start by the existing stale-lock
  check in pglite-lock.ts.
- Optional `--stdio-idle-timeout <sec>`: default OFF safety net for
  parents that leak the pipe but never close it. Strict parsing rejects
  `abc` / `30junk` / `-1` / `1.5` / blank values explicitly so a typo
  doesn't silently disable the safety net (closes garrytan#446).
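
The strict parsing in the last bullet can be sketched as a digits-only check. The function name and error text are hypothetical, and treating `0` as invalid is an assumption; the rejected inputs are the ones listed above.

```typescript
// Parse a --stdio-idle-timeout value in whole seconds, rejecting anything
// that is not a plain positive integer.
function parseIdleTimeoutSeconds(raw: string): number {
  // Digits only: rejects "abc", "30junk", "-1", "1.5", and blank values.
  if (!/^[0-9]+$/.test(raw)) {
    throw new Error(`invalid --stdio-idle-timeout value: "${raw}"`);
  }
  const seconds = Number(raw);
  // ASSUMPTION: zero is rejected too, since a zero timeout is not meaningful.
  if (!Number.isSafeInteger(seconds) || seconds <= 0) {
    throw new Error(`invalid --stdio-idle-timeout value: "${raw}"`);
  }
  return seconds;
}
```

The regex is what separates this from a lenient parse: `parseInt("30junk", 10)` returns 30 and would silently accept the typo.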

Test seam: ServeOptions { stdin, signals, exit, log, startMcpServer,
getParentPid, setInterval, clearInterval, probeWatchdog } lets the
lifecycle be unit-tested deterministically without spawning a real Bun
child or booting the MCP SDK.
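
An injectable seam makes the idempotent shutdown and cleanup deadline testable without real signals or timers. The sketch below uses a reduced, hypothetical `ShutdownSeams` shape rather than the full `ServeOptions`, and shows only the exit-once and deadline logic.

```typescript
interface ShutdownSeams {
  disconnect: () => Promise<void>;
  exit: (code: number) => void;
  setTimer: (fn: () => void, ms: number) => unknown;
  log: (line: string) => void;
}

// Returns a shutdown function that signals, stdin events, and the watchdog
// can all call; only the first call does any work.
function createShutdown(seams: ShutdownSeams, deadlineMs = 5000): (reason: string) => void {
  let started = false;
  let exited = false;

  const exitOnce = (): void => {
    if (exited) return;
    exited = true;
    seams.exit(0);
  };

  return (reason: string): void => {
    if (started) return; // idempotent: later triggers are ignored
    started = true;
    seams.log(`shutting down: ${reason}`);
    // Deadline: exit even if disconnect never settles.
    seams.setTimer(exitOnce, deadlineMs);
    // Normal path: exit as soon as disconnect settles, success or failure.
    seams.disconnect().then(exitOnce, exitOnce);
  };
}
```

With fake seams a test can fire the captured deadline callback by hand and assert that `exit` ran exactly once, which is the cleanup-deadline case in the list below.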

22 test cases covering signals, stdin EOF, TTY skip, watchdog reparent
(both PID-1 and subreaper-PID-N cases), ps-unavailable degraded mode,
idle timeout, idempotent shutdown, and cleanup-deadline behavior.

Closes garrytan#413, garrytan#446. Supersedes garrytan#591.

Co-Authored-By: Aragorn2046 <noreply@github.com>
Co-Authored-By: seungsu-kr <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auth): route HTTP auth/admin SQL through active engine

`gbrain auth` and `gbrain serve --http` previously routed every SQL statement
through the postgres.js singleton in src/core/db.ts, which silently fell
back to a file-backed PGLite when DATABASE_URL was set but the config
file disagreed. The HTTP transport's verbatim use of the singleton also
made `gbrain serve --http` Postgres-only, even though the
`access_tokens` and `mcp_request_log` tables exist in both engine
schemas.

Auth, OAuth, admin, file uploads, and HTTP-transport SQL now run through
`engine.executeRaw` via a deliberately narrow tagged-template adapter
(`src/core/sql-query.ts`). The contract is scalar-binds-only — adding
JSONB or fragment composition would invite the adapter to drift into a
partial postgres.js clone. JSONB writes use a separate
`executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that
composes positional `$N::jsonb` casts and passes objects through
`engine.executeRaw`. The CI guard at `scripts/check-jsonb-pattern.sh`
doesn't fire because the helper is a method call, not the banned
`${JSON.stringify(x)}::jsonb` template-literal interpolation, and the
v0.12.0 double-encode bug class doesn't apply to positional binding via
`postgres.js`'s `unsafe()` (verified by
`test/e2e/auth-permissions.test.ts:67` on Postgres and the new
`test/sql-query.test.ts` on PGLite).
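
A scalar-binds-only tagged template can be sketched in a few lines. This illustrates the contract, not the adapter in `src/core/sql-query.ts`: the `sqlSketch` tag, the `CompiledQuery` shape, and the exact set of accepted scalar types are assumptions.

```typescript
// ASSUMPTION: which types count as scalars is a guess for this sketch.
type SqlScalar = string | number | boolean | null | Date;

interface CompiledQuery {
  text: string;
  params: SqlScalar[];
}

function isScalar(value: unknown): value is SqlScalar {
  return (
    value === null ||
    typeof value === "string" ||
    typeof value === "number" ||
    typeof value === "boolean" ||
    value instanceof Date
  );
}

// Turn a template literal into positional-parameter SQL, refusing objects,
// arrays, and fragments so the adapter stays narrow.
function sqlSketch(strings: TemplateStringsArray, ...values: unknown[]): CompiledQuery {
  let text = strings[0] ?? "";
  const params: SqlScalar[] = [];
  values.forEach((value, i) => {
    if (!isScalar(value)) {
      throw new TypeError(`bind ${i + 1} is not a scalar; use a JSONB helper for objects`);
    }
    params.push(value);
    text += `$${i + 1}${strings[i + 1] ?? ""}`;
  });
  return { text, params };
}
```

The resulting `text` and `params` pair is the kind of input a raw-execute method takes; values are never spliced into the SQL string, and anything object-shaped is forced through the separate JSONB path.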

Migrated call sites:
  - src/commands/auth.ts: takes-holders writes (lines 52, 86) →
    executeRawJsonb. List, revoke, register-client, revoke-client →
    SqlQuery via withConfiguredSql() helper that opens an engine, runs
    the callback, disconnects.
  - src/commands/serve-http.ts: ~25 call sites including the four
    mcp_request_log.params INSERTs (now write real JSONB objects, not
    JSON-encoded strings — the read side `params->>'op'` returns the
    operation name, closing CLAUDE.md's outstanding "JSON-string-into-
    JSONB" note as a side effect). The /admin/api/requests dynamic
    filter pattern (postgres.js fragment composition) is rewritten as
    parametrized SQL string + params array.
  - src/mcp/http-transport.ts: legacy bearer-auth path. The
    Postgres-only fail-fast at startup is removed because both schemas
    now carry access_tokens + mcp_request_log.
  - src/core/oauth-provider.ts: SqlQuery / SqlValue types relocated
    from here to sql-query.ts as the canonical home (Codex finding garrytan#8).
  - src/commands/files.ts: all 5 db.getConnection() sites (lines 104,
    139, 252, 326, 355). The line-256 INSERT into files.metadata uses
    executeRawJsonb; the other four are scalar-only SqlQuery (Codex
    finding garrytan#6 — scope was bigger than the plan's "lone INSERT" framing).
  - src/core/config.ts: env-var DATABASE_URL inference. When dbUrl is
    set, infer Postgres engine and clear the stale database_path.

Engine-internal sql.json() sites in src/core/postgres-engine.ts (5
sites: lines 520, 1689, 1728, 1790, 2313) STAY UNCHANGED. They live
inside PostgresEngine itself, where the postgres.js template-tag
sql.json() pattern is correct — those methods are only loaded when
Postgres is the active engine, so there's no PGLite-routing concern.

Migration v45 (mcp_request_log_params_jsonb_normalize): one-shot UPDATE
that lifts pre-v0.31 string-shaped JSONB rows to objects so the
/admin/api/requests endpoint at serve-http.ts:605 returns one
consistent shape to the admin SPA. Idempotent (subsequent runs find no
rows where jsonb_typeof = 'string'). Closes the mixed-shape window
that would otherwise have made post-deploy admin reads break.

Tests:
  - test/sql-query.test.ts: 7 cases covering scalar binds, the
    .json() rejection (defense in depth — SqlQuery is scalar-only),
    JSONB round-trip with `jsonb_typeof = 'object'` and `->>`
    semantics, the v0.12.0 double-encode regression guard, null
    JSONB handling, and the scalars-then-jsonb call shape.
  - test/config-env.test.ts: migrated from PR's manual `restoreEnv()`
    in afterEach to the canonical `withEnv()` helper at
    test/helpers/with-env.ts (CLAUDE.md R1 / codex finding D3).
    Five cases covering DATABASE_URL precedence, GBRAIN_DATABASE_URL
    operator override, file-only config, env-only config, and the
    no-config null path.
  - test/e2e/auth-takes-holders-pglite.test.ts: 6 cases against
    in-memory PGLite (no DATABASE_URL gate). Covers create / update /
    read of access_tokens.permissions, mcp_request_log.params object
    + null writes, and the migration v45 normalizer (seed
    string-shaped row, run UPDATE, assert object shape; second-run
    no-op for idempotency).
  - test/http-transport.test.ts: mock updated to intercept
    engine.executeRaw (the new code path) instead of the postgres.js
    template tag. 24 cases pass.

Plan reference: ~/.claude/plans/system-instruction-you-are-working-peppy-moore.md.
Codex outside-voice review applied: D-codex-1, D-codex-2, D-codex-5,
D-codex-8, D-codex-9, D-codex-10 (and D1, D5 reversed by codex).

Closes the architectural intent of garrytan#681. Supersedes its branch.

Co-Authored-By: codex-bot <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: update CLAUDE.md key files for v0.31.3

Annotate the v0.31.3 changes in the canonical Key Files section:
new src/core/sql-query.ts adapter (garrytan#681), src/commands/serve.ts stdio
cleanup (garrytan#676), v0.31.3 amendments to auth.ts / serve-http.ts /
oauth-provider.ts surfaces, and migration v46 normalizer in migrate.ts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: regenerate llms-full.txt for v0.31.3 docs sync

CI's build-llms test asserts the committed llms.txt + llms-full.txt
match what scripts/build-llms.ts produces from current source state.
CLAUDE.md was amended by /document-release post-merge (new entries for
src/core/sql-query.ts and src/commands/serve.ts; amended notes on
auth.ts / serve-http.ts / migrate.ts), so the inlined-bundle fell out
of sync. Regenerated via `bun run build:llms`.

llms.txt unchanged (curated index — no new web URLs added).
llms-full.txt updated to inline the new CLAUDE.md content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Aragorn2046 <noreply@github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…garrytan#795)

* feat: takes v2 — lessons from 100K-take production extraction

Consolidates everything learned from the first full takes extraction run
(28,256 pages, 100,720 takes, $361 on Azure GPT-5.5) and subsequent
cross-modal eval (GPT-5.5 + Opus 4.6, scored 6.8/10 overall).

## Fixes

**fix(cli): add recall and forget to CLI_ONLY set**
v0.31 added these commands to handleCliOnly() but forgot the gate set.
Both fell through to cliOps.get() → 'Unknown command'.
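
A minimal sketch of the routing bug, assuming a two-stage dispatch where a gate set decides whether a command reaches the CLI-only handler. Only `CLI_ONLY`, `handleCliOnly()`, and `cliOps.get()` are named in the commit; the `dispatch` function and the contents of `cliOps` are hypothetical.

```typescript
// Hypothetical reconstruction of the routing shape; not the real src/cli.ts.
// The fix: both names are present in the gate set.
const CLI_ONLY = new Set<string>(["recall", "forget"]);

// Stand-in for the shared operation table (contents are made up).
const cliOps = new Map<string, () => string>([
  ["search", () => "ran search via shared op"],
]);

function handleCliOnly(command: string): string {
  return `ran ${command} via CLI-only handler`;
}

function dispatch(command: string): string {
  // The gate is checked first. A handler that exists but is not listed
  // in the gate set is unreachable and falls through to the lookup below.
  if (CLI_ONLY.has(command)) return handleCliOnly(command);
  const op = cliOps.get(command);
  return op ? op() : "Unknown command";
}
```

With `recall` and `forget` removed from the set, `dispatch("recall")` would return `"Unknown command"`, which is the pre-fix behavior described above.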

**feat(synthesize): auto-enable when corpus dir is configured**
Setting session_corpus_dir is now sufficient — enabled defaults to true
when a corpus dir is set. Explicit enabled=false still wins. Eliminates
the footgun where users configure a corpus dir and nothing happens.
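
The defaulting rule can be sketched as follows. The config interface and the function name `isSynthesizeEnabled` are hypothetical; only the `session_corpus_dir` and `enabled` keys come from the commit message.

```typescript
// Hypothetical config shape for illustration.
interface SynthesizeConfig {
  session_corpus_dir?: string;
  enabled?: boolean;
}

function isSynthesizeEnabled(cfg: SynthesizeConfig): boolean {
  // An explicit setting always wins, including an explicit false.
  if (cfg.enabled !== undefined) return cfg.enabled;
  // Otherwise default to true only when a corpus dir is configured.
  return Boolean(cfg.session_corpus_dir);
}
```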

**feat(engine): round takes weights to 0.05 increments**
Cross-modal eval found that false precision (0.74, 0.82) implies calibration
accuracy that doesn't exist. Both postgres and pglite engines now round
on insert. 1.0 and 0.0 are preserved exactly.

## Documentation

**docs: takes-vs-facts architectural distinction**
New doc explaining the two epistemological layers, why they must never be
conflated, how the dream cycle consolidate phase bridges them, and
production extraction data (model selection, eval dimensions, key
learnings for extraction prompts).

**docs(takes-fence): clarify holder semantics with eval examples**
Holder = who HOLDS the belief, NOT who it's ABOUT. Expanded JSDoc with
concrete right/wrong examples from the cross-modal eval. Additional
rules: amplification ≠ endorsement, self-reported ≠ verified, founder
describing company → people/founder not companies/slug.

## Tests (17 new, all passing)

- 5 synthesize-enabled-default tests
- 6 takes-holder-semantics tests
- 6 takes-weight-rounding tests

## Cross-Modal Eval Context

| Dimension           | GPT-5.5 | Opus 4.6 | Avg |
|---------------------|---------|----------|-----|
| Accuracy            | 7       | 8        | 7.5 |
| Attribution         | 6       | 7        | 6.5 |
| Weight calibration  | 7       | 7        | 7.0 |
| Kind classification | 6       | 7        | 6.5 |
| Signal density      | 7       | 6        | 6.5 |

Top improvements addressed in this PR:
1. Holder vs subject confusion (docs + tests)
2. Weight false precision (runtime enforcement)
3. Takes ≠ facts distinction (architectural doc)
4. Synthesis auto-enable (runtime fix)
5. recall/forget CLI routing (bug fix)

* docs(filing-rules): anchor takes attribution rules (EXP-3)

Adds a "Takes attribution" section to skills/_brain-filing-rules.md
distilling the 6 rules from docs/takes-vs-facts.md into a terse
contract that downstream agents (OpenClaw, Wintermute) can read as
their canonical filing surface.

Documentation only — no in-repo runtime consumer (synthesize.ts reads
the .json file, not the .md). EXP-4 lands the runtime parser-level
holder validation.

Codex review #9: relabels EXP-3 as documentation, not quality work.
The runtime check is EXP-4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(takes): weight backfill v46 + NaN hardening at 4 sites (EXP-1, Hardening)

Migration v46 (takes_weight_round_to_grid): backfills pre-v0.32 takes.weight
to the 0.05 grid the engine layer (PR garrytan#795) enforces on insert. Cross-modal
eval over 100K production takes flagged 0.74, 0.82-style values as false
precision; this brings existing data to the same grid that all new writes
already use.

Tolerance-based comparison (abs > 0.001) avoids the float32-noise re-touch
loop that the naive `weight <> ROUND(...)` form would create — REAL/NUMERIC
comparison promotes weight to DOUBLE PRECISION first, surfacing ~1e-7
representation noise as inequality. The 0.05 grid is 5e-2, so any genuine
off-grid value clears the 1e-3 threshold cleanly.
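
The tolerance argument can be illustrated outside SQL. This is a TypeScript analogue, not the migration itself: `Math.fround` stands in for a value stored as float32 (Postgres REAL) and read back as a double, and `isOffGrid` is a hypothetical name.

```typescript
// Off-grid test with the same 1e-3 tolerance the migration is described as using.
function isOffGrid(weight: number): boolean {
  const snapped = Math.round(weight * 20) / 20; // nearest 0.05 step
  return Math.abs(weight - snapped) > 1e-3;
}

// 0.35 is on the grid, but its float32 representation is not exactly 0.35.
const stored = Math.fround(0.35);

// Naive form: exact inequality against the rounded value flags the row forever.
const naiveMismatch = stored !== Math.round(stored * 20) / 20;

// Tolerance form: representation noise (far below 1e-3) is ignored,
// while a genuinely off-grid value such as 0.74 is still caught.
const noiseFlagged = isOffGrid(stored);
const genuineFlagged = isOffGrid(0.74);
```

Here `naiveMismatch` is true while `noiseFlagged` is false, which is the re-touch loop the tolerance avoids.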

`transaction: false` (codex review #2 correction): not for mid-statement
resume (a single SQL statement either completes or rolls back). What it
actually buys is freeing the migration runner from holding a long
transaction so other gbrain processes can interleave.

NaN hardening (codex review #8): extracts `normalizeWeightForStorage()` to
takes-fence.ts as a single source of truth used by all 4 takes write sites:
  - pglite-engine.ts addTakesBatch
  - pglite-engine.ts updateTake (was missed in original PR — only clamped,
    didn't round; now rounds AND guards NaN)
  - postgres-engine.ts addTakesBatch
  - postgres-engine.ts updateTake (same fix)

The helper guards `!Number.isFinite()` BEFORE the [0,1] range check (NaN
comparisons are always false, so NaN survived the prior clamp and reached
Math.round(NaN * 20) / 20 = NaN, written through to the DB).
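
A sketch of what such a helper could look like, assuming a 0.5 fallback for non-finite input. The signature and the fallback constant are assumptions for illustration, not the actual takes-fence.ts code.

```typescript
// Assumed fallback for non-finite or missing input; the real default may differ.
const FALLBACK_WEIGHT = 0.5;

function normalizeWeightForStorage(weight: number | null | undefined): number {
  // Guard first: NaN < 0 and NaN > 1 are both false, so a comparison-based
  // clamp alone would let NaN through to the rounding step.
  if (weight === null || weight === undefined || !Number.isFinite(weight)) {
    return FALLBACK_WEIGHT;
  }
  // Clamp to [0, 1], then snap to the 0.05 grid.
  const clamped = Math.min(1, Math.max(0, weight));
  return Math.round(clamped * 20) / 20;
}
```

Because 0 and 1 are themselves grid points, the rounding step leaves them unchanged.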

Tests:
- test/migrations-v46-takes-weight-backfill.test.ts: behavioral PGLite test
  (rounding fixture + Codex #2 re-run idempotency + on-grid preservation).
- test/takes-weight-rounding.test.ts: imports the real helper, adds NaN /
  Infinity / -Infinity / null / undefined / updateTake-shape coverage.
- test/migrate.test.ts: structural assertions for v46 SQL shape.

All 52 tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): takes_weight_grid check + pure helper extraction (EXP-2)

Adds doctor's `takes_weight_grid` slice — the post-migration drift detector
for the 0.05 weight grid v0.31 enforces on insert and v46 backfilled.

Codex review #7 corrected the original plan's "extend test/doctor.test.ts
with 3 cases" estimate. runDoctor() is a side-effectful command with
process.exit branches, and the existing tests are mostly source-structure
assertions. The fix: extract `takesWeightGridCheck(engine: BrainEngine)`
as a pure exported function. runDoctor calls it. Tests target the helper
directly with stubbed engines for the missing-table branch and against
real PGLite for the 4 ratio bands.

Branches:
  - 0 takes total → ok ("No takes yet")
  - off_grid / total > 10% → fail (with apply-migrations fix hint)
  - 1% < off_grid / total ≤ 10% → warn (same fix hint)
  - else → ok
  - takes table missing (pre-v37) → warn, graceful skip

Tolerance comparison matches migration v46 (abs > 1e-3) so float32 noise
doesn't make a healthy brain look broken.
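
The ratio bands above reduce to a small pure function. This sketch covers only the count-based branches (the missing-table branch needs an engine); the name `classifyOffGridRatio` is hypothetical.

```typescript
type Status = "ok" | "warn" | "fail";

function classifyOffGridRatio(offGrid: number, total: number): Status {
  // "No takes yet": also avoids a divide-by-zero below.
  if (total === 0) return "ok";
  const ratio = offGrid / total;
  if (ratio > 0.1) return "fail"; // more than 10% off-grid
  if (ratio > 0.01) return "warn"; // more than 1%, at most 10%
  return "ok";
}
```

The boundaries are exclusive on the low side, so exactly 10% off-grid warns and exactly 1% is still ok.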

Tests (test/doctor.test.ts):
  - takesWeightGridCheck export shape
  - 0-takes branch (avoids divide-by-zero)
  - 100% on-grid via engine.addTakesBatch (which now normalizes)
  - 8/10 off-grid → fail
  - 5/100 off-grid → warn
  - missing-table branch via stub engine

All 21 doctor tests pass; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(takes): holder runtime validation + producer seam (EXP-4)

Adds parser-level holder grammar enforcement so cross-modal eval's #1
attribution error (holder/subject confusion, scored 6.5/10 across 100K
production takes) shows up as a sync-failure record an operator can see.

Changes:

- src/core/sync.ts: exports SLUG_SEGMENT_PATTERN, the actual character
  class slugifySegment() produces ([a-z0-9._-]). Codex review #3 — the
  initial plan's stricter regex would have warned on legitimate slugs
  like `companies/acme.io` and `people/foo_bar`. HOLDER_REGEX now wraps
  this shared pattern instead of inventing a parallel grammar.

- src/core/takes-fence.ts: HOLDER_REGEX + isValidHolder() helper.
  parseTakesFence() emits TAKES_HOLDER_INVALID warnings for non-matching
  holders. Row preserved (markdown source-of-truth contract).

  Catches the eval's failure modes — `Garry`, `people/Garry-Tan`,
  `world/garry-tan`, `users/garry`, whitespace-only — while keeping
  `companies/acme.io`, `people/foo_bar`, `notes/v1.0.0`-style dotted
  slugs valid. Bare-slug form (`garry`, `alice`) accepted as v0.32 legacy
  compat — production brains shipped with bare-slug holders before the
  namespaced JSDoc landed in PR garrytan#795. Reserved for v0.33 promotion.

- src/core/cycle/extract-takes.ts (codex review #4 producer seam): adds
  `failedFiles: Array<{path, error}>` to ExtractTakesResult. Both fs
  and db extraction paths populate it from TAKES_HOLDER_INVALID warnings
  so the migration orchestrator can hand it to recordSyncFailures().
  Without this seam, extending classifyErrorCode would do nothing
  (the regex would have nothing to classify).

- src/commands/migrations/v0_28_0.ts: phaseBBackfill calls
  recordSyncFailures(result.failedFiles, 'migration:v0.28.0-backfill')
  after extractTakes completes. Best-effort — persistence failure
  doesn't fail the backfill phase. Doctor's `sync_failures` check now
  shows TAKES_HOLDER_INVALID=N breakdown after upgrade.

- src/core/sync.ts:classifyErrorCode: extends with TAKES_HOLDER_INVALID
  + TAKES_TABLE_MALFORMED / TAKES_ROW_NUM_COLLISION / TAKES_FENCE_UNBALANCED
  bucket. Previously these warnings bucketed to UNKNOWN.
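
The holder grammar described in the takes-fence.ts item above can be sketched like this. The character class `[a-z0-9._-]` is from the commit message; the exact regex shape, the `+` quantifier inside `SLUG_SEGMENT_PATTERN`, and the assumption that only `people/` and `companies/` are accepted namespaces are illustrative guesses, not the shipped code.

```typescript
// Character class taken from the commit message; everything around it is assumed.
const SLUG_SEGMENT_PATTERN = "[a-z0-9._-]+";

// Accepts: world, brain, people/<slug>, companies/<slug>, and legacy bare slugs.
const HOLDER_REGEX = new RegExp(
  `^(?:world|brain|(?:people|companies)/${SLUG_SEGMENT_PATTERN}|${SLUG_SEGMENT_PATTERN})$`
);

function isValidHolder(holder: string): boolean {
  return HOLDER_REGEX.test(holder);
}

// Rejected examples named in the commit: uppercase, mixed case, world/<slug>,
// unrecognized prefix, whitespace-only.
const rejected = ["Garry", "people/Garry-Tan", "world/garry-tan", "users/garry", "   "];
// Accepted examples named in the commit: dotted and underscore slugs, bare legacy slugs.
const accepted = ["world", "brain", "companies/acme.io", "people/foo_bar", "garry", "alice"];
```

Wrapping one shared segment pattern, rather than writing a second character class, is what keeps the holder grammar from drifting away from what the slugifier produces.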

Tests (test/takes-holder-validation.test.ts — 26 cases):
- Canonical forms (world / brain / people-namespace / companies-namespace)
- Codex #3 dotted-slug + underscore-slug positives
- Legacy bare-slug compat positives
- Eval-flagged error mode rejections (uppercase, mixed case, world/<slug>,
  unrecognized prefix, whitespace, embedded slash)
- HOLDER_REGEX anchoring guard
- SLUG_SEGMENT_PATTERN export shape + drift guard against the wrapping regex
- parseTakesFence end-to-end emission contract
- classifyErrorCode regex coverage

127 tests pass across affected files; typecheck clean. No existing fixtures
broken (legacy bare-slug compat preserves old `garry`-style holders during
the v0.32 transition window).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval): gbrain eval takes-quality CLI — DB-authoritative + 4-mode (EXP-5)

Reproducible cross-modal quality eval for the takes layer. Three frontier
models score a sample against the 5-dim rubric, the runner aggregates to
PASS/FAIL/INCONCLUSIVE, the receipt persists to eval_takes_quality_runs.
Trend mode segregates by rubric_version; regress mode is a CI gate that
exits 1 when any dim regresses past --threshold.

Subcommands:
  run     [--limit N --cycles N --budget-usd N --slug-prefix P --models a,b,c]
  replay  <receipt-path> [--json]                 # NO BRAIN required
  trend   [--limit N --rubric-version V --json]
  regress --against <receipt> [--threshold T --json]

Codex review integrations (D7 — all 10 findings landed):

  #1 json-repair shim re-exports BOTH parseModelJSON AND the
     ParsedScore + ParsedModelResult types. The original plan only
     re-exported the function, which would have compile-broken
     cross-modal-eval/aggregate.ts:19's type import.

  #3 Receipt name binds (corpus_sha8, prompt_sha8, models_sha8,
     rubric_sha8) so a future rubric tweak segregates trend rows
     instead of silently corrupting the quality-over-time graph.
     RUBRIC_VERSION + rubric_sha8 are persisted in every receipt.

  #4 Pricing fail-closed: any model not in pricing.ts produces an
     actionable PricingNotFoundError before any HTTP call fires.
     Same drift problem as cross-modal-eval/runner.ts:estimateCost(),
     but explicit instead of silent zero.

  #5 Aggregate requires ALL 5 declared rubric dimensions per model.
     Cross-modal-eval v1's union-of-whatever-parsed pattern allowed a
     model to omit a dim and still PASS — that's a regression-gate
     hole. Now: missing-dim drops the contribution, treated identically
     to a parse failure. Empty-scores PASS regression guard preserved.

  #6 DB-authoritative receipt persistence. Original two-phase plan had
     a split-brain reconciliation gap (disk-success/DB-fail vanishes
     from trend; DB-success/disk-fail unreplayable). Now DB row is the
     source of truth (carries full receipt JSON in a JSONB column);
     disk artifact is best-effort. replay reads disk first; loadReceiptFromDb
     reconstructs from DB when the disk file is missing.

  #10 Brain-routing: replay is the only sub-subcommand that doesn't
      need a brain. cli.ts no-DB bypass routes "eval takes-quality replay"
      directly to runReplayNoBrain, which exits 0/1/2 cleanly without
      ever touching the engine. Other modes go through connectEngine.
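
A rough sketch of the all-dimensions rule from finding #5. The dimension keys are snake_case renderings of the rubric table's row names, and `PASS_THRESHOLD` is a made-up value; the commit message does not state the real threshold or the real aggregate.ts signature.

```typescript
const DIMENSIONS = [
  "accuracy",
  "attribution",
  "weight_calibration",
  "kind_classification",
  "signal_density",
] as const;
type Dimension = (typeof DIMENSIONS)[number];
type ModelScores = Partial<Record<Dimension, number>>;
type Verdict = "PASS" | "FAIL" | "INCONCLUSIVE";

// Hypothetical threshold for illustration only.
const PASS_THRESHOLD = 7;

// A model contributes only if it parsed and scored every declared dimension.
function isComplete(scores: ModelScores | null): scores is Record<Dimension, number> {
  if (scores === null) return false; // parse failure or model error
  for (const dim of DIMENSIONS) {
    const value = scores[dim];
    if (typeof value !== "number" || !Number.isFinite(value)) return false;
  }
  return true;
}

function aggregate(perModel: Array<ModelScores | null>): Verdict {
  const contributing = perModel.filter(isComplete);
  // Fewer than two contributing models cannot support a verdict.
  if (contributing.length < 2) return "INCONCLUSIVE";
  for (const dim of DIMENSIONS) {
    const mean = contributing.reduce((sum, s) => sum + s[dim], 0) / contributing.length;
    if (mean < PASS_THRESHOLD) return "FAIL";
  }
  return "PASS";
}
```

Treating a missing dimension exactly like a parse failure means a model cannot pass the gate by omitting the dimension it would have scored low.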

Files added:
  src/core/eval-shared/json-repair.ts (hoisted from cross-modal-eval)
  src/core/takes-quality-eval/{rubric,pricing,aggregate,receipt-name,
                                receipt-write,receipt,replay,regress,trend,runner}.ts
  src/commands/eval-takes-quality.ts
  docs/eval-takes-quality.md (stable schema_version: 1 contract)
  10 test files (83 cases — aggregate / receipt-name / shim / pricing /
                 rubric / receipt-write / replay / trend / regress / cli)

Files modified:
  src/cli.ts: replay no-DB bypass + engine-required dispatch
  src/core/cross-modal-eval/json-repair.ts → re-export shim
  src/core/migrate.ts: append v47 (eval_takes_quality_runs table)
  src/core/pglite-schema.ts + src/schema.sql: mirror the v47 table for
    fresh-install path. RLS toggled on the new table.
  src/core/schema-embedded.ts: regenerated via build:schema
  test/migrate.test.ts: 6 structural cases for v47

186 tests pass; typecheck clean. Replay verified working end-to-end
(reads receipt JSON file without DATABASE_URL, exits with the verdict
code, prints actionable error on missing file).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(eval): fill EXP-5 unit-test gaps + test-isolation lint fix

Three additions identified during the test-gap audit:

  1. test/eval-takes-quality-boundaries.test.ts (4 cases):
     - empty corpus → "no takes to evaluate" (pre-LLM)
     - source=fs reserved for v0.33 → clear refusal
     - --budget-usd + unknown model → PricingNotFoundError BEFORE any
       network call (codex review #4 fail-closed contract)
     - --budget-usd null + unknown model → no pre-flight pricing error
       (proves pricing pre-flight gates ONLY when budget is set)

  2. test/eval-takes-quality-runner.serial.test.ts (7 cases):
     End-to-end runner integration with mock.module-stubbed gateway.chat.
     Quarantined as *.serial.test.ts because mock.module leaks across
     files in the same shard process (R2 in check-test-isolation.sh).
     Covers:
       - 3 PASS scores → verdict=pass with all dim scores in receipt
       - all model errors → INCONCLUSIVE
       - 1 success + 2 errors → INCONCLUSIVE (need >=2 contributing)
       - 3 successes with low scores → FAIL
       - budget cap fires before cycle 1 (no chat() ever called)
       - budget cap allows cycle when projection fits

  3. test/eval-takes-quality-receipt-write.test.ts: refactored to use
     withEnv() helper for GBRAIN_HOME mutation instead of direct
     process.env writes. The original beforeAll mutation tripped the
     check-test-isolation.sh R1 lint. withEnv() saves/restores via
     try/finally per-test so other shard files don't see the override.
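
The save/restore pattern in item 3 might look like the sketch below. The real helper in test/helpers/with-env.ts may have a different signature (it is likely async and bound to `process.env`); this synchronous version takes the environment object as a parameter so it can be exercised on a plain record.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical signature: apply overrides, run fn, always restore.
function withEnv<T>(env: Env, overrides: Env, fn: () => T): T {
  const saved: Env = {};
  for (const key of Object.keys(overrides)) {
    saved[key] = env[key]; // remember the prior value (undefined if unset)
    const value = overrides[key];
    if (value === undefined) delete env[key];
    else env[key] = value;
  }
  try {
    return fn();
  } finally {
    // Restore runs even when fn throws, so later tests never see the override.
    for (const key of Object.keys(saved)) {
      const prior = saved[key];
      if (prior === undefined) delete env[key];
      else env[key] = prior;
    }
  }
}
```

In a test this would be called with `process.env` as the first argument, for example to point `GBRAIN_HOME` at a temp directory for the duration of one case.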

Verification:
  bun run test       → 4977 pass / 0 fail
  bun run test:serial → 179 pass / 0 fail
  bun run verify     → clean (typecheck + 9 pre-checks pass)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(eval): real-Postgres E2E for eval_takes_quality_runs (EXP-5)

Pure-PGLite tests already cover the receipt-write contract; this E2E
verifies the same code path against actual Postgres so the postgres.js
JSONB encoding and the v47 migration apply cleanly under production
conditions.

Coverage (8 cases):
  - migration v47 created the table with all expected columns
  - writeReceiptToDb persists full receipt_json on Postgres
  - 4-sha UNIQUE constraint enforces ON CONFLICT DO NOTHING idempotency
    (3 inserts → 1 row)
  - rubric_version segregation: distinct rubric_sha8 → distinct row
    (codex review #3 — rubric epoch separation)
  - loadTrend reads in DESC order on Postgres
  - loadReceiptFromDb reconstructs receipt JSON via the JSONB column
  - writeReceipt (combined) succeeds with disk artifact + DB row
  - trend SELECT plan executes (planner picks index on larger tables)

Skips gracefully when DATABASE_URL is unset (existing hasDatabase()
helper). Uses the canonical setupDB/teardownDB from test/e2e/helpers.ts.
GBRAIN_HOME mutation is wrapped in withEnv() per the v0.32.0 test-isolation
lint contract.

Verification:
  bash scripts/run-e2e.sh → 71 files / 499 tests / 0 fail (full E2E suite)
  bun test test/e2e/eval-takes-quality.test.ts → 8 / 8 pass standalone

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: fill v0.32 unit + E2E gap audit (3 new files, 36 cases)

Audit of shipped v0.32 code surfaced 4 wiring gaps that the per-EXP unit
tests didn't cover. Adding direct integration tests for each so a future
refactor can't accidentally bypass the helper or unwire the producer seam.

test/extract-takes-holder-producer-seam.test.ts (7 cases) — codex review
#4 producer seam. Verifies extractTakesFromDb populates ExtractTakesResult.
failedFiles[] when parseTakesFence emits TAKES_HOLDER_INVALID warnings,
and that the entry shape is recordSyncFailures-compatible. Without this
test, a refactor that accidentally dropped the failedFiles append would
silently leave the v0_28_0 migration's recordSyncFailures call with nothing
to record.
Covers: valid holder (no entry), invalid uppercase, world/<slug>, mixed
valid+invalid, legacy bare-slug compat, malformed-table-only (no leak),
recordSyncFailures shape compatibility.

test/engine-weight-rounding-integration.test.ts (15 cases) — codex review
#8 integration coverage. Helper is unit-tested; this proves both engines'
addTakesBatch + updateTake paths actually call it. PGLite-side coverage
mirrors the test/e2e/takes-weight-rounding-postgres.test.ts E2E for real
Postgres. Covers: 0.74→0.75, 0.82→0.80, on-grid identity, NaN→0.5,
Infinity→0.5, clamp high/low, undefined default, mixed batch order,
updateTake rounds (was unhardened pre-v0.32), updateTake NaN, updateTake
preserves prior weight when undefined.

test/e2e/takes-weight-rounding-postgres.test.ts (6 cases, 14 expects) —
real-Postgres write-path coverage. Specifically tests the postgres.js
unnest() bind path that PGLite doesn't exercise:
  - addTakesBatch rounds via the unnest() bind shape
  - addTakesBatch handles NaN at the postgres.js array marshaling layer
  - 10-row mixed batch (4 off-grid) rounds each independently
  - updateTake rounds on real Postgres
  - updateTake handles NaN
  - migration v48 tolerance matches engine-write tolerance (round-trip
    proof — engine-rounded value is invisible to v48's WHERE clause)

Verification:
  bun run test       → 5166 pass / 0 fail (parallel unit, 128s)
  bun run test:serial → 190 pass / 0 fail
  bun run test:e2e   → 71 / 74 files; 3 pre-existing env-inheritance
                       failures (serve-http-oauth, sources-remote-mcp,
                       thin-client — confirmed identical on master in
                       this environment, documented in CLAUDE.md)
  bun run verify     → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(auth): connect engine in withConfiguredSql; unbreak 3 OAuth E2E suites

Real production bug, not just a test-environment issue.
withConfiguredSql in src/commands/auth.ts created a PostgresEngine via
createEngine() but never called engine.connect(). The PostgresEngine.sql
getter falls back to db.getConnection() (the module-level singleton) when
its instance _sql is unset — and db.connect() wasn't called either.

So every `gbrain auth` subcommand (create, list, revoke, register-client,
revoke-client) crashed with the misleading "No database connection:
connect() has not been called" error on real Postgres. Anyone with a
Postgres-backed brain hit this. The error pointed at `gbrain init`, which
made the regression invisible: users assumed they hadn't initialized.

Verified by running `gbrain auth register-client` directly:
  Before: "Error: No database connection: connect() has not been called."
  After:  "OAuth client registered: ..." with credentials printed.

This fix unblocked all 3 previously-failing E2E suites (which all use
register-client in beforeAll):
  serve-http-oauth.test.ts:    0/28 → 28/28 pass
  sources-remote-mcp.test.ts:  0/14 → 14/14 pass
  thin-client.test.ts:         0/7  →  6/7 pass + 1 documented skip

Two surgical test-side fixes also landed:

1. test/e2e/thin-client.test.ts:182 — assertion typo. Test expected
   r.stderr to contain "thin client" (space). Actual refusal message
   says "(thin-client of <url>)" with hyphen. Loosened to /thin[- ]client/
   so a future format tweak doesn't false-fail.

2. test/e2e/thin-client.test.ts:239 — skipped "remote ping triggers
   autopilot-cycle" with a clear TODO. Test asks the wrong question
   against the existing fixture: `gbrain serve --http` deliberately
   does NOT start a job worker (workers run via separate `gbrain jobs
   work` process), so the submitted autopilot-cycle job sits in
   `waiting` forever. Test was supposed to fall back to the self-imposed
   `--timeout`, but `gbrain remote ping --timeout` doesn't honor the cap
   when callRemoteTool hangs (loop only checks elapsed time between
   iterations; a single in-flight callTool with no AbortSignal blocks
   forever). Two real follow-ups would unblock: thread an AbortSignal
   through callRemoteTool's MCP callTool path, OR start a `gbrain jobs
   work` subprocess in beforeAll. Either is its own PR. Wire path
   coverage isn't lost — exercised by every other test in this file
   plus the entire serve-http-oauth.test.ts suite.

Verification:
  bun test test/e2e/serve-http-oauth.test.ts test/e2e/sources-remote-mcp.test.ts test/e2e/thin-client.test.ts
    → 47 pass / 1 skip / 0 fail in 8.4s
  bun run verify → clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Garry Tan <garrytan@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…date 4-segment versions (garrytan#815)

* v0.31.4.1 chore: align VERSION/package.json with garrytan#795 + mandate MAJOR.MINOR.PATCH.MICRO

PR garrytan#795 (takes v2) landed on master with `v0.31.4` in its commit subject but
never bumped VERSION, package.json, or CHANGELOG.md. Master shipped at 0.31.3.

This corrective release:
- Bumps VERSION + package.json to 0.31.4.1 (the dot-suffix follow-up channel
  documented in CLAUDE.md, so the patch number doesn't churn to 0.31.5)
- Adds the v0.31.4.1 CHANGELOG entry covering takes v2 (lessons from a 100K-take
  production extraction), the auth-on-Postgres regression fix, and the new
  `gbrain eval takes-quality` CLI surface
- Updates CLAUDE.md to mandate `MAJOR.MINOR.PATCH.MICRO` for every new release.
  Historical 3-segment versions in git log + migration filenames stay valid;
  do not rewrite. Going forward only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate llms-full.txt for v0.31.4.1 doc edits

The build-llms regen-drift guard caught that llms-full.txt was stale relative
to the CHANGELOG + CLAUDE.md edits in the prior commit. Per CLAUDE.md the
bundle is auto-derived: bump VERSION/CHANGELOG/CLAUDE.md, then run
`bun run build:llms`. Did the second part now.

llms.txt unchanged (it's just the curated index). Only llms-full.txt picks
up the v0.31.4.1 CHANGELOG entry and the new "Version format is mandatory"
section in CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): exclude *.serial.test.ts from test-shard.sh hash buckets

Root cause of test (2) failing on the v0.31.4.1 PR (and on master since
garrytan#795 landed): CI's scripts/test-shard.sh hashed every test file into 4
shards via FNV-1a, INCLUDING *.serial.test.ts files. Serial files share
file-wide state (top-level mock.module, module singletons) that's
supposed to be quarantined by the .serial.test.ts naming + local
run-serial-tests.sh running them at --max-concurrency=1.

In CI the quarantine didn't apply. eval-takes-quality-runner.serial.test.ts
(new in garrytan#795) hashes into shard 2, where it calls:

  mock.module('../src/core/ai/gateway.ts', () => ({
    chat: async (opts) => { ... },
    configureGateway: () => undefined,
  }));

That replaces every export of gateway.ts at module-load time for the
WHOLE shard process. voyage-multimodal.test.ts also lives in shard 2
(both files happen to hash there), and it imports `embedMultimodal` from
gateway.ts. After the serial file loads, `embedMultimodal` is undefined
inside the shard process, and all 18 of voyage-multimodal's
embedMultimodal tests fail. Tests still passed locally because
run-unit-shard.sh excludes .serial files from its parallel pass.

Fix:
  - scripts/test-shard.sh: add `-not -name '*.serial.test.ts'` to the
    find expression so serial files no longer compete for shard buckets.
    Add --dry-run-list flag to mirror run-unit-shard.sh's interface so
    the regression test can introspect without spawning bun test.
  - .github/workflows/test.yml: add a `bun run test:serial` step that
    runs on shard 1 (which already runs `bun run verify`). Uses the
    existing scripts/run-serial-tests.sh which invokes bun test at
    --max-concurrency=1, matching local behavior.
  - test/scripts/test-shard.slow.test.ts: 4 regression cases that pin
    the contract (no serial files in any shard, no e2e files in any
    shard, plain files partitioned without overlap). .slow.test.ts
    because it shells out 4× with pure-bash FNV-1a hashing (~14s
    wallclock); excluded from the local fast loop, runs in CI via the
    same hash bucketing as other slow tests.
  - CLAUDE.md: update the CI vs local divergence section so this
    intentional asymmetry is documented going forward.
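
For readers unfamiliar with the bucketing, here is a TypeScript sketch of the same idea as the fixed shell script: exclude serial files, then assign each remaining file to a shard by FNV-1a hash. The real implementation is pure bash; the function names here are hypothetical, and the e2e exclusion mentioned above is omitted for brevity.

```typescript
// Standard 32-bit FNV-1a over the low byte of each character (ASCII paths assumed).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wrapping multiply
  }
  return hash >>> 0; // force unsigned
}

function shardFiles(files: string[], shardCount: number): string[][] {
  const shards: string[][] = Array.from({ length: shardCount }, () => []);
  for (const file of files) {
    // Serial files never compete for a bucket; they run in a separate pass.
    if (file.endsWith(".serial.test.ts")) continue;
    shards[fnv1a32(file) % shardCount].push(file);
  }
  return shards;
}
```

Because the hash depends only on the path, a file lands in the same shard on every run, which is why two unrelated files can collide in one shard process as described above.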

Build-llms drift in test (1) was fixed in the prior commit (c99a4af).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regenerate llms-full.txt for the CI-fix CLAUDE.md edits

The prior commit updated the "CI vs local: intentionally divergent file sets"
section in CLAUDE.md, which drifted llms-full.txt. Per CLAUDE.md the bundle
is auto-derived: edit CLAUDE.md, then run `bun run build:llms`. Did the
second part now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tan#796)

* feat: extract facts during sync (real-time hot memory)

Wire facts extraction into the sync pipeline so pages imported via
git get facts extracted immediately, not only through MCP put_page.

Changes:
- Add notability field (high/medium/low) to facts extraction schema
- Upgrade default extraction model from Haiku to Sonnet (configurable
  via facts.extraction_model brain_config)
- Add notability-gated facts extraction to sync post-import hook:
  - Only HIGH notability facts inserted during sync (life events,
    major commitments, relationship/health changes)
  - MEDIUM facts deferred to dream cycle
  - LOW facts (logistical noise) dropped entirely
- Add notability column to facts table DDL
- Pass engine to extraction for config-aware model selection

Before: facts only extracted via MCP put_page (never during git sync)
After: meetings, conversations, personal pages get facts extracted
immediately on sync, with salience filtering

Closes the hot-memory gap where brain content committed via git was
invisible to the facts table until manually processed.

* fix: B1 — pass notability through facts JSON parser

Pre-fix, src/core/facts/extract.ts:tryArrayShape silently dropped the
LLM's notability field on the floor: the function copied fact/kind/
entity/confidence into the output but never read o.notability. The
outer loop in extractFactsFromTurn then read candidate.notability,
found undefined, and defaulted to 'medium'. sync.ts's HIGH-only filter
(`if (f.notability !== 'high') continue`) discarded 100% of facts.

Net: real-time facts on sync was a no-op despite Sonnet running and
costing money. Headline feature was dead on the happy path.
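
The interaction between the parser and the sync filter can be sketched as below. Field names follow the commit message; the `Candidate` type, the strict three-value check, and the `factsForSync` helper are assumptions made for illustration, not the code in src/core/facts/extract.ts.

```typescript
type Notability = "high" | "medium" | "low";

interface Candidate {
  fact: string;
  kind?: string;
  entity?: string;
  confidence?: number;
  notability?: Notability;
}

function tryArrayShape(raw: unknown): Candidate[] {
  if (!Array.isArray(raw)) return [];
  const out: Candidate[] = [];
  for (const item of raw) {
    if (typeof item !== "object" || item === null) continue;
    const { fact, kind, entity, confidence, notability } = item as Record<string, unknown>;
    if (typeof fact !== "string") continue;
    const candidate: Candidate = { fact };
    if (typeof kind === "string") candidate.kind = kind;
    if (typeof entity === "string") candidate.entity = entity;
    if (typeof confidence === "number") candidate.confidence = confidence;
    // The copy that was missing before the fix; anything else is dropped defensively.
    if (notability === "high" || notability === "medium" || notability === "low") {
      candidate.notability = notability;
    }
    out.push(candidate);
  }
  return out;
}

// Sync-time gate: a missing notability defaults to 'medium' and is therefore skipped.
function factsForSync(candidates: Candidate[]): Candidate[] {
  return candidates.filter((c) => (c.notability ?? "medium") === "high");
}
```

With the notability copy removed, every candidate reaches the filter as `undefined`, defaults to medium, and `factsForSync` returns an empty array regardless of what the model emitted.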

Fix is a one-line change in tryArrayShape. Two layers of test pin it:

  1. Parser-pin (test/facts-extract.test.ts +75 LOC, 5 cases):
     - notability passes through when LLM emits it
     - notability omitted defaults to undefined (legacy compat)
     - non-string notability is dropped defensively
     - every documented field survives the parse (future field-drop guard)
     - fenced JSON output (markdown code blocks) still threads correctly

  2. End-to-end smoke (test/facts-extract-smoke.test.ts NEW, 145 LOC,
     4 cases): drives extractFactsFromTurn with a stubbed gateway chat
     transport. Asserts HIGH input → notability:'high' all the way out.
     Guards against future prompt drift where Sonnet returns 'medium'
     for everything; smoke fails loudly so the eval-mining flow gets
     triggered.

Adds the chat test seam to enable the smoke test:
  src/core/ai/gateway.ts: __setChatTransportForTests(fn) mirrors
  v0.28.7's __setEmbedTransportForTests pattern. When set, chat()
  routes through the stub; isAvailable('chat') returns true so tests
  don't need full gateway configuration. resetGateway() clears it.
  Test files stay regular .test.ts (parallel-safe; no mock.module).

PR 1 commit 1 of 15. See ~/.claude/plans/swift-gliding-key.md for the
full eng review and bisect-friendly commit ordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: B2 — migration v46 ALTER facts.notability with idempotent CHECK

Pre-fix, the v0.31.1 PR shipped a CREATE TABLE edit to migration v45 that
added `notability NOT NULL DEFAULT 'medium' CHECK (notability IN (...))`
inline. Fresh installs got the column. But every brain that already ran
v45 BEFORE that edit (i.e., everyone running v0.31.0+ in production) keeps
the old facts table shape. INSERT now crashes with:

  column "notability" of relation "facts" does not exist

This is the canonical "embedded schema mutation breaks upgrades" trap that
CLAUDE.md cites: "bit users 10+ times across 6 schema versions over 2 years."

Fix: new migration v46 ALTER. Idempotent under all four states:

  1. Fresh install (v45 already added column inline)
     → ADD COLUMN IF NOT EXISTS no-ops; named CHECK probe finds existing
       constraint → skip. Postgres emits a NOTICE; no error.

  2. Old brain pre-edit (no column)
     → ADD COLUMN adds it with NOT NULL DEFAULT 'medium'; named CHECK
       probe finds nothing → adds the constraint.

  3. Partial state (column exists, CHECK missing)
     → ADD COLUMN no-ops; CHECK probe adds the named constraint.

  4. Re-run after success
     → all probes skip; no error, no state change.

Implementation notes:
  - CHECK constraint is named `facts_notability_check` (not autogen) so the
    information_schema-equivalent probe via `pg_constraint` can find it
    deterministically.
  - Column-level CHECK in v45 inline (autogen-named) and the named CHECK
    here are additive and non-conflicting — Postgres allows multiple CHECKs
    covering the same predicate. Codex flagged this concern; the named
    constraint addresses it cleanly.
  - Both engines run the same SQL. PGLite is real Postgres in WASM and
    supports DO $$ blocks. PGLite users with persistent older brains hit
    the same bug.
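
The four states above reduce to two independent probes: does the column exist, and does the named constraint exist. A minimal TypeScript sketch of that decision follows, with the SQL shape inlined as a string. `FactsTableState`, `stepsThatChangeState`, the `TEXT` column type, and the exact SQL text are illustrative assumptions, not the shipped migration.

```typescript
// Hypothetical model of the v46 idempotency contract; not the real migration code.
type FactsTableState = { hasNotabilityColumn: boolean; hasNamedCheck: boolean };
type Step = "add_column" | "add_named_check";

// Assumed SQL shape. Column type and tier list are guesses based on the
// 'high' | 'medium' | 'low' tiers named elsewhere in this log.
const V46_SQL_SKETCH = `
ALTER TABLE facts
  ADD COLUMN IF NOT EXISTS notability TEXT NOT NULL DEFAULT 'medium';
DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint
    WHERE conname = 'facts_notability_check'
      AND conrelid = 'facts'::regclass
  ) THEN
    ALTER TABLE facts ADD CONSTRAINT facts_notability_check
      CHECK (notability IN ('high', 'medium', 'low'));
  END IF;
END $$;
`;

// Returns only the steps that would actually change the table.
// Steps that no-op (IF NOT EXISTS, probe finds the constraint) are omitted.
function stepsThatChangeState(state: FactsTableState): Step[] {
  const steps: Step[] = [];
  if (!state.hasNotabilityColumn) steps.push("add_column");
  if (!state.hasNamedCheck) steps.push("add_named_check");
  return steps;
}
```

Under this model a re-run after success returns an empty list, which is the "no error, no state change" property the E2E cases pin.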

E2E coverage (test/e2e/migration-v46-notability.test.ts, 5 cases):
  - fresh-install fully-migrated: column + named CHECK both exist
  - old brain (column dropped): v46 adds both back
  - partial state (column exists, CHECK missing): v46 adds CHECK
  - idempotent re-run on fully-migrated: no error, state unchanged
  - CHECK constraint actually rejects out-of-domain values

Verified against real Postgres (pgvector/pgvector:pg16): 5/5 pass in 696ms.

PR 1 commit 2 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: B3 — restore v0_31_0 orchestrator gate to v < 45

Pre-fix, the v0_31_0 orchestrator's phaseASchema gate had been demoted
from `v < 45` to `v < 40` with an operator-facing message claiming
"v40 (facts hot memory + notability)". Facts is at v45, not v40 — the
message was wrong and the gate was permissive.

Symptom: brains at schema_version 40-44 (real states for users mid-
upgrade) passed the precondition, then immediately crashed on the
post-condition check three lines later (`SELECT FROM pg_tables WHERE
tablename = 'facts'`). Operator saw a green light, then a red light.

Fix: restore the gate to `v < 45` (the real semantic precondition:
the facts table is created by migration v45). Drop the misleading
"+ notability" claim — column shape is enforced by migration v46
alone (see MIGRATIONS[v46]), not gated here. Add a one-line comment
pointing at v46 so the next reader sees the separation.

Test coverage (test/migration-orchestrator-v0_31_0.test.ts NEW, 4 cases):
  - schema_version < 45 fails with operator-facing message naming v45
    + recovery command. Negative assertions guard against regression
    to the "v >= 40" / "+ notability" prior text.
  - schema_version >= 45 with facts table present → status complete.
  - dryRun short-circuits before any DB read.
  - null engine short-circuits with no_brain_configured.

Verified: 4/4 pass; v45 + v46 both apply cleanly during test setup.

PR 1 commit 3 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: widen FactRow to expose notability across all readers

Codex's outside-voice pass on the cathedral plan flagged P1 #4: the read-
side contract was behind the write-side schema. notability lived in DDL
and the insertFact INSERT, but FactRow type omitted it and both row
mappers (pglite-engine + postgres-engine) silently dropped the column.
Every consumer above the engine (recall op, MCP _meta hook, CLI JSON
output) returned facts without their salience tier. PR2/PR3 surfaces
that need to filter or display notability would have required contract
surgery first; this lands the contract widening as the foundation.

Changes:
  - src/core/engine.ts: add `notability: 'high' | 'medium' | 'low'` to
    FactRow with doc comment naming the row source (column added by
    migration v46) and the consumers (recall, daily-page, admin, MCP).
  - src/core/postgres-engine.ts: FactRowSqlShape gains notability;
    rowToFactPg propagates it with `?? 'medium'` belt-and-suspenders
    fallback (NOT NULL DEFAULT in DDL is the primary; this is the
    second line for any pre-v46 row that survives a SELECT).
  - src/core/pglite-engine.ts: same pair (interface + mapper).
  - src/core/operations.ts: recall op response shape adds notability.
  - src/core/facts/meta-hook.ts: `_meta.brain_hot_memory` payload
    surfaces notability so connected agents can filter or weight
    HIGH-tier facts in their context budget.
  - src/commands/recall.ts: `--json` output adds notability.
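
The row-mapper fallback described above can be sketched as follows. This is a simplified illustration: the `id` and `fact` fields and the function name `rowToFactSketch` are assumptions standing in for the real FactRow shape, which has more columns.

```typescript
type Notability = "high" | "medium" | "low";

// Hypothetical SQL row shape: the column may be absent on a pre-v46 row.
interface FactRowSqlShapeSketch {
  id: number;
  fact: string;
  notability?: Notability | null;
}

// Hypothetical widened read-side type: notability is always present.
interface FactRowSketch {
  id: number;
  fact: string;
  notability: Notability;
}

// The DDL default is the primary guarantee; `?? "medium"` is the second line
// of defense for a row that reaches the mapper without the column.
function rowToFactSketch(row: FactRowSqlShapeSketch): FactRowSketch {
  return { id: row.id, fact: row.fact, notability: row.notability ?? "medium" };
}
```

Making `notability` required on the read-side type is what forces every fixture and consumer to acknowledge the column at compile time.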

Test contract pin (test/facts-engine.test.ts):
  - Existing 'inserts a fact' case asserts default 'medium' on the
    read side (caller-omits-notability path).
  - New 'notability round-trips for each tier' case inserts HIGH /
    MEDIUM / LOW explicitly and reads back the same tier — without
    this assertion, codex P1 #4 reappears silently.

Test fixtures (facts-classify.test.ts + facts-decay.test.ts) also
updated: makeFact() factories now construct complete FactRow objects
with notability:'medium' to match the tightened type.

PR 1 commit 4 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: move isFactsBackstopEligible to src/core/facts/eligibility.ts

Single source of truth for "should this page write fire the facts
extraction backstop?" Before this move, the predicate lived inline at
operations.ts:633 where only put_page could see it; sync.ts had its own
divergent type filter (`['meeting', 'conversation', 'transcript',
'personal', 'therapy', 'call']`; only `meeting` was a real PageType, the
rest never matched). Sync's filter is deleted in commit 7; everyone
routes through this predicate.

Adds the slug-prefix rescue branch the eng review pinned (D-eligibility):
parsed.type ∈ ELIGIBLE_TYPES OR slug.startsWith('meetings/' | 'personal/'
| 'daily/'). The rescue catches `meetings/2026-05-09-foo.md` pages that
frontmatter-typed themselves as 'note' (the legacy default) — directory
location wins.

Test pin (test/facts-eligibility.test.ts NEW, 28 cases):
  - 4 BRANCH cases: typed-only, slug-only (each prefix), both, neither
  - 7 GUARD cases: null/undefined parsed, wiki/agents/, dream_generated,
    body length thresholds (< 80, exactly 80, whitespace-only)
  - 14 COVERAGE cases: every eligible PageType on arbitrary slug → ok;
    every non-eligible PageType on non-rescued slug → kind:<type> reason

Pure-function tests; no DB. The full predicate is covered without
spinning up a brain.
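
A minimal sketch of the predicate shape described in this commit: guards first, then type match OR slug-prefix rescue. The contents of `ELIGIBLE_TYPES` beyond `meeting`, the `no_parsed_page` reason, the field names, and the use of `trim()` for the length guard are assumptions for illustration; the other reason strings are the ones named in this log.

```typescript
type EligibilityResult = { ok: true } | { ok: false; reason: string };

// Hypothetical page shape; the real parsed page type has more fields.
interface ParsedPageSketch {
  type: string;
  slug: string;
  body: string;
  dreamGenerated?: boolean;
}

// Assumption: the real set is larger; only 'meeting' is confirmed by the text.
const ELIGIBLE_TYPES = new Set<string>(["meeting"]);
const RESCUE_PREFIXES = ["meetings/", "personal/", "daily/"];
const MIN_BODY_CHARS = 80;

function isEligibleSketch(parsed: ParsedPageSketch | null | undefined): EligibilityResult {
  // Guards run before the type/slug branches.
  if (!parsed) return { ok: false, reason: "no_parsed_page" }; // hypothetical reason
  if (parsed.slug.startsWith("wiki/agents/")) return { ok: false, reason: "subagent_namespace" };
  if (parsed.dreamGenerated) return { ok: false, reason: "dream_generated" };
  if (parsed.body.trim().length < MIN_BODY_CHARS) return { ok: false, reason: "too_short" };

  // Directory location wins over a legacy frontmatter type such as 'note'.
  const typed = ELIGIBLE_TYPES.has(parsed.type);
  const rescued = RESCUE_PREFIXES.some((prefix) => parsed.slug.startsWith(prefix));
  return typed || rescued ? { ok: true } : { ok: false, reason: `kind:${parsed.type}` };
}
```

For example, a page at `meetings/2026-05-09-foo.md` typed as `note` passes through the rescue branch, while a `guide` page outside the rescued directories is rejected with `kind:guide`.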

Existing test/facts-backstop-gating.test.ts still passes (it tests the
predicate via put_page; the move is transparent to that surface).

PR 1 commit 5 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: add runFactsBackstop helper with full extract→resolve→dedup→insert pipeline

Single shared facts pipeline used by every brain write surface that
wants real-time hot memory extraction. Replaces five divergent
implementations:
  - put_page MCP backstop hook (operations.ts:556)
  - extract_facts MCP op (operations.ts:2438-2486)
  - sync.ts post-import block (deleted in commit 7)
  - file_upload + code_import (wired in commit 10)

Encapsulates the v0.31 smart pipeline:
  extract → resolve → dedup (cosine @ 0.95) → insert
(matches extract_facts op precedent at operations.ts:2460.)
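
The dedup step compares a candidate fact's embedding against existing ones at a cosine threshold of 0.95. A minimal sketch of that comparison follows; the function names and the choice of `>=` at the boundary are assumptions, and the real pipeline may well do this comparison in SQL rather than in process.

```typescript
const DEDUP_THRESHOLD = 0.95;

// Plain cosine similarity over equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: a candidate is a duplicate if any existing embedding
// is at or above the threshold.
function isDuplicateSketch(candidate: number[], existing: number[][]): boolean {
  return existing.some((e) => cosineSimilarity(candidate, e) >= DEDUP_THRESHOLD);
}
```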

Two execution modes (D8):
  - 'queue' (default): fire-and-forget via getFactsQueue().enqueue.
    Caller awaits ~zero (just enqueue + microtask). Sync stays fast
    on a 50-page batch.
  - 'inline': await full pipeline; return real {inserted, duplicate,
    superseded, fact_ids} counts. Used by extract_facts MCP op.

Discriminated return shape so TypeScript catches mode/result mismatches
at the call site:
  | { mode: 'queue'; enqueued; queueDepth; skipped? }
  | { mode: 'inline'; inserted; duplicate; superseded; fact_ids; skipped? }
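
A sketch of how the discriminated return shape lets the compiler catch mode/result mismatches. The field types (`boolean`, `number`, `number[]`, `string`) are assumptions, since the commit message lists only the field names.

```typescript
// Field names follow the commit message; field types are assumed.
type BackstopResultSketch =
  | { mode: "queue"; enqueued: boolean; queueDepth: number; skipped?: string }
  | {
      mode: "inline";
      inserted: number;
      duplicate: number;
      superseded: number;
      fact_ids: number[];
      skipped?: string;
    };

// Switching on `mode` narrows the union, so reading `inserted` in the
// 'queue' branch would be a compile-time error.
function describeResult(result: BackstopResultSketch): string {
  if (result.skipped !== undefined) return `skipped: ${result.skipped}`;
  switch (result.mode) {
    case "queue":
      return `queued (depth ${result.queueDepth})`;
    case "inline":
      return `inserted ${result.inserted}, duplicate ${result.duplicate}, superseded ${result.superseded}`;
  }
}
```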

Notability filter (D4): per-caller policy via FactsBackstopCtx.notabilityFilter.
Sync passes 'high-only' (HIGH lands now, MEDIUM waits for dream cycle,
LOW dropped at LLM layer). Other surfaces default to 'all'. Filter runs
post-LLM, pre-insert: saves the insert work but not the LLM call (the
notability tier IS what we're calling Sonnet to determine).
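
The post-LLM, pre-insert filter is small enough to sketch directly. The function name and generic signature are illustrative assumptions; only the `'all'` and `'high-only'` policy names come from the text.

```typescript
type Notability = "high" | "medium" | "low";
type NotabilityFilter = "all" | "high-only";

// Runs after extraction has assigned tiers, before any insert work.
function applyNotabilityFilterSketch<T extends { notability: Notability }>(
  facts: T[],
  filter: NotabilityFilter,
): T[] {
  return filter === "high-only" ? facts.filter((f) => f.notability === "high") : facts;
}
```

Because the tier is an output of the LLM call, this filter can only save insert cost, which matches the trade-off the commit describes.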

Eligibility + kill-switch gates run before any LLM cost. Skipped reasons
are stable strings the future facts:absorb writer (commit 13) and doctor
check (commit 12) consume.

Re-throws AbortError; absorbs gateway/parse/queue errors as a
`skipped: '...'` envelope. Operator visibility lands via PR1 commit 13's
ingest_log writer (facts:absorb source_type).

Test pin (test/facts-backstop.test.ts NEW, 12 cases):
  - 3 eligibility/kill-switch cases (extraction_disabled, subagent_namespace,
    dream_generated)
  - 5 inline-mode cases (insert + counts, notability filter, source string,
    empty extraction, abort)
  - 3 queue-mode cases (default mode, explicit mode, kill-switch envelope)
  - 1 dedup contract case (insertions without embeddings short-circuit
    cleanly; embedding-driven dedup is exercised by E2E with real gateway)

PGLite in-memory; LLM stubbed via __setChatTransportForTests (commit 1's
seam). 12/12 pass in 912ms.

PR 1 commit 6 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: sync.ts uses runFactsBackstop (deletes dead-code type filter)

Pre-fix sync.ts had a 60-line inline facts extraction block carrying:
  1. Dead-code eligibility filter: ['meeting', 'conversation',
     'transcript', 'personal', 'therapy', 'call'] — only `meeting` is
     a real PageType. The other five never matched anything; eligibility
     rested on the slug-prefix branch alone.
  2. Divergent shape from put_page's backstop: no dedup, no supersede,
     raw extract→insert. Garbage rows on re-sync.
  3. Sequential per-page LLM calls in sync's request path: a 50-page
     sync = 50 Sonnet calls in series ≈ 5+ minutes blocking.

Replaced with `runFactsBackstop(parsedPage, ctx)` from PR1 commit 6:
  - Queue mode (fire-and-forget) so sync stays fast on multi-page batches.
  - 'high-only' notabilityFilter (cathedral spec: HIGH lands now,
    MEDIUM waits for dream cycle, LOW dropped at LLM).
  - isFactsBackstopEligible (commit 5) — eligibility lives in one place.
  - extract → resolve → dedup (cosine @ 0.95) → insert pipeline shared
    with put_page + extract_facts.

Per-page try/catch survives so one failed page doesn't blow up the
whole sync (best-effort posture preserved).

Existing test/sync.test.ts (39 cases) passes unchanged — sync's outer
contract is untouched, only the inner facts-extract block changed.

PR 1 commit 7 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: operations.ts put_page uses runFactsBackstop

Replace the inline get-queue-extract-resolve-insert closure (operations.ts:540-583)
with a single `runFactsBackstop(parsed, ctx)` call in queue mode. put_page
and sync now share the same eligibility/extract/dedup/insert pipeline.

Behavioral preservation:
  - Response shape `{queued: true} | {skipped: '<reason>'}` unchanged for
    MCP clients. The helper's namespaced 'eligibility_failed:<reason>'
    discriminator is mapped back to the bare reason ('kind:guide',
    'too_short', 'subagent_namespace', 'dream_generated') before write
    to factsQueued. test/facts-backstop-gating.test.ts (5 cases) passes
    without modification.
  - Default 'all' notabilityFilter (MEDIUM facts continue to land via
    put_page; only sync filters to HIGH-only). This matches the
    pre-v0.31.2 surface: put_page's prior shape inserted everything the
    LLM returned, with the dream cycle's consolidate phase doing the
    salience clustering overnight.
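
The mapping shim that preserves the MCP response shape can be sketched as a prefix strip. The type and function names are hypothetical; the `eligibility_failed:` prefix and the bare reasons are the ones named above.

```typescript
const ELIGIBILITY_PREFIX = "eligibility_failed:";

// Hypothetical shape of what put_page reports back to MCP clients.
type FactsQueuedSketch = { queued: true } | { skipped: string };

// Maps the helper's namespaced discriminator back to the bare reason so
// existing clients keep seeing 'kind:guide', 'too_short', and so on.
function toFactsQueuedSketch(helperSkipped: string | undefined): FactsQueuedSketch {
  if (helperSkipped === undefined) return { queued: true };
  const bare = helperSkipped.startsWith(ELIGIBILITY_PREFIX)
    ? helperSkipped.slice(ELIGIBILITY_PREFIX.length)
    : helperSkipped;
  return { skipped: bare };
}
```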

Net: -32 LOC of inline pipeline; one shared call site + one mapping
shim; same observable shape.

PR 1 commit 8 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: operations.ts extract_facts uses runFactsPipeline

Replace the 65-line inline extract→resolve→dedup→insert loop in the
extract_facts MCP op (operations.ts:2369-2454) with a single
`runFactsPipeline(turn_text, ctx)` call. The inline pipeline + the
helper are now the same code path; test/facts-mcp-allowlist + test/
facts-anti-loop pass unchanged.

Architecture: the helper has two entry points now —
  - `runFactsBackstop(parsedPage, ctx)` — page-write hook with
    eligibility + kill-switch + queue mode dispatch (PR1 commit 6).
    Used by put_page, sync, file_upload, code_import.
  - `runFactsPipeline(turnText, ctx)` — raw turn-text entry that
    skips the page-shape eligibility predicate. Used by extract_facts
    MCP op (this commit).

Both share an inner `runPipelineWithBody` so the actual extract → resolve
→ dedup (cosine @ 0.95) → insert pipeline lives in one place. Codex P0 #2
called this out: "extract_facts already does the smart pipeline; put_page
+ sync do raw extract→insert. Centralizing only extraction codifies the
worse pipeline." With commit 9, every fact-insert path goes through the
smart pipeline; raw insertFact loops in the brain are gone.

Behavioral preservation:
  - extraction_disabled kill-switch envelope unchanged.
  - is_dream_generated → returns {skipped: 'dream_generated'} envelope
    (the predicate-bypass path; eligibility doesn't apply on raw
    turn_text but dream_generated still does). Pre-fix the extractor
    itself short-circuited; new shape surfaces the skip explicitly to
    MCP clients.
  - Visibility ('private' | 'world') threading preserved.
  - Response shape {inserted, duplicate, superseded, fact_ids} identical
    to pre-fix.

PR 1 commit 9 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: document why file_upload + code_import don't wire runFactsBackstop

PR1 commit 10 was scoped in the eng review plan to "wire runFactsBackstop
to file_upload and code_import paths." Implementation analysis revealed
all three candidate surfaces are correctly handled WITHOUT explicit
wiring:

  1. file_upload (operations.ts:1713) doesn't write a page. It uploads
     a file to storage + inserts a `files` row. The associated page is
     written separately via put_page, which already fires runFactsBackstop
     in queue mode (commit 8). No double-firing needed.

  2. importCodeFile (this file) writes pages with type='code'. The
     isFactsBackstopEligible predicate rejects 'code' kind with reason
     `kind:code`. Wiring runFactsBackstop here would always return the
     skipped envelope. When README / doc-comment extraction lands in a
     future release, the eligibility predicate is the single place to
     update — adding 'code' to ELIGIBLE_TYPES makes existing call sites
     auto-cover the change.

  3. `gbrain import` (commands/import.ts) is bulk markdown import. Firing
     facts extraction on every imported page would cost-spike on first-
     time bulk imports of large brain repos (10K+ pages × Sonnet =
     hundreds of dollars). User runs `gbrain dream` or the consolidate
     phase to backfill facts from bulk-imported pages.

Adds a docstring above importCodeFile capturing all three rationales so
the next maintainer doesn't re-do this analysis.

PR 1 commit 10 of 15 — no behavior change; documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: migration v47 — ingest_log.source_id ALTER (codex P1 #3)

Pre-fix the ingest_log table had no source_id column; sync.ts wrote rows
without source-scoping and doctor only checked 'default'. Codex's outside
voice flagged this on the cathedral plan: "facts:absorb logging inherits
a surface that cannot tell you which source is failing."

This commit closes the multi-source observability gap on the foundation:
  - PR1 commit 13's facts:absorb writer (next) writes ingest_log rows
    with source_id so multi-source brains scope failures per source.
  - PR1 commit 12's doctor facts_extraction_health check (after that)
    iterates over `SELECT DISTINCT id FROM sources` instead of hardcoded
    'default'.

Migration v47 (idempotent, both engines):
  ALTER TABLE ingest_log ADD COLUMN IF NOT EXISTS source_id TEXT
    NOT NULL DEFAULT 'default';
  CREATE INDEX IF NOT EXISTS idx_ingest_log_source_type_created
    ON ingest_log (source_id, source_type, created_at DESC);

Schema-bootstrap coverage:
  - schema.sql / pglite-schema.ts inline definitions add source_id +
    the new index for fresh installs.
  - applyForwardReferenceBootstrap (both PGLite + Postgres) probes for
    `ingest_log.source_id` and adds the column BEFORE SCHEMA_SQL replay
    builds the new composite index. Without this, old brains running
    initSchema() on the new schema-embedded.ts would crash on the index
    creation (the column doesn't exist yet at replay time).
  - test/schema-bootstrap-coverage.test.ts pins ingest_log.source_id as
    REQUIRED_BOOTSTRAP_COVERAGE — adding a forward reference without
    extending applyForwardReferenceBootstrap would fail this guard.

E2E (test/e2e/migration-v47-ingest-log-source-id.test.ts NEW, 3 cases):
  - fresh-install: column + index both exist after runMigrationsUpTo(LATEST).
  - old-brain simulation: drop column, run v47, column reappears with
    NOT NULL DEFAULT 'default'; INSERT without source_id picks up the
    default.
  - idempotent re-run: v47 twice in a row is a no-op.

Verified against real Postgres (pgvector/pgvector:pg16): 3/3 pass; the v46
+ v47 E2Es land green together (8/8 in 2.05s). Bootstrap-coverage unit
test (5 cases) also green.

PR 1 commit 11 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: facts:absorb writer + reason codes (D5 contract)

D5 from /plan-ceo-review: every absorbed failure in the facts extraction
pipeline writes one row to ingest_log so doctor + admin dashboard
surface failures cross-process. CLAUDE.md's "zero silent failures" rule
gets enforced on the foundation.

Wires four layers:

  1. Type widening (src/core/types.ts):
     - IngestLogEntry gains source_id (codex P1 #3 — migration v47).
     - IngestLogInput gains optional source_id; engines default to 'default'.

  2. Engine row writers (pglite-engine.ts + postgres-engine.ts):
     - logIngest threads source_id into INSERT.
     - getIngestLog applies belt-and-suspenders 'default' fallback for
       any pre-v47 row that somehow survived.

  3. Helper (src/core/facts/absorb-log.ts NEW):
     - writeFactsAbsorbLog(engine, ref, reason, detail, sourceId) writes
       one ingest_log row with source_type='facts:absorb' and
       summary='<reason>: <detail truncated to 240 chars>'.
     - classifyFactsAbsorbError(err) heuristic-pattern-matches arbitrary
       Errors into 6 stable reason codes:
         gateway_error  | parse_failure  | queue_overflow
         queue_shutdown | embed_failure  | pipeline_error
     - Best-effort: any logging failure is caught + stderr-warned;
       the caller's pipeline keeps running.

  4. runFactsBackstop wiring (src/core/facts/backstop.ts):
     - queue mode: errors inside the queue worker classify + log via
       absorb-log.ts. These were previously invisible (counter increment
       only).
     - queue overflow drop also writes an absorb log row so doctor sees
       the depth of capacity pressure.
     - inline mode: errors bubble; caller decides logging (extract_facts
       MCP op surfaces them as op-error responses).
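
A sketch of the classifier and summary format described in layer 3. The six reason codes and the 240-character truncation come from the text; the specific substring heuristics and function names are invented for illustration and are not the real matching rules.

```typescript
type AbsorbReason =
  | "gateway_error"
  | "parse_failure"
  | "queue_overflow"
  | "queue_shutdown"
  | "embed_failure"
  | "pipeline_error";

// Hypothetical heuristics: the real classifier's patterns are not shown
// in this log. 'pipeline_error' is the catch-all fallback.
function classifyAbsorbErrorSketch(err: unknown): AbsorbReason {
  const msg = (err instanceof Error ? err.message : String(err)).toLowerCase();
  if (msg.includes("queue") && msg.includes("overflow")) return "queue_overflow";
  if (msg.includes("queue") && msg.includes("shutdown")) return "queue_shutdown";
  if (msg.includes("embed")) return "embed_failure";
  if (msg.includes("json") || msg.includes("parse")) return "parse_failure";
  if (msg.includes("gateway") || msg.includes("timeout")) return "gateway_error";
  return "pipeline_error";
}

// summary = '<reason>: <detail truncated to 240 chars>'
function buildAbsorbSummarySketch(reason: AbsorbReason, detail: string): string {
  return `${reason}: ${detail.slice(0, 240)}`;
}
```

Keeping the reason set closed (a union type) is what lets a downstream reader group by reason without guessing at free-form strings.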

Test pin (test/facts-absorb-log.test.ts NEW, 12 cases):
  - 7 classifier cases pinning every reason path + fallback
  - 5 writer cases pinning ingest_log row shape, custom sourceId,
    240-char detail truncation, no-throw contract, reason-set
    completeness

PR1 commit 12 (next) reads these rows for the facts_extraction_health
doctor check.

PR 1 commit 13 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: doctor facts_extraction_health check (multi-source)

Mirrors the eval_capture check shape but reads facts:absorb rows
(written by writeFactsAbsorbLog from PR1 commit 13). Iterates over
EVERY source (codex P1 #3 motivation) so multi-source brains see
per-source failure rates instead of only 'default'.

Configurable threshold: facts.absorb_warn_threshold (default 10 over
the last 24h, per source, per reason). When the threshold is exceeded
for any (source, reason) pair, status flips to warn and the message
names the breakdown:

  facts:absorb activity in last 24h (under threshold 10):
    default: 4 gateway_error, 1 parse_failure |
    team-source: 2 queue_overflow

Single SQL grouping query covers the read; the composite index v47
added (idx_ingest_log_source_type_created on source_id, source_type,
created_at DESC) covers the filter + sort path so the check is fast
on brains with millions of ingest_log rows.

Operator UX:
  - 'ok' under threshold (or zero failures) → quiet.
  - 'warn' over threshold → message names every (source, reason, count)
    tuple. Recovery hint: `gbrain recall --since 24h --json` to inspect
    what landed; `gbrain config set facts.absorb_warn_threshold N` to
    tune.
  - Pre-v47 brain (column missing): 'ok' with skipped reason pointing
    at `gbrain apply-migrations --yes`.
  - RLS denies SELECT: 'warn' calling out that capture INSERTs are
    likely also blocked.
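
The threshold logic can be sketched over the rows a grouped query would return. The row shape, function name, and the strict `>` comparison for "exceeded" are assumptions; the breakdown format mirrors the sample message above.

```typescript
// Hypothetical shape of one (source, reason) group from the last 24h.
interface AbsorbCountRow {
  source_id: string;
  reason: string;
  count: number;
}

interface AbsorbHealthSketch {
  status: "ok" | "warn";
  breakdown: string;
}

// Flips to 'warn' when any (source, reason) pair exceeds the threshold,
// and always names every tuple in the breakdown.
function evaluateAbsorbHealthSketch(rows: AbsorbCountRow[], threshold = 10): AbsorbHealthSketch {
  const bySource = new Map<string, string[]>();
  let exceeded = false;
  for (const row of rows) {
    if (row.count > threshold) exceeded = true;
    const parts = bySource.get(row.source_id) ?? [];
    parts.push(`${row.count} ${row.reason}`);
    bySource.set(row.source_id, parts);
  }
  const groups: string[] = [];
  bySource.forEach((parts, source) => groups.push(`${source}: ${parts.join(", ")}`));
  return { status: exceeded ? "warn" : "ok", breakdown: groups.join(" | ") };
}
```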

Test pin (test/doctor.test.ts +28 LOC, 1 case):
  Source-string assertions on the doctor.ts block:
    - 'GROUP BY source_id' (multi-source contract)
    - "source_type = 'facts:absorb'" (right table query)
    - 'facts.absorb_warn_threshold' (configurable threshold)
    - INTERVAL '24 hours' (right window)
    - 'Skipped (ingest_log.source_id unavailable' (pre-v47 fallback)
    - 'RLS denies SELECT on ingest_log' (RLS hint)
  Negative: must NOT contain `source_id = 'default'` (the bug we're
  fixing — codex P1 #3 was that doctor only checked 'default').

Live smoke against real Postgres: doctor renders the new check between
'eval_capture' and 'effective_date_health' as expected, shows 'ok' on
an empty test brain.

PR 1 commit 12 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: notability-eval mining + public-anonymized fixture (40 cases)

The notability gate is the load-bearing differentiator of the cathedral:
"only HIGH lands on sync, MEDIUM waits for the dream cycle, LOW dropped
at the LLM layer." Without an eval, the gate's quality is asserted via
hope; prompt drift (Sonnet returning 'medium' for everything) silently
turns the headline feature into a no-op.

This commit adds the mining half — eval suite is pinned in the next
commit (15).

NEW src/commands/notability-eval.ts:
  - mineNotabilityCandidates(repoPath, opts): walks meetings/, personal/,
    daily/ in the brain repo, splits markdown bodies into paragraphs
    (filtered by 80–800 char length), pre-classifies each paragraph
    with cheap-Haiku to bucket into HIGH/MEDIUM/LOW (round-robin
    fallback when no chat gateway is available — local development
    without API keys still produces a candidates file).
  - Stratified random sample within each bucket: HIGH/MEDIUM/LOW
    targets default 20/20/10 (per cathedral plan D7=B). Stratified
    further across the three corpus dirs so HIGH cases come from
    multiple dirs not just one.
  - JSONL utilities (loadJsonlCases, writeJsonlCases) shared with the
    review path. Default paths: ~/.gbrain/eval/notability-mining-
    candidates.jsonl (mining) + ~/.gbrain/eval/notability-real.jsonl
    (private confirmed).
  - TTY review subcommand: walks candidates one-by-one, asks for
    HIGH/MEDIUM/LOW confirmation, writes confirmed cases. Smoke-only
    test (TTY interactivity is hard to test deterministically).
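
Two of the mining pieces are pure functions and easy to sketch: paragraph splitting with the 80–800 character filter, and the round-robin tier fallback used when no chat gateway is available. The inclusive bounds, the blank-line regex, and the HIGH/MEDIUM/LOW rotation order are assumptions; the real implementations may differ.

```typescript
type Tier = "high" | "medium" | "low";
const TIERS: readonly Tier[] = ["high", "medium", "low"];

// Splits on blank lines and keeps paragraphs within the length window.
// Bounds are treated as inclusive here, which is an assumption.
function splitParagraphsSketch(body: string, min = 80, max = 800): string[] {
  return body
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length >= min && p.length <= max);
}

// Deterministic fallback bucket when no LLM pre-classification is possible.
function roundRobinTierSketch(index: number): Tier {
  return TIERS[index % TIERS.length] as Tier;
}
```

A deterministic fallback like this is what lets the mining path run in tests and local development without API keys.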

CLI dispatch (src/cli.ts):
  - `gbrain notability-eval mine` (default targets 20/20/10).
  - `gbrain notability-eval review` (TTY hand-confirm).
  - `gbrain notability-eval help` (flag reference).
  - sync.repo_path resolution mirrors the dream phase pattern; --repo
    PATH overrides.

NEW test/fixtures/notability-eval-public.jsonl (40 cases):
  - 14 HIGH (life events, major commitments, relationship/health changes,
    financial decisions).
  - 13 MEDIUM (durable preferences, beliefs, strong opinions revealing
    character).
  - 13 LOW (logistical noise — restaurant orders, scheduling, errands).
  - Anonymized per CLAUDE.md privacy rule (alice-example, acme-co,
    widget-co, fund-a placeholder names; no real contacts).
  - Each case has a `tier_rationale` string documenting the choice for
    reviewer transparency.
  - Used by CI's eval harness in commit 15 (no API key required for
    deterministic stub-driven contract tests).

PR 1 commit 14 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: notability-eval harness with precision@HIGH metric (40-case fixture)

Pins the load-bearing gate-quality contract in CI. Without this, prompt
drift (Sonnet returning 'medium' for everything → sync inserts nothing)
ships silently. The harness flips it from "asserted by hope" to "asserted
by metric."

NEW test/notability-eval.test.ts (13 cases across 6 describe blocks):

  1. splitParagraphs (2 cases): blank-line splitting, length filters.
  2. walkMarkdownFiles (1 case): tree walk drops non-.md files.
  3. mineNotabilityCandidates round-robin path (2 cases): empty corpus
     + populated corpus produce expected candidate shape; round-robin
     keeps tests deterministic without an LLM.
  4. JSONL utilities (3 cases): write+read round-trip, malformed-line
     skip, default paths under ~/.gbrain/eval/.
  5. Public-anonymized fixture shape (2 cases): 40 cases, ≥10 per tier,
     every paragraph ≥80 chars, every case has a tier_rationale.
  6. Eval harness contract (3 cases) — the headline assertions:
     - Perfect predictor (LLM-stub returns confirmed_tier verbatim) →
       precision@HIGH = 1.0, recall@HIGH = 1.0.
     - Always-medium model → precision@HIGH = 0 (no HIGH predictions
       at all). Pins the "harness handles the no-positive-prediction
       case correctly" contract.
     - Always-high model → precision drops below the 0.50 PR-fail
       threshold (TP / (TP + FP) = 14 / 40 = 0.35). Pins the
       "harness CORRECTLY flags a misaligned model" contract.

Sample size justification: the public fixture has 14 HIGH cases. That is
too few for a tight interval (at precision@HIGH = 0.75, a 95% CI under
the normal approximation is roughly ±23pp at n=14; ±10pp needs about 73
cases), but it is enough of a floor for "is the gate dramatically
wrong". Tighter measurements need the private fixture (50 cases via
mine + review).
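
As a rough check on how much a 14-case sample can say, the width of a 95% interval can be estimated with the normal approximation. This is a back-of-envelope sketch, not part of the harness: the function names are hypothetical, and the approximation is coarse at small n, so the numbers are indicative only.

```typescript
const Z_95 = 1.96;

// Half-width of a 95% CI for a proportion p observed over n trials,
// using the normal (Wald) approximation.
function halfWidth95(p: number, n: number): number {
  return Z_95 * Math.sqrt((p * (1 - p)) / n);
}

// Smallest n for which the approximate half-width is at most `target`.
function samplesForHalfWidth(p: number, target: number): number {
  return Math.ceil((Z_95 * Z_95 * p * (1 - p)) / (target * target));
}
```

With `p = 0.75`, `halfWidth95(0.75, 14)` comes out near 0.23 and `samplesForHalfWidth(0.75, 0.10)` comes out in the low seventies, which suggests the public fixture is suited to catching gross misalignment rather than measuring precision tightly.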

The harness is a CONTRACT test for the metric shape, not a quality
measurement of any specific model. A real quality run uses the same
harness against a real Sonnet (no chat-transport stub) — that flow is
exposed via GBRAIN_NOTABILITY_EVAL_REAL=1 + the private mined fixture.

All 92 tests across all PR1 facts files pass green (extract / extract-
smoke / engine / backstop / eligibility / absorb-log / notability-eval).

Soft gate per the cathedral plan: warn if precision@HIGH < 0.75; fail
PR if < 0.50. CI wiring + the production gate are deferred to PR2 (the
visibility/observability surface PR); this PR1 commit lands the harness
+ fixture + contract tests so the gate is ready to wire.
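
The metric and the soft gate can be sketched as below. The case shape and function names are assumptions; the 0.75 warn and 0.50 fail thresholds, the 14/13/13 tier split, and the convention that precision is 0 when there are no HIGH predictions follow the text.

```typescript
type Tier = "high" | "medium" | "low";

// Hypothetical eval case: the hand-confirmed tier and the model's prediction.
interface EvalCaseSketch {
  confirmed: Tier;
  predicted: Tier;
}

function precisionRecallAtHigh(cases: EvalCaseSketch[]): { precision: number; recall: number } {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const c of cases) {
    const predictedHigh = c.predicted === "high";
    const actuallyHigh = c.confirmed === "high";
    if (predictedHigh && actuallyHigh) tp++;
    else if (predictedHigh) fp++;
    else if (actuallyHigh) fn++;
  }
  // No HIGH predictions: report 0 rather than dividing by zero.
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  return { precision, recall };
}

// Soft gate: warn below 0.75, fail the PR below 0.50.
function gateSketch(precision: number): "pass" | "warn" | "fail" {
  if (precision < 0.5) return "fail";
  if (precision < 0.75) return "warn";
  return "pass";
}

// Builds a fixture with the same tier split as the public one (14/13/13).
function fixtureWithPredictor(predict: (confirmed: Tier) => Tier): EvalCaseSketch[] {
  const tiers: Tier[] = [];
  for (let i = 0; i < 14; i++) tiers.push("high");
  for (let i = 0; i < 13; i++) tiers.push("medium");
  for (let i = 0; i < 13; i++) tiers.push("low");
  return tiers.map((confirmed) => ({ confirmed, predicted: predict(confirmed) }));
}
```

An always-high predictor over this split yields 14 true positives and 26 false positives, so the gate reports `fail`, while an always-medium predictor yields precision 0 by the no-prediction convention.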

PR 1 commit 15 of 15. Cathedral foundation lands here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: fill PR1 gap-fill — backstop integration + Postgres parity

Test gap analysis flagged three high-priority untested behaviors in
PR1's surface:

  Gap #3: extract_facts MCP op response shape stability after
    routing through runFactsPipeline (commit 9). Existing tests
    pin allowlist + anti-loop but not the {inserted, duplicate,
    superseded, fact_ids} envelope that MCP clients display.

  Gap #4: per-engine row-mapper parity for notability. facts-engine.test.ts
    pins notability round-trip on PGLite; the Postgres row mapper
    (postgres-engine.ts:rowToFactPg) is different code that wasn't
    pinned. Codex P1 #4 was specifically about read-side contracts
    drifting silently.

  Gap #5: multi-source isolation in facts:absorb logging. Codex
    P1 #3 motivated the source_id column; the absorb-log test pins
    that source_id is written but not that source_id-scoped queries
    return only the right source's rows.

NEW test/facts-backstop-integration.test.ts (6 cases):
  - 2 cases on runFactsPipeline (extract_facts path) response shape:
    successful extraction returns full {inserted, duplicate, superseded,
    fact_ids} envelope with positive fact_ids; empty extraction returns
    zero counts (no NaN/undefined).
  - 2 cases on facts:absorb multi-source isolation: writeFactsAbsorbLog
    rows are source-scoped; doctor's GROUP BY source_id query produces
    the expected per-source breakdown.
  - 2 cases on queue mode: happy-path drain pins counters.completed >= 1
    + counters.failed == 0; documented case noting that extract.ts
    absorbs gateway errors silently (errors propagate from layers
    ABOVE extract — resolver, dedup, insert — to backstop's catch,
    not from the chat call itself).

NEW test/e2e/facts-notability-roundtrip.test.ts (5 cases, real Postgres):
  - HIGH/MEDIUM/LOW round-trip via insertFact + listFactsByEntity.
  - Omitting notability defaults to medium (NOT NULL DEFAULT contract).
  - listFactsSince also surfaces notability.
  All 5 pin the postgres.js driver + rowToFactPg row mapper.
  PGLite parity is covered by the existing test/facts-engine.test.ts
  case from commit 4.

Verified: 6/6 unit + 5/5 E2E green. The third high-priority gap
(integration sync.ts → runFactsBackstop end-to-end) is sufficiently
covered by the existing test/sync.test.ts behavior plus the per-page
runFactsBackstop assertions in test/facts-backstop.test.ts; chasing
the full happy-path sync→facts integration would require a real
git fixture which is heavier than warranted for this surface.

PR 1 commit 16 of 16 (gap fill).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Garry Tan <garrytan@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n#798 + garrytan#788 + garrytan#536 + garrytan#376 + garrytan#128 adapted) (garrytan#804)

* fix: merge resolver entries from all files (RESOLVER.md + AGENTS.md)

OpenClaw deployments typically have AGENTS.md at the workspace root as the
real skill dispatcher (200+ entries), while gbrain skillpacks install a
thin skills/RESOLVER.md (~40 entries). The previous first-match-wins policy
meant check-resolvable only saw the thin RESOLVER.md, reporting 187 skills
as 'unreachable' when they were fully routed in AGENTS.md.

Now: check-resolvable collects entries from ALL resolver files across both
the skills directory and its parent. Entries are deduped by skillPath
(first occurrence wins). The combined content is also passed to the
routing-eval (Check 5) so routing fixtures see the full trigger index.
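
The merge policy (collect from every resolver file, dedupe by skillPath, first occurrence wins) can be sketched as follows. The entry shape and function name are hypothetical; only the `skillPath` key and the first-wins rule come from the text.

```typescript
// Hypothetical entry shape; the real resolver entries carry more fields.
interface ResolverEntrySketch {
  skillPath: string;
  trigger: string;
  sourceFile: string;
}

// `perFile` is ordered by file priority; earlier files win on conflicts.
function mergeResolverEntriesSketch(perFile: ResolverEntrySketch[][]): ResolverEntrySketch[] {
  const seen = new Set<string>();
  const merged: ResolverEntrySketch[] = [];
  for (const entries of perFile) {
    for (const entry of entries) {
      if (seen.has(entry.skillPath)) continue; // first occurrence wins
      seen.add(entry.skillPath);
      merged.push(entry);
    }
  }
  return merged;
}
```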

New function findAllResolverFiles() in resolver-filenames.ts returns all
matching files instead of just the first. findResolverFile() is unchanged
(backward-compatible for callers that need a single path).

Before: 37/224 reachable (our deployment)
After:  200/224 reachable (remaining 24 are genuine gaps)

Tests: 8 new (findAllResolverFiles + checkResolvable merge behavior)

* fix: graph_coverage skipped when brain has 0 entity pages

Closes garrytan#530.

`graph_coverage` measures `link_coverage` (fraction of entity pages with
inbound links) and `timeline_coverage` (fraction with timeline entries).
Both formulas divide by entity-page count.

For markdown-only brains (journals, wikis, notes — Karpathy's original
LLM Wiki use case) the entity count is 0, so coverage is structurally
undefined. The check still reported 'warn: 0%' under that condition, a
warning that:
1. Brain owners cannot clear without indexing code/entities
2. Carries a doctor hint referencing stale commands (`link-extract` /
   `timeline-extract` were renamed to `extract` in v0.22)
3. Adds noise to compliance/health automation gating on doctor exit

Fix: detect entity-page count via SQL. If 0, mark check 'ok' with explanation.
Otherwise keep existing logic but update hint to current `gbrain extract all`.
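
A sketch of the zero-denominator guard. The input shape, message wording, and the `warnBelow` threshold are assumptions for illustration (the real check's threshold is not stated here); the `gbrain extract all` hint is the command named above.

```typescript
// Hypothetical counts, as they might come back from a SQL probe.
interface CoverageInputSketch {
  entityPages: number;
  withInboundLinks: number;
  withTimeline: number;
}

interface CoverageCheckSketch {
  status: "ok" | "warn";
  message: string;
}

function graphCoverageSketch(input: CoverageInputSketch, warnBelow = 0.5): CoverageCheckSketch {
  // Both ratios divide by entityPages, so 0 means "undefined", not "0%".
  if (input.entityPages === 0) {
    return { status: "ok", message: "skipped: brain has 0 entity pages, coverage is undefined" };
  }
  const link = input.withInboundLinks / input.entityPages;
  const timeline = input.withTimeline / input.entityPages;
  const pct = (x: number): string => `${Math.round(x * 100)}%`;
  const summary = `link_coverage ${pct(link)}, timeline_coverage ${pct(timeline)}`;
  if (link < warnBelow || timeline < warnBelow) {
    return { status: "warn", message: `${summary}; run 'gbrain extract all'` };
  }
  return { status: "ok", message: summary };
}
```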

Tested on Nous AGaaS production wiki: 2533 markdown pages, 100% embedded,
6086 wikilinks, 1964 timeline entries — 0 entity pages — graph_coverage
correctly clears.

* fix(doctor): deprecate stale link-extract / timeline-extract verb names

The graph_coverage hint and the link-extraction.ts header comment
still referenced `gbrain link-extract` / `gbrain timeline-extract`,
which were consolidated into `gbrain extract <links|timeline|all>` in
v0.16. Following the consolidation in garrytan#536's resolution (which fixed
the doctor hint to `gbrain extract all`), this commit removes the last
stale reference in `src/core/link-extraction.ts`'s header comment.

Originally PR garrytan#376 by @FUSED-ID. The doctor.ts portion of garrytan#376 is
absorbed by garrytan#536's richer warn message; this commit lands garrytan#376's
`link-extraction.ts` portion only.

Co-Authored-By: Leon-Gerard Vandenberg <FUSED-ID@users.noreply.github.com>

* test(doctor): pin canonical `gbrain extract all` hint, ban stale verbs

IRON-RULE regression guard for PR garrytan#376 + garrytan#536's graph_coverage hint
fix (locked in v0.31.7 eng-review). The removed verbs `gbrain
link-extract` and `gbrain timeline-extract` were consolidated into
`gbrain extract <links|timeline|all>` in v0.16 but the hint kept
suggesting them for ~30 releases. Pin the user-facing copy at the
source-string level so a future edit can't silently re-regress.

Structural assertion in the existing `doctor command` describe block,
matching the file's existing `frontmatter_integrity` / `rls_event_trigger`
pattern. No DB-fixture infrastructure needed.

* fix: sync RESOLVER.md triggers with v0.25.1 skill frontmatter

`gbrain doctor` reported 36 routing-miss/ambiguous warnings against the
v0.25.1 wave skills (book-mirror, article-enrichment, strategic-reading,
concept-synthesis, perplexity-research, archive-crawler, academic-verify,
brain-pdf, voice-note-ingest). Each skill's frontmatter declared 4-5
triggers, but only the first ever made it into RESOLVER.md's hand-curated
rows. The structural matcher couldn't find any specific phrase for
realistic user intents, so requests fell through to broader parents
(`ingest`, `enrich`, `data-research`).

Pulled the missing triggers from each skill's `triggers:` frontmatter
into the matching RESOLVER.md row. Converted media-ingest's prose row
to quoted triggers so the matcher actually sees them. Added
`"summarize this book"` to media-ingest (covers a book-mirror
disambiguation fixture). Marked article-enrichment + perplexity-research
fixtures with `ambiguous_with` for the parent skills they intentionally
chain with — RESOLVER.md's preamble explicitly documents that skills are
designed to chain, so this is acknowledging the truth, not papering over
a bug.

Result: 36 routing warnings → 0. resolver-test/check-resolvable/
routing-eval suite: 140 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(doctor): find skills/ on every deployment shape (read-path-only)

Adapts the install-path resolution from PR garrytan#128 (TheAndersMadsen) into
the existing 5-tier autoDetectSkillsDir architecture. Two new code paths,
read-path-only by design:

1. Tier-0 $GBRAIN_SKILLS_DIR explicit operator override on the SHARED
   autoDetectSkillsDir. Safe for both read and write paths because the
   operator explicitly set the var — opt-in retargeting is fine.

2. New autoDetectSkillsDirReadOnly() function for READ-ONLY callers
   (gbrain doctor, check-resolvable, routing-eval). Wraps the shared
   detect; on null, walks up from fileURLToPath(import.meta.url) gated
   by isGbrainRepoRoot() so unrelated repos along the install path
   can't false-positive.

The split is the architectural fix for a write-path regression risk
codex outside-voice review surfaced (eng-review D5): adding the
install-path fallback to the SHARED resolver would let `gbrain skillpack
install` from `~` silently target the bundled gbrain repo's skills/
instead of the user's actual workspace. Three write-path call sites stay
on the original autoDetectSkillsDir; three read-path call sites switch
to the new readOnly variant.
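
A rough sketch of the read-only wrapper, assuming the shared detector and the repo-root gate are injected as plain functions. The names `walkUpForSkills`, `detectReadOnlySketch`, and the `"primary"` source label are hypothetical; only `install_path` appears in this wave's notes. POSIX path helpers are used so the walk is deterministic in the example.

```typescript
import { posix } from "node:path";

type SkillsDirResult = { dir: string; source: "primary" | "install_path" };

// Walk up from startDir; accept a directory only if the gate says it is a gbrain repo root.
function walkUpForSkills(startDir: string, isRepoRoot: (dir: string) => boolean): string | null {
  let current = startDir;
  for (;;) {
    if (isRepoRoot(current)) return posix.join(current, "skills");
    const parent = posix.dirname(current);
    if (parent === current) return null; // reached the filesystem root
    current = parent;
  }
}

// Read-only variant: try the shared detector first, fall back to the install path on null.
function detectReadOnlySketch(
  sharedDetect: () => string | null,
  installDir: string,
  isRepoRoot: (dir: string) => boolean,
): SkillsDirResult | null {
  const primary = sharedDetect();
  if (primary !== null) return { dir: primary, source: "primary" };
  const fallback = walkUpForSkills(installDir, isRepoRoot);
  return fallback === null ? null : { dir: fallback, source: "install_path" };
}
```

Write-path callers would keep calling the shared detector directly, so they never see an `install_path` result.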

Closes the install-path footgun for hosted-CLI installs:
`bun install -g github:garrytan/gbrain && cd ~ && gbrain doctor` now
finds the bundled skills/ instead of warning "Could not find skills
directory."

Test surface: 8 new cases in test/repo-root.test.ts covering tier-0
valid/invalid/precedence, install-path walk, isGbrainRepoRoot gate
(via primary-success-no-drift assertion), AUTO_DETECT_HINT updates,
and the D5 regression guard that pins the read-path/write-path split.

Co-Authored-By: Anders Madsen <TheAndersMadsen@users.noreply.github.com>

* docs(changelog): expand v0.31.7 entry for full 5-PR doctor wave

Promotes headline from "doctor stops crying wolf about unreachable
skills on OpenClaw" to the assembled wave's narrative: every doctor
false-positive class on disk today, plus the install-path footgun
that bit every hosted-CLI user.

Numbers-that-matter table expanded to 6 rows covering all 5 PRs.
Itemized-changes section grouped by sub-wave: resolver merge,
RESOLVER.md trigger sync, graph_coverage zero-entity, stale verb
hint fix, install-path resolver. Contributors named explicitly:
@mayazbay, @psperera, @FUSED-ID, @TheAndersMadsen. "For contributors"
section flags the new SkillsDirSource variants and the read-path /
write-path split as the canonical pattern for future fallback
additions.

* chore(v0.31.7): bump version + regenerate llms + fix CLI regression-gate

Wraps up the v0.31.7 doctor-fix wave:

- VERSION + package.json: 0.31.1.1-fixwave -> 0.31.7
- llms-full.txt: regenerated against the expanded v0.31.7 CHANGELOG
  entry (committed bundle drift caught by test/build-llms.test.ts)
- test/check-resolvable-cli.test.ts: update the REGRESSION-GATE for
  empty-cwd no_skills_dir error to reflect v0.31.7's intentional
  behavior change. The install-path fallback in autoDetectSkillsDirReadOnly
  now finds the bundled skills/ from any cwd inside the gbrain repo,
  so the test asserts source: 'install_path' instead of error: 'no_skills_dir'.
  This is the wave's headline capability ("doctor finds itself on every
  deployment shape") rather than a regression.

Pre-existing flake unrelated to this wave: BrainRegistry — lazy init >
empty/null/undefined id routes to host fails on machines that have
~/.gbrain/config.json present (the test assumes test env has none).
Reproduces on master before this wave landed; not a v0.31.7 regression.
Filed for follow-up in next maintainer hygiene sweep.

* fix(doctor): close write-path leak in --fix + sync routing-eval merge

Codex adversarial review of v0.31.7 caught a HIGH that the eng review
missed (D6 lock during /ship): the read-path-only architecture for the
install-path fallback is leaky because TWO of the three "read-only"
callers (doctor, check-resolvable) actually have write modes via --fix
that call autoFixDryViolations() and writeFileSync to SKILL.md files.
A user running `cd ~ && gbrain doctor --fix` with no skills/RESOLVER.md
up the cwd tree would resolve via the install-path fallback to the
bundled gbrain repo and silently rewrite the install-tree skills —
exactly the regression D5's split was supposed to prevent.

Fix: when --fix is requested and the resolved skills dir came from the
install-path source, refuse with a clear error pointing at GBRAIN_SKILLS_DIR
/ OPENCLAW_WORKSPACE / --skills-dir as explicit overrides. The read parts
of doctor and check-resolvable continue to benefit from the install-path
fallback (the v0.31.7 capability headline); only --fix is gated.
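
The gate itself is small. A sketch, assuming the resolver reports where the skills dir came from; the `"explicit"` and `"workspace"` labels and the error wording are hypothetical, while `install_path` and the three override names come from the description above.

```typescript
// Only "install_path" is named in this wave's notes; the other labels are illustrative.
type SkillsDirSource = "explicit" | "workspace" | "install_path";

// Reads may use the install-path fallback; writes via --fix may not.
function assertFixAllowed(fixRequested: boolean, source: SkillsDirSource): void {
  if (fixRequested && source === "install_path") {
    throw new Error(
      "Refusing --fix: skills dir was resolved via the install-path fallback. " +
        "Set GBRAIN_SKILLS_DIR or OPENCLAW_WORKSPACE, or pass --skills-dir.",
    );
  }
}
```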

Plus a MEDIUM consistency fix codex flagged: routing-eval was still
single-file-only while check-resolvable does multi-file merge across
skills/RESOLVER.md + ../AGENTS.md. On OpenClaw layouts this caused
routing-eval and check-resolvable to disagree on what's routable.
routing-eval now uses the same findAllResolverFiles + content-merge
pattern as check-resolvable, so all three commands see the same
trigger index.

Test coverage: D6 regression guard in test/check-resolvable-cli.test.ts
spawning a real subprocess from an empty tempdir (no env, no cwd
fallback) and asserting --fix refuses with the correct stderr message.

Co-Authored-By: Codex (outside-voice review) <noreply@openai.com>

* docs(changelog): note D6 --fix gate + routing-eval merge in v0.31.7 entry

* docs: post-ship sync for v0.31.7

CLAUDE.md updates only. CHANGELOG.md was already authored by /ship and was left untouched.

- src/core/repo-root.ts annotation: read-path/write-path split, tier-0 GBRAIN_SKILLS_DIR override, autoDetectSkillsDirReadOnly install-path fallback, D6 --fix safety gate.
- src/commands/check-resolvable.ts annotation: multi-file resolver merge across skills dir + parent (37/224 -> 200/224 reachable on the reference OpenClaw layout), install-path read-only fallback, D6 --fix gate.
- src/commands/routing-eval.ts annotation: same multi-file merge as check-resolvable; v0.25.1 RESOLVER.md trigger sync.
- src/commands/doctor.ts annotation: switched to autoDetectSkillsDirReadOnly so 'cd ~ && gbrain doctor' finds bundled skills via install-path fallback; --fix D6 install-path refuse-write gate; graph_coverage zero-entity short-circuit + canonical 'gbrain extract all' hint with regression-test pin.
- Test inventory: replaced bare regression-v0_16_4 line with explicit test/repo-root.test.ts entry (20 cases - 12 existing + 8 new D3/D5) and new test/resolver-merge.test.ts entry (8 cases).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(llms): regenerate after CLAUDE.md sync for v0.31.7

* ci(test): quarantine *.serial.test.ts files from test-shard

CI's test-shard.sh was including *.serial.test.ts files in the parallel
shard runs, which broke voyage-multimodal.test.ts: 18 of its 22 tests
failed in CI shard 2 because eval-takes-quality-runner.serial.test.ts
ran before it in the same bun-test process and leaked its mock.module()
substitution of src/core/ai/gateway.ts. The leaked mock omitted
embedMultimodal and resetGateway, so voyage-multimodal saw `undefined
is not a function` everywhere it touched the gateway.

Locally `bun run test` (run-unit-parallel.sh → run-unit-shard.sh)
already excludes *.serial.test.ts and runs them via `bun run test:serial`
in their own pass with --max-concurrency=1. Master ran green there;
only CI's matrix shards exposed the leak. The runner.serial test file's
own header comment explicitly calls out this exact cross-file mock
leak — the quarantine was the design, CI just wasn't honoring it.

Three changes:

1. scripts/test-shard.sh — exclude *.serial.test.ts and *.slow.test.ts
   from the find expression, mirroring scripts/run-unit-shard.sh.

2. .github/workflows/test.yml — add a `test-serial` sibling job that
   runs `bun run test:serial`. Keeps serial tests gating CI without
   merging them back into the parallel shards.

3. test/scripts/test-shard.test.ts — regression test pinning the three
   exclusion clauses (serial, slow, e2e) so a future refactor that
   drops one of them fails loud rather than silently re-introducing
   the cross-file mock leak.
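
The exclusion logic lives in shell `find` arguments, but its intent can be sketched as a predicate. Modeling the e2e clause as a `/e2e/` path segment is an assumption; the commit only says that an e2e clause exists.

```typescript
// True when a test file may run inside a parallel shard.
function isShardable(path: string): boolean {
  if (!path.endsWith(".test.ts")) return false;
  if (path.endsWith(".serial.test.ts")) return false; // top-level module mocks leak across files
  if (path.endsWith(".slow.test.ts")) return false; // runs in its own pass
  if (path.includes("/e2e/")) return false; // assumed shape of the e2e clause
  return true;
}

// Filter a file list down to what the parallel shards should see.
function shardableFiles(paths: string[]): string[] {
  return paths.filter(isShardable);
}
```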

Verified locally:
- shard 2 reproduction: 18 voyage-multimodal failures → 0 (1 unrelated
  env-dependent perf flake remains, won't fail on CI)
- bun run test:serial: 189/190 pass (1 unrelated env-dependent
  BrainRegistry flake from ~/.gbrain/config.json presence)
- typecheck + check:test-isolation clean

* ci(test): rephrase mock-module comment to satisfy R2 lint

The verify gate's check:test-isolation flagged test/scripts/test-shard.test.ts
because the JSDoc comment contained the literal string 'mock.module()'
which matches R2's grep regex 'mock\.module[[:space:]]*\('. The file
itself doesn't use mock.module — it just describes why the linter rule
exists in human-readable prose.

Rephrased to avoid the trailing parens. The regex requires the open
paren, so 'bun's module-mocking primitive' instead of 'mock.module()'
is invisible to the linter while preserving meaning for the next
maintainer who reads the test.
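
To see why the rephrase works, here is a JavaScript approximation of the R2 pattern. The lint itself uses grep with `[[:space:]]`; `\s` is a close but not identical stand-in, and `violatesR2` is a hypothetical helper name.

```typescript
// Approximation of the grep regex 'mock\.module[[:space:]]*\(' in JavaScript syntax.
const R2_PATTERN = /mock\.module\s*\(/;

// True when a line would be flagged by the lint, whether it is code or prose.
function violatesR2(line: string): boolean {
  return R2_PATTERN.test(line);
}

// The old comment wording trips the rule; the rephrased wording does not.
const oldComment = "serial files leak their mock.module() substitution";
const newComment = "serial files leak bun's module-mocking primitive";
const flagged = [oldComment, newComment].filter(violatesR2);
```

Only the old wording ends up in `flagged`, because the pattern needs the open paren after `mock.module`.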

* docs(claude): tighten version-consistency rules + add merge recovery procedure

After several merges from master where VERSION + package.json +
CHANGELOG.md drifted out of sync (each merge hit conflicts on those
three files; auto-merge sometimes resolved silently in the wrong
direction), CLAUDE.md gets an explicit drift-recovery checklist + a
3-line paste-ready audit command anyone can run.

Three additions to the existing "Version locations" section:

1. **Mandatory audit command** — three echo lines that print VERSION,
   package.json version, and the top CHANGELOG header. All three MUST
   match the wave's `MAJOR.MINOR.PATCH.MICRO`. Designed for paste-after-
   every-merge use.

2. **Merge-conflict recovery procedure** — exact sed/echo patterns for
   resolving VERSION + package.json + CHANGELOG conflicts, in the order
   to apply them. Names the anti-pattern (mixing `git checkout --ours`
   on the trio) that's bitten us before.

3. **Pre-push gate** — re-run the audit before `git push` of any merge
   commit. /ship Step 12 catches drift but only if you actually run
   /ship; manual pushes skip the check.

Confirmed consistent at d361482, 7e8f696, 65a5994 (every merge
commit on this branch). The doc gap was the rules being too loose,
not the rules being wrong — this beefs up the procedural side so the
next merge can't silently desync.

* docs(llms): regenerate after CLAUDE.md edit + tighten the rule

CI failed on the build-llms generator test because CLAUDE.md edited
in fe050ae (version-consistency procedure) shipped without a
matching `bun run build:llms` regen. The committed llms-full.txt was
77 lines short of fresh generator output, and test/build-llms.test.ts
caught the drift in CI shard 1.

Two changes:

1. llms.txt + llms-full.txt — regenerated to match current CLAUDE.md.

2. CLAUDE.md — strengthened the "Auto-derived" entry for llms.txt /
   llms-full.txt with explicit "every CLAUDE.md edit chases with
   `bun run build:llms` in the same commit" wording. Notes that
   `verify` doesn't run the build-llms test, only the full unit
   suite does, so a clean typecheck is NOT enough to know you can
   push after touching CLAUDE.md.

This is now the third time this has bitten the wave. The previous
"Auto-derived" entry said the right thing but was buried in a list;
elevating it to imperative voice with a count of past regressions
should make the next CLAUDE.md edit hard to land without the chaser.

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Madi Ayazbay <madia@Mac.localdomain>
Co-authored-by: Leon-Gerard Vandenberg <FUSED-ID@users.noreply.github.com>
Co-authored-by: psperera <pperera@mac.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Anders Madsen <TheAndersMadsen@users.noreply.github.com>
Co-authored-by: Codex (outside-voice review) <noreply@openai.com>
…(P2 follow-ups) (garrytan#808)

* feat(multi-source): thread ctx.sourceId through op handlers + engine read-surface

Closes the multi-source threading gaps that the v0.31.1.1-fixwave codex
review caught. Multi-source brains were silently misrouting writes from
every CLI/MCP-driven op (put_page, add_tag, add_link, add_timeline_entry,
revert_version, put_raw_data, etc.) because the op handlers in
operations.ts ignored ctx.sourceId. Read-side ops were arbitrary-row
under same-slug-across-sources because the engine's read methods had no
source filter.

Engine layer (D12 + D16 + D21):
- engine.ts interface: getLinks/getBacklinks/getTimeline/getRawData/
  getVersions/getAllSlugs/revertToVersion/putRawData all take
  opts?: { sourceId?: string }.
- pglite-engine.ts + postgres-engine.ts: two-branch query for each
  read method. Without opts.sourceId, NO source filter applies
  (preserves pre-v0.31.8 cross-source semantics for back-link
  validators and any caller that hasn't threaded sourceId yet). With
  opts.sourceId, scoped to that source — the new path used by
  reconcileLinks and ctx.sourceId-aware op handlers.

Op-handler layer (D7 + D16 + D20):
- operations.ts threads ctx.sourceId through 16+ handler sites:
  put_page, revert_version, put_raw_data, add_tag, remove_tag,
  add_link, remove_link, add_timeline_entry, create_version,
  delete_page, restore_page, get_page, get_tags, get_links,
  get_backlinks, get_timeline, get_versions, get_raw_data,
  get_chunks, plus reconcileLinks's tx.getLinks/getBacklinks/
  addLink/removeLink and engine.getAllSlugs.
- Pattern: const sourceOpts = ctx.sourceId ? { sourceId: ctx.sourceId } : {};
  When ctx.sourceId is unset, engine falls through to cross-source
  view (back-compat). MCP callers populate ctx.sourceId via the
  transport layer.
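
The two-branch read plus the `sourceOpts` pattern can be sketched against an in-memory table. The row shape and function names here are assumptions; the real engines run SQL against the pages and links tables.

```typescript
interface LinkRow { sourceId: string; from: string; to: string }
interface ReadOpts { sourceId?: string }
interface OpContext { sourceId?: string }

// Engine-side read: no sourceId means no source filter (pre-v0.31.8 cross-source view).
function getLinksSketch(rows: LinkRow[], slug: string, opts: ReadOpts = {}): LinkRow[] {
  const bySlug = rows.filter((r) => r.from === slug);
  if (opts.sourceId === undefined) return bySlug;
  return bySlug.filter((r) => r.sourceId === opts.sourceId);
}

// Handler-side threading, using the same expression quoted in the commit message.
function handleGetLinks(rows: LinkRow[], ctx: OpContext, slug: string): LinkRow[] {
  const sourceOpts = ctx.sourceId ? { sourceId: ctx.sourceId } : {};
  return getLinksSketch(rows, slug, sourceOpts);
}
```

With the same slug present in two sources, an unset `ctx.sourceId` returns both rows and a set one returns only that source's row, which is the back-compat invariant the tests pin.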

CLI wiring (D11 + D22):
- cli.ts: makeContext is async, calls resolveSourceId() from
  src/core/source-resolver.ts:58 (the canonical 6-tier chain:
  --source flag → GBRAIN_SOURCE env → .gbrain-source dotfile →
  path-match → brain default → 'default'). Wrapped in try/catch
  so a fresh pre-init brain still returns a clean ctx with no
  sourceId set.
- commands/call.ts: runCall accepts --source <id> flag. Resolves
  through the same 6-tier chain and threads to handleToolCall
  via the new opts.sourceId param.
- mcp/server.ts: handleToolCall accepts opts.sourceId and threads
  to buildOperationContext.
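
A sketch of the 6-tier precedence as a first-non-empty walk. The input shape is hypothetical (the real resolveSourceId reads flags, env, the dotfile, and the database itself); only the tier order is taken from the description above.

```typescript
// Hypothetical pre-gathered inputs, one per tier except the final literal.
interface SourceInputs {
  flag?: string; // --source flag
  env?: string; // GBRAIN_SOURCE
  dotfile?: string; // .gbrain-source contents
  pathMatch?: string; // source whose local path contains cwd
  brainDefault?: string; // brain-level default
}

function resolveSourceIdSketch(i: SourceInputs): { sourceId: string; tier: string } {
  const tiers: Array<[string, string | undefined]> = [
    ["--source flag", i.flag],
    ["GBRAIN_SOURCE env", i.env],
    [".gbrain-source dotfile", i.dotfile],
    ["path-match", i.pathMatch],
    ["brain default", i.brainDefault],
  ];
  for (const [tier, value] of tiers) {
    if (value !== undefined && value !== "") return { sourceId: value, tier };
  }
  return { sourceId: "default", tier: "literal 'default'" };
}
```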

Tests (D7 + D16 + D20 regression coverage):
- test/source-id-tx-regression.test.ts: 8 new op-handler-layer
  cases covering add_tag/get_tags/add_link/get_links/delete_page/
  put_raw_data routing under ctx.sourceId='X' vs unset, plus
  D16's two-branch back-compat invariant for getLinks (cross-
  source view preserved when ctx.sourceId is unset).

Closes the codex OV-1/OV-2/OV-3 findings from the v0.31.8 plan
review. Back-compat is strictly additive: callers that don't pass
opts.sourceId see the same results they did pre-v0.31.8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): multi_source_drift check surfaces pre-v0.30.3 misroutes

Pre-v0.30.3 putPage misrouted multi-source writes from intended source X
to (default, slug). The fix-wave fixed forward-going writes but explicitly
deferred backfilling the misrouted rows. Operators have had no signal of
this silent corruption.

Adds src/core/multi-source-drift.ts exporting findMisroutedPages(engine,
sources, opts). The heuristic walks each non-default source's local_path
and surfaces slugs that exist at (default, slug) in DB but are MISSING
from (X, slug) — unambiguous evidence of the misroute shape.

Implementation notes (codex OV12 + OV13 + D17):
- FS walk handles BOTH .md and .mdx (matches src/core/sync.ts:133, which
  treats both as markdown). Uses its own walk helper instead of importing
  from extract.ts, so doctor doesn't crash if local_path is unreadable
  (try/catch on root statSync; ENOENT/EACCES yields zero files, NOT a
  thrown error that takes down doctor).
- Single batched SQL with VALUES clause: collect all candidate slugs
  into one array, then ONE LEFT JOIN against pages with source_id IN
  ('default', X). Materialize into Map<slug, Set<source_id>>. NOT a
  per-file 20K-round-trip loop.
- Bounded by limit (10K files) AND timeoutMs (5s). Bail with
  walk_truncated=true rather than letting doctor hang.
- Heuristic softened per OV12: "appears misrouted to default" with TWO
  possible causes flagged (pre-v0.30.3 misroute OR source X never
  completed initial sync). The doctor warning suggests verification
  ('gbrain sources status'), not a destructive action.
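
The core of the heuristic, sketched over in-memory rows instead of the batched SQL. Function and field names other than `walk_truncated` are hypothetical, and the timeout bound is omitted for brevity.

```typescript
interface PageRow { slug: string; sourceId: string }

interface DriftResult { misrouted: string[]; walk_truncated: boolean }

// candidateSlugs: slugs found by walking source X's local_path on disk.
// rows: what one batched query over source_id IN ('default', X) would return.
function findMisroutedSketch(
  candidateSlugs: string[],
  sourceId: string,
  rows: PageRow[],
  limit: number = 10_000,
): DriftResult {
  const limited = candidateSlugs.slice(0, limit);
  const wanted = new Set<string>(limited);

  // Materialize slug -> set of sources that hold it.
  const bySlug = new Map<string, Set<string>>();
  for (const row of rows) {
    if (!wanted.has(row.slug)) continue;
    if (row.sourceId !== "default" && row.sourceId !== sourceId) continue;
    let sources = bySlug.get(row.slug);
    if (sources === undefined) {
      sources = new Set<string>();
      bySlug.set(row.slug, sources);
    }
    sources.add(row.sourceId);
  }

  // Misroute shape: present at (default, slug) but missing from (X, slug).
  const misrouted = limited.filter((slug) => {
    const sources = bySlug.get(slug);
    return sources !== undefined && sources.has("default") && !sources.has(sourceId);
  });
  return { misrouted, walk_truncated: candidateSlugs.length > limit };
}
```

A slug that exists in both `default` and X is left alone, which is the healthy same-slug-across-sources case.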

Wired into runDoctor (3b-multi-source slot, after sync_failures) AND
into doctorReportRemote (D14) so thin-client operators see the check
when 'gbrain doctor' routes through the remote MCP path. Single-source
brains skip the check entirely.

Tests: test/multi-source-drift.test.ts (7 PGLite cases) covers:
- Single-source brain → skip
- Multi-source no-misroutes → ok
- Multi-source 2 misrouted slugs → warn with sample
- Healthy same-slug-across-sources NOT a false positive (the codex
  OV4 redesign case — original heuristic would have false-positived)
- FS walk hits limit → walk_truncated=true
- Unreadable local_path doesn't crash
- .mdx files walked alongside .md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(doctor): wire multi_source_drift + wedge force-retry hint (D14 + D19)

Wires the new multi_source_drift check into both runDoctor (local) and
doctorReportRemote (thin-client remote MCP path), and extends the existing
minions_migration block to detect 3-consecutive-partials wedges and emit
gbrain apply-migrations --force-retry <v> hints (D19).

Pre-v0.31.8, operators wedged on v0.29.1 (or any future migration that
hits the apply-migrations runner's 3-consecutive-partials guard) got the
generic "Run: gbrain apply-migrations --yes" hint. That command refuses
to advance past the guard — so the hint was wrong. Codex OV-11 (and the
v0.31.1.1-fixwave commit message) flagged this, but the prior plan said
to delegate to apply-migrations.ts:statusForVersion(), which would have
re-opened a separate regression: the existing forward-progress override
at doctor.ts:303 (newer completion suppresses old partials) is
cross-version and statusForVersion is per-version only.

This commit extends the existing block in place rather than replacing it:

1. Keep the forward-progress override (lines 348-356) byte-identical so
   installs that moved past an old v0.11 partial don't light up with
   stale wedge alerts.
2. Add a 3-consecutive-partials detector after the stuck filter. Since
   `stuck` already excludes forward-progress-superseded versions, the
   wedge counter only fires on actual unresolved partials.
3. Branch the message:
   - wedged.length > 0 → "WEDGED MIGRATION(s): <v>. Run: gbrain
     apply-migrations --force-retry <v>" (chain with && for multiple)
   - else if stuck.length > 0 → existing --yes hint
   - else → no message

Same shape duplicated in doctorReportRemote so thin-client operators
see the right command on the brain host.
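
The three-way message branch can be sketched as a pure function. The history encoding in `isWedged` and the exact message strings are assumptions; the commands themselves are the ones named above.

```typescript
type RunStatus = "partial" | "complete";

// Assumed encoding: a version is wedged when its last three recorded runs are all partial.
function isWedged(history: RunStatus[]): boolean {
  if (history.length < 3) return false;
  return history.slice(-3).every((s) => s === "partial");
}

// wedged and stuck are version lists that already exclude forward-progress-superseded entries.
function migrationHintSketch(wedged: string[], stuck: string[]): string | null {
  if (wedged.length > 0) {
    const cmds = wedged
      .map((v) => `gbrain apply-migrations --force-retry ${v}`)
      .join(" && ");
    return `WEDGED MIGRATION(s): ${wedged.join(", ")}. Run: ${cmds}`;
  }
  if (stuck.length > 0) return "Run: gbrain apply-migrations --yes";
  return null;
}
```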

Plus the multi_source_drift wiring (D14): same heuristic from the
new src/core/multi-source-drift.ts library, called from both local and
remote doctor paths. Single-source brains skip. Engine-null guard on
the local path (--fast and DB-down branches pass null).

Tests: test/doctor.test.ts gains 4 wedge-hint regression cases:
- Both branches present in source (forward-progress override + 3-partials
  detection coexisting).
- Anti-regression guard: NO `import { statusForVersion }` from
  apply-migrations.ts. The prior plan would have introduced this
  import; keeping it out means doctor stays decoupled from the
  migration runner's per-version semantics.
- Multiple wedged versions chain force-retry calls with `&&`.
- Both branches present in doctorReportRemote (thin-client coverage,
  D14).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(voyage): Content-Length pre-check + per-item base64 cap (D2 + D10)

The voyage compat fetch wrapper at gateway.ts:294 called
`await resp.clone().json()` BEFORE iterating embeddings. A response of
arbitrary size from a malicious or compromised Voyage endpoint was
fully parsed into the JS heap before any size check could fire.
The original v0.31.8 plan put the cap on per-item base64 length,
which fires AFTER the JSON parse — defeating the OOM defense
entirely (codex OV8).

Two-layer fix sized at MAX_VOYAGE_RESPONSE_BYTES = 256 MB
("unambiguously not legit" rather than tight against typical
batches; voyage-3-large × 16K embeddings ≈ 200 MB raw fits within
the cap):

Layer 1 (PRIMARY) — Content-Length header pre-check, fires
BEFORE resp.clone().json(). Throws a descriptive error if the
header reports a length over the cap. The JSON.parse OOM vector
is now gated.

Layer 2 (defense-in-depth) — per-embedding base64 length check
inside the iteration. Catches the rare case where Layer 1 was
skipped (chunked transfer encoding has no Content-Length) AND a
single embedding string is unreasonably large. Estimates decoded
size as 0.75 × base64 length (canonical base64 → bytes ratio).
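
A sketch of the two layers and their ordering. `ResponseLike`, the helper names, and the `data[].embedding` response shape are assumptions for illustration; the cap value and the 0.75 ratio come from the description above.

```typescript
// Cap named in the commit message: 256 MB.
const MAX_VOYAGE_RESPONSE_BYTES = 256 * 1024 * 1024;

// Minimal structural stand-in for a fetch Response (assumption for this sketch).
interface ResponseLike {
  headers: { get(name: string): string | null };
  clone(): ResponseLike;
  json(): Promise<unknown>;
}

// Layer 1: runs before any JSON parse. No header (chunked transfer) means no verdict.
function checkContentLength(header: string | null, cap: number = MAX_VOYAGE_RESPONSE_BYTES): void {
  if (header === null) return;
  const declared = Number(header);
  if (Number.isFinite(declared) && declared > cap) {
    throw new Error(`Voyage response too large: ${declared} bytes exceeds cap of ${cap}`);
  }
}

// Layer 2: per-embedding estimate, decoded bytes ~ 0.75 x base64 length.
function checkEmbeddingSize(base64: string, cap: number = MAX_VOYAGE_RESPONSE_BYTES): void {
  const estimatedBytes = Math.floor(base64.length * 0.75);
  if (estimatedBytes > cap) {
    throw new Error(`Voyage embedding too large: ~${estimatedBytes} bytes exceeds cap of ${cap}`);
  }
}

async function readVoyageJson(resp: ResponseLike): Promise<unknown> {
  checkContentLength(resp.headers.get("content-length")); // must stay above the parse
  const json = (await resp.clone().json()) as { data?: Array<{ embedding?: unknown }> } | null;
  for (const item of json?.data ?? []) {
    if (typeof item.embedding === "string") checkEmbeddingSize(item.embedding);
  }
  return json;
}
```

Layer 2 cannot prevent the parse of an oversized chunked body; it only rejects an unreasonable single embedding after the fact, which is why the header check is the primary defense.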

Tests: test/voyage-response-cap.test.ts — 5 structural source-pin
cases including the critical D10 invariant: "Content-Length
pre-check appears BEFORE `const json: any = await
resp.clone().json()` in the inbound block". A future refactor
that moves the cap below the JSON parse fails this test loudly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.31.8)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): exclude *.serial.test.ts from sharded parallel run

scripts/test-shard.sh (the GitHub Actions runner) was including
*.serial.test.ts files alongside regular tests. Serial files use
top-level mock.module(...) which leaks across files in the same Bun
process — exactly what the .serial naming convention was meant to
quarantine.

Concretely: test/eval-takes-quality-runner.serial.test.ts mocks
src/core/ai/gateway.ts with `configureGateway: () => undefined`
(no-op). Because both files landed in shard 2, the mock leaked into
test/voyage-multimodal.test.ts: when its tests called
configureVoyageMultimodal() → configureGateway(), the no-op fired and
_config stayed null. Then embedMultimodal() called requireConfig()
which threw "AI gateway is not configured" — 18 tests failed at
gateway.ts:171 with [1.00ms] each.

Local fast loop (scripts/run-unit-shard.sh) already excludes
*.serial.test.ts AND *.slow.test.ts via the same find-arg pattern.
test-shard.sh just hadn't picked up the same exclusion when it was
written. This commit:

1. Mirrors run-unit-shard.sh's exclusion pattern in test-shard.sh
   (`-not -name '*.slow.test.ts' -not -name '*.serial.test.ts'`).
2. Adds a "Run *.serial.test.ts" step to .github/workflows/test.yml
   on shard 1 only, calling scripts/run-serial-tests.sh
   (--max-concurrency=1). Shard 1 already runs extra setup work
   (`bun run verify`), so it has the natural slot for the serial
   pass without slowing the parallel critical path.

Verified locally: shard 2 went from 18 voyage-multimodal failures to
0. Shard 2 file count: 81 → 78 (3 serial files removed). Total test
count after fix: 1438 (1437 pass + 1 pre-existing env-sensitive
warm-create speed gate flake — unrelated to v0.31.8 or this fix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: add cold-start and ask-user skills

cold-start: Day-one brain bootstrapping that sequences the highest-leverage
data sources (contacts, calendar, email, conversations, social, archives)
to go from empty brain to useful brain. Recommends ClawVisor for credential
safety. Each phase is independently valuable and gated on user consent.
Includes resume protocol for interrupted sessions.

ask-user: Platform-agnostic choice-gate pattern for presenting users with
2-4 options and stopping execution until they respond. Works with Telegram
inline buttons, Discord, CLI, or Hermes clarify tool. Adapted from the
Wintermute ask-user pattern for the general gbrain ecosystem.

Also:
- Updated manifest.json with both new skills
- Updated RESOLVER.md with cold-start triggers and ask-user convention
- Updated setup/SKILL.md to point to cold-start as natural next step
- Updated GBRAIN_SKILLPACK.md with Getting Started section

* fix: make cold-start the automatic next step after setup

- Add Phase J to setup skill — transitions directly into cold-start
  after verification passes, not as a 'next steps' bullet
- Agent MUST offer cold-start, not just mention it
- Add anti-pattern: 'ending setup without offering cold-start'
- Update output format to flow into cold-start prompt
- Track deferred state if user declines

* safety: make ClawVisor required for API access, not optional

Phase 0 is now 'ClawVisor Setup (Required for API Access)' — not
'Credential Gateway Setup' with three options. The framing changed:

- ClawVisor is the safe path. Direct OAuth is not offered as an alternative.
- If user declines ClawVisor, agent skips to offline-only imports
  (markdown, conversation exports, Twitter archive, file archives).
- Explicitly: 'Do NOT offer direct OAuth as an alternative.'
- Safety boundary callout explains why: raw OAuth tokens + AI agent =
  uncontrolled attack surface (prompt injection → full Google account).
- Anti-pattern #1 is now 'Giving the agent raw OAuth tokens.'
- Revocation advantage highlighted: disable access in one click.

The contract, description, manifest, and skillpack doc all updated
to say 'uses' not 'recommends'.

* fix: PR garrytan#802 ask-user/cold-start clear repo test gates

Four contributor bugs in PR garrytan#802 fail existing test gates:

- ask-user/SKILL.md missing required Contract / Anti-Patterns /
  Output Format sections (test/skills-conformance.test.ts).
- cold-start/SKILL.md description references trigger phrase
  "now what?" but the triggers: list omits it
  (test/resolver.test.ts round-trip).
- ask-user is in skills/manifest.json but has no trigger row in
  RESOLVER.md, breaking manifest reachability
  (test/resolver.test.ts).
- cold-start/SKILL.md writes_to: declares daily/, media/,
  conversations/ which aren't in skills/_brain-filing-rules.json,
  failing test/check-resolvable.test.ts.

Adds the missing skill sections, the missing trigger entries, and
three filing-rules entries to legitimize cold-start's writes_to.
The filing-rules additions describe daily/ as date-keyed (calendar +
daily notes), media/ as format-prefixed for source-format ingest
(media/x/{handle}/), and conversations/ for chat exports.

Test surface:
- bun test test/skills-conformance.test.ts → was 207 pass / 3 fail,
  now 209 pass / 0 fail.
- bun test test/resolver.test.ts → was 82 pass / 2 fail, now 84
  pass / 0 fail.
- bun test test/check-resolvable.test.ts → was 24 pass / 1 fail,
  now 25 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: scrub 'Hermes Agent' references from PR garrytan#802-introduced files

CLAUDE.md privacy doctrine forbids naming private agent forks
(Wintermute, Hermes, Neuromancer) in any public artifact: skills,
README, CHANGELOG, PR titles, commit messages, comments. The
canonical phrasing is "OpenClaw" or "your OpenClaw".

PR garrytan#802 introduced three sites that violated the rule:

- skills/ask-user/SKILL.md:79 section heading "With the `clarify`
  tool (Hermes Agent)".
- skills/ask-user/SKILL.md:80 body line "Hermes agents have a
  built-in `clarify` tool".
- skills/manifest.json ask-user description listed "Hermes clarify
  tool" alongside Telegram / Discord / CLI.

Scrub is narrow: only the three PR-introduced sites. Pre-existing
"Hermes" references elsewhere in the repo (README.md links to
NousResearch/hermes-agent, docs/integrations/credential-gateway.md,
docs/guides/cron-schedule.md, etc.) are intentional public-project
references to the open-source Hermes Agent and stay in place.

scripts/check-privacy.sh enforces the wintermute layer of the rule
on every push; the Hermes / Neuromancer doctrine layer is doctrinal
only. Future hardening (extending the script to also ban Hermes /
Neuromancer in a precise allow-listed way) is filed as TODOS.md P8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.31.10 feat: cold-start + ask-user skills

PR garrytan#802 ships the cold-start skill (day-one brain bootstrapping
across 8 phases) and the ask-user skill (choice-gate pattern).
Setup skill's Phase J auto-launches cold-start when verification
passes, closing the "now what?" gap that every new gbrain user hits.

Cold-start orchestrates existing recipes (email-to-brain,
calendar-to-brain, x-to-brain) and skills (meeting-ingestion); it
does not reinvent ingestion logic. State persists across agent
crashes via ~/.gbrain/cold-start-state.json, matching the existing
update-state.json convention. Trigger phrases include "cold start",
"fill my brain", "now what?", "bootstrap", "import my data".

Known limitations explicitly flagged in CHANGELOG:

- ClawVisor required for API-backed phases (Contacts / Calendar /
  Gmail). v0.32 will restore the dual A / B pattern that
  recipes/email-to-brain.md and recipes/calendar-to-brain.md
  already document.
- Phase-level resume granularity. Mid-phase failure restarts the
  phase from item 1; idempotent slug writes prevent duplicates.
  Per-item resume lands with the gbrain cold-start CLI counterpart
  in v0.32.

CHANGELOG entry follows the canonical release-summary spec from
CLAUDE.md:930: bold headline, 3-5 sentence lead, "What you can
now do" section, "How it works under the hood", "Known limitations",
"To take advantage of v0.31.10" block, "For contributors".

Version bumps from 0.31.2 (branch base) past master's 0.31.3 to
0.31.10. Slots 0.31.4 through 0.31.9 are reserved for in-flight
work; the gap is deliberate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Neuromancer <neuromancer@garryslist.org>
Co-authored-by: Garry Tan <garrytan@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e bumps) (garrytan#816)

* feat: thin-client upgrade prompt core (orchestrator + helpers)

Adds the maybePromptForUpgrade orchestrator with lockfile gating,
atomic state-file IO, per-entry shape validation, decision matrix,
D5 binary-advance verifier, prompt-scoped SIGINT handler, and DI
seams for tests. Sibling helper promptLineStderr in cli-util.ts
resolves to null on stdin EOF or after a 5min timeout instead of
hanging. 50 unit tests, all green.

Not wired into the CLI yet — that's the next commit.

* feat: wire thin-client upgrade prompt into the identity banner

printIdentityBannerBestEffort calls maybePromptForUpgrade after the
banner prints (both cache hit and cache miss paths). bannerSuppressed
+ BrainIdentity are now exported for the orchestrator's consumption.
The bannerSuppressed early return guarantees bannerIsSuppressed=false at
the call site.

* feat: gbrain remote doctor — thin_client_upgrade_drift check

Surfaces remote-version drift in non-TTY/quiet/CI contexts where
the interactive prompt is suppressed. Returns ok+inconclusive on
network error (informational; mcp_smoke covers the genuinely-down
case with fail). Returns ok on local>=remote or patch drift; warn
on minor/major drift with a fix hint pointing at gbrain upgrade,
or the manual install URL if state shows a prior failed attempt.
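
The drift decision above can be sketched as a pure function. This is a minimal sketch, not the project's runUpgradeDriftCheck: the names `classifyDrift` and `parseVersion` are hypothetical, and ignoring a fourth version segment (as in 0.31.4.1) is an assumption.

```typescript
// Hypothetical sketch of the drift decision; not the project's implementation.
type DriftStatus = "ok" | "warn";
type DriftKind = "none" | "patch" | "minor" | "major";

interface DriftResult {
  status: DriftStatus;
  drift: DriftKind;
}

// Parse "0.31.11" or "v0.31.4.1" into numeric segments; unparsable segments count as 0.
function parseVersion(v: string): number[] {
  return v.replace(/^v/, "").split(".").map((s) => Number.parseInt(s, 10) || 0);
}

function classifyDrift(local: string, remote: string): DriftResult {
  // Assumption: only major.minor.patch matter; a fourth segment is ignored.
  const [lMaj = 0, lMin = 0, lPat = 0] = parseVersion(local);
  const [rMaj = 0, rMin = 0, rPat = 0] = parseVersion(remote);

  if (rMaj > lMaj) return { status: "warn", drift: "major" };
  if (rMaj < lMaj) return { status: "ok", drift: "none" }; // local is ahead
  if (rMin > lMin) return { status: "warn", drift: "minor" };
  if (rMin < lMin) return { status: "ok", drift: "none" }; // local is ahead
  if (rPat > lPat) return { status: "ok", drift: "patch" }; // patch drift stays ok
  return { status: "ok", drift: "none" };
}
```

The network-error branch (ok + inconclusive) and the prior-failed-attempt hint are left out of the sketch because they depend on state the commit message does not spell out.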

Test fixture now dispatches JSON-RPC tools/call by tool name so
runUpgradeDriftCheck can exercise the full happy + prior_failed
+ stale-version paths against a real-shape MCP response.

* chore: bump version and changelog (v0.31.11)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…gbrain models CLI (garrytan#844)

* fix: canonical Anthropic model IDs + reverse alias + Opus 4.7 pricing

Replace claude-sonnet-4-6-20250929 with claude-sonnet-4-6 everywhere it
appears as a model ID. Starting with Claude 4.6, Anthropic API IDs are
dateless and pinned — the date suffix was carried forward from Sonnet 4.5
by mistake, producing a phantom ID that 404'd on every call.

Production impact in v0.31.6: isAvailable("chat") returned false in every
code path that loaded the recipe's model list, and extractFactsFromTurn
silently returned []. The headline real-time facts extraction feature
was a no-op on the happy path.

- gateway.ts:46 DEFAULT_CHAT_MODEL -> anthropic:claude-sonnet-4-6
- recipes/anthropic.ts: chat + expansion model lists drop date suffix;
  remove wrong-direction alias (claude-sonnet-4-6 -> -20250929);
  add reverse alias (-20250929 -> claude-sonnet-4-6) so stale user
  configs in models.dream.synthesize etc. keep working
- facts/extract.ts: routes through resolveModel; both fallbacks corrected
- anthropic-pricing.ts: Opus 4.7 corrected $15/$75 -> $5/$25 per
  Anthropic docs (the $15/$75 was Opus 4.0 pricing)
- cross-modal-eval/runner.ts: PRICING now reads from ANTHROPIC_PRICING
  for Anthropic models instead of duplicating the map (single source of
  truth — fixes the drift trap that motivated this whole patch)
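
A minimal sketch of the reverse-alias idea, assuming model IDs take the form `provider:model` or a bare model name. The map entry comes from this commit message; `REVERSE_ALIASES` and `canonicalModelId` are illustrative names, not the identifiers used in recipes/anthropic.ts.

```typescript
// Stale dated ID -> canonical dateless ID. Names here are illustrative only.
const REVERSE_ALIASES: Record<string, string> = {
  "claude-sonnet-4-6-20250929": "claude-sonnet-4-6",
};

// Rewrites the model part of "provider:model" (or a bare model ID) to its
// canonical form and leaves anything without an alias untouched.
function canonicalModelId(id: string): string {
  const sep = id.indexOf(":");
  const prefix = sep >= 0 ? id.slice(0, sep + 1) : "";
  const model = sep >= 0 ? id.slice(sep + 1) : id;
  return prefix + (REVERSE_ALIASES[model] ?? model);
}
```

The direction matters: mapping the stale ID to the canonical one lets old user configs keep resolving, while the earlier wrong-direction alias sent every call to the phantom ID.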

Tests: cherry-pick PR garrytan#830's test/anthropic-model-ids.test.ts verbatim
(6 recipe-shape guardrails). Update gateway-chat tests to assert reverse
alias resolves correctly. Update budget-meter test for new Opus pricing.

Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: model tier system + recipe-models merge + async reconfigure hook

Add 4-tier model routing (utility/reasoning/deep/subagent) so users can
swap defaults with one config key. Each tier maps to a class of work;
override globally via models.default or per-tier via models.tier.<tier>.

Codex flagged three real architecture issues in the v0.31.12 plan review;
this commit addresses each.

F3 — sync/async timing of configureGateway:
  - buildGatewayConfig stays synchronous (pre-engine-connect callers
    keep working)
  - New reconfigureGatewayWithEngine(engine) async function re-resolves
    expansion + chat defaults through resolveModel after engine.connect()
  - cli.ts wires the re-stamp into the post-connect path

F4/F5 — softening assertTouchpoint was too broad:
  - Earlier plan was to flip native-recipe validation from throw to warn,
    affecting gateway.chat AND gateway.expand AND gateway.embed
  - Instead: per-gateway-instance recipe-models merge. assertTouchpoint
    gets an optional extendedModels Set; when the user opted into a model
    via config, it bypasses the throw. Source-code typos still fail fast.
  - Existing contract test (test/ai/gateway-chat.test.ts:106) preserved

Tier defaults are TIER_DEFAULTS in model-config.ts. Resolution chain
inserts at step 5 (between models.default and env var). Each existing
resolveModel call site gains a tier: arg — think (deep), cycle/synthesize
(reasoning + utility for verdict), patterns/drift (reasoning), auto-think
(deep), facts/extract (reasoning).
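
A simplified sketch of tier lookup, covering only the per-tier override and the tier default. The task-to-tier mapping is taken from the list above; everything else is an assumption: the `example:*` defaults are placeholders (only the subagent default is named in this PR, and its `anthropic:` prefix is assumed), and the rest of the resolution chain (models.default, env var, per-task keys) is omitted.

```typescript
type Tier = "utility" | "reasoning" | "deep" | "subagent";

// Call site -> tier, as listed in the commit message.
const TASK_TIER: Record<string, Tier> = {
  think: "deep",
  "auto-think": "deep",
  patterns: "reasoning",
  drift: "reasoning",
  "facts/extract": "reasoning",
  "cycle/synthesize": "reasoning",
};

// Placeholder defaults; only the subagent entry is grounded in the source.
const TIER_DEFAULTS_SKETCH: Record<Tier, string> = {
  utility: "example:utility-model",
  reasoning: "example:reasoning-model",
  deep: "example:deep-model",
  subagent: "anthropic:claude-sonnet-4-6",
};

interface ModelsConfig {
  tier?: Partial<Record<Tier, string>>; // models.tier.<tier>
}

// Per-tier override wins; otherwise fall back to the tier default.
function resolveTierModel(task: string, models: ModelsConfig = {}): string {
  const tier = TASK_TIER[task];
  if (tier === undefined) throw new Error(`unknown task: ${task}`);
  return models.tier?.[tier] ?? TIER_DEFAULTS_SKETCH[tier];
}
```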

Plus 10 new tests pinning tier precedence, subagent-tier fallback when
models.default is non-Anthropic, and the F6 alias-chain conflict case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: subagent runtime enforcement for non-Anthropic models (3 layers)

The subagent loop uses Anthropic's Messages API with prompt caching on
system + tools. OpenAI/Google have different shapes. Setting
models.default = openai:gpt-5.5 and routing the subagent there silently
breaks the loop.

Codex F1+F2+F13 in the v0.31.12 plan review pointed out that "warn at
doctor" wasn't enough — handlers/subagent.ts:148 still did
`const model = data.model ?? DEFAULT_MODEL` and called Anthropic directly,
so a job submitted with data.model = openai:gpt-5.5 bypassed any tier
logic and failed at runtime with a confusing provider error.

Three layers of enforcement, defense in depth:

Layer 1 (queue.ts:add) — submit-time guard. When name === 'subagent'
and data.model is set, validate the provider. Non-Anthropic rejects
before the job enters the queue.

Layer 2 (handlers/subagent.ts) — tier-resolution fallback. The handler
routes through resolveModel({ tier: 'subagent' }). If the chain resolves
to a non-Anthropic provider (via models.default or models.tier.subagent),
the resolver warns + falls back to TIER_DEFAULTS.subagent
(claude-sonnet-4-6).

Layer 3 (doctor.ts:checkSubagentProvider) — surfacing layer. Warns when
models.tier.subagent or models.default is explicitly set to a
non-Anthropic provider, with a paste-ready fix command. Lets users see
config drift before submitting a job.
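
Layers 1 and 2 can be sketched as two small functions. This is a hedged illustration, not the code in queue.ts or handlers/subagent.ts: the function bodies, the error text, and the treatment of bare `claude-*` IDs as Anthropic are all assumptions.

```typescript
const SUBAGENT_DEFAULT = "anthropic:claude-sonnet-4-6";

// Treats "anthropic:<model>" as Anthropic. Accepting bare "claude-*" IDs is an assumption.
function isAnthropicModel(modelId: string): boolean {
  const sep = modelId.indexOf(":");
  if (sep < 0) return modelId.startsWith("claude-");
  return modelId.slice(0, sep) === "anthropic";
}

// Layer 1: reject at submit time, before the job enters the queue.
function assertSubagentModel(jobName: string, model?: string): void {
  if (jobName === "subagent" && model !== undefined && !isAnthropicModel(model)) {
    throw new Error(`subagent jobs require an Anthropic model, got "${model}"`);
  }
}

// Layer 2: warn and fall back when the resolved model is not Anthropic.
function resolveSubagentModel(resolved: string, warn: (msg: string) => void): string {
  if (isAnthropicModel(resolved)) return resolved;
  warn(`subagent tier resolved to "${resolved}"; falling back to ${SUBAGENT_DEFAULT}`);
  return SUBAGENT_DEFAULT;
}
```

Layer 1 fails loudly because an explicit data.model is a caller decision; layer 2 degrades quietly because a global models.default was probably set for other tiers and should not break the subagent loop.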

Tests: 3 new cases in test/agent-cli.test.ts asserting the queue-level
guard rejects non-Anthropic data.model. Existing test/subagent-handler
suite still passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gbrain models CLI + doctor probe + silent-no-op regression test

New gbrain models CLI gives the agent and user visibility into routing.
Read mode prints the tier table, current overrides, per-task config,
and aliases with source-of-truth attribution per row. Doctor subcommand
fires a 1-token probe to each configured chat/expansion model and
classifies failures (model_not_found / auth / rate_limit / network /
unknown) so config-time invalid IDs surface without waiting for a
production call that silently degrades.
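
One way the failure classes could be derived is from the HTTP status of the probe response. The five class names come from the text above; the status-code mapping below is an assumption for illustration, and the real probe may inspect provider error bodies instead.

```typescript
type ProbeFailure = "model_not_found" | "auth" | "rate_limit" | "network" | "unknown";

// `status` is the HTTP status if a response arrived, or undefined if the
// request never completed (DNS failure, refused connection, timeout).
function classifyProbeFailure(status: number | undefined): ProbeFailure {
  if (status === undefined) return "network";
  if (status === 404) return "model_not_found"; // assumed: invalid model IDs surface as 404
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate_limit";
  return "unknown";
}
```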

Per Codex F11 — no specific dollar cost claim in either the help text
or the CHANGELOG (providers have minimum-output billing and prompt-cache
rounding that vary). Probe is opt-in (gbrain doctor --probe-models),
never auto-runs. --skip=<provider> narrows the matrix for cost-sensitive
operators.

Per Codex F7+F8+F15 (the structural regression gap): new
test/facts-extract-silent-no-op.test.ts is THE regression test for the
bug class that motivated v0.31.12. Five cases including the smoking-gun:
when chat IS available, extractFactsFromTurn MUST actually call the chat
transport, not silently return []. Uses the gateway's
__setChatTransportForTests seam so it runs in every shard with no API key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.31.12)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: document v0.31.12 model tier system + gbrain models CLI

Add CLAUDE.md Key Files annotations for the v0.31.12 work:
src/core/model-config.ts (tier system + isAnthropicProvider + TIER_DEFAULTS),
src/core/ai/model-resolver.ts (assertTouchpoint extendedModels arg),
src/core/ai/gateway.ts (reconfigureGatewayWithEngine + extended-models registry),
src/core/minions/queue.ts (subagent submit-time guard, layer 1 of 3),
src/commands/models.ts (new gbrain models CLI + doctor probe),
src/commands/doctor.ts (subagent_provider check, layer 3 of 3),
src/core/ai/recipes/anthropic.ts (canonical model IDs + reverse alias),
src/core/anthropic-pricing.ts (Opus 4.7 corrected to $5/$25).

Add CLAUDE.md commands section for gbrain models + gbrain models doctor
+ power-user config recipes. Add README.md command-table rows for the
same. Regenerate llms-full.txt so the bundled docs stay in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: scrub --probe-models reference (flag not actually wired)

The v0.31.12 CHANGELOG and skills/conventions/model-routing.md both
referenced `gbrain doctor --probe-models` as an integrated probe entry
point. The flag was never implemented — only `gbrain models doctor`
landed as the probe surface. Caught by /document-release subagent.

Drop the references rather than wire an untested flag at the last minute.
The probe is reachable via `gbrain models doctor`; users who want it
in doctor's output run that command separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…17-PR cluster) (garrytan#810)

* feat(ai/types): add resolveAuth + probe + user_provided_models fields

Foundation commit for the embedding-provider fix-wave (5 API-key recipes
+ discoverability pass). Three optional additions to the recipe contract:

- `EmbeddingTouchpoint.user_provided_models?: true` (D8=A): flag for
  recipes that ship without a fixed model list. Consumed by the contract
  test (permits empty `models[]`), gateway.ts:223 (replaces hardcoded
  `recipe.id === 'litellm'` check in a follow-up commit), and
  init.ts:resolveAIOptions (refuses implicit "first model" pick for
  shorthand `--model <provider>`).

- `Recipe.resolveAuth?(env): {headerName, token}` (D12=A): unified auth
  seam across embed / expansion / chat. Default behavior (returns
  `Authorization: Bearer <env-key>`) covers the existing 9 recipes
  unchanged. Recipes deviating (Azure with `api-key:`; future OAuth
  providers) override this single seam instead of adding parallel
  mechanisms in 3 places. Codex review caught that auth was triplicated
  at gateway.ts:281/728/931; D12=A unifies all three in one follow-up
  commit.

- `Recipe.probe?(): Promise<{ready, hint?}>` (D13=A): recipe-owned
  readiness check for local-server providers (ollama, llama-server).
  Replaces the hardcoded `recipe.id === 'ollama'` special case in
  providers.ts. Wrapped in 200ms timeout at the call sites.

Pure type additions — no behavior change. Typecheck green; existing 9
recipes work unchanged because all three fields are optional.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (decisions
D8=A, D11=C, D12=A, D13=A).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai/gateway): unify openai-compatible auth via Recipe.resolveAuth (D12=A)

Pre-v0.32, openai-compatible auth was duplicated 3 times in gateway.ts at
instantiateEmbedding, instantiateExpansion, instantiateChat — with subtle
drift (embedding had a `${recipe.id.toUpperCase()}_API_KEY` fallback the
other two lacked). Codex outside-voice review caught this during /plan-eng-review.

D12=A: unify all three through `Recipe.resolveAuth?(env)` (declared in the
prior commit). Two new module-level helpers:

- `defaultResolveAuth(recipe, env, touchpoint)` — applied when a recipe
  doesn't declare its own resolver. Returns Authorization Bearer with
  `auth_env.required[0]`, falling back to the first present
  `auth_env.optional` env var, or 'unauthenticated' for no-auth recipes
  like Ollama. Throws AIConfigError with the recipe's setup_hint when
  required env is missing.

- `applyResolveAuth(recipe, cfg, touchpoint)` — returns
  `createOpenAICompatible` options. Bearer-via-Authorization paths use
  the SDK's native `apiKey` field; custom-header paths (Azure: api-key)
  use `headers` and OMIT apiKey to avoid double-auth leaks.
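
The two helpers described above might look roughly like this. It is a sketch under stated assumptions: the `*Sketch` names, the plain `Error` (the source throws AIConfigError), and carrying the `Bearer ` prefix inside `token` are illustrative choices, not the gateway's actual signatures.

```typescript
interface AuthEnvSpec { required: string[]; optional?: string[] }
interface ResolvedAuth { headerName: string; token: string }
type Env = Record<string, string | undefined>;

// Default resolver: required[0], else first present optional var, else "unauthenticated".
function defaultResolveAuthSketch(recipeId: string, spec: AuthEnvSpec, env: Env): ResolvedAuth {
  const requiredName = spec.required[0];
  if (requiredName !== undefined) {
    const key = env[requiredName];
    if (!key) throw new Error(`${recipeId}: missing required env ${requiredName}`);
    return { headerName: "Authorization", token: `Bearer ${key}` };
  }
  const optionalName = (spec.optional ?? []).find((name) => Boolean(env[name]));
  const key = optionalName !== undefined ? env[optionalName] : undefined;
  return { headerName: "Authorization", token: `Bearer ${key ?? "unauthenticated"}` };
}

// Bearer auth maps onto the SDK's apiKey option; any other header is sent via
// `headers` with apiKey omitted, so the key is never sent twice.
function applyResolveAuthSketch(
  auth: ResolvedAuth,
): { apiKey: string } | { headers: Record<string, string> } {
  if (auth.headerName.toLowerCase() === "authorization" && auth.token.startsWith("Bearer ")) {
    return { apiKey: auth.token.slice("Bearer ".length) };
  }
  return { headers: { [auth.headerName]: auth.token } };
}
```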

The 3 `case 'openai-compatible':` branches in instantiateEmbedding (line
~281), instantiateExpansion (line ~728), instantiateChat (line ~931) each
collapse from ~10 lines of bespoke auth handling to a single
`applyResolveAuth(recipe, cfg, '<touchpoint>')` call.

Also: the litellm-template hardcode at gateway.ts:223 (`recipe.id ===
'litellm'`) is replaced with a union check for
`EmbeddingTouchpoint.user_provided_models === true` (D8=A wire-through
per Codex finding #3). Pre-v0.32 builds keep working via a back-compat
`recipe.id === 'litellm'` clause; new recipes declaring
user_provided_models pick up the same gating automatically.

Existing 9 recipes (openai, anthropic, google, deepseek, groq, ollama,
litellm-proxy, together, voyage) gain zero per-recipe edits — the
default resolver covers their existing behavior. Behavior change for
ollama expansion/chat only: now reads OLLAMA_API_KEY when set (pre-v0.32
silently passed 'unauthenticated' for those touchpoints; embedding
already read it). Ollama servers ignore the header so no real-world
impact; this aligns the 3 touchpoints.

Tests: bun test test/ai/ — 77/77 pass.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (D8=A,
D12=A; addresses Codex findings #3, #4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(ai): IRON RULE regression test for v0.32 resolveAuth refactor

Pins the contract that the v0.32 D2/D12=A resolveAuth refactor preserves
auth behavior for the 9 existing recipes (openai, anthropic, google,
deepseek, groq, ollama, litellm-proxy, together, voyage).

10 cases covering:
- the 9 expected recipe ids are still registered
- every recipe with non-empty required[] returns Authorization Bearer <key>
- missing required env throws AIConfigError naming recipe + touchpoint + env-var
- Ollama (empty required, optional set) reads first present optional env
- Ollama (no env) falls back to "Bearer unauthenticated"
- all 3 touchpoints (embedding/expansion/chat) produce identical auth
  shape for the same recipe + env (this is the core regression: pre-v0.32,
  embedding had a fallback the other two lacked)
- applyResolveAuth converts Authorization Bearer to {apiKey} (SDK-native)
- applyResolveAuth respects a custom-header override (Azure preview; the
  recipe ships in commit 8) and emits {headers} WITHOUT apiKey to avoid
  double-auth
- native-* recipes (openai, anthropic, google) intentionally have no
  resolveAuth declared (they use AI-SDK adapters directly)
- all openai-compatible recipes ship without resolveAuth in v0.32 (default
  applies); the first override is Azure in commit 8

Also: export `defaultResolveAuth` and `applyResolveAuth` as @internal
gateway helpers so tests can pin them directly. Mirrors the pattern of
`splitByTokenBudget` and `isTokenLimitError` already exported with the
same @internal annotation.

Tests: bun test test/ai/ — 87/87 pass (10 new + 77 existing).
Typecheck: clean.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (IRON RULE
per Section 3 test review).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add llama-server recipe (garrytan#702 reworked)

10th recipe in the registry; first to ship Recipe.probe (D13=A) and the
second user_provided_models recipe (litellm-proxy is the first).

llama.cpp's llama-server exposes an OpenAI-compatible /v1/embeddings
endpoint. Distinct from Ollama: different default port (8080), different
model-management story (you launch it with --model <path>; the server
serves whatever was passed). Recipe ships with `models: []`,
`user_provided_models: true`, `default_dims: 0` so the wizard refuses
implicit defaults and forces explicit --embedding-model + --embedding-dimensions.

Added:
- src/core/ai/recipes/llama-server.ts (61 lines)
- probeLlamaServer() in src/core/ai/probes.ts; reads
  LLAMA_SERVER_BASE_URL with default http://localhost:8080/v1
- Registered in src/core/ai/recipes/index.ts (10 recipes total now)
- test/ai/recipe-llama-server.test.ts (8 cases): registered + shape,
  user_provided_models flag, probe declared + reachability fail-with-hint,
  default-auth covering no-env / API_KEY / URL-shaped-only paths

Hardening: defaultResolveAuth in gateway.ts now skips URL-shaped optional
env entries (names ending in _URL or _BASE_URL) when picking a fallback
auth token. Pre-fix, OLLAMA_BASE_URL=http://my-ollama would have become
the Bearer token; Ollama ignores it but llama-server (and future
local-server recipes) shouldn't depend on the server tolerating garbage
auth. The regression test (recipes-existing-regression) gains one case
pinning this contract.
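
The hardening can be illustrated with a small filter over the optional env names. A minimal sketch: `isUrlShapedEnvName` and `pickOptionalAuthToken` are hypothetical names, and the suffix test is the only rule taken from the text above.

```typescript
type Env = Record<string, string | undefined>;

// Names ending in _URL (which includes _BASE_URL) hold endpoints, not credentials.
function isUrlShapedEnvName(name: string): boolean {
  return name.endsWith("_URL");
}

// Returns the first present optional env value that is not URL-shaped, or
// undefined when none qualifies (the caller then falls back to "unauthenticated").
function pickOptionalAuthToken(optional: string[], env: Env): string | undefined {
  for (const name of optional) {
    if (isUrlShapedEnvName(name)) continue;
    const value = env[name];
    if (value) return value;
  }
  return undefined;
}
```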

Per-recipe test file follows D7=B (per-recipe over DRY for readability).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 4
of 11). Reworked from garrytan#702 because the original PR didn't model the
recipe-owned probe pattern (D13=A) or user_provided_models (D8=A).

Tests: bun test test/ai/ — 95/95 pass (8 new + 87 existing).

Co-Authored-By: SiyaoZheng <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add MiniMax recipe (garrytan#148 reworked)

11th recipe. embo-01 model, 1536 dims, $0.07/1M tokens.

OpenAI-compatible at api.minimax.chat. MiniMax requires a `type:
'db' | 'query'` field for asymmetric retrieval (documents indexed with
type='db', queries embedded with type='query'). gbrain has no
query/document signal at the embed-call site today, so v1 defaults to
type='db' for both indexing and retrieval — same vector space, symmetric
similarity. Asymmetric query support is a follow-up TODO that needs the
embed seam to thread query/document context.

Plumbed via src/core/ai/dims.ts: dimsProviderOptions returns
{openaiCompatible: {type: 'db'}} for modelId === 'embo-01'.

Conservative max_batch_tokens=4096 declared (MiniMax docs don't publish
the limit). Recursive halving in the gateway catches token-limit errors
at runtime.
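
Recursive halving can be sketched as below. The sketch is synchronous for brevity (a real embed call would be async), and `looksLikeTokenLimit` is a hypothetical stand-in for whatever detection the gateway actually uses.

```typescript
type EmbedFn = (texts: string[]) => number[][];

// Hypothetical stand-in for the gateway's token-limit detection.
function looksLikeTokenLimit(err: unknown): boolean {
  return err instanceof Error && /token limit/i.test(err.message);
}

// Try the whole batch; on a token-limit error, split it in half and retry each
// half, preserving input order in the concatenated result.
function embedWithHalving(texts: string[], embed: EmbedFn): number[][] {
  if (texts.length === 0) return [];
  try {
    return embed(texts);
  } catch (err) {
    // A single oversized input cannot be split further; surface the error.
    if (!looksLikeTokenLimit(err) || texts.length === 1) throw err;
    const mid = Math.ceil(texts.length / 2);
    return [
      ...embedWithHalving(texts.slice(0, mid), embed),
      ...embedWithHalving(texts.slice(mid), embed),
    ];
  }
}
```

A conservative declared max_batch_tokens keeps the first attempt likely to succeed; the halving path only pays extra round trips when the estimate was still too generous.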

Tests: bun test test/ai/ — 101/101 (6 new + 95 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 5
of 11). Reworked from garrytan#148.

Co-Authored-By: cacity <20351699+cacity@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add Alibaba DashScope recipe (garrytan#59 split, part 1/2)

12th recipe. text-embedding-v3 (current) + text-embedding-v2; 1024
default dims with Matryoshka options [64, 128, 256, 512, 768, 1024].

OpenAI-compatible at dashscope-intl.aliyuncs.com. China-region users
override via cfg.base_urls['dashscope']; v0.32 ships with the
international default.

Conservative max_batch_tokens=8192 + chars_per_token=2 declared because
Alibaba doesn't publish a hard batch limit and text-embedding-v3 mixes
English + CJK heavily (CJK density closer to Voyage than OpenAI tiktoken).

Tests: bun test test/ai/ — 106/106 (5 new + 101 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 6
of 11). Reworked from garrytan#59 (DashScope+Zhipu split into 2 commits per
the plan; Zhipu lands next).

Co-Authored-By: Magicray1217 <267836857+Magicray1217@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add Zhipu AI (BigModel) recipe (garrytan#59 split, part 2/2)

13th recipe. embedding-3 (current) + embedding-2; 1024 default dims
with Matryoshka options [256, 512, 1024, 2048].

OpenAI-compatible at open.bigmodel.cn. embedding-3 at 2048 dims exceeds
pgvector's HNSW cap of 2000 — those brains fall back to exact vector
scans via the existing chunkEmbeddingIndexSql policy at
src/core/vector-index.ts. Default stays at 1024 (HNSW-fast); users who
want maximum fidelity opt into 2048 via --embedding-dimensions and
accept the slower retrieval.

Tests pin the HNSW boundary: 1024 returns the index SQL, 2048 returns
the skip-index/exact-scan SQL.
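
The boundary those tests pin can be sketched as follows. pgvector's 2000-dimension limit for HNSW indexes on the `vector` type is real; the table, column, and index names are hypothetical, the cosine operator class is an assumption, and returning `null` (rather than skip-index SQL, as the real chunkEmbeddingIndexSql does) is a simplification.

```typescript
const PGVECTOR_HNSW_MAX_DIMS = 2000;

// Returns index DDL, or null to signal "skip the index and use exact scans".
// Table, column, and index names are placeholders.
function chunkEmbeddingIndexSqlSketch(dims: number): string | null {
  if (dims > PGVECTOR_HNSW_MAX_DIMS) return null;
  return (
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw " +
    "ON chunks USING hnsw (embedding vector_cosine_ops)"
  );
}
```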

Tests: bun test test/ai/ — 112/112 (6 new + 106 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 7
of 11). Reworked from garrytan#59. Together with DashScope (commit 6), closes
the China-region embedding gap users repeatedly reported (DashScope
covers Alibaba, Zhipu covers BigModel; both ship with international
endpoints by default).

Co-Authored-By: Magicray1217 <267836857+Magicray1217@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): add Azure OpenAI recipe (garrytan#459 reworked)

14th recipe and the first to exercise both v0.32 architectural seams:

- resolveAuth (D12=A) returns `{headerName: 'api-key', token: <key>}`
  instead of the default Authorization Bearer. Azure rejects double-auth,
  so applyResolveAuth puts the key in `headers` and OMITS apiKey.
- A new `Recipe.resolveOpenAICompatConfig?(env)` seam (Recipe.ts) lets
  the recipe template the baseURL from env (Azure: ENDPOINT + DEPLOYMENT
  combine into a non-/v1 path) and inject a custom fetch wrapper that
  splices ?api-version= onto every request URL.

The fetch wrapper is cast with `as unknown as typeof fetch`; the AI SDK
never calls the `preconnect()` method that the strict `fetch` type
declares, so the cast is sound. `applyOpenAICompatConfig` (new gateway helper) routes through
the recipe override or falls back to the pre-v0.32 base_urls/base_url_default
behavior — existing 13 recipes get zero behavior change.

API version defaults to `2024-10-21` (current stable as of 2026-05);
override via AZURE_OPENAI_API_VERSION env. Endpoint trailing slash gets
stripped during URL construction so users can copy-paste from the Azure
portal.
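
The URL handling described in this commit can be sketched with two helpers. The `/openai/deployments/<deployment>` path follows Azure's published REST convention; the function names are hypothetical, and passing endpoint and deployment as parameters (rather than reading specific env vars) sidesteps names the commit message does not spell out.

```typescript
const DEFAULT_API_VERSION = "2024-10-21";

// Build the deployment-scoped base URL, stripping trailing slashes so a
// copy-pasted portal endpoint works.
function azureBaseUrl(endpoint: string, deployment: string): string {
  if (!endpoint || !deployment) {
    throw new Error("Azure OpenAI needs both an endpoint and a deployment name");
  }
  const trimmed = endpoint.replace(/\/+$/, "");
  return `${trimmed}/openai/deployments/${encodeURIComponent(deployment)}`;
}

// Add ?api-version= unless the caller already set one.
function withApiVersion(url: string, apiVersion: string = DEFAULT_API_VERSION): string {
  const parsed = new URL(url);
  if (!parsed.searchParams.has("api-version")) {
    parsed.searchParams.set("api-version", apiVersion);
  }
  return parsed.toString();
}
```

In the recipe this logic would sit inside the custom fetch wrapper, applied to each request URL before delegating to the global fetch.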

Tests (12 cases in test/ai/recipe-azure-openai.test.ts):
- resolveAuth returns api-key NOT Authorization Bearer
- applyResolveAuth puts key in headers, NOT apiKey (no double-auth)
- baseURL templating from endpoint + deployment, with trailing-slash strip
- AIConfigError on missing endpoint OR deployment
- fetch wrapper splices api-version (default + AZURE_OPENAI_API_VERSION override)
- fetch wrapper does NOT double-add api-version when caller already set it
- applyOpenAICompatConfig honors recipe override

IRON RULE regression test updated: now asserts azure-openai is the
documented exception that overrides resolveAuth; any future override
needs review.

Tests: bun test test/ai/ — 124/124 (12 new + 112 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 8
of 11, plus the resolveOpenAICompatConfig seam discovered during fold-in).
Reworked from garrytan#459. The original PR proposed a hardcoded AzureOpenAI
client switch; this implementation routes through the unified seams so
future Azure-shaped providers (other custom-URL services) can reuse them.

Co-Authored-By: JamesJZhang <32652444+JamesJZhang@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ai): adjacent fixes — no_batch_cap (garrytan#779) + config-key fallbacks (garrytan#121)

Two small ergonomics fixes folded together (garrytan#765 deferred — see TODOS.md
follow-up; the CJK PGLite extraction was bigger than the plan estimated).

garrytan#779 reworked (alexandreroumieu-codeapprentice): silence the
missing-max_batch_tokens startup warning for recipes with genuinely
dynamic batch capacity. New `EmbeddingTouchpoint.no_batch_cap?: true`
field. Set on ollama (capacity depends on locally loaded model +
OLLAMA_NUM_PARALLEL), litellm-proxy (depends on backend), llama-server
(set by --ctx-size at server launch). Three fewer stderr warnings on
every gateway configure; google still warns (it's a real fixed-cap
provider that ought to ship a max_batch_tokens declaration).

Bonus: litellm-proxy now declares `user_provided_models: true`, removing
the last consumer of the legacy `recipe.id === 'litellm'` hardcode in
gateway.ts:223 (D8=A wire-through completion).

garrytan#121 reworked (vinsew): self-contained API keys. Two parts:

  1. config.ts: ANTHROPIC_API_KEY env merge was silently missing.
     loadConfig() merged OPENAI_API_KEY but not ANTHROPIC_API_KEY into
     the file-config-shape result. One-line addition.

  2. cli.ts:buildGatewayConfig: when ~/.gbrain/config.json declares
     openai_api_key / anthropic_api_key but the process env doesn't
     have those env vars set (common for launchd-spawned daemons,
     agent subprocess tools, containers that don't propagate
     ~/.zshrc), fold the config-file values into the gateway env
     snapshot. Process env still wins (loaded last) so per-process
     overrides keep working.
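
The precedence in part 2 can be sketched as a merge where config-file keys are applied first and the process env last. The config key names come from the text above; `buildGatewayEnvSketch` is a hypothetical name, and treating an empty env value as unset is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface FileConfig {
  openai_api_key?: string;
  anthropic_api_key?: string;
}

function buildGatewayEnvSketch(fileConfig: FileConfig, processEnv: Env): Env {
  // Start from the config-file values.
  const merged: Env = {
    OPENAI_API_KEY: fileConfig.openai_api_key,
    ANTHROPIC_API_KEY: fileConfig.anthropic_api_key,
  };
  // Process env is applied last so it wins; unset or empty vars do not
  // clobber values that came from the config file.
  for (const [name, value] of Object.entries(processEnv)) {
    if (value) merged[name] = value;
  }
  return merged;
}
```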

Tests (4 cases in test/ai/no-batch-cap-suppression.test.ts):
- Ollama / LiteLLM / llama-server all declare no_batch_cap: true
- configureGateway does NOT warn for those three
- configureGateway STILL warns for google (regression guard)
- Cross-cutting invariant: empty-models recipes declare user_provided_models

Tests: bun test test/ai/ — 128/128 (4 new + 124 prior).

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 9 of 11).
garrytan#765 (Hunyuan PGLite + CJK keyword fallback) deferred to TODOS.md
follow-up; the CJK extraction (~150 lines + scoring logic + tests) is
larger than the wave's adjacent-fix lane should carry. Closes that PR
with a deferral note.

Co-Authored-By: alexandreroumieu-codeapprentice <noreply@github.com>
Co-Authored-By: vinsew <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(discoverability): doctor alt-provider advisory + init user_provided_models refusal

Two small but high-leverage changes that address the discoverability
problem the v0.32 wave is trying to fix.

src/commands/doctor.ts: new `alternative_providers` check (8c). After
the existing embedding-provider smoke test, walks listRecipes() and
surfaces any recipe whose required env vars are ALL present in the
process env but that is not the currently configured provider. Reports as
status: 'ok' with an informational message — never errors. Helps users
discover that, e.g., `OPENAI_API_KEY=x DASHSCOPE_API_KEY=y` configured
for openai means they have a Chinese-region alternative ready without
extra setup.
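
The advisory boils down to a filter over the recipe list. A minimal sketch, assuming a simplified recipe shape: `RecipeInfo` and `alternativeProviders` are hypothetical, and excluding recipes with no required env vars (which would otherwise always match) is an assumption about the intended behavior.

```typescript
interface RecipeInfo { id: string; requiredEnv: string[] }
type Env = Record<string, string | undefined>;

// Recipes that are fully credentialed in the environment but not in use.
function alternativeProviders(recipes: RecipeInfo[], configuredId: string, env: Env): string[] {
  return recipes
    .filter((r) => r.id !== configuredId)
    // Assumption: no-auth recipes are skipped, since "all required vars
    // present" is trivially true for an empty list.
    .filter((r) => r.requiredEnv.length > 0 && r.requiredEnv.every((name) => Boolean(env[name])))
    .map((r) => r.id);
}
```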

src/commands/init.ts: user_provided_models recipes (litellm, llama-server)
now refuse the implicit "first model" pick from shorthand --model with
a structured setup hint pointing the user at the explicit form
`--embedding-model <provider>:<your-model-id> --embedding-dimensions <N>`.
Pre-fix, shorthand --model litellm threw "no embedding models listed"
which was technically correct but unhelpful. The new error includes the
recipe's setup_hint when available.

Tests: bun test test/ai/ — 128/128 pass; typecheck clean.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 10
of 11). The full interactive provider chooser in init.ts (the bigger
piece of the discoverability lane) is deferred to a v0.32.x follow-up;
this commit ships the doctor advisory + cleaner refusal that close the
80% case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(v0.32.0): embedding-providers.md + README callout + CHANGELOG + TODOS.md

Final commit of the v0.32 wave. Closes the discoverability gap that
generated the 17-PR community cluster.

- New docs/integrations/embedding-providers.md: capability matrix, decision
  tree, per-recipe one-pagers, OAuth provider notes, "my provider isn't
  listed" pointer to LiteLLM proxy. Voice: capability not marketing per
  CLAUDE.md voice rules.

- README.md: embedding-providers callout near the top, naming the count
  (14 recipes) and pointing at the new doc.

- CHANGELOG.md: v0.32.0 entry following the verdict-headline format from
  CLAUDE.md voice rules. Lead-with-numbers ("14 providers, 5 new"), what-this-
  means-for-users closer, "to take advantage" upgrade block, itemized
  changes, contributor credits, deferred-with-context list.

- VERSION + package.json: 0.31.1 → 0.32.0. Minor bump justified by the
  new public Recipe surface (resolveAuth, resolveOpenAICompatConfig, probe,
  user_provided_models, no_batch_cap fields), the new OAuth subsystem
  scaffold (deferred to v0.32.x but typed in v0.32.0), and the 5 new
  recipes.

- TODOS.md: 7 follow-up entries for the v0.32 wave's deferred work
  (Vertex ADC, Copilot OAuth, Codex OAuth, CJK PGLite, interactive
  wizard, real-credentials CI matrix, MiniMax asymmetric retrieval,
  multimodal hardcode un-stuck). Each entry has full context + the
  exact file paths + the spike work needed so a future contributor can
  pick up cleanly.

Tests: bun test test/ai/ — 128/128 pass; typecheck clean.

Plan: ~/.claude/plans/ok-lets-turn-this-enumerated-sonnet.md (commit 11
of 11). Wave complete: 11 commits, ~1500 net lines, 5 new recipes, full
docs, doctor advisory, IRON RULE regression test, 7 TODOS for the
v0.32.x follow-up wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: regenerate llms.txt + llms-full.txt for v0.32.0

After commit c384fad added the embedding-providers callout to README.md,
the committed llms-full.txt drifted from the generator output and the
build-llms test failed. Running `bun run build:llms` regenerates both
files. The single line addition is the README callout pointing at
docs/integrations/embedding-providers.md.

Tests: bun test test/build-llms.test.ts — 7/7 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: hermetic GBRAIN_HOME for brain-registry serial flake + withEnv on recipe-llama-server

Two test-isolation cleanups uncovered while shipping v0.32.

test/brain-registry.serial.test.ts (the BrainRegistry "empty/null/undefined
id routes to host" test): pre-existing flake on dev machines that have a
real ~/.gbrain/config.json. The test asserts getBrain(null) REJECTS but
on those machines the host-init path RESOLVES instead (it found the
maintainer's actual brain). The fix pins GBRAIN_HOME to a guaranteed-empty
tempdir for the test's duration so host-init has nothing to find and fails
loudly with a non-UnknownBrainError — exactly what the assertion wants.
File is .serial.test.ts so direct process.env mutation is allowed by the
test-isolation linter (R1 quarantine).

test/ai/recipe-llama-server.test.ts: rewrites the manual beforeEach/afterEach
env save/restore as withEnv() per the canonical pattern in
test/helpers/with-env.ts. The original was correct in behavior but tripped
the test-isolation linter (R1: process.env mutation). withEnv() is exactly
the cross-test-safe save+try/finally+restore the manual code did, just
factored out. No behavior change.
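
The save + try/finally + restore pattern looks roughly like this. It is a sketch, not the helper in test/helpers/with-env.ts: the signature is hypothetical, it is synchronous, and it takes the env record as a parameter (standing in for process.env) so the example stays self-contained.

```typescript
type Env = Record<string, string | undefined>;

// Apply overrides, run fn, and restore the previous values even if fn throws.
// An override of undefined removes the variable for the duration of fn.
function withEnvSketch<T>(env: Env, overrides: Env, fn: () => T): T {
  const saved = new Map<string, string | undefined>();
  for (const name of Object.keys(overrides)) {
    saved.set(name, env[name]);
  }
  try {
    for (const name of Object.keys(overrides)) {
      const value = overrides[name];
      if (value === undefined) delete env[name];
      else env[name] = value;
    }
    return fn();
  } finally {
    saved.forEach((value, name) => {
      if (value === undefined) delete env[name];
      else env[name] = value;
    });
  }
}
```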

Tests: bun run test — 5217 pass / 0 fail (was 5027 / 1 pre-existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address 5 codex pre-merge findings (dim passthrough + URL routing + MiniMax host)

Codex adversarial review during /ship caught five real production bugs.
All five fixed with regression test coverage.

1. **dimsProviderOptions on openai-compatible** (src/core/ai/dims.ts):
   text-embedding-3-* (Azure), text-embedding-v3 (DashScope), and
   embedding-3 (Zhipu) now thread `dimensions` to the wire. Without this,
   Azure-default 3072d hard-fails a 1536d brain on first embed; DashScope
   and Zhipu Matryoshka requests silently get the provider's default size
   instead of what the user asked for. New tests in
   recipe-azure-openai/dashscope/zhipu pin the contract.

2. **`gbrain init --embedding-model llama-server:foo` verbose path**
   (src/commands/init.ts): now refuses without `--embedding-dimensions`
   for user_provided_models recipes. Pre-fix, the shorthand `--model`
   path was guarded but the verbose `--embedding-model` path fell through
   to configureGateway's 1536d default and silently created the wrong-
   width schema; failure surfaced only at first real embed.

3. **MiniMax host correction** (src/core/ai/recipes/minimax.ts):
   `api.minimax.chat/v1` → `api.minimaxi.com/v1` matches MiniMax's
   current OpenAI-compatible docs. Default-config users would have hit
   the wrong endpoint before auth or model selection mattered.

4. **`LLAMA_SERVER_BASE_URL` reaches the gateway** (src/cli.ts:
   buildGatewayConfig): env-set local-server URLs (LLAMA_SERVER_BASE_URL,
   OLLAMA_BASE_URL, LMSTUDIO_BASE_URL, LITELLM_BASE_URL) now thread into
   `cfg.base_urls` so embed traffic hits the configured port. Pre-fix,
   the probe would succeed against a custom port while real embed calls
   went to localhost:8080. Caller-supplied `cfg.provider_base_urls` still
   wins over env.

5. **Recipe.probe(baseURL?) accepts the resolved URL** (src/core/ai/types.ts,
   src/core/ai/probes.ts, src/core/ai/recipes/llama-server.ts): when the
   user configures `provider_base_urls.llama-server` in config but no env
   var is set, the probe and gateway no longer disagree. Callers with cfg
   pass the resolved URL; legacy callers fall back to env / recipe default.
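
The precedence described in findings 4 and 5 (config wins over env, env wins over the recipe default) can be sketched as below. This is a simplified illustration, not gbrain's buildGatewayConfig: `resolveBaseUrl`, `GatewayConfigSketch`, and the provider ids other than `llama-server` are hypothetical. Only the env var names come from the commit message.

```typescript
// Hypothetical sketch; names and shapes are illustrative, not gbrain's API.
interface GatewayConfigSketch {
  provider_base_urls?: Record<string, string>;
}

// Env var names are from the commit message. The provider ids other than
// "llama-server" are assumptions.
const ENV_BASE_URL_VARS = new Map<string, string>([
  ["llama-server", "LLAMA_SERVER_BASE_URL"],
  ["ollama", "OLLAMA_BASE_URL"],
  ["lmstudio", "LMSTUDIO_BASE_URL"],
  ["litellm", "LITELLM_BASE_URL"],
]);

function resolveBaseUrl(
  provider: string,
  cfg: GatewayConfigSketch,
  env: Record<string, string | undefined>,
  recipeDefault: string,
): string {
  // 1. Caller-supplied config wins.
  const fromConfig = cfg.provider_base_urls?.[provider];
  if (fromConfig) return fromConfig;
  // 2. Then the provider's env var, if one is set.
  const envVar = ENV_BASE_URL_VARS.get(provider);
  const fromEnv = envVar ? env[envVar] : undefined;
  if (fromEnv) return fromEnv;
  // 3. Otherwise the recipe default.
  return recipeDefault;
}
```

Passing the same resolved URL to both the probe and the embed path is what keeps the two from disagreeing.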

CHANGELOG updated; llms-full.txt regenerated.

Tests: bun run test — 5220/5220 pass / 0 fail (was 5217 / 0; +3 new
codex-finding regression tests).

Pre-merge codex adversarial: ran during /ship Step 11 against the v0.32
diff. All 5 findings addressed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): isolate v0.32 no-batch-cap test from mock.module leak (closes 19 CI fails)

Four CI test-isolation fixes uncovered by yesterday's CI run on PR garrytan#810:

1. **`scripts/test-shard.sh` excludes `*.serial.test.ts`** (was running them
   in parallel shards). Without this, serial files race with non-serial
   files in the CI shard process. Mirrors `scripts/run-unit-shard.sh`'s
   exclusion set; 1-line `find` filter.

2. **`scripts/run-serial-tests.sh` runs each serial file in its own bun
   process**. Pre-fix, all serial files ran in ONE bun process with
   `--max-concurrency=1` — that limits intra-file concurrency but does
   NOT prevent module-registry leakage across files. When
   `eval-takes-quality-runner.serial.test.ts` does
   `mock.module('../src/core/ai/gateway.ts', () => ({chat, configureGateway}))`
   (a partial mock missing `resetGateway`, `defaultResolveAuth`, etc.),
   the next file in the same process gets the partial mock on import and
   `import { resetGateway }` fails with "Export named 'resetGateway' not
   found." Per-file processes give true isolation; cost is ~100ms × N
   files (negligible vs CI walltime).

3. **`test/ai/no-batch-cap-suppression.test.ts` → `.serial.test.ts`**.
   The test mutates `console.warn` globally (mock spy). When other tests
   in the same shard process load `src/core/ai/gateway.ts` and call
   `configureGateway()` first, they populate the module-scoped
   `_warnedRecipes` Set; the test's `resetGateway()` clears it but races
   if other gateway-touching code runs concurrently in the same process.
   Renaming to `.serial.test.ts` quarantines it via fix #1 + #2.

4. **CI workflow gains a serial-tests step on shard 1**. Pre-fix, shard 1
   ran `bun run verify` + the parallel shard, but no shard ran
   `*.serial.test.ts` files. After fix #1 excludes them from shards, they
   need explicit invocation. New step:
   `bash scripts/run-serial-tests.sh` (shard 1 only).
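
The split between the parallel shard and the per-file serial pass can be sketched as a small planning function. The actual scripts are shell; `planTestRuns` and `TestPlanSketch` are hypothetical names used only to illustrate the partition and the one-process-per-serial-file rule.

```typescript
// Hypothetical sketch of the shard/serial split; the real logic lives in
// scripts/test-shard.sh and scripts/run-serial-tests.sh.
interface TestPlanSketch {
  shardFiles: string[];       // run together in the parallel shard
  serialCommands: string[][]; // one command (one bun process) per serial file
}

function planTestRuns(files: string[]): TestPlanSketch {
  const isSerial = (file: string): boolean => file.endsWith(".serial.test.ts");
  return {
    shardFiles: files.filter((file) => !isSerial(file)),
    // A separate process per file means a mock.module() call in one file
    // cannot leak into the module registry seen by the next file.
    serialCommands: files.filter(isSerial).map((file) => ["bun", "test", file]),
  };
}
```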

Tests: bun run test — 5220 / 0 fail (matches local pre-CI run; was
showing 19 fails on CI for PR garrytan#810 due to fixes #1-#3 missing).

Failure analysis from .context/attachments/test__2__75236697976.log:
- 18 multimodal failures: caused by mock.module leak from
  eval-takes-quality-runner.serial.test.ts being run alongside
  voyage-multimodal.test.ts in the same parallel shard process. After
  fix #1 + fix #3, eval-takes-quality only runs in serial pass; after
  fix #2, its mock.module doesn't leak to subsequent serial files.
- 1 no-batch-cap failure: same root cause; fix #3 quarantines it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: SiyaoZheng <noreply@github.com>
Co-authored-by: cacity <20351699+cacity@users.noreply.github.com>
Co-authored-by: Magicray1217 <267836857+Magicray1217@users.noreply.github.com>
Co-authored-by: JamesJZhang <32652444+JamesJZhang@users.noreply.github.com>
…riant gate (garrytan#885)

* schema: migration v51 facts_fence_columns + fresh-install parity

v0.32.2 commit 1/11.

Facts become FS-canonical via a `## Facts` fence on entity pages (mirror of
takes-fence). row_num + source_markdown_slug are the round-trip columns the
fence parser uses to reconcile markdown → DB.

Schema changes:
- ALTER TABLE facts ADD COLUMN IF NOT EXISTS row_num INTEGER
- ALTER TABLE facts ADD COLUMN IF NOT EXISTS source_markdown_slug TEXT
- CREATE UNIQUE INDEX idx_facts_fence_key (source_id, source_markdown_slug,
  row_num) WHERE row_num IS NOT NULL

Both columns nullable: pre-v0.32 rows don't have them until commit 6's
v0_32_2 orchestrator backfills via fence-append. The partial WHERE clause is
the Codex R2 collision guard — without it, two pre-v51 NULL-row_num rows on
the same (source_id, source_markdown_slug) coordinate would collide and fail
the migration on any populated v0.31 brain.

Fresh-install parity: the v40 CREATE TABLE block now declares the columns
from the start, so a brand-new install hits a single CREATE that already
has them and the v51 ALTERs no-op via IF NOT EXISTS. Existing brains pick
them up through the v51 migration.

Idempotent under all states (re-runs are no-ops). Metadata-only ALTERs on
PG 11+ and PGLite — no table rewrite. Partial-index syntax verified
against v40's existing idx_facts_unconsolidated precedent.

Tests:
- 6 new v51 cases in test/migrate.test.ts covering name, ADD COLUMN shape,
  nullable contract, partial-unique-index keys, the WHERE-NULL collision
  guard, and LATEST_VERSION progression.
- All 109 migration tests pass (was 103); schema walks 15 → 51 cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: facts-fence.ts + extract shared escape helpers from takes-fence

v0.32.2 commit 2/11.

New: src/core/facts-fence.ts — structural mirror of src/core/takes-fence.ts.
10 columns: a leading `#` plus 9 data columns (`# | claim | kind | confidence | visibility |
notability | valid_from | valid_until | source | context |`). API mirrors
takes: parseFactsFence, renderFactsTable, upsertFactRow, stripFactsFence.

Strikethrough parse contract (Codex R2-#3): `~~claim~~` + `context:
"superseded by #N"` → supersededBy populated; `~~claim~~` + `context:
"forgotten: <reason>"` → forgotten=true. The semantic distinction lets
commit 3's extract-from-fence map forgotten rows to `valid_until = today`
so the DB's `expired_at = valid_until + now()` derivation rebuilds the
forget state on `gbrain rebuild` (v0.32.3 follow-up).

Refactor: extracted shared primitives to src/core/fence-shared.ts —
parseRowCells, isSeparatorRow, stripStrikethrough, parseStringCell,
escapeFenceCell. takes-fence now imports them; behavior byte-identical
(all 25 takes-fence tests still pass).

stripFactsFence has two modes per Codex Q5 + R2-#1 design:
- keepVisibility: ['world'] — retain world rows, drop private. The mode
  both the chunker (Layer A) and get_page over remote MCP (Layer B) use.
  Private fact bytes never reach content_chunks.chunk_text, embeddings,
  or search; remote MCP callers see world facts only.
- default / empty array — drop the entire fence block. Defensive deny-
  by-default at the privacy boundary.
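
The two modes can be illustrated on already-parsed rows. This is a simplified sketch: the real stripFactsFence operates on markdown text, and `filterFactsByVisibility` and `FactRowSketch` are hypothetical names.

```typescript
// Hypothetical sketch of the keepVisibility semantics on parsed rows.
type Visibility = "world" | "private";

interface FactRowSketch {
  claim: string;
  visibility: Visibility;
}

function filterFactsByVisibility(
  rows: FactRowSketch[],
  keepVisibility: Visibility[] = [],
): FactRowSketch[] {
  // Default or empty list: deny by default, drop everything.
  if (keepVisibility.length === 0) return [];
  return rows.filter((row) => keepVisibility.includes(row.visibility));
}
```

The empty-array case returning nothing, rather than everything, is the defensive choice: a caller that forgets to pass a list cannot leak private rows.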

Tests: 36 new cases in test/facts-fence.test.ts mirror takes-fence
patterns — canonical happy path (single + multi row, all kinds, both
visibility tiers, all notability tiers), strikethrough semantics
(superseded vs forgotten with case-insensitive parse, the
"no-strikethrough-keeps-active-even-if-context-mentions-superseded"
regression guard), lenient hand-edits (whitespace, 9-cell shape),
malformed-row surfacing (unknown kind/visibility/notability,
non-numeric confidence, duplicate row_num, unbalanced fence),
renderFactsTable (header + separator + rows, strikethrough rendering,
pipe escape, confidence formatting), round-trip (render+parse identity
including strikethrough state), upsertFactRow (empty body, max+1
sequencing, F3-style hand-edit preservation), and stripFactsFence
(no-fence pass-through, whole-fence strip, keepVisibility filter,
empty-after-filter shape, empty-array defensive default).

76/76 tests across facts-fence + takes-fence + chunker-recursive pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: src/core/facts/extract-from-fence.ts — pure ParsedFact → NewFact mapper

v0.32.2 commit 3/11.

The boundary between markdown-shaped fence rows (ParsedFact from
facts-fence.ts) and DB-shaped engine rows (NewFact). Pure function, no
I/O. Resolves Codex Q7: engines stay markdown-unaware. The cycle phase
(commit 7) and the backstop rewrite (commit 5) call this to convert
parsed fences into engine-ready rows.

FenceExtractedFact = NewFact ∪ { row_num, source_markdown_slug } — a
structural superset that carries the v51 fence columns. Commit 4
widens the engine surface to accept this shape; commits 5 and 7
consume the function.

Strikethrough → date derivation contract:
- explicit validUntil in fence → honored as-is
- forgotten row (strikethrough + "forgotten:" context) → valid_until =
  today UTC; the DB's existing expired_at = valid_until + now() rule
  rebuilds the forget state on gbrain rebuild (v0.32.3 follow-up)
- supersededBy row without explicit validUntil → null; consolidator
  phase fills this in from the newer row's valid_from
- inactive-unrecognized (strikethrough + neither flag) → today; honors
  the user's strikethrough intent for unrecognized contexts
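
The four branches above can be sketched as a pure function. This is an illustration under assumptions, not the code in extract-from-fence.ts: `ParsedFactSketch`, `deriveValidUntil`, and the use of `YYYY-MM-DD` strings are hypothetical, and returning null for an active row with no explicit date is an assumption.

```typescript
// Hypothetical sketch of the strikethrough-to-date derivation.
interface ParsedFactSketch {
  active: boolean;             // false when the claim is struck through
  forgotten: boolean;
  supersededBy: number | null;
  validUntil?: string;         // explicit date from the fence, if any
}

// toISOString() is always UTC, so the date does not depend on the local zone.
function utcToday(now: Date): string {
  return now.toISOString().slice(0, 10);
}

function deriveValidUntil(row: ParsedFactSketch, now: Date = new Date()): string | null {
  if (row.validUntil) return row.validUntil;  // explicit value wins
  if (row.active) return null;                // assumption: active rows have no expiry
  if (row.forgotten) return utcToday(now);    // forgotten: stamp today
  if (row.supersededBy !== null) return null; // consolidator fills this in later
  return utcToday(now);                       // inactive-unrecognized: stamp today
}
```

The `now` parameter plays the role the commit message assigns to nowOverride: tests pass a fixed date instead of freezing the global clock.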

Determinism guard: nowOverride opt makes the today-stamping testable
without freezing global Date. Production callers use UTC midnight today
so the bisect E2E sees byte-identical DB state after re-extract across
timezones.

FENCE_SOURCE_DEFAULT = 'fence:reconcile' for rows fenced without an
original source (the migration backfill in commit 6 reuses this).

Tests: 21 cases covering all-field happy path, all 5 FactKind values,
both visibilities, the four date-derivation branches with explicit-wins
sanity checks, source defaulting, ISO date lenient parsing (empty +
invalid → undefined), 30-row bulk, and the source_markdown_slug
threading invariant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: engine.insertFacts batch + deleteFactsForPage on both engines

v0.32.2 commit 4/11.

New BrainEngine surface for the reconciliation path:

  insertFacts(
    rows: Array<NewFact & { row_num: number; source_markdown_slug: string }>,
    ctx: { source_id: string },
  ): Promise<{ inserted: number; ids: number[] }>

  deleteFactsForPage(slug: string, source_id: string): Promise<{ deleted: number }>

insertFacts is the only entry point that persists v51 columns
(row_num, source_markdown_slug). Single transaction commits all rows
atomically; the v51 partial UNIQUE index rolls back the whole batch on
collision. Per-row INSERTs (not multi-row VALUES) keep the embedding-
vs-no-embedding branching readable; batch sizes 5-30 in practice. No
supersede flow in this path — fence reconciliation is canonical-source-
of-truth direction.

deleteFactsForPage scopes by (source_id, source_markdown_slug). Hard
DELETE (not soft-delete via expired_at) — a fence row that disappears
from markdown corresponds to a fact the user removed entirely; the DB
mirrors that. Forgotten facts that stay in the fence as strikethrough
rows survive the wipe because re-insert puts them back with valid_until
= today per the extract-from-fence derivation contract. Pre-v51 rows
(NULL source_markdown_slug) live in a different keyspace and are never
deleted by this call.
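
The all-or-nothing batch and the page-scoped delete can be modeled in memory. This is a simplified sketch, not either engine's implementation: `InMemoryFactsSketch` is hypothetical, it is synchronous where the real surface returns Promises, and it omits pre-v51 rows and embeddings.

```typescript
// Hypothetical in-memory model of insertFacts + deleteFactsForPage.
interface FenceFactSketch {
  claim: string;
  row_num: number;
  source_markdown_slug: string;
}

interface StoredFactSketch extends FenceFactSketch {
  id: number;
  source_id: string;
}

class InMemoryFactsSketch {
  private rows: StoredFactSketch[] = [];
  private nextId = 1;

  insertFacts(batch: FenceFactSketch[], ctx: { source_id: string }): { inserted: number; ids: number[] } {
    const key = (s: string, slug: string, n: number): string => JSON.stringify([s, slug, n]);
    const seen = new Set(this.rows.map((r) => key(r.source_id, r.source_markdown_slug, r.row_num)));
    // Stage the whole batch first; nothing is committed if any row collides.
    const staged: StoredFactSketch[] = [];
    let id = this.nextId;
    for (const row of batch) {
      const k = key(ctx.source_id, row.source_markdown_slug, row.row_num);
      if (seen.has(k)) throw new Error(`duplicate fence key ${k}`);
      seen.add(k);
      staged.push({ ...row, id: id++, source_id: ctx.source_id });
    }
    this.rows.push(...staged);
    this.nextId = id;
    return { inserted: staged.length, ids: staged.map((r) => r.id) };
  }

  deleteFactsForPage(slug: string, source_id: string): { deleted: number } {
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => !(r.source_id === source_id && r.source_markdown_slug === slug));
    return { deleted: before - this.rows.length };
  }

  count(): number {
    return this.rows.length;
  }
}
```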

Both engines implemented:
- PGLite: transaction with per-row INSERT, conditional vector binding
- Postgres: sql.begin() transaction, postgres.js tagged template

Tests (13 new cases in test/insert-facts-batch.test.ts):
- empty batch returns inserted:0
- single-row + multi-row persistence, ids in input-order
- all NewFact + v51 columns round-trip
- v51 partial UNIQUE rolls back whole batch on collision
- different source_markdown_slug + different source_id values don't
  collide on same row_num
- deleteFactsForPage scoping (same source different page; same page
  different source; pre-v51 NULL-source_markdown_slug rows untouched)
- delete-then-reinsert round-trip (the cycle-phase pattern)

226 tests pass across facts surface + migrate + takes-fence (no
regressions in adjacent code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: markdown-first fact write path in src/core/facts/backstop.ts

v0.32.2 commit 5/11.

THE rewrite. Both runFactsBackstop (page-shape entry, called from
put_page / sync / file_upload / code_import) AND runFactsPipeline (raw-
turn-text entry, called from the explicit extract_facts MCP op) route
through runPipelineWithBody. Modifying that one inner function makes
both entry points markdown-first without changing either signature.
Resolves Codex R2-#2 surface gap.

New: src/core/facts/fence-write.ts — writeFactsToFence orchestrator +
lookupSourceLocalPath helper.

Pipeline (post-dedup, per entity_slug group):
1. Acquire FS page-lock via src/core/page-lock.ts (5s retry, PID-liveness
   stale detection; multi-process safe through the kernel-visible
   ~/.gbrain/page-locks/<sha-of-slug>.lock file)
2. Read entity page from <source.local_path>/<slug>.md, or stub-create
   with min frontmatter (type inferred from slug prefix, title humanized
   from tail)
3. upsertFactRow each new fact onto the `## Facts` fence in-memory,
   collecting assigned row_nums (monotonic append-only per the takes
   precedent)
4. Atomic write: writeFileSync(.tmp) → re-readFileSync(.tmp) →
   parseFactsFence(.tmp) → on warnings: leave .tmp + JSONL surface +
   NO DB write; on clean: renameSync(.tmp → file). Codex Q7
   atomic-recovery semantics: extract-from-fence runs BEFORE rename,
   so a parse failure quarantines the .tmp without corrupting the
   canonical file
5. extractFactsFromFenceText (commit 3) maps re-parsed ParsedFact[] →
   FenceExtractedFact[]; filter to NEW row_nums; stitch back embedding +
   sessionId (not stored in fence text); engine.insertFacts batch
   (commit 4)
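
Step 4's write, re-read, validate, rename sequence can be sketched with Node's fs module. This is a minimal illustration, not the code in fence-write.ts: `writeValidated` is a hypothetical name, and the `validate` callback stands in for parseFactsFence by returning a list of warnings.

```typescript
import { existsSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: an empty warnings list means the re-parsed file is clean.
type WriteResult =
  | { ok: true }
  | { ok: false; tmpPath: string; warnings: string[] };

function writeValidated(
  path: string,
  body: string,
  validate: (text: string) => string[],
): WriteResult {
  const tmpPath = `${path}.tmp`;
  writeFileSync(tmpPath, body, "utf8");
  // Validate what actually landed on disk, not the in-memory string.
  const warnings = validate(readFileSync(tmpPath, "utf8"));
  if (warnings.length > 0) {
    // Leave the .tmp in place as quarantine evidence; the canonical file is untouched.
    return { ok: false, tmpPath, warnings };
  }
  renameSync(tmpPath, path);
  return { ok: true };
}

// Demo in a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "fence-sketch-"));
const page = join(dir, "alice.md");
writeFileSync(page, "original body\n", "utf8");

const rejected = writeValidated(page, "broken fence\n", () => ["unbalanced fence"]);
const pageIntactAfterReject = readFileSync(page, "utf8") === "original body\n";

const accepted = writeValidated(page, "## Facts\n", () => []);
const tmpGoneAfterSuccess = !existsSync(`${page}.tmp`);
```

Because the rename only happens after validation, a reader of the canonical path sees either the old file or the new validated one.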

Three structural exceptions to the markdown-first path; the first two fall back to legacy DB-only insertFact, the third skips the write:
- sources.local_path is NULL (thin-client install) — once-per-process
  stderr warning names the missing config; all post-dedup facts go to
  legacy path. Documented as named exception in the architecture doc
  (commit 11)
- f.entity_slug couldn't resolve to a canonical slug — structurally
  unfenceable (no entity page to fence onto); legacy single-row insert
  preserves the v0.31 semantic
- Fence parse-validation fails on a .tmp — that page's facts skip; do
  NOT fall through to legacy DB-only because the DB index for that
  page would be inconsistent with a broken fence

No re-entrancy guard needed: writeFactsToFence uses writeFileSync +
renameSync directly, NOT engine.putPage. No code path can re-trigger
runFactsBackstop on the markdown write. The architecture self-prevents
the recursion concern Codex Q7 raised. Documented in fence-write.ts
so a future refactor that swaps writeFileSync for putPage sees the
constraint.

Dedup unchanged: cosine similarity @ 0.95 against DB candidates, before
fence write. Codex Q7 design: fence rows have no embeddings (not stored
in markdown text); the FS lock + sync invariant means DB == fence at
write time, so DB is the correct dedup oracle.

Tests (11 new cases in test/fence-write.test.ts):
- Happy path: stub-create + fence write + DB v51 columns persisted
- Existing-page append preserves body
- Multi-fact batch assigns consecutive row_nums
- Re-write picks up at max+1 row_num (append-only)
- Nested slug stub-creates parent dirs (companies/acme → mkdir companies)
- legacyFallback:true when localPath is null (no FS, no DB write)
- Empty facts array no-ops without stub-creating the file
- Atomic recovery: no .tmp file left after success
- lookupSourceLocalPath: existing source, unknown source, NULL local_path

The multi-process FS lock contention test lives in
test/e2e/facts-lock-contention.test.ts (commit 10's invariant capstone,
since Bun.spawn is an E2E concern). These cover the in-process happy
and recovery paths.

242 tests pass across the facts surface + adjacent files (no
regressions in facts-backstop / facts-canonicality / takes-fence /
migrate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: migration orchestrator v0_32_2.ts — backfill v0.31 facts to fences

v0.32.2 commit 6/11.

Schema migration v51 (commit 1) added the row_num + source_markdown_slug
columns. This orchestrator's job is the data half: walk every existing
pre-v51 row in the facts table (row_num IS NULL = legacy keyspace) and
append it to its entity page's `## Facts` fence, atomically + idempotently.

Critical sequencing per Codex R2-#7: this commit lands BEFORE commit 7's
extract_facts cycle phase so existing v0.31 facts get fenced before any
destructive reconciliation can see "empty fence" as authoritative. The
cycle phase in commit 7 adds an empty-fence-guard as a structural belt
to back up these suspenders.

Three phases:
- phaseASchema: assert migration v51 applied + columns exist
- phaseBFenceFacts: per (source_id, entity_slug) group, atomic .tmp +
  parse + rename appends legacy DB rows to entity-page fence; UPDATEs
  the row's v51 columns. Dry-run by default; refuses if any
  source.local_path is a dirty git tree (mirrors src/core/dry-fix.ts
  safety posture). Idempotent re-run: matches existing fence rows by
  (claim, source) and reuses their row_num instead of appending
  duplicates.
- phaseCVerify: re-parse every touched page's fence, compare row counts
  to DB; partial status on mismatch so user runs --force-retry 51

Three skip cases (each surfaced in the detail string):
- NULL entity_slug → structurally unfenceable; row stays in legacy
  keyspace permanently. Operator decides hand-curate vs delete.
- sources.local_path is NULL → thin-client / read-only brain; nothing
  to fence onto.
- Fence parse-validate fails on the .tmp → .tmp stays as quarantine
  evidence; the operator inspects.

Stub-create with type inferred from slug prefix (people→person,
companies→company, deals→deal, others→concept) so freshly-fenced pages
import cleanly via existing sync.
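
The prefix-to-type mapping can be sketched in a few lines. The mapping itself is from the commit message; `inferTypeFromSlug` is a hypothetical name and the real orchestrator may structure this differently.

```typescript
// Hypothetical sketch of slug-prefix type inference for stub-created pages.
const PREFIX_TO_TYPE = new Map<string, string>([
  ["people", "person"],
  ["companies", "company"],
  ["deals", "deal"],
]);

function inferTypeFromSlug(slug: string): string {
  // The prefix is everything before the first "/".
  const prefix = slug.split("/", 1)[0] ?? "";
  // Anything without a known prefix falls back to "concept".
  return PREFIX_TO_TYPE.get(prefix) ?? "concept";
}
```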

Tests (14 new cases in test/migrations-v0_32_2.test.ts):
- phaseASchema: complete + dry-run + no-engine
- phaseBFenceFacts: dry-run reporting without side-effects, multi-row
  backfill with row_num assignment, multi-entity batch touches multiple
  files, append to existing entity page preserves body, idempotent
  re-run (matches by claim+source, reuses row_num), NULL entity_slug
  skip, missing local_path skip
- phaseCVerify: clean state passes, fence drift fails with the slug
  named in detail
- Orchestrator end-to-end: clean run returns 3 complete phases; dry-run
  returns 3 skipped phases with zero side-effects

216 tests pass across migrations + facts surface (no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: extract_facts cycle phase + empty-fence guard (Codex R2-#7)

v0.32.2 commit 7/11.

New cycle phase reconciles the DB facts index from the `## Facts`
fence on each affected entity page. Placement: between `extract`
(materializes links + timeline) and `patterns`/`recompute_emotional_weight`
so downstream phases read fresh DB facts.

Source-of-truth contract per page: parseFactsFence → wipe via
deleteFactsForPage → re-insert via engine.insertFacts. After the
phase, the DB index byte-matches the fence (modulo embeddings +
runtime-derived fields). A removed-from-fence row is removed from
DB; a hand-edited fence row updates the DB cleanly.

Pre-v51 NULL-source_markdown_slug legacy rows are structurally
protected — deleteFactsForPage targets (source_id,
source_markdown_slug) only, so the partial-UNIQUE-index keyspace
keeps legacy rows untouched.

Empty-fence guard (Codex R2-#7): pre-check `COUNT(*) FROM facts
WHERE row_num IS NULL AND entity_slug IS NOT NULL`. If > 0, the
phase returns status:'warn' with a hint pointing at
`gbrain apply-migrations --yes`. Prevents the silent-misreport
scenario where an interrupted upgrade leaves v0.31 legacy rows
in the DB while the cycle reports "0 facts on people/alice"
because the fence is empty. Belt to the runtime backstop's
suspenders in commit 5.

Wired in src/core/cycle.ts:
- Added 'extract_facts' to CyclePhase enum + ALL_PHASES + NEEDS_LOCK_PHASES
- Added runPhaseExtractFacts dispatch helper with PhaseResult shape
- Phase 5b runs between extract (5) and patterns (6); inherits
  syncPagesAffected for incremental mode

Tests (10 new cases in test/extract-facts-phase.test.ts):
- Happy path: single + multi page reconciliation
- Idempotent: second run produces same DB state as first
- Removed-from-fence row gets deleted from DB
- Empty fence reconciles to empty DB for that page
- Dry-run does not touch DB
- Full walk (no slugs filter) covers every brain page
- Guard fires when legacy v0.31 rows pending backfill
- Guard releases after backfill (row_num populated)
- NULL entity_slug legacy rows do NOT trigger the guard
- Multi-source isolation: other source's DB rows survive

226 tests pass across the facts surface + cycle + migrations
(no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: 3-layer privacy strip + forget-as-fence (Codex R2 #1/#3/#5)

v0.32.2 commit 8/11.

Layer A — chunker strip (Codex R2-#1 P0):
src/core/chunkers/recursive.ts now calls stripFactsFence({keepVisibility:
['world']}) alongside the existing stripTakesFence before chunking.
Private fact text NEVER reaches content_chunks.chunk_text, embeddings,
or search. World facts remain searchable (public knowledge by
definition). Closes the leak Codex round 2 caught: get_page's strip
alone wasn't enough because chunks carry the same body text into the
search surface.

Layer B — get_page strip trigger flipped (Codex R2-#5):
src/core/operations.ts:413 strip trigger changes from
`ctx.takesHoldersAllowList` to `ctx.remote === true`. Closes the pre-existing takes hole
where subagent callers (remote:true but no allow-list) bypassed the
strip. Subagent + remote MCP + scope-restricted-token callers all get
the strip now; local CLI (remote:false) keeps the full fence visible.
Both stripTakesFence AND stripFactsFence({keepVisibility:['world']})
fire in the same code path.
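
The effect of flipping the trigger can be shown with two predicates. This is an illustrative sketch only: `OpContextSketch` models just the two fields named above, and `legacyShouldStrip` is a guess at the shape of the old check based on this description.

```typescript
// Hypothetical sketch of the old and new strip triggers.
interface OpContextSketch {
  remote: boolean;
  takesHoldersAllowList?: string[];
}

// Old trigger (as described above): strip only when an allow-list is present.
const legacyShouldStrip = (ctx: OpContextSketch): boolean =>
  ctx.takesHoldersAllowList !== undefined;

// New trigger: strip for every remote caller, allow-list or not.
const shouldStrip = (ctx: OpContextSketch): boolean => ctx.remote === true;

const subagent: OpContextSketch = { remote: true };
const localCli: OpContextSketch = { remote: false };
```

A subagent context with `remote: true` and no allow-list is the case the old predicate missed.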

Forget-as-fence (Codex R2-#3):
New src/core/facts/forget.ts forgetFactInFence({factId, reason}). When
the row has v51 columns + source.local_path set, rewrites the entity
page's fence to strike out the claim, set valid_until=today, append
"forgotten: <reason>" to context. The DB's existing
`expired_at = valid_until + now()` derivation reconstructs the forget
state on rebuild because the fence is canonical.

Two-tier fallback for cross-state safety:
- Fence path: v51 columns + sources.local_path set + fence file exists +
  fence row matches DB row_num → atomic .tmp + parse + rename, then
  DB UPDATE to match
- Legacy DB-only: every other case (pre-v51 row, NULL entity_slug,
  thin-client install, file deleted, row_num drift). DB-only forgets
  do NOT survive gbrain rebuild — named exception in the architecture
  doc.

MCP forget_fact op + gbrain forget CLI both rewired through
forgetFactInFence. New optional `--reason` flag on the CLI; new
`reason` param on the MCP op. Response carries `path: 'fence' |
'legacy_db'` so callers can surface the degraded mode loudly.

Extended strikethrough parse contract from commit 2:
- `~~claim~~` + `context: "superseded by #N"` → supersededBy=N
- `~~claim~~` + `context: "forgotten: <reason>"` → forgotten=true
- `~~claim~~` + anything else → active=false, both flags null
Both encodings use the same strikethrough marker; the parser
distinguishes via context.
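
The three-way contract can be sketched as a small classifier. This is an illustration, not the parser in facts-fence.ts: `classifyRow` and `StrikeStateSketch` are hypothetical, the regexes are assumptions about the cell text, and `forgotten` is modeled as a boolean rather than a nullable flag.

```typescript
// Hypothetical sketch of the strikethrough parse contract.
interface StrikeStateSketch {
  claim: string;
  active: boolean;
  supersededBy: number | null;
  forgotten: boolean;
}

function classifyRow(claimCell: string, context: string): StrikeStateSketch {
  const struck = /^~~(.*)~~$/.exec(claimCell.trim());
  if (!struck) {
    // No strikethrough: the row stays active whatever the context says.
    return { claim: claimCell.trim(), active: true, supersededBy: null, forgotten: false };
  }
  const claim = struck[1] ?? "";
  const superseded = /superseded by #(\d+)/i.exec(context);
  if (superseded) {
    return { claim, active: false, supersededBy: Number(superseded[1]), forgotten: false };
  }
  if (/forgotten:/i.test(context)) {
    return { claim, active: false, supersededBy: null, forgotten: true };
  }
  // Struck through with an unrecognized context: inactive, neither flag set.
  return { claim, active: false, supersededBy: null, forgotten: false };
}
```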

Tests (38 new cases in test/privacy-strip-and-forget.test.ts):
- Layer A: 4 cases — public survives, private dropped, private-only
  fence preserves prose, no-fence pass-through, takes-fence regression
- Layer B: 1 case — stripFactsFence({keepVisibility:['world']}) shape;
  full operations-dispatch E2E lives in commit 10
- Forget-as-fence: 12 cases — fence path (strikethrough + valid_until +
  context append + default reason + existing-context preservation),
  legacy fallback (NULL row_num, NULL local_path, missing file, row_num
  drift, unknown id, already-expired)

266 tests pass across the facts + privacy + chunker + operations
surface (no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: scripts/check-system-of-record.sh CI gate + function-scoped allow-list

v0.32.2 commit 9/11.

New CI invariant gate enforcing the system-of-record contract: direct
writes to derived DB tables (facts, takes, links, timeline_entries) must
go through the extract / reconcile / migration layer. Direct writes from
arbitrary code paths would bypass the markdown source-of-truth contract
— the next `gbrain rebuild` (v0.32.3) would lose the data because the
fence wasn't updated.

Banned methods (the v0.32.2 derived-write surface):
- engine.insertFact, engine.insertFacts
- engine.addLink, engine.addLinksBatch
- engine.addTimelineEntry
- engine.upsertTake
- engine.expireFact

Scoped to src/ + scripts/ per Codex R2-#8. The test/ directory is
deliberately excluded because tests legitimately call these methods to
seed fixtures, and gating tests would break the test surface without
protecting any invariant.

Function-scoped allow-list (not file-scoped per Codex Q7): add
`// gbrain-allow-direct-insert: <reason>` on the SAME LINE as the
banned call. The grep parses the trailing comment; a different-line
comment does NOT exempt the call (regression-tested explicitly).
Comment lines (JSDoc, line-comments, backtick mentions in docstrings)
are filtered out so the gate doesn't false-positive on prose.
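
The same-line allow-list rule can be sketched as a line scanner. The real gate is a shell script built on grep; `findViolations` is a hypothetical TypeScript rendering of the described behavior, and the comment-line filter here is a simplification.

```typescript
// Hypothetical sketch of the gate's per-line logic.
const BANNED_METHODS = [
  "insertFact", "insertFacts", "addLink", "addLinksBatch",
  "addTimelineEntry", "upsertTake", "expireFact",
];
const ALLOW_MARKER = "// gbrain-allow-direct-insert:";
const BANNED_CALL = new RegExp(`\\bengine\\.(${BANNED_METHODS.join("|")})\\s*\\(`);

// Returns the 1-based line numbers of un-exempted banned calls.
function findViolations(source: string): number[] {
  const violations: number[] = [];
  source.split("\n").forEach((line, index) => {
    const trimmed = line.trim();
    // Skip comment lines so prose mentions do not trip the gate.
    if (trimmed.startsWith("//") || trimmed.startsWith("*") || trimmed.startsWith("/*")) return;
    if (!BANNED_CALL.test(line)) return;
    // The marker only counts when it sits on the same line as the call.
    if (line.includes(ALLOW_MARKER)) return;
    violations.push(index + 1);
  });
  return violations;
}
```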

Wired into `bun run verify` (the canonical CI pre-test gate set).
Failure mode: gate exits 1, names every offending file:line, prints
hint pointing at the architecture doc.

Annotated 18 legitimate call sites:
- src/core/cycle/extract-facts.ts: reconcile fence → DB
- src/core/facts/backstop.ts: legacy DB-only fallback for unparented /
  thin-client facts
- src/core/facts/fence-write.ts: markdown-first reconcile path
- src/core/facts/forget.ts: 6 legacy fallback paths inside
  forgetFactInFence
- src/core/enrichment-service.ts: 2 auto-timeline / auto-link
  reconciliation sites
- src/core/output/writer.ts: 3 BrainWriter synthesize-phase sites
- src/core/operations.ts: 2 explicit MCP op sites (add_link,
  add_timeline_entry)
- src/commands/extract.ts: 5 canonical extract command sites
- src/commands/reconcile-links.ts: 2 code-graph reconciliation sites

Tests (6 new cases in test/check-system-of-record.test.ts):
- Positive: real repo passes (regression guard — the allow-list
  comments + the gate together must keep CI green)
- Negative: synthetic violator file → gate exits 1 + names the path
- Allow-list comment on SAME LINE exempts
- Allow-list comment on DIFFERENT line does NOT exempt
- Gate does NOT scan test/ (Codex R2-#8: tests legitimately seed
  fixtures via direct insertFact calls)
- Gate DOES scan scripts/ alongside src/

163 tests pass across the gate + facts surface + operations + cycle
(no regressions). typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: system-of-record invariant E2E capstone

v0.32.2 commit 10/11.

The architectural rule prove-out. Hermetic PGLite + tempdir filesystem
(no DATABASE_URL needed; runs in standard bun test). Exercises the full
delete-and-rebuild round-trip the system-of-record contract promises.

Capstone test (full round-trip):
1. Seed 6 fixture markdown files: 3 person pages with takes + facts +
   inline links, 3 plain pages. Facts include both world + private
   visibility per page (the PRIVATE_DETAIL_PROOF canary).
2. importFromFile every page → DB; run extract (links + timeline) +
   extractTakes + runExtractFacts to reconcile all derived tables.
3. Snapshot facts + takes derived state.
4. DELETE FROM facts + takes + links + timeline_entries. Simulates the
   "DB lost; rebuild from repo" disaster scenario v0.32.3's
   `gbrain rebuild` will execute.
5. Re-import every file + re-reconcile. Re-import rebuilds tags
   (per Codex R2-#6: tags is reconciled by import-file.ts:315, NOT
   by extract phases).
6. Snapshot + diff. Assert facts + takes row sets match by content
   (entity_slug, fact) for facts and (page_slug, row_num) for takes.

Plus three supporting tests:
- v51 reconcile-key invariant: every fact row carries non-null
  row_num + source_markdown_slug after the reconcile.
- Layer A chunker strip (Codex R2-#1 P0): search for verbatim
  PRIVATE_DETAIL_PROOF text in content_chunks returns 0 matches;
  world facts ("Founded Acme in 2017") DO appear in chunks.
- Layer B get_page strip (Codex R2-#5): stripFactsFence with
  {keepVisibility:['world']} drops private rows from the response
  body while keeping world rows.

Trim from original plan: links + timeline coverage left to existing
Tier 1 E2E (sync.test.ts + backlinks.test.ts). The v0.32.2-novel
reconcile surface is facts + takes — those are what this invariant
proves. Cuts ~half the test runtime + scope without losing
v0.32.2 coverage.

4/4 pass in 2.23s. 291 tests pass across the full facts + privacy +
chunker + operations + migrate + cycle surface (no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.32.2 chore: VERSION + package.json + CHANGELOG manifesto + docs + migration guide

v0.32.2 commit 11/11. Release ceremony.

VERSION + package.json + bun.lock all aligned at 0.32.2.

CHANGELOG.md entry leads with the manifesto:

  > The GitHub repo is the system of record. The database is a derived
  > cache. We do not back up the database — we rebuild it from the repo.

Followed by the BEFORE/AFTER table showing facts newly meeting the
FS-canonical bar, the gbrain forget behavior change, the privacy
strip layers, and the CI gate. Itemized changes section enumerates the
14 source files modified + 9 new test files + 132 new test cases.

docs/architecture/system-of-record.md (new, ~250 lines): the canonical
contract doc. Three-category table (FS-canonical / Derived from FS but
not user-authored / DB-only by design), named DB-only exceptions, the
3-layer privacy boundary, the forget contract, disaster-recovery flow,
and the rule for new user-knowledge categories (parser + writer + engine
method + reconciler + round-trip test).

skills/migrations/v0.32.2.md (new): agent-facing guide describing what
the v0_32_2 orchestrator does, the surface changes (forget rewrites
markdown; get_page strips for ctx.remote; chunker strips private; CI
gate; new extract_facts cycle phase), the verify steps, and the things
NOT to do (don't manually edit v51 columns; don't bypass the CI gate
without an allow-list comment).

Closes the 11-commit bisect plan. Every commit leaves the tree green.
Each commit does one conceptual thing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: v0.32.2 follow-up — update 5 tests that v0.32.2 surface changes broke, plus fix 2 pre-existing flakes

Five test updates for changes v0.32.2 introduced:

- test/core/cycle.serial.test.ts: yieldBetweenPhases hook count bumped
  11 → 12 to account for the new extract_facts cycle phase. Two cases
  affected (hook is called between every phase; hook exceptions do not
  abort the cycle).

- test/apply-migrations.test.ts: buildPlan skippedFuture expectation
  lists v0.32.2 alongside v0.31.0 at the end. Two cases affected (fresh
  install with v0.11.1 installed; Codex H9 regression with v0.12.0).

- test/facts-mcp-allowlist.serial.test.ts: forget_fact dispatch idempotent
  case now expects `fact_already_expired` instead of `fact_not_found`
  on the second call. v0.32.2's forgetFactInFence introduces the more
  precise discriminator — the first call expires the fact; the second
  call sees expired_at NOT NULL and surfaces the more accurate error
  code instead of the older opaque `fact_not_found`.

Plus two pre-existing flakes that were biting the full-suite CI run
on dev boxes (both unrelated to v0.32.2; both confirmed flaking on
master before v0.32.2 work began):

- test/eval-longmemeval.test.ts warm-create speed gate: threshold
  bumped from p50<500ms → p50<1500ms. Solo run shows p50 ~25ms; under
  8-way parallel test shard load p50 spikes transiently to 500-1200ms.
  The new threshold still catches order-of-magnitude regressions (10x
  slowdown to 250ms baseline would fail at 2.5s) without flaking under
  legitimate parallel CPU contention.

- test/brain-registry.serial.test.ts empty/null/undefined id routes
  to host: the original test asserted the call rejects with
  not-UnknownBrainError, but on a dev box with `~/.gbrain/config.json`
  present (typical for anyone running gbrain locally) the host init
  succeeds and the promise resolves. Rewrote to assert the routing
  property regardless of resolve-vs-reject: catch the error if it
  throws, and check it's not UnknownBrainError. Resolving cleanly is
  also acceptable because it proves the routing went to host.

Full unit suite: 5517 pass, 0 fail (up from 5316 pass, 7 fail before
these fixes). `bun run verify` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: e2e — update 3 tests that v0.32.2 surface changes broke

- test/e2e/dream-cycle-phase-order-pglite.test.ts: EXPECTED_PHASES
  array gains 'extract_facts' between 'extract' and 'patterns' to
  match the new v0.32.2 cycle phase order.

- test/e2e/cycle.test.ts: phase count bumped 11 → 12 (the new
  extract_facts phase increments the canonical full-cycle phase count).

- test/e2e/facts-forget.test.ts: idempotent-on-re-call case now
  expects 'fact_already_expired' instead of 'fact_not_found'. v0.32.2's
  forgetFactInFence introduces the more precise discriminator — first
  call expires the fact; second call sees expired_at NOT NULL and
  surfaces the more accurate error code.

Full E2E suite (DATABASE_URL set, sequential via scripts/run-e2e.sh)
now: 78/78 files pass, 531/531 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…outing tables (garrytan#859)

* skill: compress-agents-md — functional-area resolver pattern

Proven via A/B eval: 100% routing accuracy at 48% size reduction.
Converts granular per-skill resolver rows into functional-area dispatchers
with '(dispatcher for: ...)' sub-skill lists.

Includes:
- SKILL.md with full pattern docs, before/after examples, eval results
- routing-eval.jsonl with 5 fixtures
- Anti-patterns (resolver-of-resolvers pipe table = 15% accuracy)

* skill: rename compress-agents-md → functional-area-resolver, cite prior art

The contribution is a pattern (functional-area dispatcher with `(dispatcher
for: ...)` clauses), not a file. Rename describes the contribution; triggers
broaden to cover both AGENTS.md and RESOLVER.md phrasings.

SKILL.md rewrite:
- Three-model A/B table (Opus 4.7 / Sonnet 4.6 / Haiku 4.5) replaces the
  original Sonnet-only claim. Functional-areas beats baseline by +13 to +17pp
  on training (lenient) across all three models while 48% smaller (25KB → 13KB).
- Strict + lenient scoring documented side by side. Lenient (predicted shares
  dispatcher area with expected) matches production agent behavior.
- Preconditions added: refuse to compress if file <12KB or working tree dirty.
- Multi-file routing precedence section for the v0.31.7 RESOLVER.md/AGENTS.md
  merge case.
- Mandatory verification step (≥95% via the harness).
- Daily-doctor.mjs reference scrubbed (didn't exist in gbrain).
- Three prior-art citations: AnyTool (arXiv:2402.04253), RAG-MCP
  (arXiv:2505.03275), Anthropic Agent Skills progressive disclosure. The
  pattern is the static-prompt analog of runtime hierarchical routing.

routing-eval.jsonl: 8 positive (5 original + 3 broadened triggers) + 4
adversarial negatives targeting skillify, skill-creator, book-mirror,
concept-synthesis to prove broadened triggers don't over-capture adjacent
meta-skills.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* evals: A/B harness for functional-area-resolver (gateway-routed, strict + lenient scoring)

evals/functional-area-resolver/ lives outside skills/ deliberately. The
skillpack bundler walks skills/<skill>/ recursively, so an eval surface in
there would copy harness + variants + fixtures + tests into every downstream
install. The pattern (in SKILL.md) ships everywhere; the eval evidence stays
in the gbrain repo.

What ships:
- Three variant resolvers in variants/ — baseline.md (verbose 25KB) and
  functional-areas.md (compressed 13KB) extracted from a real production
  AGENTS.md at git commits 93848ff3b^ and 93848ff3b (owner PII scrubbed).
  resolver-of-resolvers.md derived mechanically by stripping (dispatcher
  for: ...) clauses — the ablation case.
- 20 hand-authored training fixtures + 5 held-out blind fixtures.
- harness-runner.ts — TypeScript runner via gbrain gateway. Flags:
  --model {opus|sonnet|haiku|<full-id>}, --variants-dir, --variants for
  description-length sweeps, --parallel N (rate-lease bound), --limit N
  for smoke runs, --yes for non-TTY.
- Every output row carries BOTH `correct` (strict) and `correct_lenient`
  (predicted shares dispatcher area with expected). Lenient matches
  production behavior.
- Receipt header binds (model, prompt_template_hash, fixtures_hash,
  harness_sha, ts, cmd_args). Re-runs are auditable.
- harness.mjs — thin Node shim that spawns the TS runner via bun.
- rescore.mjs — zero-cost lenient re-score of an existing JSONL.
- harness-runner.test.ts — 45 unit tests (no API key needed) covering
  every pure function plus the dispatcher-list parser.
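
The strict and lenient columns described above can be sketched roughly as follows. This is a minimal illustration, not the harness's code: `DispatcherMap`, `areaOf`, `scoreRow`, and the area names in the usage note are hypothetical, and the only rule taken from the commit message is that lenient scoring accepts a prediction sharing a dispatcher area with the expected skill.

```typescript
// Area name -> sub-skills listed in its "(dispatcher for: ...)" clause.
// Hypothetical shape; the real harness may model this differently.
type DispatcherMap = Record<string, string[]>;

interface Score {
  correct: boolean;         // strict: predicted === expected
  correct_lenient: boolean; // predicted shares a dispatcher area with expected
}

// Find the area a skill belongs to (the area itself counts as a member).
function areaOf(skill: string, areas: DispatcherMap): string | undefined {
  for (const area of Object.keys(areas)) {
    if (area === skill || areas[area].indexOf(skill) !== -1) return area;
  }
  return undefined;
}

function scoreRow(predicted: string, expected: string, areas: DispatcherMap): Score {
  const correct = predicted === expected;
  const predictedArea = areaOf(predicted, areas);
  const sameArea = predictedArea !== undefined && predictedArea === areaOf(expected, areas);
  return { correct, correct_lenient: correct || sameArea };
}
```

With a made-up map such as `{ comms: ["gmail", "slack"] }`, predicting `gmail` when `slack` was expected fails strict scoring but passes lenient scoring, because both sit under the same dispatcher.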

The prompt template is load-bearing: without the "drill into (dispatcher
for: ...) list" instruction, every compression variant collapses to
~30-60%. Documented in SKILL.md and README.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* evals: baseline receipts (Opus 4.7 + Sonnet 4.6 + Haiku 4.5, 2026-05-11)

Three canonical 225-row receipts (3 variants × 25 fixtures × 3 seeds per
model). Each receipt header binds (model, prompt_template_hash,
fixtures_hash, harness_sha, ts) so the published SKILL.md numbers are
reproducible.

Training corpus (n=20, lenient):
  baseline              | Opus 81.7% | Sonnet 86.7% | Haiku 73.3% | 25KB
  functional-areas      | Opus 98.3% | Sonnet 100%  | Haiku 88.3% | 13KB
  resolver-of-resolvers | Opus 63.3% | Sonnet 41.7% | Haiku 65.0% | 10KB

functional-areas beats baseline by +13 to +17pp across all three models while
48% smaller. resolver-of-resolvers' Sonnet collapse (41.7%) is the SKILL.md
"compression without dispatcher clause is broken" claim, observed.

Held-out (n=5, lenient) saturates at 100% across most cells (Sonnet ×
resolver-of-resolvers is 73.3% — the same failure mode visible on a smaller
sample).

~$3 API spend across all three runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* skill: wire functional-area-resolver into RESOLVER.md + manifests

skills/RESOLVER.md gets a new row in Operational, adjacent to skillify.
Triggers: "Compress my resolver", "AGENTS.md too large", "RESOLVER.md too
big", "functional area dispatcher", "shrink routing table".

skills/manifest.json adds the new entry and bumps manifest version
0.25.1 → 0.32.3.0 (loadOrDeriveManifest reads this for sync-guard).

openclaw.plugin.json adds functional-area-resolver to the skills array
and bumps version 0.25.1 → 0.32.3.0 so install receipts stop being stale
(src/core/skillpack/installer.ts:307-311 uses manifest version on every
install).

Verified:
- gbrain check-resolvable --json: 42/42 reachable, 0 errors.
- gbrain routing-eval: 70/70 pass (100% structural).
- bun test test/skillpack-sync-guard.test.ts: passes (manifest in sync).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.32.3.0 skill: functional-area-resolver — pattern for compressing routing tables

Headline: compress a 25KB AGENTS.md down to 13KB without losing routing
accuracy. The pattern is proven across Opus 4.7, Sonnet 4.6, and Haiku 4.5: it
beats the verbose baseline by +13 to +17pp while 48% smaller.

Empirical (training, n=20, 3 seeds, lenient):
  baseline 25KB:                Opus 81.7% | Sonnet 86.7% | Haiku 73.3%
  functional-areas 13KB:        Opus 98.3% | Sonnet 100%  | Haiku 88.3%
  resolver-of-resolvers 10KB:   Opus 63.3% | Sonnet 41.7% | Haiku 65.0%

The (dispatcher for: ...) clause is the load-bearing signal. Strip it (the
resolver-of-resolvers variant) and Sonnet collapses to 41.7% — the failure
case the pattern's authors predicted, now observed.

Files in this release:
- VERSION + package.json bumped to 0.32.3.0 (4-segment per CLAUDE.md).
- CHANGELOG.md: full empirical story, cross-model table, three prior-art
  citations (AnyTool, RAG-MCP, Anthropic Agent Skills progressive
  disclosure).
- TODOS.md: nine v0.33.x follow-ups (dogfood on gbrain's own RESOLVER.md,
  CLI promotion to gbrain routing-eval --ab-compare, held-out corpus
  growth, cross-vendor Gemini+GPT verification, per-row description
  length sweep, structural compression to ~10KB, hierarchical
  area-of-areas, embedding pre-router, adversarial fixtures,
  prompt-design ablation doc).
- llms-full.txt regenerated.

Bisect-friendly history on this branch:
  502d447  skill: rename + content rewrite + routing-eval.jsonl
  472cc68  evals: A/B harness + variants + fixtures + tests (no receipts)
  243e013  evals: cross-model baseline receipts (Opus + Sonnet + Haiku)
  ecab180  skill: wire-up to RESOLVER.md + manifest.json + openclaw.plugin.json
  THIS:     v0.32.3.0 release marker

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* evals: codex review fixes — accept ASCII -> arrow + provider-aware auth gate

Two P2 findings from /codex review on commit 8870c64:

P2-2: parseDispatcherLists regex required Unicode `→`, but SKILL.md
Step 4 documents the template with ASCII `->`. Downstream-authored
resolvers following the template silently fell through to strict-only
scoring (correct_lenient == correct always), under-reporting same-area
accuracy with no warning. Regex now accepts both `→` and `->`. Two
new test cases pin the behavior — pure-ASCII variant + mixed-arrow
variant.
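
A rough sketch of the dual-arrow parsing described in P2-2. The row shape (`<trigger> -> <area> (dispatcher for: a, b)`) and the function name `parseDispatcherListsSketch` are assumptions for illustration; the real `parseDispatcherLists` regex is not shown in this commit message.

```typescript
// Hypothetical row shape: `<trigger> -> <area> (dispatcher for: a, b, c)`.
// The arrow may be ASCII "->" or Unicode "→"; both must parse the same way.
const DISPATCHER_ROW = /(?:→|->)\s*([\w-]+)\s*\(dispatcher for:\s*([^)]*)\)/;

function parseDispatcherListsSketch(text: string): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const line of text.split("\n")) {
    const m = DISPATCHER_ROW.exec(line);
    if (!m) continue; // rows without an arrow + dispatcher clause are skipped
    out[m[1]] = m[2]
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
  }
  return out;
}
```

If only `→` were accepted, an ASCII-authored file would yield an empty map and lenient scoring would silently equal strict scoring, which is the failure the fix addresses.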

P2-3: main() exited with `ANTHROPIC_API_KEY is not set` even when the
user passed `--model openai:gpt-4o` with a valid OPENAI_API_KEY. The
CLI advertises full provider:model support (resolveModel tests cover
openai:* explicitly) and the gateway routes by recipe; the env check
should match the provider that will actually be called. Now extracts
the provider id from the model string and looks up the right env var
from REQUIRED_ENV_BY_PROVIDER (anthropic, openai, google, groq,
voyage, together, deepseek, minimax, dashscope, zhipu). Unknown
providers fall through to the gateway, which raises a clear
recipe-specific error.
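
The provider-aware gate from P2-3 might look something like the sketch below. Only `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are named in this commit message, so the table here carries just those two; `providerOf`, `missingEnvFor`, and the rule that a bare alias such as `opus` maps to Anthropic are assumptions.

```typescript
// Two entries only: the other providers' env var names are not stated in the
// commit message and would need checking against the real table.
const REQUIRED_ENV_BY_PROVIDER: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY",
};

// Assumption: "provider:model" strings carry the provider before the colon,
// and bare aliases (opus, sonnet, haiku) route to Anthropic.
function providerOf(model: string): string {
  const idx = model.indexOf(":");
  return idx === -1 ? "anthropic" : model.slice(0, idx);
}

// Returns the name of the missing env var, or null when the gate should pass.
function missingEnvFor(model: string, env: Record<string, string | undefined>): string | null {
  const provider = providerOf(model);
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_ENV_BY_PROVIDER, provider)) {
    return null; // unknown provider: defer to the gateway's own error
  }
  const required = REQUIRED_ENV_BY_PROVIDER[provider];
  return env[required] ? null : required;
}
```

Passing the environment in as a parameter keeps the check testable without touching the real process environment.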

47/47 harness unit tests pass after the change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* skill: codex review P2-1 — verification gate now tests the user's edited file

The original SKILL.md Step 6 told users to run `node harness.mjs` from the
gbrain repo as the mandatory ≥95% gate. But that runs the harness against
the COMMITTED sample variants in evals/functional-area-resolver/variants/,
not the file the user just compressed. The gate could pass while the edit
dropped a sub-skill.

Step 6 now:
- Gate 1 stays at `gbrain routing-eval --json` (structural, runs against
  the user's actual routing-eval.jsonl fixtures).
- Gate 2 is rewritten: copy the user's edited routing file into a tmp
  variants dir, then run `node harness.mjs --variants-dir <tmp>
  --variants my-edit --model opus`. This exercises the harness's existing
  --variants flag (added in commit 472cc68 / T4) but now points at the
  user's actual edit. The harness uses gbrain-bundled fixtures, so this
  is a regression check on shared skills, not a full eval of the user's
  fixture set — and the SKILL.md says so explicitly.

Also adds a "common false negatives" callout: when the user's routing
file doesn't expose the skills gbrain's bundled fixtures target (e.g.
`gmail`, `enrich`), expect strict-scoring fails on those rows; lenient
scoring remains accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* evals: codex review P3 — regenerate Opus baseline with current schema

The prior Opus receipt was generated before commit 472cc68 (T4 added
harness_sha to ReceiptRow and correct_lenient to every RunRow). The
Sonnet and Haiku receipts shipped with the new schema, but Opus was
the outlier.

This run was produced with the current harness (sha ca99fbf, after
the P2-1 + P2-2 + P2-3 fixes). The harness_sha in the receipt header
binds the numbers to a specific harness revision so consumers can detect
schema drift.

Numbers (training, lenient, n=20, 3 seeds):
  baseline:              81.7% ± 7.2%  (unchanged — strict and lenient are equal)
  functional-areas:      100% ± 0%     (was 98.3% — one nondeterministic seed
                                         is now in-cluster; pattern continues
                                         to beat baseline while 48% smaller)
  resolver-of-resolvers: 66.7% ± 7.2%  (was 63.3% — still in noise; absent
                                         dispatcher clause keeps it ~30pp
                                         behind functional-areas on training)

Held-out (n=5, 3 seeds, lenient): every variant scores 100% except
resolver-of-resolvers on Sonnet (73.3%, committed in the earlier baseline).
Opus held-out saturates the small fixture set.

Run cost: ~$1.40 at Opus 4.7 pricing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* post-merge: scrub fork-private paths + add Contract/Output Format sections

Two CI gates landed on master after this branch was cut:

1) scripts/check-privacy.sh (v0.32.2): banned /data/brain/ and /data/.openclaw/
   in committed files. The eval variants extracted from a real production
   AGENTS.md still contained those fork-private path literals. Rewrote to
   /your/brain/path/, /your/agent/.openclaw/, /your/gbrain, /your/gstack,
   /your/tmp, /your/git-projects/. Only path strings changed — the routing
   structure (skill names, dispatcher clauses, trigger phrases) is byte-for-
   byte identical, so harness baseline-runs/ receipts are still valid.

2) test/skills-conformance.test.ts (master): added required sections
   `## Contract` and `## Output Format` to every skill. Added both to
   skills/functional-area-resolver/SKILL.md following the book-mirror
   convention (short body referencing the canonical content above + a
   conformance-test footnote). Contract notes the privacy guarantee +
   the verification-gate semantics; Output Format documents the area
   entry template (with both ASCII -> and Unicode → arrows accepted).

Full unit suite: 5578 pass / 0 fail. bun run verify clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: surface functional-area-resolver in CLAUDE.md + README.md for v0.32.3.0

CLAUDE.md — adds a "Routing-table compression (v0.32.3.0)" entry under Skills,
covering the two-layer dispatch pattern, the load-bearing (dispatcher for: ...)
clause, the eval surface at evals/functional-area-resolver/, the three
cross-model baseline receipts, the 25KB → 13KB compression numbers, and the
nine v0.33.x follow-up TODOs. Cites AnyTool / RAG-MCP / Anthropic Agent Skills
prior art so the pattern's position in the literature is discoverable from the
agent entry point.

README.md — adds a "New in v0.32.3.0" callout in the intro section so users
landing on the repo see the new skill before scrolling to the skills list.
Links the SKILL.md and eval directory; states the cross-model gain (+13 to
+17pp while 48% smaller) so the reason to apply the pattern is one click away.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Garry Tan <garrytan@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat: add sync freshness check to gbrain doctor

- Add checkSyncFreshness function to detect stale sources
- Check all sources with local_path for sync staleness
- Warn if > 24 hours, fail if > 72 hours since last sync
- Include page count drift detection (best-effort)
- Add check to both remote and local doctor flows
- Provides actionable error messages with gbrain sync commands

* chore: bump version and changelog (v0.32.4)

sync_freshness check ships in v0.32.4 — adds detection for stale federated
sources (warn at 24h, fail at 72h) plus best-effort filesystem-vs-DB drift
detection. Surfaces in both runDoctor (local) and doctorReportRemote
(thin-client).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat: rewrite sync_freshness as staleness-only + env overrides + 12 tests

Strip the inline FS-walk drift detector from checkSyncFreshness. Codex
outside-voice review during plan-eng-review caught that doctorReportRemote
runs in the HTTP MCP server (src/commands/serve-http.ts), so walking
DB-supplied sources.local_path values from a remotely-callable endpoint
crosses a trust boundary — an OAuth write-scoped client could mutate
local_path and probe arbitrary server filesystem paths via timing/count
signal. Drift detection belongs in the existing multi_source_drift check
which already has GBRAIN_DRIFT_LIMIT + GBRAIN_DRIFT_TIMEOUT_MS guards.

Functional fixes folded in:
- Future-last_sync_at now warns ("clock skew or corrupted timestamp")
  instead of silently falling through as ok. Negative ageMs previously
  skipped both threshold tests.
- GBRAIN_SYNC_FRESHNESS_WARN_HOURS / GBRAIN_SYNC_FRESHNESS_FAIL_HOURS
  env vars override the 24h / 72h defaults. Invalid values (NaN, <=0)
  fall back to defaults with a once-per-process stderr warn.
- Failure messages embed source.id so `gbrain sync --source <id>` matches
  the user's copy-paste (was source.name, which doesn't match the CLI flag).
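
The staleness rules above (24h warn, 72h fail, future timestamps warn, env overrides with fallback) can be sketched as a pure function. This is an illustration under assumptions, not the shipped `checkSyncFreshness`: the names `hoursFromEnv` and `classifyFreshness` are hypothetical, the never-synced case is left out because its outcome is not stated here, and the boundaries are treated as strict (`>`) following the "Warn if > 24 hours, fail if > 72 hours" wording.

```typescript
type Status = "ok" | "warn" | "fail";

const HOUR_MS = 60 * 60 * 1000;

// Invalid overrides (missing, NaN, <= 0) fall back to the default.
function hoursFromEnv(raw: string | undefined, fallback: number): number {
  const n = Number(raw);
  return raw !== undefined && isFinite(n) && n > 0 ? n : fallback;
}

function classifyFreshness(
  lastSyncAt: Date,
  now: Date,
  env: Record<string, string | undefined> = {},
): Status {
  const warnHours = hoursFromEnv(env["GBRAIN_SYNC_FRESHNESS_WARN_HOURS"], 24);
  const failHours = hoursFromEnv(env["GBRAIN_SYNC_FRESHNESS_FAIL_HOURS"], 72);
  const ageMs = now.getTime() - lastSyncAt.getTime();
  if (ageMs < 0) return "warn"; // future timestamp: clock skew or corrupted value
  if (ageMs > failHours * HOUR_MS) return "fail";
  if (ageMs > warnHours * HOUR_MS) return "warn";
  return "ok";
}
```

Checking the negative age first is the point of the future-timestamp fix: without it, a negative `ageMs` fails both threshold comparisons and reports ok.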

checkSyncFreshness is now exported so tests can target it directly,
mirroring the takesWeightGridCheck pattern at doctor.ts:89.

12 unit tests in test/doctor.test.ts cover every branch:
empty sources, never-synced, >72h fail, 72h boundary, 24-72h warn,
24h boundary, <24h ok, future timestamp, mixed sources (highest severity
wins), executeRaw throws -> outer-catch warn, env override fires at 7h,
source.id regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: refresh v0.32.4 CHANGELOG + CLAUDE.md to match staleness-only scope

Drop the filesystem-vs-DB drift detector description from the CHANGELOG
entry. Document the env-var overrides (GBRAIN_SYNC_FRESHNESS_WARN_HOURS /
GBRAIN_SYNC_FRESHNESS_FAIL_HOURS), the future-timestamp warn behavior,
the source.id-in-message fix, and the codex-surfaced trust-boundary
rationale for stripping drift out of scope.

CLAUDE.md doctor.ts annotation updated to reflect the simpler surface
plus the 12 pinning tests.

llms-full.txt regenerated to track the CLAUDE.md edit (mandatory per
CLAUDE.md rule).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Garry Tan <garrytan@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…temporal/spatial injection (garrytan#880)

* feat: gbrain-context OpenClaw context engine — deterministic temporal/spatial injection

Adds a context engine plugin that runs on every assemble() call to inject
structured live context into the system prompt:

- Garry's current local time (computed from heartbeat-state.json timezone)
- Current location (city + timezone from heartbeat or flight data)
- Home time when traveling (e.g. 'Mon 7:58 AM PT')
- Active travel status
- Quiet hours detection
- Airport→timezone mapping for 30+ airports

This kills the 'time warp' bug class where compacted sessions lose track
of time/location. The engine delegates compaction to the legacy runtime
and only owns systemPromptAddition injection. Zero LLM calls, <5ms.

Files:
- src/core/context-engine.ts — engine implementation (SDK-free, testable)
- src/openclaw-context-engine.ts — plugin entry point (requires SDK)
- test/context-engine.test.ts — 9 tests, all passing

Enable: plugins.slots.contextEngine = 'gbrain-context'

* feat: add activity injection — calendar events + open tasks in context block

Reads memory/calendar-cache.json and ops/tasks.md to inject:
- **Right now:** current meeting (with attendees) from calendar
- **Coming up:** next 3 events within 4-hour window
- **Open tasks:** unchecked items from Today section
- Stale calendar warning when cache is >6 hours old

Skips all-day events and generic markers (Home, OOO, Out of Office).
Caps upcoming events at 3 and tasks at 5 to keep prompt lean.
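
As a sketch of the "Coming up" selection described above: the `CalendarEvent` shape and the `upcomingEvents` name are hypothetical (the layout of memory/calendar-cache.json is not shown here); the 4-hour window, the cap of 3, and the skipped markers come from the commit message.

```typescript
// Hypothetical event shape; the real cache format may differ.
interface CalendarEvent {
  summary: string;
  start: Date;
  allDay: boolean;
}

const GENERIC_MARKERS = ["Home", "OOO", "Out of Office"];
const WINDOW_MS = 4 * 60 * 60 * 1000; // 4-hour look-ahead
const MAX_UPCOMING = 3;

function upcomingEvents(events: CalendarEvent[], now: Date): CalendarEvent[] {
  const t = now.getTime();
  return events
    // Skip all-day events and generic markers.
    .filter((e) => !e.allDay && GENERIC_MARKERS.indexOf(e.summary) === -1)
    // Keep only events that start after now and within the window.
    .filter((e) => e.start.getTime() > t && e.start.getTime() - t <= WINDOW_MS)
    // filter() returned a fresh array, so sorting does not mutate the input.
    .sort((a, b) => a.start.getTime() - b.start.getTime())
    .slice(0, MAX_UPCOMING);
}
```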

15 tests passing (was 9).

* v0.32.5 feat: gbrain-context OpenClaw context engine — deterministic temporal/spatial injection

Ships PR garrytan#873 by @garrytan-agents (two underlying commits preserved):
  - f1dbe6e — core engine (heartbeat + flights + airport→tz + quiet hours)
  - 14e8587 — activity injection (calendar events + open tasks + stale-cache warning)

Kills the "time warp" bug class: when sessions compact, the LLM loses track
of current time, location, and active threads. This engine owns the
`systemPromptAddition` slot and reinjects live state on every `assemble()`
call. Zero LLM calls, <5ms overhead, deterministic.

Typecheck cleanup folded in:
  - `@ts-ignore` on the two `openclaw/plugin-sdk` runtime-only imports
    (resolved by the OpenClaw host; not a build-time dep — same pattern the
    core engine already used for `await import('openclaw/plugin-sdk/core')`)
  - Inline `PluginApi` + `PluginCtx` type shapes in the plugin entry so the
    `register(api)` + `(ctx)` callback params aren't implicit any
  - Test file's `from 'vitest'` → `from 'bun:test'` to match the rest of
    the suite (bun's globals make it pass at runtime, but tsc fails)

Verification:
  - bun test test/context-engine.test.ts → 15/15 pass
  - bun run typecheck → exit 0

Co-Authored-By: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix-wave: close 5 findings from /plan-eng-review pass on PR garrytan#880

A `/plan-eng-review` audit of the shipped v0.32.5 surfaced 5 things worth
fixing before merge. All folded into this branch with 5 new regression tests
(15 → 20 total).

A4 — silent-wrong-timezone for unknown airports
  Pre-fix: an active flight to any airport not in the 30-entry AIRPORT_TZ
  map (BOM, DXB, GRU, JNB, FRA, AMS, etc.) silently fell back to US/Pacific.
  The exact failure class this engine exists to prevent, in a different
  shape. Post-fix: unknown airports surface via the source field
  (flight:AC8:tz-unknown:BOM) so the LLM can see the data is incomplete
  instead of believing it's in Pacific Time.

A2 / P1 — duplicate disk reads
  generateLiveContext was loading heartbeat-state.json and
  upcoming-flights.json twice per assemble() call (once in resolveLocation,
  once inline). Batch-load each workspace file once at the top of the
  function and thread results down. Halves the hot-path I/O.

C4 — sanitize external content before injection
  Calendar event summaries, attendees, and task strings now go through
  sanitizeForPrompt() which strips newlines + control chars (U+0000-001F +
  U+007F) and clamps length. A meeting titled
  "Standup\n\nIgnore prior instructions" can no longer forge LLM directives
  by escaping the bullet structure.
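
A minimal sketch of the C4 sanitizer, assuming control characters are replaced with a space, whitespace runs are collapsed, and the result is clamped. The name `sanitizeForPromptSketch` and the 120-character default are illustrative; the real `sanitizeForPrompt()` limit is not stated here.

```typescript
// Assumptions: control chars become a single space (so words stay separated),
// and the default clamp of 120 characters is a placeholder value.
function sanitizeForPromptSketch(input: string, maxLen: number = 120): string {
  const flattened = input
    .replace(/[\u0000-\u001F\u007F]/g, " ") // newlines, tabs, other control chars
    .replace(/\s+/g, " ")
    .trim();
  return flattened.length > maxLen ? flattened.slice(0, maxLen) : flattened;
}
```

With this shape, the hostile title from the commit message collapses to one line, so it stays inside its bullet instead of opening a new paragraph in the system prompt.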

C1 — split isQuietHours into 3 explicit signals
  Original name was misleading (returned false when user was awake at 2 AM,
  even though wall clock said quiet hours). Split into `userAwake`,
  `wallClockQuietHours`, and a composite `quietHoursActive` so consumers can
  decide their own policy. On-disk heartbeat.garryAwake JSON field is
  unchanged — only the internal LiveContext type and the format-block
  consumer renamed.
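
The three-signal split in C1 could be sketched like this. The 23:00 to 07:00 window and the `quietSignals` helper are assumptions; the composite rule (quiet hours apply only when the wall clock says so and the user is not awake) is inferred from the described behavior of the old function.

```typescript
interface QuietSignals {
  userAwake: boolean;           // from the heartbeat
  wallClockQuietHours: boolean; // purely a function of local time
  quietHoursActive: boolean;    // composite the old isQuietHours returned
}

// Assumption: quiet hours run 23:00-07:00 local; the real window is not stated.
function quietSignals(localHour: number, userAwake: boolean): QuietSignals {
  const wallClockQuietHours = localHour >= 23 || localHour < 7;
  return {
    userAwake,
    wallClockQuietHours,
    quietHoursActive: wallClockQuietHours && !userAwake,
  };
}
```

At 2 AM with the user awake, this yields `wallClockQuietHours: true` and `quietHoursActive: false`, which is the distinction the single boolean could not express.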

T1 — regression test coverage for the active-flight path
  Pre-fix, resolveLocation's flight branch (the headline path for the
  Toronto incident) had ZERO direct test coverage. Two new cases lock in
  the known-airport happy path AND the unknown-airport failure mode so A4
  can't silently regress.

Verification:
  - bun test test/context-engine.test.ts → 20/20 pass (was 15)
  - bun run typecheck → exit 0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(L0): A4 real fix + TLA → lazy SDK resolution (Codex F5 + F7)

A Codex outside-voice review on /plan-eng-review's plan caught two findings
that both previous eng-reviews missed.

L0-A (F5) — A4 was COSMETIC, not real.
  Pre-fix: resolveLocation's unknown-airport branch returned tz: DEFAULT_TZ
  (US/Pacific) with only a `source: 'flight:XX:tz-unknown:XYZ'` sticker. The
  engine then computed Time/Day/quietHoursActive from US/Pacific regardless,
  so a flight to BOM injected "Mon 3:00 PM PT" with a footnote nobody reads.
  Same silent-wrong-output failure class A4 was supposed to close.

  Post-fix: resolveLocation returns tz: UNKNOWN_TZ. generateLiveContext
  short-circuits time computation when tz is UNKNOWN_TZ (now/dayOfWeek
  become null, wallClockQuietHours/quietHoursActive become false).
  formatContextBlock renders an explicit Timezone-unavailable warning in
  place of Time:/Day:. The LLM sees the gap, not a guess.
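
A sketch of the L0-A behavior, under assumptions: the sentinel value, the two-entry map, the helper names, and the warning wording are all illustrative. What it takes from the commit message is the shape of the fix: an unknown airport yields a sentinel timezone, and the formatter emits a warning instead of computing a time from a default zone.

```typescript
const UNKNOWN_TZ = "unknown"; // sentinel value is an assumption

// Two illustrative entries; the real AIRPORT_TZ map has about 30.
const AIRPORT_TZ: Record<string, string> = {
  SFO: "America/Los_Angeles",
  YYZ: "America/Toronto",
};

function tzForAirport(code: string): string {
  return Object.prototype.hasOwnProperty.call(AIRPORT_TZ, code) ? AIRPORT_TZ[code] : UNKNOWN_TZ;
}

function timeLines(tz: string, now: Date): string[] {
  if (tz === UNKNOWN_TZ) {
    // Short-circuit: never compute a time from a guessed zone.
    return ["Timezone unavailable: destination airport is not in the timezone map"];
  }
  const time = new Intl.DateTimeFormat("en-US", {
    timeZone: tz,
    hour: "numeric",
    minute: "2-digit",
  }).format(now);
  const day = new Intl.DateTimeFormat("en-US", { timeZone: tz, weekday: "long" }).format(now);
  return ["Time: " + time, "Day: " + day];
}
```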

L0-B (F7) — Top-level `await import` is a hard module-load constraint.
  Any OpenClaw deployment in a non-TLA runtime (older Node, CJS bridges,
  certain transpilers, some test shims) fails BEFORE the plugin registers.
  The try/catch inside doesn't help — module load can't be caught by the
  consumer.

  Post-fix: SDK resolution moved to an `ensureSdkLoaded()` async helper
  called from assemble() and compact() on first invocation. Module loads
  cleanly in every runtime; the fallback path actually catches.
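
The lazy-resolution idea in L0-B can be sketched as a memoized loader. Here `makeLazyLoader` is a hypothetical helper and `load` stands in for the dynamic `import(...)` of the host SDK; the real `ensureSdkLoaded()` may be structured differently.

```typescript
// Nothing runs at module load. The first call starts the load; later calls
// reuse the same promise. A failed load resolves to null so callers can take
// the fallback path instead of crashing during registration.
function makeLazyLoader<T>(load: () => Promise<T>): () => Promise<T | null> {
  let pending: Promise<T | null> | null = null;
  return function ensureLoaded(): Promise<T | null> {
    if (pending === null) {
      // The executor runs synchronously, so a synchronous throw inside
      // load() is also converted into the null fallback.
      pending = new Promise<T>((resolve) => resolve(load())).catch(() => null);
    }
    return pending;
  };
}
```

Because the failure is now caught inside an ordinary async call rather than at module evaluation, the consumer's fallback branch is reachable in runtimes without top-level await.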

Tests:
  - The cosmetic "tz-unknown sticker" assertion is replaced with the
    behavioral assertion: no US/Pacific Time, no Day field, explicit
    Timezone-unavailable warning present.
  - New L0-B contract test asserts engine creation does NOT trigger SDK
    load and the first compact() call exercises the lazy path.

Verification:
  - bun test test/context-engine.test.ts → 21/21 pass (20 + L0-B contract)
  - bun run typecheck → exit 0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(L1): scrub real names from test fixtures + CI guard (CLAUDE.md privacy rule)

The /plan-eng-review pass flagged pre-existing real-name leaks in PR garrytan#873's
test fixtures. CLAUDE.md's privacy rule is unambiguous: "Never reference real
people, companies, funds, or private agent names in any public-facing
artifact." Tests are checked-in code, distributed with every release, and
indexed by GitHub search.

Fixture scrub (test/context-engine.test.ts, 5 substitutions):
  '1:1 with Diana' → '1:1 with @alice-example'
  'diana@ycombinator.com' → 'alice@example.com'
  'DM Technium re: Hermes PR' → 'DM @charlie-example re: agent-fork PR'
  'Post open source manifesto — from YC Labs' → '... from a-team'
  '~~Reply to Bob McGrew~~ — DONE' → '~~Reply to bob-example~~ — DONE'
Plus matching assertion updates.

Adjacent scrub: test/link-extraction.test.ts line 523 fixture entry
'people/diana-hu' → 'people/alice-example' (single occurrence, never
referenced elsewhere in the test).

New CI guard (scripts/check-test-real-names.sh, ~120 lines):
  Designed per Codex F4 review: drop the broad corporate-email regex
  (@openai|google|stripe...) because legitimate billing/auth fixtures use
  those domains. Replace with two targeted lists:
    - BANNED_NAMES: exact-string list of known real identifiers
      (Diana, Wintermute, Hermes, Technium, McGrew, YC Labs)
    - BANNED_EMAILS: specific addresses (currently just diana@ycombinator.com)
  Plus ALLOWLIST of exact `file:string` pairs that are intentional and
  pre-existing (the user's own email; structural tests that ASSERT a banned
  name is absent and therefore MUST reference it literally).

  Scope: test/**/*.test.ts only. Historical CHANGELOG entries, doc examples,
  and skill READMEs each have their own scrub status and are out of scope
  for this guard.
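
The guard itself is a shell script; the matching logic it describes (exact banned strings, minus exact `file:string` allowlist pairs) could be expressed roughly as below. All names here are hypothetical and the inputs in the usage note are placeholders, not the script's real lists.

```typescript
interface Hit {
  file: string;
  needle: string;
}

// files: path -> contents. allowlist: exact "file:string" pairs that are
// intentional (for example a test asserting a banned name is absent).
function findBannedStrings(
  files: Record<string, string>,
  banned: string[],
  allowlist: string[],
): Hit[] {
  const hits: Hit[] = [];
  for (const file of Object.keys(files)) {
    for (const needle of banned) {
      if (files[file].indexOf(needle) === -1) continue;             // not present
      if (allowlist.indexOf(file + ":" + needle) !== -1) continue;  // intentional
      hits.push({ file, needle });
    }
  }
  return hits;
}
```

An exact-pair allowlist keeps exemptions narrow: allowing `test/a.test.ts:some-name` does not exempt the same string in any other file.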

Wire-in:
  - New `bun run check:test-names` npm script
  - Added to `bun run verify` chain (pre-push gate)
  - Added to `bun run check:all` chain (local-only superset)

Allowlist documents the structural references the guard correctly identifies
but cannot meaningfully strip:
  - test/integrations.test.ts (regex pattern in personal-info filter test)
  - test/recency-decay.test.ts (regression-prevention assertions)
  - test/serve-stdio-lifecycle.test.ts (pre-existing comment)
  - test/extract.test.ts (pre-existing markdown-link fixture)

These flagged-but-not-scrubbed entries belong to a broader repo-wide
privacy-scrub pass (deferred TODO).

Verification:
  - bun run check:test-names → exit 0 (no new banned strings)
  - bun test test/context-engine.test.ts → 21/21 pass
  - bun test test/link-extraction.test.ts → 98/98 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(L2): plugin-shape e2e + compact fallback + selector map + race-condition JSDoc

The unit suite at test/context-engine.test.ts exercised
createGBrainContextEngine directly — that's the ENGINE, not the PLUGIN. Until
this commit, nothing tested the actual OpenClaw plugin discovery + registration
path. Codex outside-voice F1 flagged the gap: "we ship a plugin we don't test
as a plugin."

Layer 2 closures:

T-NEW1 (plugin-shape e2e, test/e2e/openclaw-context-engine-plugin.test.ts, 3 tests):
  - Default export has the expected plugin-entry shape (id, name, description, register)
  - register() wires registerContextEngine with ENGINE_ID and a factory
  - Factory returns a working ContextEngine that injects Live Context and
    threads through the mocked memory-addition SDK call

  Implementation note: dropped the unused `definePluginEntry` import from
  src/openclaw-context-engine.ts. The wrapper was a type-tag with no behavior
  — OpenClaw's loader inspects the default export's shape, not the wrapping.
  Removing it eliminated a brittle build-time SDK import that blocked
  mock.module() interception (Codex F1 was right). Module now loads cleanly
  in any runtime.

T-NEW4 (compact() fallback test, test/context-engine.test.ts):
  - Pins the no-runtime fallback shape so a refactor that drops the fallback
    or returns a different shape gets caught.
  - Codex F9 noted that without a real SDK boundary, a spy-on-delegate test
    is busywork. This commit keeps just the fallback assertion (no spy, no
    __internal export-for-tests hatch).

T-NEW6 (heartbeat-write concurrency contract, src/core/context-engine.ts):
  - JSDoc on loadJsonFile documenting that producers MUST use atomic-rename
    writes (write-to-tmp + rename) to avoid partial-read races. The engine
    silent-degrades to defaults on parse failure; the contract makes the
    expectation explicit instead of buried in behavior.
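
A minimal sketch of the producer-side contract described in T-NEW6, assuming Node-compatible `fs` APIs. The names `writeJsonAtomic` and `readJsonOrDefault` are hypothetical illustrations, not the repo's actual helpers; the reader only mirrors the "degrade to defaults on parse failure" behavior the JSDoc documents.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical producer helper: write to a temp file next to the target,
// then rename it into place so a reader never observes a half-written file.
export function writeJsonAtomic(targetPath: string, value: unknown): void {
  const tmpPath = `${targetPath}.${process.pid}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(value), "utf8");
  // rename is atomic on POSIX when both paths are on the same filesystem.
  renameSync(tmpPath, targetPath);
}

// Hypothetical reader: falls back to defaults on a missing or unparsable file.
export function readJsonOrDefault<T>(path: string, fallback: T): T {
  try {
    return JSON.parse(readFileSync(path, "utf8")) as T;
  } catch {
    return fallback;
  }
}

// Usage: round-trip a heartbeat file inside a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "heartbeat-"));
const file = join(dir, "heartbeat.json");
writeJsonAtomic(file, { city: "Tokyo" });
console.log(readJsonOrDefault(file, { city: "unknown" }).city);
```

Keeping the temp file in the same directory as the target is the design choice that matters here: a rename across filesystems is not atomic.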

T-NEW5 (e2e selector map, scripts/e2e-test-map.ts):
  - Added entries mapping src/core/context-engine.ts and
    src/openclaw-context-engine.ts to the new plugin e2e file. ci:local:diff
    now narrows correctly for engine changes.

Verification:
  - bun test test/context-engine.test.ts → 22/22 pass (21 + T-NEW4)
  - bun test test/e2e/openclaw-context-engine-plugin.test.ts → 3/3 pass
  - bun run typecheck → exit 0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(L3): ENGINE_VERSION → ENGINE_API_VERSION semantic + tasks.md size cap

C-NEW1 — Engine version constant semantic.
  Pre-fix: `ENGINE_VERSION = '0.1.0'` looked like it should track
  package.json. It doesn't — it's the engine's CONTRACT version, bumped
  when the ContextEngine interface shape changes. Rename to
  ENGINE_API_VERSION makes that explicit. ENGINE_VERSION kept as a
  deprecated alias so existing v0.32.5 callers don't break.

C-prior C2 — tasks.md size cap.
  resolveTodayTasks() now refuses to read a tasks file >1MB. Defends
  against a runaway file (clipboard-paste accident, log capture, etc.)
  blocking every assemble() call with a multi-megabyte sync read. The
  size check uses statSync inside the existing try/catch, so a missing
  file is handled the same way it was when readFileSync threw.
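
The size cap can be sketched as follows. `readTasksCapped` and `MAX_TASKS_BYTES` are hypothetical names, and treating "1MB" as 1024 * 1024 bytes is an assumption; the real resolveTodayTasks() may differ in both respects.

```typescript
import { mkdtempSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumption: "1MB" is read as 1024 * 1024 bytes.
const MAX_TASKS_BYTES = 1024 * 1024;

// Hypothetical helper: returns the file text, or null when the file is
// missing or larger than the cap.
export function readTasksCapped(path: string, maxBytes: number = MAX_TASKS_BYTES): string | null {
  try {
    // statSync throws on a missing file, so one try/catch covers both the
    // missing-file case and any read error.
    if (statSync(path).size > maxBytes) return null;
    return readFileSync(path, "utf8");
  } catch {
    return null;
  }
}

// Usage: a small file is read, an oversized one is refused.
const tasksDir = mkdtempSync(join(tmpdir(), "tasks-"));
const smallFile = join(tasksDir, "tasks.md");
const hugeFile = join(tasksDir, "huge.md");
writeFileSync(smallFile, "- [ ] ship it\n", "utf8");
writeFileSync(hugeFile, "x".repeat(64), "utf8");
console.log(readTasksCapped(smallFile), readTasksCapped(hugeFile, 16));
```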

Verification:
  - bun test test/context-engine.test.ts → 23/23 pass (22 + size-cap test)
  - bun test test/e2e/openclaw-context-engine-plugin.test.ts → 3/3 pass
  - bun run typecheck → exit 0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG + TODOS for the Codex recalibration wave; allowlist sibling guard

CHANGELOG.md — extend v0.32.5 entry with a "Codex outside-voice
recalibration" subsection covering L0-A (A4 real fix), L0-B (TLA → lazy),
the privacy guard redesign, the new plugin-shape e2e, and the deferred
v0.32.6 items. Credits gpt-5-codex as the driver.

TODOS.md: append "v0.32.6 follow-ups from PR garrytan#880" section with 14
deferred items:
  - Clock-injection seam (prerequisite for perf + snapshot tests)
  - T-NEW2 perf budget (with Codex F2 math-bug note)
  - T-NEW3 full-block snapshot test
  - C-NEW2 exports map entry (per Codex F8 — premature public API)
  - A3 .ts-extension resolution coupling
  - A5 typed openclaw/plugin-sdk ambient module shim
  - C-prior C5 loadJsonFile parse-error warn
  - C-prior C3 fractional-hour timezone offset
  - DST-boundary test
  - Multibyte sanitizer test
  - Dynamic airport-tz lookup (replace 30-entry static map)
  - DOC1 docs/openclaw-context-engine.md workspace contract
  - DOC2 CLAUDE.md "Key files" annotations
  - Repo-wide privacy scrub (24+ non-test matches)

scripts/check-privacy.sh — allowlist sibling guard
scripts/check-test-real-names.sh, which literally contains 'Wintermute' in
its BANNED_NAMES list (same meta-rule-enforcement exception as
check-privacy.sh's self-reference).

Verification:
  bun run verify → exit 0 (full chain green: check:privacy + check:test-names
  + check:jsonb + check:progress + check:test-isolation + check:wasm +
  check:admin-build + check:admin-scope-drift + check:cli-exec + typecheck)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(L4): real openclaw-loads-the-plugin e2e — closes Codex F1 properly

Until this commit, the gbrain-context plugin had two test paths:
  - test/context-engine.test.ts (23 unit tests against createGBrainContextEngine)
  - test/e2e/openclaw-context-engine-plugin.test.ts (3 e2e tests with mocked SDK)

Both call our engine directly or shim the OpenClaw SDK. Codex outside-voice
F1 (cited at v0.32.5 ship) flagged that nothing in the repo proves OpenClaw's
actual plugin loader walks our entry file, calls register(api) against its
real api object, and accepts the registration. The reviewer was right —
shipping a plugin without an "OpenClaw actually loads it" test is a
credibility hit on a feature whose entire purpose is to integrate with
OpenClaw.

L4 — test/e2e/openclaw-plugin-load-real.test.ts (6 tests, Tier 2):

  beforeAll:
    - Detects `openclaw` CLI; skips suite if missing
    - bun build src/openclaw-context-engine.ts → JS bundle (same packaging
      shape the release ships)
    - Writes minimal package.json + openclaw.plugin.json from templates
    - openclaw plugins install --link --dangerously-force-unsafe-install
      against an isolated --profile dir (won't touch user's openclaw state)

  Tests:
    1. status=loaded, imported=true, activated=true
    2. Default-export id/name/description metadata round-trips through
       openclaw's plugin loader unchanged
    3. register(api) produced zero error-level diagnostics (only the
       expected trust warning for --link installs)
    4. plugins.slots.contextEngine binding to "gbrain-context" passes
       openclaw config validate
    5. openclaw plugins doctor surfaces zero errors for our plugin id
    6. Public-SDK round-trip: imports registerContextEngine from
       openclaw/plugin-sdk (resolved via realpathSync on the openclaw
       binary's symlink so it works for Homebrew, npm -g, nvm, asdf,
       volta installs uniformly), registers our factory, then exercises
       assemble() and asserts the Live Context block appears

  afterAll:
    - Uninstalls the plugin (best-effort) + rm -rf the isolated profile
      dir + the tempdir fixture

Fixture: test/fixtures/openclaw-plugin-real/ holds the manifest templates
(package.json.template + openclaw.plugin.json.template). The test writes
fresh copies into a per-run tempdir so the fixture itself stays read-only.

Selector map: scripts/e2e-test-map.ts now points BOTH source files
(src/core/context-engine.ts, src/openclaw-context-engine.ts) at BOTH the
mocked-SDK plugin-shape e2e AND this real-loader e2e. ci:local:diff fires
both on either change.

Verification:
  - bun test test/e2e/openclaw-plugin-load-real.test.ts → 6/6 pass
  - bun test test/context-engine.test.ts test/e2e/openclaw-context-engine-plugin.test.ts
    test/e2e/openclaw-plugin-load-real.test.ts → 32/32 pass total
  - bun run typecheck → exit 0
  - bun run verify → exit 0 (full chain green)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…re-up (garrytan#901)

* feat(eval-contradictions): types + pure helpers for v0.33.0 probe

Foundational module for the contradiction measurement probe (v0.33.0 plan).
Pure, hermetic, no engine or LLM dependencies. Sets the wire contract for
the rest of the implementation.

- types.ts: schema_version + PROMPT_VERSION + TRUNCATION_POLICY constants,
  ProbeReport + ContradictionPair + JudgeVerdict + cache/run row shapes.
- calibration.ts: Wilson 95% CI on the headline percentage with exact
  clamping at p=0 and p=1 (floating-point overshoot regression guard);
  small_sample_note when n<30.
- judge-errors.ts: first-class typed error collector (Codex fix — bias
  guard for the silent-skip-on-throw decision); classifier maps to
  parse_fail/refusal/timeout/http_5xx/unknown.
- severity-classify.ts: parseSeverity defaults to 'low' on garbage input;
  bucketBySeverity + buildHotPages (descending rank + tie-break by severity).
- date-filter.ts: three-rule A1 pre-filter — same-paragraph-dual-date
  beats the separation rule (flip-flop case); missing dates fall through
  to the judge; only "both explicit AND >30d apart" actually skips.
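
For reference, a minimal sketch of a Wilson 95% interval with the exact clamping described for calibration.ts. The function name, the return shape, and the handling of n = 0 are assumptions for illustration; only the n<30 note and the clamping at p=0 and p=1 come from the commit message.

```typescript
export interface WilsonInterval {
  point: number;
  lower: number;
  upper: number;
  smallSampleNote?: string;
}

const Z_95 = 1.96; // two-sided 95% normal quantile

// Hypothetical helper: Wilson score interval for `successes` out of `n`.
export function wilson95(successes: number, n: number): WilsonInterval {
  if (n <= 0) {
    // Assumption: with no samples, report the widest possible interval.
    return { point: 0, lower: 0, upper: 1, smallSampleNote: "n=0: no data" };
  }
  const p = successes / n;
  const z2 = Z_95 * Z_95;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const half = (Z_95 * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  // Exact clamping: at p=0 or p=1 the bound is set directly, so floating-point
  // error cannot push it slightly below 0 or above 1.
  const lower = successes === 0 ? 0 : Math.max(0, center - half);
  const upper = successes === n ? 1 : Math.min(1, center + half);
  const result: WilsonInterval = { point: p, lower, upper };
  if (n < 30) result.smallSampleNote = `n=${n} is below 30; interval is wide`;
  return result;
}
```

For 5 successes out of 10 this gives an interval of roughly 0.237 to 0.763, which is why a small-sample note is attached below n = 30.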

51 hermetic tests across the four pure modules; typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval-contradictions): schema migrations + engine methods (v0.33.0)

Adds the persistent surface the contradiction probe needs: two new tables
plus six BrainEngine methods, mirrored cleanly across PGLite + Postgres.

Migrations v51 + v52 (idempotent on both engines):
  - eval_contradictions_cache: composite PK on (chunk_a_hash, chunk_b_hash,
    model_id, prompt_version, truncation_policy) per Codex outside-voice
    fix; verdict JSONB; expires_at-driven TTL.
  - eval_contradictions_runs: one row per probe run; Wilson CI bounds,
    judge-error totals, source-tier breakdown, full report_json.

Engine methods (interface + 2 impls each):
  - listActiveTakesForPages(pageIds, opts): P1 batched per-page fetch.
    Single WHERE page_id = ANY($1) AND active = true; replaces the O(K)
    loop the probe would otherwise pay per query.
  - writeContradictionsRun(row): M5 time-series insert; idempotent on
    run_id via ON CONFLICT DO NOTHING.
  - loadContradictionsTrend(days): M5 history read, newest first.
  - getContradictionCacheEntry(key): P2 cache lookup; 5-component key
    includes prompt_version + truncation_policy.
  - putContradictionCacheEntry(opts): cache upsert with TTL refresh.
  - sweepContradictionCache(): periodic expired-row purge.

JSONB writes use sql.json() on Postgres (matches existing eval_takes_quality
+ raw_data patterns; not the literal-template-tag pattern banned by
scripts/check-jsonb-pattern.sh). PGLite uses $N::jsonb positional binds.

17 hermetic tests on PGLite cover P1 (4 cases: empty, grouped, supersede-
excludes, holder-allow-list), M5 (5 cases: write+read, idempotent run_id,
newest-first, days-window, JSONB round-trip), P2 (6 cases: miss, put-get,
prompt-version differs, truncation differs, upsert refreshes, sweep
deletes expired). Existing 109 migrate + bootstrap tests still green.

Schema mirror in pglite-schema.ts; source.sql regenerated to schema-embedded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval-contradictions): cross-source + cost-tracker + cache wrappers

Three pure-orchestration modules between the engine surface and the
runner. Each is independently testable; the cache wrapper does hit the
PGLite engine end-to-end since its job is to round-trip through P2.

- cross-source.ts (M6): classifySlugTier maps a slug to curated/bulk/other
  using DEFAULT_SOURCE_BOOSTS (boost > 1.05 = curated, < 0.95 = bulk).
  buildSourceTierBreakdown produces the {curated_vs_curated,
  curated_vs_bulk, bulk_vs_bulk, other} counts; order-independent on
  the pair members.

- cost-tracker.ts (A2 + P3): estimateUpperBoundCost for pre-flight refuse.
  CostTracker records judge calls (per-token-pricing per model) AND
  embedding calls (Codex P3 fix). Soft-ceiling semantics documented
  in the estimate_note string surfaced in the final report (Codex
  caveat: "hard ceiling" was overclaimed for token estimates).
  Anthropic + OpenAI pricing baked in; unknown models fall back to
  Haiku rates.

- cache.ts (P2 wrapper): hashContent (sha256), buildCacheKey with
  lex-sorted (a, b) so verdicts are order-independent and key bakes in
  PROMPT_VERSION + TRUNCATION_POLICY (Codex outside-voice fix). JudgeCache
  class tracks hits/misses for the run report. Shape validation guards
  against corrupt rows: a cache row that doesn't parse as JudgeVerdict
  is treated as a miss instead of crashing downstream.
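
A rough sketch of the order-independent cache key described for cache.ts. The constant values and the `CacheKey` field layout are placeholders chosen to match the composite primary key named in the migration commit; the real hashContent and buildCacheKey signatures may differ.

```typescript
import { createHash } from "node:crypto";

// Placeholder values: the real constants live in types.ts.
const PROMPT_VERSION = "v1";
const TRUNCATION_POLICY = "utf8-1500";

export function hashContent(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

export interface CacheKey {
  chunk_a_hash: string;
  chunk_b_hash: string;
  model_id: string;
  prompt_version: string;
  truncation_policy: string;
}

// Hashes are sorted lexicographically so (a, b) and (b, a) share one key.
export function buildCacheKey(textA: string, textB: string, modelId: string): CacheKey {
  const [first, second] = [hashContent(textA), hashContent(textB)].sort();
  return {
    chunk_a_hash: first as string,
    chunk_b_hash: second as string,
    model_id: modelId,
    prompt_version: PROMPT_VERSION,
    truncation_policy: TRUNCATION_POLICY,
  };
}
```

Because the prompt version and truncation policy are part of the key, editing either one makes earlier verdicts unreachable rather than silently reused.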

40 hermetic tests across the three modules. Cache tests hit PGLite for
real round-trip coverage of the new engine methods committed in C2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval-contradictions): judge + auto-supersession + fixture-redact

Three modules that together turn an LLM into a contradiction probe and
its output into actionable resolutions.

- judge.ts: judgeContradiction() is the single LLM call. Query-conditioned
  prompt (Codex outside-voice fix — the judge sees what the user asked).
  Holder context for take pairs (C3). UTF-8-safe truncation at maxPairChars
  (default 1500, --max-pair-chars overridable; C4 wire-up). C1
  double-enforcement: orchestrator filters contradicts:true with confidence
  < 0.7 to false regardless of prompt rules. parseJudgeJSON is a 3-strategy
  generic parser (direct → fence-strip → trailing-comma + quote + first-{}
  extraction) — we don't reuse parseModelJSON because that's shape-locked
  to cross-modal-eval's scores payload. Refusal detection via stopReason
  AND text-pattern fallback. chatFn injection for hermetic tests.

- auto-supersession.ts (M7): proposeResolution classifies each pair into
  takes_supersede / dream_synthesize / takes_mark_debate / manual_review
  and emits a paste-ready CLI command. Judge's hint wins on cross-slug
  pairs (it has semantic context); structural fallback prefers
  dream_synthesize when either side is a curated entity slug
  (companies/, people/, deals/, projects/). pairToFinding merges a pair +
  verdict into a ContradictionFinding.

- fixture-redact.ts (T2): privacy-redacted pass for the gold fixture
  build. Layers PII scrubber (v0.25.0 eval-capture-scrub) + slug rewrites
  (people/<name> → people/alice-example, deterministic per session) +
  capitalized firstname-lastname detection + monetary obfuscation
  (multiply revenues by session salt to preserve magnitude shape).
  isCleanForCommit is the pre-commit safety net: blocks if any raw name
  or email shape survives. Audit trail records every redaction made.

60 hermetic tests. Judge tests use direct chatFn stub (cleaner than
module-level transport seam for one-shot wrapper).
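
The three parsing strategies and the C1 confidence floor can be illustrated roughly as below. This is a simplified sketch, not the actual parseJudgeJSON: the names `parseLooseJSON` and `applyConfidenceFloor`, the verdict shape, and the exact repair rules are assumptions.

```typescript
// Built with repeat() so the sketch contains no literal triple backtick.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Tries, in order: direct parse, fenced-block contents, then the first "{" to
// last "}" span with curly quotes and trailing commas repaired.
export function parseLooseJSON(raw: string): Record<string, unknown> | null {
  const trimmed = raw.trim();
  const candidates: string[] = [trimmed];
  const fenced = FENCED_BLOCK.exec(trimmed);
  if (fenced && fenced[1] !== undefined) candidates.push(fenced[1].trim());
  const start = trimmed.indexOf("{");
  const end = trimmed.lastIndexOf("}");
  if (start !== -1 && end > start) {
    candidates.push(
      trimmed
        .slice(start, end + 1)
        .replace(/[\u201c\u201d]/g, '"') // curly double quotes to straight
        .replace(/,\s*([}\]])/g, "$1"), // naive trailing-comma removal
    );
  }
  for (const candidate of candidates) {
    try {
      const parsed: unknown = JSON.parse(candidate);
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        return parsed as Record<string, unknown>;
      }
    } catch {
      // Try the next strategy.
    }
  }
  return null;
}

// C1 double-enforcement: low-confidence "contradicts" verdicts become false.
export function applyConfidenceFloor<T extends { contradicts: boolean; confidence: number }>(
  verdict: T,
  floor = 0.7,
): T {
  return verdict.contradicts && verdict.confidence < floor
    ? { ...verdict, contradicts: false }
    : verdict;
}
```

The trailing-comma repair is deliberately naive and would also rewrite a `,}` sequence inside a string value, which is acceptable only for a last-resort strategy.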

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval-contradictions): trends + runner orchestrator (v0.33.0)

The heart of the probe — runner.ts ties every prior module together,
trends.ts writes one row per run to eval_contradictions_runs and produces
the trend chart for the CLI `trend` sub-subcommand.

runner.ts:
  - Pair generation: cross-slug across top-K results (same-slug skipped)
    + intra-page chunk-vs-take via P1 batched listActiveTakesForPages.
  - A1 date pre-filter wired: pairs separated by >30 days skip without
    judge calls; same-paragraph-dual-date overrides separation rule
    (flip-flop case sees the judge).
  - A3 deterministic sampling: combined_score DESC, slug-lex tiebreaker,
    stable across re-runs.
  - A2 soft budget ceiling: pre-flight estimate refuses without --yes;
    mid-run cumulative cost stops the run and emits a partial report.
  - P2 cache integration: lookup before judge call, store after; hit/miss
    counters drive the cache stats block in the report.
  - C2 first-class judge_errors: every throw counted via the typed
    collector, surfaced in report.judge_errors with the no-silent-skip
    `note` field.
  - Wilson CI on the headline percentage; small_sample_note when n<30.
  - source_tier_breakdown + hot_pages aggregated across all findings.
  - AbortSignal propagation for cancellation mid-run.
  - PreFlightBudgetError exported as a discriminable rejection class.
  - Hermetic via judgeFn + searchFn dependency injection — runner tests
    stub both without ever touching the real gateway or hybridSearch.
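
The A1 date pre-filter rules can be sketched as a small pure function. The `PairSide` shape and the `dualDateParagraph` flag are hypothetical stand-ins for however date-filter.ts actually represents a pair; only the three rules and the 30-day threshold come from the commit messages.

```typescript
export type PreFilterDecision = "judge" | "skip";

// Hypothetical input shape for one side of a candidate pair.
export interface PairSide {
  date: Date | null; // explicit date, if one was found
  dualDateParagraph: boolean; // paragraph carries two dates (flip-flop case)
}

const MS_PER_DAY = 86_400_000;

export function preFilterPair(a: PairSide, b: PairSide, maxGapDays = 30): PreFilterDecision {
  // Rule 1: the same-paragraph-dual-date case always reaches the judge.
  if (a.dualDateParagraph || b.dualDateParagraph) return "judge";
  // Rule 2: a missing date on either side falls through to the judge.
  if (a.date === null || b.date === null) return "judge";
  // Rule 3: skip only when both dates are explicit and more than maxGapDays apart.
  const gapDays = Math.abs(a.date.getTime() - b.date.getTime()) / MS_PER_DAY;
  return gapDays > maxGapDays ? "skip" : "judge";
}
```

The ordering matters: checking the dual-date flag first is what lets the flip-flop case override the separation rule.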

trends.ts:
  - writeRunRow flattens a ProbeReport into the eval_contradictions_runs
    row shape, including Wilson CI bounds + duration_ms.
  - loadTrend reads back as typed TrendRow[].
  - renderTrendChart produces a fixed-width ASCII bar chart; empty input
    prints a friendly message naming the command to populate runs.

41 new hermetic tests on PGLite (15 trends, 26 runner). Full
eval-contradictions suite at 194/194 across 13 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval-contradictions): CLI + eval dispatch + mini fixture (v0.33.0)

User-facing surface: `gbrain eval suspected-contradictions [run|trend|review]`.
Engine-required sub-subcommand, dispatched via the existing eval.ts pattern
(matches `replay`).

Run mode:
  --queries-file FILE | --query "..." | --from-capture  (mutually exclusive)
  --top-k N=5  --judge MODEL=claude-haiku-4-5  --limit N
  --budget-usd N (default $5 TTY / $1 non-TTY) --yes
  --output FILE  --max-pair-chars N=1500
  --sampling deterministic|score-first  --no-cache  --refresh-cache  --json

Trend mode: --days N=30 [--json]
Review mode: --severity low|medium|high  --since YYYY-MM-DD

A4 wired: --from-capture detects empty eval_candidates and exits 2 with
hint naming GBRAIN_CONTRIBUTOR_MODE=1 / eval.capture config key.

Human summary on stderr always prints Wilson CI band, judge_errors counts
broken out by class, cache hit-rate, source-tier breakdown, hot pages.
Partial-report warning when mid-run budget cap fires.

Run-row persistence (M5) writes to eval_contradictions_runs every successful
run; subsequent `trend` and `review` invocations read from there.

PreFlightBudgetError surfaces as exit 1 with the calculated estimate + cap
in the message — operators see the exact number to pass to --budget-usd
or override with --yes.

TrendRow type extended with report_json so `review` can fetch the latest
run's findings without a second query.

test/fixtures/contradictions-mini.jsonl: 5 redacted queries for CLI smoke.

Full eval-contradictions suite: 194 hermetic tests across 13 files. Real-
brain CLI smoke covered by the E2E in commit 9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval-contradictions): doctor + MCP + synthesize integrations (M1+M2+M3)

Three thin wire-ups that turn the probe's output into action surfaces:

M1 (doctor): src/commands/doctor.ts adds a `contradictions` check after
the eval_capture check. Reads loadContradictionsTrend(7), surfaces the
latest run's headline + severity breakdown + Wilson CI band + first 3
high-severity findings with paste-ready resolution commands. ok status
when no runs exist or no findings; warn when high-severity > 0. Graceful
skip when the table doesn't exist yet (pre-migration brain).

M3 (MCP): src/core/operations.ts adds `find_contradictions` op (scope:
read, NOT localOnly — agent-callable over HTTP MCP). Params: slug
(substring match), severity (low|medium|high), limit. Reads
loadContradictionsTrend(30), returns the latest run's findings filtered.
NOT in the subagent allowlist by design — user-initiated only, not
autonomous-action surface. New FIND_CONTRADICTIONS_DESCRIPTION constant
in operations-descriptions.ts.

M2 (synthesize): src/core/cycle/synthesize.ts pre-fetches the latest
probe findings once at phase start (loadPriorContradictionsBlock helper)
and threads up to 5 highest-severity items into buildSynthesisPrompt as
an informational block. Subagent sees what to reconcile when writing
compiled_truth to flagged slugs. Empty trend yields empty block (existing
behavior unchanged on fresh installs). Try/catch around the engine call
keeps synthesize robust even when the contradiction tables don't exist
yet.

11 new hermetic tests for the MCP op (registry presence, scope, empty
case, slug+severity+limit filters) and the M1/M2 data-shape contracts
(end-to-end runDoctor coverage deferred to commit 9's E2E because doctor
calls process.exit).

Full eval-contradictions suite: 226/226 across 15 test files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(eval-contradictions): build-contradictions-fixture script (T2)

Local-only operator script for building the privacy-redacted gold fixture
used by the precision/recall test (deferred to v0.34 when probe data
informs the labeling). Runs against the user's REAL brain via the local
gbrain engine config; never auto-run in CI.

Flow:
  1. Read --queries-file (JSONL); spin up engine via loadConfig +
     toEngineConfig + createEngine + connectWithRetry.
  2. Run the contradiction probe with --no-cache and a stubbed judgeFn
     that captures candidate pairs without spending tokens.
  3. Interactive prompts (skipped under --non-interactive): for each
     candidate, the operator labels y/n/skip + severity + axis.
  4. Apply the v0.33.0 fixture-redact passes (slug rewrite, name
     placeholders, monetary obfuscation, PII scrubber).
  5. Pre-commit safety gate: every text field passes isCleanForCommit;
     anything that fails gets a [REDACT?] sentinel + an _operator_review
     marker on the JSONL line, and the script exits 1 so the operator
     can't accidentally commit unredacted output.

Audit comment block at the top of the JSONL records every redaction
the session made (slug→placeholder, name→placeholder, monetary
multiplication) so reviewers can see what was changed.

Usage:
  bun run scripts/build-contradictions-fixture.ts \
    --queries-file FILE.jsonl \
    [--top-k N] [--judge MODEL] [--max-pairs N] [--output PATH] \
    [--non-interactive]

Output defaults to test/fixtures/contradictions-eval-gold.jsonl.

Typecheck clean; redactor + isCleanForCommit guard tested separately
in test/eval-contradictions-fixture-redact.test.ts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): real-Postgres E2E for contradiction probe (v0.33.0, T1)

Required-on-DATABASE_URL E2E covering Postgres-specific behavior that
PGLite can't exercise. Five surface areas, 12 cases total. All pass on
fresh pgvector/pgvector:pg16:

1. Migrations v51 + v52 apply cleanly; both tables exist in
   information_schema; Wilson CI columns are REAL; composite PK on
   eval_contradictions_cache includes prompt_version + truncation_policy
   (Codex outside-voice fix pinned at the schema level).

2. JSONB round-trip on Postgres: writeContradictionsRun + loadTrend
   preserves nested object shapes (regression guard against the v0.12
   double-encode bug class). Confirmed via jsonb_typeof = 'object', not
   'string'.

3. P2 cache with real now(): lookup/upsert round-trip, expired rows
   hidden from lookup, sweepContradictionCache deletes them, and
   different prompt_version is a separate cache key.

4. M5 trend semantics: TIMESTAMPTZ ordering DESC is stable on real PG;
   days-window filter via ran_at >= cutoff correctly excludes/includes
   backdated rows.

5. find_contradictions MCP op end-to-end: empty case returns "No probe
   runs" note; populated case returns latest run findings with slug
   substring + severity filters applied.

Verified locally against pgvector:pg16 on port 5434 — all 12 cases pass.
Skips gracefully when DATABASE_URL is unset per gbrain E2E convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* v0.33.0 feat: brain-consistency probe + doctor + MCP + dream-cycle wire-up

VERSION 0.32.0 → 0.33.0. package.json + CHANGELOG.md + llms-full.txt synced.

Headline: gbrain learns to detect its own integrity drift.

  - new command: gbrain eval suspected-contradictions [run|trend|review]
  - new MCP op: find_contradictions(slug?, severity?, limit?)
  - new doctor check: contradictions (paste-ready resolution commands)
  - new dream-cycle hook: synthesize reads prior contradictions per slug
  - new schema: v51 (eval_contradictions_cache) + v52 (eval_contradictions_runs)
  - 6 new engine methods (listActiveTakesForPages, write/load run, P2 cache trio)

Codex outside-voice review folded in:
  - Command name "suspected-contradictions" (was "contradictions" — describes
    what the tool actually does, not what it pretends to evaluate)
  - judge_errors first-class output (not silent stderr — biased denominator)
  - prompt_version + truncation_policy in cache key (prompt edits cleanly
    invalidate prior verdicts)
  - Wilson 95% CI on headline % + small_sample_note when n<30
  - Query-conditioned judge prompt (sees user's query, not just two chunks)
  - Deterministic sampling for prevalence metric (stable cache hit-rate)

Decision criterion for the bigger swing (chunk-level revises field):
  Wilson CI lower-bound:
    <5%  → source-boost + recency-decay + curated pages handle the load
    5-15% → operator's call
    >15% → plan for v0.34+

New docs:
  - docs/contradictions.md (architecture, severity rubric, action criteria)
  - docs/eval-bench.md extended (nightly cadence + trend workflow)
  - skills/migrations/v0.33.0.md (post-upgrade agent instructions)

Full test suite green at the cut:
  - 226 hermetic unit tests across 15 files (eval-contradictions-*)
  - 12 real-Postgres E2E (DATABASE_URL=...; verified locally on pgvector:pg16)
  - typecheck clean
  - build:llms regenerated and the test/build-llms.test.ts gate passes

Plan reference:
  ~/.claude/plans/system-instruction-you-are-working-hashed-dewdrop.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: regen llms-full.txt for v0.32.6 rename

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sew + 313094319-sudo PRs) (garrytan#898)

* feat: shared CJK detection module (cjk.ts)

Foundation for the CJK fix wave. Single source of truth for CJK ranges
(Han, Hiragana, Katakana, Hangul Syllables), the slug-char string used
by adjacent validators, sentence + clause delimiter sets, the 30%
density threshold for word counting, and a LIKE-pattern escape helper.

Replaces the inline hasCJK regex at expansion.ts:58 so four-place
drift becomes impossible. countCJKAwareWords uses density threshold
(per codex outside-voice C13) so a long English doc with one Japanese
term stays whitespace-tokenized, not char-split.
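
A minimal sketch of density-gated word counting, assuming the CJK ranges below and a density measured over non-whitespace characters. Both are assumptions; cjk.ts may use wider ranges or a different denominator.

```typescript
// Assumed ranges: Hiragana + Katakana, CJK Unified Ideographs, Hangul Syllables.
const CJK_TEST = /[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]/;
const CJK_GLOBAL = /[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]/g;
const CJK_DENSITY_THRESHOLD = 0.3;

export function hasCJK(text: string): boolean {
  return CJK_TEST.test(text);
}

export function countCJKAwareWords(text: string): number {
  const nonSpace = text.replace(/\s+/g, "");
  if (nonSpace.length === 0) return 0;
  const cjkChars = (nonSpace.match(CJK_GLOBAL) ?? []).length;
  if (cjkChars / nonSpace.length >= CJK_DENSITY_THRESHOLD) {
    // CJK-dominant text: count characters, since whitespace is not a word boundary.
    return Array.from(nonSpace).length;
  }
  // Otherwise keep ordinary whitespace tokenization.
  return text.trim().split(/\s+/).length;
}
```

With this gate, an English sentence containing a single Japanese term is still counted by whitespace tokens, while a pure Chinese paragraph is counted per character.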

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* feat: migration v51 + pages.chunker_version/source_path columns

Schema-level support for the v0.32.7 CJK wave. Two new columns on pages:

  - chunker_version SMALLINT NOT NULL DEFAULT 1 — bumped to
    MARKDOWN_CHUNKER_VERSION (2) on every new import. The post-upgrade
    gbrain reindex --markdown sweep walks chunker_version < 2 to find
    pre-bump rows and rebuilds them.

  - source_path TEXT — captures the repo-relative path at import time
    so sync's delete/rename code can resolve frontmatter-fallback
    slugs (CJK / emoji / exotic-script files where the path itself
    doesn't derive a slug).

Both columns plumbed through PageInput, partial indexes scoped to
markdown-only / non-null. PGLite + Postgres parity via the standard
ALTER TABLE ... IF NOT EXISTS shape.

Replaces the original PR garrytan#599 plan of folding MARKDOWN_CHUNKER_VERSION
into content_hash. Codex outside-voice C2 caught that as a no-op:
performSync gates on actual file change, not hash-would-differ, so
the fold never reached existing pages. Column + sweep is the real fix.

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* feat: CJK-aware slugify + SLUG_SEGMENT_PATTERN + adjacent validators

slugifySegment now preserves Han / Hiragana / Katakana / Hangul Syllables
with NFC re-normalization after the NFD-strip-accents pass so Hangul
Jamo recomposes back into precomposed syllables that fall inside the
whitelist. café still slugifies to cafe (regression preserved — iron
rule).
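
The normalization order described above can be sketched like this. The character ranges and the hyphen-collapsing rules are assumptions made for the example; only the NFD strip followed by NFC re-normalization is taken from the commit message.

```typescript
// Assumed whitelist ranges: Hiragana + Katakana, Han, Hangul Syllables.
const CJK_SLUG_CHARS = "\\u3040-\\u30ff\\u4e00-\\u9fff\\uac00-\\ud7af";
const DISALLOWED = new RegExp(`[^a-z0-9${CJK_SLUG_CHARS}]+`, "g");

export function slugifySegment(input: string): string {
  return input
    .normalize("NFD") // decompose: "é" becomes "e" + combining acute
    .replace(/[\u0300-\u036f]/g, "") // strip Latin combining accents
    .normalize("NFC") // recompose Hangul Jamo into precomposed syllables
    .toLowerCase()
    .replace(DISALLOWED, "-") // anything outside the whitelist becomes a hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}
```

Without the NFC step, Hangul text would remain decomposed into Jamo after NFD, and those code points fall outside the Hangul Syllables range in the whitelist.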

SLUG_SEGMENT_PATTERN (consumed by takes-holder validation) extended
with CJK_SLUG_CHARS in the same commit so CJK slugs aren't rejected by
adjacent validators downstream. Codex outside-voice C4 caught this
exact half-fix in the original plan — leaving the pattern ASCII-only
would have shipped a feature where the slugify produced 品牌圣经 but
adjacent validators flagged it.

src/core/operations.ts: validatePageSlug + validateFilename also
extended with CJK ranges. matchesSlugAllowList is unchanged (works on
string prefixes, no character class).

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* feat: recursive chunker — MARKDOWN_CHUNKER_VERSION + CJK splitting + maxChars cap

Five coordinated chunker changes for the v0.32.7 wave:

  - MARKDOWN_CHUNKER_VERSION = 2 exported. Folded into pages.chunker_version
    so the post-upgrade reindex sweep can find pre-bump pages.

  - countWords delegated to countCJKAwareWords from cjk.ts (30% density
    threshold). Below threshold: whitespace-token count (English-dominant
    docs stay tokenized). At/above: char count (Chinese paragraphs actually
    split instead of being treated as one 8192-token-overflowing word).

  - DELIMITERS extends L2 (sentences) with 。!? and L3 (clauses) with
    ;:,、. CJK punctuation now produces real chunk boundaries.

  - maxChars hard cap (default 6000) with sliding-window splitByChars and
    500-char overlap. Catches pathological whitespace-less inputs that the
    word-level pipeline can't bound (pure-Han paragraphs, base64 blobs,
    long URLs). Applied to both single-short-chunk and merged-chunks
    paths.

  - splitOnWhitespace falls through to char-slice when ANY single "word"
    exceeds target chars (the greedy /\S+/g regex returns a whole CJK
    paragraph as one "word"; without this, the L4 fallback produces one
    huge piece). Pre-fix this was the silent-failure path.
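
A sliding-window split with overlap, as described for the maxChars cap, might look like the following. The defaults match the numbers in the commit message, but the function body is an illustrative sketch and slices by UTF-16 code units, which the real splitByChars may or may not do.

```typescript
export function splitByChars(text: string, maxChars = 6000, overlap = 500): string[] {
  if (maxChars <= overlap) throw new RangeError("maxChars must be greater than overlap");
  if (text.length <= maxChars) return [text];
  const pieces: string[] = [];
  const step = maxChars - overlap; // each window starts `overlap` chars before the previous end
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // the last window reached the end
  }
  return pieces;
}
```

Every piece is bounded by maxChars regardless of whether the input contains whitespace, which is the property the word-level pipeline cannot guarantee for pure-Han paragraphs or base64 blobs.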

Tests in test/chunkers/recursive.test.ts: 9 new cases — pure Chinese,
Japanese + 。, Korean Hangul, mixed CJK+English, 20KB CJK with overlap,
single-short-chunk maxChars edge, pure-English regression.

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* feat: PGLite CJK keyword fallback + engine chunker_version/source_path passthrough

PGLite uses websearch_to_tsquery('english') over to_tsvector('english'),
which can't tokenize CJK. Pre-fix, CJK queries returned empty results
on PGLite brains even with proper embeddings.

searchKeyword + searchKeywordChunks now branch on hasCJK(query):

  - ASCII path: unchanged. websearch_to_tsquery('english') continues
    to drive FTS. No regression risk.

  - CJK path: switches to ILIKE '%' || $qLike || '%' ESCAPE '\\' over
    chunk_text with two distinct param bindings ($qLike escaped for
    the ILIKE clause, $qRaw raw for the ranking arithmetic). Empty
    $qRaw guard bails before binding. Bigram-frequency-count ranking
    via (LENGTH(chunk_text) - LENGTH(REPLACE(chunk_text, $qRaw, ''))) /
    LENGTH($qRaw) approximates ts_rank semantics; position-in-chunk
    tiebreaker so earlier matches outrank later ones at the same
    occurrence count.
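
To make the two-parameter design concrete, here is a sketch of the escape helper plus a TypeScript mirror of the SQL occurrence-count arithmetic. The function names and the returned parameter object are hypothetical, and the JavaScript version counts UTF-16 code units where SQL LENGTH counts characters.

```typescript
// Escape LIKE metacharacters so a literal "%" or "_" does not act as a wildcard.
export function escapeLikePattern(query: string): string {
  return query.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Mirror of (LENGTH(text) - LENGTH(REPLACE(text, q, ''))) / LENGTH(q).
export function occurrenceCount(chunkText: string, qRaw: string): number {
  if (qRaw.length === 0) return 0; // empty-query guard, as in the commit message
  const stripped = chunkText.split(qRaw).join("");
  return (chunkText.length - stripped.length) / qRaw.length;
}

// Hypothetical param builder: one escaped binding for ILIKE, one raw for ranking.
export function buildCjkSearchParams(query: string): { qLike: string; qRaw: string } | null {
  const qRaw = query.trim();
  if (qRaw.length === 0) return null;
  return { qLike: escapeLikePattern(qRaw), qRaw };
}
```

Two bindings are needed because the escaped form contains extra backslashes, so reusing it in the ranking arithmetic would look for a substring that never appears in the chunk text.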

Codex outside-voice C8 caught the original plan's one-param shortcut
(escaped chars can't be reused as ranking substrings) + missing
ESCAPE clause + asymmetric whitespace strip. C9 corrected the FTS
dialect (websearch_to_tsquery, not to_tsvector('simple')).

Source-boost CASE, hard-exclude clause, visibility clause, and the
DISTINCT ON (slug) page-dedup all survive on both branches. Postgres
engine path stays untouched (multi-tenant Postgres deployments can
install pgroonga / zhparser for CJK; out of scope for this wave).

Postgres + PGLite putPage both extended to write chunker_version
and source_path columns (with COALESCE(EXCLUDED.x, pages.x) so
auto-link / code-reindex callers that don't supply them don't blank
existing values).

Tests: 8 new cases covering Chinese / Japanese / Korean substring
search, bigram ranking (3-hit > 1-hit), LIKE-meta-char escape
(literal % does not wildcard), English query stays on FTS path.

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>
Co-Authored-By: 313094319-sudo <313094319-sudo@users.noreply.github.com>

* feat: import-file frontmatter-slug fallback + audit JSONL

importFromFile gains a fallback branch: when slugifyPath returns
empty (emoji / Thai / Arabic / exotic-script filename — including
post-CJK-wave files that still don't slugify) AND the frontmatter
declares a slug, the frontmatter slug becomes authoritative.

Anti-spoof rule preserved unchanged: when slugifyPath produces a
non-empty path slug AND the frontmatter slug claims a different one,
the file is still rejected. notes/random.md cannot impersonate
people/elon via frontmatter.

D6=B error string when both path slug AND frontmatter slug are empty:
"Filename produces no usable slug. Add a 'slug:' to the frontmatter,
or rename the file to use ASCII / Chinese / Japanese / Korean
characters." Honest about the actually-supported scripts.
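
The three slug outcomes described above (path slug wins, frontmatter fallback, friendly error) can be sketched as one decision function. `decideSlug` and the `SlugDecision` type are hypothetical, and the mismatch error text is a placeholder; only the empty-slug message is taken from the commit.

```typescript
type SlugDecision =
  | { ok: true; slug: string; viaFallback: boolean }
  | { ok: false; error: string };

// Hypothetical sketch of the import-time slug decision.
function decideSlug(pathSlug: string, frontmatterSlug?: string): SlugDecision {
  if (pathSlug !== "") {
    // Anti-spoof: a usable path slug may not be overridden by frontmatter.
    if (frontmatterSlug && frontmatterSlug !== pathSlug) {
      return { ok: false, error: "frontmatter slug does not match path slug" }; // placeholder text
    }
    return { ok: true, slug: pathSlug, viaFallback: false };
  }
  // Path slug is empty: the frontmatter slug becomes authoritative.
  if (frontmatterSlug) return { ok: true, slug: frontmatterSlug, viaFallback: true };
  return {
    ok: false,
    error:
      "Filename produces no usable slug. Add a 'slug:' to the frontmatter, " +
      "or rename the file to use ASCII / Chinese / Japanese / Korean characters.",
  };
}
```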

Every import now populates pages.chunker_version (set to
MARKDOWN_CHUNKER_VERSION) and pages.source_path (repo-relative). These
drive the post-upgrade reindex sweep + sync's delete/rename slug
resolution.

NEW src/core/audit-slug-fallback.ts — weekly ISO-week-rotated JSONL
at ~/.gbrain/audit/slug-fallback-YYYY-Www.jsonl. Per codex C7, info
events don't belong in sync-failures.jsonl (which gates bookmark
advancement); separate audit surface keeps the failure-handling code
unchanged. logSlugFallback emits a stderr line AND appends to the
audit file (D7=D dual logging).
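
For reference, the ISO-week stamp in the audit filename can be derived like this. A sketch under stated assumptions: weeks are computed in UTC (the commit does not say which timezone the real module uses), and `isoWeekStamp` / `auditFileName` are hypothetical names.

```typescript
// ISO 8601 week: the week's year is the year of its Thursday.
function isoWeekStamp(d: Date): string {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dow = t.getUTCDay() || 7;          // Monday=1 ... Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - dow);  // move to the Thursday of this week
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86400000 + 1) / 7);
  const ww = week < 10 ? "0" + week : String(week);
  return t.getUTCFullYear() + "-W" + ww;
}

// Hypothetical filename builder matching slug-fallback-YYYY-Www.jsonl.
function auditFileName(d: Date): string {
  return "slug-fallback-" + isoWeekStamp(d) + ".jsonl";
}
```

Dates near New Year can belong to the previous or next ISO year, which is why the year is read after shifting to Thursday.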

Tests: 5 new import-file cases (小米 with no frontmatter slug, 🚀.md
with frontmatter fallback, 🌟🚀.md friendly D6=B error, anti-spoof
regression, chunker_version + source_path populated). 6 new audit
cases covering write, weekly rotation, 7-day window, corrupt-row
tolerance.

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* feat: git() helper hardening + core.quotepath=false for CJK paths

git CLI emits CJK paths as quoted octal escapes (\345\223\201 ...) by
default in diff --name-status output. Pre-fix, buildSyncManifest
silently dropped these paths because downstream filesystem lookups
saw the literal escape string. gbrain sync reported added=0 while
git had the file committed.

git() helper refactored:
  - New signature: git(repoPath, args: string[], configs?: string[])
  - Config flags emit BEFORE -C and BEFORE the subcommand (git CLI
    requires this order)
  - core.quotepath=false always prepended
  - Future callers needing extra -c config pass configs:[]; no more
    inlining -c into args (the silent-future-drift footgun codex C12
    flagged as a related concern)

New invariant test in test/sync.test.ts pins the emit order.
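
The emit order the invariant test pins can be sketched as a pure argv builder. This assumes each `configs` entry is a `key=value` string emitted as `-c key=value`; `buildGitArgv` is a hypothetical name, not the helper itself.

```typescript
// Hypothetical sketch: -c flags first, then -C <repo>, then the subcommand.
function buildGitArgv(repoPath: string, args: string[], configs: string[] = []): string[] {
  const argv: string[] = [];
  // core.quotepath=false always comes first so CJK paths are emitted as UTF-8.
  for (const kv of ["core.quotepath=false"].concat(configs)) {
    argv.push("-c", kv);
  }
  argv.push("-C", repoPath);
  return argv.concat(args);
}
```

For example, `buildGitArgv("/repo", ["diff", "--name-status", "HEAD~1"])` yields `-c core.quotepath=false -C /repo diff --name-status HEAD~1`.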

NEW test/e2e/sync-cjk-git.test.ts — real-git E2E in a tmpdir. Spawns
real git via execFileSync, commits a Chinese-named markdown file,
drives the helper through buildSyncManifest, asserts the manifest
contains the UTF-8 path (not the octal-escape form). Closes the
real-CLI-behavior gap that unit tests can't cover (the helper builds
the right args; only an E2E proves git actually emits UTF-8 under
the flag).

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* feat: gbrain reindex --markdown sweep command

NEW src/commands/reindex.ts — operator-facing markdown re-chunk
sweep. Walks SELECT slug, source_path FROM pages WHERE
page_kind = 'markdown' AND chunker_version < MARKDOWN_CHUNKER_VERSION
in 100-row batches, ordered by id ASC so partial-completion re-runs
pick up where they left off.

For rows with non-null source_path: re-imports via importFromFile
when the file exists on disk. For rows without (legacy pre-migration
backfill): fallback to importFromContent using the stored markdown
body.
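
The sweep's selection and per-row strategy can be sketched in memory. Assumptions are labeled: the `PageRow` shape is hypothetical, and treating "source_path set but file missing" as a content fallback is a guess, since the commit only specifies the file-exists and null-path cases.

```typescript
interface PageRow { id: number; slug: string; source_path: string | null; chunker_version: number }

// Stale rows in id order, split into fixed-size batches (100 in the commit).
function staleBatches(rows: PageRow[], currentVersion: number, batchSize = 100): PageRow[][] {
  const stale = rows
    .filter((r) => r.chunker_version < currentVersion)
    .sort((a, b) => a.id - b.id);
  const batches: PageRow[][] = [];
  for (let i = 0; i < stale.length; i += batchSize) batches.push(stale.slice(i, i + batchSize));
  return batches;
}

// "file" => re-import from disk; "content" => re-import from the stored body.
// ASSUMPTION: a non-null source_path whose file is gone also falls back to content.
function reimportStrategy(row: PageRow, fileExists: (p: string) => boolean): "file" | "content" {
  return row.source_path !== null && fileExists(row.source_path) ? "file" : "content";
}
```

Because finished rows no longer satisfy the version filter, a re-run naturally resumes with the remaining rows.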

Flags: --markdown (target selector), --limit N, --dry-run, --json,
--no-embed (offline / CI / test path that lets the chunker run
without a configured AI gateway), --repo PATH.

Wired into src/cli.ts dispatch table. Will also be invoked
automatically by gbrain upgrade's post-upgrade hook (next commit) so
chunker-version bumps reach existing markdown pages without an
explicit operator action.

Tests in test/reindex.test.ts: 5 cases covering dry-run, actual
sweep, idempotent re-run, --limit cap, skipped-already-at-current.

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>
Co-Authored-By: 313094319-sudo <313094319-sudo@users.noreply.github.com>

* feat: post-upgrade chunker-bump cost prompt + auto-reindex sweep

Wires the chunker-version bump into gbrain upgrade so existing brains
heal automatically. Three new pieces:

NEW src/core/embedding-pricing.ts — EMBEDDING_PRICING map keyed
provider:model (OpenAI text-embedding-3-large + 3-small + ada-002,
Voyage 3-large + 3). lookupEmbeddingPrice returns 'known' or
'unknown' shape so the cost-estimate prompt can degrade gracefully
for unknown providers rather than fabricate numbers (codex C3).
estimateCostFromChars uses 3.5 chars/token approximation.
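
The chars-to-cost arithmetic can be sketched as below. The price argument is a caller-supplied placeholder (no real provider prices are asserted here), and `estimateCostUsd` is a hypothetical name rather than the module's export; only the 3.5 chars/token ratio comes from the commit.

```typescript
const CHARS_PER_TOKEN = 3.5; // approximation stated in the commit

// Hypothetical sketch: character count -> token estimate -> USD estimate.
function estimateCostUsd(
  totalChars: number,
  usdPerMillionTokens: number, // placeholder; supply the provider's actual rate
): { tokens: number; usd: number } {
  const tokens = Math.ceil(totalChars / CHARS_PER_TOKEN);
  return { tokens, usd: (tokens / 1e6) * usdPerMillionTokens };
}
```

For an unknown provider the real code returns an 'unknown' shape instead of a number; a caller of this sketch would simply skip the USD figure in that case.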

NEW src/core/post-upgrade-reembed.ts — pure-ish functions for the
cost-estimate prompt:
  - computeReembedEstimate: real SQL against
    COUNT(*) + COALESCE(SUM(LENGTH(compiled_truth)) + SUM(LENGTH(timeline))
    on the chunker_version-filtered query. No phantom markdown_body
    column (codex C3 caught the original plan referencing nonexistent
    schema fields).
  - formatReembedPrompt: pure string formatter for the stderr line.
  - runPostUpgradeReembedPrompt: orchestrates the prompt + 10-second
    Ctrl-C window. TTY-only wait so non-TTY upgrades (CI, cron-driven,
    headless) don't hang. GBRAIN_NO_REEMBED=1 bails out entirely
    with a doctor-warning marker; GBRAIN_REEMBED_GRACE_SECONDS=0
    skips the wait.
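
The wait/skip rules above reduce to a small decision. A sketch only: `planReembedPrompt` is a hypothetical name, and honoring positive values of GBRAIN_REEMBED_GRACE_SECONDS as a custom wait is an assumption (the commit documents only the `=0` case).

```typescript
type PromptPlan = { action: "skip" } | { action: "proceed"; waitSeconds: number };

function planReembedPrompt(env: Record<string, string | undefined>, isTTY: boolean): PromptPlan {
  // GBRAIN_NO_REEMBED=1 bails out entirely.
  if (env.GBRAIN_NO_REEMBED === "1") return { action: "skip" };
  // Non-TTY runs (CI, cron, headless) never wait, so they cannot hang.
  if (!isTTY) return { action: "proceed", waitSeconds: 0 };
  const raw = env.GBRAIN_REEMBED_GRACE_SECONDS;
  const parsed = raw === undefined || raw === "" ? NaN : Number(raw);
  // ASSUMPTION: any finite non-negative value is honored; otherwise default to 10s.
  const waitSeconds = isFinite(parsed) && parsed >= 0 ? parsed : 10;
  return { action: "proceed", waitSeconds };
}
```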

src/commands/upgrade.ts: after apply-migrations runs, the new prompt
fires through the gateway's configured embedding model, then invokes
gbrain reindex --markdown automatically if the user proceeds.
Wrapped in try-catch so a reindex failure is non-fatal — the user
can re-run manually.

Tests in test/upgrade-reembed-prompt.test.ts: 11 cases covering real
SQL counts, unknown-provider fallback, TTY / non-TTY paths,
GBRAIN_NO_REEMBED bail-out, GBRAIN_REEMBED_GRACE_SECONDS=0 skip-wait.

Codex outside-voice C2 caught the original plan as a no-op
(performSync doesn't re-import unchanged files just because
content_hash would differ). The migration v51 column + this sweep
+ this prompt is the real fix that actually reaches existing pages.

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* feat: doctor slug_fallback_audit check + CJK roundtrip E2E

gbrain doctor learns a new slug_fallback_audit check (v0.32.7).
Reads the latest week of ~/.gbrain/audit/slug-fallback-*.jsonl,
counts info-severity entries from the last 7 days, surfaces the
total as an ok-status line. No health-score docking; no warning.

sync-failures.jsonl (which gates bookmark advancement) stays
untouched — info events live in their own surface per codex C7.

NEW test/e2e/cjk-roundtrip.test.ts — proves the wave delivers end-
to-end. PGLite-in-memory fixture with Chinese / Japanese / Korean
content. Each page: importFromContent → chunkText (CJK-aware) →
searchKeyword (LIKE-branch with bigram count). Asserts every CJK
query lands on its source page. ASCII regression: an English query
still uses the FTS path on the same brain. Vector path skips
gracefully without OPENAI_API_KEY.

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>

* chore: bump version and changelog (v0.32.7)

CJK fix wave — six layers from one root cause. Three originating PRs
from @vinsew and one extracted from @313094319-sudo's garrytan#765 land
together as a coherent collector. Codex outside-voice review on the
plan caught four critical bugs the eng review missed (no-op
re-embed, SLUG_SEGMENT_PATTERN half-fix, LIKE SQL needing two
distinct param bindings, countCJKAwareWords over-splitting on
English+1-CJK-term docs). All four addressed in the implementation.

TODOS.md: resolved the v0.32.x PGLite CJK keyword fallback entry;
filed five v0.33+ follow-ups (Postgres CJK FTS via pgroonga / wider
Unicode property escapes / -z NUL git framing / CJK overlap context /
other non-Latin scripts / embedding pricing refresh mechanism).

Co-Authored-By: vinsew <vinsew@users.noreply.github.com>
Co-Authored-By: 313094319-sudo <313094319-sudo@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: review findings — forceRechunk + source_path lookup (codex post-merge)

Two critical issues caught by codex adversarial on the post-merge tree:

F1 — Reindex sweep was a no-op on unchanged-source pages. importFromContent
short-circuits on existing.content_hash === hash BEFORE the chunker runs,
so the v0.32.7 MARKDOWN_CHUNKER_VERSION bump (and master's v0.32.2
stripFactsFence privacy strip) never reached pages whose markdown body
hadn't been edited.

Fix: new `forceRechunk?: boolean` option on importFromContent + importFromFile.
When set, the hash short-circuit is bypassed and the page re-runs the full
chunk + write pipeline. `gbrain reindex --markdown` now passes forceRechunk:
true on every row. This means:
  - The CJK chunker bump actually reaches existing markdown pages.
  - Master's v0.32.2 stripFactsFence applies retroactively too — any
    pre-strip private fact bytes lingering in content_chunks get cleared
    when the v0.32.7 post-upgrade sweep runs.

New test in test/reindex.test.ts seeds a page, runs the sweep, mocks a
stale chunker_version=1 without changing compiled_truth, runs the sweep
again, asserts chunker_version is bumped despite hash match.
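
The F1 fix amounts to one extra condition on the hash short-circuit. A minimal sketch with hypothetical names (`shouldRunChunkPipeline`, `ImportOptions`); the real importFromContent does considerably more around this check.

```typescript
interface ImportOptions { forceRechunk?: boolean }

// Returns true when the chunk + write pipeline should run for this page.
function shouldRunChunkPipeline(
  existingHash: string | null, // null when the page does not exist yet
  newHash: string,
  opts: ImportOptions = {},
): boolean {
  if (opts.forceRechunk) return true;        // bypass the short-circuit entirely
  return existingHash !== newHash;           // pre-fix behavior: skip on hash match
}
```

Without the flag, an unchanged body returns false, which is exactly why the chunker bump never reached those pages.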

F4 — Sync delete/rename still used resolveSlugForPath(path) only, ignoring
the new pages.source_path column added in v52. Frontmatter-fallback pages
(emoji-only / Thai / Arabic filenames where slugifyPath returns empty and
the slug came from the markdown frontmatter) would orphan on delete or
rename because the path-derived slug doesn't match the stored slug.

Fix: new exported helper resolveSlugByPathOrSourcePath(engine, path,
sourceId?) queries pages.source_path first, falls back to
resolveSlugForPath when no row matches. Threaded into 3 call sites in
sync.ts (un-syncable modified cleanup at :531, deletes at :603, rename
oldSlug at :622). Best-effort: query errors fall through to the legacy
path so pre-migration brains still work.

3 new test cases in test/sync.test.ts cover: stored-slug lookup hits,
fallback when no source_path row exists, and source_id scoping when two
sources have the same source_path value.
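
The lookup order for the F4 fix can be sketched over an in-memory row list. This is not the real helper (which queries the engine and tolerates query errors); `resolveSlugSketch` and `StoredPage` are hypothetical names for illustration.

```typescript
interface StoredPage { slug: string; source_id: string; source_path: string | null }

// Stored source_path first (optionally scoped by source), else the path-derived slug.
function resolveSlugSketch(
  pages: StoredPage[],
  path: string,
  pathToSlug: (p: string) => string,
  sourceId?: string,
): string {
  for (const p of pages) {
    if (p.source_path === path && (sourceId === undefined || p.source_id === sourceId)) {
      return p.slug;
    }
  }
  return pathToSlug(path); // legacy behavior when no row matches
}
```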

Codex finding #3 (reindex not in CLI_ONLY) was verified as a false
positive — CLI_ONLY is the set that doesn't need an engine; reindex
correctly belongs to the engine-backed dispatch.

302 wave tests pass / 0 fail. bun run verify green.

* docs: update CLAUDE.md + llms-full.txt for v0.32.7 CJK fix wave

CLAUDE.md Key Files: added entries for the five new modules introduced by
the wave — src/core/cjk.ts (shared detection + delimiters + density
threshold), src/core/audit-slug-fallback.ts (weekly JSONL),
src/core/embedding-pricing.ts (post-upgrade cost lookup table),
src/core/post-upgrade-reembed.ts (prompt + grace window), and
src/commands/reindex.ts (chunker_version sweep with forceRechunk).

Also noted src/commands/sync.ts:resolveSlugByPathOrSourcePath — the
F4 codex post-merge fix that wires the new pages.source_path column
into sync delete/rename so frontmatter-fallback pages don't orphan.

CLAUDE.md Commands: added a v0.32.7 section covering `gbrain reindex
--markdown`, the new doctor slug_fallback_audit check, PGLite CJK
keyword fallback in `gbrain search`, and the post-upgrade
chunker-bump cost prompt with its env-var overrides.

llms-full.txt: regenerated via bun run build:llms (CI gate runs the
generator on every release; commit must include the bundle).

README.md: no changes needed — v0.32.7 is internal correctness
across the existing pipeline, not a new skill or setup story.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: vinsew <vinsew@users.noreply.github.com>
Co-authored-by: 313094319-sudo <313094319-sudo@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…akes, patterns, integrity, migrate-engine (garrytan#860)

* fix: thread source_id through embed --stale to fix silent discard of non-default source embeddings

listStaleChunks correctly finds chunks across all sources, but
embedOneSlug called getChunks(slug) and upsertChunks(slug, merged)
without passing sourceId. Both default to source_id='default', so
for non-default sources (e.g. media-corpus):

1. getChunks returns empty (wrong source)
2. merged array has no existing chunks to merge into
3. upsertChunks writes nothing (or errors silently)
4. Embeddings generated by the API are silently discarded

Fix:
- Add source_id to StaleChunkRow type
- Add p.source_id to listStaleChunks SQL in both postgres + pglite engines
- Extract sourceId from stale row in embed command
- Pass { sourceId } to getChunks and upsertChunks
- Group stale chunks by composite key (source_id::slug) instead of bare slug
  to handle same-slug pages across multiple sources

Verified: 97 chunks embedded across 35 pages in first run after fix.
Previously 0 non-default-source chunks were embedded across 3 full runs.
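
The composite-key grouping can be sketched as follows. `groupBySourceAndSlug` is a hypothetical name and the `chunk_index` field is an assumed column for illustration; the `source_id::slug` key format is from the commit.

```typescript
interface StaleChunkRow { source_id: string; slug: string; chunk_index: number }

// Group stale chunks per (source, slug) so same-slug pages in different
// sources are embedded and upserted independently.
function groupBySourceAndSlug(rows: StaleChunkRow[]): Map<string, StaleChunkRow[]> {
  const byKey = new Map<string, StaleChunkRow[]>();
  for (const row of rows) {
    const key = row.source_id + "::" + row.slug;
    const bucket = byKey.get(key);
    if (bucket) bucket.push(row);
    else byKey.set(key, [row]);
  }
  return byKey;
}
```

Grouping by bare slug would merge `default::alice` and `media-corpus::alice` into one bucket, and the later getChunks call would then read from the wrong source.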

* fix: comprehensive multi-source threading for embed, listPages, and migrate-engine

Multi-source brains (e.g. with a 'media-corpus' source alongside
'default') have a pervasive bug: operations that iterate pages across
all sources then call engine methods (getChunks, upsertChunks,
getChunksWithEmbeddings) without passing sourceId. These methods all
default to source_id='default', silently operating on the wrong page
(or no page at all) for non-default sources.

Changes:

1. Page type + rowToPage: add optional source_id field so downstream
   callers can read the source from page objects returned by listPages.

2. PageFilters: add sourceId filter so listPages can scope to a single
   source (used by embed --source and future extract --source).

3. listPages (postgres + pglite): wire the sourceId filter into SQL.

4. embed command — three paths fixed:
   a. embedPage (single-slug): accepts sourceId, threads to getPage +
      getChunks + upsertChunks.
   b. embedAll (--all): reads page.source_id from listPages results,
      threads to getChunks + upsertChunks per page.
   c. embedAllStale (--stale): reads source_id from StaleChunkRow,
      groups by composite key (source_id::slug) instead of bare slug,
      threads to getChunks + upsertChunks per key.

5. embed CLI: add --source <id> flag, threaded through all paths.

6. migrate-engine: thread page.source_id through
   getChunksWithEmbeddings + upsertChunks so engine migrations don't
   lose non-default-source chunks.

7. getChunksWithEmbeddings (postgres + pglite + BrainEngine interface):
   accept optional { sourceId } to scope the chunk lookup.

8. StaleChunkRow type: add source_id field.

9. listStaleChunks SQL (postgres + pglite): add p.source_id to SELECT.

Verified: embed --stale correctly embeds 97 chunks across 35 pages
(previously 0 non-default-source chunks across 3 full runs).
embed --source media-corpus --dry-run correctly scopes to that source.

* v0.32.4 fix: multi-source threading for embed, listPages, and migrate-engine

Bump VERSION + package.json + CHANGELOG for the comprehensive multi-source
fix. Embed now threads source_id through every page → chunk handoff so
non-default sources stop silently dropping out (~22k chunks recovered on
the brain that surfaced this).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: complete slugs→keys rename in embedAllStale

The composite-key rename in the prior commit missed 4 references in the
worker loop and trailing console.log, so the file failed typecheck
(`Cannot find name 'slugs'`). The author's "Verified compiling + running"
claim was false at the time of the PR.

Also drop the dead `const bySlug = byKey` alias — unused after rename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: add check-source-id-projection.sh + fix getPage/putPage projections

Two SELECT projections fed `rowToPage` without including `source_id`:
- postgres-engine.ts:562 (getPage), :609 (putPage RETURNING)
- pglite-engine.ts:505 (getPage), :548 (putPage RETURNING)

After the type-tightening in the next commit makes `Page.source_id`
required, those projections would silently produce `Page` rows with
source_id=undefined while TypeScript claims `: string`. Codex's plan
review (F2) caught this; this commit closes it.

The new `scripts/check-source-id-projection.sh` greps for the rowToPage
feeder shape (`SELECT id, slug, type, title, ...`) and fails the build
if any projection lacks `source_id`. Wired into `bun run verify`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(engine): Page.source_id required + listAllPageRefs + validateSourceId

Three coordinated changes that unlock the Phase 3 bug-site fixes:

1. `Page.source_id` is now required (was optional, v0.31.12). The DB column
   is `NOT NULL DEFAULT 'default'` so every row has it; the type now matches.
   `rowToPage` always emits it (falls back to 'default' if a stale projection
   somehow misses the column, but `scripts/check-source-id-projection.sh` is
   the primary guard).

2. `BrainEngine.listAllPageRefs()` returns `Array<{slug, source_id}>` ordered
   by `(source_id, slug)`. Cheap cross-source enumeration for hot loops in
   extract-takes / extract / integrity that previously used
   `getAllSlugs() → getPage(slug)` (N+1 query AND silently defaulted to
   'default'). PGLite + Postgres parity.

3. `validateSourceId(id)` in utils. Allows `[a-z0-9_-]+` only. Used by the
   per-source disk-layout fix coming in Phase 3 before any
   `join(brainDir, source_id, ...)` call so source_id can't traverse out
   of brainDir.
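
The allow-list check in item 3 can be sketched like this. Whether the real validateSourceId throws or returns a boolean is not stated, so this sketch uses a hypothetical boolean helper `isValidSourceId`.

```typescript
// Anchored so the WHOLE id must match; an unanchored test would accept "../x-ok".
const SOURCE_ID_PATTERN = /^[a-z0-9_-]+$/;

function isValidSourceId(id: string): boolean {
  return SOURCE_ID_PATTERN.test(id);
}
```

Because `.` and `/` are outside the allowed set, values such as `..` or `a/b` can never reach a path join.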

Deferred to v0.33 follow-up:
- D2 strict tightening of BrainEngine slug-method signatures (the compile-
  time guard for "future getPage calls must pass sourceId")
- F3 OperationContext.sourceId required at MCP boundary
- F4 LinkBatchInput / TimelineBatchInput required source_id fields
- D6 forEachPage / listPagesAfter helpers (use listPages directly for now)

Those are nice-to-have guardrails against future regressions. The correctness
work in this commit (D7 + listAllPageRefs) is the prerequisite that lets the
Phase 3 bug-site fixes work on multi-source brains.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: thread source_id through cycle phases, extract, integrity, migrate-engine

Five bug sites that previously called slug-only engine methods inside a
loop over pages, silently defaulting to source_id='default' for every
non-default-source page. Now all five use listAllPageRefs to enumerate
(slug, source_id) pairs and thread sourceId through to engine.getPage,
getTags, addLink, addTimelineEntry, getRawData, getVersions, etc.

Site-by-site:

- src/core/cycle/extract-takes.ts: listAllPageRefs replaces N+1
  getAllSlugs+getPage. Takes for non-default-source pages now extract.

- src/core/cycle/patterns.ts + synthesize.ts: reverseWriteSlugs renamed
  to reverseWriteRefs with Array<{slug, source_id}> contract. Disk
  layout (F6): non-default sources land at brainDir/.sources/<id>/<slug>.md
  so same-slug-different-source pages don't collide. Default-source
  pages stay at brainDir/<slug>.md so single-source brains see no
  change. source_id validated against [a-z0-9_-]+ at write time to
  prevent path traversal.

- src/commands/extract.ts: extractLinksFromDB + extractTimelineFromDB
  use listAllPageRefs. Cross-source link resolution rule (F10): origin's
  source wins, fall back to default, else skip (don't silently push a
  wrong-source edge). addLinksBatch / addTimelineEntriesBatch now fill
  from_source_id / to_source_id / origin_source_id / source_id so
  multi-source JOINs target the correct page row.

- src/commands/integrity.ts: same listAllPageRefs pattern in both the
  primary scan loop and the auto-repair loop.

- src/commands/migrate-engine.ts: end-to-end source_id threading
  (page + tags + timeline + raw + versions + links). Resume manifest
  keyed on `${source_id}::${slug}` so multi-source resumes don't
  collide on same-slug rows (pre-fix entries treated as default for
  back-compat).
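
The per-source disk layout (F6) and the resume-manifest key can be sketched together. Hypothetical names throughout (`reverseWritePath`, `resumeKey`, `PageRef`); paths are built with plain `/` joins for brevity, whereas real code would presumably use the platform path module.

```typescript
interface PageRef { slug: string; source_id: string }

// Default source keeps the legacy location; other sources get .sources/<id>/.
function reverseWritePath(brainDir: string, ref: PageRef): string {
  if (!/^[a-z0-9_-]+$/.test(ref.source_id)) {
    throw new Error("invalid source_id: " + ref.source_id); // blocks path traversal
  }
  return ref.source_id === "default"
    ? brainDir + "/" + ref.slug + ".md"
    : brainDir + "/.sources/" + ref.source_id + "/" + ref.slug + ".md";
}

// Resume-manifest key; same-slug rows in different sources stay distinct.
function resumeKey(ref: PageRef): string {
  return ref.source_id + "::" + ref.slug;
}
```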

test/cycle-synthesize-slug-collection.test.ts updated for the new
collectChildPutPageSlugs return shape (Array<{slug, source_id}>
instead of string[]).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): multi-source bug class regression + CHANGELOG + e2e-test-map wire-up

test/e2e/multi-source-bug-class.test.ts — 7-case PGLite regression suite
pinning every bug site fixed in this PR:
  - listAllPageRefs ordering by (source_id, slug) [F11]
  - getPage with sourceId picks the right (source, slug) row [F2]
  - extract-takes processes both alice pages independently
  - listPages filters correctly with PageFilters.sourceId
  - addLinksBatch with from/to_source_id targets the right rows [F4]
  - validateSourceId rejects path traversal [F6]
  - reverse-write disk layout uses .sources/<id>/<slug>.md [F6]

No DATABASE_URL needed (PGLite in-memory + canonical R3+R4 pattern).

Wire into scripts/e2e-test-map.ts so changes to any of the 6 touched
source files automatically trigger this test.

CHANGELOG expanded from the embed-only narrative to cover the full
bug-class extermination — extract, takes, patterns, integrity,
migrate-engine, plus the per-source disk layout, the CI gate, and
the new listAllPageRefs primitive. Voice: lead with what users can
DO that they couldn't before; real numbers from the production brain
that surfaced it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(integrity): batch path scans (source_id, slug) pairs too

The batch-load fast path in scanIntegrity used `SELECT DISTINCT ON (slug)`,
which silently collapsed multi-source duplicate slugs into a single scan —
the same bug class this PR fixes. test/e2e/integrity-batch.test.ts had a
case pinning the broken behavior ("scan once, not once-per-source") that
asserted batchResult.pagesScanned===1 for two real (source, slug) rows.

Switching the projection from `DISTINCT ON (slug)` to a plain `SELECT ...
ORDER BY source_id, slug` makes batch + sequential paths report the same
count (2) and matches the v0.32.4 listAllPageRefs walk.

Test renamed + assertion flipped to lock in the correct multi-source-aware
behavior: both paths now report 2, not 1.
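
The collapse caused by `DISTINCT ON (slug)` is easy to reproduce in memory. An illustrative sketch with hypothetical helper names, not the integrity scanner's code:

```typescript
interface ScanRef { source_id: string; slug: string }

// Emulates DISTINCT ON (slug): one row per slug, whichever comes first.
function distinctOnSlug(rows: ScanRef[]): ScanRef[] {
  const seen = new Set<string>();
  const out: ScanRef[] = [];
  for (const r of rows) {
    if (!seen.has(r.slug)) { seen.add(r.slug); out.push(r); }
  }
  return out;
}

// Emulates the fixed projection: every row, ordered by (source_id, slug).
function orderedBySourceAndSlug(rows: ScanRef[]): ScanRef[] {
  const cmp = (a: string, b: string) => (a < b ? -1 : a > b ? 1 : 0);
  return rows.slice().sort((a, b) => cmp(a.source_id, b.source_id) || cmp(a.slug, b.slug));
}
```

With two sources that both hold `people/alice`, the first function scans one page and the second scans two, matching the flipped assertion.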

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync CLAUDE.md + llms bundles for v0.32.4

CLAUDE.md annotations updated on the 4 files that materially changed in
this PR's bug-class extermination:

- src/core/engine.ts — new listAllPageRefs() method
- src/core/utils.ts — new validateSourceId() helper + Page.source_id
  required field plumbing
- src/commands/integrity.ts — batch projection switched from DISTINCT ON
  (slug) to ORDER BY (source_id, slug) so multi-source scans aren't
  collapsed
- scripts/check-source-id-projection.sh (NEW entry) — CI guard against
  SELECT projections that drop source_id

Plus a new test inventory entry for test/e2e/multi-source-bug-class.test.ts
in the E2E section.

llms-full.txt regenerated per CLAUDE.md's iron rule. llms.txt is unchanged
(just an index).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version slot v0.32.4 → v0.32.8

VERSION + package.json + CHANGELOG header only. Annotation
sweep across src/tests/scripts and the CLAUDE.md + llms bundle
regen land in the two follow-up commits so each step bisects
independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: retag v0.32.4 → v0.32.8 across src/scripts/tests

Inline "introduced in" annotations follow the version slot bump
in the prior commit. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: retag CLAUDE.md v0.32.4 → v0.32.8 + regen llms-full.txt

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Merge remote-tracking branch 'origin/master' into fix/multi-source-threading

---------

Co-authored-by: Wintermute <wintermute@garrytan.com>
Co-authored-by: Garry Tan <garrytan@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…9 commands) (garrytan#879)

* feat(engine): add countUnconsolidatedFacts to BrainEngine + both engines

New `BrainEngine.countUnconsolidatedFacts(sourceId): Promise<number>` returns
the count of active + unconsolidated facts for a source. Single SQL:
COUNT(*) WHERE source_id = $1 AND consolidated_at IS NULL AND expired_at IS NULL.

Backs the v0.33 `gbrain recall --pending` flag and the `recall` MCP op's new
`include_pending` param. Source-scoped, no index needed (existing
facts(source_id) index covers the predicate).
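
The predicate is simple enough to state as an in-memory equivalent. A sketch with a hypothetical `Fact` shape (only the three columns named in the SQL above); the real method runs the COUNT(*) in the engine.

```typescript
interface Fact { source_id: string; consolidated_at: Date | null; expired_at: Date | null }

// Active + unconsolidated facts for one source.
function countUnconsolidated(facts: Fact[], sourceId: string): number {
  return facts.filter(
    (f) => f.source_id === sourceId && f.consolidated_at === null && f.expired_at === null,
  ).length;
}
```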

* feat(recall): cursor state + recall rewrite + thin-client routing + watch loop

`gbrain recall` gains four new flags backed by a new cursor-state file:

- `--since-last-run` reads ~/.gbrain/recall-cursors/<source>.json. First run
  defaults to 24h. Cursor is T_start (captured BEFORE the read SQL), not
  T_finish, so facts inserted during render don't fall in a black hole
  (Codex round 1 #2).
- `--pending` appends a "Pending consolidation: N" footer. Backed by the
  new engine method; remote round-trips through one MCP call via the
  recall op's new `include_pending` param.
- `--rollup` prepends a "Top mentions" header — top-5 entities by fact
  count over the FULL result set, not a LIMIT slice (Codex round 1 #8).
  JSON shape `top_entities: [{entity_slug, count}]` matches the existing
  pinned key at test/facts-doctor-shape.test.ts:49.
- `--watch [SECONDS]` re-runs on interval. Default 60, range [1, 3600].
  TTY: clear-and-redraw. Non-TTY: plain `--- <ts> ---` delimited blocks.
  SIGINT-only clean exit. Per-tick try/catch + exponential backoff
  `min(SECONDS × 2^(N-1), 5×SECONDS)`; exit after 5 consecutive failures
  with briefing cursor NOT advanced. Watch uses a separate cursor file
  (<source>.watch.json) so operator quitting watch doesn't clobber the
  standalone briefing cursor (Codex round 2 #8).
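
The watch-loop backoff formula `min(SECONDS × 2^(N-1), 5×SECONDS)` works out as below, taking N as the count of consecutive failures (starting at 1). `backoffSeconds` and `shouldExit` are hypothetical names; the 5-failure exit threshold is from the commit.

```typescript
// Delay before the next tick after N consecutive failures.
function backoffSeconds(intervalSeconds: number, consecutiveFailures: number): number {
  const exponential = intervalSeconds * Math.pow(2, consecutiveFailures - 1);
  return Math.min(exponential, 5 * intervalSeconds);
}

// The loop gives up after 5 consecutive failures.
function shouldExit(consecutiveFailures: number): boolean {
  return consecutiveFailures >= 5;
}
```

With the default 60-second interval the delays run 60, 120, 240, then cap at 300.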

Thin-client routing: runRecall + runForget mirror the salience.ts:80
pattern. On `gbrain init --mcp-only` installs the local engine call is
swapped for callRemoteTool('recall' | 'forget_fact', ...). The local
canonical source resolver's assertSourceExists check is skipped on
thin-client (empty local sources table); the kebab-case SOURCE_ID_RE
syntactic gate still runs locally. Fixes pre-existing silent-empty-results
on thin-client recall — the v0.31.1 wave missed it (Codex round 2 #6).

`recall` MCP op extended with optional `include_pending` param +
`pending_consolidation_count` output field. Backward-compatible.
No new MCP op. No schema migration.

State file uses atomic write via unique per-call tmp filename
(<source>.json.tmp.<pid>.<random>) + rename(2) (Codex round 1 #7).
Read returns null on missing/corrupt/future-shifted timestamps; caller
falls back to 24h.
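
The write-then-rename pattern and the tolerant read can be sketched with Node's `fs` module. The JSON field name `last_run` and the function names are assumptions for illustration; the tmp-suffix shape follows the commit.

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Write to a unique tmp file, then rename over the target so readers never
// observe a half-written cursor.
function writeCursorAtomic(file: string, lastRunIso: string): void {
  mkdirSync(dirname(file), { recursive: true });
  const tmp = file + ".tmp." + process.pid + "." + Math.random().toString(36).slice(2);
  writeFileSync(tmp, JSON.stringify({ last_run: lastRunIso })); // field name is hypothetical
  renameSync(tmp, file);
}

// Returns null on missing, corrupt, or future-dated cursors; the caller
// then falls back to the 24h default.
function readCursor(file: string, now: Date = new Date()): Date | null {
  try {
    const parsed = JSON.parse(readFileSync(file, "utf8"));
    const ts = new Date(parsed.last_run);
    if (isNaN(ts.getTime()) || ts.getTime() > now.getTime()) return null;
    return ts;
  } catch {
    return null;
  }
}
```

The per-call random suffix keeps two concurrent writers from sharing a tmp file; the last rename wins.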

* feat(thin-client): route jobs list/get + REFUSE 7 host-bound commands

Continues the v0.31.1 thin-client routing wave. v0.33 audit (Codex round 2
#4) source-grounded against operations.ts + each command file:

ROUTE additions (have MCP ops, mirror salience.ts:80 pattern):
- `gbrain jobs list` → callRemoteTool('list_jobs', ...)
- `gbrain jobs get <id>` → callRemoteTool('get_job', ...)
  Other jobs subcommands (submit, cancel, retry, work, supervisor, prune,
  stats, smoke) stay host-bound — they manage local queue state.

REFUSE additions to cli.ts THIN_CLIENT_REFUSED_COMMANDS + matching hints
in THIN_CLIENT_REFUSE_HINTS:
- `pages` — purge-deleted is admin+localOnly (operations.ts:856-864)
- `files` — file_list / file_url MCP ops are localOnly:true
- `eval` — export/prune/replay touch local engine; no MCP equivalent
- `code-def` / `code-refs` / `code-callers` / `code-callees` — NO MCP ops
  exist for symbol lookup in operations.ts:2630-2671; deferred as a v0.34
  candidate to add them

Each refuse hint names the host-side path the user should use instead.
Closes the silent-wrong-brain bug class for 9 commands total (recall +
forget routing landed in the prior commit).

* test: cover v0.33 recall extensions + thin-client routing audit (45 cases)

Three new test files pinning the v0.33 behavior + critical regression
guards from both Codex review rounds:

- test/recall-extensions.test.ts (17 cases, PGLite-backed). Covers
  countUnconsolidatedFacts SQL semantics (ignores expired, ignores
  consolidated, source-scoped, returns 0 on empty), cursor state file
  round-trip + corrupt/future fallback + briefing vs watch separation
  (Codex round 2 #8 regression guard) + atomic write tmp suffix
  (Codex round 1 #7 regression guard) + non-fatal write failures.
  Uses withEnv() for GBRAIN_HOME isolation per check-test-isolation.sh R1.

- test/recall-rollup.test.ts (8 pure-function cases). CRITICAL
  regression guards for Codex round 1 #8:
    1. Top-K computed over the FULL FactRow[], not a LIMIT-100 slice
       (seeded with 150 facts to prove full-window math)
    2. JSON shape pinned to `{entity_slug, count}` matching
       test/facts-doctor-shape.test.ts:49 (the existing shape pin)
    3. null entity_slug skipped, NOT bucketed as "(no entity)"
    4. Ties broken alphabetically for stable output

- test/thin-client-routing-audit.test.ts (20 source-grounded cases).
  Pins every v0.33 REFUSE addition in THIN_CLIENT_REFUSED_COMMANDS +
  every matching hint in THIN_CLIENT_REFUSE_HINTS + every v0.31.1-era
  original (no accidental removals). Pins every ROUTE addition's
  callRemoteTool import + call site in recall.ts and jobs.ts. Catches
  the audit-table regression mode that motivated the v0.31.1 wave
  originally.
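
The four rollup rules pinned by test/recall-rollup.test.ts can be sketched as one pure function. `topEntities` and the `FactRow` shape here are hypothetical stand-ins; only the `{entity_slug, count}` output shape is taken from the commit.

```typescript
interface FactRow { entity_slug: string | null }

// Top-K entities by fact count over the FULL input, nulls skipped,
// ties broken alphabetically for stable output.
function topEntities(facts: FactRow[], k = 5): Array<{ entity_slug: string; count: number }> {
  const counts = new Map<string, number>();
  for (const f of facts) {
    if (f.entity_slug === null) continue; // skipped, not bucketed as "(no entity)"
    counts.set(f.entity_slug, (counts.get(f.entity_slug) || 0) + 1);
  }
  const rows: Array<{ entity_slug: string; count: number }> = [];
  counts.forEach((count, entity_slug) => rows.push({ entity_slug, count }));
  rows.sort((a, b) =>
    b.count - a.count ||
    (a.entity_slug < b.entity_slug ? -1 : a.entity_slug > b.entity_slug ? 1 : 0));
  return rows.slice(0, k);
}
```

Counting before slicing is the point: applying a LIMIT first and then counting would rank entities by what happened to land in the slice.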

Net: 45 new test cases. All pass green against the v0.33 implementation.

* chore: bump version and changelog (v0.33.0)

v0.33.0 — agent integration: gbrain recall morning pulse + thin-client routing fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@chapter37haptics chapter37haptics merged commit 4c3e1b6 into master May 12, 2026
7 checks passed