
fix(auth): route HTTP auth SQL through active engine#681

Open
mutour123 wants to merge 1 commit into garrytan:master from mutour123:codex/pglite-http-auth-sql

Conversation


mutour123 commented May 6, 2026

Summary

  • add an engine-backed minimal SQL tag for OAuth/admin infrastructure
  • route gbrain auth and serve --http auth/admin SQL through the active BrainEngine so PGLite works without the Postgres singleton
  • make database URL env vars explicitly select Postgres over file-backed PGLite config
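The engine-backed minimal SQL tag in the first bullet can be sketched roughly as follows. `BrainEngine` and `executeRaw` are names taken from this PR's description; their signatures here are assumptions, not the shipped interface:

```typescript
// Hypothetical sketch of an engine-backed SQL tag. The point is that the
// tag builds $1, $2, ... placeholders and delegates to whichever engine
// (Postgres or PGLite) is active, instead of a Postgres singleton.
type SqlValue = string | number | bigint | boolean | Date | null;

interface BrainEngine {
  executeRaw(text: string, params: SqlValue[]): Promise<unknown[]>;
}

function makeSqlTag(engine: BrainEngine) {
  return (strings: TemplateStringsArray, ...values: SqlValue[]) => {
    // Interleave literal fragments with numbered placeholders.
    const text = strings.reduce(
      (acc, s, i) => acc + (i > 0 ? `$${i}` : "") + s,
      "",
    );
    return engine.executeRaw(text, values);
  };
}
```

With this shape, OAuth/admin call sites write `` sql`SELECT ... WHERE id = ${id}` `` and never touch a connection object directly.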

Tests

  • bun test test/sql-query.test.ts test/config-env.test.ts test/oauth.test.ts
  • bun run typecheck (with unrelated local src/core/pglite-engine.ts changes stashed)


garrytan (Owner) commented May 9, 2026

Hey @mutour123 — thanks for this. We hit a real architectural conflict during cherry-pick into #776: pr-681's narrow SqlQuery abstraction (only accepts SqlValue = string | number | bigint | boolean | Date | null) is incompatible with v0.28's takes-holders feature, which needs sql.json() for JSONB writes in src/commands/auth.ts. The abstraction was authored before the takes-holders work landed and didn't anticipate JSONB.

Per our PR-wave drop-and-followup protocol, this is being held for a follow-up that either (a) extends SqlQuery to support JSONB writes, or (b) routes JSONB writes through engine.executeRaw directly. We'll ping you here when the re-authored version is ready. Thank you for the contribution and the patience.
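Option (a) could look roughly like this: a wrapper type that widens `SqlValue` so JSONB writes survive the abstraction boundary. `JsonParam` and `serializeParam` are hypothetical names for illustration; `sql.json()` is modeled on postgres.js's helper, not on code in this repo:

```typescript
// Hedged sketch of option (a): mark JSONB values explicitly so the narrow
// SqlValue union can carry them without guessing at serialization.
class JsonParam {
  constructor(readonly value: unknown) {}
}

type SqlValue =
  | string | number | bigint | boolean | Date | null | JsonParam;

// At the engine boundary, JSONB params go over the wire as JSON text
// (paired with an explicit ::jsonb cast in the statement).
function serializeParam(
  v: SqlValue,
): string | number | bigint | boolean | Date | null {
  return v instanceof JsonParam ? JSON.stringify(v.value) : v;
}
```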

garrytan added a commit that referenced this pull request May 10, 2026
…-path, sync, multi-source, privacy) (#776)

* fix: bootstrap forward-references for v39-v41 schema replay

Three column-with-index forward references in the embedded schema blob were
missing from applyForwardReferenceBootstrap, so any brain at config.version
< 39 (Postgres) or < 41 (PGLite) wedges before the migration runner can
advance. Reproduced end-to-end on a PlanetScale Postgres brain stuck at
config.version=34 trying to upgrade to v0.30.0:

  ERROR: column "effective_date" does not exist
  ERROR: column cc.modality does not exist

(After upgrading, gbrain search and gbrain reindex-frontmatter both fail.)

The schema-blob references that crash before migrations run:

- v39 (multimodal_dual_column_v0_27_1):
    CREATE INDEX idx_chunks_embedding_image
      ON content_chunks USING hnsw (embedding_image vector_cosine_ops)
      WHERE embedding_image IS NOT NULL;
- v41 (pages_recency_columns):
    CREATE INDEX pages_coalesce_date_idx
      ON pages ((COALESCE(effective_date, updated_at)));

PGLite already covered v39 (lines 273+, 308+, 382-392). Postgres and PGLite
both lacked v40+v41 coverage. This commit adds:

- Postgres engine probe + branch for v39 (modality, embedding_image) — was
  entirely missing on Postgres, so Postgres brains < v39 hit the wedge that
  PGLite already protected against.
- Both engines: probe + branch for v40+v41. Bootstraps all five additive
  pages columns (emotional_weight, effective_date, effective_date_source,
  import_filename, salience_touched_at) gated on `effective_date_exists`
  as the proxy.
- test/schema-bootstrap-coverage.test.ts: extends REQUIRED_BOOTSTRAP_COVERAGE
  with the six new columns AND the pre-test DROP block so both the per-target
  assertion test and the end-to-end "bootstrap + SCHEMA_SQL replay" test
  exercise the new coverage.

All 5 tests in schema-bootstrap-coverage pass. typecheck clean.

Bootstrap stays additive-columns-only. Indexes are left to schema replay /
migrations as before.

* fix(deps): declare @jsquash/png and heic-decode

Both packages are direct imports in src/core/import-file.ts (decodeIfNeeded
for HEIC/AVIF → PNG) but only @jsquash/avif was declared. bun --compile
fails on a fresh install:

  error: Could not resolve: "@jsquash/png/encode.js"
  error: Could not resolve: "heic-decode"

Adds the missing declarations so npm install / bun install bring them in.

Versions chosen as latest at time of fix:
  @jsquash/png  ^3.1.1
  heic-decode   ^2.1.0

* fix(backfill-effective-date): replace bare BEGIN/COMMIT with engine.transaction()

postgres.js refuses bare BEGIN/COMMIT on pooled connections with
UNSAFE_TRANSACTION. The migration runner and other call sites already
use engine.transaction() (which routes through sql.begin() with a
reserved backend) — backfill-effective-date.ts was the holdout.

Reproduces on PlanetScale Postgres (us-east-4.pg.psdb.cloud) running
the v0.29.1 orchestrator's Phase B against a brain that has any rows
needing backfill:

  Reindex ok ... UNSAFE_TRANSACTION: Only use sql.begin, sql.reserved or max: 1

Switches the per-batch transaction to engine.transaction(async tx => …).
The SET LOCAL statement_timeout still scopes to the transaction; UPDATE
runs through the tx-scoped engine. ROLLBACK on error happens
automatically via sql.begin's contract.

Equivalent fix shape to existing usages in src/core/postgres-engine.ts
(lines 703, 806, 925) and the migration runner in src/core/migrate.ts
(line 2147).
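The commit/rollback contract that engine.transaction() provides can be illustrated with a standalone sketch. Note the real fix routes through sql.begin() on a reserved connection; this toy issues BEGIN/COMMIT itself only to show the contract (the `TxBackend` interface is an assumption):

```typescript
// Illustrative only: engine.transaction(body) guarantees COMMIT on success
// and ROLLBACK on throw, so call sites never issue bare BEGIN/COMMIT —
// which postgres.js rejects on pooled connections (UNSAFE_TRANSACTION).
interface TxBackend {
  run(statement: string): Promise<void>;
}

async function transaction<T>(
  backend: TxBackend,
  body: (tx: TxBackend) => Promise<T>,
): Promise<T> {
  await backend.run("BEGIN");
  try {
    // SET LOCAL statement_timeout etc. scope to this transaction.
    const result = await body(backend);
    await backend.run("COMMIT");
    return result;
  } catch (err) {
    await backend.run("ROLLBACK"); // mirrors sql.begin's rollback contract
    throw err;
  }
}
```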

* fix(v0_29_1): connect engine before use in Phase B and Phase C

phaseBBackfill() and phaseCVerify() build their own engine via
createEngine(toEngineConfig(cfg)) but never call engine.connect().
This worked accidentally before because executeRaw lazily falls back
to db.getConnection(), but engine.transaction() (added in the
companion backfill fix) requires a connected backend and surfaces
the missing-connect with:

  No database connection: connect() has not been called.
  Fix: Run gbrain init --supabase or gbrain init --url <connection_string>

Other orchestrators in the same directory get this right —
v0_28_0.ts:181 already does `await engine.connect(engineConfig)`
right after createEngine. Aligning v0_29_1 with that pattern.

After this + the backfill fix, v0.29.1 orchestrator runs to
'complete' on a fresh upgrade with backfill-needed rows, instead
of wedging at 'partial' status.

Note: anyone hitting the wedged state after the prior failures will
need `gbrain apply-migrations --force-retry 0.29.1` once before the
next apply-migrations --yes succeeds (the 3-consecutive-partials
guard in apply-migrations.ts is still active).

* fix: connect engine in v0.29.1 migration

* fix(upgrade): detectBunLink fails because bun resolves symlinks in argv[1]

bun resolves the entire symlink chain before setting process.argv[1],
so lstatSync(argv1).isSymbolicLink() always returns false for bun-link
installs, short-circuiting the git-config walk that would correctly
identify the repo. Remove the symlink gate — argv[1] is already the
real path inside the checkout, which is what the walk needs.

Also: return { repoRoot } so the upgrade path can auto-execute
git pull + bun install via execFileSync (no shell injection surface).

Fixes #368, supersedes incomplete v0.28.5 fix for #656.

* fix(oauth): clamp authorize() requested scopes against client.scope (RFC 6749 §3.3)

The MCP SDK's authorize handler (`@modelcontextprotocol/sdk/.../auth/handlers/authorize.js`)
splits `?scope=...` verbatim and forwards the parsed list to the provider, so the
provider has to clamp against the client's registered grant. v0.28.11
`authorize()` (src/core/oauth-provider.ts:235-259) inserted `params.scopes || []`
raw into `oauth_codes`, so a `read`-registered client requesting
`?scope=admin` had `['admin']` stored and `exchangeAuthorizationCode` issued
a fully-admin access token at /token exchange.

The asymmetry is the bug: the other two grant entry points already clamp.
`exchangeClientCredentials` (line 513-515) filters requested scopes through
`hasScope(allowedScopes, s)`, and `exchangeRefreshToken`'s F3 (line 372-380)
enforces RFC 6749 §6 subset against the original grant. authorize() lined up
with neither.

Fix mirrors the client_credentials filter shape so all three grant entry
points clamp consistently:

    const allowedScopes = parseScopeString(client.scope);
    const grantedScopes = (params.scopes || []).filter(s => hasScope(allowedScopes, s));

Empty/omitted requested scope keeps storing `[]` (existing shape, not a
security boundary). The clamped subset is what the client sees in the
`scope` field of the token response, which is the spec-compliant signal
that the grant was reduced.

Test coverage:
- New: authorize clamps requested scopes against client.scope (RFC 6749 §3.3)
  — read-only client requests ['read','write','admin'] and the issued token
  carries only ['read'].
- New: authorize subset request returns subset — 'read write' client
  requesting ['read'] gets ['read'] (regression guard against over-clamping).

The existing v0.26.9 oauth.test.ts pins F3 (refresh clamp) but had no
authorize-side coverage, which is why the regression survived.
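The clamp described above, made self-contained. `parseScopeString` and `hasScope` are the helper names from the commit; their bodies here are simple approximations of the behavior the commit describes:

```typescript
// RFC 6749 §3.3: the issued scope must not exceed the client's registered
// grant. All three grant entry points should apply the same filter shape.
function parseScopeString(scope: string | undefined): string[] {
  return (scope ?? "").split(" ").filter(Boolean);
}

function hasScope(allowed: string[], s: string): boolean {
  return allowed.includes(s);
}

function clampScopes(
  clientScope: string | undefined,
  requested: string[],
): string[] {
  const allowedScopes = parseScopeString(clientScope);
  // A read-only client requesting ['read','write','admin'] gets ['read'].
  return requested.filter((s) => hasScope(allowedScopes, s));
}
```

The clamped subset is what the token response's `scope` field reports back, the spec-compliant signal that the grant was reduced.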

* fix(sync): handle detached HEAD by skipping pull and ingesting local working tree

* fix(sync): --skip-failed acks pre-existing unacked failures up-front

The recovery flow that doctor + printSyncResult both advertise was broken:

1. User has files with bad YAML → they hit the failure log + sync stays
   blocked at last_commit.
2. User fixes the YAML.
3. User re-runs `gbrain sync` — sync succeeds, advances last_commit.
4. `gbrain doctor` still reports N unacked failures from step 1 because
   sync-failures.jsonl is append-only history, never auto-cleared.
5. doctor message says: "use 'gbrain sync --skip-failed' to acknowledge".
6. User runs `gbrain sync --skip-failed` → "Already up to date." → log
   unchanged.

The bug: --skip-failed only acknowledges failures from the CURRENT run.
performSync's ack path is gated on `failedFiles.length > 0` after sync —
it never fires when the diff is empty (because the user already fixed
the bad files) or when the sync is up to date. So the documented recovery
sequence is a no-op exactly when the user needs it.

The fix: at the top of runSync, when --skip-failed is set, eagerly ack
any pre-existing unacked failures before any sync work runs. Now the flag
means "acknowledge whatever is currently flagged and move on" regardless
of whether the current run produces new failures or finds nothing to do.

The inner per-run ack path stays — it still handles new failures from
the CURRENT run, which is the (a) syncing now produces failures + (b)
caller wants to ack them path. The two paths compose: `gbrain sync
--skip-failed` clears stale + advances past anything new, all in one
command, matching what the doctor message promises.

Tests: 2 added in test/sync-failures.test.ts. One source-string pin on
the new gate (the file's existing pattern for CLI-flag tests). One
behavioral test on the underlying acknowledgeSyncFailures path.

Repro:
  $ gbrain doctor
  [WARN] sync_failures: 27 unacknowledged sync failure(s)...
         Fix the file(s) and re-run 'gbrain sync', or use
         'gbrain sync --skip-failed' to acknowledge.
  $ # ... fix the YAML ...
  $ gbrain sync
  Already up to date.
  $ gbrain sync --skip-failed
  Already up to date.   # before this PR
  $ gbrain doctor
  [WARN] sync_failures: 27 unacknowledged sync failure(s)...   # still!

After:
  $ gbrain sync --skip-failed
  Acknowledged 27 pre-existing failure(s).
  Already up to date.
  $ gbrain doctor
  [OK] sync_failures: N historical sync failure(s), all acknowledged
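The eager-ack gate at the top of runSync can be sketched like this; the failure-log shape and the function name are assumptions standing in for the real append-only sync-failures.jsonl handling:

```typescript
// Sketch of "acknowledge whatever is currently flagged and move on":
// runs before any sync work, so it fires even on "Already up to date."
interface SyncFailure {
  file: string;
  acknowledged: boolean;
}

function acknowledgePreexisting(
  log: SyncFailure[],
  skipFailed: boolean,
): number {
  if (!skipFailed) return 0;
  let acked = 0;
  for (const f of log) {
    if (!f.acknowledged) {
      f.acknowledged = true;
      acked++;
    }
  }
  return acked; // reported as "Acknowledged N pre-existing failure(s)."
}
```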

* fix(extract): default --dir to configured brain dir, not cwd

`gbrain extract links` (and timeline / all) defaulted --dir to '.' when
not explicitly passed (src/commands/extract.ts:357). Combined with a
walker that skips dotfiles but NOT node_modules/dist/build/vendor, this
turned a no-arg invocation into a footgun.

Repro:
  $ cd ~/Documents/some-project   # has a node_modules/ tree
  $ gbrain extract links
  [extract.links_fs] 28989/28989 (100%) done
  Links: created 0 from 28989 pages
  Done: 0 links, 0 timeline entries from 28989 pages

The "28989 pages" is `walkMarkdownFiles('.')` recursively eating package
READMEs, dependency docs, fixture content. Their from_slug doesn't match
any row in the pages table, so addLinksBatch rejects every insert and
returns 0. Output looks like a healthy idempotent no-op; was actually a
wasteful junk walk that wrote nothing.

Fix: when --dir is not passed AND source is fs, resolve from
sources(local_path) via getDefaultSourcePath — same helper sync uses
(src/commands/sync.ts:1089). The default behavior now matches `sync`:
"work on the configured brain". Falls back to a clear error when no
source is configured, telling the user to either pass --dir, register
a source, or use --source db.

Behavior matrix:
  --dir explicit     → use that path (unchanged)
  --dir absent + cfg → resolve from sources(local_path)
  --dir absent + no  → error with actionable hint (was: walk cwd silently)
  --dir .            → cwd (user opted in explicitly — unchanged)

Tests: three added in test/extract-fs.test.ts:
  1. configured source → no-arg invocation extracts from that path
  2. no source configured → exit 1 + actionable error message
  3. explicit --dir wins over a configured (decoy) source path
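The behavior matrix above reduces to a small resolution function. This is a sketch; the real code resolves the configured path via getDefaultSourcePath, which is faked here as a plain argument:

```typescript
// Encodes the --dir behavior matrix: explicit wins (including '.'),
// otherwise fall back to the configured source, otherwise fail loudly
// instead of silently walking cwd.
function resolveExtractDir(
  explicitDir: string | undefined,
  configuredSourcePath: string | undefined,
): string {
  if (explicitDir !== undefined) return explicitDir; // '.' is an opt-in
  if (configuredSourcePath) return configuredSourcePath;
  throw new Error(
    "No --dir given and no source configured: pass --dir, register a source, or use --source db.",
  );
}
```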

* fix(extract): normalize slugs to lowercase via pathToSlug() (T-OBS-1)

The extractor was generating from_slug and the allSlugs lookup set from
`relPath.replace('.md', '')` in 5 places, producing CAPS slugs for files
named ETHOS.md, AGENTS.md, ROADMAP.md, etc.

Pages persist in the DB with lowercase slug (core/sync.ts pathToSlug()
applies .toLowerCase()). The CAPS extractor output mismatched the DB rows,
so INSERT ... JOIN pages ON pages.slug = v.from_slug silently dropped
links from CAPS-named source files. The link batch returned 'inserted'
counts that were lower than the wikilinks actually present, with no error.

Reproduction (in a brain with CAPS-named canonical docs):
  1. echo 'See [agents](agents.md).' > ETHOS.md
  2. gbrain put ethos < ETHOS.md  # page row: slug='ethos'
  3. gbrain extract links --source fs
  4. gbrain backlinks agents → []  (expected: contains 'ethos')

Fix: import pathToSlug from core/sync.ts and use it in all 5 sites:
  - extractLinksFromFile (line 200): from_slug derivation
  - runIncrementalExtractInternal (line 456): allSlugs set
  - extractLinksFromDir (line 552): allSlugs set
  - timeline loop (line 643): from_slug for timeline entries
  - extractLinksForSlugs (line 673): allSlugs set used by sync hook

This single-line-per-site change keeps the extractor consistent with the
sync layer's slug normalization and doesn't introduce any new behavior
for already-lowercase paths (idempotent).

Tests: added 'extractLinksFromFile — slug normalization (T-OBS-1
regression)' suite with 4 cases covering CAPS, mixed-case, idempotent
lowercase, and nested path. Full extract suite (54 → 58 tests) passes.

Reported by Claude Code (Opus 4.7) during Obsidian PKM integration on
the gstack-plan Living Repo, where ~111 wikilinks pointing to ETHOS,
AGENTS, ROADMAP, etc. failed to count toward brain_score (54/100 vs
expected 75+/100). Documented as T-OBS-1 in the consumer's blocked.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
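The normalization the fix reuses can be approximated like this. The real helper lives in core/sync.ts; this sketch captures the two properties the commit relies on (extension stripped at the end only, lowercased), not the shipped implementation:

```typescript
// Approximation of core/sync.ts pathToSlug: the buggy extractor used
// relPath.replace('.md', ''), which neither anchors at the end nor
// lowercases — so ETHOS.md produced slug 'ETHOS' while the pages table
// held 'ethos'.
function pathToSlug(relPath: string): string {
  return relPath.replace(/\.md$/, "").toLowerCase();
}
```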

* fix(cli): CLI_ONLY commands should short-circuit on --help instead of executing

* fix(doctor): correct command syntax in graph_coverage warn message

graph_coverage warn directs users to run `gbrain link-extract &&
gbrain timeline-extract`, but no commands by those names are
registered in cli.ts. The actual commands are `gbrain extract links`
and `gbrain extract timeline` (registered as the 'extract'
subcommand at src/cli.ts:525, with the kind argument 'links' /
'timeline' / 'all' parsed inside src/commands/extract.ts).

A user who runs the suggested command gets:
  $ gbrain link-extract
  Unknown command: link-extract

This is the only place in src/ with the wrong syntax — the rest of
the docs (init.ts:221, init.ts:331, features.ts:120,
v0_13_0.ts:67, sync.ts:752 comment) all already say 'extract links'.
This patch just brings doctor.ts in line.

* fix(doctor): use autoDetectSkillsDir so OpenClaw workspaces are reachable

`gbrain doctor` was the only consumer of `findRepoRoot` from
`core/repo-root.ts`. Every other consumer (check-resolvable.ts:145,
skillify.ts, etc.) uses `autoDetectSkillsDir`, which has the full
detection chain:
  1. $OPENCLAW_WORKSPACE
  2. ~/.openclaw/workspace
  3. findRepoRoot() walk from cwd
  4. ./skills

`findRepoRoot` only does step 3. Result: when the user runs `gbrain
doctor` from any directory outside the gbrain repo or the OpenClaw
workspace tree (e.g., a project's checkout), `resolver_health` reports
"Could not find skills directory" even though the dispatcher exists at
~/.openclaw/workspace/skills/RESOLVER.md.

Reproduces in any directory other than ~/gbrain or its descendants on
a system with ~/.openclaw/workspace/skills/RESOLVER.md present:

    $ cd ~/Documents
    $ gbrain doctor
    [WARN] resolver_health: Could not find skills directory   # before
    [WARN] resolver_health: 5 issue(s): 0 error(s), 5 warning(s)  # after

Switching doctor to `autoDetectSkillsDir` brings it inline with the rest
of the codebase. The detected dir is also passed to
`checkSkillConformance` (step 2 of the resolver_health block), which
previously rebuilt the path from `repoRoot` — now uses the same
detected path for consistency.

All 15 existing tests in test/doctor.test.ts continue to pass.

* fix(mcp): exit serve process on stdin-close/SIGTERM

MCP stdio server was keeping the bun process alive indefinitely after
the client disconnected. Over days this accumulated 20+ orphaned
gbrain serve processes, all holding the PGLite directory open.
Since PGLite is single-writer, this caused write-lock contention that
made email-sync fail its 15s per-put timeout: 114 puts x 15s = 28.5min
runs with 0 emails written.

Now listens for stdin end/close, transport close, and SIGTERM/SIGINT/
SIGHUP; calls engine.disconnect() and exits cleanly.

Root cause for the no-gbrain-run-in-50h alert.

* fix(skills): broaden RESOLVER triggers + 1 ambiguity flag (37 misses → 0, 100% top-1 accuracy)

`bun run src/cli.ts routing-eval` was reporting 37 ROUTING_MISS entries
across 10 skills whose RESOLVER.md trigger phrases didn't match any of
their own routing-eval.jsonl fixture intents. Two distinct causes:

1. Single-phrase triggers in 9 skills under '## Uncategorized' didn't
   cover the paraphrased fixture variations they're supposed to route.
   Broadened each trigger cell to a quoted-phrase list that covers the
   fixtures (5 fixtures per skill on average).

2. The media-ingest row used unquoted prose
   ('Video, audio, PDF, book, YouTube, screenshot') which
   extractTriggerPhrases() collapses into one impossible long phrase
   ('video audio pdf book youtube screenshot') under normalizeText —
   no fixture intent will ever contain that exact substring. Converted
   to a quoted phrase list.

3. One fixture ('web research pass on this person') legitimately
   matches both `perplexity-research` and `data-research`
   (data-research's trigger row contains "Research"). Marked the
   fixture `ambiguous_with: ["data-research"]` since the overlap
   on the keyword 'research' is inherent and expected.

Skills with broadened triggers:
  - voice-note-ingest, article-enrichment, book-mirror,
    archive-crawler, brain-pdf, academic-verify, concept-synthesis,
    perplexity-research, strategic-reading, media-ingest

Before: 58 cases, 37 misses, ~36% top-1 accuracy
After:  58 cases, 0 misses, 100% top-1 accuracy

This also clears `gbrain doctor`'s `resolver_health: 37 issue(s)` warning.
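Why unquoted prose collapses into one impossible phrase can be shown with a simplified stand-in for the trigger parsing; `normalizeText` and `extractTriggerPhrases` here are approximations of the real functions the commit names:

```typescript
// Lowercase, strip punctuation, collapse whitespace — the normalization
// that turns 'Video, audio, PDF' into one long substring.
function normalizeText(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9\s]/g, " ").replace(/\s+/g, " ").trim();
}

// Simplified: quoted phrases become individual triggers; an unquoted cell
// collapses into a single phrase no fixture intent will ever contain.
function extractTriggerPhrases(cell: string): string[] {
  const quoted = [...cell.matchAll(/'([^']+)'/g)].map((m) => m[1]);
  return quoted.length > 0
    ? quoted.map(normalizeText)
    : [normalizeText(cell)];
}

function matchesIntent(intent: string, phrases: string[]): boolean {
  const n = normalizeText(intent);
  return phrases.some((p) => n.includes(p));
}
```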

* fix(multi-source): thread source_id through per-page tx surface

Multi-source brains crashed mid-import with Postgres 21000 ("more than one
row returned by a subquery used as an expression"). Root cause: putPage's
INSERT column list omitted source_id, so writes intended for a non-default
source (e.g. 'jarvis-memory') silently fabricated a duplicate row at
(default, slug). The schema has UNIQUE(source_id, slug) but DEFAULT 'default'
for source_id; calling putPage(slug, page) without source_id landed at
(default, slug) and ON CONFLICT updated the wrong row, leaving the intended
source row stale. Subsequent bare-slug subqueries inside the same tx —
(SELECT id FROM pages WHERE slug = $1) in getTags / removeTag / deleteChunks
/ removeLink / addLink (cross-product) — then matched 2 rows and crashed
with 21000, rolling back the entire import. Observed: 18 sync failures
against a 'jarvis-memory'-sourced brain.

Fix:
- putPage adds source_id to the INSERT column list (defaults 'default' for
  back-compat).
- Every bare-slug page-id subquery becomes source-qualified
  (AND source_id = $X) in both engines: createVersion, upsertChunks,
  getChunks, addTag, removeTag, getTags, deleteChunks, removeLink,
  addTimelineEntry, deletePage, updateSlug.
- addLink rewritten away from FROM pages f, pages t cross-product into a
  VALUES + JOIN-on-(slug, source_id) shape mirroring addLinksBatch.
- engine.ts interface: 11 method signatures gain optional opts.sourceId
  (or opts.{from,to,origin}SourceId for addLink/removeLink). All optional;
  existing callers default to source='default' and behave identically.
- import-file.ts: importFromContent / importFromFile / importCodeFile take
  opts.sourceId and thread txOpts = { sourceId } through every per-page tx
  call. engine.getPage callsite source-scoped for accurate idempotency.
- commands/sync.ts: thread opts.sourceId at importFile (line 581 + 641),
  un-syncable cleanup (487-498), delete phase (557), rename phase (574),
  and post-sync extract phase (815-816).
- commands/reindex-code.ts: thread opts.sourceId at importCodeFile call.
- commands/extract.ts: extractLinksForSlugs / extractTimelineForSlugs accept
  opts.sourceId and propagate via linkOpts / entryOpts.
- commands/reconcile-links.ts: ReconcileLinksOpts.sourceId was declared but
  ignored end-to-end; now wired through getPage + addLink calls.
- commands/migrate-engine.ts: --force wipe switched to executeRaw('DELETE
  FROM pages') to preserve the pre-PR all-sources semantic after deletePage
  became default-source-scoped.

Regression test: test/source-id-tx-regression.test.ts (19 tests). Validates
two sources × same slug coexist; getTags/addTag/removeTag/deleteChunks/
upsertChunks/createVersion/addLink/addTimelineEntry/deletePage/updateSlug
source-scoped writes don't 21000; back-compat without opts targets
source='default'; addLink fail-fast on missing source-qualified endpoint;
importFromContent end-to-end tx thread without fabricating duplicate.

Adversarial review: Codex (gpt-5.5 reviewer) + Grok (xAI flagship reviewer)
3-round crew loop. Round 1: 2 HIGH (addTimelineEntry + extract.ts thread)
+ 2 MED. Round 2: 1 CRITICAL + 1 HIGH (deletePage + updateSlug bare-slug)
+ 2 MED. Round 3: 2 HIGH (getChunks + migrate-engine semantic regression
introduced by R2 fix). Round 4: both reviewers CLEAR.

Deferred to follow-up PRs (noted as TODO):
- src/commands/embed.ts source-aware threading (auto-embed at sync.ts:823
  has a TODO; try/catch swallows the failure as best-effort).
- src/core/postgres-engine.ts:1511 / pglite-engine.ts:1446 putRawData
  bare-slug (lower-impact metadata path).
- Read-surface bare-slug consistency cleanup (getLinks/getBacklinks/
  getTimeline/getRawData/getVersions): non-mutating, won't 21000.
- reconcile-links.ts CLI --source flag exposure (internal opt is wired;
  CLI parser is a UX feature for later).

Existing rows in production written under (default, slug) by the old
putPage when caller meant another source remain misrouted. Backfill
heuristics need install-specific knowledge of intended source and are
outside this PR's scope; surface as a deployment-side cleanup task.

bun run typecheck clean, bun run build clean, 19/19 regression tests pass,
4082 unit pass / 1 pre-existing fail (BrainRegistry test depending on
test-env ~/.gbrain/ absence — fails on untouched main, unrelated).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
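The 21000 failure mode can be modeled in miniature: a bare-slug scalar subquery against a multi-source pages table matches two rows, while qualifying by source_id restores uniqueness via UNIQUE(source_id, slug). Shapes here are illustrative, not the engine code:

```typescript
// Toy model of the bug: without a source_id qualifier, two sources sharing
// a slug make the scalar subquery non-unique and Postgres raises 21000.
interface PageRow {
  id: number;
  source_id: string;
  slug: string;
}

function pageIdBySlug(
  pages: PageRow[],
  slug: string,
  sourceId?: string,
): number {
  const hits = pages.filter(
    (p) => p.slug === slug && (sourceId === undefined || p.source_id === sourceId),
  );
  if (hits.length > 1) {
    throw new Error(
      "21000: more than one row returned by a subquery used as an expression",
    );
  }
  if (hits.length === 0) throw new Error("no such page");
  return hits[0].id;
}
```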

* fix(multi-source): plumb sourceId through performFullSync (PR #707 gap)

PR #707 fixed source_id routing for sync's incremental loop (lines 581/641)
but performFullSync (line 922) calls runImport without threading sourceId.
Result: full syncs route pages to default even with --source <id>. Verified
on v0.30.1 by direct PGLite probe after `gbrain sync --source X --full`:
all pages landed in default, not the named source.

Fix:
- runImport accepts sourceId in opts (programmatic only — no CLI flag,
  preserving PR #707's design intent of `gbrain import` being default-only).
- runImport threads sourceId to importFile + importImageFile.
- performFullSync passes opts.sourceId to runImport.
- ImportImageOptions type accepts sourceId for runImport branch (importImageFile
  body wiring deferred — image imports out of scope for current use case;
  TS error fix only).

Verified: real sync test against /tmp/test-sync routes 1 page to "testsync"
source, 0 to default (post-fix). 19/19 source-id regression tests still pass.
Typecheck clean.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* test: regression test for performFullSync sourceId threading

PR #707's existing 19-test suite at test/source-id-tx-regression.test.ts
covers the engine-layer transaction surface (putPage / addTag / etc.)
but does NOT exercise commands/sync.ts:performFullSync. Verified via
`grep -c 'performFullSync' test/source-id-tx-regression.test.ts → 0`.

This means the +18/-4 fix at sync.ts:892 (performFullSync passing
sourceId to runImport) had no automated coverage.

Adds 2 PGLite-only regression tests:

1. `performFullSync with --source routes pages to named source (not default)`
   — fixture: temp git repo with 2 markdown files. Calls performSync with
   { full: true, sourceId: 'testsrc-pfs', noPull: true, noEmbed: true }.
   Asserts pages.source_id = 'testsrc-pfs', not 'default'. Pre-fix: FAILS
   (verified by checking out 46cd197 — rebased PR #707 only, without my
   gap-fix — and running this test). Post-fix: PASSES.

2. `performFullSync WITHOUT --source still targets default (back-compat)`
   — same fixture, no sourceId opt. Asserts pages.source_id = 'default'.
   Both pre-fix and post-fix: PASSES (back-compat preserved by the fix).

Verified: 21/21 tests pass on this branch (19 from PR #707 + 2 new).
`bun run typecheck` clean. `bun run verify` clean (8 guard checks pass).

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* fix(privacy): strip takes fence from get_page / get_versions when token carries an allow-list

v0.28.6 (#563) introduced the per-token takes-holder allow-list: an OAuth token
carries `permissions.takes_holders` and `takes_list` / `takes_search` /
`think.gather` filter take rows server-side via `WHERE t.holder = ANY($allowList)`
in both engines.

But take rows are stored in two places per the explicit contract in
`extract-takes.ts:5-13` ("markdown is canonical, the takes table is a derived
index"): the structured `takes` table AND inline in `pages.compiled_truth`
between `<!--- gbrain:takes:begin -->` markers as a markdown table whose `who`
column IS the holder. A read-only token whose `takes_holders` is `["world"]`
(the documented default-deny posture from migrate.ts:1221) can call
`get_page <slug>` and recover every non-`world` claim verbatim from the body —
private hunches, founder bets, non-public sourcing notes. `get_versions` has
the same shape: snapshots persist historical compiled_truth verbatim, so a
caller blocked at `get_page` falls through to /history.

The team already shipped a complementary fix in `chunkers/recursive.ts:49`
(stripTakesFence applied before the body is chunked, so `query` results don't
leak fence content). Migration v38 documents this as a "complementary fix" —
the page-CRUD surface was missed.

Fix strips the fence at the op layer when `ctx.takesHoldersAllowList` is set
(i.e. the remote MCP path). Local CLI callers leave the field unset and keep
seeing the full fence.

    const visibleBody = ctx.takesHoldersAllowList
      ? { ...page, compiled_truth: stripTakesFence(page.compiled_truth) }
      : page;

Same shape on `get_versions` over every snapshot in the array. Re-rendering
the fence with allow-list-filtered rows would require joining the takes table
per version_id and inverts the markdown-canonical contract; whole-fence strip
is the conservative posture that closes the leak. A future allow-list-aware
re-render is an additive change that won't break the contract pinned by these
tests.

Test coverage in `test/takes-mcp-allowlist.serial.test.ts`:
- get_page with allow-list strips fence; surrounding body kept.
- get_page without allow-list (local CLI) keeps fence (back-compat).
- get_page fuzzy resolution path also strips for remote tokens.
- get_versions with allow-list strips fence on every snapshot.
- get_versions without allow-list returns historical content intact.

The pre-fix R12 PoC reported `LEAKED garry hidden take? YES` and
`LEAKED brain hidden take? YES`; post-fix the same PoC reports `no` for both
holders and "bypass did not reproduce".
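A minimal sketch of the whole-fence strip, assuming a paired end marker (the commit quotes only the begin marker, so the end-marker name below is an assumption) and not the shipped stripTakesFence:

```typescript
// Conservative whole-fence strip: remove everything between the takes
// markers so a remote token never sees inline take rows. The end-marker
// string is assumed; the begin marker is quoted in the commit.
const TAKES_BEGIN = "<!--- gbrain:takes:begin -->";
const TAKES_END = "<!--- gbrain:takes:end -->";

function stripTakesFence(body: string): string {
  const start = body.indexOf(TAKES_BEGIN);
  if (start === -1) return body; // no fence: body passes through untouched
  const end = body.indexOf(TAKES_END, start);
  if (end === -1) return body.slice(0, start); // unterminated: drop the tail
  return body.slice(0, start) + body.slice(end + TAKES_END.length);
}
```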

* Fix double-encoded jsonb in subagent_tool_executions breaking slug lookup

persistToolExecPending/Failed/Complete called JSON.stringify(input) before
passing to a $N::jsonb parameter. When input is already an object, this
produces a JSON string which ::jsonb stores as a jsonb scalar -- not a
jsonb object. Downstream queries like input->>slug then return NULL
because the operator does not traverse scalar strings.

Root cause fix: skip JSON.stringify when input is already a string.

Query fix: use COALESCE with (input #>> '{}')::jsonb->>slug fallback
to handle both old double-encoded rows and new properly-encoded rows.

Affects: dream cycle synthesize phase (pages_written always 0) and
patterns phase (same slug collection query).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
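The root-cause guard and the read-side fallback reduce to the following sketch (function name is illustrative; the COALESCE expression mirrors the one described above):

```typescript
// Only stringify when the input isn't already JSON text, so a $N::jsonb
// parameter stores a jsonb object — double-stringifying an object yields a
// jsonb *scalar* string, which input->>'slug' cannot traverse.
function toJsonbParam(input: unknown): string {
  return typeof input === "string" ? input : JSON.stringify(input);
}

// Read side: handle both old double-encoded rows (scalar holding JSON text,
// unwrapped via #>> '{}' then recast) and new properly-encoded rows.
const slugExpr =
  "COALESCE(input->>'slug', (input #>> '{}')::jsonb->>'slug')";
```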

* fix(adapter/voyage): translate request/response between OpenAI-compat SDK and Voyage's actual contract

The @ai-sdk/openai-compatible package treats Voyage as if it were
OpenAI-shaped, but Voyage's /v1/embeddings endpoint diverges in three places
that combine into a hard-blocking incompatibility:

OUTBOUND request:
  - 'encoding_format=float' (SDK default) is rejected; Voyage only accepts 'base64'
  - 'dimensions' parameter (OpenAI name) is rejected; Voyage uses 'output_dimension'

INBOUND response:
  - With encoding_format=base64, 'embedding' is returned as a base64 string,
    but the SDK's Zod schema (openaiTextEmbeddingResponseSchema) expects an
    'array of number'. The schema fails with 'Invalid JSON response' even
    though the JSON is well-formed.
  - 'usage' lacks 'prompt_tokens'; the schema requires it when usage is present.

Without this patch, ALL embedding requests to Voyage fail. Reproducible by
running 'gbrain put <slug> < text' with embedding_model=voyage:voyage-* and
any current voyage model (voyage-3-large, voyage-3, voyage-4-large).

Solution: pass a custom 'fetch' to createOpenAICompatible only when
recipe.id === 'voyage'. The fetch wrapper:
  1. Forces encoding_format='base64' on outbound (Voyage's only accepted value)
  2. Translates dimensions -> output_dimension on outbound
  3. Drops Content-Length so the runtime recomputes from the mutated body
  4. Decodes base64 embeddings to Float32 arrays on inbound (so the Zod schema
     sees what it expects)
  5. Synthesizes prompt_tokens from total_tokens when missing

This is a minimal, targeted fix. It only activates for Voyage and falls
through cleanly for all other providers. No public API changes.
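Step 4 of the wrapper — decoding a base64 embedding into the array of numbers the SDK's schema expects — can be sketched on its own (Node's `Buffer` is assumed available; the surrounding fetch plumbing is omitted):

```typescript
// Voyage returns embeddings as base64-packed little-endian Float32 data
// when encoding_format=base64; decode to plain numbers so the SDK's
// array-of-number Zod schema validates.
function decodeBase64Embedding(b64: string): number[] {
  const buf = Buffer.from(b64, "base64");
  // Copy into a standalone, aligned buffer before the Float32 view —
  // pooled Buffer offsets aren't guaranteed 4-byte aligned.
  const copy = new Uint8Array(buf);
  return Array.from(new Float32Array(copy.buffer));
}
```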

* feat(dream): support .md files in transcript discovery

Transcript discovery only accepted .txt files. Many brain repos store
meeting transcripts and conversation logs as .md (markdown), which is
the natural format for brain content.

Changes:
- listTextFiles() now accepts both .txt and .md
- basename extraction handles both extensions for date inference
- readSingleTranscript() handles both extensions

No behavior change for existing .txt-only setups.
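The broadened filter amounts to an extension check plus basename stripping. A minimal sketch, assuming hypothetical helper names (the real listTextFiles also handles directory walking and date inference):

```typescript
// Extensions accepted by transcript discovery after this change.
const TRANSCRIPT_EXTENSIONS = [".txt", ".md"];

function isTranscriptFile(name: string): boolean {
  return TRANSCRIPT_EXTENSIONS.some((ext) => name.toLowerCase().endsWith(ext));
}

// Basename extraction that strips either extension for date inference.
function transcriptBasename(name: string): string {
  return name.replace(/\.(txt|md)$/i, "");
}
```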

* fix(test): cast exitCode to unknown for TS strict-narrowing

TS narrows exitCode to null between declaration and assertion because
the mocked process.exit is behind `(process as any).exit`. The cast
preserves test intent without weakening the variable's type annotation.

Wave-side merge fix; ships alongside #688 (extract --dir default).

* fix(cli): add frontmatter + check-resolvable to CLI_ONLY_SELF_HELP

Companion to #634. Both commands have their own --help logic that prints
detailed usage with command-specific flags (e.g., --json, --fix, --strict
for check-resolvable). Without this, pr-634's generic short-circuit prints
"Usage: gbrain <cmd> - run gbrain --help for the full command list." and
the existing --help integration tests fail.

Verified: `gbrain frontmatter --help` and `gbrain check-resolvable --help`
now route to their handlers, which print full per-command usage and exit 0.

* fix(test): update discoverTranscripts test expectation for .md support

Companion to #708. The pre-#708 test asserted that .md files in the
session-corpus directory were skipped. Post-#708 they are discovered
alongside .txt. Renamed the test to 'skips non-txt non-md files' (uses
.pdf as the negative case) and added a positive .md discovery test that
pins #708's intended behavior.

* fix(skills): declare missing RESOLVER triggers in skill frontmatter

Companion to #718. The RESOLVER round-trip test (test/resolver.test.ts)
fuzzy-matches every RESOLVER.md trigger phrase against the target skill's
frontmatter triggers list. pr-718 added six new RESOLVER routings without
declaring matching triggers:

- media-ingest: 'PDF book', 'summarize this book', 'ingest it into my brain'
- article-enrichment: 'enriching the article', 'enrich the article', 'enrich pass'
- concept-synthesis: 'canon vs riff'
- perplexity-research: 'perplexity-research', 'surface new developments'
- academic-verify: 'Retraction Watch'
- voice-note-ingest: 'audio message'

Adds the missing triggers verbatim to each skill's frontmatter so the
round-trip invariant holds.

* chore: regenerate llms.txt + llms-full.txt after wave skill updates

* v0.30.3 release: bump VERSION + CHANGELOG entry

22-PR community fix wave with one P0 security upgrade (auth-code scope
escalation closed). 19 PRs landed across 5 lanes; 3 superseded by master
during cherry-pick; 1 deferred per E2 protocol (#681 architectural
conflict with v0.28 takes-holders); follow-up filed.

Headline fixes: #727 (auth-code scope-clamp, RFC 6749 §3.3 compliance),
#740/#751 (v0.29.1 PGLite migration connect), #741 (v39-v41 forward-
reference bootstrap), #757 (multi-source sourceId threading, closes
Postgres 21000), #728 (takes-fence redaction on remote reads).

See CHANGELOG.md for full per-PR attribution and decision history.

Co-Authored-By: lanceretter <[email protected]>
Co-Authored-By: alexandreroumieu-codeapprentice <[email protected]>
Co-Authored-By: brandonlipman <[email protected]>
Co-Authored-By: gus <[email protected]>
Co-Authored-By: jeremyknows <[email protected]>
Co-Authored-By: Trevin Chow <[email protected]>
Co-Authored-By: WD <[email protected]>
Co-Authored-By: Federico Cachero <[email protected]>
Co-Authored-By: Brandon Lipman <[email protected]>
Co-Authored-By: joshsteinvc <[email protected]>
Co-Authored-By: mgunnin <[email protected]>
Co-Authored-By: NineClaws Brain <[email protected]>
Co-Authored-By: joelwp <[email protected]>
Co-Authored-By: Oscar <[email protected]>

* test(C6): regression test for #745 collectChildPutPageSlugs

Codex-mandated test gate (C6 from /codex review of v0.30.3 plan).

Pins behavior of collectChildPutPageSlugs() under both jsonb shapes:
- jsonb_typeof='object' (post-#745, normal write path)
- jsonb_typeof='string' (pre-#745 double-encoded, the bug shape)

Without this guard, a future regression of #745 would silently drop slugs:
child jobs finish, queue looks healthy, orchestrator writes nothing.
Worst on-call shape — silent failure with no alerting surface.

Adds an `__testing` namespace to src/core/cycle/synthesize.ts re-exporting
collectChildPutPageSlugs at unit-test granularity. Not part of the runtime
contract; matches the v0_29_1.ts `__testing` precedent for engine-internal
helpers.
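The two jsonb shapes the test pins can be illustrated with a sketch like the following. The function name and payload shape here are hypothetical, not the project's actual collectChildPutPageSlugs signature:

```typescript
// Read a JSONB column that may arrive either as a parsed object
// (post-#745, normal write path) or as a double-encoded JSON string
// (the pre-#745 bug shape, where jsonb_typeof = 'string').
function collectSlugsFromJsonb(raw: unknown): string[] {
  // Pre-#745 rows stored the payload as a JSON string inside JSONB.
  const value = typeof raw === "string" ? JSON.parse(raw) : raw;
  if (value && typeof value === "object" && Array.isArray((value as { slugs?: unknown }).slugs)) {
    return (value as { slugs: string[] }).slugs;
  }
  return [];
}
```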

* test(C8): #708 .md transcript discovery + self-consumption guard

Codex-mandated test gate (C8 from /codex review of v0.30.3 plan).

Pins three invariants for #708's broadening of transcript discovery:

  1. .md files ARE discovered alongside .txt (the feature works).
  2. Other extensions (.pdf, .doc, .json) are still SKIPPED.
  3. v0.30.2's dream_generated frontmatter marker MUST guard .md files
     against self-consumption — without this, every dream cycle would
     loop on its own output indefinitely.

Adversarial cases: BOM + CRLF tolerance on .md frontmatter; the
--unsafe-bypass-dream-guard escape hatch for .md output; mixed .txt + .md
corpus dedup behavior pinned.

* test(C4): takes-fence redaction regression on get_page + get_versions

Codex-mandated test gate (C4 from /codex review of v0.30.3 plan).

Pins three privacy invariants for #728's fence-stripping in operations.ts:

  1. Local CLI caller (no allow-list) sees full takes fence — operator
     reads should preserve everything.
  2. MCP-bound caller (allow-list set) sees compiled_truth with fence
     STRIPPED on get_page AND get_versions.
  3. Allow-list PRESENCE (not contents) flags MCP-bound identity. Even
     a permissive ['world','garry','brain'] still strips, because the
     typed read surface for takes is takes_list / takes_search, not
     get_page or get_versions.

Lane 4 (#757 + #728) was the high-risk merge surface for this privacy
invariant. The test runs through dispatchToolCall to exercise the full
threading path (auth → context → handler → engine read → stripTakesFence)
so a future bad merge fails loudly at the conflict seam in operations.ts.

* test(C3): rewound-brain E2E for v39-v41 forward-reference bootstrap

Codex-mandated test gate (C3 from /codex review of v0.30.3 plan).

Pins the upgrade-path claim in the v0.30.3 release notes: brains stuck
at config.version < 39 (Postgres) or < 41 (PGLite) walk forward cleanly
through #741's bootstrap additions. Without this, the release note's
"old PGLite brains upgrade cleanly through v39-v41" was unproven.

Four cases:
  1. pre-v39 (missing modality + embedding_image)
  2. pre-v40 (missing emotional_weight + effective_date + effective_date_source)
  3. pre-v41 (missing import_filename + salience_touched_at)
  4. compounded pre-v34 wedge (v0.20 + v0.26.3 + v39-v41 all dropped at once)

Pattern follows test/e2e/v0_28_5-fix-wave.test.ts: build a fresh LATEST
brain, surgically rewind via DROP COLUMN CASCADE + UPDATE config.version,
then re-call initSchema and assert advancement to LATEST_VERSION with
the rewound columns restored. PGLite-only — Postgres-side bootstrap is
covered separately by test/e2e/postgres-bootstrap.test.ts.

* fix(test): rename migration-v0-29-1 to .serial.test.ts (CI lint)

CI's check-test-isolation lint flags the test for direct process.env.GBRAIN_HOME
mutation in beforeEach (rule R1: parallel-test-unsafe). The test is genuinely
env-coupled — it sets GBRAIN_HOME so loadConfig() inside the migration phases
finds the test fixture. Per CLAUDE.md ("When to quarantine instead of fix")
and the lint's own fix hint, env-coupled tests get renamed to *.serial.test.ts
to run in the serial bucket.

Verified: bash scripts/check-test-isolation.sh now reports OK; the renamed
test still runs green (1 pass / 0 fail, ~1.5s).

* fix(types): voyageCompatFetch — cast through unknown for Bun typeof fetch

CI's tsc --noEmit failed:
  src/core/ai/gateway.ts(249,7): error TS2741: Property 'preconnect' is
  missing in type '(input: RequestInfo | URL, init: RequestInit | ...) =>
  Promise<Response>' but required in type 'typeof fetch'.

Bun's @types/bun extends the standard fetch type with a preconnect method
that arrow functions can't satisfy. The AI SDK only invokes the call
signature; the Bun extension surface is irrelevant to voyageCompatFetch's
behavior.

Cast through `unknown` (TS2352-safe pattern for cross-type-family casts)
with explicit param types on the arrow function. Comment names the exact
TS2741 the cast suppresses so a future maintainer can audit the choice.

Companion to #735 (Voyage encoding-format adapter) — the original PR
introduced voyageCompatFetch typed against typeof fetch; the wave-side
typecheck error was caught by CI on the assembled branch.
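The cast pattern looks like the following sketch. Assuming a runtime with a global fetch; the comment body is illustrative, not the actual voyageCompatFetch:

```typescript
// Cast through `unknown` (TS2352-safe) because Bun's `typeof fetch`
// carries extra surface (a `preconnect` method) that an arrow function
// can't satisfy. Safe here: callers only ever use the call signature.
const compatFetch = (async (input: any, init?: any) => {
  // ... request/response translation would go here ...
  return fetch(input, init);
}) as unknown as typeof fetch;
```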

* fix(test/e2e): rename + update dream-cycle phase-order test

The test file said "v0.23 8-phase cycle" but ALL_PHASES has been 9
since v0.26.5 (added `purge`) and 10 since v0.29 (added
`recompute_emotional_weight` between patterns and embed). The
hardcoded 8-element array assertion was stale documentation.

Renamed the file from dream-cycle-eight-phase-pglite.test.ts to
dream-cycle-phase-order-pglite.test.ts to make the maintenance
contract explicit: this test pins the canonical phase sequence,
whatever its current length, against unintended reorderings or
removals.

Extracted EXPECTED_PHASES as a typed const so the assertion lives in
one place and TypeScript's CyclePhase narrowing catches typos in the
phase names.

* fix(test/e2e): cycle.test.ts expects 10 phases (v0.29 added recompute_emotional_weight)

Same root cause as dream-cycle-phase-order-pglite.test.ts: hardcoded
phase count assertion drifted behind ALL_PHASES growth.

Phase history:
  v0.23   =  8 phases
  v0.26.5 =  9 (added `purge` last)
  v0.29   = 10 (added `recompute_emotional_weight` between patterns and embed)

* fix(test/e2e): scope GBRAIN_HOME to tmpdir for Doctor Command tests

`gbrain doctor`'s minions_migration check reads
`~/.gbrain/migrations/completed.jsonl` to detect half-installed
migrations. Pre-fix the test inherited the developer's local
$HOME, so stale partial entries from in-flight workspaces (e.g.
v0.31.0 in santiago) made the check fail and the test exit 1 —
masking real DB-health failures.

Added per-describe-block `gbrainHome` tmpdir, threaded through
`cliEnv()` so all spawned gbrain CLI calls in this block read a
hermetic, empty migrations ledger. Cleanup in afterAll.

* fix(claw-test): pass --dir explicitly to extract phase (companion to #688)

Pre-#688 `gbrain extract` defaulted to cwd. Post-#688 it requires
either a configured fs source or explicit --dir, otherwise it errors
out: "No brain directory configured."

The claw-test scripted scenarios run `gbrain init --pglite` in their
install_brain phase, which doesn't register a fs source. So the
extract phase needs --dir <brainDir> explicitly. Skip the extract
phase entirely when the scenario has no brain dir.

Captured brainDir at the import-phase site so it's reusable by extract.

* fix(preferences): route migration ledger paths through gbrainPath()

Pre-fix, preferences.ts used `$HOME/.gbrain` directly via its own
`home()` helper. Tests that set `process.env.HOME = tmpdir`
expecting hermetic isolation worked — but tests that set
`GBRAIN_HOME = tmpdir` (the documented override per
`src/core/config.ts`) didn't, because preferences ignored it.

Routed prefsDir(), prefsPath(), migrationsDir(), and
completedJsonlPath() through gbrainPath() (which honors
GBRAIN_HOME, falling back to homedir() when unset). The legacy
home() helper stays for any future code path that wants $HOME
specifically.

Updated three tests that mutated process.env.HOME to also mutate
GBRAIN_HOME so the same test body works against the new contract:
test/preferences.test.ts, test/migration-resume.test.ts,
test/e2e/migration-flow.test.ts.

* release: rename version slot to 0.31.1.1-fixwave

Originally bumped to 0.31.2 during the master merge to stay strictly
monotonic. Garry called the slot back to `0.31.1.1-fixwave` to
communicate intent: this is a fix wave on top of v0.31.1, not a new
minor or patch slot. The next regular release slot (v0.31.2) stays
free for in-flight feature work.

Format check:
- bun install accepts the literal version (verified)
- compareVersions() in src/commands/migrations/index.ts splits on
  '.', parses each segment with parseInt, and keeps only the first
  three. So '0.31.1.1-fixwave' compares as [0,31,1], equal to
  '0.31.1' for migration-ordering purposes. The wave has no new
  schema migrations, so equality is fine.
- Compares stable to 0.31.1 in the migration runner; later versions
  (0.31.2, 0.32.x, etc.) sort strictly above as normal.
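The comparison rule above can be sketched as follows (an illustrative reconstruction; the real implementation lives in src/commands/migrations/index.ts):

```typescript
// Split on '.', parseInt each segment, keep only the first three.
function versionKey(v: string): [number, number, number] {
  const parts = v.split(".").map((s) => parseInt(s, 10));
  return [parts[0] ?? 0, parts[1] ?? 0, parts[2] ?? 0];
}

function compareVersions(a: string, b: string): number {
  const ka = versionKey(a);
  const kb = versionKey(b);
  for (let i = 0; i < 3; i++) {
    if (ka[i] !== kb[i]) return ka[i] - kb[i];
  }
  return 0;
}
```

Under this rule '0.31.1.1-fixwave' keys as [0,31,1], so it sorts equal to '0.31.1' and strictly below '0.31.2'.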

Updated:
- VERSION
- package.json (with bun.lock refresh)
- CHANGELOG.md entry header + 'To take advantage of' block + 'For
  contributors' reference
- llms.txt + llms-full.txt regenerated to match

---------

Co-authored-by: lanceretter <[email protected]>
Co-authored-by: Oscar <[email protected]>
Co-authored-by: WD <[email protected]>
Co-authored-by: gus <[email protected]>
Co-authored-by: Trevin Chow <[email protected]>
Co-authored-by: Brandon Lipman <[email protected]>
Co-authored-by: Federico Cachero <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Co-authored-by: Josh Stein <[email protected]>
Co-authored-by: Matt Gunnin <[email protected]>
Co-authored-by: Michael Dela Cruz <[email protected]>
Co-authored-by: Jeremy Knows <[email protected]>
Co-authored-by: joelwp <[email protected]>
Co-authored-by: NineClaws Brain <[email protected]>
Co-authored-by: alexandreroumieu-codeapprentice <[email protected]>
Co-authored-by: jeremyknows <[email protected]>
Co-authored-by: joshsteinvc <[email protected]>
Co-authored-by: mgunnin <[email protected]>
garrytan added a commit that referenced this pull request May 10, 2026
Annotate the v0.31.3 changes in the canonical Key Files section:
new src/core/sql-query.ts adapter (#681), src/commands/serve.ts stdio
cleanup (#676), v0.31.3 amendments to auth.ts / serve-http.ts /
oauth-provider.ts surfaces, and migration v46 normalizer in migrate.ts.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
garrytan added a commit that referenced this pull request May 10, 2026
…closes #413, #446) (#801)

* fix(serve): clean up stdio MCP server on client disconnect

The PGLite write lock leaked indefinitely when the parent of `gbrain serve`
disconnected. Three root causes: serve.ts never called engine.disconnect()
after startMcpServer() resolved; cli.ts short-circuited with a "serve doesn't
disconnect" comment; and the MCP SDK's StdioServerTransport only listens for
'data'/'error' on stdin, never 'end'/'close', so even a clean stdin EOF never
reached the SDK.

Net effect: the next `gbrain serve` waited for the in-process 5-minute stale-
lock check or hung indefinitely.

stdio path now installs a unified lifecycle:
- SIGTERM/SIGINT/SIGHUP all funnel into one idempotent shutdown path
  (SIGHUP coverage matters for Claude Desktop on macOS / MCP gateway
  restarts; SIGINT for Ctrl-C; SIGTERM for daemon shutdown).
- stdin 'end' (clean EOF) and 'close' (parent SIGKILL with pipe still
  open) both trigger the same graceful path. TTY stdin skips the watchers
  so interactive `gbrain serve` is unaffected.
- Parent-process watchdog polls the live kernel parent PID via spawnSync
  ('ps','-o','ppid=','-p',PID) every 5s. process.ppid is cached at process
  creation by Bun (and Node) and never refreshes on re-parent — empirical
  evidence on macOS shows ps reports the new parent within one tick while
  process.ppid stays at the original PID indefinitely (oven-sh/bun#30305).
- Watchdog fires on `getParentPid() !== initialParentPid` (any reparent),
  not just `=== 1`. Catches launchd / systemd / tmux / parent-shell-with-
  PR_SET_CHILD_SUBREAPER cases where the kernel re-anchors us to a non-1
  subreaper PID. Codex review caught the original `=== 1` was incomplete.
- One-shot startup probe verifies `spawnSync('ps')` actually works on this
  host. If the probe fails (stripped containers / busybox without procps),
  we skip installing the watchdog interval entirely AND emit a loud stderr
  line — the operator sees "watchdog disabled" instead of an installed-
  but-never-fires phantom that silently falls back to cached process.ppid.
- 5-second cleanup deadline: if engine.disconnect() wedges (PGLite WASM
  stall, etc.), the process still calls process.exit(0). The abandoned
  lock dir is reclaimed on the next start by the existing stale-lock
  check in pglite-lock.ts.
- Optional `--stdio-idle-timeout <sec>`: default OFF safety net for
  parents that leak the pipe but never close it. Strict parsing rejects
  `abc` / `30junk` / `-1` / `1.5` / blank values explicitly so a typo
  doesn't silently disable the safety net (closes #446).
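The strict parsing described for `--stdio-idle-timeout` can be sketched like this (a hypothetical helper name; a sketch of the documented accept/reject behavior, not the shipped code):

```typescript
// Accept only positive whole-second integers; reject everything else
// loudly instead of letting a typo silently disable the safety net.
function parseIdleTimeoutSeconds(raw: string): number {
  const trimmed = raw.trim();
  // Digits only: rejects 'abc', '30junk', '-1', '1.5', and blank values.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`invalid --stdio-idle-timeout value: ${JSON.stringify(raw)}`);
  }
  const n = Number(trimmed);
  if (n <= 0) {
    throw new Error(`--stdio-idle-timeout must be a positive number of seconds: ${raw}`);
  }
  return n;
}
```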

Test seam: ServeOptions { stdin, signals, exit, log, startMcpServer,
getParentPid, setInterval, clearInterval, probeWatchdog } lets the
lifecycle be unit-tested deterministically without spawning a real Bun
child or booting the MCP SDK.

22 test cases covering signals, stdin EOF, TTY skip, watchdog reparent
(both PID-1 and subreaper-PID-N cases), ps-unavailable degraded mode,
idle timeout, idempotent shutdown, and cleanup-deadline behavior.

Closes #413, #446. Supersedes #591.
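The watchdog's fire-on-any-reparent decision can be sketched deterministically with an injected getParentPid (mirroring the test-seam idea above; names here are illustrative, not the shipped ServeOptions API):

```typescript
// Fire on ANY reparent (current !== initial), not only reparent-to-PID-1,
// so launchd/systemd/tmux/subreaper cases are caught too.
function makeReparentCheck(getParentPid: () => number | null) {
  const initialParentPid = getParentPid();
  return function shouldShutdown(): boolean {
    const current = getParentPid();
    // If ps fails transiently, don't kill the server on a flake.
    if (current === null || initialParentPid === null) return false;
    return current !== initialParentPid;
  };
}
```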

Co-Authored-By: Aragorn2046 <[email protected]>
Co-Authored-By: seungsu-kr <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* fix(auth): route HTTP auth/admin SQL through active engine

`gbrain auth` and `gbrain serve --http` previously routed every SQL
through the postgres.js singleton in src/core/db.ts, which silently fell
back to a file-backed PGLite when DATABASE_URL was set but the config
file disagreed. The HTTP transport's verbatim use of the singleton also
made `gbrain serve --http` Postgres-only, even though the
`access_tokens` and `mcp_request_log` tables exist in both engine
schemas.

Auth, OAuth, admin, file uploads, and HTTP-transport SQL now run through
`engine.executeRaw` via a deliberately narrow tagged-template adapter
(`src/core/sql-query.ts`). The contract is scalar-binds-only — adding
JSONB or fragment composition would invite the adapter to drift into a
partial postgres.js clone. JSONB writes use a separate
`executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that
composes positional `$N::jsonb` casts and passes objects through
`engine.executeRaw`. The CI guard at `scripts/check-jsonb-pattern.sh`
doesn't fire because the helper is a method call, not the banned
`${JSON.stringify(x)}::jsonb` template-literal interpolation, and the
v0.12.0 double-encode bug class doesn't apply to positional binding via
`postgres.js`'s `unsafe()` (verified by
`test/e2e/auth-permissions.test.ts:67` on Postgres and the new
`test/sql-query.test.ts` on PGLite).

Migrated call sites:
  - src/commands/auth.ts: takes-holders writes (lines 52, 86) →
    executeRawJsonb. List, revoke, register-client, revoke-client →
    SqlQuery via withConfiguredSql() helper that opens an engine, runs
    the callback, disconnects.
  - src/commands/serve-http.ts: ~25 call sites including the four
    mcp_request_log.params INSERTs (now write real JSONB objects, not
    JSON-encoded strings — the read side `params->>'op'` returns the
    operation name, closing CLAUDE.md's outstanding "JSON-string-into-
    JSONB" note as a side effect). The /admin/api/requests dynamic
    filter pattern (postgres.js fragment composition) is rewritten as
    parametrized SQL string + params array.
  - src/mcp/http-transport.ts: legacy bearer-auth path. The
    Postgres-only fail-fast at startup is removed because both schemas
    now carry access_tokens + mcp_request_log.
  - src/core/oauth-provider.ts: SqlQuery / SqlValue types relocated
    from here to sql-query.ts as the canonical home (Codex finding #8).
  - src/commands/files.ts: all 5 db.getConnection() sites (lines 104,
    139, 252, 326, 355). The line-256 INSERT into files.metadata uses
    executeRawJsonb; the other four are scalar-only SqlQuery (Codex
    finding #6 — scope was bigger than the plan's "lone INSERT" framing).
  - src/core/config.ts: env-var DATABASE_URL inference. When dbUrl is
    set, infer Postgres engine and clear the stale database_path.

Engine-internal sql.json() sites in src/core/postgres-engine.ts (5
sites: lines 520, 1689, 1728, 1790, 2313) STAY UNCHANGED. They live
inside PostgresEngine itself, where the postgres.js template-tag
sql.json() pattern is correct — those methods are only loaded when
Postgres is the active engine, so there's no PGLite-routing concern.

Migration v45 (mcp_request_log_params_jsonb_normalize): one-shot UPDATE
that lifts pre-v0.31 string-shaped JSONB rows to objects so the
/admin/api/requests endpoint at serve-http.ts:605 returns one
consistent shape to the admin SPA. Idempotent (subsequent runs find no
rows where jsonb_typeof = 'string'). Closes the mixed-shape window
that would otherwise have made post-deploy admin reads break.
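The one-shot normalizer's shape, sketched below, uses the standard idiom for lifting a double-encoded value: extract the string scalar with `#>> '{}'` and cast the result back to jsonb so it parses as an object. Hypothetical constant name; the real migration lives in migrate.ts:

```typescript
// Idempotent: subsequent runs find no rows where jsonb_typeof = 'string'.
const V45_NORMALIZE_SQL = `
  UPDATE mcp_request_log
     SET params = (params #>> '{}')::jsonb
   WHERE jsonb_typeof(params) = 'string'
`;
```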

Tests:
  - test/sql-query.test.ts: 7 cases covering scalar binds, the
    .json() rejection (defense in depth — SqlQuery is scalar-only),
    JSONB round-trip with `jsonb_typeof = 'object'` and `->>`
    semantics, the v0.12.0 double-encode regression guard, null
    JSONB handling, and the scalars-then-jsonb call shape.
  - test/config-env.test.ts: migrated from PR's manual `restoreEnv()`
    in afterEach to the canonical `withEnv()` helper at
    test/helpers/with-env.ts (CLAUDE.md R1 / codex finding D3).
    Five cases covering DATABASE_URL precedence, GBRAIN_DATABASE_URL
    operator override, file-only config, env-only config, and the
    no-config null path.
  - test/e2e/auth-takes-holders-pglite.test.ts: 6 cases against
    in-memory PGLite (no DATABASE_URL gate). Covers create / update /
    read of access_tokens.permissions, mcp_request_log.params object
    + null writes, and the migration v45 normalizer (seed
    string-shaped row, run UPDATE, assert object shape; second-run
    no-op for idempotency).
  - test/http-transport.test.ts: mock updated to intercept
    engine.executeRaw (the new code path) instead of the postgres.js
    template tag. 24 cases pass.

Plan reference: ~/.claude/plans/system-instruction-you-are-working-peppy-moore.md.
Codex outside-voice review applied: D-codex-1, D-codex-2, D-codex-5,
D-codex-8, D-codex-9, D-codex-10 (and D1, D5 reversed by codex).

Closes the architectural intent of #681. Supersedes its branch.

Co-Authored-By: codex-bot <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
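The positional-cast composition the executeRawJsonb helper relies on can be sketched as follows. An illustrative reconstruction of the contract described above (scalar binds as $1..$N, jsonb binds following with an explicit ::jsonb cast), not the shipped helper:

```typescript
// Build a parametrized INSERT where scalar columns bind plainly and
// jsonb columns bind through $N::jsonb — never template interpolation.
function buildJsonbInsert(
  table: string,
  scalarCols: string[],
  jsonbCols: string[],
): string {
  const cols = [...scalarCols, ...jsonbCols];
  const placeholders = cols.map((_, i) =>
    i < scalarCols.length ? `$${i + 1}` : `$${i + 1}::jsonb`,
  );
  return `INSERT INTO ${table} (${cols.join(", ")}) VALUES (${placeholders.join(", ")})`;
}
```

Because the jsonb value travels as a positional parameter serialized once, the v0.12.0 double-encode bug class (string interpolation of JSON.stringify output) cannot recur through this path.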

* docs: update CLAUDE.md key files for v0.31.3

Annotate the v0.31.3 changes in the canonical Key Files section:
new src/core/sql-query.ts adapter (#681), src/commands/serve.ts stdio
cleanup (#676), v0.31.3 amendments to auth.ts / serve-http.ts /
oauth-provider.ts surfaces, and migration v46 normalizer in migrate.ts.

Co-Authored-By: Claude Opus 4.7 <[email protected]>

* chore: regenerate llms-full.txt for v0.31.3 docs sync

CI's build-llms test asserts the committed llms.txt + llms-full.txt
match what scripts/build-llms.ts produces from current source state.
CLAUDE.md was amended by /document-release post-merge (new entries for
src/core/sql-query.ts and src/commands/serve.ts; amended notes on
auth.ts / serve-http.ts / migrate.ts), so the inlined-bundle fell out
of sync. Regenerated via `bun run build:llms`.

llms.txt unchanged (curated index — no new web URLs added).
llms-full.txt updated to inline the new CLAUDE.md content.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

---------

Co-authored-by: Aragorn2046 <[email protected]>
Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>