From d8814ac07288563def5d91c1322e8b77e42b8dab Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sat, 9 May 2026 21:34:51 -0700 Subject: [PATCH 1/4] fix(serve): clean up stdio MCP server on client disconnect MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The PGLite write lock leaked indefinitely when the parent of `gbrain serve` disconnected. Three root causes: serve.ts never called engine.disconnect() after startMcpServer() resolved; cli.ts short-circuited with a "serve doesn't disconnect" comment; and the MCP SDK's StdioServerTransport only listens for 'data'/'error' on stdin, never 'end'/'close', so even a clean stdin EOF never reached the SDK. Net effect: the next `gbrain serve` waited for the in-process 5-minute stale- lock check or hung indefinitely. stdio path now installs a unified lifecycle: - SIGTERM/SIGINT/SIGHUP all funnel into one idempotent shutdown path (SIGHUP coverage matters for Claude Desktop on macOS / MCP gateway restarts; SIGINT for Ctrl-C; SIGTERM for daemon shutdown). - stdin 'end' (clean EOF) and 'close' (parent SIGKILL with pipe still open) both trigger the same graceful path. TTY stdin skips the watchers so interactive `gbrain serve` is unaffected. - Parent-process watchdog polls the live kernel parent PID via spawnSync ('ps','-o','ppid=','-p',PID) every 5s. process.ppid is cached at process creation by Bun (and Node) and never refreshes on re-parent — empirical evidence on macOS shows ps reports the new parent within one tick while process.ppid stays at the original PID indefinitely (oven-sh/bun#30305). - Watchdog fires on `getParentPid() !== initialParentPid` (any reparent), not just `=== 1`. Catches launchd / systemd / tmux / parent-shell-with- PR_SET_CHILD_SUBREAPER cases where the kernel re-anchors us to a non-1 subreaper PID. Codex review caught the original `=== 1` was incomplete. - One-shot startup probe verifies `spawnSync('ps')` actually works on this host. If the probe fails (stripped containers / busybox without procps), we skip installing the watchdog interval entirely AND emit a loud stderr line — the operator sees "watchdog disabled" instead of an installed- but-never-fires phantom that silently falls back to cached process.ppid. - 5-second cleanup deadline: if engine.disconnect() wedges (PGLite WASM stall, etc.), the process still calls process.exit(0). The abandoned lock dir is reclaimed on the next start by the existing stale-lock check in pglite-lock.ts. - Optional `--stdio-idle-timeout `: default OFF safety net for parents that leak the pipe but never close it. Strict parsing rejects `abc` / `30junk` / `-1` / `1.5` / blank values explicitly so a typo doesn't silently disable the safety net (closes #446). Test seam: ServeOptions { stdin, signals, exit, log, startMcpServer, getParentPid, setInterval, clearInterval, probeWatchdog } lets the lifecycle be unit-tested deterministically without spawning a real Bun child or booting the MCP SDK. 22 test cases covering signals, stdin EOF, TTY skip, watchdog reparent (both PID-1 and subreaper-PID-N cases), ps-unavailable degraded mode, idle timeout, idempotent shutdown, and cleanup-deadline behavior. Closes #413, #446. Supersedes #591. Co-Authored-By: Aragorn2046 Co-Authored-By: seungsu-kr Co-Authored-By: Claude Opus 4.7 (1M context) --- src/commands/serve.ts | 340 +++++++++++++++++++++- test/serve-stdio-lifecycle.test.ts | 442 +++++++++++++++++++++++++++++ 2 files changed, 778 insertions(+), 4 deletions(-) create mode 100644 test/serve-stdio-lifecycle.test.ts diff --git a/src/commands/serve.ts b/src/commands/serve.ts index 257bc372d..1b1651a82 100644 --- a/src/commands/serve.ts +++ b/src/commands/serve.ts @@ -1,7 +1,68 @@ +import { spawnSync } from 'node:child_process'; import type { BrainEngine } from '../core/engine.ts'; import { startMcpServer } from '../mcp/server.ts'; -export async function runServe(engine: BrainEngine, args: string[] = []) { +// Maximum time the stdio path will wait for engine.disconnect() (PGLite +// close + advisory lock release) before forcing exit. Keeps a wedged +// disconnect from trapping the process forever; the abandoned lock dir is +// already covered by the in-process stale-lock check (acquireLock walks +// the dir, sees a dead PID, and removes it). +const CLEANUP_DEADLINE_MS = 5_000; + +// How often the parent-process watchdog polls the live kernel parent PID +// (via `readLiveParentPid`, NOT the cached `process.ppid` — see that +// helper's comment). We don't receive a signal when our parent dies (the +// kernel just re-parents us to init / launchd / a subreaper-PID), so +// polling is the only reliable way to detect "parent went away without +// closing stdin". 5s matches the cadence in the concurrent #591 PR; +// faster polling has no benefit, slower would extend the lock-leak window. +const PARENT_WATCHDOG_INTERVAL_MS = 5_000; + +export interface ServeOptions { + // Test seam — defaults to the live process. The lifecycle plumbing reads + // these for stdin EOF detection, signal handlers, and exit, so unit + // tests can drive end-to-end shutdown via mocked streams without + // spawning a real Bun process. `exit` is typed as `void` (not `never`) + // so test stubs that record + return are accepted without casts; + // `process.exit`'s `never` return is assignable to `void`. + stdin?: NodeJS.ReadableStream & { isTTY?: boolean }; + signals?: Pick; + exit?: (code?: number) => void; + log?: (msg: string) => void; + // Test seam: replace startMcpServer to avoid booting the real MCP SDK + // (which unconditionally attaches a 'data' listener to real + // process.stdin and would pollute the test runner's stdin handle). + // Defaults to the real implementation when omitted. + startMcpServer?: (engine: BrainEngine) => Promise; + // Test seam for the parent-process watchdog. The default + // (`readLiveParentPid`) reads the live kernel PPID via `ps` because + // `process.ppid` is captured at process creation and does not refresh + // on re-parent (Node/Bun parity). Tests inject a stub so they can + // simulate the parent dying without spawning ps or re-parenting any + // real process. + getParentPid?: () => number; + // Test seam: replace setInterval/clearInterval so the watchdog can + // fire deterministically in tests instead of waiting 5s. Defaults to + // the global timer functions. + setInterval?: (fn: () => void, ms: number) => unknown; + clearInterval?: (handle: unknown) => void; + // Test seam for the one-shot watchdog readiness probe. The default + // runs `spawnSync('ps', ['-o','ppid=','-p',PID])` and returns true on + // success. Tests inject a stub to simulate ps unavailability (e.g. + // stripped containers, busybox without procps) without modifying PATH. + // When the probe returns false, `installStdioLifecycle` skips the + // watchdog interval entirely and emits a loud stderr line. Without + // the probe, the original PR's behavior was a silent no-op: every + // tick fell through to the cached `process.ppid` and the watchdog + // never fired, while still claiming to be installed. + probeWatchdog?: () => boolean; +} + +export async function runServe( + engine: BrainEngine, + args: string[] = [], + opts: ServeOptions = {}, +) { // v0.26+: --http dispatches to the full OAuth 2.1 server (serve-http.ts) // with admin dashboard, scope enforcement, SSE feed, and the requireBearerAuth // middleware. Master's simpler startHttpTransport from v0.22.7 is superseded @@ -31,8 +92,279 @@ export async function runServe(engine: BrainEngine, args: string[] = []) { const { runServeHttp } = await import('./serve-http.ts'); await runServeHttp(engine, { port, tokenTtl, enableDcr, publicUrl, logFullParams }); - } else { - console.error('Starting GBrain MCP server (stdio)...'); - await startMcpServer(engine); + return; + } + + // stdio path — install lifecycle handlers BEFORE startMcpServer so that + // an early stdin EOF (parent died before our first read) can still + // trigger graceful release of the PGLite write lock held by `engine`. + // The HTTP / OAuth path above has its own lifecycle in serve-http.ts + // and is intentionally NOT wired into this stdio plumbing. + console.error('Starting GBrain MCP server (stdio)...'); + + installStdioLifecycle(engine, args, opts); + + const start = opts.startMcpServer ?? startMcpServer; + await start(engine); + // startMcpServer's `await server.connect(transport)` resolves once the + // SDK has wired up its stdin 'data' listener; that listener keeps the + // event loop alive. We deliberately do NOT add `await new Promise(() => + // {})` here — it would block this async frame and stop the lifecycle + // hooks from being able to call process.exit() cleanly. +} + +interface StdioLifecycleDeps { + stdin: NodeJS.ReadableStream & { isTTY?: boolean }; + signals: Pick; + exit: (code?: number) => void; + log: (msg: string) => void; + getParentPid: () => number; + setInterval: (fn: () => void, ms: number) => unknown; + clearInterval: (handle: unknown) => void; + probeWatchdog: () => boolean; +} + +function installStdioLifecycle( + engine: BrainEngine, + args: string[], + opts: ServeOptions, +): void { + const deps: StdioLifecycleDeps = { + stdin: opts.stdin ?? process.stdin, + signals: opts.signals ?? process, + exit: opts.exit ?? ((code?: number) => { process.exit(code); }), + log: opts.log ?? ((msg: string) => console.error(msg)), + getParentPid: opts.getParentPid ?? readLiveParentPid, + setInterval: opts.setInterval ?? ((fn, ms) => setInterval(fn, ms)), + clearInterval: opts.clearInterval ?? ((h) => clearInterval(h as ReturnType)), + probeWatchdog: opts.probeWatchdog ?? probeWatchdogAvailable, + }; + + let shuttingDown = false; + let parentWatchdog: unknown = null; + const beginShutdown = (reason: string): void => { + if (shuttingDown) return; + shuttingDown = true; + + // Stop the parent-watchdog interval as soon as a shutdown begins so + // it cannot fire a redundant 'parent-died' shutdown while the first + // one is still draining the cleanup chain. + if (parentWatchdog !== null) { + deps.clearInterval(parentWatchdog); + parentWatchdog = null; + } + + deps.log(`GBrain MCP server: graceful exit (${reason})`); + + // Race the cleanup against a deadline. engine.disconnect() does a + // PGLite WASM close + a synchronous rmSync on the lock dir; both + // should be sub-second, but a wedged WASM runtime shouldn't be able + // to trap us forever. If we hit the deadline we still exit; the + // lock dir is advisory and the next process's stale-lock check + // (process.kill(pid, 0) → ESRCH) will reclaim it. + const deadline = setTimeout(() => { + deps.log( + `GBrain MCP server: cleanup deadline (${CLEANUP_DEADLINE_MS}ms) exceeded — forcing exit`, + ); + deps.exit(0); + }, CLEANUP_DEADLINE_MS); + deadline.unref?.(); + + Promise.resolve() + .then(() => engine.disconnect()) + .catch((err: unknown) => { + const msg = err instanceof Error ? err.message : String(err); + deps.log(`GBrain MCP server: cleanup error: ${msg}`); + }) + .finally(() => { + clearTimeout(deadline); + deps.exit(0); + }); + }; + + // Signal-based termination. SIGTERM: daemon ask. SIGINT: user Ctrl-C. + // SIGHUP: terminal disconnect / daemon-style "reload" channels — Aragorn + // observed real-world hosts (Claude Desktop on macOS, hermes-agent + // restart) send these instead of closing stdin. All three get the same + // graceful path; the idempotency guard absorbs duplicate signals. + deps.signals.on('SIGTERM', () => beginShutdown('SIGTERM')); + deps.signals.on('SIGINT', () => beginShutdown('SIGINT')); + deps.signals.on('SIGHUP', () => beginShutdown('SIGHUP')); + + // Stdin EOF — the parent closes the pipe but the MCP SDK's + // StdioServerTransport only listens for 'data'/'error', not 'end' or + // 'close', so without these hooks the process keeps the engine (and its + // PGLite write lock) live indefinitely after the parent disconnects. + // 'end' fires on a clean EOF; 'close' fires when the underlying handle + // is destroyed (e.g. parent SIGKILL'd while pipe still open). Both + // converge on the same idempotent shutdown. + // Skip when stdin is a TTY: interactive `gbrain serve` use shouldn't + // terminate just because the user hasn't typed anything. Signal / + // watchdog paths still cover that case if needed. + if (!deps.stdin.isTTY) { + deps.stdin.once('end', () => beginShutdown('stdin-end')); + deps.stdin.once('close', () => beginShutdown('stdin-close')); + } + + // Parent-process watchdog. Some hosts (launchd, cron, certain MCP + // gateways) terminate without closing stdin and without sending a + // signal — the kernel just re-parents us to whichever ancestor is + // still alive (PID 1, or any closer subreaper such as launchd, systemd, + // tmux, or a parent shell with PR_SET_CHILD_SUBREAPER). Polling is the + // only portable way to notice; see `readLiveParentPid` for why we + // cannot rely on `process.ppid` (cached at process creation and never + // refreshed on re-parent in Node or Bun). + // + // We capture the initial parent PID once at install time and fire on + // ANY change, not just reparent-to-PID-1. The PR-#676 author's original + // `=== 1` check missed reparent-to-subreaper-PID-N, which is the actual + // observed behavior under launchd / systemd subreapers (codex review + // finding #3). A process legitimately started under PID 1 (e.g. a + // systemd service) skips the watchdog: there's no parent-death event + // to detect, and any reparent FROM 1 doesn't happen. `unref()` keeps + // the interval from blocking other exit paths. + // + // A one-shot startup probe (D2-revisited per codex finding #4) verifies + // that the underlying mechanism (`spawnSync('ps')`) actually works on + // this host. Stripped containers / busybox-without-procps environments + // would silently fall back to the cached `process.ppid` on every tick + // — the watchdog claims to be installed but never fires. When the probe + // fails, we skip installing the interval entirely and log loudly so the + // operator sees the degraded mode instead of a phantom watchdog. + const initialParentPid = deps.getParentPid(); + if (initialParentPid !== 1) { + if (!deps.probeWatchdog()) { + deps.log( + '[gbrain serve] watchdog disabled: ps unavailable, parent-death detection unavailable — child will rely on stdin EOF / signals only', + ); + } else { + parentWatchdog = deps.setInterval(() => { + if (deps.getParentPid() !== initialParentPid) { + beginShutdown('parent-died'); + } + }, PARENT_WATCHDOG_INTERVAL_MS); + (parentWatchdog as { unref?: () => void } | null)?.unref?.(); + } + } + + // Optional idle-timeout safety net. Default OFF; opt-in via + // `--stdio-idle-timeout `. The flag is for the rare case where + // the parent leaks the stdin pipe but never closes it (so 'end' never + // fires) and never sends another message — we'd otherwise sit on the + // PGLite lock forever. Off by default because most parents close + // properly and an over-eager idle timeout would surprise long-poll + // workloads. + const idleTimeoutSec = parseStdioIdleTimeout(args); + if (idleTimeoutSec > 0) { + let idleTimer: ReturnType | null = null; + const armIdle = (): void => { + if (idleTimer) clearTimeout(idleTimer); + idleTimer = setTimeout( + () => beginShutdown(`stdio-idle-timeout (${idleTimeoutSec}s)`), + idleTimeoutSec * 1000, + ); + idleTimer.unref?.(); + }; + armIdle(); + // Reset on every chunk. We can't observe SDK-parsed messages from + // here, but every JSON-RPC frame causes a 'data' event on stdin, so + // chunk-level granularity is sufficient. + deps.stdin.on('data', armIdle); + deps.log(`GBrain MCP server: stdio idle timeout = ${idleTimeoutSec}s`); + } +} + +/** + * Resolve the live parent PID from the kernel (not the cached startup + * value). Both Node and Bun expose `process.ppid` as a property captured + * at process creation, so it does NOT update when the kernel re-parents + * us to a new ancestor after the original parent dies — which is the + * exact event the watchdog needs to detect. Empirical evidence on + * macOS / Bun 1.3.12: `process.ppid` stays at the original parent ID + * indefinitely while `ps -o ppid= -p $$` reports the new parent within + * one tick. + * + * Cost: ~10ms per spawn. Called every 5s (PARENT_WATCHDOG_INTERVAL_MS), + * so amortized < 0.5% CPU. Falls back to `process.ppid` if `ps` fails + * (best-effort safety net for stripped-down containers, etc.); the + * startup probe at watchdog-install time loud-logs and skips the + * interval entirely when ps is unavailable, so a per-tick fallback is + * a redundant safety net rather than a primary mechanism. + */ +function readLiveParentPid(): number { + try { + const r = spawnSync('ps', ['-o', 'ppid=', '-p', String(process.pid)], { + encoding: 'utf8', + timeout: 1000, + }); + if (r.status === 0 && typeof r.stdout === 'string') { + const n = parseInt(r.stdout.trim(), 10); + if (Number.isInteger(n) && n >= 0) return n; + } + } catch { + /* fall through */ + } + return process.ppid; +} + +/** + * One-shot probe at watchdog-install time to confirm ps actually works + * on this host. Returns true iff `spawnSync('ps','-o','ppid=','-p',PID)` + * exits 0 with a parseable integer. When it returns false, the caller + * skips installing the watchdog and emits a loud stderr line — the + * operator sees "watchdog disabled" instead of an installed-but-never- + * fires phantom. + * + * Why a separate probe rather than relying on the per-tick fallback in + * `readLiveParentPid`: the per-tick fallback returns the cached + * `process.ppid` silently, so the watchdog runs every 5s, compares + * cached PPID to itself, never detects a change, and never fires — + * while still claiming to be active. The probe surfaces the gap once + * at install time and lets the caller short-circuit cleanly. + */ +function probeWatchdogAvailable(): boolean { + try { + const r = spawnSync('ps', ['-o', 'ppid=', '-p', String(process.pid)], { + encoding: 'utf8', + timeout: 1000, + }); + if (r.status !== 0 || typeof r.stdout !== 'string') return false; + const n = parseInt(r.stdout.trim(), 10); + return Number.isInteger(n) && n >= 0; + } catch { + return false; + } +} + +function parseStdioIdleTimeout(args: string[]): number { + const idx = args.indexOf('--stdio-idle-timeout'); + if (idx < 0) return 0; + const raw = args[idx + 1]; + // Strict parsing — silent fallback to 0 turns an opt-in safety net into + // a no-op when an operator typos the value (e.g. `--stdio-idle-timeout + // 30s`). `Number()` rejects partial parses like `30junk` (returns NaN), + // unlike `parseInt` which would silently accept it. A missing value + // (`--stdio-idle-timeout` at end of args) and any non-integer / negative + // value are surfaced as a CLI error before we install the timer. + if (raw === undefined) { + throw new Error( + '--stdio-idle-timeout requires a non-negative integer (seconds). Got: (missing value)', + ); + } + // Reject empty / whitespace-only explicitly: `Number('')` is 0 in JS, + // which would silently turn `--stdio-idle-timeout ""` into the + // documented opt-out — the exact silent-fallback failure mode this + // strict parser exists to prevent. + if (raw.trim() === '') { + throw new Error( + '--stdio-idle-timeout requires a non-negative integer (seconds). Got: (blank value)', + ); + } + const n = Number(raw); + if (!Number.isInteger(n) || n < 0) { + throw new Error( + `--stdio-idle-timeout requires a non-negative integer (seconds). Got: ${JSON.stringify(raw)}`, + ); } + return n; } diff --git a/test/serve-stdio-lifecycle.test.ts b/test/serve-stdio-lifecycle.test.ts new file mode 100644 index 000000000..e548b0134 --- /dev/null +++ b/test/serve-stdio-lifecycle.test.ts @@ -0,0 +1,442 @@ +import { describe, test, expect } from 'bun:test'; +import { EventEmitter } from 'events'; +import { runServe, type ServeOptions } from '../src/commands/serve'; +import type { BrainEngine } from '../src/core/engine'; + +// These tests cover the stdio lifecycle hooks added to runServe so that the +// PGLite write lock is released when the parent disconnects. We don't spawn +// a real Bun child or boot the real MCP SDK; we inject a stub `engine`, a +// fake stdin Readable (EventEmitter is enough — only on/once/emit are +// touched), an injected exit() that resolves a promise instead of +// terminating the process, and (per Codex Layer 2 review feedback) a +// no-op startMcpServer stub so the real MCP SDK never attaches a 'data' +// listener to the test runner's actual process.stdin. + +class StubEngine implements Partial { + // Track whether disconnect was called; the lock-release behavior we care + // about here is "did the lifecycle path actually invoke disconnect?". + disconnectCalls = 0; + disconnect = async (): Promise => { + this.disconnectCalls += 1; + }; +} + +class StubSignals { + private handlers = new Map void>>(); + on(signal: string, handler: (...a: unknown[]) => void): this { + const list = this.handlers.get(signal) ?? []; + list.push(handler); + this.handlers.set(signal, list); + return this; + } + emit(signal: string): void { + for (const h of this.handlers.get(signal) ?? []) h(); + } +} + +// Stub timer pair: `setInterval` returns a numeric handle; `tickAll()` +// fires every registered fn once, mirroring 1 real-time tick. Lets the +// test drive the parent-watchdog deterministically without 5s of wall +// clock and without leaving real timers active across the suite. +interface TimerStub { + setInterval: (fn: () => void, ms: number) => unknown; + clearInterval: (h: unknown) => void; + tickAll: () => void; + active: () => number; +} + +function makeTimerStub(): TimerStub { + const fns = new Map void>(); + let next = 1; + return { + setInterval(fn) { + const id = next++; + fns.set(id, fn); + return id; + }, + clearInterval(h) { + if (typeof h === 'number') fns.delete(h); + }, + tickAll() { + for (const fn of fns.values()) fn(); + }, + active() { + return fns.size; + }, + }; +} + +interface Harness { + engine: StubEngine; + stdin: EventEmitter & { isTTY?: boolean; on: any; once: any }; + signals: StubSignals; + logs: string[]; + exited: Promise; + opts: ServeOptions; + timers: TimerStub; + setParentPid: (pid: number) => void; +} + +function makeHarness(opts: { + isTTY?: boolean; + initialParentPid?: number; + probeWatchdog?: boolean; +} = {}): Harness { + const engine = new StubEngine(); + const stdin = new EventEmitter() as EventEmitter & { isTTY?: boolean }; + if (opts.isTTY) stdin.isTTY = true; + const signals = new StubSignals(); + const logs: string[] = []; + + let resolveExit!: (code: number) => void; + const exited = new Promise(r => { resolveExit = r; }); + let exitCalled = false; + + // Mutable parent-pid the test can flip; defaults to a non-1 sentinel + // so the watchdog *will* install (`initialParentPid !== 1` guard). + // Tests that want "we were spawned under PID 1" pass `initialParentPid: 1`. + let parentPid = opts.initialParentPid ?? 12345; + const timers = makeTimerStub(); + + // probeWatchdog defaults to true so tests run with watchdog installed. + // Set probeWatchdog: false to simulate stripped-container ps unavailability. + const probeWatchdogResult = opts.probeWatchdog ?? true; + + const serveOpts: ServeOptions = { + stdin: stdin as any, + signals: signals as any, + exit: (code?: number) => { + if (exitCalled) return; + exitCalled = true; + resolveExit(code ?? 0); + }, + log: (msg: string) => { logs.push(msg); }, + // Replace the real MCP SDK boot with a no-op so we never touch the + // test runner's real process.stdin. The lifecycle hooks under test + // are installed *before* this is awaited, so all behaviors are still + // exercised end-to-end. + startMcpServer: async () => {}, + getParentPid: () => parentPid, + setInterval: timers.setInterval, + clearInterval: timers.clearInterval, + probeWatchdog: () => probeWatchdogResult, + }; + + return { + engine, + stdin: stdin as any, + signals, + logs, + exited, + opts: serveOpts, + timers, + setParentPid: (pid: number) => { parentPid = pid; }, + }; +} + +// runServe in tests resolves quickly because the injected startMcpServer +// is a no-op. The lifecycle hooks were installed synchronously before +// that no-op was awaited, so they're already wired by the time runServe +// returns. We start runServe and `await` it (so any setup error surfaces +// immediately), then drive the test-controlled events. +async function startInBackground( + engine: StubEngine, + args: string[], + opts: ServeOptions, +): Promise { + await runServe(engine as unknown as BrainEngine, args, opts); +} + +describe('runServe stdio lifecycle', () => { + test('stdin end triggers engine.disconnect() and process exit(0)', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.stdin.emit('end'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (stdin-end)'))).toBe(true); + }); + + test('SIGTERM triggers graceful exit', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGTERM'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (SIGTERM)'))).toBe(true); + }); + + test('SIGINT triggers graceful exit', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGINT'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (SIGINT)'))).toBe(true); + }); + + test('SIGHUP triggers graceful exit (terminal disconnect / daemon reload)', async () => { + // Per Aragorn (#591): real-world hosts (Claude Desktop on macOS, + // hermes-agent restart) sometimes send SIGHUP instead of closing + // stdin or sending SIGTERM. The handler converges on the same + // graceful path as the other signals. + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGHUP'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (SIGHUP)'))).toBe(true); + }); + + test('stdin close (parent SIGKILL leaves pipe destroyed) triggers graceful exit', async () => { + // 'end' fires on a clean EOF; 'close' fires when the underlying + // handle is destroyed (e.g. parent SIGKILL'd while pipe still open). + // We must observe both — observing only 'end' would miss the + // hard-kill path that #591's reporter hit on macOS. + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.stdin.emit('close'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (stdin-close)'))).toBe(true); + }); + + test('parent watchdog fires shutdown when ppid flips to 1 (orphaned to init)', async () => { + // Some hosts (launchd, cron, certain MCP gateways) terminate + // without closing stdin and without sending a signal — the kernel + // re-parents us. The watchdog polls the live ppid on an interval; + // when it differs from the initial captured ppid, we detect "parent + // died" and shut down. + const h = makeHarness({ initialParentPid: 4242 }); + await startInBackground(h.engine, [], h.opts); + + // Watchdog should have installed (we passed initialParentPid !== 1 + // and probeWatchdog defaulted to true). + expect(h.timers.active()).toBe(1); + + // Simulate parent death: our process gets re-parented to init. + h.setParentPid(1); + h.timers.tickAll(); + + const code = await h.exited; + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (parent-died)'))).toBe(true); + + // beginShutdown clears the watchdog interval as part of cleanup so + // a duplicate tick can't queue a redundant shutdown. + expect(h.timers.active()).toBe(0); + }); + + test('parent watchdog fires shutdown when ppid flips to a SUBREAPER PID > 1 (codex finding #3)', async () => { + // Reparent-to-PID-1 is the easy case. Real hosts under launchd / + // systemd / tmux / a parent-shell-with-PR_SET_CHILD_SUBREAPER will + // re-parent us to that subreaper's PID, NOT to 1. The PR-#676 + // author's original `=== 1` check missed this. The fix is to fire + // on `current !== initialParentPid` so any reparent triggers the + // shutdown, regardless of where the kernel re-anchors us. + const h = makeHarness({ initialParentPid: 8500 }); + await startInBackground(h.engine, [], h.opts); + + expect(h.timers.active()).toBe(1); + + // Parent died; kernel re-parented to a launchd subreaper (PID 47). + h.setParentPid(47); + h.timers.tickAll(); + + const code = await h.exited; + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (parent-died)'))).toBe(true); + expect(h.timers.active()).toBe(0); + }); + + test('parent watchdog NOT installed when initial ppid is already 1 (legitimate init child)', async () => { + // Spawned directly under PID 1 (e.g. systemd unit, Docker entrypoint): + // ppid=1 is the documented steady state, not "parent died". We must + // NOT install the watchdog or we'd shut down immediately. + const h = makeHarness({ initialParentPid: 1 }); + await startInBackground(h.engine, [], h.opts); + + expect(h.timers.active()).toBe(0); + + // Sanity: the other lifecycle paths still work. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('parent watchdog NOT installed when ps probe fails (codex finding #4 / D2-revisited)', async () => { + // Stripped containers / busybox-without-procps environments lack ps. + // The original PR's per-tick fallback would silently return cached + // process.ppid, never detect a change, and never fire the shutdown + // — while still claiming to be active. + // + // The fix: a one-shot startup probe. When it returns false, we skip + // installing the watchdog interval AND emit a loud stderr line so + // the operator sees the degraded mode at startup. + const h = makeHarness({ initialParentPid: 4242, probeWatchdog: false }); + await startInBackground(h.engine, [], h.opts); + + // Watchdog NOT installed — message matches behavior. + expect(h.timers.active()).toBe(0); + expect(h.logs.some(l => l.includes('[gbrain serve] watchdog disabled: ps unavailable'))).toBe(true); + + // Sanity: the other lifecycle paths still work — the shutdown still + // funnels through stdin EOF / signals, just not via the watchdog. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('parent watchdog tick with ppid still alive does NOT fire shutdown', async () => { + // The watchdog must only fire on the *transition* away from the + // initial ppid; a healthy tick (ppid still equal to the original) + // is a no-op. + const h = makeHarness({ initialParentPid: 4242 }); + await startInBackground(h.engine, [], h.opts); + + expect(h.timers.active()).toBe(1); + // Tick with ppid unchanged. + h.timers.tickAll(); + h.timers.tickAll(); + h.timers.tickAll(); + expect(h.engine.disconnectCalls).toBe(0); + + // ... and signal-driven shutdown still works after several quiet ticks. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('shutdown is idempotent — multiple signals only disconnect once', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGTERM'); + h.signals.emit('SIGTERM'); + h.signals.emit('SIGINT'); + h.stdin.emit('end'); + + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('TTY stdin does NOT install end watcher (interactive use unaffected)', async () => { + const h = makeHarness({ isTTY: true }); + await startInBackground(h.engine, [], h.opts); + + // Emit 'end' on TTY stdin — no listener should be wired so this is a + // no-op. The test passes by simply not exiting; we give the runtime a + // beat to confirm nothing fires. Signals must still work. + h.stdin.emit('end'); + await new Promise(r => setTimeout(r, 10)); + expect(h.engine.disconnectCalls).toBe(0); + + // Sanity: signals still wired regardless of TTY-ness. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('--stdio-idle-timeout 0 disarms the idle hook (sanity)', async () => { + const h = makeHarness(); + await startInBackground(h.engine, ['--stdio-idle-timeout', '0'], h.opts); + + // 0 is the documented opt-out. No idle hook should be armed; drive a + // different exit path to confirm flow still works. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.every(l => !l.includes('idle timeout'))).toBe(true); + }); + + test('--stdio-idle-timeout > 0 logs the configured value', async () => { + const h = makeHarness(); + await startInBackground(h.engine, ['--stdio-idle-timeout', '60'], h.opts); + + expect(h.logs.some(l => l.includes('stdio idle timeout = 60s'))).toBe(true); + + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('idle timer is reset on every stdin data chunk', async () => { + const h = makeHarness(); + // Use a very short timeout so we can observe the firing/resetting + // without slowing the suite. 50ms is enough to be measurable but + // short enough that the suite finishes promptly. + await startInBackground( + h.engine, + ['--stdio-idle-timeout', '1'], // 1 second; we reset it before it fires + h.opts, + ); + + // Pulse 'data' a few times to keep the timer reset. + for (let i = 0; i < 3; i++) { + h.stdin.emit('data', Buffer.from('{"jsonrpc":"2.0"}')); + await new Promise(r => setTimeout(r, 100)); + } + expect(h.engine.disconnectCalls).toBe(0); + + // Now stop pulsing and wait for the timer to actually fire end-to-end + // (it ought to elapse within ~1s of the last reset). Awaiting + // h.exited rather than a wall-clock race makes this deterministic. + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('stdio-idle-timeout (1s)'))).toBe(true); + }, 5000); + + test.each([ + ['abc', /--stdio-idle-timeout/], + ['30junk', /--stdio-idle-timeout/], + ['-1', /--stdio-idle-timeout/], + ['1.5', /--stdio-idle-timeout/], + ['', /--stdio-idle-timeout/], + ])('--stdio-idle-timeout rejects invalid value %p (typo is a CLI error)', async (bad, msgRe) => { + // Per Codex Layer 2 review P1: silent fallback on typo turns the + // opt-in safety net into a no-op. Strict parsing throws so the + // operator sees the mistake immediately. + const h = makeHarness(); + expect( + runServe(h.engine as unknown as BrainEngine, ['--stdio-idle-timeout', bad], h.opts), + ).rejects.toThrow(msgRe); + }); + + test('--stdio-idle-timeout with no following value also throws', async () => { + const h = makeHarness(); + // Flag at end of args — no value to consume. + expect( + runServe(h.engine as unknown as BrainEngine, ['--stdio-idle-timeout'], h.opts), + ).rejects.toThrow(/missing value/); + }); + + test('engine.disconnect throwing still results in exit(0) and logged error', async () => { + const h = makeHarness(); + h.engine.disconnect = async () => { + throw new Error('synthetic disconnect failure'); + }; + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGTERM'); + const code = await h.exited; + expect(code).toBe(0); + expect(h.logs.some(l => l.includes('cleanup error: synthetic disconnect failure'))).toBe(true); + }); +}); From 4e6d04b4c4c188f6ca3da0b66124b09683668ab0 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sat, 9 May 2026 21:55:12 -0700 Subject: [PATCH 2/4] fix(auth): route HTTP auth/admin SQL through active engine MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `gbrain auth` and `gbrain serve --http` previously routed every SQL through the postgres.js singleton in src/core/db.ts, which silently fell back to a file-backed PGLite when DATABASE_URL was set but the config file disagreed. The HTTP transport's verbatim use of the singleton also made `gbrain serve --http` Postgres-only, even though the `access_tokens` and `mcp_request_log` tables exist in both engine schemas. Auth, OAuth, admin, file uploads, and HTTP-transport SQL now run through `engine.executeRaw` via a deliberately narrow tagged-template adapter (`src/core/sql-query.ts`). The contract is scalar-binds-only — adding JSONB or fragment composition would invite the adapter to drift into a partial postgres.js clone. JSONB writes use a separate `executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that composes positional `$N::jsonb` casts and passes objects through `engine.executeRaw`. The CI guard at `scripts/check-jsonb-pattern.sh` doesn't fire because the helper is a method call, not the banned `${JSON.stringify(x)}::jsonb` template-literal interpolation, and the v0.12.0 double-encode bug class doesn't apply to positional binding via `postgres.js`'s `unsafe()` (verified by `test/e2e/auth-permissions.test.ts:67` on Postgres and the new `test/sql-query.test.ts` on PGLite). Migrated call sites: - src/commands/auth.ts: takes-holders writes (lines 52, 86) → executeRawJsonb. List, revoke, register-client, revoke-client → SqlQuery via withConfiguredSql() helper that opens an engine, runs the callback, disconnects. - src/commands/serve-http.ts: ~25 call sites including the four mcp_request_log.params INSERTs (now write real JSONB objects, not JSON-encoded strings — the read side `params->>'op'` returns the operation name, closing CLAUDE.md's outstanding "JSON-string-into- JSONB" note as a side effect). The /admin/api/requests dynamic filter pattern (postgres.js fragment composition) is rewritten as parametrized SQL string + params array. - src/mcp/http-transport.ts: legacy bearer-auth path. The Postgres-only fail-fast at startup is removed because both schemas now carry access_tokens + mcp_request_log. - src/core/oauth-provider.ts: SqlQuery / SqlValue types relocated from here to sql-query.ts as the canonical home (Codex finding #8). - src/commands/files.ts: all 5 db.getConnection() sites (lines 104, 139, 252, 326, 355). The line-256 INSERT into files.metadata uses executeRawJsonb; the other four are scalar-only SqlQuery (Codex finding #6 — scope was bigger than the plan's "lone INSERT" framing). - src/core/config.ts: env-var DATABASE_URL inference. When dbUrl is set, infer Postgres engine and clear the stale database_path. Engine-internal sql.json() sites in src/core/postgres-engine.ts (5 sites: lines 520, 1689, 1728, 1790, 2313) STAY UNCHANGED. They live inside PostgresEngine itself, where the postgres.js template-tag sql.json() pattern is correct — those methods are only loaded when Postgres is the active engine, so there's no PGLite-routing concern. Migration v45 (mcp_request_log_params_jsonb_normalize): one-shot UPDATE that lifts pre-v0.31 string-shaped JSONB rows to objects so the /admin/api/requests endpoint at serve-http.ts:605 returns one consistent shape to the admin SPA. Idempotent (subsequent runs find no rows where jsonb_typeof = 'string'). Closes the mixed-shape window that would otherwise have made post-deploy admin reads break. Tests: - test/sql-query.test.ts: 7 cases covering scalar binds, the .json() rejection (defense in depth — SqlQuery is scalar-only), JSONB round-trip with `jsonb_typeof = 'object'` and `->>` semantics, the v0.12.0 double-encode regression guard, null JSONB handling, and the scalars-then-jsonb call shape. - test/config-env.test.ts: migrated from PR's manual `restoreEnv()` in afterEach to the canonical `withEnv()` helper at test/helpers/with-env.ts (CLAUDE.md R1 / codex finding D3). Five cases covering DATABASE_URL precedence, GBRAIN_DATABASE_URL operator override, file-only config, env-only config, and the no-config null path. - test/e2e/auth-takes-holders-pglite.test.ts: 6 cases against in-memory PGLite (no DATABASE_URL gate). Covers create / update / read of access_tokens.permissions, mcp_request_log.params object + null writes, and the migration v45 normalizer (seed string-shaped row, run UPDATE, assert object shape; second-run no-op for idempotency). - test/http-transport.test.ts: mock updated to intercept engine.executeRaw (the new code path) instead of the postgres.js template tag. 24 cases pass. Plan reference: ~/.claude/plans/system-instruction-you-are-working-peppy-moore.md. Codex outside-voice review applied: D-codex-1, D-codex-2, D-codex-5, D-codex-8, D-codex-9, D-codex-10 (and D1, D5 reversed by codex). Closes the architectural intent of #681. Supersedes its branch. Co-Authored-By: codex-bot Co-Authored-By: Claude Opus 4.7 (1M context) --- src/commands/auth.ts | 213 ++++++++++++--------- src/commands/files.ts | 58 +++--- src/commands/serve-http.ts | 128 +++++++++---- src/core/config.ts | 13 +- src/core/migrate.ts | 32 ++++ src/core/oauth-provider.ts | 5 +- src/core/sql-query.ts | 121 ++++++++++++ src/mcp/http-transport.ts | 37 ++-- test/config-env.test.ts | 131 +++++++++++++ test/e2e/auth-takes-holders-pglite.test.ts | 211 ++++++++++++++++++++ test/http-transport.test.ts | 62 ++++-- test/sql-query.test.ts | 175 +++++++++++++++++ 12 files changed, 986 insertions(+), 200 deletions(-) create mode 100644 src/core/sql-query.ts create mode 100644 test/config-env.test.ts create mode 100644 test/e2e/auth-takes-holders-pglite.test.ts create mode 100644 test/sql-query.test.ts diff --git a/src/commands/auth.ts b/src/commands/auth.ts index be1bed2b4..4d17409a8 100644 --- a/src/commands/auth.ts +++ b/src/commands/auth.ts @@ -11,20 +11,19 @@ * Also runs standalone (no compiled binary required): * DATABASE_URL=... bun run src/commands/auth.ts create "claude-desktop" * - * Both paths require DATABASE_URL or GBRAIN_DATABASE_URL (except `test`, - * which only hits the remote URL and doesn't need a local DB). + * DB-backed commands route through the active BrainEngine (PGLite or + * Postgres), so they work regardless of which engine the user's brain is + * configured for. The env-var DATABASE_URL / GBRAIN_DATABASE_URL still + * picks Postgres via loadConfig() (config.ts DbUrlSource inference), + * but the SQL itself goes through engine.executeRaw — never through a + * postgres.js singleton. `test` only hits a remote URL and doesn't need + * a local DB. */ -import postgres from 'postgres'; import { createHash, randomBytes } from 'crypto'; - -function getDatabaseUrl(requireDb: boolean): string | undefined { - const url = process.env.DATABASE_URL || process.env.GBRAIN_DATABASE_URL; - if (!url && requireDb) { - console.error('Set DATABASE_URL or GBRAIN_DATABASE_URL environment variable.'); - process.exit(1); - } - return url; -} +import { loadConfig, toEngineConfig } from '../core/config.ts'; +import { createEngine } from '../core/engine-factory.ts'; +import type { BrainEngine } from '../core/engine.ts'; +import { sqlQueryForEngine, executeRawJsonb, type SqlQuery } from '../core/sql-query.ts'; function hashToken(token: string): string { return createHash('sha256').update(token).digest('hex'); @@ -34,28 +33,60 @@ function generateToken(): string { return 'gbrain_' + randomBytes(32).toString('hex'); } +/** + * Acquire an engine from the active config, run `fn` with a SqlQuery, and + * disconnect afterward. Loud-fails when no config is present (matches the + * prior behavior of getDatabaseUrl(requireDb=true) — auth commands need a + * brain to write to). + */ +async function withConfiguredSql( + fn: (sql: SqlQuery, engine: BrainEngine) => Promise, +): Promise { + const config = loadConfig(); + if (!config) { + console.error('No GBrain config found. Run `gbrain init` first, or set DATABASE_URL / GBRAIN_DATABASE_URL.'); + process.exit(1); + } + const engine = await createEngine(toEngineConfig(config)); + const sql = sqlQueryForEngine(engine); + try { + return await fn(sql, engine); + } finally { + await engine.disconnect(); + } +} + async function create(name: string, opts: { takesHolders?: string[] } = {}) { if (!name) { console.error('Usage: auth create [--takes-holders world,garry]'); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); const token = generateToken(); const hash = hashToken(token); try { - // v0.28: persist per-token takes-holder allow-list. Default ['world'] keeps - // private hunches hidden from MCP-bound tokens. - const takesHolders = opts.takesHolders && opts.takesHolders.length > 0 - ? opts.takesHolders - : ['world']; - const permissions = { takes_holders: takesHolders }; - await sql` - INSERT INTO access_tokens (name, token_hash, permissions) - VALUES (${name}, ${hash}, ${sql.json(permissions as Parameters[0])}) - `; - console.log(`Token created for "${name}" (takes_holders=${JSON.stringify(takesHolders)}):\n`); - console.log(` ${token}\n`); - console.log('Save this token — it will not be shown again.'); - console.log(`Revoke with: bun run src/commands/auth.ts revoke "${name}"`); - console.log(`Update visibility: bun run src/commands/auth.ts permissions "${name}" set-takes-holders world,garry`); + await withConfiguredSql(async (_sql, engine) => { + // v0.28: persist per-token takes-holder allow-list. Default ['world'] keeps + // private hunches hidden from MCP-bound tokens. + const takesHolders = opts.takesHolders && opts.takesHolders.length > 0 + ? opts.takesHolders + : ['world']; + const permissions = { takes_holders: takesHolders }; + // JSONB write: pass the object via executeRawJsonb with an explicit + // ::jsonb cast in the SQL string. Both engines round-trip the object + // through the wire-protocol type oid without the v0.12.0 double-encode + // bug class (verified by test/e2e/auth-permissions.test.ts:67 on + // Postgres and test/sql-query.test.ts on PGLite). + await executeRawJsonb( + engine, + `INSERT INTO access_tokens (name, token_hash, permissions) + VALUES ($1, $2, $3::jsonb)`, + [name, hash], + [permissions], + ); + console.log(`Token created for "${name}" (takes_holders=${JSON.stringify(takesHolders)}):\n`); + console.log(` ${token}\n`); + console.log('Save this token — it will not be shown again.'); + console.log(`Revoke with: gbrain auth revoke "${name}"`); + console.log(`Update visibility: gbrain auth permissions "${name}" set-takes-holders world,garry`); + }); } catch (e: any) { if (e.code === '23505') { console.error(`A token named "${name}" already exists. Revoke it first or use a different name.`); @@ -63,8 +94,6 @@ async function create(name: string, opts: { takesHolders?: string[] } = {}) { console.error('Error:', e.message); } process.exit(1); - } finally { - await sql.end(); } } @@ -73,43 +102,45 @@ async function permissions(name: string, action: string, value: string | undefin console.error('Usage: auth permissions set-takes-holders world,garry,brain'); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); try { - const list = value.split(',').map(s => s.trim()).filter(Boolean); - if (list.length === 0) { - console.error('takes-holders list cannot be empty (use "world" for default-deny on private)'); - process.exit(1); - } - const perms = { takes_holders: list }; - const result = await sql` - UPDATE access_tokens - SET permissions = ${sql.json(perms as Parameters[0])} - WHERE name = ${name} - RETURNING id - `; - if (result.length === 0) { - console.error(`Token "${name}" not found.`); - process.exit(1); - } - console.log(`Updated "${name}": takes_holders = ${JSON.stringify(list)}`); + await withConfiguredSql(async (sql, engine) => { + const list = value.split(',').map(s => s.trim()).filter(Boolean); + if (list.length === 0) { + console.error('takes-holders list cannot be empty (use "world" for default-deny on private)'); + process.exit(1); + } + const perms = { takes_holders: list }; + // JSONB UPDATE via executeRawJsonb — same pattern as create() above. + const result = await executeRawJsonb( + engine, + `UPDATE access_tokens + SET permissions = $2::jsonb + WHERE name = $1 + RETURNING id`, + [name], + [perms], + ); + if (result.length === 0) { + console.error(`Token "${name}" not found.`); + process.exit(1); + } + console.log(`Updated "${name}": takes_holders = ${JSON.stringify(list)}`); + }); } catch (e: any) { console.error('Error:', e.message); process.exit(1); - } finally { - await sql.end(); } } async function list() { - const sql = postgres(getDatabaseUrl(true)!); - try { + await withConfiguredSql(async (sql) => { const rows = await sql` SELECT name, created_at, last_used_at, revoked_at FROM access_tokens ORDER BY created_at DESC `; if (rows.length === 0) { - console.log('No tokens found. Create one: bun run src/commands/auth.ts create "my-client"'); + console.log('No tokens found. Create one: gbrain auth create "my-client"'); return; } console.log('Name Created Last Used Status'); @@ -121,27 +152,23 @@ async function list() { const status = r.revoked_at ? 'REVOKED' : 'active'; console.log(`${name} ${created} ${lastUsed} ${status}`); } - } finally { - await sql.end(); - } + }); } async function revoke(name: string) { if (!name) { console.error('Usage: auth revoke '); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); - try { - const result = await sql` + await withConfiguredSql(async (sql) => { + const rows = await sql` UPDATE access_tokens SET revoked_at = now() WHERE name = ${name} AND revoked_at IS NULL + RETURNING 1 `; - if (result.count === 0) { + if (rows.length === 0) { console.error(`No active token found with name "${name}".`); process.exit(1); } console.log(`Token "${name}" revoked.`); - } finally { - await sql.end(); - } + }); } async function test(url: string, token: string) { @@ -269,26 +296,25 @@ async function revokeClient(clientId: string) { console.error('Usage: auth revoke-client '); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); try { - // Atomic single-statement delete: no race window between count + delete. - // Postgres cascades to oauth_tokens and oauth_codes (FK ON DELETE CASCADE - // declared in src/schema.sql:370,382) before the transaction commits. - const rows = await sql` - DELETE FROM oauth_clients WHERE client_id = ${clientId} - RETURNING client_id, client_name - `; - if (rows.length === 0) { - console.error(`No client found with id "${clientId}"`); - process.exit(1); - } - console.log(`OAuth client revoked: "${rows[0].client_name}" (${clientId})`); - console.log('Tokens and authorization codes purged via cascade.'); + await withConfiguredSql(async (sql) => { + // Atomic single-statement delete: no race window between count + delete. + // Postgres cascades to oauth_tokens and oauth_codes (FK ON DELETE CASCADE + // declared in src/schema.sql:370,382) before the transaction commits. + const rows = await sql` + DELETE FROM oauth_clients WHERE client_id = ${clientId} + RETURNING client_id, client_name + `; + if (rows.length === 0) { + console.error(`No client found with id "${clientId}"`); + process.exit(1); + } + console.log(`OAuth client revoked: "${rows[0].client_name}" (${clientId})`); + console.log('Tokens and authorization codes purged via cascade.'); + }); } catch (e: any) { console.error('Error:', e.message); process.exit(1); - } finally { - await sql.end(); } } @@ -301,25 +327,24 @@ async function registerClient(name: string, args: string[]) { : ['client_credentials']; const scopes = scopesIdx >= 0 && args[scopesIdx + 1] ? args[scopesIdx + 1] : 'read'; - const sql = postgres(getDatabaseUrl(true)!); try { - const { GBrainOAuthProvider } = await import('../core/oauth-provider.ts'); - const provider = new GBrainOAuthProvider({ sql: sql as any }); - const { clientId, clientSecret } = await provider.registerClientManual( - name, grantTypes, scopes, [], - ); - console.log(`OAuth client registered: "${name}"\n`); - console.log(` Client ID: ${clientId}`); - console.log(` Client Secret: ${clientSecret}\n`); - console.log(` Grant types: ${grantTypes.join(', ')}`); - console.log(` Scopes: ${scopes}\n`); - console.log('Save the client secret — it will not be shown again.'); - console.log(`Revoke with: gbrain auth revoke-client "${clientId}"`); + await withConfiguredSql(async (sql) => { + const { GBrainOAuthProvider } = await import('../core/oauth-provider.ts'); + const provider = new GBrainOAuthProvider({ sql }); + const { clientId, clientSecret } = await provider.registerClientManual( + name, grantTypes, scopes, [], + ); + console.log(`OAuth client registered: "${name}"\n`); + console.log(` Client ID: ${clientId}`); + console.log(` Client Secret: ${clientSecret}\n`); + console.log(` Grant types: ${grantTypes.join(', ')}`); + console.log(` Scopes: ${scopes}\n`); + console.log('Save the client secret — it will not be shown again.'); + console.log(`Revoke with: gbrain auth revoke-client "${clientId}"`); + }); } catch (e: any) { console.error('Error:', e.message); process.exit(1); - } finally { - await sql.end(); } } diff --git a/src/commands/files.ts b/src/commands/files.ts index ce84c54ea..a8990bfdd 100644 --- a/src/commands/files.ts +++ b/src/commands/files.ts @@ -2,7 +2,7 @@ import { readFileSync, readdirSync, statSync, lstatSync, existsSync, writeFileSy import { join, relative, extname, basename, dirname } from 'path'; import { createHash } from 'crypto'; import type { BrainEngine } from '../core/engine.ts'; -import * as db from '../core/db.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../core/sql-query.ts'; import { humanSize } from '../core/file-resolver.ts'; import { createProgress } from '../core/progress.ts'; import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts'; @@ -47,16 +47,16 @@ export async function runFiles(engine: BrainEngine, args: string[]) { switch (subcommand) { case 'list': - await listFiles(args[1]); + await listFiles(engine, args[1]); break; case 'upload': - await uploadFile(args.slice(1)); + await uploadFile(engine, args.slice(1)); break; case 'sync': - await syncFiles(args[1]); + await syncFiles(engine, args[1]); break; case 'verify': - await verifyFiles(); + await verifyFiles(engine); break; case 'mirror': await mirrorFiles(args.slice(1)); @@ -74,7 +74,7 @@ export async function runFiles(engine: BrainEngine, args: string[]) { await cleanFiles(args.slice(1)); break; case 'upload-raw': - await uploadRaw(args.slice(1)); + await uploadRaw(engine, args.slice(1)); break; case 'signed-url': await signedUrl(args.slice(1)); @@ -100,8 +100,8 @@ export async function runFiles(engine: BrainEngine, args: string[]) { } } -async function listFiles(slug?: string) { - const sql = db.getConnection(); +async function listFiles(engine: BrainEngine, slug?: string) { + const sql = sqlQueryForEngine(engine); let rows; if (slug) { rows = await sql`SELECT * FROM files WHERE page_slug = ${slug} ORDER BY filename LIMIT 100`; @@ -116,12 +116,13 @@ async function listFiles(slug?: string) { console.log(`${rows.length} file(s):`); for (const row of rows) { - const size = row.size_bytes ? `${Math.round(row.size_bytes / 1024)}KB` : '?'; + const sizeBytes = row.size_bytes as number | null; + const size = sizeBytes ? `${Math.round(sizeBytes / 1024)}KB` : '?'; console.log(` ${row.page_slug || '(unlinked)'} / ${row.filename} [${size}, ${row.mime_type || '?'}]`); } } -async function uploadFile(args: string[]) { +async function uploadFile(engine: BrainEngine, args: string[]) { const filePath = args.find(a => !a.startsWith('--')); const pageSlug = args.find((a, i) => args[i - 1] === '--page') || null; @@ -136,7 +137,7 @@ async function uploadFile(args: string[]) { const storagePath = pageSlug ? `${pageSlug}/${filename}` : `unsorted/${hash.slice(0, 8)}-${filename}`; const mimeType = getMimeType(filePath); - const sql = db.getConnection(); + const sql = sqlQueryForEngine(engine); // Check for existing file by hash const existing = await sql`SELECT id FROM files WHERE content_hash = ${hash} AND storage_path = ${storagePath}`; @@ -178,7 +179,7 @@ async function uploadFile(args: string[]) { * * The .redirect.yaml pointer stays in the brain repo so git tracks what was stored. */ -async function uploadRaw(args: string[]) { +async function uploadRaw(engine: BrainEngine, args: string[]) { const filePath = args.find(a => !a.startsWith('--')); const pageSlug = args.find((a, i) => args[i - 1] === '--page') || null; const fileType = args.find((a, i) => args[i - 1] === '--type') || null; @@ -248,17 +249,20 @@ async function uploadRaw(args: string[]) { console.error(`Pointer written: ${pointerPath}`); } - // Record in DB - const sql = db.getConnection(); - await sql` - INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata) - VALUES (${pageSlug}, ${filename}, ${storagePath}, ${mimeType}, ${stat.size}, ${'sha256:' + hash}, - ${sql.json({ type: fileType, upload_method: method })}) - ON CONFLICT (storage_path) DO UPDATE SET - content_hash = EXCLUDED.content_hash, - size_bytes = EXCLUDED.size_bytes, - mime_type = EXCLUDED.mime_type - `; + // Record in DB. files.metadata is JSONB — pass the object via + // executeRawJsonb with an explicit ::jsonb cast so post-v0.31 reads see + // an actual object, not a JSON-encoded string (D1 wave). + await executeRawJsonb( + engine, + `INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb) + ON CONFLICT (storage_path) DO UPDATE SET + content_hash = EXCLUDED.content_hash, + size_bytes = EXCLUDED.size_bytes, + mime_type = EXCLUDED.mime_type`, + [pageSlug, filename, storagePath, mimeType, stat.size, 'sha256:' + hash], + [{ type: fileType, upload_method: method }], + ); // Output JSON for scripting console.log(JSON.stringify({ @@ -296,7 +300,7 @@ async function signedUrl(args: string[]) { console.log(url); } -async function syncFiles(dir?: string) { +async function syncFiles(engine: BrainEngine, dir?: string) { if (!dir || !existsSync(dir)) { console.error('Usage: gbrain files sync '); process.exit(1); @@ -323,7 +327,7 @@ async function syncFiles(dir?: string) { const mimeType = getMimeType(filePath); const stat = statSync(filePath); - const sql = db.getConnection(); + const sql = sqlQueryForEngine(engine); const existing = await sql`SELECT id FROM files WHERE content_hash = ${hash} AND storage_path = ${storagePath}`; if (existing.length > 0) { skipped++; @@ -351,8 +355,8 @@ async function syncFiles(dir?: string) { console.log(`Files sync complete: ${uploaded} uploaded, ${skipped} skipped (unchanged)`); } -async function verifyFiles() { - const sql = db.getConnection(); +async function verifyFiles(engine: BrainEngine) { + const sql = sqlQueryForEngine(engine); const rows = await sql`SELECT * FROM files ORDER BY storage_path LIMIT 1000`; if (rows.length === 0) { diff --git a/src/commands/serve-http.ts b/src/commands/serve-http.ts index 25d1f6148..4347eb65e 100644 --- a/src/commands/serve-http.ts +++ b/src/commands/serve-http.ts @@ -32,6 +32,7 @@ import { loadConfig } from '../core/config.ts'; import { buildError, serializeError } from '../core/errors.ts'; import { VERSION } from '../version.ts'; import * as db from '../core/db.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../core/sql-query.ts'; /** * /health endpoint timeout. 3s rather than 5s: Fly.io's default @@ -172,8 +173,12 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption ); } - // Get raw SQL connection for OAuth provider - const sql = db.getConnection() as SqlQuery; + // Engine-aware SQL adapter. Routes through engine.executeRaw on both + // Postgres and PGLite — the OAuth/admin/auth surface no longer requires + // a postgres.js singleton, so `gbrain serve --http` works against PGLite + // brains too. The narrow SqlQuery contract is scalar-binds-only; JSONB + // writes use executeRawJsonb (see mcp_request_log INSERT sites below). + const sql = sqlQueryForEngine(engine); // Initialize OAuth provider. F12 cleanup: DCR-disable now flips a // constructor option instead of monkey-patching `_clientsStore` after @@ -595,27 +600,44 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption const operation = req.query.operation as string; const status = req.query.status as string; - // Dynamic filtering via postgres.js tagged-template fragments. - // Each filter expands to either `AND col = $N` (parameterized) or - // an empty fragment. `WHERE 1=1` lets us always have a WHERE clause - // and unconditionally append AND-prefixed fragments — no string - // interpolation, no manual escaping, no sql.unsafe. - const agentFilter = agent && agent !== 'all' ? sql`AND token_name = ${agent}` : sql``; - const opFilter = operation && operation !== 'all' ? sql`AND operation = ${operation}` : sql``; - const statusFilter = status && status !== 'all' ? sql`AND status = ${status}` : sql``; - - const rows = await sql` - SELECT id, token_name, COALESCE(agent_name, token_name) as agent_name, - operation, latency_ms, status, params, error_message, created_at - FROM mcp_request_log - WHERE 1=1 ${agentFilter} ${opFilter} ${statusFilter} - ORDER BY created_at DESC LIMIT ${limit} OFFSET ${offset} - `; - const [countResult] = await sql` - SELECT count(*)::int as total FROM mcp_request_log - WHERE 1=1 ${agentFilter} ${opFilter} ${statusFilter} - `; - res.json({ rows, total: (countResult as any).total, page, pages: Math.ceil((countResult as any).total / limit) }); + // Dynamic filtering: SqlQuery is deliberately scalar-only and does not + // support fragment composition (the prior `sql\`AND ... = ${v}\`` shape). + // Build the WHERE clause with positional placeholders + a params array. + // `WHERE 1=1` lets us always have a WHERE clause and conditionally + // append `AND col = $N` fragments — still parameterized, still escaped + // by the driver, no sql.unsafe. + const filters: string[] = []; + const params: (string | number)[] = []; + if (agent && agent !== 'all') { + filters.push(`AND token_name = $${params.length + 1}`); + params.push(agent); + } + if (operation && operation !== 'all') { + filters.push(`AND operation = $${params.length + 1}`); + params.push(operation); + } + if (status && status !== 'all') { + filters.push(`AND status = $${params.length + 1}`); + params.push(status); + } + const filterSql = filters.join(' '); + const limitParam = `$${params.length + 1}`; + const offsetParam = `$${params.length + 2}`; + + const rows = await engine.executeRaw( + `SELECT id, token_name, COALESCE(agent_name, token_name) as agent_name, + operation, latency_ms, status, params, error_message, created_at + FROM mcp_request_log + WHERE 1=1 ${filterSql} + ORDER BY created_at DESC LIMIT ${limitParam} OFFSET ${offsetParam}`, + [...params, limit, offset], + ); + const [countResult] = await engine.executeRaw<{ total: number }>( + `SELECT count(*)::int as total FROM mcp_request_log + WHERE 1=1 ${filterSql}`, + params, + ); + res.json({ rows, total: countResult.total, page, pages: Math.ceil(countResult.total / limit) }); } catch { res.status(503).json({ error: 'service_unavailable' }); } @@ -767,8 +789,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // asserting >= 2 rows after tools/list + tools/call was unreachable. const latency = Date.now() - startTime; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) - VALUES (${authInfo.clientId}, ${agentName}, ${'tools/list'}, ${latency}, ${'success'}, ${null})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + [authInfo.clientId, agentName, 'tools/list', latency, 'success'], + [null], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -807,8 +834,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // valid-op success/error. const latency = Date.now() - startTime; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params, error_message) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'error'}, ${null}, ${`unknown_operation: ${name}`})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, error_message, params) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'error', `unknown_operation: ${name}`], + [null], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -834,8 +866,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // persistence regression test reliable across both rejection paths. const latency = Date.now() - startTime; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params, error_message) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'error'}, ${null}, ${`insufficient_scope: requires '${requiredScope}'`})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, error_message, params) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'error', `insufficient_scope: requires '${requiredScope}'`], + [null], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -883,10 +920,19 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // never written to mcp_request_log or the SSE feed). --log-full-params // bypasses this for operators debugging on their own laptop, with the // startup warning printed earlier. + // + // D1 (v0.31 wave): mcp_request_log.params is JSONB. Pre-v0.31 wrote + // a JSON-string into that JSONB column via the postgres.js template + // tag's loose typing — readable but semantically wrong (params->>'op' + // would return the encoded string, not the value). Post-v0.31 we + // pass the OBJECT through executeRawJsonb with an explicit ::jsonb + // cast, so reads return real objects and `params->>'op'` returns + // 'tools/list'. Pre-existing string-shaped rows are normalized by + // migration v41 in src/core/migrate.ts. const safeParamsSummary = summarizeMcpParams(name, params); - const logParams = logFullParams - ? (params ? JSON.stringify(params) : null) - : (safeParamsSummary ? JSON.stringify(safeParamsSummary) : null); + const logParamsObj: unknown = logFullParams + ? (params || null) + : (safeParamsSummary || null); const broadcastParams = logFullParams ? (params || {}) : safeParamsSummary; try { @@ -894,8 +940,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption const latency = Date.now() - startTime; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'success'}, ${logParams})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'success'], + [logParamsObj], + ); } catch { /* best effort */ } broadcastEvent({ @@ -928,8 +979,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption : serializeError(e); const errMsg = errorPayload.message; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params, error_message) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'error'}, ${logParams}, ${errMsg})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, error_message, params) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'error', errMsg], + [logParamsObj], + ); } catch { /* best effort */ } broadcastEvent({ diff --git a/src/core/config.ts b/src/core/config.ts index c0c6f09e4..126a45388 100644 --- a/src/core/config.ts +++ b/src/core/config.ts @@ -134,15 +134,22 @@ export function loadConfig(): GBrainConfig | null { if (!fileConfig && !dbUrl) return null; - // Infer engine type if not explicitly set - const inferredEngine: 'postgres' | 'pglite' = fileConfig?.engine - || (fileConfig?.database_path ? 'pglite' : 'postgres'); + // Infer engine type. A DATABASE_URL-style env var is always a Postgres + // connection target and must override a file-backed PGLite engine + // selection; otherwise direct-script / operator paths can silently hit + // the local PGLite brain while claiming to use the env URL. The PGLite + // database_path is also cleared when dbUrl is set so toEngineConfig + // doesn't pass a stale path through alongside the URL. + const inferredEngine: 'postgres' | 'pglite' = dbUrl + ? 'postgres' + : fileConfig?.engine || (fileConfig?.database_path ? 'pglite' : 'postgres'); // Merge: env vars override config file. READ only — never mutate process.env. const merged = { ...fileConfig, engine: inferredEngine, ...(dbUrl ? { database_url: dbUrl } : {}), + ...(dbUrl ? { database_path: undefined } : {}), ...(process.env.OPENAI_API_KEY ? { openai_api_key: process.env.OPENAI_API_KEY } : {}), ...(process.env.GBRAIN_EMBEDDING_MODEL ? { embedding_model: process.env.GBRAIN_EMBEDDING_MODEL } : {}), ...(process.env.GBRAIN_EMBEDDING_DIMENSIONS ? { embedding_dimensions: parseInt(process.env.GBRAIN_EMBEDDING_DIMENSIONS, 10) } : {}), diff --git a/src/core/migrate.ts b/src/core/migrate.ts index 88473a5c8..7cc9972fd 100644 --- a/src/core/migrate.ts +++ b/src/core/migrate.ts @@ -2167,6 +2167,38 @@ export const MIGRATIONS: Migration[] = [ ALTER TABLE pages ADD COLUMN IF NOT EXISTS emotional_weight_recomputed_at TIMESTAMPTZ; `, }, + { + version: 45, + name: 'mcp_request_log_params_jsonb_normalize', + idempotent: true, + // v0.31 wave (D-codex-2 / D1): mcp_request_log.params is JSONB, but + // pre-v0.31 serve-http.ts wrote `JSON.stringify(...)` strings into it + // via the postgres.js template tag's loose typing. The column was + // technically JSONB but stored as a JSON-encoded string, so reads via + // `params->>'op'` returned the encoded string '"search"' instead of + // 'search'. The /admin/api/requests endpoint at serve-http.ts:605 + // returned both shapes raw to the SPA depending on row age. + // + // The v0.31 commit re-routes those INSERTs through executeRawJsonb, + // which writes real objects. This one-shot UPDATE lifts existing + // string-shaped rows up to objects so the read side sees one + // consistent shape. Idempotent: subsequent runs find no rows where + // jsonb_typeof = 'string' and the UPDATE is a no-op. + // + // `params #>> '{}'` extracts the underlying string at the top level + // (works for any string-typed JSONB), then ::jsonb re-parses it as + // JSON. If a row was written with a non-JSON-parseable string (e.g., + // `'"raw garbage"'::jsonb` — itself a JSON-string-of-anything), the + // re-parse keeps it as a string-typed JSONB rather than throwing, + // so the migration never aborts on weird rows. The `WHERE` filter + // guards against running on already-object rows. + sql: ` + UPDATE mcp_request_log + SET params = (params #>> '{}')::jsonb + WHERE jsonb_typeof(params) = 'string' + AND params #>> '{}' LIKE '{%'; + `, + }, ]; export const LATEST_VERSION = MIGRATIONS.length > 0 diff --git a/src/core/oauth-provider.ts b/src/core/oauth-provider.ts index fcbeeed99..d6cb50529 100644 --- a/src/core/oauth-provider.ts +++ b/src/core/oauth-provider.ts @@ -24,14 +24,13 @@ import type { OAuthRegisteredClientsStore } from '@modelcontextprotocol/sdk/serv import type { AuthInfo } from '@modelcontextprotocol/sdk/server/auth/types.js'; import { hashToken, generateToken, isUndefinedColumnError } from './utils.ts'; import { hasScope, assertAllowedScopes, parseScopeString, InvalidScopeError } from './scope.ts'; +import type { SqlQuery, SqlValue } from './sql-query.ts'; +export type { SqlQuery, SqlValue }; // --------------------------------------------------------------------------- // Types // --------------------------------------------------------------------------- -/** Raw SQL query function — works with both PGLite and postgres tagged templates */ -export type SqlQuery = (strings: TemplateStringsArray, ...values: unknown[]) => Promise[]>; - /** * Convert a JS array to a PostgreSQL array literal for PGLite compat. * diff --git a/src/core/sql-query.ts b/src/core/sql-query.ts new file mode 100644 index 000000000..6483af8f1 --- /dev/null +++ b/src/core/sql-query.ts @@ -0,0 +1,121 @@ +import type { BrainEngine } from './engine.ts'; + +/** + * Minimal tagged SQL function used by OAuth/admin/auth infrastructure. + * + * This is deliberately narrower than postgres.js's `sql` tag: values must be + * scalar bind parameters only. It does not support nested SQL fragments, + * sql.json(), sql.unsafe(), sql.begin(), or direct JS array binding. JSONB + * writes go through the separate `executeRawJsonb` helper below. + * + * The narrow surface is the feature: every call site in auth.ts / + * serve-http.ts / mcp/http-transport.ts / files.ts answers "do you support + * X?" with "no, and that's the contract." That keeps the adapter from + * drifting into a partial postgres.js clone (codex finding #7 from the + * v0.31 plan review). + */ +export type SqlValue = string | number | bigint | boolean | Date | null; +export type SqlQuery = (strings: TemplateStringsArray, ...values: SqlValue[]) => Promise[]>; + +/** + * Build a minimal tagged-template SQL adapter over the active BrainEngine. + * + * OAuth/admin code only needs scalar positional parameters plus returned rows. + * Using BrainEngine.executeRaw keeps the path engine-aware: Postgres goes + * through the connected postgres.js client (`unsafe(sql, params)`), while + * PGLite goes through its embedded `db.query(sql, params)`. + * + * The v0.12.0 double-encode bug class does NOT apply here because + * executeRaw uses positional binding, not the postgres.js template tag's + * auto-stringify path that caused the original silent-data-loss incident. + */ +export function sqlQueryForEngine(engine: BrainEngine): SqlQuery { + return async (strings: TemplateStringsArray, ...values: SqlValue[]) => { + for (const value of values) { + assertSqlValue(value); + } + const query = strings.reduce((acc, str, i) => { + return acc + str + (i < values.length ? `$${i + 1}` : ''); + }, ''); + return engine.executeRaw(query, values); + }; +} + +function assertSqlValue(value: unknown): asserts value is SqlValue { + if ( + value === null || + typeof value === 'string' || + typeof value === 'number' || + typeof value === 'bigint' || + typeof value === 'boolean' || + value instanceof Date + ) { + return; + } + + const kind = Array.isArray(value) + ? 'array' + : value && typeof (value as { then?: unknown }).then === 'function' + ? 'promise' + : typeof value; + throw new TypeError( + `sqlQueryForEngine only supports scalar bind values; got ${kind}. ` + + 'Use fixed SQL with scalar params, or executeRawJsonb for JSONB writes.', + ); +} + +/** + * Cross-engine JSONB write helper. Composes a parametrized SQL string with + * explicit `$N::jsonb` casts for the JSONB positions and passes the JSONB + * values as JS objects through `engine.executeRaw`. Both the postgres.js + * `unsafe(sql, params)` path (via PostgresEngine) and PGLite's + * `db.query(sql, params)` accept objects for `$N::jsonb` positions and + * round-trip them with `jsonb_typeof = 'object'` (verified by + * test/e2e/auth-permissions.test.ts:67 on Postgres and test/sql-query.test.ts + * on PGLite). + * + * Why this exists separately from SqlQuery: the SqlQuery contract is + * deliberately scalar-only. JSONB columns are rare enough across the + * auth/admin surface that a focused helper preserves the contract without + * forcing every call site to remember which positions hold JSONB. + * + * Why this is safe vs the v0.12.0 double-encode bug: the bug was specific + * to postgres.js's template-tag auto-stringify path interacting with + * sql.json() — not to positional binding through `unsafe()`. JS objects + * passed as positional params reach the wire protocol with the correct + * type oid (jsonb when cast in the SQL string), so there is no double- + * encode. The CI guard (scripts/check-jsonb-pattern.sh) doesn't fire + * because the source pattern is a method call (`executeRawJsonb(...)`), + * not the banned literal-template-tag interpolation pattern with + * JSON.stringify cast to jsonb. + * + * Usage: + * await executeRawJsonb( + * engine, + * `INSERT INTO access_tokens (name, token_hash, permissions) + * VALUES ($1, $2, $3::jsonb)`, + * [name, hash], + * [{ takes_holders: ['world', 'garry'] }], + * ); + * + * The SQL string MUST already contain the `$N::jsonb` casts; the helper + * does NOT rewrite or inject them. Scalar params come first ($1..$N), then + * JSONB params ($N+1..$N+M). Matching the call-site convention to scalars- + * before-JSONB simplifies argument order and matches how the existing + * call sites we're migrating are shaped. + */ +export async function executeRawJsonb>( + engine: BrainEngine, + sql: string, + scalarParams: SqlValue[], + jsonbParams: unknown[], +): Promise { + for (const value of scalarParams) { + assertSqlValue(value); + } + // jsonbParams are explicitly NOT validated as scalar — they're meant to + // hold JS objects/arrays that postgres.js / PGLite will encode as JSONB + // via the explicit ::jsonb cast in the caller's SQL string. + const params: unknown[] = [...scalarParams, ...jsonbParams]; + return engine.executeRaw(sql, params); +} diff --git a/src/mcp/http-transport.ts b/src/mcp/http-transport.ts index 0ba2be992..6356208d6 100644 --- a/src/mcp/http-transport.ts +++ b/src/mcp/http-transport.ts @@ -1,8 +1,9 @@ /** - * HTTP transport for `gbrain serve --http`. + * HTTP transport for `gbrain serve --http` (legacy bearer-auth path). * - * Postgres-only. PGLite users get a clear fail-fast at startup (the access_tokens - * table doesn't exist on PGLite per pglite-schema.ts). + * Engine-aware via SqlQuery (works on both Postgres and PGLite as of the + * v0.31 wave). The access_tokens and mcp_request_log tables exist on both + * engines (see src/core/pglite-schema.ts:478,495 and src/schema.sql). * * Security model: * - Every request must include `Authorization: Bearer ` (except /health) @@ -31,6 +32,7 @@ import { operations } from '../core/operations.ts'; import { VERSION } from '../version.ts'; import { dispatchToolCall } from './dispatch.ts'; import { buildDefaultLimiters, type RateLimiter } from './rate-limit.ts'; +import { sqlQueryForEngine } from '../core/sql-query.ts'; const DEFAULT_BODY_CAP = 1024 * 1024; // 1 MiB @@ -116,22 +118,11 @@ function resolveClientIp(req: Request, server: { requestIP: (r: Request) => { ad export async function startHttpTransport(opts: HttpTransportOptions) { const { port, engine } = opts; - // Fail-fast: HTTP transport requires Postgres because access_tokens / mcp_request_log - // only exist in the Postgres schema (see src/core/pglite-schema.ts:5-6). - if ((engine as { kind?: string }).kind !== 'postgres') { - console.error('Error: gbrain serve --http requires a Postgres engine for remote auth tokens.'); - console.error('PGLite is local-only by design (access_tokens table is Postgres-only).'); - console.error('Either:'); - console.error(' - Use stdio: gbrain serve'); - console.error(' - Migrate to Postgres: gbrain migrate --to supabase'); - process.exit(1); - } - - const sql = (engine as unknown as { sql: any }).sql; - if (!sql) { - console.error('Error: Postgres engine has no .sql client. Engine may not be connected.'); - process.exit(1); - } + // Engine-aware: route SQL through the active BrainEngine. Both Postgres + // and PGLite carry access_tokens + mcp_request_log in their schemas + // (pglite-schema.ts:478,495 and schema.sql), so the legacy bearer-auth + // path works on either engine without a postgres.js singleton. + const sql = sqlQueryForEngine(engine); const limiters = opts.limiters || buildDefaultLimiters(); const bodyCap = envInt('GBRAIN_HTTP_MAX_BODY_BYTES', DEFAULT_BODY_CAP); @@ -169,11 +160,13 @@ export async function startHttpTransport(opts: HttpTransportOptions) { WHERE token_hash = ${hash} AND revoked_at IS NULL `; if (!row) return { ok: false }; + const rowId = row.id as string; + const rowName = row.name as string; // Debounced last_used_at update — only writes once per token per 60s. // SQL-level WHERE clause keeps this race-tolerant even under concurrent requests. sql`UPDATE access_tokens SET last_used_at = now() - WHERE id = ${row.id} + WHERE id = ${rowId} AND (last_used_at IS NULL OR last_used_at < now() - interval '60 seconds')` .catch(() => { /* fire-and-forget */ }); // v0.28: extract per-token takes-holder allow-list. Fail-safe default @@ -184,8 +177,8 @@ export async function startHttpTransport(opts: HttpTransportOptions) { : ['world']; return { ok: true, - tokenId: row.id, - tokenName: row.name, + tokenId: rowId, + tokenName: rowName, takesHoldersAllowList: allowList, }; } catch { diff --git a/test/config-env.test.ts b/test/config-env.test.ts new file mode 100644 index 000000000..a3e6bce3c --- /dev/null +++ b/test/config-env.test.ts @@ -0,0 +1,131 @@ +import { mkdtempSync, rmSync } from 'fs'; +import { tmpdir } from 'os'; +import { join } from 'path'; +import { describe, expect, test } from 'bun:test'; +import { loadConfig, saveConfig } from '../src/core/config.ts'; +import { withEnv } from './helpers/with-env.ts'; + +// PR #681 originally shipped a manual `restoreEnv()` in afterEach for these +// tests. CLAUDE.md R1 (test-isolation lint) and the codex outside-voice +// review (D3 in the v0.31 plan) called that out — manual try/restore patterns +// don't survive an assertion failure mid-test, the codemod for parallel +// test execution can't whitelist them, and the canonical helper at +// test/helpers/with-env.ts already exists for this. Migrating here. +// +// withEnv()'s try/finally restores even when the callback throws, so a +// failed expect() inside the block leaves the process env clean for the +// next file in the shard. + +describe('loadConfig env database URL precedence', () => { + test('DATABASE_URL switches an existing PGLite file config to Postgres', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + // Pre-seed: PGLite file config in this isolated GBRAIN_HOME. + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + saveConfig({ engine: 'pglite', database_path: '/tmp/local-brain.pglite' }); + }, + ); + + // DATABASE_URL set: loadConfig must override PGLite selection, + // pick Postgres, and clear the stale database_path so toEngineConfig + // doesn't try to use both. + await withEnv( + { + GBRAIN_HOME: home, + GBRAIN_DATABASE_URL: undefined, + DATABASE_URL: 'postgres://user:pass@example.test:5432/gbrain', + }, + () => { + const cfg = loadConfig(); + expect(cfg?.engine).toBe('postgres'); + expect(cfg?.database_url).toBe('postgres://user:pass@example.test:5432/gbrain'); + expect(cfg?.database_path).toBeUndefined(); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('GBRAIN_DATABASE_URL beats DATABASE_URL (operator override)', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + saveConfig({ engine: 'pglite', database_path: '/tmp/local-brain.pglite' }); + }, + ); + + await withEnv( + { + GBRAIN_HOME: home, + GBRAIN_DATABASE_URL: 'postgres://win:win@gbrain.test:5432/db', + DATABASE_URL: 'postgres://lose:lose@other.test:5432/db', + }, + () => { + const cfg = loadConfig(); + expect(cfg?.engine).toBe('postgres'); + expect(cfg?.database_url).toBe('postgres://win:win@gbrain.test:5432/db'); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('No env DB URL → existing PGLite file config is honored', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + saveConfig({ engine: 'pglite', database_path: '/tmp/local-brain.pglite' }); + const cfg = loadConfig(); + expect(cfg?.engine).toBe('pglite'); + expect(cfg?.database_path).toBe('/tmp/local-brain.pglite'); + expect(cfg?.database_url).toBeUndefined(); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('No file config + DATABASE_URL → infers Postgres', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { + GBRAIN_HOME: home, + GBRAIN_DATABASE_URL: undefined, + DATABASE_URL: 'postgres://only:env@gbrain.test:5432/db', + }, + () => { + // No saveConfig() — no file present at all. + const cfg = loadConfig(); + expect(cfg?.engine).toBe('postgres'); + expect(cfg?.database_url).toBe('postgres://only:env@gbrain.test:5432/db'); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('No file config + no env DB URL → loadConfig returns null', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + expect(loadConfig()).toBeNull(); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); +}); diff --git a/test/e2e/auth-takes-holders-pglite.test.ts b/test/e2e/auth-takes-holders-pglite.test.ts new file mode 100644 index 000000000..6f57b6aa8 --- /dev/null +++ b/test/e2e/auth-takes-holders-pglite.test.ts @@ -0,0 +1,211 @@ +/** + * E2E for the v0.31 auth/admin SQL routing wave: full takes-holders + * round-trip on PGLite, in-memory, no DATABASE_URL gate. + * + * Mirrors test/e2e/auth-permissions.test.ts (which exercises the Postgres + * path) so JSONB shape parity is proven for both engines (Codex finding + * #1 from the v0.31 plan review). + * + * The path under test is the one auth.ts and src/mcp/http-transport.ts + * actually run after migration: + * 1. Token create with takes-holders → executeRawJsonb writes a JSONB object + * 2. validateToken-shaped read → SELECT permissions; jsonb_typeof = 'object' + * 3. Permissions update → executeRawJsonb again (UPDATE) + * 4. mcp_request_log.params write → executeRawJsonb (the serve-http flow) + * 5. Migration v45 normalizer → seed a string-shaped row, run the + * UPDATE, assert it's lifted to an object + */ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test'; +import { PGLiteEngine } from '../../src/core/pglite-engine.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../../src/core/sql-query.ts'; + +let engine: PGLiteEngine; + +beforeAll(async () => { + engine = new PGLiteEngine(); + await engine.connect({}); + await engine.initSchema(); +}, 60_000); + +afterAll(async () => { + if (engine) await engine.disconnect(); +}); + +describe('auth takes-holders + mcp_request_log JSONB on PGLite (v0.31)', () => { + test('access_tokens.permissions: create + read returns a real JSONB object', async () => { + const sql = sqlQueryForEngine(engine); + const name = `tok-create-${Math.random().toString(36).slice(2, 8)}`; + const hash = `hash-${name}`; + const permissions = { takes_holders: ['world', 'garry'] }; + + // The exact shape auth.ts:create uses post-migration. + await executeRawJsonb( + engine, + `INSERT INTO access_tokens (name, token_hash, permissions) + VALUES ($1, $2, $3::jsonb)`, + [name, hash], + [permissions], + ); + + // The exact shape http-transport.ts:validateToken uses to read it back. + const rows = await sql` + SELECT permissions FROM access_tokens + WHERE token_hash = ${hash} + `; + const perms = (rows[0] as { permissions?: { takes_holders?: unknown } }).permissions; + expect(perms).toBeDefined(); + expect(Array.isArray(perms?.takes_holders)).toBe(true); + expect(perms?.takes_holders).toEqual(['world', 'garry']); + + // Defense in depth: the JSONB-text representation must be an object, + // not a JSON-encoded string. Codex finding #9 — assert the contract. + const typed = await engine.executeRaw<{ kind: string; first_holder: string }>( + `SELECT jsonb_typeof(permissions) AS kind, + permissions->'takes_holders'->>0 AS first_holder + FROM access_tokens WHERE token_hash = $1`, + [hash], + ); + expect(typed[0].kind).toBe('object'); + expect(typed[0].first_holder).toBe('world'); + }); + + test('access_tokens.permissions: UPDATE preserves JSONB object shape', async () => { + const sql = sqlQueryForEngine(engine); + const name = `tok-update-${Math.random().toString(36).slice(2, 8)}`; + const hash = `hash-${name}`; + + // Seed with default ['world']. + await executeRawJsonb( + engine, + `INSERT INTO access_tokens (name, token_hash, permissions) + VALUES ($1, $2, $3::jsonb)`, + [name, hash], + [{ takes_holders: ['world'] }], + ); + + // The exact shape auth.ts:permissions uses (set-takes-holders). + const result = await executeRawJsonb( + engine, + `UPDATE access_tokens + SET permissions = $2::jsonb + WHERE name = $1 + RETURNING id`, + [name], + [{ takes_holders: ['world', 'garry', 'brain'] }], + ); + expect(result).toHaveLength(1); + + const rows = await sql` + SELECT permissions FROM access_tokens + WHERE token_hash = ${hash} + `; + const perms = (rows[0] as { permissions: { takes_holders: string[] } }).permissions; + expect(perms.takes_holders).toEqual(['world', 'garry', 'brain']); + }); + + test('mcp_request_log.params: object writes round-trip as JSONB object', async () => { + // The serve-http.ts INSERT shape after the v0.31 migration. + const summary = { redacted: true, declared_keys: ['query', 'limit'], approx_bytes: 1024 }; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + ['test-token', 'test-agent', 'tools/call:query', 12, 'success'], + [summary], + ); + + const rows = await engine.executeRaw<{ + kind: string; + redacted: boolean; + bytes: number; + first_key: string; + }>( + `SELECT jsonb_typeof(params) AS kind, + (params->>'redacted')::boolean AS redacted, + (params->>'approx_bytes')::int AS bytes, + params->'declared_keys'->>0 AS first_key + FROM mcp_request_log + WHERE operation = $1`, + ['tools/call:query'], + ); + expect(rows[0].kind).toBe('object'); + expect(rows[0].redacted).toBe(true); + expect(rows[0].bytes).toBe(1024); + expect(rows[0].first_key).toBe('query'); + }); + + test('mcp_request_log.params: NULL writes (no params) round-trip as SQL NULL', async () => { + // tools/list and scope-rejected paths write NULL params. Must not + // be encoded as the string "null". + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + ['test-token', 'test-agent', 'tools/list', 5, 'success'], + [null], + ); + + const rows = await engine.executeRaw<{ is_null: boolean }>( + `SELECT (params IS NULL) AS is_null FROM mcp_request_log + WHERE operation = 'tools/list'`, + ); + expect(rows[0].is_null).toBe(true); + }); + + test('migration v45 normalizer: lifts pre-v0.31 string-shaped rows to objects', async () => { + // Seed a row in the broken pre-v0.31 shape: a JSON-encoded object + // stored as a string-typed JSONB. This is what postgres.js's loose + // template-tag typing produced when `${JSON.stringify(obj)}` was + // bound to a JSONB column without sql.json(). + const broken = JSON.stringify({ legacy: 'shape', op: 'search' }); + await engine.executeRaw( + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, to_jsonb($6::text))`, + ['legacy-token', 'legacy-agent', 'tools/call:legacy', 8, 'success', broken], + ); + // Confirm the seed produced the broken shape (jsonb_typeof = 'string'). + const before = await engine.executeRaw<{ kind: string }>( + `SELECT jsonb_typeof(params) AS kind FROM mcp_request_log + WHERE operation = 'tools/call:legacy'`, + ); + expect(before[0].kind).toBe('string'); + + // Run the migration v45 SQL exactly as it lives in src/core/migrate.ts. + await engine.executeRaw(` + UPDATE mcp_request_log + SET params = (params #>> '{}')::jsonb + WHERE jsonb_typeof(params) = 'string' + AND params #>> '{}' LIKE '{%' + `); + + // After: real object, ->> reads the values. + const after = await engine.executeRaw<{ kind: string; legacy: string; op: string }>( + `SELECT jsonb_typeof(params) AS kind, + params->>'legacy' AS legacy, + params->>'op' AS op + FROM mcp_request_log + WHERE operation = 'tools/call:legacy'`, + ); + expect(after[0].kind).toBe('object'); + expect(after[0].legacy).toBe('shape'); + expect(after[0].op).toBe('search'); + }); + + test('migration v45 normalizer: idempotent — re-running on already-fixed rows is a no-op', async () => { + // Run the migration a second time. The WHERE jsonb_typeof = 'string' + // guard means already-object rows are skipped, so this should leave + // the legacy row unchanged. + await engine.executeRaw(` + UPDATE mcp_request_log + SET params = (params #>> '{}')::jsonb + WHERE jsonb_typeof(params) = 'string' + AND params #>> '{}' LIKE '{%' + `); + const after = await engine.executeRaw<{ kind: string; legacy: string }>( + `SELECT jsonb_typeof(params) AS kind, params->>'legacy' AS legacy + FROM mcp_request_log WHERE operation = 'tools/call:legacy'`, + ); + expect(after[0].kind).toBe('object'); + expect(after[0].legacy).toBe('shape'); + }); +}); diff --git a/test/http-transport.test.ts b/test/http-transport.test.ts index 3df8e31e7..e011aa8af 100644 --- a/test/http-transport.test.ts +++ b/test/http-transport.test.ts @@ -23,6 +23,12 @@ type SqlHandler = (query: string, values: unknown[]) => SqlResult | Promise>(sql: string, params?: unknown[]) => Promise; sql: ReturnType; audit: { token_name: string | null; operation: string; status: string; latency_ms: number }[]; } @@ -39,6 +45,17 @@ function makeSqlTag(handler: SqlHandler) { }; } +/** + * Normalize a SQL string for pattern-matching: collapse whitespace, + * lowercase. Helps the mock's `executeRaw` match the queries that + * `sqlQueryForEngine` builds (multi-line, $1/$2/etc. placeholders) the + * same way the legacy template-tag mock matched the older single-line + * shapes. + */ +function normalizeSql(sql: string): string { + return sql.replace(/\s+/g, ' ').trim().toLowerCase(); +} + function hash(token: string): string { return createHash('sha256').update(token).digest('hex'); } @@ -61,35 +78,35 @@ function makeFakeEngine(cfg: FakeEngineConfig = {}): FakeEngine { const revokedTokens = cfg.revokedTokens ?? new Set(); const audit: FakeEngine['audit'] = []; - const sql = makeSqlTag((query, values) => { - if (cfg.dbDown && query.startsWith('SELECT')) throw new Error('db down'); + // Legacy template-tag handler. Preserved so any non-migrated call path + // still has a place to land (defense in depth). The new code path routes + // through executeRaw below. + const handle = (query: string, values: unknown[]): unknown[] => { + if (cfg.dbDown && query.toLowerCase().includes('select')) throw new Error('db down'); - if (query === 'SELECT 1') { - // /health DB probe + if (query === 'SELECT 1' || normalizeSql(query) === 'select 1') { return [{ '?column?': 1 }]; } - // v0.28: query now selects `permissions` too. Match either the legacy - // SELECT id, name shape OR the new SELECT id, name, permissions shape so - // older tests that haven't been updated still work; new tests can stash - // a `permissions` field on the validTokens row. - if (query.startsWith('SELECT id, name FROM access_tokens') || - query.startsWith('SELECT id, name, permissions FROM access_tokens')) { + const norm = normalizeSql(query); + + // SELECT id, name, permissions FROM access_tokens WHERE token_hash = $1 AND revoked_at IS NULL + if (norm.startsWith('select id, name from access_tokens') || + norm.startsWith('select id, name, permissions from access_tokens')) { const tokenHash = values[0] as string; if (revokedTokens.has(tokenHash)) return []; const row = validTokens.get(tokenHash); if (!row) return []; - // Default permissions to {takes_holders: ['world']} (matches migration v33 default). const rowWithPerms = { ...row, permissions: row.permissions ?? { takes_holders: ['world'] } }; return [rowWithPerms]; } - if (query.startsWith('UPDATE access_tokens')) { + if (norm.startsWith('update access_tokens')) { // last_used_at debounce — succeed silently return []; } - if (query.startsWith('INSERT INTO mcp_request_log')) { + if (norm.startsWith('insert into mcp_request_log')) { audit.push({ token_name: values[0] as string | null, operation: values[1] as string, @@ -100,9 +117,24 @@ function makeFakeEngine(cfg: FakeEngineConfig = {}): FakeEngine { } return []; - }); + }; + + const sql = makeSqlTag(handle); + + // v0.31: sqlQueryForEngine + executeRawJsonb both call engine.executeRaw. + // The new SQL strings carry $N positional placeholders (not the legacy + // template-tag `?`), but the queries themselves match by their leading + // text. We normalize whitespace so multi-line SQL bodies match the same + // way single-line shapes did. + const executeRaw = async >( + rawSql: string, + params?: unknown[], + ): Promise => { + const result = handle(rawSql, params ?? []); + return Promise.resolve(result as T[]); + }; - return { kind: 'postgres', sql, audit }; + return { kind: 'postgres', executeRaw, sql, audit }; } interface TestServer { diff --git a/test/sql-query.test.ts b/test/sql-query.test.ts new file mode 100644 index 000000000..c9f57a515 --- /dev/null +++ b/test/sql-query.test.ts @@ -0,0 +1,175 @@ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test'; +import { PGLiteEngine } from '../src/core/pglite-engine.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../src/core/sql-query.ts'; + +let engine: PGLiteEngine; + +beforeAll(async () => { + engine = new PGLiteEngine(); + await engine.connect({}); +}, 30_000); + +afterAll(async () => { + if (engine) await engine.disconnect(); +}); + +describe('sqlQueryForEngine', () => { + test('runs parameterized tagged-template SQL against PGLite', async () => { + const sql = sqlQueryForEngine(engine); + const rows = await sql`SELECT ${'pglite'}::text AS engine, ${3}::int AS count`; + expect(rows).toEqual([{ engine: 'pglite', count: 3 }]); + }); + + test('rejects postgres.js-style fragment / object values explicitly', async () => { + const sql = sqlQueryForEngine(engine); + await expect( + sql`SELECT ${(Promise.resolve([]) as any)}::text AS bad` + ).rejects.toThrow(/only supports scalar bind values/); + await expect( + sql`SELECT ${(['read', 'write'] as any)}::text[] AS bad` + ).rejects.toThrow(/only supports scalar bind values/); + await expect( + sql`SELECT ${({ takes_holders: ['world'] } as any)}::jsonb AS bad` + ).rejects.toThrow(/only supports scalar bind values/); + }); +}); + +describe('executeRawJsonb (D1 wave / v0.31)', () => { + test('round-trips an object as JSONB on PGLite (jsonb_typeof = object, ->> reads value)', async () => { + // Verifies the cross-engine JSONB write helper produces a real Postgres + // JSONB object — not a quoted JSON string. Codex's plan-review #9 said + // "use the actual JSONB contract, not string-grep for backslash-quote", + // and that's what this asserts: jsonb_typeof + ->>. + const tableName = `t_jsonb_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw(`CREATE TEMP TABLE ${tableName} (j jsonb)`); + try { + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (j) VALUES ($1::jsonb)`, + [], + [{ k: 'v', n: 42 }], + ); + const rows = await engine.executeRaw<{ kind: string; k: string; n: number }>( + `SELECT jsonb_typeof(j) AS kind, j->>'k' AS k, (j->>'n')::int AS n FROM ${tableName}`, + ); + expect(rows).toHaveLength(1); + expect(rows[0].kind).toBe('object'); + expect(rows[0].k).toBe('v'); + expect(rows[0].n).toBe(42); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('takes-holders shape: object preserved, ->> returns the encoded array, NOT a double-encoded string (v0.12.0 regression guard)', async () => { + // The v0.12.0 silent-data-loss bug stored `${JSON.stringify(perms)}::jsonb` + // as a JSON string-of-an-object, so `permissions->>'takes_holders'` + // would return a string with backslashes instead of the array. + // Post-fix: real JSONB object, ->> on the array key returns the + // pretty-printed JSON of the array (because jsonb -> array -> ->> is + // the array as a text), and jsonb_typeof on the parent stays 'object'. + const tableName = `t_perms_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw( + `CREATE TEMP TABLE ${tableName} (id serial PRIMARY KEY, permissions jsonb)`, + ); + try { + const perms = { takes_holders: ['world', 'garry'] }; + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (permissions) VALUES ($1::jsonb)`, + [], + [perms], + ); + const rows = await engine.executeRaw<{ + outer_kind: string; + holders_kind: string; + first_holder: string; + text_form: string; + }>( + `SELECT + jsonb_typeof(permissions) AS outer_kind, + jsonb_typeof(permissions->'takes_holders') AS holders_kind, + permissions->'takes_holders'->>0 AS first_holder, + permissions::text AS text_form + FROM ${tableName}`, + ); + expect(rows).toHaveLength(1); + // The parent JSONB is an object; the takes_holders child is an array. + // Pre-fix this would be 'string' / 'string' (string-of-object). + expect(rows[0].outer_kind).toBe('object'); + expect(rows[0].holders_kind).toBe('array'); + expect(rows[0].first_holder).toBe('world'); + // Defense in depth: the text representation must NOT contain + // backslash-quote sequences, which is what double-encoded JSONB + // looked like in the v0.12.0 incident (e.g. `"{\"takes_holders\":...}"`). + expect(rows[0].text_form).not.toContain('\\"'); + // And the text representation should look like a normal JSON object + // — starts with `{`, not `"{`. + expect(rows[0].text_form.startsWith('{')).toBe(true); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('null jsonb value stores NULL, not the string "null"', async () => { + // serve-http.ts sometimes inserts NULL params (e.g. tools/list, scope- + // rejected paths). The helper must accept null without trying to + // encode it as the string "null" or rejecting it. + const tableName = `t_jnull_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw(`CREATE TEMP TABLE ${tableName} (j jsonb)`); + try { + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (j) VALUES ($1::jsonb)`, + [], + [null], + ); + const rows = await engine.executeRaw<{ kind: string | null; is_null: boolean }>( + `SELECT jsonb_typeof(j) AS kind, (j IS NULL) AS is_null FROM ${tableName}`, + ); + expect(rows[0].is_null).toBe(true); + // jsonb_typeof on SQL NULL returns NULL. + expect(rows[0].kind).toBeNull(); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('mixes scalar params and jsonb params in positional order', async () => { + // Real call shape: scalars first ($1..$N), JSONB params next + // ($N+1..$N+M). Mirrors the auth.ts `INSERT INTO access_tokens + // (name, token_hash, permissions) VALUES ($1, $2, $3::jsonb)` pattern. + const tableName = `t_mix_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw( + `CREATE TEMP TABLE ${tableName} (name text, weight int, payload jsonb)`, + ); + try { + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (name, weight, payload) VALUES ($1, $2, $3::jsonb)`, + ['alice', 7], + [{ tags: ['a', 'b'] }], + ); + const rows = await engine.executeRaw<{ name: string; weight: number; first_tag: string }>( + `SELECT name, weight, payload->'tags'->>0 AS first_tag FROM ${tableName}`, + ); + expect(rows).toEqual([{ name: 'alice', weight: 7, first_tag: 'a' }]); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('rejects non-scalar values in scalarParams (defense in depth)', async () => { + // The scalar position validator should fire even when a misuse passes + // an object via scalarParams instead of jsonbParams. Catches the + // cross-up-the-positions footgun loud at the helper boundary. + await expect( + executeRawJsonb( + engine, + `SELECT $1::text AS bad`, + [{ object: 'in scalar position' } as any], + [], + ), + ).rejects.toThrow(/only supports scalar bind values/); + }); +}); From 3b21b1b32388ca975807189eb08d7c7612525bd1 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sat, 9 May 2026 22:10:13 -0700 Subject: [PATCH 3/4] docs: update CLAUDE.md key files for v0.31.3 Annotate the v0.31.3 changes in the canonical Key Files section: new src/core/sql-query.ts adapter (#681), src/commands/serve.ts stdio cleanup (#676), v0.31.3 amendments to auth.ts / serve-http.ts / oauth-provider.ts surfaces, and migration v46 normalizer in migrate.ts. Co-Authored-By: Claude Opus 4.7 --- CLAUDE.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index c90d11116..2aa53b1e1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -140,10 +140,12 @@ strict behavior when unset. - `src/mcp/server.ts` — MCP stdio server (generated from operations). v0.22.7: tool-call handler delegates to `dispatchToolCall` from `src/mcp/dispatch.ts` so stdio + HTTP transports share one validation, context-build, and error-format path. - `src/mcp/dispatch.ts` (v0.22.7) — Shared tool-call dispatch consumed by both stdio (`server.ts`) and HTTP transports. Exports `dispatchToolCall(engine, name, params, opts)`, `buildOperationContext(engine, params, opts)`, and `validateParams(op, params)`. Single source of truth for `(ctx, params)` handler arg order and the 5-field `OperationContext` shape (engine + config + logger + dryRun + remote). Defaults to `remote: true` (untrusted); local CLI callers pass `remote: false`. Closed F1/F2/F3 drift bugs in the original v0.22.5 HTTP transport. **v0.26.9 (F8):** adds `summarizeMcpParams(opName, params)` — privacy-preserving redactor for `mcp_request_log` and the admin SSE feed. Returns `{redacted, kind, declared_keys, unknown_key_count, approx_bytes}`. Intersects submitted top-level keys against the operation's declared `params` allow-list (declared keys preserved as a sorted array for debug visibility; unknown keys counted but never named, closing the attacker-controlled-key-name leak). Byte counts bucketed up to nearest 1KB so an attacker can't binary-search secret-content sizes via repeated probes. Operators on a personal laptop who want raw payload visibility opt back in with `gbrain serve --http --log-full-params` (loud stderr warning at startup). Canonical helper — new logging code paths route through it rather than `JSON.stringify(params)`. - `src/mcp/rate-limit.ts` (v0.22.7) — Bounded-LRU token-bucket limiter. `buildDefaultLimiters()` returns the two-bucket pipeline: pre-auth IP (30/60s, fires BEFORE the DB lookup so brute-force load against `access_tokens` is actually capped) + post-auth token-id (60/60s). Tracks `lastTouchedMs` separately from `lastRefillMs` so an exhausted key can't be reset by hammering past the TTL. LRU cap bounds memory under attacker-controlled key growth. -- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. +- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. **v0.31.3 (#681):** every OAuth/admin/audit SQL call routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` so `gbrain serve --http` works against PGLite brains. The four `mcp_request_log.params` INSERT sites (success path, auth_failed path, scope_denied path, server-error path) all go through `executeRawJsonb(engine, ...)` so the JSONB column stores real objects, not JSON-encoded strings — closes the bug where `params->>'op'` returned the encoded string `"search"` (with quotes) instead of `search`. Migration v46 normalizes any pre-v0.31.3 string-shaped backlog rows on first start. +- `src/core/sql-query.ts` (v0.31.3) — Engine-aware tagged-template SQL adapter for OAuth/admin/auth infrastructure. `sqlQueryForEngine(engine)` returns a `SqlQuery` (`(strings, ...values) => Promise`) that walks the template, builds `$N` positional SQL, asserts every value is a `SqlValue` (string | number | bigint | boolean | Date | null), and routes through `engine.executeRaw(sql, params)` so Postgres goes via postgres.js's `unsafe(sql, params)` path and PGLite via its embedded `db.query(sql, params)`. Deliberately narrower than postgres.js's `sql` tag: no nested fragments, no `sql.json()`, no `sql.unsafe()`, no `sql.begin()`, no array binding. The narrow surface is the feature — codex finding #7 from the v0.31 plan review argued the adapter should stay scalar-only or it drifts into a partial postgres.js clone. JSONB writes go through the separate `executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that composes positional `$N::jsonb` casts and passes JS objects through; the v0.12.0 double-encode bug class doesn't apply because positional binding through `unsafe()` reaches the wire protocol with the correct type oid (verified by `test/sql-query.test.ts` on PGLite and `test/e2e/auth-permissions.test.ts:67` on Postgres). `scripts/check-jsonb-pattern.sh` doesn't fire because `executeRawJsonb(...)` is a method call, not the banned literal-template-tag interpolation pattern. Consumed by `src/commands/auth.ts`, `src/commands/serve-http.ts`, `src/core/oauth-provider.ts`, `src/commands/files.ts`, and `src/mcp/http-transport.ts` so all five sites work against PGLite and Postgres uniformly. Closes the bug where `gbrain auth` + `gbrain serve --http` were silently Postgres-only because they routed every SQL through the postgres.js singleton (community PR #681). +- `src/commands/serve.ts` (v0.31.3) — `gbrain serve` stdio MCP entrypoint with idempotent shutdown across every parent-disconnect signal. Stdio EOF, SIGTERM, SIGINT, SIGHUP, and parent-process death (every reparent case — PID 1, launchd subreaper, systemd, tmux, or a parent shell with `PR_SET_CHILD_SUBREAPER`) all funnel into one `cleanup(reason)` path that releases the engine and the PGLite write-lock dir within 5 seconds. Pre-v0.31.3 the stdio MCP server held the lock indefinitely after Claude Desktop / Cursor / launchd-managed gateways disconnected, forcing a 5-minute stale-lock wait on the next start. Watchdog reparent check is `getParentPid() !== initialParentPid` (capturing the initial ppid once at install time and firing on any change); the previous `=== 1` check missed the subreaper case under launchd / systemd. Bun's `process.ppid` cache is stale across reparenting (see [oven-sh/bun#30305](https://github.com/oven-sh/bun/issues/30305)) so `getParentPid()` runs `spawnSync('ps', ['-o', 'ppid=', '-p', PID])` per tick to read the live kernel PPID. Startup probe verifies `ps` is on PATH; if not (stripped containers, busybox without procps), the watchdog skips installing AND emits a loud `[gbrain serve] watchdog disabled: ps unavailable, parent-death detection unavailable — child will rely on stdin EOF / signals only` stderr line so operators see the degraded mode at boot. Pinned by `test/serve-stdio-lifecycle.test.ts` (22 cases). Closes #413, #446. Credit @Aragorn2046 (origin features in #591) and @seungsu-kr (rebased submitter, Bun ppid workaround). - `src/core/oauth-provider.ts` (v0.26.0) — `GBrainOAuthProvider` implementing the MCP SDK's `OAuthServerProvider` + `OAuthRegisteredClientsStore` interfaces. Backed by raw SQL (works on both PGLite and Postgres — OAuth is infrastructure, not a BrainEngine concern). Full OAuth 2.1 spec: `authorize` + `exchangeAuthorizationCode` with PKCE (for ChatGPT), `client_credentials` (for Perplexity / Claude), `refresh_token` with rotation, `revokeToken`, `registerClient` (DCR path validates redirect_uri must be `https://` or loopback per RFC 6749 §3.1.2.1). All tokens + client secrets SHA-256 hashed before storage. Auth codes single-use with 10-minute TTL via atomic `DELETE...RETURNING` (closes RFC 6749 §10.5 TOCTOU race). Refresh rotation also `DELETE...RETURNING` (closes §10.4 stolen-token detection bypass). `pgArray()` escapes commas/quotes/braces in elements so a comma-bearing redirect_uri can't smuggle a second array element. Legacy `access_tokens` fallback in `verifyAccessToken` grandfathers pre-v0.26 bearer tokens as `read+write+admin`. `sweepExpiredTokens()` runs on startup wrapped in try/catch. **v0.26.9 RFC 6749/7009 hardening pass:** F1+F2 fold `client_id` atomically into the `DELETE WHERE` clauses for both auth-code exchange and refresh rotation — pre-fix the post-hoc client compare burned the row on wrong-client paths so the legitimate client couldn't retry. F3 enforces refresh-scope-subset against the original grant on the row (RFC 6749 §6), not the client's currently-allowed scopes — fixes the case where revoking a scope from a client wouldn't shrink the agent's existing refresh tokens. F4 binds `client_id` on `revokeToken` so a client can only revoke its own tokens (RFC 7009 §2.1). F7c validates the `/token` request's `redirect_uri` against the value stored at `/authorize` (RFC 6749 §4.1.3) — empty-string treated as missing rather than wildcard match (adversarial-review fix). F5 swaps bare `catch {}` blocks in `verifyAccessToken` and `getClient` for `isUndefinedColumnError` from `src/core/utils.ts` — only SQLSTATE 42703 falls through to legacy fallback; lock timeouts and network blips throw and surface. F6 makes `sweepExpiredTokens()` actually return the count via `RETURNING 1` + array length, not a fire-and-forget zero. F12 adds `dcrDisabled` constructor option so `serve-http.ts` can disable the `/register` endpoint without monkey-patching the router. **v0.26.2:** module-private `coerceTimestamp()` boundary helper at the top of the file normalizes postgres-driver-as-string BIGINT columns to JS numbers at every read site (5 call sites: `getClient` L112+L113 for DCR `/register` RFC 7591 §3.2.1 numeric timestamps, `exchangeRefreshToken` L274 + `verifyAccessToken` L296+L303 for the SDK's `typeof === 'number'` bearerAuth check). Throws on non-finite input (NaN/Infinity) so corrupt rows fail loud at the boundary instead of riding through as `expiresAt: NaN`; returns undefined for SQL NULL so callers decide NULL semantics explicitly (refresh + access token paths treat NULL as expired). Helper intentionally NOT promoted to `src/core/utils.ts` — codex review flagged repo-wide BIGINT precision-loss risk for a generic helper. - `admin/` (v0.26.0) — React 19 + Vite + TypeScript admin SPA embedded in the binary via `admin/dist/` served by `serve-http.ts`. 7 screens: Login (bootstrap token → session cookie), Dashboard (metrics + SSE feed + token health), Agents (sortable table + sparklines + Register button), Register (modal with scope checkboxes + grant type selector), Credentials reveal (full-screen modal with Copy + Download JSON + yellow one-time-only warning), Request Log (filterable paginated), Agent Detail drawer (Details / Activity / Config Export tabs + Revoke). Design tokens: `#0a0a0f` bg, Inter for UI, JetBrains Mono for data, 4-32px spacing scale, rounded pill badges. HTTP-only SameSite=Strict cookie auth. 65KB gzip. Build: `cd admin && bun install && bun run build`; output at `admin/dist/` is committed for self-contained binaries. -- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. +- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. **v0.31.3 (#681):** every SQL site routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` (and `executeRawJsonb` for the takes-holders `permissions` JSONB column) so `gbrain auth` works against PGLite brains. Pre-fix, every call hit the postgres.js singleton via `getConn()` and silently failed (or wrote to the wrong DB) when the active engine was PGLite. The takes-holders write goes through `executeRawJsonb(engine, sql, [name, hash], [{takes_holders:[...]}])` which round-trips with `jsonb_typeof = 'object'` instead of the pre-v0.31.3 quoted-string shape. - `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally. - `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed. - `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent. @@ -153,7 +155,7 @@ strict behavior when unset. - `src/commands/transcripts.ts` (v0.29) — `gbrain transcripts recent [--days N] [--full] [--json]`: recent raw `.txt` transcripts from the dream-cycle corpus dirs. Imports `listRecentTranscripts` from `src/core/transcripts.ts` (the same library the gated `get_recent_transcripts` MCP op uses). Local-only by construction — the CLI always runs with `ctx.remote=false`. - `src/commands/integrity.ts` — `gbrain integrity check|auto|review|extract`: bare-tweet detection, dead-link detection, three-bucket repair (auto-repair / review-queue / skip). `scanIntegrity()` is the shared library function called from `gbrain doctor` (sampled at limit=500) and `cmdCheck` (full scan). v0.22.8: batch-load fast path on Postgres uses `SELECT DISTINCT ON (slug)` in a single SQL query to fix the PgBouncer round-trip timeout (60s → ~6s) while preserving `engine.getAllSlugs()`'s `Set` semantics on multi-source brains. Gated by `engine.kind === 'postgres'` at the call site so PGLite never enters batch; fallback `catch` logs at `GBRAIN_DEBUG=1` so real Postgres errors are diagnosable. - `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). v0.15.2: every DB check is wrapped in a progress phase; `markdown_body_completeness` runs under a 1s heartbeat timer so 10+ min scans are observable on 50K-page brains. v0.19.1 added `queue_health` (Postgres-only) with two subchecks: stalled-forever active jobs (started_at > 1h) and waiting-depth-per-name > threshold (default 10, override via `GBRAIN_QUEUE_WAITING_THRESHOLD`). Worker-heartbeat subcheck intentionally deferred to follow-up B7 because it needs a `minion_workers` table to produce ground-truth signal. Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, `gbrain apply-migrations`, and `gbrain jobs get/cancel `. v0.22.12 (#500): `sync_failures` check shows `[CODE=N, ...]` breakdown for both unacked entries (warn) and acked-historical entries (ok), surfacing systemic failure modes (`SLUG_MISMATCH=2685`) instead of a bare count. v0.26.7 (#612): `rls_event_trigger` check (post-install drift detector for migration v35's auto-RLS event trigger). Lives outside the `// 5. RLS` slice that the structural doctor.test.ts guards anchor on, so the existing test guards stay intact. Healthy `evtenabled` set is `('O','A')` only — `R` is replica-only and would not fire in normal sessions; `D` is disabled. Fix hint is `gbrain apply-migrations --force-retry 35`. **v0.30.2:** `queue_health` gains a fourth subcheck — surfaces dead-lettered subagent jobs with `last_error` matching the `prompt_too_long` classifier within the last 24h. Fix hint points at `gbrain dream --phase synthesize --dry-run --json` to identify the offending transcript and `gbrain jobs prune --status dead --queue default` to clean up. Postgres-only. -- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. +- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. **v46 (v0.31.3):** `mcp_request_log_params_jsonb_normalize` rewrites pre-v0.31.3 rows where `mcp_request_log.params` was stored as a JSON-encoded string (`jsonb_typeof = 'string'`) up to a real JSONB object via `UPDATE ... SET params = params::text::jsonb WHERE jsonb_typeof(params) = 'string'`. Single statement, idempotent — second-run finds no string-shaped rows and is a no-op. Closes the bug where `/admin/api/requests` returned a quoted string instead of the parsed object. - `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2. - `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers. - `src/core/db-lock.ts` (v0.22.13) — generic `tryAcquireDbLock(engine, lockId, ttlMinutes)` over the existing `gbrain_cycle_locks` table. Parameterized lock id so different scopes can nest cleanly: `gbrain-cycle` for the broad cycle (held by `cycle.ts`) and `gbrain-sync` (`SYNC_LOCK_ID` constant) for `performSync`'s narrower writer window. Same UPSERT-with-TTL semantics as the prior cycle-only helper, just generalized. Survives PgBouncer transaction pooling (unlike session-scoped `pg_try_advisory_lock`); crashed holders auto-release once their TTL expires. From 6695c7c7ace3434941051e27bb0aca8b97f5b2bc Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Sat, 9 May 2026 22:16:06 -0700 Subject: [PATCH 4/4] chore: regenerate llms-full.txt for v0.31.3 docs sync MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit CI's build-llms test asserts the committed llms.txt + llms-full.txt match what scripts/build-llms.ts produces from current source state. CLAUDE.md was amended by /document-release post-merge (new entries for src/core/sql-query.ts and src/commands/serve.ts; amended notes on auth.ts / serve-http.ts / migrate.ts), so the inlined-bundle fell out of sync. Regenerated via `bun run build:llms`. llms.txt unchanged (curated index — no new web URLs added). llms-full.txt updated to inline the new CLAUDE.md content. Co-Authored-By: Claude Opus 4.7 (1M context) --- llms-full.txt | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/llms-full.txt b/llms-full.txt index d03197865..bec39df6b 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -240,10 +240,12 @@ strict behavior when unset. - `src/mcp/server.ts` — MCP stdio server (generated from operations). v0.22.7: tool-call handler delegates to `dispatchToolCall` from `src/mcp/dispatch.ts` so stdio + HTTP transports share one validation, context-build, and error-format path. - `src/mcp/dispatch.ts` (v0.22.7) — Shared tool-call dispatch consumed by both stdio (`server.ts`) and HTTP transports. Exports `dispatchToolCall(engine, name, params, opts)`, `buildOperationContext(engine, params, opts)`, and `validateParams(op, params)`. Single source of truth for `(ctx, params)` handler arg order and the 5-field `OperationContext` shape (engine + config + logger + dryRun + remote). Defaults to `remote: true` (untrusted); local CLI callers pass `remote: false`. Closed F1/F2/F3 drift bugs in the original v0.22.5 HTTP transport. **v0.26.9 (F8):** adds `summarizeMcpParams(opName, params)` — privacy-preserving redactor for `mcp_request_log` and the admin SSE feed. Returns `{redacted, kind, declared_keys, unknown_key_count, approx_bytes}`. Intersects submitted top-level keys against the operation's declared `params` allow-list (declared keys preserved as a sorted array for debug visibility; unknown keys counted but never named, closing the attacker-controlled-key-name leak). Byte counts bucketed up to nearest 1KB so an attacker can't binary-search secret-content sizes via repeated probes. Operators on a personal laptop who want raw payload visibility opt back in with `gbrain serve --http --log-full-params` (loud stderr warning at startup). Canonical helper — new logging code paths route through it rather than `JSON.stringify(params)`. - `src/mcp/rate-limit.ts` (v0.22.7) — Bounded-LRU token-bucket limiter. `buildDefaultLimiters()` returns the two-bucket pipeline: pre-auth IP (30/60s, fires BEFORE the DB lookup so brute-force load against `access_tokens` is actually capped) + post-auth token-id (60/60s). Tracks `lastTouchedMs` separately from `lastRefillMs` so an exhausted key can't be reset by hammering past the TTL. LRU cap bounds memory under attacker-controlled key growth. -- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. +- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. **v0.31.3 (#681):** every OAuth/admin/audit SQL call routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` so `gbrain serve --http` works against PGLite brains. The four `mcp_request_log.params` INSERT sites (success path, auth_failed path, scope_denied path, server-error path) all go through `executeRawJsonb(engine, ...)` so the JSONB column stores real objects, not JSON-encoded strings — closes the bug where `params->>'op'` returned the encoded string `"search"` (with quotes) instead of `search`. Migration v46 normalizes any pre-v0.31.3 string-shaped backlog rows on first start. +- `src/core/sql-query.ts` (v0.31.3) — Engine-aware tagged-template SQL adapter for OAuth/admin/auth infrastructure. `sqlQueryForEngine(engine)` returns a `SqlQuery` (`(strings, ...values) => Promise`) that walks the template, builds `$N` positional SQL, asserts every value is a `SqlValue` (string | number | bigint | boolean | Date | null), and routes through `engine.executeRaw(sql, params)` so Postgres goes via postgres.js's `unsafe(sql, params)` path and PGLite via its embedded `db.query(sql, params)`. Deliberately narrower than postgres.js's `sql` tag: no nested fragments, no `sql.json()`, no `sql.unsafe()`, no `sql.begin()`, no array binding. The narrow surface is the feature — codex finding #7 from the v0.31 plan review argued the adapter should stay scalar-only or it drifts into a partial postgres.js clone. JSONB writes go through the separate `executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that composes positional `$N::jsonb` casts and passes JS objects through; the v0.12.0 double-encode bug class doesn't apply because positional binding through `unsafe()` reaches the wire protocol with the correct type oid (verified by `test/sql-query.test.ts` on PGLite and `test/e2e/auth-permissions.test.ts:67` on Postgres). `scripts/check-jsonb-pattern.sh` doesn't fire because `executeRawJsonb(...)` is a method call, not the banned literal-template-tag interpolation pattern. Consumed by `src/commands/auth.ts`, `src/commands/serve-http.ts`, `src/core/oauth-provider.ts`, `src/commands/files.ts`, and `src/mcp/http-transport.ts` so all five sites work against PGLite and Postgres uniformly. Closes the bug where `gbrain auth` + `gbrain serve --http` were silently Postgres-only because they routed every SQL through the postgres.js singleton (community PR #681). +- `src/commands/serve.ts` (v0.31.3) — `gbrain serve` stdio MCP entrypoint with idempotent shutdown across every parent-disconnect signal. Stdio EOF, SIGTERM, SIGINT, SIGHUP, and parent-process death (every reparent case — PID 1, launchd subreaper, systemd, tmux, or a parent shell with `PR_SET_CHILD_SUBREAPER`) all funnel into one `cleanup(reason)` path that releases the engine and the PGLite write-lock dir within 5 seconds. Pre-v0.31.3 the stdio MCP server held the lock indefinitely after Claude Desktop / Cursor / launchd-managed gateways disconnected, forcing a 5-minute stale-lock wait on the next start. Watchdog reparent check is `getParentPid() !== initialParentPid` (capturing the initial ppid once at install time and firing on any change); the previous `=== 1` check missed the subreaper case under launchd / systemd. Bun's `process.ppid` cache is stale across reparenting (see [oven-sh/bun#30305](https://github.com/oven-sh/bun/issues/30305)) so `getParentPid()` runs `spawnSync('ps', ['-o', 'ppid=', '-p', PID])` per tick to read the live kernel PPID. Startup probe verifies `ps` is on PATH; if not (stripped containers, busybox without procps), the watchdog skips installing AND emits a loud `[gbrain serve] watchdog disabled: ps unavailable, parent-death detection unavailable — child will rely on stdin EOF / signals only` stderr line so operators see the degraded mode at boot. Pinned by `test/serve-stdio-lifecycle.test.ts` (22 cases). Closes #413, #446. Credit @Aragorn2046 (origin features in #591) and @seungsu-kr (rebased submitter, Bun ppid workaround). - `src/core/oauth-provider.ts` (v0.26.0) — `GBrainOAuthProvider` implementing the MCP SDK's `OAuthServerProvider` + `OAuthRegisteredClientsStore` interfaces. Backed by raw SQL (works on both PGLite and Postgres — OAuth is infrastructure, not a BrainEngine concern). Full OAuth 2.1 spec: `authorize` + `exchangeAuthorizationCode` with PKCE (for ChatGPT), `client_credentials` (for Perplexity / Claude), `refresh_token` with rotation, `revokeToken`, `registerClient` (DCR path validates redirect_uri must be `https://` or loopback per RFC 6749 §3.1.2.1). All tokens + client secrets SHA-256 hashed before storage. Auth codes single-use with 10-minute TTL via atomic `DELETE...RETURNING` (closes RFC 6749 §10.5 TOCTOU race). Refresh rotation also `DELETE...RETURNING` (closes §10.4 stolen-token detection bypass). `pgArray()` escapes commas/quotes/braces in elements so a comma-bearing redirect_uri can't smuggle a second array element. Legacy `access_tokens` fallback in `verifyAccessToken` grandfathers pre-v0.26 bearer tokens as `read+write+admin`. `sweepExpiredTokens()` runs on startup wrapped in try/catch. **v0.26.9 RFC 6749/7009 hardening pass:** F1+F2 fold `client_id` atomically into the `DELETE WHERE` clauses for both auth-code exchange and refresh rotation — pre-fix the post-hoc client compare burned the row on wrong-client paths so the legitimate client couldn't retry. F3 enforces refresh-scope-subset against the original grant on the row (RFC 6749 §6), not the client's currently-allowed scopes — fixes the case where revoking a scope from a client wouldn't shrink the agent's existing refresh tokens. F4 binds `client_id` on `revokeToken` so a client can only revoke its own tokens (RFC 7009 §2.1). F7c validates the `/token` request's `redirect_uri` against the value stored at `/authorize` (RFC 6749 §4.1.3) — empty-string treated as missing rather than wildcard match (adversarial-review fix). F5 swaps bare `catch {}` blocks in `verifyAccessToken` and `getClient` for `isUndefinedColumnError` from `src/core/utils.ts` — only SQLSTATE 42703 falls through to legacy fallback; lock timeouts and network blips throw and surface. F6 makes `sweepExpiredTokens()` actually return the count via `RETURNING 1` + array length, not a fire-and-forget zero. F12 adds `dcrDisabled` constructor option so `serve-http.ts` can disable the `/register` endpoint without monkey-patching the router. **v0.26.2:** module-private `coerceTimestamp()` boundary helper at the top of the file normalizes postgres-driver-as-string BIGINT columns to JS numbers at every read site (5 call sites: `getClient` L112+L113 for DCR `/register` RFC 7591 §3.2.1 numeric timestamps, `exchangeRefreshToken` L274 + `verifyAccessToken` L296+L303 for the SDK's `typeof === 'number'` bearerAuth check). Throws on non-finite input (NaN/Infinity) so corrupt rows fail loud at the boundary instead of riding through as `expiresAt: NaN`; returns undefined for SQL NULL so callers decide NULL semantics explicitly (refresh + access token paths treat NULL as expired). Helper intentionally NOT promoted to `src/core/utils.ts` — codex review flagged repo-wide BIGINT precision-loss risk for a generic helper. - `admin/` (v0.26.0) — React 19 + Vite + TypeScript admin SPA embedded in the binary via `admin/dist/` served by `serve-http.ts`. 7 screens: Login (bootstrap token → session cookie), Dashboard (metrics + SSE feed + token health), Agents (sortable table + sparklines + Register button), Register (modal with scope checkboxes + grant type selector), Credentials reveal (full-screen modal with Copy + Download JSON + yellow one-time-only warning), Request Log (filterable paginated), Agent Detail drawer (Details / Activity / Config Export tabs + Revoke). Design tokens: `#0a0a0f` bg, Inter for UI, JetBrains Mono for data, 4-32px spacing scale, rounded pill badges. HTTP-only SameSite=Strict cookie auth. 65KB gzip. Build: `cd admin && bun install && bun run build`; output at `admin/dist/` is committed for self-contained binaries. -- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. +- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. **v0.31.3 (#681):** every SQL site routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` (and `executeRawJsonb` for the takes-holders `permissions` JSONB column) so `gbrain auth` works against PGLite brains. Pre-fix, every call hit the postgres.js singleton via `getConn()` and silently failed (or wrote to the wrong DB) when the active engine was PGLite. The takes-holders write goes through `executeRawJsonb(engine, sql, [name, hash], [{takes_holders:[...]}])` which round-trips with `jsonb_typeof = 'object'` instead of the pre-v0.31.3 quoted-string shape. - `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally. - `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed. - `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent. @@ -253,7 +255,7 @@ strict behavior when unset. - `src/commands/transcripts.ts` (v0.29) — `gbrain transcripts recent [--days N] [--full] [--json]`: recent raw `.txt` transcripts from the dream-cycle corpus dirs. Imports `listRecentTranscripts` from `src/core/transcripts.ts` (the same library the gated `get_recent_transcripts` MCP op uses). Local-only by construction — the CLI always runs with `ctx.remote=false`. - `src/commands/integrity.ts` — `gbrain integrity check|auto|review|extract`: bare-tweet detection, dead-link detection, three-bucket repair (auto-repair / review-queue / skip). `scanIntegrity()` is the shared library function called from `gbrain doctor` (sampled at limit=500) and `cmdCheck` (full scan). v0.22.8: batch-load fast path on Postgres uses `SELECT DISTINCT ON (slug)` in a single SQL query to fix the PgBouncer round-trip timeout (60s → ~6s) while preserving `engine.getAllSlugs()`'s `Set` semantics on multi-source brains. Gated by `engine.kind === 'postgres'` at the call site so PGLite never enters batch; fallback `catch` logs at `GBRAIN_DEBUG=1` so real Postgres errors are diagnosable. - `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). v0.15.2: every DB check is wrapped in a progress phase; `markdown_body_completeness` runs under a 1s heartbeat timer so 10+ min scans are observable on 50K-page brains. v0.19.1 added `queue_health` (Postgres-only) with two subchecks: stalled-forever active jobs (started_at > 1h) and waiting-depth-per-name > threshold (default 10, override via `GBRAIN_QUEUE_WAITING_THRESHOLD`). Worker-heartbeat subcheck intentionally deferred to follow-up B7 because it needs a `minion_workers` table to produce ground-truth signal. Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, `gbrain apply-migrations`, and `gbrain jobs get/cancel `. v0.22.12 (#500): `sync_failures` check shows `[CODE=N, ...]` breakdown for both unacked entries (warn) and acked-historical entries (ok), surfacing systemic failure modes (`SLUG_MISMATCH=2685`) instead of a bare count. v0.26.7 (#612): `rls_event_trigger` check (post-install drift detector for migration v35's auto-RLS event trigger). Lives outside the `// 5. RLS` slice that the structural doctor.test.ts guards anchor on, so the existing test guards stay intact. Healthy `evtenabled` set is `('O','A')` only — `R` is replica-only and would not fire in normal sessions; `D` is disabled. Fix hint is `gbrain apply-migrations --force-retry 35`. **v0.30.2:** `queue_health` gains a fourth subcheck — surfaces dead-lettered subagent jobs with `last_error` matching the `prompt_too_long` classifier within the last 24h. Fix hint points at `gbrain dream --phase synthesize --dry-run --json` to identify the offending transcript and `gbrain jobs prune --status dead --queue default` to clean up. Postgres-only. -- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. +- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. **v46 (v0.31.3):** `mcp_request_log_params_jsonb_normalize` rewrites pre-v0.31.3 rows where `mcp_request_log.params` was stored as a JSON-encoded string (`jsonb_typeof = 'string'`) up to a real JSONB object via `UPDATE ... SET params = params::text::jsonb WHERE jsonb_typeof(params) = 'string'`. Single statement, idempotent — second-run finds no string-shaped rows and is a no-op. Closes the bug where `/admin/api/requests` returned a quoted string instead of the parsed object. - `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2. - `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers. - `src/core/db-lock.ts` (v0.22.13) — generic `tryAcquireDbLock(engine, lockId, ttlMinutes)` over the existing `gbrain_cycle_locks` table. Parameterized lock id so different scopes can nest cleanly: `gbrain-cycle` for the broad cycle (held by `cycle.ts`) and `gbrain-sync` (`SYNC_LOCK_ID` constant) for `performSync`'s narrower writer window. Same UPSERT-with-TTL semantics as the prior cycle-only helper, just generalized. Survives PgBouncer transaction pooling (unlike session-scoped `pg_try_advisory_lock`); crashed holders auto-release once their TTL expires.