diff --git a/CHANGELOG.md b/CHANGELOG.md index 69218cce8..c4590c2e0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,85 @@ All notable changes to GBrain will be documented in this file. +## [0.31.3] - 2026-05-09 + +**`gbrain serve` stops leaking the PGLite write lock when the parent disconnects, and `gbrain auth` + `gbrain serve --http` work against PGLite brains.** + +Two community PRs (#676 stdio cleanup, #681 engine-aware auth SQL) landed in one focused PR with two atomic commits. Both fixed real bugs that were biting users in production: stdio MCP servers held the PGLite write lock indefinitely after Claude Desktop / Cursor / launchd-managed gateways disconnected, forcing a 5-minute stale-lock wait on the next start. And `gbrain auth` + the OAuth admin server silently routed every SQL through the postgres.js singleton, which made `gbrain serve --http` Postgres-only even though the schema supported PGLite. + +### What you can now do + +**Restart your MCP server without waiting 5 minutes.** Stdio EOF, SIGTERM, SIGINT, SIGHUP, and parent-process death (every reparent case ... PID 1, launchd subreaper, systemd, tmux, or a parent shell with PR_SET_CHILD_SUBREAPER) all funnel into one idempotent shutdown that releases the engine and the lock dir within 5 seconds. Closes #413 and #446. Credit @Aragorn2046 (origin features in #591) and @seungsu-kr (rebased submitter, Bun ppid workaround). + +**Run `gbrain serve --http` against a PGLite brain.** Auth (`gbrain auth create/list/revoke/permissions`), OAuth 2.1 client registration + token mint, the admin SPA, file uploads, and the legacy bearer-auth HTTP transport all run through the active engine now via a deliberately-narrow `SqlQuery` adapter (`src/core/sql-query.ts`) that calls `engine.executeRaw`. Previously, with `DATABASE_URL` set but the config file pointing at PGLite, auth commands silently hit the wrong brain. Credit codex-bot. + +**`gbrain auth permissions ... set-takes-holders` and the four `mcp_request_log.params` INSERTs write real JSONB objects.** Pre-v0.31.3 they wrote `JSON.stringify(...)` strings into JSONB columns via the postgres.js template tag's loose typing ... the column was technically JSONB but stored as a quoted string, so reads via `params->>'op'` returned the encoded string `"search"` instead of `search`. Migration v46 normalizes the entire backlog of pre-v0.31.3 string-shaped rows up to objects so the `/admin/api/requests` endpoint returns one consistent shape to the SPA. Idempotent. + +**Bun's `process.ppid` cache no longer creates a phantom watchdog.** Bun (and Node ... see [oven-sh/bun#30305](https://github.com/oven-sh/bun/issues/30305)) caches `process.ppid` at process creation and never refreshes when the kernel re-parents. The watchdog now runs `spawnSync('ps','-o','ppid=','-p',PID)` per tick to read the live kernel PPID. A one-shot startup probe verifies ps is available; if it isn't (stripped containers, busybox without procps), the watchdog skips installing entirely AND emits a loud stderr line ... instead of silently running every 5 seconds and never firing. + +**Startup probe for stripped-container deployments.** If `ps` isn't on PATH, `gbrain serve` prints `[gbrain serve] watchdog disabled: ps unavailable, parent-death detection unavailable ... child will rely on stdin EOF / signals only` once at startup. Operators see the degraded mode at boot instead of wondering why their phantom watchdog never fires. + +### Numbers that matter + +| Failure mode | Before v0.31.3 | After v0.31.3 | +|---|---|---| +| Parent dies, stdin still open (launchd/cron orphan) | Lock held forever, next `gbrain serve` blocks 5+ minutes | Watchdog fires within 5s, lock released, next start clean | +| Parent reparents to subreaper PID 47 (systemd, tmux) | Watchdog silently no-op (only checked `ppid === 1`) | Watchdog fires (`ppid !== initialParentPid`) | +| `gbrain serve --http` against PGLite brain | Hard fail at startup, "Postgres only" | Works | +| `gbrain auth create` with `DATABASE_URL` set + PGLite config file | Silently writes to PGLite, env var ignored | DATABASE_URL wins, infers Postgres engine, clears stale `database_path` | +| Read `mcp_request_log.params` in admin SPA after a pre-v0.31.3 row was logged | Returned a JSON-encoded string `"{\"op\":\"search\"}"` | Returns the object `{op:'search'}` | +| `params->>'op'` SQL on pre-v0.31.3 rows | Returned the encoded string `"search"` (with quotes) | After v46 migration: returns `search` | +| `bun test test/serve-stdio-lifecycle.test.ts` | (test file did not exist) | 22 cases, all green | + +### What this means for your workflow + +If your editor talks to gbrain over stdio MCP ... Claude Desktop, Cursor, MCP gateway ... you'll notice restarts happen instantly instead of hanging on lock contention. If you've been running gbrain on PGLite (the default for new installs), `gbrain auth create`, OAuth client registration, and the `--http` admin dashboard all work now. Run `gbrain upgrade` to land both fixes plus the v46 migration that normalizes any string-shaped `mcp_request_log.params` rows you accumulated. + +### Important note for upgraders + +`gbrain upgrade` runs the v46 normalizer migration automatically. If you have a long history of `gbrain serve --http` requests in `mcp_request_log`, the UPDATE could touch many rows on first run ... it's a single statement filtered to `jsonb_typeof = 'string'` and won't lock the table for long, but you may see a brief CPU spike on Supabase Postgres if the table is large. Subsequent upgrades find no string-shaped rows and the migration is a no-op. + +### How it works under the hood + +`SqlQuery` is a deliberately narrow tagged-template adapter (`src/core/sql-query.ts`) that builds positional SQL with `$N` placeholders and calls `engine.executeRaw(sql, params)`. Scalar binds only ... the contract is the feature. JSONB writes go through a separate `executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that composes positional `$N::jsonb` casts and passes JS objects through. The v0.12.0 double-encode bug class doesn't apply to positional binding through postgres.js's `unsafe()` (verified by `test/e2e/auth-permissions.test.ts:67` on Postgres and `test/sql-query.test.ts` on PGLite). + +`scripts/check-jsonb-pattern.sh` doesn't fire because `executeRawJsonb(...)` is a method call, not the banned literal-template-tag interpolation pattern. + +The watchdog reparent check is `getParentPid() !== initialParentPid`, capturing the initial ppid once at install time and firing on any change. The previous `=== 1` check missed the subreaper case that surfaced under launchd / systemd PR_SET_CHILD_SUBREAPER environments. + +### Tests + +- `test/serve-stdio-lifecycle.test.ts` (22 cases) ... signals (SIGTERM/SIGINT/SIGHUP), stdin EOF / close, TTY skip, watchdog reparent-to-1 AND reparent-to-subreaper-PID-47, ps-unavailable startup probe, idempotent shutdown under racing signals, 5s cleanup deadline when `disconnect()` rejects, `--stdio-idle-timeout` strict parse rejects (`abc`, `30junk`, `-1`, `1.5`, blank, missing). +- `test/sql-query.test.ts` (7 cases) ... scalar binds round-trip, array/promise/object rejection, `executeRawJsonb` JSONB round-trip with `jsonb_typeof = 'object'` and `->>` semantics, v0.12.0 double-encode regression guard, null JSONB handling, scalars-then-jsonb call shape. +- `test/config-env.test.ts` (5 cases, migrated to `withEnv()` helper per CLAUDE.md R1) ... `DATABASE_URL` + `GBRAIN_DATABASE_URL` precedence, file-only config, env-only config, no-config null path. +- `test/e2e/auth-takes-holders-pglite.test.ts` (6 cases against in-memory PGLite, no DATABASE_URL gate) ... full takes-holders create/update round-trip, mcp_request_log.params object + null writes, migration v46 normalizer (seed string-shaped row, run UPDATE, assert object shape; second-run no-op for idempotency). +- `test/http-transport.test.ts` (24 cases, mock updated to intercept `engine.executeRaw`) ... bearer auth, /health DB probe, 401 paths, rate limit, body cap, mcp_request_log audit. + +### To take advantage of v0.31.3 + +`gbrain upgrade` runs the v46 migration automatically on first start. Verify: + +```bash +gbrain upgrade +gbrain doctor # confirm migration v46 is applied +``` + +If you're on PGLite and want to confirm `gbrain serve --http` works: + +```bash +gbrain auth register-client test --grant-types client_credentials --scopes read +gbrain serve --http & +# Mint a token via the OAuth /token endpoint, hit /mcp with a real op. +``` + +If `gbrain doctor` flags anything after upgrade, file an issue: https://github.com/garrytan/gbrain/issues with `~/.gbrain/upgrade-errors.jsonl` if it exists and which step broke. + +### For contributors + +This PR went through `/plan-eng-review` (5 issues, all addressed) and `/codex` outside-voice consult-mode plan review (10 findings, 9 closed by re-design ... including reversing the original "extend SqlQuery with .json()" plan to "ship a separate `executeRawJsonb` helper" because codex argued the SqlQuery adapter should stay scalar-only or it drifts into a partial postgres.js clone). Decision log lives at `~/.claude/plans/system-instruction-you-are-working-peppy-moore.md`. Two atomic bisect-friendly commits in one PR. + +Closes #413, #446. Supersedes #591. Closes the architectural intent of #676 and #681. + ## [0.31.2] - 2026-05-08 **`gbrain sync --strategy code` no longer hangs on big repos. Code-strategy first-sync actually indexes code files. Walker hardened. Tree-sitter is now bounded.** diff --git a/CLAUDE.md b/CLAUDE.md index c90d11116..2aa53b1e1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -140,10 +140,12 @@ strict behavior when unset. - `src/mcp/server.ts` — MCP stdio server (generated from operations). v0.22.7: tool-call handler delegates to `dispatchToolCall` from `src/mcp/dispatch.ts` so stdio + HTTP transports share one validation, context-build, and error-format path. - `src/mcp/dispatch.ts` (v0.22.7) — Shared tool-call dispatch consumed by both stdio (`server.ts`) and HTTP transports. Exports `dispatchToolCall(engine, name, params, opts)`, `buildOperationContext(engine, params, opts)`, and `validateParams(op, params)`. Single source of truth for `(ctx, params)` handler arg order and the 5-field `OperationContext` shape (engine + config + logger + dryRun + remote). Defaults to `remote: true` (untrusted); local CLI callers pass `remote: false`. Closed F1/F2/F3 drift bugs in the original v0.22.5 HTTP transport. **v0.26.9 (F8):** adds `summarizeMcpParams(opName, params)` — privacy-preserving redactor for `mcp_request_log` and the admin SSE feed. Returns `{redacted, kind, declared_keys, unknown_key_count, approx_bytes}`. Intersects submitted top-level keys against the operation's declared `params` allow-list (declared keys preserved as a sorted array for debug visibility; unknown keys counted but never named, closing the attacker-controlled-key-name leak). Byte counts bucketed up to nearest 1KB so an attacker can't binary-search secret-content sizes via repeated probes. Operators on a personal laptop who want raw payload visibility opt back in with `gbrain serve --http --log-full-params` (loud stderr warning at startup). Canonical helper — new logging code paths route through it rather than `JSON.stringify(params)`. - `src/mcp/rate-limit.ts` (v0.22.7) — Bounded-LRU token-bucket limiter. `buildDefaultLimiters()` returns the two-bucket pipeline: pre-auth IP (30/60s, fires BEFORE the DB lookup so brute-force load against `access_tokens` is actually capped) + post-auth token-id (60/60s). Tracks `lastTouchedMs` separately from `lastRefillMs` so an exhausted key can't be reset by hammering past the TTL. LRU cap bounds memory under attacker-controlled key growth. -- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. +- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. **v0.31.3 (#681):** every OAuth/admin/audit SQL call routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` so `gbrain serve --http` works against PGLite brains. The four `mcp_request_log.params` INSERT sites (success path, auth_failed path, scope_denied path, server-error path) all go through `executeRawJsonb(engine, ...)` so the JSONB column stores real objects, not JSON-encoded strings — closes the bug where `params->>'op'` returned the encoded string `"search"` (with quotes) instead of `search`. Migration v46 normalizes any pre-v0.31.3 string-shaped backlog rows on first start. +- `src/core/sql-query.ts` (v0.31.3) — Engine-aware tagged-template SQL adapter for OAuth/admin/auth infrastructure. `sqlQueryForEngine(engine)` returns a `SqlQuery` (`(strings, ...values) => Promise`) that walks the template, builds `$N` positional SQL, asserts every value is a `SqlValue` (string | number | bigint | boolean | Date | null), and routes through `engine.executeRaw(sql, params)` so Postgres goes via postgres.js's `unsafe(sql, params)` path and PGLite via its embedded `db.query(sql, params)`. Deliberately narrower than postgres.js's `sql` tag: no nested fragments, no `sql.json()`, no `sql.unsafe()`, no `sql.begin()`, no array binding. The narrow surface is the feature — codex finding #7 from the v0.31 plan review argued the adapter should stay scalar-only or it drifts into a partial postgres.js clone. JSONB writes go through the separate `executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that composes positional `$N::jsonb` casts and passes JS objects through; the v0.12.0 double-encode bug class doesn't apply because positional binding through `unsafe()` reaches the wire protocol with the correct type oid (verified by `test/sql-query.test.ts` on PGLite and `test/e2e/auth-permissions.test.ts:67` on Postgres). `scripts/check-jsonb-pattern.sh` doesn't fire because `executeRawJsonb(...)` is a method call, not the banned literal-template-tag interpolation pattern. Consumed by `src/commands/auth.ts`, `src/commands/serve-http.ts`, `src/core/oauth-provider.ts`, `src/commands/files.ts`, and `src/mcp/http-transport.ts` so all five sites work against PGLite and Postgres uniformly. Closes the bug where `gbrain auth` + `gbrain serve --http` were silently Postgres-only because they routed every SQL through the postgres.js singleton (community PR #681). +- `src/commands/serve.ts` (v0.31.3) — `gbrain serve` stdio MCP entrypoint with idempotent shutdown across every parent-disconnect signal. Stdio EOF, SIGTERM, SIGINT, SIGHUP, and parent-process death (every reparent case — PID 1, launchd subreaper, systemd, tmux, or a parent shell with `PR_SET_CHILD_SUBREAPER`) all funnel into one `cleanup(reason)` path that releases the engine and the PGLite write-lock dir within 5 seconds. Pre-v0.31.3 the stdio MCP server held the lock indefinitely after Claude Desktop / Cursor / launchd-managed gateways disconnected, forcing a 5-minute stale-lock wait on the next start. Watchdog reparent check is `getParentPid() !== initialParentPid` (capturing the initial ppid once at install time and firing on any change); the previous `=== 1` check missed the subreaper case under launchd / systemd. Bun's `process.ppid` cache is stale across reparenting (see [oven-sh/bun#30305](https://github.com/oven-sh/bun/issues/30305)) so `getParentPid()` runs `spawnSync('ps', ['-o', 'ppid=', '-p', PID])` per tick to read the live kernel PPID. Startup probe verifies `ps` is on PATH; if not (stripped containers, busybox without procps), the watchdog skips installing AND emits a loud `[gbrain serve] watchdog disabled: ps unavailable, parent-death detection unavailable — child will rely on stdin EOF / signals only` stderr line so operators see the degraded mode at boot. Pinned by `test/serve-stdio-lifecycle.test.ts` (22 cases). Closes #413, #446. Credit @Aragorn2046 (origin features in #591) and @seungsu-kr (rebased submitter, Bun ppid workaround). - `src/core/oauth-provider.ts` (v0.26.0) — `GBrainOAuthProvider` implementing the MCP SDK's `OAuthServerProvider` + `OAuthRegisteredClientsStore` interfaces. Backed by raw SQL (works on both PGLite and Postgres — OAuth is infrastructure, not a BrainEngine concern). Full OAuth 2.1 spec: `authorize` + `exchangeAuthorizationCode` with PKCE (for ChatGPT), `client_credentials` (for Perplexity / Claude), `refresh_token` with rotation, `revokeToken`, `registerClient` (DCR path validates redirect_uri must be `https://` or loopback per RFC 6749 §3.1.2.1). All tokens + client secrets SHA-256 hashed before storage. Auth codes single-use with 10-minute TTL via atomic `DELETE...RETURNING` (closes RFC 6749 §10.5 TOCTOU race). Refresh rotation also `DELETE...RETURNING` (closes §10.4 stolen-token detection bypass). `pgArray()` escapes commas/quotes/braces in elements so a comma-bearing redirect_uri can't smuggle a second array element. Legacy `access_tokens` fallback in `verifyAccessToken` grandfathers pre-v0.26 bearer tokens as `read+write+admin`. `sweepExpiredTokens()` runs on startup wrapped in try/catch. **v0.26.9 RFC 6749/7009 hardening pass:** F1+F2 fold `client_id` atomically into the `DELETE WHERE` clauses for both auth-code exchange and refresh rotation — pre-fix the post-hoc client compare burned the row on wrong-client paths so the legitimate client couldn't retry. F3 enforces refresh-scope-subset against the original grant on the row (RFC 6749 §6), not the client's currently-allowed scopes — fixes the case where revoking a scope from a client wouldn't shrink the agent's existing refresh tokens. F4 binds `client_id` on `revokeToken` so a client can only revoke its own tokens (RFC 7009 §2.1). F7c validates the `/token` request's `redirect_uri` against the value stored at `/authorize` (RFC 6749 §4.1.3) — empty-string treated as missing rather than wildcard match (adversarial-review fix). F5 swaps bare `catch {}` blocks in `verifyAccessToken` and `getClient` for `isUndefinedColumnError` from `src/core/utils.ts` — only SQLSTATE 42703 falls through to legacy fallback; lock timeouts and network blips throw and surface. F6 makes `sweepExpiredTokens()` actually return the count via `RETURNING 1` + array length, not a fire-and-forget zero. F12 adds `dcrDisabled` constructor option so `serve-http.ts` can disable the `/register` endpoint without monkey-patching the router. **v0.26.2:** module-private `coerceTimestamp()` boundary helper at the top of the file normalizes postgres-driver-as-string BIGINT columns to JS numbers at every read site (5 call sites: `getClient` L112+L113 for DCR `/register` RFC 7591 §3.2.1 numeric timestamps, `exchangeRefreshToken` L274 + `verifyAccessToken` L296+L303 for the SDK's `typeof === 'number'` bearerAuth check). Throws on non-finite input (NaN/Infinity) so corrupt rows fail loud at the boundary instead of riding through as `expiresAt: NaN`; returns undefined for SQL NULL so callers decide NULL semantics explicitly (refresh + access token paths treat NULL as expired). Helper intentionally NOT promoted to `src/core/utils.ts` — codex review flagged repo-wide BIGINT precision-loss risk for a generic helper. - `admin/` (v0.26.0) — React 19 + Vite + TypeScript admin SPA embedded in the binary via `admin/dist/` served by `serve-http.ts`. 7 screens: Login (bootstrap token → session cookie), Dashboard (metrics + SSE feed + token health), Agents (sortable table + sparklines + Register button), Register (modal with scope checkboxes + grant type selector), Credentials reveal (full-screen modal with Copy + Download JSON + yellow one-time-only warning), Request Log (filterable paginated), Agent Detail drawer (Details / Activity / Config Export tabs + Revoke). Design tokens: `#0a0a0f` bg, Inter for UI, JetBrains Mono for data, 4-32px spacing scale, rounded pill badges. HTTP-only SameSite=Strict cookie auth. 65KB gzip. Build: `cd admin && bun install && bun run build`; output at `admin/dist/` is committed for self-contained binaries. -- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. +- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. **v0.31.3 (#681):** every SQL site routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` (and `executeRawJsonb` for the takes-holders `permissions` JSONB column) so `gbrain auth` works against PGLite brains. Pre-fix, every call hit the postgres.js singleton via `getConn()` and silently failed (or wrote to the wrong DB) when the active engine was PGLite. The takes-holders write goes through `executeRawJsonb(engine, sql, [name, hash], [{takes_holders:[...]}])` which round-trips with `jsonb_typeof = 'object'` instead of the pre-v0.31.3 quoted-string shape. - `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally. - `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed. - `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent. @@ -153,7 +155,7 @@ strict behavior when unset. - `src/commands/transcripts.ts` (v0.29) — `gbrain transcripts recent [--days N] [--full] [--json]`: recent raw `.txt` transcripts from the dream-cycle corpus dirs. Imports `listRecentTranscripts` from `src/core/transcripts.ts` (the same library the gated `get_recent_transcripts` MCP op uses). Local-only by construction — the CLI always runs with `ctx.remote=false`. - `src/commands/integrity.ts` — `gbrain integrity check|auto|review|extract`: bare-tweet detection, dead-link detection, three-bucket repair (auto-repair / review-queue / skip). `scanIntegrity()` is the shared library function called from `gbrain doctor` (sampled at limit=500) and `cmdCheck` (full scan). v0.22.8: batch-load fast path on Postgres uses `SELECT DISTINCT ON (slug)` in a single SQL query to fix the PgBouncer round-trip timeout (60s → ~6s) while preserving `engine.getAllSlugs()`'s `Set` semantics on multi-source brains. Gated by `engine.kind === 'postgres'` at the call site so PGLite never enters batch; fallback `catch` logs at `GBRAIN_DEBUG=1` so real Postgres errors are diagnosable. - `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). v0.15.2: every DB check is wrapped in a progress phase; `markdown_body_completeness` runs under a 1s heartbeat timer so 10+ min scans are observable on 50K-page brains. v0.19.1 added `queue_health` (Postgres-only) with two subchecks: stalled-forever active jobs (started_at > 1h) and waiting-depth-per-name > threshold (default 10, override via `GBRAIN_QUEUE_WAITING_THRESHOLD`). Worker-heartbeat subcheck intentionally deferred to follow-up B7 because it needs a `minion_workers` table to produce ground-truth signal. Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, `gbrain apply-migrations`, and `gbrain jobs get/cancel `. v0.22.12 (#500): `sync_failures` check shows `[CODE=N, ...]` breakdown for both unacked entries (warn) and acked-historical entries (ok), surfacing systemic failure modes (`SLUG_MISMATCH=2685`) instead of a bare count. v0.26.7 (#612): `rls_event_trigger` check (post-install drift detector for migration v35's auto-RLS event trigger). Lives outside the `// 5. RLS` slice that the structural doctor.test.ts guards anchor on, so the existing test guards stay intact. Healthy `evtenabled` set is `('O','A')` only — `R` is replica-only and would not fire in normal sessions; `D` is disabled. Fix hint is `gbrain apply-migrations --force-retry 35`. **v0.30.2:** `queue_health` gains a fourth subcheck — surfaces dead-lettered subagent jobs with `last_error` matching the `prompt_too_long` classifier within the last 24h. Fix hint points at `gbrain dream --phase synthesize --dry-run --json` to identify the offending transcript and `gbrain jobs prune --status dead --queue default` to clean up. Postgres-only. -- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. +- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. **v46 (v0.31.3):** `mcp_request_log_params_jsonb_normalize` rewrites pre-v0.31.3 rows where `mcp_request_log.params` was stored as a JSON-encoded string (`jsonb_typeof = 'string'`) up to a real JSONB object via `UPDATE ... SET params = params::text::jsonb WHERE jsonb_typeof(params) = 'string'`. Single statement, idempotent — second-run finds no string-shaped rows and is a no-op. Closes the bug where `/admin/api/requests` returned a quoted string instead of the parsed object. - `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2. - `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers. - `src/core/db-lock.ts` (v0.22.13) — generic `tryAcquireDbLock(engine, lockId, ttlMinutes)` over the existing `gbrain_cycle_locks` table. Parameterized lock id so different scopes can nest cleanly: `gbrain-cycle` for the broad cycle (held by `cycle.ts`) and `gbrain-sync` (`SYNC_LOCK_ID` constant) for `performSync`'s narrower writer window. Same UPSERT-with-TTL semantics as the prior cycle-only helper, just generalized. Survives PgBouncer transaction pooling (unlike session-scoped `pg_try_advisory_lock`); crashed holders auto-release once their TTL expires. diff --git a/VERSION b/VERSION index d1ae0a019..8239f42dc 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.31.2 \ No newline at end of file +0.31.3 diff --git a/llms-full.txt b/llms-full.txt index d03197865..bec39df6b 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -240,10 +240,12 @@ strict behavior when unset. - `src/mcp/server.ts` — MCP stdio server (generated from operations). v0.22.7: tool-call handler delegates to `dispatchToolCall` from `src/mcp/dispatch.ts` so stdio + HTTP transports share one validation, context-build, and error-format path. - `src/mcp/dispatch.ts` (v0.22.7) — Shared tool-call dispatch consumed by both stdio (`server.ts`) and HTTP transports. Exports `dispatchToolCall(engine, name, params, opts)`, `buildOperationContext(engine, params, opts)`, and `validateParams(op, params)`. Single source of truth for `(ctx, params)` handler arg order and the 5-field `OperationContext` shape (engine + config + logger + dryRun + remote). Defaults to `remote: true` (untrusted); local CLI callers pass `remote: false`. Closed F1/F2/F3 drift bugs in the original v0.22.5 HTTP transport. **v0.26.9 (F8):** adds `summarizeMcpParams(opName, params)` — privacy-preserving redactor for `mcp_request_log` and the admin SSE feed. Returns `{redacted, kind, declared_keys, unknown_key_count, approx_bytes}`. Intersects submitted top-level keys against the operation's declared `params` allow-list (declared keys preserved as a sorted array for debug visibility; unknown keys counted but never named, closing the attacker-controlled-key-name leak). Byte counts bucketed up to nearest 1KB so an attacker can't binary-search secret-content sizes via repeated probes. Operators on a personal laptop who want raw payload visibility opt back in with `gbrain serve --http --log-full-params` (loud stderr warning at startup). Canonical helper — new logging code paths route through it rather than `JSON.stringify(params)`. - `src/mcp/rate-limit.ts` (v0.22.7) — Bounded-LRU token-bucket limiter. `buildDefaultLimiters()` returns the two-bucket pipeline: pre-auth IP (30/60s, fires BEFORE the DB lookup so brute-force load against `access_tokens` is actually capped) + post-auth token-id (60/60s). Tracks `lastTouchedMs` separately from `lastRefillMs` so an exhausted key can't be reset by hammering past the TTL. LRU cap bounds memory under attacker-controlled key growth. -- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. +- `src/commands/serve-http.ts` (v0.26.0) — Express 5 HTTP MCP server with OAuth 2.1, admin dashboard, and SSE live activity feed. Started via `gbrain serve --http [--port N] [--token-ttl N] [--enable-dcr] [--public-url URL] [--log-full-params]`. Supersedes the v0.22.7 `src/mcp/http-transport.ts` simple bearer-auth path. Combines MCP SDK's `mcpAuthRouter` (authorize / token / register / revoke endpoints), a custom `client_credentials` handler (SDK's token endpoint throws `UnsupportedGrantTypeError` for CC; the custom handler runs BEFORE the router and falls through for `auth_code` / `refresh_token`), `requireBearerAuth` middleware for `/mcp` with scope enforcement before op dispatch, `localOnly` rejection, and `express-rate-limit` at 50 req / 15 min on `/token`. Serves the built admin SPA from `admin/dist/` with SPA fallback. `/admin/events` SSE endpoint broadcasts every MCP request to connected admin browsers. `cookie-parser` middleware wired (Express 5 has no built-in). Startup logging prints port, engine, configured issuer URL (honors `--public-url`), registered-client count, DCR status, and admin bootstrap token. **v0.26.9 hardening pass:** F7 sets `remote: true` explicitly on the `/mcp` request handler's OperationContext literal (closes the HTTP shell-job RCE — without this, `submit_job`'s protected-name guard at `operations.ts:1391` saw a falsy undefined and skipped, letting a `read+write`-scoped OAuth token submit `shell` jobs). F8 wires `summarizeMcpParams` from `src/mcp/dispatch.ts` into both `mcp_request_log` writes and the admin SSE feed by default (raw payloads opt-in via `--log-full-params` with stderr warning). F9 sets cookie `Secure` flag when behind HTTPS or a public-URL proxy. F10 caps the magic-link nonce store with an LRU bound. F12 routes DCR disable through the `GBrainOAuthProvider` constructor's `dcrDisabled` option instead of the prior monkey-patch on the express router. F14 wraps `transport.handleRequest` in try/catch so SDK throws return a JSON-RPC 500 envelope instead of express's default HTML error page. F15 unifies OperationError + unexpected exceptions through `buildError` / `serializeError` so `/mcp` always returns the same envelope shape. **v0.28.1:** `/health` endpoint extracted into pure `probeHealth(engine)` async function with `HEALTH_TIMEOUT_MS = 3000` exported constant — drops the timeout from 5s to 3s so Fly.io's 5s health-check deadline gets 2s of headroom for TCP, response framing, and clock skew. Races `engine.getStats()` against the timeout via `Promise.race`; saturated pool returns 503 with `Health check timed out (database pool may be saturated)` instead of hanging. `clearTimeout` in finally block prevents pending-timer pile-up under high probe rates (race-leak fix from adversarial review). **v0.28.10:** `/health` is now liveness-only via the new `probeLiveness(sql, engineName, version, timeoutMs)` helper that races `sql\`SELECT 1\`` against `HEALTH_TIMEOUT_MS` and returns the same `ProbeHealthResult` tagged-union as `probeHealth` (single timer-cleanup site, single 503 envelope). Body shape: `{status, version, engine}` only — engine stats are no longer spread on the public route. Full stats moved to a new admin endpoint `/admin/api/full-stats` (sibling to `/admin/api/stats` and `/admin/api/health-indicators`) gated by the existing `requireAdmin` middleware; that route calls `probeHealth(engine, ...)` and returns the original spread-stats body. `?full=true` query param removed entirely. Closes the original DoS surface where `getStats()`'s 6× count(*) on 96K-page brains through PgBouncer exceeded `HEALTH_TIMEOUT_MS` and triggered orchestrator restart cascades (Fly.io / k8s seeing 503 → restart loop → advisory-lock pile-up on the migration lock). Outside-voice review (Codex) caught that `/admin/api/health-indicators` is NOT a full-stats endpoint (returns only `{expiring_soon, error_rate}`), and that an alternative loopback-IP gate would have depended on `app.set('trust proxy', 'loopback')` semantics holding under proxy/XFF misconfiguration; the shipped admin-cookie design avoids both. **v0.31.3 (#681):** every OAuth/admin/audit SQL call routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` so `gbrain serve --http` works against PGLite brains. The four `mcp_request_log.params` INSERT sites (success path, auth_failed path, scope_denied path, server-error path) all go through `executeRawJsonb(engine, ...)` so the JSONB column stores real objects, not JSON-encoded strings — closes the bug where `params->>'op'` returned the encoded string `"search"` (with quotes) instead of `search`. Migration v46 normalizes any pre-v0.31.3 string-shaped backlog rows on first start. +- `src/core/sql-query.ts` (v0.31.3) — Engine-aware tagged-template SQL adapter for OAuth/admin/auth infrastructure. `sqlQueryForEngine(engine)` returns a `SqlQuery` (`(strings, ...values) => Promise`) that walks the template, builds `$N` positional SQL, asserts every value is a `SqlValue` (string | number | bigint | boolean | Date | null), and routes through `engine.executeRaw(sql, params)` so Postgres goes via postgres.js's `unsafe(sql, params)` path and PGLite via its embedded `db.query(sql, params)`. Deliberately narrower than postgres.js's `sql` tag: no nested fragments, no `sql.json()`, no `sql.unsafe()`, no `sql.begin()`, no array binding. The narrow surface is the feature — codex finding #7 from the v0.31 plan review argued the adapter should stay scalar-only or it drifts into a partial postgres.js clone. JSONB writes go through the separate `executeRawJsonb(engine, sql, scalarParams, jsonbParams)` helper that composes positional `$N::jsonb` casts and passes JS objects through; the v0.12.0 double-encode bug class doesn't apply because positional binding through `unsafe()` reaches the wire protocol with the correct type oid (verified by `test/sql-query.test.ts` on PGLite and `test/e2e/auth-permissions.test.ts:67` on Postgres). `scripts/check-jsonb-pattern.sh` doesn't fire because `executeRawJsonb(...)` is a method call, not the banned literal-template-tag interpolation pattern. Consumed by `src/commands/auth.ts`, `src/commands/serve-http.ts`, `src/core/oauth-provider.ts`, `src/commands/files.ts`, and `src/mcp/http-transport.ts` so all five sites work against PGLite and Postgres uniformly. Closes the bug where `gbrain auth` + `gbrain serve --http` were silently Postgres-only because they routed every SQL through the postgres.js singleton (community PR #681). +- `src/commands/serve.ts` (v0.31.3) — `gbrain serve` stdio MCP entrypoint with idempotent shutdown across every parent-disconnect signal. Stdio EOF, SIGTERM, SIGINT, SIGHUP, and parent-process death (every reparent case — PID 1, launchd subreaper, systemd, tmux, or a parent shell with `PR_SET_CHILD_SUBREAPER`) all funnel into one `cleanup(reason)` path that releases the engine and the PGLite write-lock dir within 5 seconds. Pre-v0.31.3 the stdio MCP server held the lock indefinitely after Claude Desktop / Cursor / launchd-managed gateways disconnected, forcing a 5-minute stale-lock wait on the next start. Watchdog reparent check is `getParentPid() !== initialParentPid` (capturing the initial ppid once at install time and firing on any change); the previous `=== 1` check missed the subreaper case under launchd / systemd. Bun's `process.ppid` cache is stale across reparenting (see [oven-sh/bun#30305](https://github.com/oven-sh/bun/issues/30305)) so `getParentPid()` runs `spawnSync('ps', ['-o', 'ppid=', '-p', PID])` per tick to read the live kernel PPID. Startup probe verifies `ps` is on PATH; if not (stripped containers, busybox without procps), the watchdog skips installing AND emits a loud `[gbrain serve] watchdog disabled: ps unavailable, parent-death detection unavailable — child will rely on stdin EOF / signals only` stderr line so operators see the degraded mode at boot. Pinned by `test/serve-stdio-lifecycle.test.ts` (22 cases). Closes #413, #446. Credit @Aragorn2046 (origin features in #591) and @seungsu-kr (rebased submitter, Bun ppid workaround). - `src/core/oauth-provider.ts` (v0.26.0) — `GBrainOAuthProvider` implementing the MCP SDK's `OAuthServerProvider` + `OAuthRegisteredClientsStore` interfaces. Backed by raw SQL (works on both PGLite and Postgres — OAuth is infrastructure, not a BrainEngine concern). Full OAuth 2.1 spec: `authorize` + `exchangeAuthorizationCode` with PKCE (for ChatGPT), `client_credentials` (for Perplexity / Claude), `refresh_token` with rotation, `revokeToken`, `registerClient` (DCR path validates redirect_uri must be `https://` or loopback per RFC 6749 §3.1.2.1). All tokens + client secrets SHA-256 hashed before storage. Auth codes single-use with 10-minute TTL via atomic `DELETE...RETURNING` (closes RFC 6749 §10.5 TOCTOU race). Refresh rotation also `DELETE...RETURNING` (closes §10.4 stolen-token detection bypass). `pgArray()` escapes commas/quotes/braces in elements so a comma-bearing redirect_uri can't smuggle a second array element. Legacy `access_tokens` fallback in `verifyAccessToken` grandfathers pre-v0.26 bearer tokens as `read+write+admin`. `sweepExpiredTokens()` runs on startup wrapped in try/catch. **v0.26.9 RFC 6749/7009 hardening pass:** F1+F2 fold `client_id` atomically into the `DELETE WHERE` clauses for both auth-code exchange and refresh rotation — pre-fix the post-hoc client compare burned the row on wrong-client paths so the legitimate client couldn't retry. F3 enforces refresh-scope-subset against the original grant on the row (RFC 6749 §6), not the client's currently-allowed scopes — fixes the case where revoking a scope from a client wouldn't shrink the agent's existing refresh tokens. F4 binds `client_id` on `revokeToken` so a client can only revoke its own tokens (RFC 7009 §2.1). F7c validates the `/token` request's `redirect_uri` against the value stored at `/authorize` (RFC 6749 §4.1.3) — empty-string treated as missing rather than wildcard match (adversarial-review fix). F5 swaps bare `catch {}` blocks in `verifyAccessToken` and `getClient` for `isUndefinedColumnError` from `src/core/utils.ts` — only SQLSTATE 42703 falls through to legacy fallback; lock timeouts and network blips throw and surface. F6 makes `sweepExpiredTokens()` actually return the count via `RETURNING 1` + array length, not a fire-and-forget zero. F12 adds `dcrDisabled` constructor option so `serve-http.ts` can disable the `/register` endpoint without monkey-patching the router. **v0.26.2:** module-private `coerceTimestamp()` boundary helper at the top of the file normalizes postgres-driver-as-string BIGINT columns to JS numbers at every read site (5 call sites: `getClient` L112+L113 for DCR `/register` RFC 7591 §3.2.1 numeric timestamps, `exchangeRefreshToken` L274 + `verifyAccessToken` L296+L303 for the SDK's `typeof === 'number'` bearerAuth check). Throws on non-finite input (NaN/Infinity) so corrupt rows fail loud at the boundary instead of riding through as `expiresAt: NaN`; returns undefined for SQL NULL so callers decide NULL semantics explicitly (refresh + access token paths treat NULL as expired). Helper intentionally NOT promoted to `src/core/utils.ts` — codex review flagged repo-wide BIGINT precision-loss risk for a generic helper. - `admin/` (v0.26.0) — React 19 + Vite + TypeScript admin SPA embedded in the binary via `admin/dist/` served by `serve-http.ts`. 7 screens: Login (bootstrap token → session cookie), Dashboard (metrics + SSE feed + token health), Agents (sortable table + sparklines + Register button), Register (modal with scope checkboxes + grant type selector), Credentials reveal (full-screen modal with Copy + Download JSON + yellow one-time-only warning), Request Log (filterable paginated), Agent Detail drawer (Details / Activity / Config Export tabs + Revoke). Design tokens: `#0a0a0f` bg, Inter for UI, JetBrains Mono for data, 4-32px spacing scale, rounded pill badges. HTTP-only SameSite=Strict cookie auth. 65KB gzip. Build: `cd admin && bun install && bun run build`; output at `admin/dist/` is committed for self-contained binaries. -- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. +- `src/commands/auth.ts` — Token management. `gbrain auth create/list/revoke/test` for legacy bearer tokens (v0.22.7 wired as a first-class CLI subcommand) plus `gbrain auth register-client` (v0.26.0) and `gbrain auth revoke-client ` (v0.26.2) for OAuth 2.1 client lifecycle. `revoke-client` runs an atomic `DELETE...RETURNING` on `oauth_clients`; FK `ON DELETE CASCADE` on `oauth_tokens.client_id` and `oauth_codes.client_id` purges every active token + authorization code in a single transaction. `process.exit(1)` on no-such-client (idempotent — re-running on the same id produces the same exit-1 message). Legacy tokens stored as SHA-256 hashes in `access_tokens`; OAuth clients in `oauth_clients`. As of v0.26.0, legacy tokens grandfather to `read+write+admin` scopes on the OAuth HTTP server, so pre-v0.26 deployments keep working with no migration. **v0.31.3 (#681):** every SQL site routes through `sqlQueryForEngine(engine)` from `src/core/sql-query.ts` (and `executeRawJsonb` for the takes-holders `permissions` JSONB column) so `gbrain auth` works against PGLite brains. Pre-fix, every call hit the postgres.js singleton via `getConn()` and silently failed (or wrote to the wrong DB) when the active engine was PGLite. The takes-holders write goes through `executeRawJsonb(engine, sql, [name, hash], [{takes_holders:[...]}])` which round-trips with `jsonb_typeof = 'object'` instead of the pre-v0.31.3 quoted-string shape. - `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally. - `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). `v0_14_0.ts` = shell-jobs + autopilot cooperative (2 phases: schema ALTER minion_jobs.max_stalled SET DEFAULT 3 — superseded by v0.14.3's schema-level DEFAULT 5 + UPDATE backfill; pending-host-work ping for skills/migrations/v0.14.0.md). All orchestrators are idempotent and resumable from `partial` status. As of v0.14.2 (Bug 3), the RUNNER owns all ledger writes — orchestrators return `OrchestratorResult` and `apply-migrations.ts` persists a canonical `{version, status, phases}` shape after return. Orchestrators no longer call `appendCompletedMigration` directly. `statusForVersion` prefers `complete` over `partial` (never regresses). 3 consecutive partials → wedged → `--force-retry ` writes a `'retry'` reset marker. v0.14.3 (fix wave) ships schema-only migrations v14 (`pages_updated_at_index`) + v15 (`minion_jobs_max_stalled_default_5` with UPDATE backfill) via the `MIGRATIONS` array in `src/core/migrate.ts` — no orchestrator phases needed. - `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent. @@ -253,7 +255,7 @@ strict behavior when unset. - `src/commands/transcripts.ts` (v0.29) — `gbrain transcripts recent [--days N] [--full] [--json]`: recent raw `.txt` transcripts from the dream-cycle corpus dirs. Imports `listRecentTranscripts` from `src/core/transcripts.ts` (the same library the gated `get_recent_transcripts` MCP op uses). Local-only by construction — the CLI always runs with `ctx.remote=false`. - `src/commands/integrity.ts` — `gbrain integrity check|auto|review|extract`: bare-tweet detection, dead-link detection, three-bucket repair (auto-repair / review-queue / skip). `scanIntegrity()` is the shared library function called from `gbrain doctor` (sampled at limit=500) and `cmdCheck` (full scan). v0.22.8: batch-load fast path on Postgres uses `SELECT DISTINCT ON (slug)` in a single SQL query to fix the PgBouncer round-trip timeout (60s → ~6s) while preserving `engine.getAllSlugs()`'s `Set` semantics on multi-source brains. Gated by `engine.kind === 'postgres'` at the call site so PGLite never enters batch; fallback `catch` logs at `GBRAIN_DEBUG=1` so real Postgres errors are diagnosable. - `src/commands/doctor.ts` — `gbrain doctor [--json] [--fast] [--fix] [--dry-run] [--index-audit]`: health checks. v0.12.3 added `jsonb_integrity` + `markdown_body_completeness` reliability checks. v0.14.1: `--fix` delegates inlined cross-cutting rules to `> **Convention:** see [path](path).` callouts (pipes DRY violations into `src/core/dry-fix.ts`); `--fix --dry-run` previews without writing. v0.14.2: `schema_version` check fails loudly when `version=0` (migrations never ran — the #218 `bun install -g` signature) and routes users to `gbrain apply-migrations --yes`; new opt-in `--index-audit` flag (Postgres-only) reports zero-scan indexes from `pg_stat_user_indexes` (informational only, no auto-drop). v0.15.2: every DB check is wrapped in a progress phase; `markdown_body_completeness` runs under a 1s heartbeat timer so 10+ min scans are observable on 50K-page brains. v0.19.1 added `queue_health` (Postgres-only) with two subchecks: stalled-forever active jobs (started_at > 1h) and waiting-depth-per-name > threshold (default 10, override via `GBRAIN_QUEUE_WAITING_THRESHOLD`). Worker-heartbeat subcheck intentionally deferred to follow-up B7 because it needs a `minion_workers` table to produce ground-truth signal. Fix hints point at `gbrain repair-jsonb`, `gbrain sync --force`, `gbrain apply-migrations`, and `gbrain jobs get/cancel `. v0.22.12 (#500): `sync_failures` check shows `[CODE=N, ...]` breakdown for both unacked entries (warn) and acked-historical entries (ok), surfacing systemic failure modes (`SLUG_MISMATCH=2685`) instead of a bare count. v0.26.7 (#612): `rls_event_trigger` check (post-install drift detector for migration v35's auto-RLS event trigger). Lives outside the `// 5. RLS` slice that the structural doctor.test.ts guards anchor on, so the existing test guards stay intact. Healthy `evtenabled` set is `('O','A')` only — `R` is replica-only and would not fire in normal sessions; `D` is disabled. Fix hint is `gbrain apply-migrations --force-retry 35`. **v0.30.2:** `queue_health` gains a fourth subcheck — surfaces dead-lettered subagent jobs with `last_error` matching the `prompt_too_long` classifier within the last 24h. Fix hint points at `gbrain dream --phase synthesize --dry-run --json` to identify the offending transcript and `gbrain jobs prune --status dead --queue default` to clean up. Postgres-only. -- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. +- `src/core/migrate.ts` — schema-migration runner. Owns the `MIGRATIONS` array (source of truth for schema DDL). **v40 (v0.29):** `pages_emotional_weight` adds `pages.emotional_weight REAL NOT NULL DEFAULT 0.0`. Column-only (no index). On Postgres 11+ and PGLite, `ADD COLUMN` with a constant DEFAULT is metadata-only — instant on tables of any size. v0.14.2 extended the `Migration` interface with `sqlFor?: { postgres?, pglite? }` (engine-specific SQL overrides `sql`) and `transaction?: boolean` (set to false for `CREATE INDEX CONCURRENTLY`, which Postgres refuses inside a transaction; ignored on PGLite since it has no concurrent writers). Migration v14 (fix wave) uses a handler branching on `engine.kind` to run CONCURRENTLY on Postgres (with a pre-drop of any invalid remnant via `pg_index.indisvalid`) and plain `CREATE INDEX` on PGLite. v15 bumps `minion_jobs.max_stalled` default 1→5 and backfills existing non-terminal rows. v0.22.6.1: migration v24 (`rls_backfill_missing_tables`) uses `sqlFor: { pglite: '' }` to no-op on PGLite — PGLite has no RLS engine and is single-tenant by definition, and the v24 ALTERs target subagent tables that don't exist in pglite-schema.ts. Closes #395 (contributed by @jdcastro2). **v30 (v0.23):** creates `dream_verdicts (file_path TEXT, content_hash TEXT, worth_processing BOOL, reasons JSONB, judged_at TIMESTAMPTZ, PK(file_path, content_hash))`. RLS-enabled when running as a BYPASSRLS role. The synthesize phase reads/writes this table to avoid re-judging on backfill re-runs. **v35 (v0.26.7):** auto-RLS event trigger + one-time backfill. `auto_rls_on_create_table` fires on `ddl_command_end` for `WHEN TAG IN ('CREATE TABLE','CREATE TABLE AS','SELECT INTO')` and runs `ALTER TABLE … ENABLE ROW LEVEL SECURITY` on every new `public.*` table — no FORCE (matches v24/v29/schema.sql posture so non-BYPASSRLS apps can still read their own tables). The same migration backfills RLS on every existing `public.*` base table whose comment doesn't match the doctor regex (`^GBRAIN:RLS_EXEMPT\s+reason=\S.{3,}`). Per-table failure aborts the offending CREATE TABLE (event triggers fire inside the DDL transaction); no EXCEPTION wrap — that would convert loud rollback into silent permissive default. PGLite no-op via `sqlFor.pglite: ''`. Breaking change: operators with intentionally-RLS-off public tables must add the GBRAIN:RLS_EXEMPT comment BEFORE upgrade or the backfill will flip them on. **v46 (v0.31.3):** `mcp_request_log_params_jsonb_normalize` rewrites pre-v0.31.3 rows where `mcp_request_log.params` was stored as a JSON-encoded string (`jsonb_typeof = 'string'`) up to a real JSONB object via `UPDATE ... SET params = params::text::jsonb WHERE jsonb_typeof(params) = 'string'`. Single statement, idempotent — second-run finds no string-shaped rows and is a no-op. Closes the bug where `/admin/api/requests` returned a quoted string instead of the parsed object. - `src/core/progress.ts` — Shared bulk-action progress reporter. Writes to stderr. Modes: `auto` (TTY: `\r`-rewriting; non-TTY: plain lines), `human`, `json` (JSONL), `quiet`. Rate-gated by `minIntervalMs` and `minItems`. `startHeartbeat(reporter, note)` helper for single long queries. `child()` composes phase paths. Singleton SIGINT/SIGTERM coordinator emits `abort` events for every live phase. EPIPE defense on both sync throws and stream `'error'` events. Zero dependencies. Introduced in v0.15.2. - `src/core/cli-options.ts` — Global CLI flag parser. `parseGlobalFlags(argv)` returns `{cliOpts, rest}` with `--quiet` / `--progress-json` / `--progress-interval=` stripped. `getCliOptions()` / `setCliOptions()` expose a module-level singleton so commands reach the resolved flags without parameter threading. `cliOptsToProgressOptions()` maps to reporter options. `childGlobalFlags()` returns the flag suffix to append to `execSync('gbrain ...')` calls in migration orchestrators. `OperationContext.cliOpts` extends shared-op dispatch for MCP callers. - `src/core/db-lock.ts` (v0.22.13) — generic `tryAcquireDbLock(engine, lockId, ttlMinutes)` over the existing `gbrain_cycle_locks` table. Parameterized lock id so different scopes can nest cleanly: `gbrain-cycle` for the broad cycle (held by `cycle.ts`) and `gbrain-sync` (`SYNC_LOCK_ID` constant) for `performSync`'s narrower writer window. Same UPSERT-with-TTL semantics as the prior cycle-only helper, just generalized. Survives PgBouncer transaction pooling (unlike session-scoped `pg_try_advisory_lock`); crashed holders auto-release once their TTL expires. diff --git a/package.json b/package.json index 41dfcc320..3af27aaf6 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gbrain", - "version": "0.31.2", + "version": "0.31.3", "description": "Postgres-native personal knowledge brain with hybrid RAG search", "type": "module", "main": "src/core/index.ts", diff --git a/src/commands/auth.ts b/src/commands/auth.ts index be1bed2b4..4d17409a8 100644 --- a/src/commands/auth.ts +++ b/src/commands/auth.ts @@ -11,20 +11,19 @@ * Also runs standalone (no compiled binary required): * DATABASE_URL=... bun run src/commands/auth.ts create "claude-desktop" * - * Both paths require DATABASE_URL or GBRAIN_DATABASE_URL (except `test`, - * which only hits the remote URL and doesn't need a local DB). + * DB-backed commands route through the active BrainEngine (PGLite or + * Postgres), so they work regardless of which engine the user's brain is + * configured for. The env-var DATABASE_URL / GBRAIN_DATABASE_URL still + * picks Postgres via loadConfig() (config.ts DbUrlSource inference), + * but the SQL itself goes through engine.executeRaw — never through a + * postgres.js singleton. `test` only hits a remote URL and doesn't need + * a local DB. */ -import postgres from 'postgres'; import { createHash, randomBytes } from 'crypto'; - -function getDatabaseUrl(requireDb: boolean): string | undefined { - const url = process.env.DATABASE_URL || process.env.GBRAIN_DATABASE_URL; - if (!url && requireDb) { - console.error('Set DATABASE_URL or GBRAIN_DATABASE_URL environment variable.'); - process.exit(1); - } - return url; -} +import { loadConfig, toEngineConfig } from '../core/config.ts'; +import { createEngine } from '../core/engine-factory.ts'; +import type { BrainEngine } from '../core/engine.ts'; +import { sqlQueryForEngine, executeRawJsonb, type SqlQuery } from '../core/sql-query.ts'; function hashToken(token: string): string { return createHash('sha256').update(token).digest('hex'); @@ -34,28 +33,60 @@ function generateToken(): string { return 'gbrain_' + randomBytes(32).toString('hex'); } +/** + * Acquire an engine from the active config, run `fn` with a SqlQuery, and + * disconnect afterward. Loud-fails when no config is present (matches the + * prior behavior of getDatabaseUrl(requireDb=true) — auth commands need a + * brain to write to). + */ +async function withConfiguredSql( + fn: (sql: SqlQuery, engine: BrainEngine) => Promise, +): Promise { + const config = loadConfig(); + if (!config) { + console.error('No GBrain config found. Run `gbrain init` first, or set DATABASE_URL / GBRAIN_DATABASE_URL.'); + process.exit(1); + } + const engine = await createEngine(toEngineConfig(config)); + const sql = sqlQueryForEngine(engine); + try { + return await fn(sql, engine); + } finally { + await engine.disconnect(); + } +} + async function create(name: string, opts: { takesHolders?: string[] } = {}) { if (!name) { console.error('Usage: auth create [--takes-holders world,garry]'); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); const token = generateToken(); const hash = hashToken(token); try { - // v0.28: persist per-token takes-holder allow-list. Default ['world'] keeps - // private hunches hidden from MCP-bound tokens. - const takesHolders = opts.takesHolders && opts.takesHolders.length > 0 - ? opts.takesHolders - : ['world']; - const permissions = { takes_holders: takesHolders }; - await sql` - INSERT INTO access_tokens (name, token_hash, permissions) - VALUES (${name}, ${hash}, ${sql.json(permissions as Parameters[0])}) - `; - console.log(`Token created for "${name}" (takes_holders=${JSON.stringify(takesHolders)}):\n`); - console.log(` ${token}\n`); - console.log('Save this token — it will not be shown again.'); - console.log(`Revoke with: bun run src/commands/auth.ts revoke "${name}"`); - console.log(`Update visibility: bun run src/commands/auth.ts permissions "${name}" set-takes-holders world,garry`); + await withConfiguredSql(async (_sql, engine) => { + // v0.28: persist per-token takes-holder allow-list. Default ['world'] keeps + // private hunches hidden from MCP-bound tokens. + const takesHolders = opts.takesHolders && opts.takesHolders.length > 0 + ? opts.takesHolders + : ['world']; + const permissions = { takes_holders: takesHolders }; + // JSONB write: pass the object via executeRawJsonb with an explicit + // ::jsonb cast in the SQL string. Both engines round-trip the object + // through the wire-protocol type oid without the v0.12.0 double-encode + // bug class (verified by test/e2e/auth-permissions.test.ts:67 on + // Postgres and test/sql-query.test.ts on PGLite). + await executeRawJsonb( + engine, + `INSERT INTO access_tokens (name, token_hash, permissions) + VALUES ($1, $2, $3::jsonb)`, + [name, hash], + [permissions], + ); + console.log(`Token created for "${name}" (takes_holders=${JSON.stringify(takesHolders)}):\n`); + console.log(` ${token}\n`); + console.log('Save this token — it will not be shown again.'); + console.log(`Revoke with: gbrain auth revoke "${name}"`); + console.log(`Update visibility: gbrain auth permissions "${name}" set-takes-holders world,garry`); + }); } catch (e: any) { if (e.code === '23505') { console.error(`A token named "${name}" already exists. Revoke it first or use a different name.`); @@ -63,8 +94,6 @@ async function create(name: string, opts: { takesHolders?: string[] } = {}) { console.error('Error:', e.message); } process.exit(1); - } finally { - await sql.end(); } } @@ -73,43 +102,45 @@ async function permissions(name: string, action: string, value: string | undefin console.error('Usage: auth permissions set-takes-holders world,garry,brain'); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); try { - const list = value.split(',').map(s => s.trim()).filter(Boolean); - if (list.length === 0) { - console.error('takes-holders list cannot be empty (use "world" for default-deny on private)'); - process.exit(1); - } - const perms = { takes_holders: list }; - const result = await sql` - UPDATE access_tokens - SET permissions = ${sql.json(perms as Parameters[0])} - WHERE name = ${name} - RETURNING id - `; - if (result.length === 0) { - console.error(`Token "${name}" not found.`); - process.exit(1); - } - console.log(`Updated "${name}": takes_holders = ${JSON.stringify(list)}`); + await withConfiguredSql(async (sql, engine) => { + const list = value.split(',').map(s => s.trim()).filter(Boolean); + if (list.length === 0) { + console.error('takes-holders list cannot be empty (use "world" for default-deny on private)'); + process.exit(1); + } + const perms = { takes_holders: list }; + // JSONB UPDATE via executeRawJsonb — same pattern as create() above. + const result = await executeRawJsonb( + engine, + `UPDATE access_tokens + SET permissions = $2::jsonb + WHERE name = $1 + RETURNING id`, + [name], + [perms], + ); + if (result.length === 0) { + console.error(`Token "${name}" not found.`); + process.exit(1); + } + console.log(`Updated "${name}": takes_holders = ${JSON.stringify(list)}`); + }); } catch (e: any) { console.error('Error:', e.message); process.exit(1); - } finally { - await sql.end(); } } async function list() { - const sql = postgres(getDatabaseUrl(true)!); - try { + await withConfiguredSql(async (sql) => { const rows = await sql` SELECT name, created_at, last_used_at, revoked_at FROM access_tokens ORDER BY created_at DESC `; if (rows.length === 0) { - console.log('No tokens found. Create one: bun run src/commands/auth.ts create "my-client"'); + console.log('No tokens found. Create one: gbrain auth create "my-client"'); return; } console.log('Name Created Last Used Status'); @@ -121,27 +152,23 @@ async function list() { const status = r.revoked_at ? 'REVOKED' : 'active'; console.log(`${name} ${created} ${lastUsed} ${status}`); } - } finally { - await sql.end(); - } + }); } async function revoke(name: string) { if (!name) { console.error('Usage: auth revoke '); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); - try { - const result = await sql` + await withConfiguredSql(async (sql) => { + const rows = await sql` UPDATE access_tokens SET revoked_at = now() WHERE name = ${name} AND revoked_at IS NULL + RETURNING 1 `; - if (result.count === 0) { + if (rows.length === 0) { console.error(`No active token found with name "${name}".`); process.exit(1); } console.log(`Token "${name}" revoked.`); - } finally { - await sql.end(); - } + }); } async function test(url: string, token: string) { @@ -269,26 +296,25 @@ async function revokeClient(clientId: string) { console.error('Usage: auth revoke-client '); process.exit(1); } - const sql = postgres(getDatabaseUrl(true)!); try { - // Atomic single-statement delete: no race window between count + delete. - // Postgres cascades to oauth_tokens and oauth_codes (FK ON DELETE CASCADE - // declared in src/schema.sql:370,382) before the transaction commits. - const rows = await sql` - DELETE FROM oauth_clients WHERE client_id = ${clientId} - RETURNING client_id, client_name - `; - if (rows.length === 0) { - console.error(`No client found with id "${clientId}"`); - process.exit(1); - } - console.log(`OAuth client revoked: "${rows[0].client_name}" (${clientId})`); - console.log('Tokens and authorization codes purged via cascade.'); + await withConfiguredSql(async (sql) => { + // Atomic single-statement delete: no race window between count + delete. + // Postgres cascades to oauth_tokens and oauth_codes (FK ON DELETE CASCADE + // declared in src/schema.sql:370,382) before the transaction commits. + const rows = await sql` + DELETE FROM oauth_clients WHERE client_id = ${clientId} + RETURNING client_id, client_name + `; + if (rows.length === 0) { + console.error(`No client found with id "${clientId}"`); + process.exit(1); + } + console.log(`OAuth client revoked: "${rows[0].client_name}" (${clientId})`); + console.log('Tokens and authorization codes purged via cascade.'); + }); } catch (e: any) { console.error('Error:', e.message); process.exit(1); - } finally { - await sql.end(); } } @@ -301,25 +327,24 @@ async function registerClient(name: string, args: string[]) { : ['client_credentials']; const scopes = scopesIdx >= 0 && args[scopesIdx + 1] ? args[scopesIdx + 1] : 'read'; - const sql = postgres(getDatabaseUrl(true)!); try { - const { GBrainOAuthProvider } = await import('../core/oauth-provider.ts'); - const provider = new GBrainOAuthProvider({ sql: sql as any }); - const { clientId, clientSecret } = await provider.registerClientManual( - name, grantTypes, scopes, [], - ); - console.log(`OAuth client registered: "${name}"\n`); - console.log(` Client ID: ${clientId}`); - console.log(` Client Secret: ${clientSecret}\n`); - console.log(` Grant types: ${grantTypes.join(', ')}`); - console.log(` Scopes: ${scopes}\n`); - console.log('Save the client secret — it will not be shown again.'); - console.log(`Revoke with: gbrain auth revoke-client "${clientId}"`); + await withConfiguredSql(async (sql) => { + const { GBrainOAuthProvider } = await import('../core/oauth-provider.ts'); + const provider = new GBrainOAuthProvider({ sql }); + const { clientId, clientSecret } = await provider.registerClientManual( + name, grantTypes, scopes, [], + ); + console.log(`OAuth client registered: "${name}"\n`); + console.log(` Client ID: ${clientId}`); + console.log(` Client Secret: ${clientSecret}\n`); + console.log(` Grant types: ${grantTypes.join(', ')}`); + console.log(` Scopes: ${scopes}\n`); + console.log('Save the client secret — it will not be shown again.'); + console.log(`Revoke with: gbrain auth revoke-client "${clientId}"`); + }); } catch (e: any) { console.error('Error:', e.message); process.exit(1); - } finally { - await sql.end(); } } diff --git a/src/commands/files.ts b/src/commands/files.ts index ce84c54ea..a8990bfdd 100644 --- a/src/commands/files.ts +++ b/src/commands/files.ts @@ -2,7 +2,7 @@ import { readFileSync, readdirSync, statSync, lstatSync, existsSync, writeFileSy import { join, relative, extname, basename, dirname } from 'path'; import { createHash } from 'crypto'; import type { BrainEngine } from '../core/engine.ts'; -import * as db from '../core/db.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../core/sql-query.ts'; import { humanSize } from '../core/file-resolver.ts'; import { createProgress } from '../core/progress.ts'; import { getCliOptions, cliOptsToProgressOptions } from '../core/cli-options.ts'; @@ -47,16 +47,16 @@ export async function runFiles(engine: BrainEngine, args: string[]) { switch (subcommand) { case 'list': - await listFiles(args[1]); + await listFiles(engine, args[1]); break; case 'upload': - await uploadFile(args.slice(1)); + await uploadFile(engine, args.slice(1)); break; case 'sync': - await syncFiles(args[1]); + await syncFiles(engine, args[1]); break; case 'verify': - await verifyFiles(); + await verifyFiles(engine); break; case 'mirror': await mirrorFiles(args.slice(1)); @@ -74,7 +74,7 @@ export async function runFiles(engine: BrainEngine, args: string[]) { await cleanFiles(args.slice(1)); break; case 'upload-raw': - await uploadRaw(args.slice(1)); + await uploadRaw(engine, args.slice(1)); break; case 'signed-url': await signedUrl(args.slice(1)); @@ -100,8 +100,8 @@ export async function runFiles(engine: BrainEngine, args: string[]) { } } -async function listFiles(slug?: string) { - const sql = db.getConnection(); +async function listFiles(engine: BrainEngine, slug?: string) { + const sql = sqlQueryForEngine(engine); let rows; if (slug) { rows = await sql`SELECT * FROM files WHERE page_slug = ${slug} ORDER BY filename LIMIT 100`; @@ -116,12 +116,13 @@ async function listFiles(slug?: string) { console.log(`${rows.length} file(s):`); for (const row of rows) { - const size = row.size_bytes ? `${Math.round(row.size_bytes / 1024)}KB` : '?'; + const sizeBytes = row.size_bytes as number | null; + const size = sizeBytes ? `${Math.round(sizeBytes / 1024)}KB` : '?'; console.log(` ${row.page_slug || '(unlinked)'} / ${row.filename} [${size}, ${row.mime_type || '?'}]`); } } -async function uploadFile(args: string[]) { +async function uploadFile(engine: BrainEngine, args: string[]) { const filePath = args.find(a => !a.startsWith('--')); const pageSlug = args.find((a, i) => args[i - 1] === '--page') || null; @@ -136,7 +137,7 @@ async function uploadFile(args: string[]) { const storagePath = pageSlug ? `${pageSlug}/${filename}` : `unsorted/${hash.slice(0, 8)}-${filename}`; const mimeType = getMimeType(filePath); - const sql = db.getConnection(); + const sql = sqlQueryForEngine(engine); // Check for existing file by hash const existing = await sql`SELECT id FROM files WHERE content_hash = ${hash} AND storage_path = ${storagePath}`; @@ -178,7 +179,7 @@ async function uploadFile(args: string[]) { * * The .redirect.yaml pointer stays in the brain repo so git tracks what was stored. */ -async function uploadRaw(args: string[]) { +async function uploadRaw(engine: BrainEngine, args: string[]) { const filePath = args.find(a => !a.startsWith('--')); const pageSlug = args.find((a, i) => args[i - 1] === '--page') || null; const fileType = args.find((a, i) => args[i - 1] === '--type') || null; @@ -248,17 +249,20 @@ async function uploadRaw(args: string[]) { console.error(`Pointer written: ${pointerPath}`); } - // Record in DB - const sql = db.getConnection(); - await sql` - INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata) - VALUES (${pageSlug}, ${filename}, ${storagePath}, ${mimeType}, ${stat.size}, ${'sha256:' + hash}, - ${sql.json({ type: fileType, upload_method: method })}) - ON CONFLICT (storage_path) DO UPDATE SET - content_hash = EXCLUDED.content_hash, - size_bytes = EXCLUDED.size_bytes, - mime_type = EXCLUDED.mime_type - `; + // Record in DB. files.metadata is JSONB — pass the object via + // executeRawJsonb with an explicit ::jsonb cast so post-v0.31 reads see + // an actual object, not a JSON-encoded string (D1 wave). + await executeRawJsonb( + engine, + `INSERT INTO files (page_slug, filename, storage_path, mime_type, size_bytes, content_hash, metadata) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb) + ON CONFLICT (storage_path) DO UPDATE SET + content_hash = EXCLUDED.content_hash, + size_bytes = EXCLUDED.size_bytes, + mime_type = EXCLUDED.mime_type`, + [pageSlug, filename, storagePath, mimeType, stat.size, 'sha256:' + hash], + [{ type: fileType, upload_method: method }], + ); // Output JSON for scripting console.log(JSON.stringify({ @@ -296,7 +300,7 @@ async function signedUrl(args: string[]) { console.log(url); } -async function syncFiles(dir?: string) { +async function syncFiles(engine: BrainEngine, dir?: string) { if (!dir || !existsSync(dir)) { console.error('Usage: gbrain files sync '); process.exit(1); @@ -323,7 +327,7 @@ async function syncFiles(dir?: string) { const mimeType = getMimeType(filePath); const stat = statSync(filePath); - const sql = db.getConnection(); + const sql = sqlQueryForEngine(engine); const existing = await sql`SELECT id FROM files WHERE content_hash = ${hash} AND storage_path = ${storagePath}`; if (existing.length > 0) { skipped++; @@ -351,8 +355,8 @@ async function syncFiles(dir?: string) { console.log(`Files sync complete: ${uploaded} uploaded, ${skipped} skipped (unchanged)`); } -async function verifyFiles() { - const sql = db.getConnection(); +async function verifyFiles(engine: BrainEngine) { + const sql = sqlQueryForEngine(engine); const rows = await sql`SELECT * FROM files ORDER BY storage_path LIMIT 1000`; if (rows.length === 0) { diff --git a/src/commands/serve-http.ts b/src/commands/serve-http.ts index 31c8ee355..b8da94603 100644 --- a/src/commands/serve-http.ts +++ b/src/commands/serve-http.ts @@ -33,6 +33,7 @@ import { loadConfig } from '../core/config.ts'; import { buildError, serializeError } from '../core/errors.ts'; import { VERSION } from '../version.ts'; import * as db from '../core/db.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../core/sql-query.ts'; /** * /health endpoint timeout. 3s rather than 5s: Fly.io's default @@ -173,8 +174,12 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption ); } - // Get raw SQL connection for OAuth provider - const sql = db.getConnection() as SqlQuery; + // Engine-aware SQL adapter. Routes through engine.executeRaw on both + // Postgres and PGLite — the OAuth/admin/auth surface no longer requires + // a postgres.js singleton, so `gbrain serve --http` works against PGLite + // brains too. The narrow SqlQuery contract is scalar-binds-only; JSONB + // writes use executeRawJsonb (see mcp_request_log INSERT sites below). + const sql = sqlQueryForEngine(engine); // Initialize OAuth provider. F12 cleanup: DCR-disable now flips a // constructor option instead of monkey-patching `_clientsStore` after @@ -596,27 +601,44 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption const operation = req.query.operation as string; const status = req.query.status as string; - // Dynamic filtering via postgres.js tagged-template fragments. - // Each filter expands to either `AND col = $N` (parameterized) or - // an empty fragment. `WHERE 1=1` lets us always have a WHERE clause - // and unconditionally append AND-prefixed fragments — no string - // interpolation, no manual escaping, no sql.unsafe. - const agentFilter = agent && agent !== 'all' ? sql`AND token_name = ${agent}` : sql``; - const opFilter = operation && operation !== 'all' ? sql`AND operation = ${operation}` : sql``; - const statusFilter = status && status !== 'all' ? sql`AND status = ${status}` : sql``; - - const rows = await sql` - SELECT id, token_name, COALESCE(agent_name, token_name) as agent_name, - operation, latency_ms, status, params, error_message, created_at - FROM mcp_request_log - WHERE 1=1 ${agentFilter} ${opFilter} ${statusFilter} - ORDER BY created_at DESC LIMIT ${limit} OFFSET ${offset} - `; - const [countResult] = await sql` - SELECT count(*)::int as total FROM mcp_request_log - WHERE 1=1 ${agentFilter} ${opFilter} ${statusFilter} - `; - res.json({ rows, total: (countResult as any).total, page, pages: Math.ceil((countResult as any).total / limit) }); + // Dynamic filtering: SqlQuery is deliberately scalar-only and does not + // support fragment composition (the prior `sql\`AND ... = ${v}\`` shape). + // Build the WHERE clause with positional placeholders + a params array. + // `WHERE 1=1` lets us always have a WHERE clause and conditionally + // append `AND col = $N` fragments — still parameterized, still escaped + // by the driver, no sql.unsafe. + const filters: string[] = []; + const params: (string | number)[] = []; + if (agent && agent !== 'all') { + filters.push(`AND token_name = $${params.length + 1}`); + params.push(agent); + } + if (operation && operation !== 'all') { + filters.push(`AND operation = $${params.length + 1}`); + params.push(operation); + } + if (status && status !== 'all') { + filters.push(`AND status = $${params.length + 1}`); + params.push(status); + } + const filterSql = filters.join(' '); + const limitParam = `$${params.length + 1}`; + const offsetParam = `$${params.length + 2}`; + + const rows = await engine.executeRaw( + `SELECT id, token_name, COALESCE(agent_name, token_name) as agent_name, + operation, latency_ms, status, params, error_message, created_at + FROM mcp_request_log + WHERE 1=1 ${filterSql} + ORDER BY created_at DESC LIMIT ${limitParam} OFFSET ${offsetParam}`, + [...params, limit, offset], + ); + const [countResult] = await engine.executeRaw<{ total: number }>( + `SELECT count(*)::int as total FROM mcp_request_log + WHERE 1=1 ${filterSql}`, + params, + ); + res.json({ rows, total: countResult.total, page, pages: Math.ceil(countResult.total / limit) }); } catch { res.status(503).json({ error: 'service_unavailable' }); } @@ -768,8 +790,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // asserting >= 2 rows after tools/list + tools/call was unreachable. const latency = Date.now() - startTime; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) - VALUES (${authInfo.clientId}, ${agentName}, ${'tools/list'}, ${latency}, ${'success'}, ${null})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + [authInfo.clientId, agentName, 'tools/list', latency, 'success'], + [null], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -808,8 +835,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // valid-op success/error. const latency = Date.now() - startTime; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params, error_message) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'error'}, ${null}, ${`unknown_operation: ${name}`})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, error_message, params) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'error', `unknown_operation: ${name}`], + [null], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -835,8 +867,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // persistence regression test reliable across both rejection paths. const latency = Date.now() - startTime; try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params, error_message) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'error'}, ${null}, ${`insufficient_scope: requires '${requiredScope}'`})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, error_message, params) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'error', `insufficient_scope: requires '${requiredScope}'`], + [null], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -865,10 +902,19 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption // never written to mcp_request_log or the SSE feed). --log-full-params // bypasses this for operators debugging on their own laptop, with the // startup warning printed earlier. + // + // D1 (v0.31 wave): mcp_request_log.params is JSONB. Pre-v0.31 wrote + // a JSON-string into that JSONB column via the postgres.js template + // tag's loose typing — readable but semantically wrong (params->>'op' + // would return the encoded string, not the value). Post-v0.31 we + // pass the OBJECT through executeRawJsonb with an explicit ::jsonb + // cast, so reads return real objects and `params->>'op'` returns + // 'tools/list'. Pre-existing string-shaped rows are normalized by + // migration v41 in src/core/migrate.ts. const safeParamsSummary = summarizeMcpParams(name, params); - const logParams = logFullParams - ? (params ? JSON.stringify(params) : null) - : (safeParamsSummary ? JSON.stringify(safeParamsSummary) : null); + const logParamsObj: unknown = logFullParams + ? (params || null) + : (safeParamsSummary || null); const broadcastParams = logFullParams ? (params || {}) : safeParamsSummary; // v0.31 (D12 / eE1): refactor the inlined op.handler call to go through @@ -906,12 +952,19 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption } catch (e) { // dispatchToolCall absorbs OperationError + Error and returns // isError:true; only an unexpected throw lands here. Treat as the - // F15 unified envelope. + // F15 unified envelope. v0.31 wave (D1): mcp_request_log.params is + // JSONB — write the object via executeRawJsonb so reads return a + // real object, not a JSON-encoded string. const latency = Date.now() - startTime; const errorPayload = serializeError(e); try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params, error_message) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'error'}, ${logParams}, ${errorPayload.message})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, error_message, params) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'error', errorPayload.message], + [logParamsObj], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -937,8 +990,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption errMsg = parsed.error?.message ?? parsed.message ?? errMsg; } catch { /* ignore */ } try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params, error_message) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'error'}, ${logParams}, ${errMsg})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, error_message, params) + VALUES ($1, $2, $3, $4, $5, $6, $7::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'error', errMsg], + [logParamsObj], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, @@ -954,8 +1012,13 @@ export async function runServeHttp(engine: BrainEngine, options: ServeHttpOption } try { - await sql`INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) - VALUES (${authInfo.clientId}, ${agentName}, ${name}, ${latency}, ${'success'}, ${logParams})`; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + [authInfo.clientId, agentName, name, latency, 'success'], + [logParamsObj], + ); } catch { /* best effort */ } broadcastEvent({ agent: agentName, diff --git a/src/commands/serve.ts b/src/commands/serve.ts index 257bc372d..1b1651a82 100644 --- a/src/commands/serve.ts +++ b/src/commands/serve.ts @@ -1,7 +1,68 @@ +import { spawnSync } from 'node:child_process'; import type { BrainEngine } from '../core/engine.ts'; import { startMcpServer } from '../mcp/server.ts'; -export async function runServe(engine: BrainEngine, args: string[] = []) { +// Maximum time the stdio path will wait for engine.disconnect() (PGLite +// close + advisory lock release) before forcing exit. Keeps a wedged +// disconnect from trapping the process forever; the abandoned lock dir is +// already covered by the in-process stale-lock check (acquireLock walks +// the dir, sees a dead PID, and removes it). +const CLEANUP_DEADLINE_MS = 5_000; + +// How often the parent-process watchdog polls the live kernel parent PID +// (via `readLiveParentPid`, NOT the cached `process.ppid` — see that +// helper's comment). We don't receive a signal when our parent dies (the +// kernel just re-parents us to init / launchd / a subreaper-PID), so +// polling is the only reliable way to detect "parent went away without +// closing stdin". 5s matches the cadence in the concurrent #591 PR; +// faster polling has no benefit, slower would extend the lock-leak window. +const PARENT_WATCHDOG_INTERVAL_MS = 5_000; + +export interface ServeOptions { + // Test seam — defaults to the live process. The lifecycle plumbing reads + // these for stdin EOF detection, signal handlers, and exit, so unit + // tests can drive end-to-end shutdown via mocked streams without + // spawning a real Bun process. `exit` is typed as `void` (not `never`) + // so test stubs that record + return are accepted without casts; + // `process.exit`'s `never` return is assignable to `void`. + stdin?: NodeJS.ReadableStream & { isTTY?: boolean }; + signals?: Pick; + exit?: (code?: number) => void; + log?: (msg: string) => void; + // Test seam: replace startMcpServer to avoid booting the real MCP SDK + // (which unconditionally attaches a 'data' listener to real + // process.stdin and would pollute the test runner's stdin handle). + // Defaults to the real implementation when omitted. + startMcpServer?: (engine: BrainEngine) => Promise; + // Test seam for the parent-process watchdog. The default + // (`readLiveParentPid`) reads the live kernel PPID via `ps` because + // `process.ppid` is captured at process creation and does not refresh + // on re-parent (Node/Bun parity). Tests inject a stub so they can + // simulate the parent dying without spawning ps or re-parenting any + // real process. + getParentPid?: () => number; + // Test seam: replace setInterval/clearInterval so the watchdog can + // fire deterministically in tests instead of waiting 5s. Defaults to + // the global timer functions. + setInterval?: (fn: () => void, ms: number) => unknown; + clearInterval?: (handle: unknown) => void; + // Test seam for the one-shot watchdog readiness probe. The default + // runs `spawnSync('ps', ['-o','ppid=','-p',PID])` and returns true on + // success. Tests inject a stub to simulate ps unavailability (e.g. + // stripped containers, busybox without procps) without modifying PATH. + // When the probe returns false, `installStdioLifecycle` skips the + // watchdog interval entirely and emits a loud stderr line. Without + // the probe, the original PR's behavior was a silent no-op: every + // tick fell through to the cached `process.ppid` and the watchdog + // never fired, while still claiming to be installed. + probeWatchdog?: () => boolean; +} + +export async function runServe( + engine: BrainEngine, + args: string[] = [], + opts: ServeOptions = {}, +) { // v0.26+: --http dispatches to the full OAuth 2.1 server (serve-http.ts) // with admin dashboard, scope enforcement, SSE feed, and the requireBearerAuth // middleware. Master's simpler startHttpTransport from v0.22.7 is superseded @@ -31,8 +92,279 @@ export async function runServe(engine: BrainEngine, args: string[] = []) { const { runServeHttp } = await import('./serve-http.ts'); await runServeHttp(engine, { port, tokenTtl, enableDcr, publicUrl, logFullParams }); - } else { - console.error('Starting GBrain MCP server (stdio)...'); - await startMcpServer(engine); + return; + } + + // stdio path — install lifecycle handlers BEFORE startMcpServer so that + // an early stdin EOF (parent died before our first read) can still + // trigger graceful release of the PGLite write lock held by `engine`. + // The HTTP / OAuth path above has its own lifecycle in serve-http.ts + // and is intentionally NOT wired into this stdio plumbing. + console.error('Starting GBrain MCP server (stdio)...'); + + installStdioLifecycle(engine, args, opts); + + const start = opts.startMcpServer ?? startMcpServer; + await start(engine); + // startMcpServer's `await server.connect(transport)` resolves once the + // SDK has wired up its stdin 'data' listener; that listener keeps the + // event loop alive. We deliberately do NOT add `await new Promise(() => + // {})` here — it would block this async frame and stop the lifecycle + // hooks from being able to call process.exit() cleanly. +} + +interface StdioLifecycleDeps { + stdin: NodeJS.ReadableStream & { isTTY?: boolean }; + signals: Pick; + exit: (code?: number) => void; + log: (msg: string) => void; + getParentPid: () => number; + setInterval: (fn: () => void, ms: number) => unknown; + clearInterval: (handle: unknown) => void; + probeWatchdog: () => boolean; +} + +function installStdioLifecycle( + engine: BrainEngine, + args: string[], + opts: ServeOptions, +): void { + const deps: StdioLifecycleDeps = { + stdin: opts.stdin ?? process.stdin, + signals: opts.signals ?? process, + exit: opts.exit ?? ((code?: number) => { process.exit(code); }), + log: opts.log ?? ((msg: string) => console.error(msg)), + getParentPid: opts.getParentPid ?? readLiveParentPid, + setInterval: opts.setInterval ?? ((fn, ms) => setInterval(fn, ms)), + clearInterval: opts.clearInterval ?? ((h) => clearInterval(h as ReturnType)), + probeWatchdog: opts.probeWatchdog ?? probeWatchdogAvailable, + }; + + let shuttingDown = false; + let parentWatchdog: unknown = null; + const beginShutdown = (reason: string): void => { + if (shuttingDown) return; + shuttingDown = true; + + // Stop the parent-watchdog interval as soon as a shutdown begins so + // it cannot fire a redundant 'parent-died' shutdown while the first + // one is still draining the cleanup chain. + if (parentWatchdog !== null) { + deps.clearInterval(parentWatchdog); + parentWatchdog = null; + } + + deps.log(`GBrain MCP server: graceful exit (${reason})`); + + // Race the cleanup against a deadline. engine.disconnect() does a + // PGLite WASM close + a synchronous rmSync on the lock dir; both + // should be sub-second, but a wedged WASM runtime shouldn't be able + // to trap us forever. If we hit the deadline we still exit; the + // lock dir is advisory and the next process's stale-lock check + // (process.kill(pid, 0) → ESRCH) will reclaim it. + const deadline = setTimeout(() => { + deps.log( + `GBrain MCP server: cleanup deadline (${CLEANUP_DEADLINE_MS}ms) exceeded — forcing exit`, + ); + deps.exit(0); + }, CLEANUP_DEADLINE_MS); + deadline.unref?.(); + + Promise.resolve() + .then(() => engine.disconnect()) + .catch((err: unknown) => { + const msg = err instanceof Error ? err.message : String(err); + deps.log(`GBrain MCP server: cleanup error: ${msg}`); + }) + .finally(() => { + clearTimeout(deadline); + deps.exit(0); + }); + }; + + // Signal-based termination. SIGTERM: daemon ask. SIGINT: user Ctrl-C. + // SIGHUP: terminal disconnect / daemon-style "reload" channels — Aragorn + // observed real-world hosts (Claude Desktop on macOS, hermes-agent + // restart) send these instead of closing stdin. All three get the same + // graceful path; the idempotency guard absorbs duplicate signals. + deps.signals.on('SIGTERM', () => beginShutdown('SIGTERM')); + deps.signals.on('SIGINT', () => beginShutdown('SIGINT')); + deps.signals.on('SIGHUP', () => beginShutdown('SIGHUP')); + + // Stdin EOF — the parent closes the pipe but the MCP SDK's + // StdioServerTransport only listens for 'data'/'error', not 'end' or + // 'close', so without these hooks the process keeps the engine (and its + // PGLite write lock) live indefinitely after the parent disconnects. + // 'end' fires on a clean EOF; 'close' fires when the underlying handle + // is destroyed (e.g. parent SIGKILL'd while pipe still open). Both + // converge on the same idempotent shutdown. + // Skip when stdin is a TTY: interactive `gbrain serve` use shouldn't + // terminate just because the user hasn't typed anything. Signal / + // watchdog paths still cover that case if needed. + if (!deps.stdin.isTTY) { + deps.stdin.once('end', () => beginShutdown('stdin-end')); + deps.stdin.once('close', () => beginShutdown('stdin-close')); + } + + // Parent-process watchdog. Some hosts (launchd, cron, certain MCP + // gateways) terminate without closing stdin and without sending a + // signal — the kernel just re-parents us to whichever ancestor is + // still alive (PID 1, or any closer subreaper such as launchd, systemd, + // tmux, or a parent shell with PR_SET_CHILD_SUBREAPER). Polling is the + // only portable way to notice; see `readLiveParentPid` for why we + // cannot rely on `process.ppid` (cached at process creation and never + // refreshed on re-parent in Node or Bun). + // + // We capture the initial parent PID once at install time and fire on + // ANY change, not just reparent-to-PID-1. The PR-#676 author's original + // `=== 1` check missed reparent-to-subreaper-PID-N, which is the actual + // observed behavior under launchd / systemd subreapers (codex review + // finding #3). A process legitimately started under PID 1 (e.g. a + // systemd service) skips the watchdog: there's no parent-death event + // to detect, and any reparent FROM 1 doesn't happen. `unref()` keeps + // the interval from blocking other exit paths. + // + // A one-shot startup probe (D2-revisited per codex finding #4) verifies + // that the underlying mechanism (`spawnSync('ps')`) actually works on + // this host. Stripped containers / busybox-without-procps environments + // would silently fall back to the cached `process.ppid` on every tick + // — the watchdog claims to be installed but never fires. When the probe + // fails, we skip installing the interval entirely and log loudly so the + // operator sees the degraded mode instead of a phantom watchdog. + const initialParentPid = deps.getParentPid(); + if (initialParentPid !== 1) { + if (!deps.probeWatchdog()) { + deps.log( + '[gbrain serve] watchdog disabled: ps unavailable, parent-death detection unavailable — child will rely on stdin EOF / signals only', + ); + } else { + parentWatchdog = deps.setInterval(() => { + if (deps.getParentPid() !== initialParentPid) { + beginShutdown('parent-died'); + } + }, PARENT_WATCHDOG_INTERVAL_MS); + (parentWatchdog as { unref?: () => void } | null)?.unref?.(); + } + } + + // Optional idle-timeout safety net. Default OFF; opt-in via + // `--stdio-idle-timeout `. The flag is for the rare case where + // the parent leaks the stdin pipe but never closes it (so 'end' never + // fires) and never sends another message — we'd otherwise sit on the + // PGLite lock forever. Off by default because most parents close + // properly and an over-eager idle timeout would surprise long-poll + // workloads. + const idleTimeoutSec = parseStdioIdleTimeout(args); + if (idleTimeoutSec > 0) { + let idleTimer: ReturnType | null = null; + const armIdle = (): void => { + if (idleTimer) clearTimeout(idleTimer); + idleTimer = setTimeout( + () => beginShutdown(`stdio-idle-timeout (${idleTimeoutSec}s)`), + idleTimeoutSec * 1000, + ); + idleTimer.unref?.(); + }; + armIdle(); + // Reset on every chunk. We can't observe SDK-parsed messages from + // here, but every JSON-RPC frame causes a 'data' event on stdin, so + // chunk-level granularity is sufficient. + deps.stdin.on('data', armIdle); + deps.log(`GBrain MCP server: stdio idle timeout = ${idleTimeoutSec}s`); + } +} + +/** + * Resolve the live parent PID from the kernel (not the cached startup + * value). Both Node and Bun expose `process.ppid` as a property captured + * at process creation, so it does NOT update when the kernel re-parents + * us to a new ancestor after the original parent dies — which is the + * exact event the watchdog needs to detect. Empirical evidence on + * macOS / Bun 1.3.12: `process.ppid` stays at the original parent ID + * indefinitely while `ps -o ppid= -p $$` reports the new parent within + * one tick. + * + * Cost: ~10ms per spawn. Called every 5s (PARENT_WATCHDOG_INTERVAL_MS), + * so amortized < 0.5% CPU. Falls back to `process.ppid` if `ps` fails + * (best-effort safety net for stripped-down containers, etc.); the + * startup probe at watchdog-install time loud-logs and skips the + * interval entirely when ps is unavailable, so a per-tick fallback is + * a redundant safety net rather than a primary mechanism. + */ +function readLiveParentPid(): number { + try { + const r = spawnSync('ps', ['-o', 'ppid=', '-p', String(process.pid)], { + encoding: 'utf8', + timeout: 1000, + }); + if (r.status === 0 && typeof r.stdout === 'string') { + const n = parseInt(r.stdout.trim(), 10); + if (Number.isInteger(n) && n >= 0) return n; + } + } catch { + /* fall through */ + } + return process.ppid; +} + +/** + * One-shot probe at watchdog-install time to confirm ps actually works + * on this host. Returns true iff `spawnSync('ps','-o','ppid=','-p',PID)` + * exits 0 with a parseable integer. When it returns false, the caller + * skips installing the watchdog and emits a loud stderr line — the + * operator sees "watchdog disabled" instead of an installed-but-never- + * fires phantom. + * + * Why a separate probe rather than relying on the per-tick fallback in + * `readLiveParentPid`: the per-tick fallback returns the cached + * `process.ppid` silently, so the watchdog runs every 5s, compares + * cached PPID to itself, never detects a change, and never fires — + * while still claiming to be active. The probe surfaces the gap once + * at install time and lets the caller short-circuit cleanly. + */ +function probeWatchdogAvailable(): boolean { + try { + const r = spawnSync('ps', ['-o', 'ppid=', '-p', String(process.pid)], { + encoding: 'utf8', + timeout: 1000, + }); + if (r.status !== 0 || typeof r.stdout !== 'string') return false; + const n = parseInt(r.stdout.trim(), 10); + return Number.isInteger(n) && n >= 0; + } catch { + return false; + } +} + +function parseStdioIdleTimeout(args: string[]): number { + const idx = args.indexOf('--stdio-idle-timeout'); + if (idx < 0) return 0; + const raw = args[idx + 1]; + // Strict parsing — silent fallback to 0 turns an opt-in safety net into + // a no-op when an operator typos the value (e.g. `--stdio-idle-timeout + // 30s`). `Number()` rejects partial parses like `30junk` (returns NaN), + // unlike `parseInt` which would silently accept it. A missing value + // (`--stdio-idle-timeout` at end of args) and any non-integer / negative + // value are surfaced as a CLI error before we install the timer. + if (raw === undefined) { + throw new Error( + '--stdio-idle-timeout requires a non-negative integer (seconds). Got: (missing value)', + ); + } + // Reject empty / whitespace-only explicitly: `Number('')` is 0 in JS, + // which would silently turn `--stdio-idle-timeout ""` into the + // documented opt-out — the exact silent-fallback failure mode this + // strict parser exists to prevent. + if (raw.trim() === '') { + throw new Error( + '--stdio-idle-timeout requires a non-negative integer (seconds). Got: (blank value)', + ); + } + const n = Number(raw); + if (!Number.isInteger(n) || n < 0) { + throw new Error( + `--stdio-idle-timeout requires a non-negative integer (seconds). Got: ${JSON.stringify(raw)}`, + ); } + return n; } diff --git a/src/core/config.ts b/src/core/config.ts index c0c6f09e4..126a45388 100644 --- a/src/core/config.ts +++ b/src/core/config.ts @@ -134,15 +134,22 @@ export function loadConfig(): GBrainConfig | null { if (!fileConfig && !dbUrl) return null; - // Infer engine type if not explicitly set - const inferredEngine: 'postgres' | 'pglite' = fileConfig?.engine - || (fileConfig?.database_path ? 'pglite' : 'postgres'); + // Infer engine type. A DATABASE_URL-style env var is always a Postgres + // connection target and must override a file-backed PGLite engine + // selection; otherwise direct-script / operator paths can silently hit + // the local PGLite brain while claiming to use the env URL. The PGLite + // database_path is also cleared when dbUrl is set so toEngineConfig + // doesn't pass a stale path through alongside the URL. + const inferredEngine: 'postgres' | 'pglite' = dbUrl + ? 'postgres' + : fileConfig?.engine || (fileConfig?.database_path ? 'pglite' : 'postgres'); // Merge: env vars override config file. READ only — never mutate process.env. const merged = { ...fileConfig, engine: inferredEngine, ...(dbUrl ? { database_url: dbUrl } : {}), + ...(dbUrl ? { database_path: undefined } : {}), ...(process.env.OPENAI_API_KEY ? { openai_api_key: process.env.OPENAI_API_KEY } : {}), ...(process.env.GBRAIN_EMBEDDING_MODEL ? { embedding_model: process.env.GBRAIN_EMBEDDING_MODEL } : {}), ...(process.env.GBRAIN_EMBEDDING_DIMENSIONS ? { embedding_dimensions: parseInt(process.env.GBRAIN_EMBEDDING_DIMENSIONS, 10) } : {}), diff --git a/src/core/migrate.ts b/src/core/migrate.ts index fec4e8f41..c0ce963e2 100644 --- a/src/core/migrate.ts +++ b/src/core/migrate.ts @@ -2362,6 +2362,36 @@ export const MIGRATIONS: Migration[] = [ } }, }, + { + version: 46, + name: 'mcp_request_log_params_jsonb_normalize', + idempotent: true, + // v0.31.3 wave (D-codex-2 / D1): mcp_request_log.params is JSONB, but + // pre-v0.31.3 serve-http.ts wrote `JSON.stringify(...)` strings into it + // via the postgres.js template tag's loose typing. The column was + // technically JSONB but stored as a JSON-encoded string, so reads via + // `params->>'op'` returned the encoded string '"search"' instead of + // 'search'. The /admin/api/requests endpoint returned both shapes raw + // to the SPA depending on row age. + // + // The v0.31.3 commit re-routes those INSERTs through executeRawJsonb, + // which writes real objects. This one-shot UPDATE lifts existing + // string-shaped rows up to objects so the read side sees one + // consistent shape. Idempotent: subsequent runs find no rows where + // jsonb_typeof = 'string' and the UPDATE is a no-op. + // + // `params #>> '{}'` extracts the underlying string at the top level, + // then ::jsonb re-parses it as JSON. The `WHERE` filter guards against + // running on already-object rows AND limits the unwrap to strings that + // start with `{` (object-shaped) so a malformed legacy string can't + // abort the migration. + sql: ` + UPDATE mcp_request_log + SET params = (params #>> '{}')::jsonb + WHERE jsonb_typeof(params) = 'string' + AND params #>> '{}' LIKE '{%'; + `, + }, ]; export const LATEST_VERSION = MIGRATIONS.length > 0 diff --git a/src/core/oauth-provider.ts b/src/core/oauth-provider.ts index 171b539df..c31188d15 100644 --- a/src/core/oauth-provider.ts +++ b/src/core/oauth-provider.ts @@ -24,14 +24,13 @@ import type { OAuthRegisteredClientsStore } from '@modelcontextprotocol/sdk/serv import type { AuthInfo } from '@modelcontextprotocol/sdk/server/auth/types.js'; import { hashToken, generateToken, isUndefinedColumnError } from './utils.ts'; import { hasScope, assertAllowedScopes, parseScopeString, InvalidScopeError } from './scope.ts'; +import type { SqlQuery, SqlValue } from './sql-query.ts'; +export type { SqlQuery, SqlValue }; // --------------------------------------------------------------------------- // Types // --------------------------------------------------------------------------- -/** Raw SQL query function — works with both PGLite and postgres tagged templates */ -export type SqlQuery = (strings: TemplateStringsArray, ...values: unknown[]) => Promise[]>; - /** * Convert a JS array to a PostgreSQL array literal for PGLite compat. * diff --git a/src/core/sql-query.ts b/src/core/sql-query.ts new file mode 100644 index 000000000..6483af8f1 --- /dev/null +++ b/src/core/sql-query.ts @@ -0,0 +1,121 @@ +import type { BrainEngine } from './engine.ts'; + +/** + * Minimal tagged SQL function used by OAuth/admin/auth infrastructure. + * + * This is deliberately narrower than postgres.js's `sql` tag: values must be + * scalar bind parameters only. It does not support nested SQL fragments, + * sql.json(), sql.unsafe(), sql.begin(), or direct JS array binding. JSONB + * writes go through the separate `executeRawJsonb` helper below. + * + * The narrow surface is the feature: every call site in auth.ts / + * serve-http.ts / mcp/http-transport.ts / files.ts answers "do you support + * X?" with "no, and that's the contract." That keeps the adapter from + * drifting into a partial postgres.js clone (codex finding #7 from the + * v0.31 plan review). + */ +export type SqlValue = string | number | bigint | boolean | Date | null; +export type SqlQuery = (strings: TemplateStringsArray, ...values: SqlValue[]) => Promise[]>; + +/** + * Build a minimal tagged-template SQL adapter over the active BrainEngine. + * + * OAuth/admin code only needs scalar positional parameters plus returned rows. + * Using BrainEngine.executeRaw keeps the path engine-aware: Postgres goes + * through the connected postgres.js client (`unsafe(sql, params)`), while + * PGLite goes through its embedded `db.query(sql, params)`. + * + * The v0.12.0 double-encode bug class does NOT apply here because + * executeRaw uses positional binding, not the postgres.js template tag's + * auto-stringify path that caused the original silent-data-loss incident. + */ +export function sqlQueryForEngine(engine: BrainEngine): SqlQuery { + return async (strings: TemplateStringsArray, ...values: SqlValue[]) => { + for (const value of values) { + assertSqlValue(value); + } + const query = strings.reduce((acc, str, i) => { + return acc + str + (i < values.length ? `$${i + 1}` : ''); + }, ''); + return engine.executeRaw(query, values); + }; +} + +function assertSqlValue(value: unknown): asserts value is SqlValue { + if ( + value === null || + typeof value === 'string' || + typeof value === 'number' || + typeof value === 'bigint' || + typeof value === 'boolean' || + value instanceof Date + ) { + return; + } + + const kind = Array.isArray(value) + ? 'array' + : value && typeof (value as { then?: unknown }).then === 'function' + ? 'promise' + : typeof value; + throw new TypeError( + `sqlQueryForEngine only supports scalar bind values; got ${kind}. ` + + 'Use fixed SQL with scalar params, or executeRawJsonb for JSONB writes.', + ); +} + +/** + * Cross-engine JSONB write helper. Composes a parametrized SQL string with + * explicit `$N::jsonb` casts for the JSONB positions and passes the JSONB + * values as JS objects through `engine.executeRaw`. Both the postgres.js + * `unsafe(sql, params)` path (via PostgresEngine) and PGLite's + * `db.query(sql, params)` accept objects for `$N::jsonb` positions and + * round-trip them with `jsonb_typeof = 'object'` (verified by + * test/e2e/auth-permissions.test.ts:67 on Postgres and test/sql-query.test.ts + * on PGLite). + * + * Why this exists separately from SqlQuery: the SqlQuery contract is + * deliberately scalar-only. JSONB columns are rare enough across the + * auth/admin surface that a focused helper preserves the contract without + * forcing every call site to remember which positions hold JSONB. + * + * Why this is safe vs the v0.12.0 double-encode bug: the bug was specific + * to postgres.js's template-tag auto-stringify path interacting with + * sql.json() — not to positional binding through `unsafe()`. JS objects + * passed as positional params reach the wire protocol with the correct + * type oid (jsonb when cast in the SQL string), so there is no double- + * encode. The CI guard (scripts/check-jsonb-pattern.sh) doesn't fire + * because the source pattern is a method call (`executeRawJsonb(...)`), + * not the banned literal-template-tag interpolation pattern with + * JSON.stringify cast to jsonb. + * + * Usage: + * await executeRawJsonb( + * engine, + * `INSERT INTO access_tokens (name, token_hash, permissions) + * VALUES ($1, $2, $3::jsonb)`, + * [name, hash], + * [{ takes_holders: ['world', 'garry'] }], + * ); + * + * The SQL string MUST already contain the `$N::jsonb` casts; the helper + * does NOT rewrite or inject them. Scalar params come first ($1..$N), then + * JSONB params ($N+1..$N+M). Matching the call-site convention to scalars- + * before-JSONB simplifies argument order and matches how the existing + * call sites we're migrating are shaped. + */ +export async function executeRawJsonb>( + engine: BrainEngine, + sql: string, + scalarParams: SqlValue[], + jsonbParams: unknown[], +): Promise { + for (const value of scalarParams) { + assertSqlValue(value); + } + // jsonbParams are explicitly NOT validated as scalar — they're meant to + // hold JS objects/arrays that postgres.js / PGLite will encode as JSONB + // via the explicit ::jsonb cast in the caller's SQL string. + const params: unknown[] = [...scalarParams, ...jsonbParams]; + return engine.executeRaw(sql, params); +} diff --git a/src/mcp/http-transport.ts b/src/mcp/http-transport.ts index 0ba2be992..6356208d6 100644 --- a/src/mcp/http-transport.ts +++ b/src/mcp/http-transport.ts @@ -1,8 +1,9 @@ /** - * HTTP transport for `gbrain serve --http`. + * HTTP transport for `gbrain serve --http` (legacy bearer-auth path). * - * Postgres-only. PGLite users get a clear fail-fast at startup (the access_tokens - * table doesn't exist on PGLite per pglite-schema.ts). + * Engine-aware via SqlQuery (works on both Postgres and PGLite as of the + * v0.31 wave). The access_tokens and mcp_request_log tables exist on both + * engines (see src/core/pglite-schema.ts:478,495 and src/schema.sql). * * Security model: * - Every request must include `Authorization: Bearer ` (except /health) @@ -31,6 +32,7 @@ import { operations } from '../core/operations.ts'; import { VERSION } from '../version.ts'; import { dispatchToolCall } from './dispatch.ts'; import { buildDefaultLimiters, type RateLimiter } from './rate-limit.ts'; +import { sqlQueryForEngine } from '../core/sql-query.ts'; const DEFAULT_BODY_CAP = 1024 * 1024; // 1 MiB @@ -116,22 +118,11 @@ function resolveClientIp(req: Request, server: { requestIP: (r: Request) => { ad export async function startHttpTransport(opts: HttpTransportOptions) { const { port, engine } = opts; - // Fail-fast: HTTP transport requires Postgres because access_tokens / mcp_request_log - // only exist in the Postgres schema (see src/core/pglite-schema.ts:5-6). - if ((engine as { kind?: string }).kind !== 'postgres') { - console.error('Error: gbrain serve --http requires a Postgres engine for remote auth tokens.'); - console.error('PGLite is local-only by design (access_tokens table is Postgres-only).'); - console.error('Either:'); - console.error(' - Use stdio: gbrain serve'); - console.error(' - Migrate to Postgres: gbrain migrate --to supabase'); - process.exit(1); - } - - const sql = (engine as unknown as { sql: any }).sql; - if (!sql) { - console.error('Error: Postgres engine has no .sql client. Engine may not be connected.'); - process.exit(1); - } + // Engine-aware: route SQL through the active BrainEngine. Both Postgres + // and PGLite carry access_tokens + mcp_request_log in their schemas + // (pglite-schema.ts:478,495 and schema.sql), so the legacy bearer-auth + // path works on either engine without a postgres.js singleton. + const sql = sqlQueryForEngine(engine); const limiters = opts.limiters || buildDefaultLimiters(); const bodyCap = envInt('GBRAIN_HTTP_MAX_BODY_BYTES', DEFAULT_BODY_CAP); @@ -169,11 +160,13 @@ export async function startHttpTransport(opts: HttpTransportOptions) { WHERE token_hash = ${hash} AND revoked_at IS NULL `; if (!row) return { ok: false }; + const rowId = row.id as string; + const rowName = row.name as string; // Debounced last_used_at update — only writes once per token per 60s. // SQL-level WHERE clause keeps this race-tolerant even under concurrent requests. sql`UPDATE access_tokens SET last_used_at = now() - WHERE id = ${row.id} + WHERE id = ${rowId} AND (last_used_at IS NULL OR last_used_at < now() - interval '60 seconds')` .catch(() => { /* fire-and-forget */ }); // v0.28: extract per-token takes-holder allow-list. Fail-safe default @@ -184,8 +177,8 @@ export async function startHttpTransport(opts: HttpTransportOptions) { : ['world']; return { ok: true, - tokenId: row.id, - tokenName: row.name, + tokenId: rowId, + tokenName: rowName, takesHoldersAllowList: allowList, }; } catch { diff --git a/test/config-env.test.ts b/test/config-env.test.ts new file mode 100644 index 000000000..a3e6bce3c --- /dev/null +++ b/test/config-env.test.ts @@ -0,0 +1,131 @@ +import { mkdtempSync, rmSync } from 'fs'; +import { tmpdir } from 'os'; +import { join } from 'path'; +import { describe, expect, test } from 'bun:test'; +import { loadConfig, saveConfig } from '../src/core/config.ts'; +import { withEnv } from './helpers/with-env.ts'; + +// PR #681 originally shipped a manual `restoreEnv()` in afterEach for these +// tests. CLAUDE.md R1 (test-isolation lint) and the codex outside-voice +// review (D3 in the v0.31 plan) called that out — manual try/restore patterns +// don't survive an assertion failure mid-test, the codemod for parallel +// test execution can't whitelist them, and the canonical helper at +// test/helpers/with-env.ts already exists for this. Migrating here. +// +// withEnv()'s try/finally restores even when the callback throws, so a +// failed expect() inside the block leaves the process env clean for the +// next file in the shard. + +describe('loadConfig env database URL precedence', () => { + test('DATABASE_URL switches an existing PGLite file config to Postgres', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + // Pre-seed: PGLite file config in this isolated GBRAIN_HOME. + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + saveConfig({ engine: 'pglite', database_path: '/tmp/local-brain.pglite' }); + }, + ); + + // DATABASE_URL set: loadConfig must override PGLite selection, + // pick Postgres, and clear the stale database_path so toEngineConfig + // doesn't try to use both. + await withEnv( + { + GBRAIN_HOME: home, + GBRAIN_DATABASE_URL: undefined, + DATABASE_URL: 'postgres://user:pass@example.test:5432/gbrain', + }, + () => { + const cfg = loadConfig(); + expect(cfg?.engine).toBe('postgres'); + expect(cfg?.database_url).toBe('postgres://user:pass@example.test:5432/gbrain'); + expect(cfg?.database_path).toBeUndefined(); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('GBRAIN_DATABASE_URL beats DATABASE_URL (operator override)', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + saveConfig({ engine: 'pglite', database_path: '/tmp/local-brain.pglite' }); + }, + ); + + await withEnv( + { + GBRAIN_HOME: home, + GBRAIN_DATABASE_URL: 'postgres://win:win@gbrain.test:5432/db', + DATABASE_URL: 'postgres://lose:lose@other.test:5432/db', + }, + () => { + const cfg = loadConfig(); + expect(cfg?.engine).toBe('postgres'); + expect(cfg?.database_url).toBe('postgres://win:win@gbrain.test:5432/db'); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('No env DB URL → existing PGLite file config is honored', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + saveConfig({ engine: 'pglite', database_path: '/tmp/local-brain.pglite' }); + const cfg = loadConfig(); + expect(cfg?.engine).toBe('pglite'); + expect(cfg?.database_path).toBe('/tmp/local-brain.pglite'); + expect(cfg?.database_url).toBeUndefined(); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('No file config + DATABASE_URL → infers Postgres', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { + GBRAIN_HOME: home, + GBRAIN_DATABASE_URL: undefined, + DATABASE_URL: 'postgres://only:env@gbrain.test:5432/db', + }, + () => { + // No saveConfig() — no file present at all. + const cfg = loadConfig(); + expect(cfg?.engine).toBe('postgres'); + expect(cfg?.database_url).toBe('postgres://only:env@gbrain.test:5432/db'); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); + + test('No file config + no env DB URL → loadConfig returns null', async () => { + const home = mkdtempSync(join(tmpdir(), 'gbrain-config-env-')); + try { + await withEnv( + { GBRAIN_HOME: home, GBRAIN_DATABASE_URL: undefined, DATABASE_URL: undefined }, + () => { + expect(loadConfig()).toBeNull(); + }, + ); + } finally { + rmSync(home, { recursive: true, force: true }); + } + }); +}); diff --git a/test/e2e/auth-takes-holders-pglite.test.ts b/test/e2e/auth-takes-holders-pglite.test.ts new file mode 100644 index 000000000..6f57b6aa8 --- /dev/null +++ b/test/e2e/auth-takes-holders-pglite.test.ts @@ -0,0 +1,211 @@ +/** + * E2E for the v0.31 auth/admin SQL routing wave: full takes-holders + * round-trip on PGLite, in-memory, no DATABASE_URL gate. + * + * Mirrors test/e2e/auth-permissions.test.ts (which exercises the Postgres + * path) so JSONB shape parity is proven for both engines (Codex finding + * #1 from the v0.31 plan review). + * + * The path under test is the one auth.ts and src/mcp/http-transport.ts + * actually run after migration: + * 1. Token create with takes-holders → executeRawJsonb writes a JSONB object + * 2. validateToken-shaped read → SELECT permissions; jsonb_typeof = 'object' + * 3. Permissions update → executeRawJsonb again (UPDATE) + * 4. mcp_request_log.params write → executeRawJsonb (the serve-http flow) + * 5. Migration v45 normalizer → seed a string-shaped row, run the + * UPDATE, assert it's lifted to an object + */ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test'; +import { PGLiteEngine } from '../../src/core/pglite-engine.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../../src/core/sql-query.ts'; + +let engine: PGLiteEngine; + +beforeAll(async () => { + engine = new PGLiteEngine(); + await engine.connect({}); + await engine.initSchema(); +}, 60_000); + +afterAll(async () => { + if (engine) await engine.disconnect(); +}); + +describe('auth takes-holders + mcp_request_log JSONB on PGLite (v0.31)', () => { + test('access_tokens.permissions: create + read returns a real JSONB object', async () => { + const sql = sqlQueryForEngine(engine); + const name = `tok-create-${Math.random().toString(36).slice(2, 8)}`; + const hash = `hash-${name}`; + const permissions = { takes_holders: ['world', 'garry'] }; + + // The exact shape auth.ts:create uses post-migration. + await executeRawJsonb( + engine, + `INSERT INTO access_tokens (name, token_hash, permissions) + VALUES ($1, $2, $3::jsonb)`, + [name, hash], + [permissions], + ); + + // The exact shape http-transport.ts:validateToken uses to read it back. + const rows = await sql` + SELECT permissions FROM access_tokens + WHERE token_hash = ${hash} + `; + const perms = (rows[0] as { permissions?: { takes_holders?: unknown } }).permissions; + expect(perms).toBeDefined(); + expect(Array.isArray(perms?.takes_holders)).toBe(true); + expect(perms?.takes_holders).toEqual(['world', 'garry']); + + // Defense in depth: the JSONB-text representation must be an object, + // not a JSON-encoded string. Codex finding #9 — assert the contract. + const typed = await engine.executeRaw<{ kind: string; first_holder: string }>( + `SELECT jsonb_typeof(permissions) AS kind, + permissions->'takes_holders'->>0 AS first_holder + FROM access_tokens WHERE token_hash = $1`, + [hash], + ); + expect(typed[0].kind).toBe('object'); + expect(typed[0].first_holder).toBe('world'); + }); + + test('access_tokens.permissions: UPDATE preserves JSONB object shape', async () => { + const sql = sqlQueryForEngine(engine); + const name = `tok-update-${Math.random().toString(36).slice(2, 8)}`; + const hash = `hash-${name}`; + + // Seed with default ['world']. + await executeRawJsonb( + engine, + `INSERT INTO access_tokens (name, token_hash, permissions) + VALUES ($1, $2, $3::jsonb)`, + [name, hash], + [{ takes_holders: ['world'] }], + ); + + // The exact shape auth.ts:permissions uses (set-takes-holders). + const result = await executeRawJsonb( + engine, + `UPDATE access_tokens + SET permissions = $2::jsonb + WHERE name = $1 + RETURNING id`, + [name], + [{ takes_holders: ['world', 'garry', 'brain'] }], + ); + expect(result).toHaveLength(1); + + const rows = await sql` + SELECT permissions FROM access_tokens + WHERE token_hash = ${hash} + `; + const perms = (rows[0] as { permissions: { takes_holders: string[] } }).permissions; + expect(perms.takes_holders).toEqual(['world', 'garry', 'brain']); + }); + + test('mcp_request_log.params: object writes round-trip as JSONB object', async () => { + // The serve-http.ts INSERT shape after the v0.31 migration. + const summary = { redacted: true, declared_keys: ['query', 'limit'], approx_bytes: 1024 }; + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + ['test-token', 'test-agent', 'tools/call:query', 12, 'success'], + [summary], + ); + + const rows = await engine.executeRaw<{ + kind: string; + redacted: boolean; + bytes: number; + first_key: string; + }>( + `SELECT jsonb_typeof(params) AS kind, + (params->>'redacted')::boolean AS redacted, + (params->>'approx_bytes')::int AS bytes, + params->'declared_keys'->>0 AS first_key + FROM mcp_request_log + WHERE operation = $1`, + ['tools/call:query'], + ); + expect(rows[0].kind).toBe('object'); + expect(rows[0].redacted).toBe(true); + expect(rows[0].bytes).toBe(1024); + expect(rows[0].first_key).toBe('query'); + }); + + test('mcp_request_log.params: NULL writes (no params) round-trip as SQL NULL', async () => { + // tools/list and scope-rejected paths write NULL params. Must not + // be encoded as the string "null". + await executeRawJsonb( + engine, + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, $6::jsonb)`, + ['test-token', 'test-agent', 'tools/list', 5, 'success'], + [null], + ); + + const rows = await engine.executeRaw<{ is_null: boolean }>( + `SELECT (params IS NULL) AS is_null FROM mcp_request_log + WHERE operation = 'tools/list'`, + ); + expect(rows[0].is_null).toBe(true); + }); + + test('migration v45 normalizer: lifts pre-v0.31 string-shaped rows to objects', async () => { + // Seed a row in the broken pre-v0.31 shape: a JSON-encoded object + // stored as a string-typed JSONB. This is what postgres.js's loose + // template-tag typing produced when `${JSON.stringify(obj)}` was + // bound to a JSONB column without sql.json(). + const broken = JSON.stringify({ legacy: 'shape', op: 'search' }); + await engine.executeRaw( + `INSERT INTO mcp_request_log (token_name, agent_name, operation, latency_ms, status, params) + VALUES ($1, $2, $3, $4, $5, to_jsonb($6::text))`, + ['legacy-token', 'legacy-agent', 'tools/call:legacy', 8, 'success', broken], + ); + // Confirm the seed produced the broken shape (jsonb_typeof = 'string'). + const before = await engine.executeRaw<{ kind: string }>( + `SELECT jsonb_typeof(params) AS kind FROM mcp_request_log + WHERE operation = 'tools/call:legacy'`, + ); + expect(before[0].kind).toBe('string'); + + // Run the migration v45 SQL exactly as it lives in src/core/migrate.ts. + await engine.executeRaw(` + UPDATE mcp_request_log + SET params = (params #>> '{}')::jsonb + WHERE jsonb_typeof(params) = 'string' + AND params #>> '{}' LIKE '{%' + `); + + // After: real object, ->> reads the values. + const after = await engine.executeRaw<{ kind: string; legacy: string; op: string }>( + `SELECT jsonb_typeof(params) AS kind, + params->>'legacy' AS legacy, + params->>'op' AS op + FROM mcp_request_log + WHERE operation = 'tools/call:legacy'`, + ); + expect(after[0].kind).toBe('object'); + expect(after[0].legacy).toBe('shape'); + expect(after[0].op).toBe('search'); + }); + + test('migration v45 normalizer: idempotent — re-running on already-fixed rows is a no-op', async () => { + // Run the migration a second time. The WHERE jsonb_typeof = 'string' + // guard means already-object rows are skipped, so this should leave + // the legacy row unchanged. + await engine.executeRaw(` + UPDATE mcp_request_log + SET params = (params #>> '{}')::jsonb + WHERE jsonb_typeof(params) = 'string' + AND params #>> '{}' LIKE '{%' + `); + const after = await engine.executeRaw<{ kind: string; legacy: string }>( + `SELECT jsonb_typeof(params) AS kind, params->>'legacy' AS legacy + FROM mcp_request_log WHERE operation = 'tools/call:legacy'`, + ); + expect(after[0].kind).toBe('object'); + expect(after[0].legacy).toBe('shape'); + }); +}); diff --git a/test/http-transport.test.ts b/test/http-transport.test.ts index 3df8e31e7..e011aa8af 100644 --- a/test/http-transport.test.ts +++ b/test/http-transport.test.ts @@ -23,6 +23,12 @@ type SqlHandler = (query: string, values: unknown[]) => SqlResult | Promise>(sql: string, params?: unknown[]) => Promise; sql: ReturnType; audit: { token_name: string | null; operation: string; status: string; latency_ms: number }[]; } @@ -39,6 +45,17 @@ function makeSqlTag(handler: SqlHandler) { }; } +/** + * Normalize a SQL string for pattern-matching: collapse whitespace, + * lowercase. Helps the mock's `executeRaw` match the queries that + * `sqlQueryForEngine` builds (multi-line, $1/$2/etc. placeholders) the + * same way the legacy template-tag mock matched the older single-line + * shapes. + */ +function normalizeSql(sql: string): string { + return sql.replace(/\s+/g, ' ').trim().toLowerCase(); +} + function hash(token: string): string { return createHash('sha256').update(token).digest('hex'); } @@ -61,35 +78,35 @@ function makeFakeEngine(cfg: FakeEngineConfig = {}): FakeEngine { const revokedTokens = cfg.revokedTokens ?? new Set(); const audit: FakeEngine['audit'] = []; - const sql = makeSqlTag((query, values) => { - if (cfg.dbDown && query.startsWith('SELECT')) throw new Error('db down'); + // Legacy template-tag handler. Preserved so any non-migrated call path + // still has a place to land (defense in depth). The new code path routes + // through executeRaw below. + const handle = (query: string, values: unknown[]): unknown[] => { + if (cfg.dbDown && query.toLowerCase().includes('select')) throw new Error('db down'); - if (query === 'SELECT 1') { - // /health DB probe + if (query === 'SELECT 1' || normalizeSql(query) === 'select 1') { return [{ '?column?': 1 }]; } - // v0.28: query now selects `permissions` too. Match either the legacy - // SELECT id, name shape OR the new SELECT id, name, permissions shape so - // older tests that haven't been updated still work; new tests can stash - // a `permissions` field on the validTokens row. - if (query.startsWith('SELECT id, name FROM access_tokens') || - query.startsWith('SELECT id, name, permissions FROM access_tokens')) { + const norm = normalizeSql(query); + + // SELECT id, name, permissions FROM access_tokens WHERE token_hash = $1 AND revoked_at IS NULL + if (norm.startsWith('select id, name from access_tokens') || + norm.startsWith('select id, name, permissions from access_tokens')) { const tokenHash = values[0] as string; if (revokedTokens.has(tokenHash)) return []; const row = validTokens.get(tokenHash); if (!row) return []; - // Default permissions to {takes_holders: ['world']} (matches migration v33 default). const rowWithPerms = { ...row, permissions: row.permissions ?? { takes_holders: ['world'] } }; return [rowWithPerms]; } - if (query.startsWith('UPDATE access_tokens')) { + if (norm.startsWith('update access_tokens')) { // last_used_at debounce — succeed silently return []; } - if (query.startsWith('INSERT INTO mcp_request_log')) { + if (norm.startsWith('insert into mcp_request_log')) { audit.push({ token_name: values[0] as string | null, operation: values[1] as string, @@ -100,9 +117,24 @@ function makeFakeEngine(cfg: FakeEngineConfig = {}): FakeEngine { } return []; - }); + }; + + const sql = makeSqlTag(handle); + + // v0.31: sqlQueryForEngine + executeRawJsonb both call engine.executeRaw. + // The new SQL strings carry $N positional placeholders (not the legacy + // template-tag `?`), but the queries themselves match by their leading + // text. We normalize whitespace so multi-line SQL bodies match the same + // way single-line shapes did. + const executeRaw = async >( + rawSql: string, + params?: unknown[], + ): Promise => { + const result = handle(rawSql, params ?? []); + return Promise.resolve(result as T[]); + }; - return { kind: 'postgres', sql, audit }; + return { kind: 'postgres', executeRaw, sql, audit }; } interface TestServer { diff --git a/test/serve-stdio-lifecycle.test.ts b/test/serve-stdio-lifecycle.test.ts new file mode 100644 index 000000000..e548b0134 --- /dev/null +++ b/test/serve-stdio-lifecycle.test.ts @@ -0,0 +1,442 @@ +import { describe, test, expect } from 'bun:test'; +import { EventEmitter } from 'events'; +import { runServe, type ServeOptions } from '../src/commands/serve'; +import type { BrainEngine } from '../src/core/engine'; + +// These tests cover the stdio lifecycle hooks added to runServe so that the +// PGLite write lock is released when the parent disconnects. We don't spawn +// a real Bun child or boot the real MCP SDK; we inject a stub `engine`, a +// fake stdin Readable (EventEmitter is enough — only on/once/emit are +// touched), an injected exit() that resolves a promise instead of +// terminating the process, and (per Codex Layer 2 review feedback) a +// no-op startMcpServer stub so the real MCP SDK never attaches a 'data' +// listener to the test runner's actual process.stdin. + +class StubEngine implements Partial { + // Track whether disconnect was called; the lock-release behavior we care + // about here is "did the lifecycle path actually invoke disconnect?". + disconnectCalls = 0; + disconnect = async (): Promise => { + this.disconnectCalls += 1; + }; +} + +class StubSignals { + private handlers = new Map void>>(); + on(signal: string, handler: (...a: unknown[]) => void): this { + const list = this.handlers.get(signal) ?? []; + list.push(handler); + this.handlers.set(signal, list); + return this; + } + emit(signal: string): void { + for (const h of this.handlers.get(signal) ?? []) h(); + } +} + +// Stub timer pair: `setInterval` returns a numeric handle; `tickAll()` +// fires every registered fn once, mirroring 1 real-time tick. Lets the +// test drive the parent-watchdog deterministically without 5s of wall +// clock and without leaving real timers active across the suite. +interface TimerStub { + setInterval: (fn: () => void, ms: number) => unknown; + clearInterval: (h: unknown) => void; + tickAll: () => void; + active: () => number; +} + +function makeTimerStub(): TimerStub { + const fns = new Map void>(); + let next = 1; + return { + setInterval(fn) { + const id = next++; + fns.set(id, fn); + return id; + }, + clearInterval(h) { + if (typeof h === 'number') fns.delete(h); + }, + tickAll() { + for (const fn of fns.values()) fn(); + }, + active() { + return fns.size; + }, + }; +} + +interface Harness { + engine: StubEngine; + stdin: EventEmitter & { isTTY?: boolean; on: any; once: any }; + signals: StubSignals; + logs: string[]; + exited: Promise; + opts: ServeOptions; + timers: TimerStub; + setParentPid: (pid: number) => void; +} + +function makeHarness(opts: { + isTTY?: boolean; + initialParentPid?: number; + probeWatchdog?: boolean; +} = {}): Harness { + const engine = new StubEngine(); + const stdin = new EventEmitter() as EventEmitter & { isTTY?: boolean }; + if (opts.isTTY) stdin.isTTY = true; + const signals = new StubSignals(); + const logs: string[] = []; + + let resolveExit!: (code: number) => void; + const exited = new Promise(r => { resolveExit = r; }); + let exitCalled = false; + + // Mutable parent-pid the test can flip; defaults to a non-1 sentinel + // so the watchdog *will* install (`initialParentPid !== 1` guard). + // Tests that want "we were spawned under PID 1" pass `initialParentPid: 1`. + let parentPid = opts.initialParentPid ?? 12345; + const timers = makeTimerStub(); + + // probeWatchdog defaults to true so tests run with watchdog installed. + // Set probeWatchdog: false to simulate stripped-container ps unavailability. + const probeWatchdogResult = opts.probeWatchdog ?? true; + + const serveOpts: ServeOptions = { + stdin: stdin as any, + signals: signals as any, + exit: (code?: number) => { + if (exitCalled) return; + exitCalled = true; + resolveExit(code ?? 0); + }, + log: (msg: string) => { logs.push(msg); }, + // Replace the real MCP SDK boot with a no-op so we never touch the + // test runner's real process.stdin. The lifecycle hooks under test + // are installed *before* this is awaited, so all behaviors are still + // exercised end-to-end. + startMcpServer: async () => {}, + getParentPid: () => parentPid, + setInterval: timers.setInterval, + clearInterval: timers.clearInterval, + probeWatchdog: () => probeWatchdogResult, + }; + + return { + engine, + stdin: stdin as any, + signals, + logs, + exited, + opts: serveOpts, + timers, + setParentPid: (pid: number) => { parentPid = pid; }, + }; +} + +// runServe in tests resolves quickly because the injected startMcpServer +// is a no-op. The lifecycle hooks were installed synchronously before +// that no-op was awaited, so they're already wired by the time runServe +// returns. We start runServe and `await` it (so any setup error surfaces +// immediately), then drive the test-controlled events. +async function startInBackground( + engine: StubEngine, + args: string[], + opts: ServeOptions, +): Promise { + await runServe(engine as unknown as BrainEngine, args, opts); +} + +describe('runServe stdio lifecycle', () => { + test('stdin end triggers engine.disconnect() and process exit(0)', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.stdin.emit('end'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (stdin-end)'))).toBe(true); + }); + + test('SIGTERM triggers graceful exit', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGTERM'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (SIGTERM)'))).toBe(true); + }); + + test('SIGINT triggers graceful exit', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGINT'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (SIGINT)'))).toBe(true); + }); + + test('SIGHUP triggers graceful exit (terminal disconnect / daemon reload)', async () => { + // Per Aragorn (#591): real-world hosts (Claude Desktop on macOS, + // hermes-agent restart) sometimes send SIGHUP instead of closing + // stdin or sending SIGTERM. The handler converges on the same + // graceful path as the other signals. + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGHUP'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (SIGHUP)'))).toBe(true); + }); + + test('stdin close (parent SIGKILL leaves pipe destroyed) triggers graceful exit', async () => { + // 'end' fires on a clean EOF; 'close' fires when the underlying + // handle is destroyed (e.g. parent SIGKILL'd while pipe still open). + // We must observe both — observing only 'end' would miss the + // hard-kill path that #591's reporter hit on macOS. + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.stdin.emit('close'); + const code = await h.exited; + + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (stdin-close)'))).toBe(true); + }); + + test('parent watchdog fires shutdown when ppid flips to 1 (orphaned to init)', async () => { + // Some hosts (launchd, cron, certain MCP gateways) terminate + // without closing stdin and without sending a signal — the kernel + // re-parents us. The watchdog polls the live ppid on an interval; + // when it differs from the initial captured ppid, we detect "parent + // died" and shut down. + const h = makeHarness({ initialParentPid: 4242 }); + await startInBackground(h.engine, [], h.opts); + + // Watchdog should have installed (we passed initialParentPid !== 1 + // and probeWatchdog defaulted to true). + expect(h.timers.active()).toBe(1); + + // Simulate parent death: our process gets re-parented to init. + h.setParentPid(1); + h.timers.tickAll(); + + const code = await h.exited; + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (parent-died)'))).toBe(true); + + // beginShutdown clears the watchdog interval as part of cleanup so + // a duplicate tick can't queue a redundant shutdown. + expect(h.timers.active()).toBe(0); + }); + + test('parent watchdog fires shutdown when ppid flips to a SUBREAPER PID > 1 (codex finding #3)', async () => { + // Reparent-to-PID-1 is the easy case. Real hosts under launchd / + // systemd / tmux / a parent-shell-with-PR_SET_CHILD_SUBREAPER will + // re-parent us to that subreaper's PID, NOT to 1. The PR-#676 + // author's original `=== 1` check missed this. The fix is to fire + // on `current !== initialParentPid` so any reparent triggers the + // shutdown, regardless of where the kernel re-anchors us. + const h = makeHarness({ initialParentPid: 8500 }); + await startInBackground(h.engine, [], h.opts); + + expect(h.timers.active()).toBe(1); + + // Parent died; kernel re-parented to a launchd subreaper (PID 47). + h.setParentPid(47); + h.timers.tickAll(); + + const code = await h.exited; + expect(code).toBe(0); + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('graceful exit (parent-died)'))).toBe(true); + expect(h.timers.active()).toBe(0); + }); + + test('parent watchdog NOT installed when initial ppid is already 1 (legitimate init child)', async () => { + // Spawned directly under PID 1 (e.g. systemd unit, Docker entrypoint): + // ppid=1 is the documented steady state, not "parent died". We must + // NOT install the watchdog or we'd shut down immediately. + const h = makeHarness({ initialParentPid: 1 }); + await startInBackground(h.engine, [], h.opts); + + expect(h.timers.active()).toBe(0); + + // Sanity: the other lifecycle paths still work. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('parent watchdog NOT installed when ps probe fails (codex finding #4 / D2-revisited)', async () => { + // Stripped containers / busybox-without-procps environments lack ps. + // The original PR's per-tick fallback would silently return cached + // process.ppid, never detect a change, and never fire the shutdown + // — while still claiming to be active. + // + // The fix: a one-shot startup probe. When it returns false, we skip + // installing the watchdog interval AND emit a loud stderr line so + // the operator sees the degraded mode at startup. + const h = makeHarness({ initialParentPid: 4242, probeWatchdog: false }); + await startInBackground(h.engine, [], h.opts); + + // Watchdog NOT installed — message matches behavior. + expect(h.timers.active()).toBe(0); + expect(h.logs.some(l => l.includes('[gbrain serve] watchdog disabled: ps unavailable'))).toBe(true); + + // Sanity: the other lifecycle paths still work — the shutdown still + // funnels through stdin EOF / signals, just not via the watchdog. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('parent watchdog tick with ppid still alive does NOT fire shutdown', async () => { + // The watchdog must only fire on the *transition* away from the + // initial ppid; a healthy tick (ppid still equal to the original) + // is a no-op. + const h = makeHarness({ initialParentPid: 4242 }); + await startInBackground(h.engine, [], h.opts); + + expect(h.timers.active()).toBe(1); + // Tick with ppid unchanged. + h.timers.tickAll(); + h.timers.tickAll(); + h.timers.tickAll(); + expect(h.engine.disconnectCalls).toBe(0); + + // ... and signal-driven shutdown still works after several quiet ticks. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('shutdown is idempotent — multiple signals only disconnect once', async () => { + const h = makeHarness(); + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGTERM'); + h.signals.emit('SIGTERM'); + h.signals.emit('SIGINT'); + h.stdin.emit('end'); + + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('TTY stdin does NOT install end watcher (interactive use unaffected)', async () => { + const h = makeHarness({ isTTY: true }); + await startInBackground(h.engine, [], h.opts); + + // Emit 'end' on TTY stdin — no listener should be wired so this is a + // no-op. The test passes by simply not exiting; we give the runtime a + // beat to confirm nothing fires. Signals must still work. + h.stdin.emit('end'); + await new Promise(r => setTimeout(r, 10)); + expect(h.engine.disconnectCalls).toBe(0); + + // Sanity: signals still wired regardless of TTY-ness. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('--stdio-idle-timeout 0 disarms the idle hook (sanity)', async () => { + const h = makeHarness(); + await startInBackground(h.engine, ['--stdio-idle-timeout', '0'], h.opts); + + // 0 is the documented opt-out. No idle hook should be armed; drive a + // different exit path to confirm flow still works. + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.every(l => !l.includes('idle timeout'))).toBe(true); + }); + + test('--stdio-idle-timeout > 0 logs the configured value', async () => { + const h = makeHarness(); + await startInBackground(h.engine, ['--stdio-idle-timeout', '60'], h.opts); + + expect(h.logs.some(l => l.includes('stdio idle timeout = 60s'))).toBe(true); + + h.signals.emit('SIGTERM'); + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + }); + + test('idle timer is reset on every stdin data chunk', async () => { + const h = makeHarness(); + // Use a very short timeout so we can observe the firing/resetting + // without slowing the suite. 50ms is enough to be measurable but + // short enough that the suite finishes promptly. + await startInBackground( + h.engine, + ['--stdio-idle-timeout', '1'], // 1 second; we reset it before it fires + h.opts, + ); + + // Pulse 'data' a few times to keep the timer reset. + for (let i = 0; i < 3; i++) { + h.stdin.emit('data', Buffer.from('{"jsonrpc":"2.0"}')); + await new Promise(r => setTimeout(r, 100)); + } + expect(h.engine.disconnectCalls).toBe(0); + + // Now stop pulsing and wait for the timer to actually fire end-to-end + // (it ought to elapse within ~1s of the last reset). Awaiting + // h.exited rather than a wall-clock race makes this deterministic. + await h.exited; + expect(h.engine.disconnectCalls).toBe(1); + expect(h.logs.some(l => l.includes('stdio-idle-timeout (1s)'))).toBe(true); + }, 5000); + + test.each([ + ['abc', /--stdio-idle-timeout/], + ['30junk', /--stdio-idle-timeout/], + ['-1', /--stdio-idle-timeout/], + ['1.5', /--stdio-idle-timeout/], + ['', /--stdio-idle-timeout/], + ])('--stdio-idle-timeout rejects invalid value %p (typo is a CLI error)', async (bad, msgRe) => { + // Per Codex Layer 2 review P1: silent fallback on typo turns the + // opt-in safety net into a no-op. Strict parsing throws so the + // operator sees the mistake immediately. + const h = makeHarness(); + expect( + runServe(h.engine as unknown as BrainEngine, ['--stdio-idle-timeout', bad], h.opts), + ).rejects.toThrow(msgRe); + }); + + test('--stdio-idle-timeout with no following value also throws', async () => { + const h = makeHarness(); + // Flag at end of args — no value to consume. + expect( + runServe(h.engine as unknown as BrainEngine, ['--stdio-idle-timeout'], h.opts), + ).rejects.toThrow(/missing value/); + }); + + test('engine.disconnect throwing still results in exit(0) and logged error', async () => { + const h = makeHarness(); + h.engine.disconnect = async () => { + throw new Error('synthetic disconnect failure'); + }; + await startInBackground(h.engine, [], h.opts); + + h.signals.emit('SIGTERM'); + const code = await h.exited; + expect(code).toBe(0); + expect(h.logs.some(l => l.includes('cleanup error: synthetic disconnect failure'))).toBe(true); + }); +}); diff --git a/test/sql-query.test.ts b/test/sql-query.test.ts new file mode 100644 index 000000000..c9f57a515 --- /dev/null +++ b/test/sql-query.test.ts @@ -0,0 +1,175 @@ +import { afterAll, beforeAll, describe, expect, test } from 'bun:test'; +import { PGLiteEngine } from '../src/core/pglite-engine.ts'; +import { sqlQueryForEngine, executeRawJsonb } from '../src/core/sql-query.ts'; + +let engine: PGLiteEngine; + +beforeAll(async () => { + engine = new PGLiteEngine(); + await engine.connect({}); +}, 30_000); + +afterAll(async () => { + if (engine) await engine.disconnect(); +}); + +describe('sqlQueryForEngine', () => { + test('runs parameterized tagged-template SQL against PGLite', async () => { + const sql = sqlQueryForEngine(engine); + const rows = await sql`SELECT ${'pglite'}::text AS engine, ${3}::int AS count`; + expect(rows).toEqual([{ engine: 'pglite', count: 3 }]); + }); + + test('rejects postgres.js-style fragment / object values explicitly', async () => { + const sql = sqlQueryForEngine(engine); + await expect( + sql`SELECT ${(Promise.resolve([]) as any)}::text AS bad` + ).rejects.toThrow(/only supports scalar bind values/); + await expect( + sql`SELECT ${(['read', 'write'] as any)}::text[] AS bad` + ).rejects.toThrow(/only supports scalar bind values/); + await expect( + sql`SELECT ${({ takes_holders: ['world'] } as any)}::jsonb AS bad` + ).rejects.toThrow(/only supports scalar bind values/); + }); +}); + +describe('executeRawJsonb (D1 wave / v0.31)', () => { + test('round-trips an object as JSONB on PGLite (jsonb_typeof = object, ->> reads value)', async () => { + // Verifies the cross-engine JSONB write helper produces a real Postgres + // JSONB object — not a quoted JSON string. Codex's plan-review #9 said + // "use the actual JSONB contract, not string-grep for backslash-quote", + // and that's what this asserts: jsonb_typeof + ->>. + const tableName = `t_jsonb_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw(`CREATE TEMP TABLE ${tableName} (j jsonb)`); + try { + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (j) VALUES ($1::jsonb)`, + [], + [{ k: 'v', n: 42 }], + ); + const rows = await engine.executeRaw<{ kind: string; k: string; n: number }>( + `SELECT jsonb_typeof(j) AS kind, j->>'k' AS k, (j->>'n')::int AS n FROM ${tableName}`, + ); + expect(rows).toHaveLength(1); + expect(rows[0].kind).toBe('object'); + expect(rows[0].k).toBe('v'); + expect(rows[0].n).toBe(42); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('takes-holders shape: object preserved, ->> returns the encoded array, NOT a double-encoded string (v0.12.0 regression guard)', async () => { + // The v0.12.0 silent-data-loss bug stored `${JSON.stringify(perms)}::jsonb` + // as a JSON string-of-an-object, so `permissions->>'takes_holders'` + // would return a string with backslashes instead of the array. + // Post-fix: real JSONB object, ->> on the array key returns the + // pretty-printed JSON of the array (because jsonb -> array -> ->> is + // the array as a text), and jsonb_typeof on the parent stays 'object'. + const tableName = `t_perms_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw( + `CREATE TEMP TABLE ${tableName} (id serial PRIMARY KEY, permissions jsonb)`, + ); + try { + const perms = { takes_holders: ['world', 'garry'] }; + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (permissions) VALUES ($1::jsonb)`, + [], + [perms], + ); + const rows = await engine.executeRaw<{ + outer_kind: string; + holders_kind: string; + first_holder: string; + text_form: string; + }>( + `SELECT + jsonb_typeof(permissions) AS outer_kind, + jsonb_typeof(permissions->'takes_holders') AS holders_kind, + permissions->'takes_holders'->>0 AS first_holder, + permissions::text AS text_form + FROM ${tableName}`, + ); + expect(rows).toHaveLength(1); + // The parent JSONB is an object; the takes_holders child is an array. + // Pre-fix this would be 'string' / 'string' (string-of-object). + expect(rows[0].outer_kind).toBe('object'); + expect(rows[0].holders_kind).toBe('array'); + expect(rows[0].first_holder).toBe('world'); + // Defense in depth: the text representation must NOT contain + // backslash-quote sequences, which is what double-encoded JSONB + // looked like in the v0.12.0 incident (e.g. `"{\"takes_holders\":...}"`). + expect(rows[0].text_form).not.toContain('\\"'); + // And the text representation should look like a normal JSON object + // — starts with `{`, not `"{`. + expect(rows[0].text_form.startsWith('{')).toBe(true); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('null jsonb value stores NULL, not the string "null"', async () => { + // serve-http.ts sometimes inserts NULL params (e.g. tools/list, scope- + // rejected paths). The helper must accept null without trying to + // encode it as the string "null" or rejecting it. + const tableName = `t_jnull_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw(`CREATE TEMP TABLE ${tableName} (j jsonb)`); + try { + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (j) VALUES ($1::jsonb)`, + [], + [null], + ); + const rows = await engine.executeRaw<{ kind: string | null; is_null: boolean }>( + `SELECT jsonb_typeof(j) AS kind, (j IS NULL) AS is_null FROM ${tableName}`, + ); + expect(rows[0].is_null).toBe(true); + // jsonb_typeof on SQL NULL returns NULL. + expect(rows[0].kind).toBeNull(); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('mixes scalar params and jsonb params in positional order', async () => { + // Real call shape: scalars first ($1..$N), JSONB params next + // ($N+1..$N+M). Mirrors the auth.ts `INSERT INTO access_tokens + // (name, token_hash, permissions) VALUES ($1, $2, $3::jsonb)` pattern. + const tableName = `t_mix_${Math.random().toString(36).slice(2, 10)}`; + await engine.executeRaw( + `CREATE TEMP TABLE ${tableName} (name text, weight int, payload jsonb)`, + ); + try { + await executeRawJsonb( + engine, + `INSERT INTO ${tableName} (name, weight, payload) VALUES ($1, $2, $3::jsonb)`, + ['alice', 7], + [{ tags: ['a', 'b'] }], + ); + const rows = await engine.executeRaw<{ name: string; weight: number; first_tag: string }>( + `SELECT name, weight, payload->'tags'->>0 AS first_tag FROM ${tableName}`, + ); + expect(rows).toEqual([{ name: 'alice', weight: 7, first_tag: 'a' }]); + } finally { + await engine.executeRaw(`DROP TABLE ${tableName}`); + } + }); + + test('rejects non-scalar values in scalarParams (defense in depth)', async () => { + // The scalar position validator should fire even when a misuse passes + // an object via scalarParams instead of jsonbParams. Catches the + // cross-up-the-positions footgun loud at the helper boundary. + await expect( + executeRawJsonb( + engine, + `SELECT $1::text AS bad`, + [{ object: 'in scalar position' } as any], + [], + ), + ).rejects.toThrow(/only supports scalar bind values/); + }); +});