fix(migrations): v0_29_1 backfill misses connect() and uses BEGIN/COMMIT outside transaction #758

Open
brandonlipman wants to merge 1 commit into garrytan:master from brandonlipman:fix/v0_29_1-engine-connect-and-backfill-transaction

@brandonlipman brandonlipman commented May 9, 2026

Summary

This PR fixes two bugs that prevent the v0.29.1 effective_date backfill from running on Postgres / Supabase brains. Both were found while upgrading a ~10.7K-page Supabase brain that had been wedged at v0.29.1 PARTIAL for hours.

Bug 1 — v0_29_1.ts phase B + phase C never call engine.connect()

phaseBBackfill and phaseCVerify create an engine via createEngine() but never call await engine.connect(cfg). Every other migration orchestrator does this (v0_28_0.ts:208, v0_22_4.ts:84, v0_18_0.ts:50, v0_14_0.ts:55, v0_13_1.ts:68).

Failure surface on a real Supabase brain:

Migration v0.29.1 reported status=failed.
backfill_effective_date status: failed
detail: "No database connection: connect() has not been called.
 Fix: Run gbrain init --supabase or gbrain init --url ..."

The fix adds the missing await engine.connect(cfg) call to both phases and wraps the engine usage in try/finally so disconnect() always fires (matching the pattern in v0_22_4.ts).
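
A minimal sketch of the connect-then-try/finally shape described for Bug 1. The `Engine` interface, the `withConnectedEngine` helper, and the in-memory `makeFakeEngine` are hypothetical stand-ins for illustration; the real engine interface in the repo may differ, and the actual fix may inline this logic rather than use a helper.

```typescript
// Hypothetical minimal engine shape (assumption, not the repo's interface).
interface Engine {
  connect(cfg: { url: string }): Promise<void>;
  disconnect(): Promise<void>;
  executeRaw(sql: string): Promise<unknown[]>;
}

// Connect first, run the work, and always disconnect, even if the work throws.
async function withConnectedEngine<T>(
  engine: Engine,
  cfg: { url: string },
  work: (engine: Engine) => Promise<T>,
): Promise<T> {
  await engine.connect(cfg); // the call that phase B and phase C were missing
  try {
    return await work(engine);
  } finally {
    await engine.disconnect();
  }
}

// In-memory stand-in used only to demonstrate the control flow.
function makeFakeEngine(log: string[]): Engine {
  let connected = false;
  return {
    async connect() {
      connected = true;
      log.push("connect");
    },
    async disconnect() {
      connected = false;
      log.push("disconnect");
    },
    async executeRaw(sql: string) {
      if (!connected) {
        throw new Error("No database connection: connect() has not been called.");
      }
      log.push(sql);
      return [];
    },
  };
}
```

With the fake engine, a successful run logs connect, the statement, then disconnect; a run whose work throws still logs disconnect, which is the property the try/finally is there to guarantee.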

Bug 2 — backfill-effective-date.ts ad-hoc BEGIN/COMMIT via separate executeRaw calls

The current code wraps each batch in:

await engine.executeRaw(`BEGIN`);
await engine.executeRaw(`SET LOCAL statement_timeout = '600s'`);
// ... 1000 UPDATEs ...
await engine.executeRaw(`COMMIT`);

with a comment that already acknowledges the intended alternative:

// postgres.js's `transaction` would be cleaner but we're using executeRaw
// for engine portability; explicit BEGIN/COMMIT does the same on both.

On real pooled Postgres it does NOT do the same. postgres.js v3 refuses this pattern on pooled connections:

UNSAFE_TRANSACTION: Only use sql.begin, sql.reserved or max: 1

The reason is that separate executeRaw calls are not pinned to one connection. With a client pool larger than one, or with pgbouncer / Supavisor in transaction mode, each call may reach a different backend, so BEGIN can land on backend-A and COMMIT on backend-B. The batch's updates are then not atomic, the SET LOCAL statement_timeout has no effect on them, and postgres.js rejects the pattern rather than run it.
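
A toy model of that routing, assuming a pool that hands each statement to the next connection in round-robin order. `FakeConnection` and `RoundRobinPool` are hypothetical and only track transaction state; real poolers choose backends differently, but the failure shape is the same.

```typescript
// Toy connection: tracks only whether a transaction is open on it.
class FakeConnection {
  name: string;
  inTransaction = false;
  constructor(name: string) {
    this.name = name;
  }
  run(sql: string): string {
    if (sql === "BEGIN") this.inTransaction = true;
    if (sql === "COMMIT") this.inTransaction = false;
    return `${this.name}: ${sql}`;
  }
}

// Toy pool: every statement goes to the next connection, round-robin.
class RoundRobinPool {
  private conns: FakeConnection[];
  private next = 0;
  constructor(conns: FakeConnection[]) {
    this.conns = conns;
  }
  executeRaw(sql: string): string {
    const conn = this.conns[this.next % this.conns.length];
    if (!conn) throw new Error("pool is empty");
    this.next += 1;
    return conn.run(sql);
  }
}

const backendA = new FakeConnection("backend-A");
const backendB = new FakeConnection("backend-B");
const pool = new RoundRobinPool([backendA, backendB]);

// The ad-hoc pattern: four independent calls, no shared connection.
const trace = [
  pool.executeRaw("BEGIN"),
  pool.executeRaw("SET LOCAL statement_timeout = '600s'"),
  pool.executeRaw("UPDATE ..."),
  pool.executeRaw("COMMIT"),
];
// trace[0] is "backend-A: BEGIN" and trace[3] is "backend-B: COMMIT":
// backend-A is left with an open transaction that nothing commits.
```

In this model the SET LOCAL also runs on backend-B, outside any transaction, which mirrors why the timeout never applies to the batch.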

For the Postgres path, the ad-hoc BEGIN/COMMIT is replaced with engine.transaction(async (tx) => { ... }), which uses sql.begin under the hood. The PGLite path keeps the direct calls since it has a single writer and no pooler.
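
A sketch of what one batch might look like under the callback form. Only the `engine.transaction(async (tx) => { ... })` call shape comes from the description above; the `Tx` and `TxEngine` interfaces, the `applyBatch` helper, and the recording engine are assumptions made for illustration.

```typescript
// Hypothetical shapes; the method available on `tx` is an assumption.
interface Tx {
  executeRaw(sql: string): Promise<void>;
}
interface TxEngine {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}

// Apply one batch atomically: the timeout and every UPDATE share one transaction.
async function applyBatch(engine: TxEngine, updates: string[]): Promise<number> {
  return engine.transaction(async (tx) => {
    await tx.executeRaw("SET LOCAL statement_timeout = '600s'");
    for (const sql of updates) {
      await tx.executeRaw(sql);
    }
    return updates.length;
  });
}

// In-memory stand-in: records statements and emulates commit / rollback.
function makeRecordingEngine(log: string[]): TxEngine {
  return {
    async transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T> {
      log.push("BEGIN");
      const tx: Tx = {
        async executeRaw(sql: string) {
          if (sql.startsWith("FAIL")) throw new Error("simulated failure");
          log.push(sql);
        },
      };
      try {
        const result = await fn(tx);
        log.push("COMMIT");
        return result;
      } catch (err) {
        log.push("ROLLBACK");
        throw err;
      }
    },
  };
}
```

The point of the callback form is that the engine owns BEGIN, COMMIT, and ROLLBACK, so every statement issued through `tx` stays on the connection that opened the transaction and a failure partway through the batch rolls the whole batch back.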

Note on overlap with v0.30.1's backfill-base.ts

v0.30.1 introduced src/core/backfill-base.ts as the forward-looking pattern for new backfills (with withReservedConnection + adaptive batching). This PR keeps the OLD backfill-effective-date.ts path working for existing v0.29.1 brains mid-upgrade rather than migrating the v0.29.1 orchestrator to backfill-base.ts. Happy to do that migration in a follow-up PR if preferred.

Verification

Tested on a ~10.7K-page real Supabase brain that was wedged at v0.29.1 PARTIAL. With both fixes:

  • Phase A (schema): complete
  • Phase B (backfill_effective_date): examined=10666, updated=10666, fallback=535, dur=484s (about 8 min)
  • Phase C (verify): 0 pages with NULL effective_date
  • Final ledger entry: status: complete
  • gbrain doctor effective_date_health reports clean.

Test plan

  • bun run typecheck clean
  • Run gbrain apply-migrations --yes against an existing v0.29.0/v0.29.1 brain on Supabase pooler (port 6543) — should reach status=complete
  • Run on PGLite — should still work (PGLite path unchanged)
  • No new tests added; happy to add test/migrations-v0_29_1.serial.test.ts mirroring test/migration-orchestrator-v0_21_0.test.ts if desired

🤖 Generated with Claude Code


…utside transaction

Two real bugs that prevent the v0.29.1 effective_date backfill from running
on Postgres / Supabase brains:

1. v0_29_1.ts phaseBBackfill + phaseCVerify create an engine via
   `createEngine()` but never call `await engine.connect(cfg)`. Every other
   migration orchestrator (v0_28_0, v0_22_4, v0_18_0, v0_14_0, v0_13_1)
   does this. Symptom on a real Supabase brain:

       backfill_effective_date status: failed
       detail: "No database connection: connect() has not been called.
        Fix: Run gbrain init --supabase or gbrain init --url ..."

2. backfill-effective-date.ts wraps each batch in ad-hoc
   `executeRaw('BEGIN')` + many UPDATEs + `executeRaw('COMMIT')`. postgres.js
   v3 refuses this pattern on pooled connections (UNSAFE_TRANSACTION):

       UNSAFE_TRANSACTION: Only use sql.begin, sql.reserved or max: 1

   The comment in the code itself acknowledged the alternative:

       // postgres.js's `transaction` would be cleaner but we're using
       // executeRaw for engine portability; explicit BEGIN/COMMIT does the
       // same on both.

   On real pooled Postgres it does NOT do the same — pgbouncer / Supavisor
   may route each `executeRaw` call to a different backend, so BEGIN lands
   on backend-A and the COMMIT lands on backend-B. Replaced with
   `engine.transaction()` (which uses sql.begin under the hood) for the
   Postgres path; PGLite path stays direct since it has a single writer.

Note: v0.30.1 introduced `src/core/backfill-base.ts` which is the
forward-looking pattern for new backfills. The fix here keeps the OLD
backfill-effective-date.ts path working for existing v0.29.1 brains
mid-upgrade — happy to migrate the v0.29.1 orchestrator to backfill-base.ts
in a follow-up PR if preferred.

Verified on a real ~10.7K-page Supabase brain that was wedged at v0.29.1
"PARTIAL" for hours; with both fixes it ran phase B (examined=10666,
updated=10666, fallback=535, dur=484s) and phase C (verify complete) and
landed at status=complete on a clean ledger entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>