fix(migrations): v0_29_1 backfill misses connect() and uses BEGIN/COMMIT outside transaction#758
Open
brandonlipman wants to merge 1 commit intogarrytan:masterfrom
Conversation
…utside transaction
Two real bugs that prevent the v0.29.1 effective_date backfill from running
on Postgres / Supabase brains:
1. v0_29_1.ts phaseBBackfill + phaseCVerify create an engine via
`createEngine()` but never call `await engine.connect(cfg)`. Every other
migration orchestrator (v0_28_0, v0_22_4, v0_18_0, v0_14_0, v0_13_1)
does this. Symptom on a real Supabase brain:
backfill_effective_date status: failed
detail: "No database connection: connect() has not been called.
Fix: Run gbrain init --supabase or gbrain init --url ..."
2. backfill-effective-date.ts wraps each batch in ad-hoc
`executeRaw('BEGIN')` + many UPDATEs + `executeRaw('COMMIT')`. postgres.js
v3 refuses this pattern on pooled connections (UNSAFE_TRANSACTION):
UNSAFE_TRANSACTION: Only use sql.begin, sql.reserved or max: 1
The comment in the code itself acknowledged the alternative:
// postgres.js's `transaction` would be cleaner but we're using
// executeRaw for engine portability; explicit BEGIN/COMMIT does the
// same on both.
On real pooled Postgres it does NOT do the same — pgbouncer / Supavisor
may route each `executeRaw` call to a different backend, so BEGIN lands
on backend-A and the COMMIT lands on backend-B. Replaced with
`engine.transaction()` (which uses sql.begin under the hood) for the
Postgres path; PGLite path stays direct since it has a single writer.
Note: v0.30.1 introduced `src/core/backfill-base.ts` which is the
forward-looking pattern for new backfills. The fix here keeps the OLD
backfill-effective-date.ts path working for existing v0.29.1 brains
mid-upgrade — happy to migrate the v0.29.1 orchestrator to backfill-base.ts
in a follow-up PR if preferred.
Verified on a real ~10.7K-page Supabase brain that was wedged at v0.29.1
"PARTIAL" for hours; with both fixes it ran phase B (examined=10666,
updated=10666, fallback=535, dur=484s) and phase C (verify complete) and
landed at status=complete on a clean ledger entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two real bugs that prevent the v0.29.1 effective_date backfill from running on Postgres / Supabase brains. Found while upgrading a real ~10.7K-page Supabase brain that was wedged at
v0.29.1 PARTIALfor hours.Bug 1 —
v0_29_1.tsphase B + phase C never callengine.connect()phaseBBackfillandphaseCVerifycreate an engine viacreateEngine()but never callawait engine.connect(cfg). Every other migration orchestrator does this (v0_28_0.ts:208,v0_22_4.ts:84,v0_18_0.ts:50,v0_14_0.ts:55,v0_13_1.ts:68).Failure surface on a real Supabase brain:
Wrapped the engine usage in try/finally so
disconnect()always fires (matches the pattern inv0_22_4.ts).Bug 2 —
backfill-effective-date.tsad-hocBEGIN/COMMITvia separateexecuteRawcallsThe current code wraps each batch in:
with a comment that already acknowledges the intended alternative:
On real pooled Postgres it does NOT do the same. postgres.js v3 refuses this pattern on pooled connections:
The reason is that pgbouncer / Supavisor in transaction mode may route each
executeRawcall to a different backend, soBEGINlands on backend-A andCOMMITlands on backend-B — the protected updates aren't atomic, statement_timeout SET LOCAL is silently dropped, and postgres.js detects the structural smell.Replaced with
engine.transaction(async (tx) => { ... })(which usessql.beginunder the hood) for the Postgres path. PGLite path stays direct since it has a single writer and no pooler.Note on overlap with v0.30.1's
backfill-base.tsv0.30.1 introduced
src/core/backfill-base.tsas the forward-looking pattern for new backfills (withwithReservedConnection+ adaptive batching). This PR keeps the OLDbackfill-effective-date.tspath working for existing v0.29.1 brains mid-upgrade rather than migrating the v0.29.1 orchestrator tobackfill-base.ts. Happy to do that migration in a follow-up PR if preferred.Verification
Tested on a ~10.7K-page real Supabase brain that was wedged at
v0.29.1 PARTIAL. With both fixes:examined=10666, updated=10666, fallback=535, dur=484s(about 8 min)0 pages with NULL effective_datestatus: completegbrain doctoreffective_date_healthreports clean.Test plan
bun run typecheckcleangbrain apply-migrations --yesagainst an existing v0.29.0/v0.29.1 brain on Supabase pooler (port 6543) — should reach status=completetest/migrations-v0_29_1.serial.test.tsmirroringtest/migration-orchestrator-v0_21_0.test.tsif desired🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need.