Skip to content

fix(migrate v0.29.1 Phase B): force poolSize:1 for explicit BEGIN/COMMIT path#917

Closed
jeremyknows wants to merge 1 commit into
garrytan:masterfrom
jeremyknows:fix/migrate-v0-29-1-phase-b-poolsize
Closed

fix(migrate v0.29.1 Phase B): force poolSize:1 for explicit BEGIN/COMMIT path#917
jeremyknows wants to merge 1 commit into
garrytan:masterfrom
jeremyknows:fix/migrate-v0-29-1-phase-b-poolsize

Conversation

@jeremyknows
Copy link
Copy Markdown
Contributor

@jeremyknows jeremyknows commented May 12, 2026

Summary

phaseBBackfill in src/commands/migrations/v0_29_1.ts calls backfillEffectiveDate which uses explicit BEGIN/COMMIT per-batch for PGLite portability. postgres.js rejects manual transactions with UNSAFE_TRANSACTION whenever the connection pool's max > 1 — so on multi-pool Postgres installs the migration wedges after the first batch.

Phase C (verify, read-only, no manual transactions) is unaffected and keeps the default pool size.

Diff scope

src/commands/migrations/v0_29_1.ts | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

One-line behavioral change: pass poolSize: 1 to engine.connect() in phaseBBackfill only. Type-cast pattern lifted from src/core/brain-registry.ts:406, which is the documented way to thread poolSize through BrainEngine.connect() (the field is accepted by PostgresEngine.connect() at src/core/postgres-engine.ts:109 but isn't surfaced on the generic interface in engine.ts:420).

Verification

$ bun run typecheck
$ tsc --noEmit
(exit 0)

$ bun run verify
OK: no JSON.stringify(x)::jsonb interpolation pattern in src/
OK: max_stalled defaults are 5 in all schema sources
OK: all rowToPage feeder projections include source_id
check-progress-to-stdout: OK (no banned stdout \r patterns)
check-test-isolation: OK (367 non-serial unit files scanned)
[check-wasm-embedded] OK — compiled binary produced real semantic chunks.
[check-admin-scope-drift] ok: 5 scopes match
OK: src/cli.ts is git-tracked as executable (100755)
OK: no direct derived-table writes outside the reconcile layer in src/ + scripts/
(all 11 checks pass)

$ bun test test/migrations
 133 pass / 0 fail / 485 expect()

Real-behavior proof (observed, not reproduced this session)

Pre-fix observation — May 11, 2026 on a developer Postgres install (postgres.js with default max > 1):

  • gbrain migrate run v0.29.1 started, completed Phase A (schema), advanced into Phase B, completed batch 1.
  • Batch 2 failed with UNSAFE_TRANSACTION: Manual transactions are not allowed when connection pool max > 1.
  • Migration left in partial state. Re-running picked up resume token from backfill.effective_date.last_id, hit the same failure on the next batch, partial again. Three consecutive partials before root-causing.
  • gbrain doctor reported MINIONS half-install — effective_date_health was warning on rows the partial backfill never reached.

Why pglite users haven't seen this — PGLite ignores poolSize and always runs single-connection. The bug only surfaces on a postgres-engine multi-pool setup.

What was NOT tested in this session

  • Re-running the migration end-to-end on a multi-pool postgres install to demonstrate the post-fix completion. The session that observed the pre-fix wedge was 2026-05-11; the working brain has since been advanced. Re-creating the broken state would require seeding pre-v0.29.1 rows on a postgres engine and re-running migrate run v0.29.1. The fix is one line, the bug pattern is well-documented in postgres.js docs (UNSAFE_TRANSACTION when manual transactions cross a connection pool), and the typecheck path matches the existing brain-registry.ts:406 precedent, so I'm comfortable submitting on those grounds — but flagging the gap for reviewers who may want to validate against their own postgres setup before merge.
  • No new unit test added. The migration phase is hard to test without a live Postgres engine (the unit suite uses pglite, which doesn't exhibit the bug). The existing migration test suite (test/migrations/*) covers happy-path orchestration; reproducing the UNSAFE_TRANSACTION interaction would need either a Postgres harness or a postgres.js mock that enforces the same pool-size invariant — both out of scope for a one-line fix.

Test plan

  • CI green
  • Reviewer with multi-pool postgres install: run gbrain migrate run v0.29.1 on a brain with NULL effective_date rows; Phase B completes without UNSAFE_TRANSACTION.
  • PGLite user: regression check — gbrain migrate run v0.29.1 still completes normally (poolSize is ignored by PGLite per brain-registry.ts:403 comment).

Commits

  1. 6a91912 fix(migrate v0.29.1 Phase B): force poolSize:1 for explicit BEGIN/COMMIT path — one-line behavioral fix + import + cast pattern matching brain-registry.ts:406.

View in Codesmith
Need help on this PR? Tag @codesmith with what you need.

  • Let Codesmith autofix CI failures and bot reviews

…MIT path

phaseBBackfill calls backfillEffectiveDate which uses explicit BEGIN/COMMIT
for PGLite portability. postgres.js rejects manual transactions with
UNSAFE_TRANSACTION when the pool's max > 1, so on multi-pool postgres
installs the migration wedges mid-batch after the first batch.

Fix: pass poolSize:1 to engine.connect() in Phase B only. Phase C (read-only
verify, no manual transactions) keeps the default pool size.

The cast pattern matches src/core/brain-registry.ts:406 — poolSize is
accepted by postgres-engine's connect() implementation but isn't surfaced
on the generic BrainEngine.connect() interface, so the cast is the
documented way to thread it.

Observed on a developer postgres install (May 11, 2026): migration wedged
after 3 consecutive partials. Unblocks MINIONS half-install check in
gbrain doctor.
@jeremyknows
Copy link
Copy Markdown
Contributor Author

Closing — superseded by PR #740 (already in master) which solved this differently and better.

What I missed: PR #740 rewrote backfillEffectiveDate in src/core/backfill-effective-date.ts to use engine.transaction() instead of explicit BEGIN/COMMIT. engine.transaction() routes through sql.begin() and uses a reserved backend connection, so the UNSAFE_TRANSACTION error this PR was trying to fix simply cannot happen on the current code — regardless of pool size.

I observed the original wedge on 2026-05-11 against a pre-#740 base on my local fork (sprint2/combined). My local commit's message described "explicit BEGIN/COMMIT" which was true at the time. After today's fork rebase onto v0.33.0 (which includes #740), the explicit transactions were gone, but I didn't re-verify the assumption before opening the PR.

Verified on current master:

$ grep -n "BEGIN\|COMMIT\|transaction(" ~/gbrain/src/core/backfill-effective-date.ts
178:      // postgres.js refuses bare BEGIN/COMMIT on pooled connections
179:      // (UNSAFE_TRANSACTION); engine.transaction() routes through sql.begin()
181:      await engine.transaction(async (tx) => {

No explicit BEGIN/COMMIT remain. poolSize: 1 is unnecessary.

Apologies for the noise. Thanks for the work on #740.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant