fix(migrations): cold-start robustness in apply-migrations + v0.29.1 orchestrator#760
Open
knee5 wants to merge 2 commits intogarrytan:masterfrom
Open
fix(migrations): cold-start robustness in apply-migrations + v0.29.1 orchestrator#760knee5 wants to merge 2 commits intogarrytan:masterfrom
knee5 wants to merge 2 commits intogarrytan:masterfrom
Conversation
…Verify Both phases created an engine via createEngine() but never called engine.connect(), causing every executeRaw / backfillEffectiveDate call to fail with "No database connection" on cold-start (v29 or earlier schema upgrading to v30+). Pattern fix: follow the v0.28.0 orchestrator convention — capture the EngineConfig first, call createEngine(engineConfig), then immediately await engine.connect(engineConfig) before any engine method. Added try/finally + engine.disconnect() in both phases to mirror v0.28.0's resource-cleanup shape. Repro: 1. Schema at v29 (or pre-v30) 2. gbrain apply-migrations --yes 3. Phase B fails: "No database connection" Workaround used during clevin's v0.22→v0.30.1 migration on 2026-05-08: gbrain reindex-frontmatter (examined=2286, updated=2286, fallback=1616, 467 s), then manually appended a `complete` row to ~/.gbrain/migrations/completed.jsonl. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase A previously asserted schema version >= 38 but returned status 'failed' if the schema was behind. On a direct upgrade from v0.22 (or any pre-v0.28 version) without first running `gbrain upgrade`, the schema version is below 38 and the entire orchestrator chain aborts, requiring the user to run `gbrain apply-migrations --force-schema --yes` first and then re-run `gbrain apply-migrations --yes`. Fix: when Phase A detects schema < 38, call `engine.initSchema()` inline (the engine is already connected at this point via the v0.28.0 shared orchestrator setup) before proceeding to the post-condition table check. Re-reads the version after apply to give a clear actionable error if DDL fails partway through. This makes `gbrain apply-migrations --yes` succeed from any starting schema version in one shot. Also adds test/migration-orchestrator-v0_29_1.test.ts covering the v0.29.1 orchestrator registration, phase shape, and dry-run behavior — matches the style of migration-orchestrator-v0_21_0.test.ts. Repro for Bug 2: 1. Fresh install of gbrain v0.30.1 on a brain previously at v0.22 2. gbrain apply-migrations --yes 3. v0.28.0 Phase A fails: "expected schema version >= 38; got <N>" Workaround used during clevin's v0.22→v0.30.1 migration on 2026-05-08: gbrain apply-migrations --force-schema --yes gbrain apply-migrations --yes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two bugs found during clevin's gbrain v0.22 → v0.30.1 migration on 2026-05-08 that together cause
gbrain apply-migrations --yesto fail from any cold-start state (pre-v0.28 or pre-v0.30 schema). Both fixes make the command "just work" in one shot without flags or workarounds, completing the operational hardening intent stated in v0.30.1's release notes.Bug 1 — v0.29.1
phaseBBackfill/phaseCVerifymissingengine.connect()File:
src/commands/migrations/v0_29_1.tsSymptom: During
gbrain apply-migrations --yeson a schema at v29 or below, Phase B fails immediately with:"No database connection". Phase C has the same bug.Root cause: Both phases called
createEngine(config)to instantiate the engine but never calledawait engine.connect(engineConfig).createEnginereturns an unconnected instance;connectis the separate lifecycle call that establishes the pool. Compare the v0.28.0 orchestrator (the established pattern) — it callsawait engine.connect(engineConfig)in its shared setup block right aftercreateEngine.Repro:
gbrain apply-migrations --yesgbrain init --migrate-only); Phase B (phaseBBackfill) throws"No database connection"Workaround used (2026-05-08):
gbrain reindex-frontmatterdirectly (examined=2286, updated=2286, fallback=1616, 467 s), then manually appended acompleterow to~/.gbrain/migrations/completed.jsonl.Patch: Added
await engine.connect(engineConfig)immediately aftercreateEngine(engineConfig)in bothphaseBBackfillandphaseCVerify, plustry/finally+engine.disconnect()for resource cleanup — matching the v0.28.0 pattern exactly.Bug 2 — v0.28.0 Phase A aborts instead of applying schema on cold-start
File:
src/commands/migrations/v0_28_0.tsSymptom:
gbrain apply-migrations --yeson a brain upgrading from v0.22 (or any pre-v0.28 version) fails with:"expected schema version >= 38 (takes + access_tokens.permissions); got <N>". The command exits 1 before doing any work.Root cause: v0.28.0 Phase A was a pure verify: it checked
schema_version >= 38and returnedstatus: 'failed'if behind. The design comment says "the schema runner does the actual DDL duringgbrain upgrade/initSchema" — but on a direct install upgrade (not viagbrain upgrade), the DDL hasn't run yet.Diagnosis: The
--force-schemaflag inapply-migrations.tscallsrunMigrations(eng)in-process to resolve exactly this drift. The two-step workaround (--force-schema --yesthen--yes) works because--force-schemabrings the DDL to LATEST_VERSION before the orchestrator chain starts.Patch: When Phase A detects schema < 38, call
await engine.initSchema()inline (the engine is already connected via the v0.28.0 shared orchestrator setup block) before the post-condition table check. Re-reads the version after apply to surface any partial DDL failure clearly. This makesapply-migrations --yessucceed from any starting schema version in one shot.Workaround used (2026-05-08):
Testing
test/migration-orchestrator-v0_29_1.test.ts— registry, phase shape, and dry-run assertions following themigration-orchestrator-v0_21_0.test.tsstyle.bun test test/migration-orchestrator-v0_29_1.test.ts test/migrations-registry.test.ts test/apply-migrations.test.ts→ 32 pass, 0 fail.bun build --compilefails on missing optional packages (heic-decode,@jsquash/png) in the checkout — pre-existing, not introduced by this PR.Discovered during
clevin's gbrain v0.22.2 → v0.30.1 migration on 2026-05-08. These fixes complete the intent of v0.30.1's "operational hardening — make upgrades just work on Supabase" release note: with both patches,
gbrain apply-migrations --yessucceeds from any pre-v0.30 starting state without flags or manual workarounds.Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
Need help on this PR? Tag
@codesmithwith what you need.