Schema migration v44→v45+ wedged: schema-verify references source_id column that doesn't exist at v44 (chicken-and-egg) #1054

@Rosevari

Summary

On gbrain v0.35.0.0 with the PGLite engine, a database at schema v44 cannot advance past v44 because the schema-verify path queries the source_id column before the migration that adds it has run. This wedges release migrations v0.31.0 (Hot Memory) and v0.32.2 (Facts-as-system-of-record), which gate on schema >= 45 and schema >= 51 respectively.

The result: schema stays at v44 forever, two release migrations report PARTIAL/FAILED indefinitely, and gbrain doctor health score is capped (mine sits at 75/100 with one FAIL on minions_migration).

Environment

  • gbrain: v0.35.0.0 (/Users/jr/gbrain checked out at master baf1a47, freshly upgraded from 0.30.2 today)
  • Engine: PGLite (local, ~/.gbrain/brain.pglite, 290MB, 1093 pages)
  • macOS 26.3 (Darwin 25.3.0, arm64)
  • Bun 1.3.x

Reproduction

  1. Brain at schema v44 (mine was created/migrated under v0.30.2)
  2. Upgrade gbrain to v0.35.x (cd ~/gbrain && git pull && bun install)
  3. Run gbrain apply-migrations --yes

Observed

=== Applying migration v0.31.0: Hot memory ships ===
=== v0.31.0 — Hot Memory: Cross-Session Facts ===
Migration v0.31.0 finished as PARTIAL. Re-run `gbrain apply-migrations --yes` after resolving any pending host-work items.

=== Applying migration v0.32.2: Facts join the system-of-record ===
=== v0.32.2 — facts join the system-of-record invariant ===
Migration v0.32.2 reported status=failed.

pending-host-work.jsonl is empty — there is no actual user host-work to resolve, despite the message.

Inspecting ~/.gbrain/migrations/completed.jsonl:

{"ts":"2026-05-16T00:57:42.623Z","version":"0.31.0","status":"partial","phases":[{"name":"schema","status":"failed","detail":"expected schema version >= 45 (facts hot memory); got 44. Run `gbrain apply-migrations --yes` to apply."}]}
{"ts":"2026-05-16T00:57:42.705Z","version":"0.32.2","status":"partial","phases":[{"name":"schema","status":"failed","detail":"expected schema version >= 51 (facts_fence_columns); got 44. Run `gbrain apply-migrations --yes` to apply."}]}
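For anyone triaging a similar state, here is a minimal sketch of pulling the failed phases out of completed.jsonl. The record shape is inferred only from the two lines quoted above, and `failedPhases` is a hypothetical helper name, not something gbrain ships.

```typescript
// Record shape inferred only from the two completed.jsonl lines quoted above;
// the real file may carry more fields.
interface PhaseRecord {
  name: string;
  status: string;
  detail?: string;
}

interface MigrationRecord {
  ts: string;
  version: string;
  status: string;
  phases?: PhaseRecord[];
}

interface FailedPhase {
  version: string;
  phase: string;
  detail: string;
}

// Parse JSONL text and return every phase whose status is "failed".
function failedPhases(jsonl: string): FailedPhase[] {
  const out: FailedPhase[] = [];
  for (const raw of jsonl.split("\n")) {
    const line = raw.trim();
    if (line.length === 0) continue; // tolerate blank lines
    const rec = JSON.parse(line) as MigrationRecord;
    for (const phase of rec.phases ?? []) {
      if (phase.status === "failed") {
        out.push({
          version: rec.version,
          phase: phase.name,
          detail: phase.detail ?? "",
        });
      }
    }
  }
  return out;
}
```

Passing it the contents of ~/.gbrain/migrations/completed.jsonl should list one failed "schema" phase per wedged release migration, assuming the file keeps the shape shown above.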

The docs/error message says: "Schema migrations run automatically on next connectEngine() / initSchema(). To run them now: gbrain init --migrate-only"

Trying that:

$ gbrain init --migrate-only
column "source_id" does not exist

Same error on any command that triggers connectEngine:

$ gbrain config
[ai.gateway] recipe "google" ...
  Schema probe/migrate failed: column "source_id" does not exist
  Try: gbrain init --migrate-only

Diagnosis (best guess from outside)

The chain appears to be:

  1. connectEngine() calls schema-verify before any migration runs
  2. Schema-verify executes a query that selects/joins on source_id (presumably as part of multi-source bookkeeping introduced in v0.18.0 or later)
  3. On a v44 database, source_id does not exist on whatever table is being probed
  4. The source_id-creating migration is itself in the chain that hasn't run yet
  5. The probe throws → connectEngine() aborts → the schema migration chain never runs → schema stays at v44
  6. Release migrations v0.31.0 + v0.32.2 perpetually report PARTIAL/FAILED because schema can never reach v45+
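
To make the suspected ordering problem concrete, here is a toy model of the chain above. Everything in it is hypothetical: the `Db` shape, the function names, and in particular the assumption that the step to v45 is the one that adds source_id (I don't know which migration actually adds it). It only illustrates why verify-before-migrate can never make progress while migrate-before-verify can.

```typescript
// Toy model only; none of these names are gbrain's real API.
type Db = { version: number; columns: Set<string> };

type Migration = { to: number; apply: (db: Db) => void };

// ASSUMPTION: some migration after v44 adds source_id. Which one is unknown;
// v45 is used here purely for illustration.
const chain: Migration[] = [
  { to: 45, apply: (db) => { db.columns.add("source_id"); } },
];

// Stand-in for the schema-verify probe that touches source_id.
function verify(db: Db): void {
  if (!db.columns.has("source_id")) {
    throw new Error('column "source_id" does not exist');
  }
}

// Apply every migration newer than the current schema version.
function migrate(db: Db): void {
  for (const m of chain) {
    if (m.to > db.version) {
      m.apply(db);
      db.version = m.to;
    }
  }
}

// Suspected current order: the probe throws, so migrate() is never reached.
function connectVerifyFirst(db: Db): string {
  try {
    verify(db);
    migrate(db);
    return "ok";
  } catch (e) {
    return (e as Error).message;
  }
}

// Alternative order: advance the schema first, then verify.
function connectMigrateFirst(db: Db): string {
  try {
    migrate(db);
    verify(db);
    return "ok";
  } catch (e) {
    return (e as Error).message;
  }
}
```

In this model, a v44 database passed to `connectVerifyFirst` stays at v44 on every attempt, which matches the behavior I observe.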

Force-retry (gbrain apply-migrations --force-retry 0.31.0) just rewrites the retry marker; the next --yes hits the same wall.

Expected behavior

Either:

  • (a) schema-verify should tolerate the missing column at v44 and skip ahead to apply the migration that creates it, OR
  • (b) the schema-verify probe should run AFTER the schema chain has finished advancing, not before, OR
  • (c) the missing-column case should trigger an explicit "schema not yet bootstrapped past vX, applying" path rather than aborting
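
A rough sketch of what options (a) and (c) could look like: classify the probe failure and treat a missing column as "schema needs to advance" instead of aborting. `classifyProbeError` and `ProbeOutcome` are hypothetical names. SQLSTATE 42703 is PostgreSQL's undefined_column code; whether PGLite surfaces it as a `code` property on the thrown error is an assumption on my part, so the sketch also falls back to matching the message text seen above.

```typescript
// SQLSTATE 42703 is PostgreSQL's "undefined_column" error code.
const UNDEFINED_COLUMN = "42703";

type ProbeOutcome =
  | { kind: "needs-migration"; reason: string }
  | { kind: "fatal"; reason: string };

// Decide whether a schema-verify failure means "run the migration chain"
// or is a genuine error that should abort the connection.
function classifyProbeError(err: unknown): ProbeOutcome {
  const e = (typeof err === "object" && err !== null ? err : {}) as {
    code?: unknown;
    message?: unknown;
  };
  const message = typeof e.message === "string" ? e.message : String(err);

  // ASSUMPTION: the engine exposes the SQLSTATE as `code`. The regex is a
  // fallback that matches the message format shown in this report.
  const missingColumn =
    e.code === UNDEFINED_COLUMN || /column ".+" does not exist/.test(message);

  return missingColumn
    ? { kind: "needs-migration", reason: message }
    : { kind: "fatal", reason: message };
}
```

A caller would wrap the probe in try/catch and, on "needs-migration", run the schema chain and then re-run the probe once, so a real mismatch after migration still fails loudly.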

Workaround currently used

None. I've left the brain at v44, where everything I actually use (query, sync default source, embed, extract, sources list) still works. Only the new v0.31+ Hot Memory feature is unavailable, and gbrain doctor will keep reporting a FAIL on minions_migration until this is fixed upstream.

Related

Happy to test any patches or run additional diagnostics. The PGLite database state is reproducible from a clean v0.30.2 → v0.35.0 upgrade path.
