
Migration regressions in v0.28.0 / v0.29.1 can leave installs partial and emit invalid takes backfill command #763

@papiofficial

Summary

Two migration regressions in v0.28.0 / v0.29.1 can leave existing installs stuck or partially migrated, and v0.28.0 also points users to a non-existent follow-up command.

I hit this on an existing Postgres-backed brain during upgrade to 0.30.2, but the underlying issues look general rather than environment-specific.

Affected versions

  • v0.28.0
  • v0.29.1
  • observed while upgrading to gbrain 0.30.2

Problems

1) v0.28.0 runs heavy inline work during migration

The v0.28.0 migration runs the heavy takes backfill inline during the upgrade path via:

extractTakes(engine, { source: 'db' })

That makes the migration path unbounded on larger/existing brains and can hang or get killed mid-upgrade.

Migration code should establish compatibility, not depend on long-running content rebuild work to finish inline.
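
One way to read that principle is sketched below: the migration performs only a cheap compatibility step and records the rebuild as deferred work. This is a minimal illustration, not gbrain's actual migration code. MigrationContext, DeferredTask, ensureTakesSchema and migrateTakes are all hypothetical names.

```typescript
// Hypothetical types for illustration; these names are not gbrain's real API.
interface DeferredTask {
  id: string;
  description: string;
}

interface MigrationContext {
  appliedSteps: string[];   // cheap, bounded steps that ran inline
  deferred: DeferredTask[]; // heavy work queued for after the upgrade
}

// Bounded compatibility step: in a real migration this would be a
// schema change such as CREATE TABLE IF NOT EXISTS (omitted here).
function ensureTakesSchema(ctx: MigrationContext): void {
  ctx.appliedSteps.push("ensure-takes-schema");
}

// Sketch of a v0.28.0-style migration that never runs the backfill inline.
function migrateTakes(ctx: MigrationContext): "complete" {
  ensureTakesSchema(ctx);
  // Rather than calling extractTakes(engine, { source: 'db' }) here,
  // record the rebuild as deferred work so the upgrade stays bounded.
  ctx.deferred.push({
    id: "takes-backfill",
    description: "Rebuild takes from existing pages",
  });
  return "complete";
}
```

With this shape the migration finishes in roughly constant time regardless of brain size, and the deferred list tells a later job what still needs to run.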

2) v0.29.1 orchestrator had a broken DB connection path

The v0.29.1 migration could end up recorded as partial with this error:

No database connection: connect() has not been called. Fix: Run gbrain init --supabase or gbrain init --url <connection_string>

This came from the orchestrator path itself, not from the actual install lacking DB config.
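
The failure mode can be reproduced in a small sketch, assuming an engine that throws when queried before connect(). Only the executeRaw name comes from this report. The Engine interface, FakeEngine and runOrchestrator are hypothetical stand-ins, not the real orchestrator.

```typescript
// Assumed minimal engine shape; only the executeRaw name appears in this issue.
interface Engine {
  connect(): Promise<void>;
  executeRaw(sql: string): Promise<unknown[]>;
}

// In-memory stand-in that fails the same way the v0.29.1 report did.
class FakeEngine implements Engine {
  connected = false;
  executed: string[] = [];

  async connect(): Promise<void> {
    this.connected = true;
  }

  async executeRaw(sql: string): Promise<unknown[]> {
    if (!this.connected) {
      throw new Error("No database connection: connect() has not been called.");
    }
    this.executed.push(sql);
    return [];
  }
}

// Hypothetical orchestrator: connect first, then run every query.
async function runOrchestrator(engine: Engine, queries: string[]): Promise<number> {
  await engine.connect();
  for (const sql of queries) {
    await engine.executeRaw(sql);
  }
  return queries.length;
}
```

Calling executeRaw on a fresh FakeEngine rejects with the same message as above, while runOrchestrator succeeds because it owns the connection step instead of assuming a caller already did it.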

3) v0.28.0 emits an invalid deferred remediation command

The migration queued this follow-up command:

gbrain extract takes --source db --rebuild

But there is no working extract takes CLI route.

gbrain extract only supports:

  • links
  • timeline
  • all

So users who follow the migration output get pointed to a dead command.
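
A guard along these lines could catch the mismatch before a follow-up is queued. This is a sketch under assumptions: the supported targets are copied from the list above, and isRunnableFollowUp is a hypothetical helper, not an existing gbrain function.

```typescript
// Targets that gbrain extract supports, per the list above.
const SUPPORTED_EXTRACT_TARGETS = new Set<string>(["links", "timeline", "all"]);

// Hypothetical check: true only if the command names a supported extract target.
function isRunnableFollowUp(command: string): boolean {
  const [bin, sub, target] = command.trim().split(/\s+/);
  if (bin !== "gbrain" || sub !== "extract") {
    return false;
  }
  return target !== undefined && SUPPORTED_EXTRACT_TARGETS.has(target);
}
```

Under this check, gbrain extract takes --source db --rebuild is rejected because takes is not in the supported set, while gbrain extract links passes.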

What I observed

  • gbrain apply-migrations --yes --non-interactive --skip-verify would get stuck/partial around v0.28.0 / v0.29.1
  • gbrain doctor --fast reported:
    MINIONS HALF-INSTALLED (partial migration: 0.29.1). Run: gbrain apply-migrations --yes
  • migration ledger showed 0.29.1 partial with the failed backfill/verify path
  • the deferred extract takes command fails because that CLI surface does not exist

Why this seems general

This does not look specific to one machine.

Portable issues:

  • long-running inline backfill in a migration
  • orchestrator querying before a valid engine connection is established
  • migration queuing a non-existent remediation command

The exact severity probably depends on brain size and install shape, but the bugs themselves seem product-level.

Expected behavior

  • Migrations should remain bounded and safe on large/existing brains
  • Historical backfills should be deferred or resumable without blocking upgrade health
  • Orchestrators should not attempt DB queries before a valid connection exists
  • Any queued/manual follow-up command should be real and executable
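
To illustrate the "deferred or resumable" point, here is a minimal sketch of a backfill that advances in fixed-size batches and returns a cursor that can be persisted between runs. BackfillState and backfillBatch are hypothetical names, and the page IDs stand in for whatever the real backfill iterates over.

```typescript
// Hypothetical progress record; a real one would be persisted between runs.
interface BackfillState {
  cursor: number; // index of the next page to process
  done: boolean;
}

// Process at most batchSize pages starting at the cursor, then return the
// new state. Each call does bounded work, so it can be interrupted and resumed.
function backfillBatch(
  pageIds: number[],
  state: BackfillState,
  batchSize: number,
  processPage: (id: number) => void,
): BackfillState {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const batch = pageIds.slice(state.cursor, state.cursor + batchSize);
  for (const id of batch) {
    processPage(id);
  }
  const cursor = state.cursor + batch.length;
  return { cursor, done: cursor >= pageIds.length };
}
```

A caller would loop until done is true, saving the returned state after each batch. If the process is killed mid-run, the next run resumes from the last saved cursor instead of starting over.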

Suggested fixes

  • keep v0.28.0 migration bounded, with heavy takes rebuild deferred out of the upgrade path
  • fix v0.29.1 orchestrator connection lifecycle before any executeRaw calls
  • either:
    • add a real supported CLI for takes backfill/rebuild, or
    • stop emitting gbrain extract takes --source db --rebuild and replace it with a valid supported remediation path
  • treat deferred historical effective_date backfill as non-fatal if that is the intended product behavior

Environment

  • gbrain 0.30.2
  • Postgres-backed brain
  • existing brain with thousands of pages

If helpful I can open a follow-up PR with the concrete code changes I used locally to get this unstuck.
