v0.29.1 host-work phase + reindex-frontmatter fail with "PGLite not connected" — createEngine() not paired with engine.connect() in two call sites #756

@axe08

Symptoms

After gbrain apply-migrations --yes finishes the schema chain on a PGLite brain, the v0.29.1 wave records status: partial indefinitely, with phase B failing on "PGLite not connected. Call connect() first.". Stdout shows only:

=== Applying migration v0.29.1: Recency + salience as two opt-in axes — agent in charge of when to use each ===

=== v0.29.1 — backfill effective_date for existing pages ===

Schema up to date (engine: pglite).
Migration v0.29.1 finished as PARTIAL. Re-run `gbrain apply-migrations --yes` after resolving any pending host-work items.

The phase-failure detail ("PGLite not connected. Call connect() first.") appears only in ~/.gbrain/migrations/completed.jsonl, not on stdout. Re-running apply-migrations --yes reproduces the same partial ledger entry forever:

{"version":"0.29.1","status":"partial","phases":[{"name":"schema","status":"complete"},{"name":"backfill_effective_date","status":"failed","detail":"PGLite not connected. Call connect() first."}]}
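Since the failure detail only lands in the ledger, a small reader can save some grepping. The following is a minimal sketch, assuming every line of completed.jsonl is a JSON object shaped like the entry above. The names failedPhases, LedgerEntry, and LedgerPhase are hypothetical and not part of gbrain.

```typescript
// Hypothetical types inferred from the ledger line quoted in this issue.
interface LedgerPhase {
  name: string;
  status: string;
  detail?: string;
}

interface LedgerEntry {
  version: string;
  status: string;
  phases?: LedgerPhase[];
}

// Return the failed phases from the most recent ledger entry for `version`.
// `jsonl` is the raw text of completed.jsonl (one JSON object per line).
function failedPhases(jsonl: string, version: string): LedgerPhase[] {
  const entries = jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as LedgerEntry)
    .filter((entry) => entry.version === version);
  const latest = entries[entries.length - 1];
  if (!latest) return [];
  return (latest.phases ?? []).filter((phase) => phase.status === "failed");
}

// The ledger entry from this issue, used as sample input.
const sample =
  '{"version":"0.29.1","status":"partial","phases":[{"name":"schema","status":"complete"},{"name":"backfill_effective_date","status":"failed","detail":"PGLite not connected. Call connect() first."}]}';

for (const phase of failedPhases(sample, "0.29.1")) {
  console.error(`${phase.name}: ${phase.detail ?? "(no detail)"}`);
}
```

In practice the input would be the contents of ~/.gbrain/migrations/completed.jsonl rather than an inline string.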

gbrain reindex-frontmatter (the doctor-recommended recovery path for the related effective_date_health warning) crashes with the same error, so operators end up in a loop where the recommended fix triggers the same bug.

This is distinct from #737, which is a Postgres-only BEGIN-syntax bug inside backfillEffectiveDate. This one fails on the PGLite path before that function executes a single statement.

Affected versions

PGLite brains on a v0.30.x binary running v0.29.1's host-work phase (i.e., any pre-v0.29.1 PGLite brain after the schema migrates to ≥ v41).

Root cause

createEngine(toEngineConfig(cfg)) returns an unconnected engine. The PGLite engine's connect() is what actually instantiates the underlying PGlite instance and stores it on this.db; without that call, every method on the engine that touches this.db throws PGLite not connected. Call connect() first.
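The failure mode can be reproduced in isolation. The sketch below uses a hypothetical SketchEngine class as a stand-in for the PGLite engine (it is not gbrain's actual implementation) and assumes only what is described above: connect() populates this.db, and every method that touches this.db throws until it has.

```typescript
// Hypothetical stand-in for the PGLite engine; not gbrain's actual class.
class SketchEngine {
  private db: { query(sql: string): string } | null = null;

  // connect() is the only place the underlying handle gets created.
  async connect(): Promise<void> {
    this.db = { query: (sql) => `ok: ${sql}` };
  }

  // Every method that touches this.db goes through this guard.
  private requireDb(): { query(sql: string): string } {
    if (this.db === null) {
      throw new Error("PGLite not connected. Call connect() first.");
    }
    return this.db;
  }

  async run(sql: string): Promise<string> {
    return this.requireDb().query(sql);
  }

  async disconnect(): Promise<void> {
    this.db = null;
  }
}

// Call run() once before connect() and once after, collecting what happens.
async function demo(): Promise<string[]> {
  const log: string[] = [];
  const engine = new SketchEngine();
  try {
    await engine.run("SELECT 1"); // throws: connect() was never called
  } catch (err) {
    log.push((err as Error).message);
  }
  await engine.connect();
  log.push(await engine.run("SELECT 1"));
  await engine.disconnect();
  return log;
}
```

The first call fails with the same message seen in the ledger; the second succeeds once connect() has run.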

Two callers in the v0.30.x tree create an engine and immediately invoke library methods without the follow-up engine.connect(...):

src/commands/migrations/v0_29_1.ts:phaseBBackfill (~line 47):

const cfg = loadConfig();
const engine = await createEngine(toEngineConfig(cfg));
// ← missing: await engine.connect(toEngineConfig(cfg));
const result = await backfillEffectiveDate(engine, { onBatch: ... });

src/commands/reindex-frontmatter.ts:runReindexFrontmatterCli (~line 168):

const cfg = loadConfig();
const engine = await createEngine(toEngineConfig(cfg));
// ← missing: await engine.connect(toEngineConfig(cfg));
try {
  const result = await runReindexFrontmatter(engine, opts);
  ...

Compare to the working pattern in src/commands/init.ts:184:

const engine = await createEngine(toEngineConfig(config));
try {
  await engine.connect(toEngineConfig(config));
  await engine.initSchema();
} finally { ... }

Effect

Repro

# On any PGLite brain that's just had v41 applied and has effective_date NULL.
gbrain apply-migrations --yes
# → "Migration v0.29.1 finished as PARTIAL"
gbrain reindex-frontmatter --yes
# → "PGLite not connected. Call connect() first."
tail -1 ~/.gbrain/migrations/completed.jsonl
# → status:"partial", detail:"PGLite not connected. Call connect() first."

Suggested fix

Add await engine.connect(toEngineConfig(cfg)) after createEngine(...) in both call sites. Mirror the try/finally cleanup pattern from init.ts so disconnect runs even on error.

const cfg = loadConfig();
const engine = await createEngine(toEngineConfig(cfg));
await engine.connect(toEngineConfig(cfg));        // ← add
try {
  const result = await backfillEffectiveDate(engine, { onBatch: ... });
  // ...
} finally {
  if ('disconnect' in engine && typeof engine.disconnect === 'function') {
    await engine.disconnect();
  }
}

reindex-frontmatter.ts:runReindexFrontmatterCli already has the disconnect finally block — it just needs the matching connect at the top.

A more durable fix: have createEngine (or a thin createConnectedEngine helper) call connect() before returning, so future host-work scripts can't repeat this footgun. At least three call sites in src/commands/migrations/ follow the missing-connect pattern (greppable: createEngine\(toEngineConfig.*?\) with no following \.connect).
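One possible shape for such a helper is sketched below. It assumes only that an engine exposes connect(config) and an optional disconnect(); the ConnectableEngine type and the withConnectedEngine wrapper are hypothetical, and the real engine interface in gbrain may differ.

```typescript
// Minimal structural type; the real engine interface in gbrain may differ.
interface ConnectableEngine<C> {
  connect(config: C): Promise<void>;
  disconnect?(): Promise<void>;
}

// Hypothetical helper: create an engine and connect it before returning,
// so callers never see an unconnected engine.
async function createConnectedEngine<C, E extends ConnectableEngine<C>>(
  create: (config: C) => Promise<E>,
  config: C,
): Promise<E> {
  const engine = await create(config);
  await engine.connect(config);
  return engine;
}

// Hypothetical wrapper that also guarantees cleanup, mirroring the
// try/finally pattern from init.ts.
async function withConnectedEngine<C, E extends ConnectableEngine<C>, R>(
  create: (config: C) => Promise<E>,
  config: C,
  work: (engine: E) => Promise<R>,
): Promise<R> {
  const engine = await createConnectedEngine(create, config);
  try {
    return await work(engine);
  } finally {
    // Runs whether work() resolves or throws.
    await engine.disconnect?.();
  }
}
```

Passing the factory as a parameter keeps the sketch independent of gbrain's module layout. A call site would hand in createEngine and toEngineConfig(cfg), assuming their signatures are compatible with the types above.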

It is also worth surfacing the failure earlier: have the migration runner print a failed phase's detail to stderr. Currently only the orchestrator-level "PARTIAL" message appears on stdout; the actual error sits in completed.jsonl, and the operator has to grep for it.
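A rough sketch of what that reporting could look like on the runner side follows. The PhaseResult shape mirrors the ledger entry quoted earlier in this issue; partialFailureLines and reportPartial are hypothetical names, and how the real runner holds its phase results is an assumption.

```typescript
// Hypothetical phase result shape, mirroring the ledger entry in this issue.
interface PhaseResult {
  name: string;
  status: string;
  detail?: string;
}

// Build one human-readable line per failed phase.
function partialFailureLines(version: string, phases: PhaseResult[]): string[] {
  return phases
    .filter((phase) => phase.status === "failed")
    .map(
      (phase) =>
        `Migration v${version}: phase ${phase.name} failed: ${phase.detail ?? "no detail recorded"}`,
    );
}

// Write the lines with console.error so they reach stderr,
// rather than living only in the ledger.
function reportPartial(version: string, phases: PhaseResult[]): void {
  for (const line of partialFailureLines(version, phases)) {
    console.error(line);
  }
}
```

Keeping the formatting separate from the write makes the message easy to test without capturing stderr.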

Workaround for affected operators

A small script that connects the engine first, then calls the library function the orchestrator would have called:

// /tmp/backfill.ts — bun run from the gbrain repo
import { createEngine } from '/path/to/gbrain/src/core/engine-factory.ts';
import { loadConfig, toEngineConfig } from '/path/to/gbrain/src/core/config.ts';
import { backfillEffectiveDate } from '/path/to/gbrain/src/core/backfill-effective-date.ts';

const cfg = loadConfig()!;
const engine = await createEngine(toEngineConfig(cfg));
await engine.connect(toEngineConfig(cfg));
try {
  const result = await backfillEffectiveDate(engine, {
    onBatch: ({ batch, lastId, rowsTouched, cumulative }) =>
      console.log(`[batch ${batch}] last_id=${lastId} touched=${rowsTouched} cumulative=${cumulative}`),
  });
  console.log('result:', result);
} finally {
  // Disconnect even if the backfill throws.
  await (engine as any).disconnect?.();
}

After it runs (160 pages → 31 updated for me), the brain is in the correct post-v0.29.1 state — but the orchestrator ledger still says partial because the runner was never told. Append a complete record to ~/.gbrain/migrations/completed.jsonl:

{"ts":"2026-05-08T...","version":"0.29.1","status":"complete","phases":[{"name":"schema","status":"complete"},{"name":"backfill_effective_date","status":"complete","detail":"manual workaround"},{"name":"verify","status":"complete","detail":"0 pages with NULL effective_date"}]}

gbrain apply-migrations --list then shows applied 0.29.1 and doctor goes green.
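The ledger append described above can also be scripted instead of hand-edited. This is a minimal sketch that assumes the runner accepts an ISO-8601 ts value (the original timestamp is elided, so that is a guess) and that the field names match the record shown above. The names buildCompleteRecord, appendLedgerRecord, and PhaseRecord are hypothetical.

```typescript
import { appendFileSync } from "node:fs";

// Hypothetical phase shape, matching the record quoted in this issue.
interface PhaseRecord {
  name: string;
  status: string;
  detail?: string;
}

// Build one JSONL line marking `version` complete.
// Assumption: the runner accepts an ISO-8601 timestamp in `ts`.
function buildCompleteRecord(
  version: string,
  phases: PhaseRecord[],
  now: Date = new Date(),
): string {
  return JSON.stringify({
    ts: now.toISOString(),
    version,
    status: "complete",
    phases,
  });
}

// Append the record as its own line. This assumes the ledger file already
// ends with a newline; if it does not, the new record would be glued onto
// the previous line.
function appendLedgerRecord(ledgerPath: string, line: string): void {
  appendFileSync(ledgerPath, line + "\n", "utf8");
}
```

Building the line separately from the write makes it easy to inspect the record before committing it to the ledger.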

Environment

  • gbrain v0.30.1 (master @ dffb607)
  • OS: Ubuntu 22.04, Linux 5.15.0-176-generic
  • Bun 1.3.11
  • Database: PGLite (embedded), 160 pages, post-schema-migration

Companion bug

Hit this immediately after fixing the unrelated bootstrap-coverage gap for pages.effective_date (separate issue filed). The two together turn a v0.18 → v0.30 PGLite upgrade from "one command" into "diagnose two upstream bugs and write two workaround scripts." Both are PGLite-side, both surface as v0.29.1 PARTIAL, both have small fixes.
