Summary
On a multi-source brain (1 default + 19 federated), every page lands in source_id='default' regardless of which source the sync ran against. gbrain sources list reports correct paths and recent last_sync_at timestamps for all federated sources, but every one shows 0 pages. The source-aware ranking machinery added in v0.22.0 is effectively running against a single-source corpus.
Tested on gbrain CLI 0.26.0 against Postgres (Supabase pooler). Brain has 2992 pages total; all 2992 attribute to default.
Reproducer
gbrain sources add propdevnz --path /path/to/propdev
cd /path/to/propdev
gbrain sources attach propdevnz # writes .gbrain-source dotfile
gbrain sync
gbrain sources list # propdevnz still shows 0 pages despite recent last_sync_at
Direct probe confirms:
SELECT source_id, COUNT(*) FROM pages GROUP BY source_id;
-- default | 2992
-- (no other rows)
Per-source state IS being maintained — sources.last_sync_at, last_commit, repo_path, chunker_version all update correctly via the source-scoped sync-state helpers. Only the actual page writes are mis-attributed.
Root cause — three layers, one missing thread
- Engine layer: src/core/postgres-engine.ts:302-314 (and the PGLite mirror). putPage's INSERT INTO pages doesn't include source_id; it relies on the schema DEFAULT 'default'. The code's own comment at line 300 marks the gap explicitly:
// v0.18.0 Step 2: source_id relies on schema DEFAULT 'default'. ON
// CONFLICT target becomes (source_id, slug) since global UNIQUE(slug)
// was dropped in migration v17. See pglite-engine.ts for matching
// notes; multi-source sync (Step 5) will surface an explicit sourceId.
- Import layer: src/core/import-file.ts:338-343. The importFromFile opts type is { noEmbed?: boolean; inferFrontmatter?: boolean }, with no sourceId field. tx.putPage(slug, {...}) at line 272 has no source attribution to pass.
- Sync + import callers: src/commands/sync.ts:527, 582 and src/commands/import.ts:108. performSync correctly resolves opts.sourceId (used for readSyncAnchor/writeSyncAnchor) but the importFile(engine, filePath, to, { noEmbed }) calls drop it on the floor.
So per-source sync state is tracked precisely (PropDev was last synced at headCommit XYZ at 5:37am) — and then the file content gets written under default. The routing infrastructure exists; the data plane was never wired.
Why it didn't get caught
- E2E coverage for sync is single-source PGLite. There's no test asserting that INSERT INTO sources (id, local_path) ...; sync; SELECT COUNT(*) FROM pages WHERE source_id = $new_source is non-zero.
- gbrain sources list reports last_sync_at from the sources row, which IS being updated correctly, so the dashboard surface looks healthy. The 0-pages number is the only tell, and it reads as "haven't synced this one yet" rather than "data plane disconnected."
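The missing assertion from the first point could take roughly this shape. This is only a sketch against an in-memory stand-in: FakeEngine, syncSource and countBySource are hypothetical names invented for illustration, not gbrain's API, and a real test would run the actual sync against PGLite or Postgres.

```typescript
// Hypothetical in-memory stand-in for the engine; NOT gbrain's real API.
type PageRow = { sourceId: string; slug: string };

class FakeEngine {
  readonly pages: PageRow[] = [];

  // Upsert keyed on (sourceId, slug). Omitting sourceId falls back to
  // 'default', which mirrors the behaviour described in this report.
  putPage(slug: string, sourceId?: string): void {
    const resolved = sourceId ?? "default";
    const exists = this.pages.some(
      (p) => p.sourceId === resolved && p.slug === slug,
    );
    if (!exists) this.pages.push({ sourceId: resolved, slug });
  }

  countBySource(sourceId: string): number {
    return this.pages.filter((p) => p.sourceId === sourceId).length;
  }
}

// Hypothetical sync: writes each slug under the source being synced.
function syncSource(engine: FakeEngine, sourceId: string, slugs: string[]): void {
  for (const slug of slugs) engine.putPage(slug, sourceId);
}

// The assertion the suite is missing: after syncing a non-default source,
// its page count is non-zero and nothing leaked into 'default'.
const engine = new FakeEngine();
syncSource(engine, "propdevnz", ["notes/a", "notes/b"]);
const attributed = engine.countBySource("propdevnz");
const leaked = engine.countBySource("default");
if (attributed !== 2 || leaked !== 0) {
  throw new Error(`expected 2 pages under propdevnz, got ${attributed} (default: ${leaked})`);
}
```

Checking the 'default' count as well as the new source's count is deliberate: a test that only checks "some pages exist" would still pass against the current behaviour.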
Proposed fix
Forward fix is small (~30 LOC + tests):
- Add sourceId?: string to importFromFile opts (and importCodeFile, importFromContent).
- Add source_id?: string to PageInput; pass through to tx.putPage.
- Engine putPage: include source_id in INSERT (COALESCE-to-default keeps back-compat). The (source_id, slug) UNIQUE is already the conflict target so no schema change.
- performSync and runImport thread the resolved opts.sourceId into importFile.
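As a rough illustration of the threading, here is a minimal sketch. The types are trimmed to the fields that matter, the pages column list (title) is an assumption, and buildPutPageQuery and buildImportQuery are hypothetical helpers that only build the statement; the real putPage and importFromFile do considerably more.

```typescript
// Illustrative shapes only; the real PageInput and opts types have more fields.
interface PageInput {
  title: string;
  source_id?: string; // proposed optional field
}

interface ImportOpts {
  noEmbed?: boolean;
  inferFrontmatter?: boolean;
  sourceId?: string; // proposed optional field
}

interface Query {
  text: string;
  values: unknown[];
}

// Engine layer (hypothetical helper): source_id is now an explicit parameter.
// COALESCE keeps callers that pass no source on 'default', matching the
// existing schema default. The column list here is an assumption.
function buildPutPageQuery(slug: string, page: PageInput): Query {
  return {
    text:
      "INSERT INTO pages (source_id, slug, title) " +
      "VALUES (COALESCE($1, 'default'), $2, $3) " +
      "ON CONFLICT (source_id, slug) DO UPDATE SET title = EXCLUDED.title",
    values: [page.source_id ?? null, slug, page.title],
  };
}

// Import layer (hypothetical helper): forwards opts.sourceId instead of
// dropping it, so the sync caller's resolved source reaches the INSERT.
function buildImportQuery(slug: string, title: string, opts: ImportOpts = {}): Query {
  return buildPutPageQuery(slug, { title, source_id: opts.sourceId });
}
```

A caller that already resolved the source would pass { noEmbed, sourceId: opts.sourceId }, while existing single-source callers stay unchanged and keep writing to default.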
Backfill is the trickier part. Existing pages sit under ('default', slug), so a re-sync under the correct source would not hit the (source_id, slug) conflict target: you'd get duplicates rather than updates. Cleanest path I can see is a filesystem-probe backfill: for each non-default source, walk local_path, slugify each file, and UPDATE pages SET source_id = ? WHERE source_id = 'default' AND slug = ? for matches. Deterministic, leaves genuinely-default pages alone.
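The planning half of that backfill can be sketched as a pure function. It assumes the slugs for each source have already been derived from the filesystem with gbrain's own slugify logic (not reproduced here), and planBackfill and BackfillPlan are hypothetical names. One addition beyond the description above: slugs claimed by more than one source are reported rather than reassigned, since the correct owner can't be inferred from the slug alone.

```typescript
interface SourceSlugs {
  id: string;
  slugs: string[]; // slugs derived from files under the source's local_path
}

interface BackfillPlan {
  updates: { sourceId: string; slug: string }[];
  ambiguous: string[]; // slugs claimed by more than one source; left untouched
}

// Statement to run per planned update, using $n placeholders.
const BACKFILL_SQL =
  "UPDATE pages SET source_id = $1 WHERE source_id = 'default' AND slug = $2";

function planBackfill(sources: SourceSlugs[], defaultSlugs: string[]): BackfillPlan {
  const inDefault = new Set<string>(defaultSlugs);
  const claims = new Map<string, string[]>();

  for (const source of sources) {
    if (source.id === "default") continue; // never reassign default to itself
    for (const slug of source.slugs) {
      if (!inDefault.has(slug)) continue; // only pages currently under default
      const owners = claims.get(slug) ?? [];
      if (owners.indexOf(source.id) === -1) owners.push(source.id);
      claims.set(slug, owners);
    }
  }

  const plan: BackfillPlan = { updates: [], ambiguous: [] };
  claims.forEach((owners, slug) => {
    if (owners.length === 1) plan.updates.push({ sourceId: owners[0], slug });
    else plan.ambiguous.push(slug);
  });
  return plan;
}
```

Each entry in updates maps to one execution of BACKFILL_SQL with (sourceId, slug). Default pages that no source claims never appear in the plan, which is what leaves genuinely-default pages alone.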
Offer
Happy to put up a PR if useful — the diagnosis is the hard part and that's done. Let me know if you'd prefer to handle it yourself, or if there's a different shape you want for the fix (e.g. require explicit migration vs auto-backfill).
Diagnosed in a Claude Code session against my live brain; happy to provide additional probes if you want specific queries run.