Multi-source: pages always written with source_id='default' (federation appears empty) #1015

## Summary

On a multi-source brain (1 default + 19 federated), **every page lands in `source_id='default'`** regardless of which source the sync ran against. `gbrain sources list` reports correct paths and recent `last_sync_at` timestamps for all federated sources, but every one shows `0 pages`. The source-aware ranking machinery added in v0.22.0 is effectively running against a single-source corpus.

Tested on gbrain CLI **0.26.0** against Postgres (Supabase pooler). Brain has 2992 pages total; **all 2992 attribute to `default`**.

## Reproducer

```bash
gbrain sources add propdevnz --path /path/to/propdev
cd /path/to/propdev
gbrain sources attach propdevnz   # writes .gbrain-source dotfile
gbrain sync
gbrain sources list               # propdevnz still shows 0 pages despite recent last_sync_at
```

Direct probe confirms:

```sql
SELECT source_id, COUNT(*) FROM pages GROUP BY source_id;
-- default | 2992
-- (no other rows)
```

Per-source state IS being maintained — `sources.last_sync_at`, `last_commit`, `repo_path`, `chunker_version` all update correctly via the source-scoped sync-state helpers. Only the actual page writes are mis-attributed.

## Root cause — three layers, one missing thread

1. **Engine layer**: `src/core/postgres-engine.ts:302-314` (and the PGLite mirror). `putPage`'s `INSERT INTO pages` doesn't include `source_id`; it relies on the schema `DEFAULT 'default'`. The code's own comment at line 300 marks the gap explicitly:
   > `// v0.18.0 Step 2: source_id relies on schema DEFAULT 'default'. ON`
   > `// CONFLICT target becomes (source_id, slug) since global UNIQUE(slug)`
   > `// was dropped in migration v17. See pglite-engine.ts for matching`
   > `// notes; multi-source sync (Step 5) will surface an explicit sourceId.`

2. **Import layer**: `src/core/import-file.ts:338-343`. The `importFromFile` opts type is `{ noEmbed?: boolean; inferFrontmatter?: boolean }`, with no `sourceId` field, so `tx.putPage(slug, {...})` at line 272 has no source attribution to pass.

3. **Sync + import callers**: `src/commands/sync.ts:527, 582` and `src/commands/import.ts:108`. `performSync` correctly resolves `opts.sourceId` (used for `readSyncAnchor`/`writeSyncAnchor`), but the `importFile(engine, filePath, to, { noEmbed })` calls drop it on the floor.

So per-source sync state is tracked precisely (PropDev was last synced at headCommit XYZ at 5:37am) — and then the file content gets written under `default`. The routing infrastructure exists; the data plane was never wired.
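To make the disconnect concrete, here is a minimal in-memory model of the three layers. It is a sketch, not the real code: the function names echo the ones above, but every signature and data shape here is a simplified assumption.

```typescript
// Hypothetical in-memory model of the bug; shapes are assumptions, not gbrain's real types.
type SourceRow = { id: string; lastSyncAt: Date | null };
type PageRow = { slug: string; source_id: string };

const sources = new Map<string, SourceRow>([
  ["default", { id: "default", lastSyncAt: null }],
  ["propdevnz", { id: "propdevnz", lastSyncAt: null }],
]);
const pages: PageRow[] = [];

// Engine layer: no source_id parameter, so the schema default always applies.
function putPage(slug: string): void {
  pages.push({ slug, source_id: "default" });
}

// Import layer: the opts type has nowhere to carry a source id.
function importFile(slug: string, _opts: { noEmbed?: boolean }): void {
  putPage(slug);
}

// Caller layer: sourceId is resolved and used for sync state, but never forwarded.
function performSync(files: string[], opts: { sourceId: string }): void {
  for (const f of files) importFile(f, { noEmbed: true });
  const row = sources.get(opts.sourceId);
  if (row) row.lastSyncAt = new Date(); // sync state updates correctly
}

function pageCount(sourceId: string): number {
  return pages.filter((p) => p.source_id === sourceId).length;
}

performSync(["notes/a", "notes/b"], { sourceId: "propdevnz" });
console.log(pageCount("propdevnz"), pageCount("default")); // 0 2
```

The model reproduces the observed symptom: `propdevnz` has a fresh `lastSyncAt` and zero pages, while `default` holds everything.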

## Why it didn't get caught

- E2E coverage for sync is single-source PGLite. No test adds a second source (`INSERT INTO sources (id, local_path) ...`), runs a sync against it, and asserts that `SELECT COUNT(*) FROM pages WHERE source_id = $new_source` is non-zero.
- `gbrain sources list` reports `last_sync_at` from the `sources` row — which IS being updated correctly — so the dashboard surface looks healthy. The 0-pages number is the only tell, and it reads as "haven't synced this one yet" rather than "data plane disconnected."

## Proposed fix

Forward fix is small (~30 LOC + tests):

1. Add `sourceId?: string` to `importFromFile` opts (and `importCodeFile`, `importFromContent`).
2. Add `source_id?: string` to `PageInput`; pass through to `tx.putPage`.
3. Engine `putPage`: include `source_id` in INSERT (COALESCE-to-default keeps back-compat). The `(source_id, slug)` UNIQUE is already the conflict target so no schema change.
4. `performSync` and `runImport` thread the resolved `opts.sourceId` into `importFile`.
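The four steps above could look roughly like the sketch below. It uses an in-memory stand-in for the engine, so `InMemoryEngine`, `countBySource`, and the exact `PageInput` / `ImportOpts` shapes are assumptions for illustration; only the field names `sourceId` and `source_id` come from the proposal.

```typescript
// Hypothetical, simplified stand-ins for the real types.
type PageInput = { title: string; body: string; source_id?: string };
type ImportOpts = { noEmbed?: boolean; inferFrontmatter?: boolean; sourceId?: string };
type StoredPage = { slug: string; title: string; body: string; source_id: string };

class InMemoryEngine {
  // Keyed by source_id + slug, mimicking the UNIQUE(source_id, slug) conflict target.
  readonly pages = new Map<string, StoredPage>();

  putPage(slug: string, page: PageInput): void {
    // Stand-in for COALESCE($source_id, 'default') in the INSERT.
    const source_id = page.source_id ?? "default";
    // Map.set overwrites an existing key, which plays the role of ON CONFLICT DO UPDATE.
    this.pages.set(`${source_id}\u0000${slug}`, {
      slug,
      title: page.title,
      body: page.body,
      source_id,
    });
  }

  countBySource(): Record<string, number> {
    const counts: Record<string, number> = {};
    this.pages.forEach((p) => {
      counts[p.source_id] = (counts[p.source_id] ?? 0) + 1;
    });
    return counts;
  }
}

// Import layer: opts now carries sourceId and passes it through to putPage.
function importFromContent(
  engine: InMemoryEngine,
  slug: string,
  body: string,
  opts: ImportOpts = {},
): void {
  engine.putPage(slug, { title: slug, body, source_id: opts.sourceId });
}

// Caller layer: the resolved sourceId is forwarded instead of dropped.
function performSync(
  engine: InMemoryEngine,
  files: Array<{ slug: string; body: string }>,
  opts: { sourceId?: string },
): void {
  for (const f of files) {
    importFromContent(engine, f.slug, f.body, { noEmbed: true, sourceId: opts.sourceId });
  }
}
```

Callers that never pass `sourceId` still land in `default`, which is the back-compat behaviour step 3 relies on.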

Backfill is the trickier part. Once the fix lands, a plain re-sync writes each page under its real `source_id`, which no longer matches the existing `default` row on the `(source_id, slug)` conflict target, so you'd get duplicate rows rather than updates. The cleanest path I can see is a filesystem-probe backfill: for each non-default source, walk `local_path`, slugify each file, and run `UPDATE pages SET source_id = ? WHERE source_id = 'default' AND slug = ?` for matches. This is deterministic and leaves genuinely-default pages alone.
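A sketch of the planning half of that backfill is below. It assumes the filesystem walk and slugification have already produced a set of slugs per source, so it stays pure and testable; `planBackfill` and its types are hypothetical names. Skipping slugs claimed by more than one source is my own assumption about how to handle ambiguity, not something the current code defines.

```typescript
// Hypothetical types; the real pages table has more columns.
type PageRow = { slug: string; source_id: string };
type Reassignment = { slug: string; to: string };

// Decide which default-source pages should move to which federated source.
// slugsBySource is assumed to come from walking each source's local_path.
function planBackfill(
  pages: PageRow[],
  slugsBySource: Map<string, Set<string>>,
): { moves: Reassignment[]; ambiguous: string[] } {
  const moves: Reassignment[] = [];
  const ambiguous: string[] = [];

  for (const page of pages) {
    if (page.source_id !== "default") continue; // already attributed, leave alone

    const owners = Array.from(slugsBySource.entries())
      .filter(([id, slugs]) => id !== "default" && slugs.has(page.slug))
      .map(([id]) => id);

    if (owners.length === 1) {
      // Exactly one source has this file: safe to reassign.
      moves.push({ slug: page.slug, to: owners[0] });
    } else if (owners.length > 1) {
      // Assumption: a slug present in several sources is reported, not guessed.
      ambiguous.push(page.slug);
    }
    // owners.length === 0: genuinely-default page, untouched.
  }

  return { moves, ambiguous };
}
```

Each entry in `moves` maps onto one `UPDATE pages SET source_id = ? WHERE source_id = 'default' AND slug = ?`, and the `ambiguous` list gives a human something to review before anything is rewritten.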

## Offer

Happy to put up a PR if useful — the diagnosis is the hard part and that's done. Let me know if you'd prefer to handle it yourself, or if there's a different shape you want for the fix (e.g. require explicit migration vs auto-backfill).

Diagnosed in a Claude Code session against my live brain; happy to provide additional probes if you want specific queries run.
