fix(multi-source): plumb sourceId through performFullSync (closes PR #707 gap; closes #497, #540)#757
Conversation
Multi-source brains crashed mid-import with Postgres 21000 ("more than one
row returned by a subquery used as an expression"). Root cause: putPage's
INSERT column list omitted source_id, so writes intended for a non-default
source (e.g. 'jarvis-memory') silently fabricated a duplicate row at
(default, slug). The schema has UNIQUE(source_id, slug) but DEFAULT 'default'
for source_id; calling putPage(slug, page) without source_id landed at
(default, slug) and ON CONFLICT updated the wrong row, leaving the intended
source row stale. Subsequent bare-slug subqueries inside the same tx —
(SELECT id FROM pages WHERE slug = $1) in getTags / removeTag / deleteChunks
/ removeLink / addLink (cross-product) — then matched 2 rows and crashed
with 21000, rolling back the entire import. Observed: 18 sync failures
against a 'jarvis-memory'-sourced brain.
Fix:
- putPage adds source_id to the INSERT column list (defaults 'default' for
back-compat).
- Every bare-slug page-id subquery becomes source-qualified
(AND source_id = $X) in both engines: createVersion, upsertChunks,
getChunks, addTag, removeTag, getTags, deleteChunks, removeLink,
addTimelineEntry, deletePage, updateSlug.
- addLink rewritten away from FROM pages f, pages t cross-product into a
VALUES + JOIN-on-(slug, source_id) shape mirroring addLinksBatch.
- engine.ts interface: 11 method signatures gain optional opts.sourceId
(or opts.{from,to,origin}SourceId for addLink/removeLink). All optional;
existing callers default to source='default' and behave identically.
- import-file.ts: importFromContent / importFromFile / importCodeFile take
opts.sourceId and thread txOpts = { sourceId } through every per-page tx
call. engine.getPage callsite source-scoped for accurate idempotency.
- commands/sync.ts: thread opts.sourceId at importFile (line 581 + 641),
un-syncable cleanup (487-498), delete phase (557), rename phase (574),
and post-sync extract phase (815-816).
- commands/reindex-code.ts: thread opts.sourceId at importCodeFile call.
- commands/extract.ts: extractLinksForSlugs / extractTimelineForSlugs accept
opts.sourceId and propagate via linkOpts / entryOpts.
- commands/reconcile-links.ts: ReconcileLinksOpts.sourceId was declared but
ignored end-to-end; now wired through getPage + addLink calls.
- commands/migrate-engine.ts: --force wipe switched to executeRaw('DELETE
FROM pages') to preserve the pre-PR all-sources semantic after deletePage
became default-source-scoped.
Regression test: test/source-id-tx-regression.test.ts (19 tests). Validates
two sources × same slug coexist; getTags/addTag/removeTag/deleteChunks/
upsertChunks/createVersion/addLink/addTimelineEntry/deletePage/updateSlug
source-scoped writes don't 21000; back-compat without opts targets
source='default'; addLink fail-fast on missing source-qualified endpoint;
importFromContent end-to-end tx thread without fabricating duplicate.
Adversarial review: Codex (gpt-5.5 reviewer) + Grok (xAI flagship reviewer)
3-round crew loop. Round 1: 2 HIGH (addTimelineEntry + extract.ts thread)
+ 2 MED. Round 2: 1 CRITICAL + 1 HIGH (deletePage + updateSlug bare-slug)
+ 2 MED. Round 3: 2 HIGH (getChunks + migrate-engine semantic regression
introduced by R2 fix). Round 4: both reviewers CLEAR.
Deferred to follow-up PRs (noted as TODO):
- src/commands/embed.ts source-aware threading (auto-embed at sync.ts:823
has a TODO; try/catch swallows the failure as best-effort).
- src/core/postgres-engine.ts:1511 / pglite-engine.ts:1446 putRawData
bare-slug (lower-impact metadata path).
- Read-surface bare-slug consistency cleanup (getLinks/getBacklinks/
getTimeline/getRawData/getVersions): non-mutating, won't 21000.
- reconcile-links.ts CLI --source flag exposure (internal opt is wired;
CLI parser is a UX feature for later).
Existing rows in production written under (default, slug) by the old
putPage when caller meant another source remain misrouted. Backfill
heuristics need install-specific knowledge of intended source and are
outside this PR's scope; surface as a deployment-side cleanup task.
bun run typecheck clean, bun run build clean, 19/19 regression tests pass,
4082 unit pass / 1 pre-existing fail (BrainRegistry test depending on
test-env ~/.gbrain/ absence — fails on untouched main, unrelated).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n#707 gap) PR garrytan#707 fixed source_id routing for sync's incremental loop (lines 581/641) but performFullSync (line 922) calls runImport without threading sourceId. Result: full syncs route pages to default even with --source <id>. Verified on v0.30.1 by direct PGLite probe after `gbrain sync --source X --full`: all pages landed in default, not the named source. Fix: - runImport accepts sourceId in opts (programmatic only — no CLI flag, preserving PR garrytan#707's design intent of `gbrain import` being default-only). - runImport threads sourceId to importFile + importImageFile. - performFullSync passes opts.sourceId to runImport. - ImportImageOptions type accepts sourceId for runImport branch (importImageFile body wiring deferred — image imports out of scope for current use case; TS error fix only). Verified: real sync test against /tmp/test-sync routes 1 page to "testsync" source, 0 to default (post-fix). 19/19 source-id regression tests still pass. Typecheck clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PR garrytan#707's existing 19-test suite at test/source-id-tx-regression.test.ts covers the engine-layer transaction surface (putPage / addTag / etc.) but does NOT exercise commands/sync.ts:performFullSync. Verified via `grep -c 'performFullSync' test/source-id-tx-regression.test.ts → 0`. This means the +18/-4 fix at sync.ts:892 (performFullSync passing sourceId to runImport) had no automated coverage. Adds 2 PGLite-only regression tests: 1. `performFullSync with --source routes pages to named source (not default)` — fixture: temp git repo with 2 markdown files. Calls performSync with { full: true, sourceId: 'testsrc-pfs', noPull: true, noEmbed: true }. Asserts pages.source_id = 'testsrc-pfs', not 'default'. Pre-fix: FAILS (verified by checking out 46cd197 — rebased PR garrytan#707 only, without my gap-fix — and running this test). Post-fix: PASSES. 2. `performFullSync WITHOUT --source still targets default (back-compat)` — same fixture, no sourceId opt. Asserts pages.source_id = 'default'. Both pre-fix and post-fix: PASSES (back-compat preserved by the fix). Verified: 21/21 tests pass on this branch (19 from PR garrytan#707 + 2 new). `bun run typecheck` clean. `bun run verify` clean (8 guard checks pass). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Going to fold all three commits (
Once I force-push to PR #707 I'll ping here so you can close #757. Realistically tonight or tomorrow morning. Thanks again — the regression test was the missing piece I'd planned to add myself, glad to have it landed already. (Heads-up: the test docstring references some Atlas-internal vocabulary — "PRISM Round 2", "Atlas Terminal agent", "PR-E" — I'll scrub those in a follow-up cleanup commit during the fold-in so the upstream test reads as a generic regression. Doesn't affect the test logic.) |
|
Done — PR #707 force-pushed to v0.30.1 base with all 3 of your commits (
Full stack + rationale on #707 (comment). Feel free to close #757 — appreciate you driving the rebase + the empirical proof of the gap. Couldn't have shipped this in a single coherent PR without your work. |
Codex-mandated test gate (C4 from /codex review of v0.30.3 plan). Pins three privacy invariants for #728's fence-stripping in operations.ts: 1. Local CLI caller (no allow-list) sees full takes fence — operator reads should preserve everything. 2. MCP-bound caller (allow-list set) sees compiled_truth with fence STRIPPED on get_page AND get_versions. 3. Allow-list PRESENCE (not contents) flags MCP-bound identity. Even a permissive ['world','garry','brain'] still strips, because the typed read surface for takes is takes_list / takes_search, not get_page or get_versions. Lane 4 (#757 + #728) was the high-risk merge surface for this privacy invariant. The test runs through dispatchToolCall to exercise the full threading path (auth → context → handler → engine read → stripTakesFence) so a future bad merge fails loudly at the conflict seam in operations.ts.
Summary
Follow-up + complement to PR #707 (mdcruz88's source_id-routing fix). PR #707 correctly threads
sourceIdthrough sync's incremental loop (lines 581/641) butperformFullSync(the path--fullinvokes) atsync.ts:892callsrunImport(engine, importArgs, { commit: headCommit })withoutsourceId. Result:gbrain sync --source X --fullupdatessources.last_sync_at(looks like binding) but writes pages todefaultsource.This PR adds the missing thread + a regression test that exercises the full-sync path the existing 19-test suite at
test/source-id-tx-regression.test.tsdoesn't cover.Relates to #497, #540. Branch rebases cleanly onto current
master(commitdffb607, v0.30.1) and includes all of @mdcruz88's PR #707 commits as base.Diff scope
3 files for the gap-fix (+18/-4) — preserves PR #707's design intent of
gbrain importCLI being default-only by makingsourceIda programmatic-only opt (no CLI flag added):src/commands/import.ts—runImportacceptsopts.sourceId, threads toimportFile+importImageFilesrc/commands/sync.ts:924—performFullSyncpassesopts.sourceIdtorunImportsrc/core/import-file.ts—ImportImageOptionstype acceptssourceId(TS-only fix; image-import body wiring deferred — out of scope, noted as a separate follow-up)1 test file (+139) — regression coverage for the gap:
test/performfullsync-source-id.test.ts— 2 PGLite tests:performFullSync with --source routes pages to named source (not default)performFullSync WITHOUT --source still targets default (back-compat preserved)Verification
The 19 existing tests (PR #707) + 2 new tests (this PR) all pass.
Real-behavior proof
Behavior addressed:
performFullSync(the--fullsync path) does not threadsourceIdtorunImport. With PR fix(multi-source): thread source_id through per-page tx surface (closes Postgres 21000 mid-import) #707 alone,gbrain sync --source X --fullupdatessources.last_sync_atto look like binding worked, but pages still land indefault. Verified empirically against rebased PR fix(multi-source): thread source_id through per-page tx surface (closes Postgres 21000 mid-import) #707 commit46cd197(i.e. PR fix(multi-source): thread source_id through per-page tx surface (closes Postgres 21000 mid-import) #707 without this gap-fix): the new regression test FAILS withexpected counts['testsrc-pfs'] toBeGreaterThan(0)because pages went todefault.Real environment tested: Bun 1.3.8 / macOS 25.4.0 / PGLite (no DATABASE_URL). Test fixture: temp git repo (
mkdtempSync+git init+ 2 markdown files) +performSync(engine, { repoPath, full: true, sourceId: 'testsrc-pfs', noPull: true, noEmbed: true })+SELECT source_id, COUNT(*) FROM pages GROUP BY source_id.Exact command:
bun test test/performfullsync-source-id.test.tsEvidence after fix (commit 694f9c9 + test 37dc598):
Observed result: With the fix applied, pages land at
(source_id='testsrc-pfs', slug='topics/foo')and(source_id='testsrc-pfs', slug='topics/bar'). With nosourceIdopt passed (back-compat path), pages land at(source_id='default', ...). Direct PGLite probe confirms.What was NOT tested:
DATABASE_URLwas available; the engine-layer fix in PR fix(multi-source): thread source_id through per-page tx surface (closes Postgres 21000 mid-import) #707 has Postgres parity per its own e2e tests, butperformFullSyncitself is engine-agnostic plumbing).local_pathto be a git root, so atlas's 9 agent-diary subdirs of~/atlasstill can't be source-bound viasynceven with this fix; that's a separate concern filed at feat: sync should support subdir-of-git-repo as a source (atlas-style monorepo with N logical sources) #753).ImportImageOptionsacceptssourceIdhere as a TS-only fix to keeprunImporttypecheck-clean, butwithImportTransactiondoesn't yet passsourceIdthrough to imagetx.putPage. PR fix(multi-source): thread source_id through per-page tx surface (closes Postgres 21000 mid-import) #707's commit body explicitly enumerates this as a deferred follow-up; not in scope here.--watch+--source) — not covered.Regression check against pre-fix
Verified by checking out
46cd197(rebased PR #707 without this gap-fix) and running the same test:Result: with PR #707 alone (no gap-fix), the
--sourcepath fails — pages went todefault. This is the bug. The fix in commit694f9c9resolves it.Test plan
bun test test/source-id-tx-regression.test.ts test/performfullsync-source-id.test.ts— should be 21 pass.bun run typecheck— clean.bun run verify— all 8 guard checks pass.DATABASE_URL— performFullSync's source threading is identical for Postgres + PGLite (it's at the runImport CLI dispatch layer, not engine-layer), so PGLite coverage gives high confidence.Maintainer-friendly notes
@mdcruz88 — earlier comment on PR #707 (#707 (comment)) offered this as either a separate PR or as commits to pull into PR #707. This PR is the standalone form. If you'd prefer to incorporate commits
694f9c9(gap-fix) +37dc598(test) into PR #707 directly, happy to close this PR — just leave a comment.@garrytan — appreciate the source isolation infrastructure in v0.18+. The combination of PR #707 (engine-layer source_id threading) + this PR (sync-command-layer threading) closes the open
#497+#540paths fully. Filed alongside#753(a related architectural follow-up about monorepo subdir-source support, not in this PR's scope).Commits
46cd197— fix(multi-source): thread source_id through per-page tx surface (rebased from PR fix(multi-source): thread source_id through per-page tx surface (closes Postgres 21000 mid-import) #707)694f9c9— fix(multi-source): plumb sourceId through performFullSync (PR fix(multi-source): thread source_id through per-page tx surface (closes Postgres 21000 mid-import) #707 gap)37dc598— test: regression test for performFullSync sourceId threadingBranch: https://github.com/jeremyknows/gbrain/tree/fix-707-performfullsync-gap-rebase
Need help on this PR? Tag
@codesmithwith what you need.