gbrain sync deletes pages by slug only, ignoring source_id - cross-source data loss
Summary
gbrain sync against source A can delete brain pages that came from source B if they share a slug. The deletion in src/commands/sync.ts:425-440 filters the source's git diff for un-syncable files, then calls engine.deletePage(slug) without checking whether that brain page actually originated from this source.
I hit this in production: a one-line linkedin-brain README page (slug readme, ingested April 10 from ~/linkedin-brain via gbrain import) was silently deleted by a misconfigured gbrain sync targeting ~/gbrain-prod. gbrain-prod's README.md is un-syncable (no YAML frontmatter), so sync's deletion logic treated the brain page slug readme as a stale orphan and removed it - even though the page belonged to a completely different source.
Reproduction
- Two source repos with same-slug files:
  - ~/source-a/README.md (no YAML frontmatter, un-syncable)
  - ~/source-b/README.md (well-formed, syncable - or just any source that imported a readme slug)
- Source B was ingested at some point: gbrain import ~/source-b → creates page slug readme.
- Some prior gbrain sync ran against source A and bookmarked last_commit (sets sync.last_commit global key, since no --source was passed).
- Run gbrain sync again (no --source, no --repo) - it picks up source A from the global sync.repo_path fallback, runs git pull, computes manifest, finds A's README.md was modified between bookmarks.
- runPhaseSync filters A's manifest.modified → un-syncable files → calls engine.deletePage('readme').
- The readme page from source B is GONE. Cascade-deletes its content_chunks, links, tags, page_versions, etc.
Root cause
src/commands/sync.ts:425-440 (v0.27.0):
    const unsyncableModified = manifest.modified.filter(p => !isSyncable(p, syncOpts));
    for (const path of unsyncableModified) {
      const slug = resolveSlugForPath(path);
      try {
        const existing = await engine.getPage(slug);
        if (existing) {
          await engine.deletePage(slug);
          console.log(` Deleted un-syncable page: ${slug}`);
        }
      } catch { /* ignore */ }
    }
engine.getPage(slug) and engine.deletePage(slug) operate on the slug across ALL sources. They don't take a sourceId parameter. Even though runPhaseSync already resolved sourceId upstream at line ~534 to thread into performSync, the deletion path doesn't use it.
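To make the failure mode concrete, here is a minimal in-memory model of the loop above. It is only a sketch and not gbrain's actual engine: PageRow, InMemoryEngine, cleanupUnsyncable, the source IDs, and the stubbed isSyncable / resolveSlugForPath helpers are all hypothetical stand-ins with simplified signatures.

```typescript
// Hypothetical in-memory model; NOT the real BrainEngine.
interface PageRow {
  slug: string;
  sourceId: string;
}

class InMemoryEngine {
  rows: PageRow[];

  constructor(rows: PageRow[]) {
    this.rows = rows.slice();
  }

  // Mirrors the slug-only lookup described above: no source filter.
  getPage(slug: string): PageRow | undefined {
    return this.rows.find((r) => r.slug === slug);
  }

  // Mirrors the slug-only delete: removes every row with this slug.
  deletePage(slug: string): void {
    this.rows = this.rows.filter((r) => r.slug !== slug);
  }
}

// Stub (assumption): a file is syncable only if it opens with YAML frontmatter.
function isSyncable(content: string): boolean {
  return /^---\r?\n/.test(content);
}

// Stub (assumption): "README.md" -> "readme".
function resolveSlugForPath(path: string): string {
  return path.replace(/\.md$/i, "").toLowerCase();
}

// Simplified version of the un-syncable cleanup loop, run for ONE source.
function cleanupUnsyncable(
  engine: InMemoryEngine,
  modified: { path: string; content: string }[],
): string[] {
  const deletedSlugs: string[] = [];
  for (const file of modified) {
    if (isSyncable(file.content)) continue;
    const slug = resolveSlugForPath(file.path);
    if (engine.getPage(slug)) {
      engine.deletePage(slug); // no check that the page came from this source
      deletedSlugs.push(slug);
    }
  }
  return deletedSlugs;
}

// The only "readme" page in the brain came from source B.
const demoEngine = new InMemoryEngine([{ slug: "readme", sourceId: "source-b" }]);

// Sync runs against source A, whose README.md has no frontmatter.
const demoDeleted = cleanupUnsyncable(demoEngine, [
  { path: "README.md", content: "# Source A\n" },
]);
console.log(demoDeleted, demoEngine.rows);
```

In this model, source B's page is removed even though only source A's manifest was processed, because nothing in the loop compares the page's source with the source being synced.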
Suggested fix
Filter the deletion by source provenance. Two options:
Option 1 (preferred): use ingest_log to derive provenance.
For each candidate slug, check the most recent ingest_log entry that lists this slug in pages_updated. If the entry's source_ref (or its mapped sourceId) doesn't match the current sync's source, skip the deletion. Source-agnostic deletes only fire when sync is explicitly source-agnostic (pre-v0.18 brains).
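A rough sketch of what the Option 1 check could look like, under stated assumptions: the IngestLogEntry shape is hypothetical (only source_ref and pages_updated are named in this report; the timestamp field is invented for ordering), source_ref is assumed to be already mapped to a comparable source ID, and the "no provenance on record means keep the page" rule is my own conservative choice rather than existing gbrain behavior.

```typescript
// Hypothetical shape of an ingest_log row. Only source_ref and pages_updated
// come from the report; createdAt is an assumed ordering field.
interface IngestLogEntry {
  sourceRef: string;
  pagesUpdated: string[];
  createdAt: number; // epoch millis (assumption)
}

// Returns the most recent log entry that wrote `slug`, or undefined.
function latestIngestFor(
  log: IngestLogEntry[],
  slug: string,
): IngestLogEntry | undefined {
  let latest: IngestLogEntry | undefined;
  for (const entry of log) {
    if (entry.pagesUpdated.indexOf(slug) === -1) continue;
    if (latest === undefined || entry.createdAt > latest.createdAt) {
      latest = entry;
    }
  }
  return latest;
}

// Decides whether the current sync may delete `slug`.
// `currentSource === undefined` models an explicitly source-agnostic sync.
function mayDelete(
  log: IngestLogEntry[],
  slug: string,
  currentSource?: string,
): boolean {
  if (currentSource === undefined) return true;
  const latest = latestIngestFor(log, slug);
  // Assumption: with no provenance on record, keep the page.
  if (latest === undefined) return false;
  return latest.sourceRef === currentSource;
}

const sampleLog: IngestLogEntry[] = [
  { sourceRef: "source-b", pagesUpdated: ["readme"], createdAt: 1 },
  { sourceRef: "source-a", pagesUpdated: ["notes"], createdAt: 2 },
];

console.log(mayDelete(sampleLog, "readme", "source-a")); // false: page belongs to source-b
console.log(mayDelete(sampleLog, "readme", "source-b")); // true
```

The deletion loop would call something like mayDelete before engine.deletePage and skip the slug when it returns false, which costs one lookup per un-syncable candidate.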
Option 2: extend deletePage to take a sourceId.
engine.deletePage(slug, sourceId) does DELETE FROM pages WHERE slug = $1 AND source_id = $2. The sync deletion path threads its sourceId (already resolved). Same-slug pages from other sources survive.
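As an illustration of the Option 2 semantics, here is an in-memory analogue of the source-scoped delete. ScopedPage and ScopedEngine are hypothetical names for this sketch, not part of the BrainEngine interface, and the returned row count is an assumption added to make the effect visible.

```typescript
// Hypothetical model of a pages table keyed by (slug, source_id).
interface ScopedPage {
  slug: string;
  sourceId: string;
}

class ScopedEngine {
  private pages: ScopedPage[];

  constructor(pages: ScopedPage[]) {
    this.pages = pages.slice();
  }

  // In-memory analogue of:
  //   DELETE FROM pages WHERE slug = $1 AND source_id = $2
  // Returns the number of rows removed (assumed return value).
  deletePage(slug: string, sourceId: string): number {
    const before = this.pages.length;
    this.pages = this.pages.filter(
      (p) => !(p.slug === slug && p.sourceId === sourceId),
    );
    return before - this.pages.length;
  }

  list(): ScopedPage[] {
    return this.pages.slice();
  }
}

const scopedEngine = new ScopedEngine([
  { slug: "readme", sourceId: "source-a" },
  { slug: "readme", sourceId: "source-b" },
]);

// Sync for source A deletes only source A's readme.
const removedCount = scopedEngine.deletePage("readme", "source-a");
console.log(removedCount, scopedEngine.list());
```

With the source ID in the predicate, a sync for a source that never wrote the slug removes zero rows instead of another source's page.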
Option 2 is cleaner but a breaking change to the BrainEngine interface. Option 1 is contained but adds a query per un-syncable candidate.
Impact
Severity: data loss across sources. It is hard to diagnose because the page disappears with only a single log line, its page_versions rows are cascade-deleted (so neither gbrain doctor nor version history can recover it), and the only audit trail is ingest_log, which records the original write but not the deletion.
This most likely affects users who have multiple sources, or who ran gbrain import <dir> once and forgot about the global sync.* config keys it leaves behind. The same deletion-by-slug ambiguity also affects sync's manifest.deleted path on line 422 (same shape).
Workaround
For users who hit this:
- The deleted page is recoverable only if the source content is still on disk: gbrain put <slug> < ~/source-b/README.md reinstates the page.
- To prevent recurrence, delete the bare-fallback global config keys: DELETE FROM config WHERE key IN ('sync.repo_path', 'sync.last_commit', 'sync.last_run'). After this, gbrain sync without --source errors out instead of silently targeting the wrong repo.
Version
gbrain 0.27.0 (binary). Source viewed at commit ee9ceb3 (v0.27 release base) and b325f28 (v0.28.6 master). The behavior in sync.ts:425-440 is unchanged on master.
Engagement context
Discovered during a pollution-cleanup engagement after autopilot --install was misconfigured with --repo ~/gbrain-prod (instead of ~/youtube-knowledge). The cleanup successfully deleted 176 polluted rows, using gbrain sync --source default --no-pull for the recovery sync.