feat(sync): --src-subpath + --exclude for monorepo subdir-source support #774

Open
jeremyknows wants to merge 2 commits into garrytan:master from jeremyknows:feat/sync-monorepo-subdir-source

@jeremyknows jeremyknows commented May 9, 2026

Summary

Adds two flags to gbrain sync that together make Atlas-style monorepos (one git repo, N logical sources in subdirectories) work without workarounds.

PR-A — --src-subpath <subdir>

  • New discoverGitRoot() helper: runs git -C <path> rev-parse --show-toplevel to resolve the git root from any path inside the repo. Git handles worktrees and submodules natively.
  • Splits the implicit repoPath into two axes:
    • gitContextRoot — all git operations (rev-parse, diff, pull)
    • syncScopeRoot — file walking and imports
  • Slugs stay relative to gitContextRoot so wiki/page1.md lands as slug wiki/page1, not page1, regardless of which subpath is synced.
  • runImport() gains a slugRoot option to support this: walk from syncScopeRoot, compute slugs relative to gitContextRoot.
  • manageGitignore() always resolves to git root even when repoPath is a subdirectory.
  • Auto-discovery: passing a subdir directly as repoPath (no --src-subpath) also works — same code path.
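
A minimal sketch of the two-root split described above, not the PR's actual code. The git invocation is the one quoted in the summary; `discoverGitRootSketch`, `slugFor`, and the rule of dropping the `.md` extension are illustrative assumptions.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Sketch of the helper described above: ask git for the top-level directory
// of whichever repository contains `startPath`. Throws if git fails.
function discoverGitRootSketch(startPath: string): string {
  const out = execFileSync(
    "git",
    ["-C", startPath, "rev-parse", "--show-toplevel"],
    { encoding: "utf8" },
  );
  return out.trim();
}

// Hypothetical slug rule for illustration: the path relative to `slugRoot`
// with the ".md" extension dropped. POSIX separators keep output stable.
function slugFor(filePath: string, slugRoot: string): string {
  return path.posix.relative(slugRoot, filePath).replace(/\.md$/, "");
}

const gitContextRoot = "/repo"; // where git operations run
const syncScopeRoot = "/repo/wiki"; // where the file walk starts
const file = path.posix.join(syncScopeRoot, "page1.md");

// Slug relative to the git root keeps the "wiki/" prefix.
const slugFromGitRoot = slugFor(file, gitContextRoot);
// Slug relative to the scope root would lose it.
const slugFromScope = slugFor(file, syncScopeRoot);

console.log(slugFromGitRoot, slugFromScope); // prints "wiki/page1 page1"
```

The file walk starts at syncScopeRoot in both cases; only the root used for slug computation changes.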

PR-B — --exclude <glob> (repeatable)

  • Exposes the existing internal SyncOpts.exclude field as a repeatable CLI flag.
  • Works in both full-sync and incremental sync paths.
  • Threaded into runImport() opts for full-sync paths.
  • matchesAnyGlob promoted to export from core/sync.ts for reuse in import.ts.
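
To illustrate how a repeatable --exclude flag might flow into file filtering, here is a sketch under stated assumptions. `collectExcludes`, `globToRegExp`, `matchesAnyGlobSketch`, and `applyExcludes` are hypothetical stand-ins. The real matchesAnyGlob in core/sync.ts may use different glob semantics.

```typescript
// Collect every value that follows an "--exclude" token in argv.
function collectExcludes(argv: string[]): string[] {
  const patterns: string[] = [];
  for (let i = 0; i < argv.length; i++) {
    const next = argv[i + 1];
    if (argv[i] === "--exclude" && next !== undefined) {
      patterns.push(next);
      i++; // skip the value we just consumed
    }
  }
  return patterns;
}

// Simplified glob translation (assumption): "**/" spans directories,
// "*" stays within one path segment, "?" matches one non-slash character.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?";
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*";
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*";
      i += 1;
    } else if (glob.charAt(i) === "?") {
      source += "[^/]";
      i += 1;
    } else {
      // Escape regex metacharacters in literal text.
      source += glob.charAt(i).replace(/[.+^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

function matchesAnyGlobSketch(relPath: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(relPath));
}

// Filter a file list and report whether the patterns removed everything.
function applyExcludes(files: string[], patterns: string[]) {
  const kept = files.filter((f) => !matchesAnyGlobSketch(f, patterns));
  return { kept, allExcluded: files.length > 0 && kept.length === 0 };
}

const excludes = collectExcludes(["sync", "--exclude", "drafts/*", "--exclude", "*.tmp"]);
const result = applyExcludes(["wiki/a.md", "drafts/b.md", "notes.tmp"], excludes);
console.log(result.kept); // prints [ 'wiki/a.md' ]
```

The `allExcluded` flag is one way a caller could decide when to warn that the patterns filtered out every file.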

Security guards

| Guard | What it blocks |
| --- | --- |
| NAV-1 | --src-subpath ../escape: realpathSync scope-entry check before any git op |
| NAV-1 TOCTOU | Symlink inside repo pointing outside: per-file realpathSync check during walk |
| NAV-2 | gitContextRoot itself passed through realpathSync |
| NAV-4 | Warning when --exclude patterns filter out all files |
| NAV-3 note | Git hooks (.git/hooks/) run during git pull; unchanged from current behavior. Callers trusting remote repos already accept this risk |
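
A realpath-based containment check of the kind NAV-1 and NAV-2 describe could be sketched as follows. `isWithin` and `assertInScope` are hypothetical names, and the PR's actual guard may be structured differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Pure containment test on already-resolved paths: `candidate` is inside
// `root` when the relative path neither climbs out via ".." nor is absolute.
function isWithin(root: string, candidate: string): boolean {
  const rel = path.relative(root, candidate);
  if (rel === "") return true; // the root itself
  return rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel);
}

// Resolve symlinks on both sides before comparing, so that neither "../"
// segments nor a symlink pointing outside the repo can pass the check.
// realpathSync throws if the path does not exist.
function assertInScope(gitContextRoot: string, target: string): string {
  const realRoot = fs.realpathSync(gitContextRoot);
  const realTarget = fs.realpathSync(path.resolve(realRoot, target));
  if (!isWithin(realRoot, realTarget)) {
    throw new Error(`path escapes repository root: ${target}`);
  }
  return realTarget;
}
```

Calling the same check once for the scope entry and again for each file during the walk would cover a symlink inside the scope that points outside the repo.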

Tests

test/sync-monorepo.test.ts — 10 tests, all pass:

✓ back-compat: sync at git root without srcSubpath imports all files
✓ auto-discovery: repoPath is a git subdir — discoverGitRoot succeeds
✓ --src-subpath wiki: only wiki/ files are imported
✓ --src-subpath memory: only memory/ files are imported
✓ 2 sources in 1 repo: sync each scope independently, no cross-contamination
✓ path-traversal: --src-subpath ../escape is rejected before any git op
✓ path-traversal: symlink subdir pointing outside repo is rejected (NAV-1 TOCTOU)
✓ --exclude: single pattern excludes matching files from full sync
✓ --exclude: glob pattern with wildcard
✓ --exclude **/* emits warning when all files are excluded (NAV-4)
bun test test/sync-monorepo.test.ts  →  10 pass / 0 fail
bun run typecheck                    →  clean
bun run verify                       →  all checks pass

What was NOT tested

  • Incremental sync with --src-subpath. This requires two commits in a real git repo. The incremental path applies an inScope() filter to git diff output, which is git-root-relative. This was checked by manual inspection but is not automated here.
  • Parallel workers + --src-subpath (PGLite forces serial; Postgres worker path is unchanged)
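
For the untested incremental path, the scoping logic on git-root-relative diff output might look like the sketch below. `inScopeSketch` is a hypothetical stand-in for the PR's inScope() filter, whose real signature is not shown here.

```typescript
// Keep only changed paths that live under `scopeRel`, where both values are
// relative to the git root. An empty scope means the whole repository.
function inScopeSketch(changedPath: string, scopeRel: string): boolean {
  if (scopeRel === "") return true;
  const prefix = scopeRel.endsWith("/") ? scopeRel : scopeRel + "/";
  return changedPath.startsWith(prefix);
}

// Example diff output (git-root-relative paths), filtered for the wiki scope.
const changed = ["wiki/a.md", "memory/b.md", "wiki-old/c.md"];
const wikiOnly = changed.filter((p) => inScopeSketch(p, "wiki"));
console.log(wikiOnly); // prints [ 'wiki/a.md' ]
```

Appending the trailing slash before the prefix comparison keeps a sibling directory such as wiki-old/ from matching the wiki scope.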

Motivation

Closes the gap for monorepo deployments where a single git repo contains multiple logical knowledge bases (e.g., wiki/, memory/, projects/). Previously, gbrain sync required the git root and imported everything; there was no way to scope a sync to a subdirectory. With --src-subpath, each logical source can be synced independently with its own source ID, enabling per-source search and cross-contamination prevention.

Relates to #753.

🤖 Generated with Claude Code


…support

PR-A: discoverGitRoot() walks up from any path to the git root so
`gbrain sync wiki/` works without requiring the git root as cwd.
Splits the existing repoPath into gitContextRoot (git ops) and
syncScopeRoot (file walking / import), keeping slugs relative to
gitContextRoot so wiki/page1.md lands as slug "wiki/page1" not "page1".

Scope guard: realpathSync() scope-entry check + per-file TOCTOU check
(NAV-1) reject --src-subpath values that traverse outside the repo via
`../` or symlinks. NAV-4: warn when --exclude filters everything out.

PR-B: expose the existing internal `exclude` SyncOpts field as a
repeatable --exclude <glob> CLI flag. Also threads exclude support into
runImport() for full-sync paths.

runImport() gains a `slugRoot` option so callers can walk from a
subdirectory while keeping slugs relative to the repo root.
matchesAnyGlob promoted to export from core/sync.ts.

manageGitignore() always resolves to git root via discoverGitRoot()
even when repoPath is a subdirectory.

Tests: 10 tests in test/sync-monorepo.test.ts covering back-compat,
auto-discovery, --src-subpath scoping, 2-source isolation, path-traversal
rejection (NAV-1/NAV-2), and --exclude single/glob/NAV-4.
…slugs for direct-dir sources

Sources registered pointing at a subdirectory (e.g. wiki → ~/atlas/shared/wiki)
expect slugs relative to their own path, not the git context root. The slugRoot
opt was introduced for the --src-subpath monorepo case where git-root-relative
slugs are intentional. Applying it to all subdirectory sources breaks frontmatter
slug matching (e.g. agent-roster.md slug:'agent-roster' vs path-derived
'shared/wiki/agent-roster').

Fix: `slugRoot = (opts.srcSubpath && scopeRel) ? gitContextRoot : undefined`

10/10 tests pass. Discovered during 4th-wipe re-import (wiki: 4 → 64 pages).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jeremyknows
Author

Follow-up commit: b4bfb70 fix(sync): gate slugRoot on opts.srcSubpath

Discovered during live testing: the original PR applied slugRoot = gitContextRoot whenever syncScopeRoot !== gitContextRoot, which affected all subdirectory sources — not just --src-subpath cases. Sources registered with a direct subdirectory path (e.g., sources add wiki --path ~/atlas/shared/wiki) carry slug: frontmatter fields matching their path-relative slug (agent-roster, not shared/wiki/agent-roster). The mismatch caused 80/144 wiki pages to fail import with SLUG_MISMATCH.

Fix: slugRoot = (opts.srcSubpath && scopeRel) ? gitContextRoot : undefined

Git-root-relative slugs are only activated when --src-subpath is explicitly passed. Back-compat with existing sources that point directly at subdirectories is preserved.
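
The effect of the gate can be sketched as follows. The ternary mirrors the fix quoted above; `SyncOptsSketch`, `chooseSlugRoot`, `slugFor`, and the fallback to the scope root when slugRoot is undefined are assumptions for illustration.

```typescript
import * as path from "node:path";

interface SyncOptsSketch {
  srcSubpath?: string;
}

// Mirrors the quoted fix: git-root-relative slugs only when --src-subpath
// was passed explicitly and the scope is below the git root.
function chooseSlugRoot(
  opts: SyncOptsSketch,
  scopeRel: string,
  gitContextRoot: string,
): string | undefined {
  return opts.srcSubpath && scopeRel ? gitContextRoot : undefined;
}

// Assumed fallback: with no slugRoot, slugs are relative to the scope root.
function slugFor(file: string, scopeRoot: string, slugRoot?: string): string {
  return path.posix.relative(slugRoot ?? scopeRoot, file).replace(/\.md$/, "");
}

const gitRoot = "/atlas";
const scopeRoot = "/atlas/shared/wiki";
const file = "/atlas/shared/wiki/agent-roster.md";

// Source registered directly at the subdirectory: no --src-subpath.
const directSlug = slugFor(file, scopeRoot, chooseSlugRoot({}, "shared/wiki", gitRoot));
// Monorepo case: --src-subpath shared/wiki passed explicitly.
const subpathSlug = slugFor(
  file,
  scopeRoot,
  chooseSlugRoot({ srcSubpath: "shared/wiki" }, "shared/wiki", gitRoot),
);

console.log(directSlug, subpathSlug); // prints "agent-roster shared/wiki/agent-roster"
```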

All 10 tests still pass.
