fix(sync): honor --strategy on first sync via performFullSync (#767)#768
Open
rayers wants to merge 2 commits intogarrytan:masterfrom
Open
fix(sync): honor --strategy on first sync via performFullSync (#767)#768rayers wants to merge 2 commits intogarrytan:masterfrom
rayers wants to merge 2 commits intogarrytan:masterfrom
Conversation
…an#767) `gbrain sync --strategy code --source <id>` silently ran a markdown-only import on first sync (no anchor commit yet). The flag was parsed but dropped when performFullSync called runImport, which is hardcoded to collectMarkdownFiles. Result: registering a fresh code source produced 0 code pages, breaking code-def / code-refs / code-callers / code-callees on freshly-registered code repos. Changes: - src/commands/import.ts - runImport now parses `--strategy markdown|code|auto` (defaults to markdown, preserves backward compat). - When strategy != markdown, route file collection through a new `collectFilesByStrategy` helper that mirrors collectMarkdownFiles safety guards (lstatSync symlink containment, hidden + node_modules skip, 5MB size cap) and uses isSyncable as the include filter so inclusion logic stays in sync with the incremental walker. - Validates --strategy value with a clear error before falling into the file walk. - src/commands/sync.ts - performFullSync threads opts.strategy through to runImport's argv. - Dry-run branch also honors strategy (was a parallel bug — silent misleading dry-run counts when strategy=code). - test/collect-files-by-strategy.test.ts (new) - 10 tests, all pass: strategy=code returns code-only, strategy=auto returns code+md, recursion, node_modules + hidden-dir skips, 5MB cap, symlink containment (file + dir + dangling), parity with markdown. - Mirrors import-walker.test.ts patterns + L002 security invariants. Verified end-to-end: against a 1.6GB nvr repo (~7000 files, mixed Java + Python + C + TypeScript + bash), `gbrain sync --strategy code` produced 1882 code pages and 31933 tree-sitter chunks (was 0 before this patch). `gbrain code-def DstreamView` returns count: 1. Existing test surface (import-walker, import-file, import-resume, sync-classifier-widening, sources-resync-recovery, skillpack-sync-guard) runs green: 64 pass, 0 fail. Closes garrytan#767
Adversarial review (Claude subagent) flagged three HIGH issues in the initial garrytan#767 patch. This commit addresses each: 1. HIGH — `auto` was not a strict superset of `collectMarkdownFiles`. The previous implementation routed `auto` through isSyncable, which strips brain-convention files (README.md, index.md, log.md, schema.md, ops/**, .raw/**) and multimodal images. A user migrating from strategy=markdown to strategy=auto would silently lose pages — the exact silent-drop class the patch is meant to fix. Fixed by computing `auto` in runImport as a UNION of `collectMarkdownFiles(dir) ∪ collectFilesByStrategy(dir, 'code')`. The raw `collectFilesByStrategy(dir, 'auto')` helper still uses isSyncable's auto rules (consistent with the incremental sync path) and is documented accordingly; callers wanting the union compose the two. 2. HIGH — `--strategy=code` (equals form) silently dropped. The original `args.indexOf('--strategy')` only matched the space-separated form; `--strategy=code` fell through to default 'markdown'. Verified by manual smoke test before the fix: `gbrain import <dir> --strategy=code` reported "Found N markdown files". Fixed with a two-form parser that accepts both `--strategy code` and `--strategy=code`. Verified: both forms now print "Found N code files". 3. MEDIUM — `require('../core/sync.ts')` in collectFilesByStrategy was the only `require` in the file (vs `import` everywhere else). Stylistic inconsistency + brittle if the project moves to ESM-only. Fixed by promoting to a top-level `import { isSyncable } from '...'`. 4. MEDIUM — duplicated 5MB constant. Extracted to a documented module- level `MAX_IMPORT_FILE_SIZE` to make the sync between this and `MAX_FILE_SIZE` in `import-file.ts` discoverable. 5. LOW — test coverage gaps: empty dir, non-existent dir, unreadable subdir, brain-convention strip behavior on `auto`. All added. Test count for collect-files-by-strategy.test.ts: 10 → 13 pass. Full import+sync suite (7 files): 77 pass, 0 fail (was 64 before, +13 new tests). Equals-form spot-check confirms parity: --strategy code → Found N code files --strategy=code → Found N code files --strategy auto → 4 files (3 md + 1 ts) on a 4-file fixture --strategy markdown → 3 files (md only) (no flag) → Found N markdown files (backward compat) The single-source `gbrain sync --source <id>` path that doesn't read cfg.strategy from sources.config (`sync.ts:1071-1101`) is pre-existing behavior and out of scope for garrytan#767. Documented in the related issue discussion; potential follow-up.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #767.
gbrain sync --strategy code --source <id>silently ran a markdown-only import on first sync (no anchor commit yet) becauseperformFullSynccallsrunImportwhich is hardcoded tocollectMarkdownFiles. Result: registering a fresh code source produced 0 code pages — no error, just silent under-coverage.code-def/code-refsreturned 0 hits for every symbol.Changes
src/commands/import.tsrunImportnow parses--strategy markdown|code|auto(defaults tomarkdownfor backward compat).--strategy code) and equals-separated (--strategy=code) forms are accepted. The original code only matched the space form viaargs.indexOf('--strategy'); equals-form fell through to default.autostrategy is computed as a UNION ofcollectMarkdownFiles(dir) ∪ collectFilesByStrategy(dir, 'code')— strict superset of legacy markdown coverage. The rawcollectFilesByStrategy(dir, 'auto')helper alone usesisSyncable's auto rules (consistent withwalkSyncableFiles), which strips brain-convention files (README.md,index.md,ops/**); callers wanting the union compose the two.collectFilesByStrategy(dir, strategy)helper. MirrorscollectMarkdownFilessymlink + hidden-dir + node_modules safety (the L002 invariants from PR fix: skip node_modules and handle broken symlinks in import walker #26 / PR fix: community fix wave — 9 PRs, 8 contributors (v0.6.1) #38) and routes the include filter throughisSyncableso this and the incrementalwalkSyncableFilesshare inclusion logic. Per-file 5MB cap via a documented module-levelMAX_IMPORT_FILE_SIZEconstant.src/commands/sync.tsperformFullSyncthreadsopts.strategythrough torunImport'simportArgs.dryRunbranch (parallel bug — silent misleading dry-run counts when strategy=code).test/collect-files-by-strategy.test.ts(new, 13 tests)strategy=codereturns code-only, recurses, skips node_modules + hidden dirs, 5MB cap.[]; non-existent dir returns[]; unreadable subdir doesn't throw.strategy=autoreturns code + non-stripped markdown; documents the brain-convention strip behavior so callers know to compose withcollectMarkdownFilesfor the union.Verification
End-to-end against a 1.6GB Dividia NVR repo (~7000 files, mixed Java + Python + C + TypeScript + bash):
Was 0 before this patch.
gbrain code-refs <symbol>returns hits via chunk_text scan (verified with 50 hits for a known nvr Python symbol).gbrain code-defworks for any language whose tree-sitter chunker fully populatessymbol_name/symbol_typecolumns (Python: yes; Java: chunks are created with[Java] path:linechunk_text but symbol metadata is unpopulated — separate downstream chunker behavior, not affected by this patch).Equals-form spot-checks:
Tests
test/collect-files-by-strategy.test.ts— all pass.import-walker,import-file,import-resume,sync-classifier-widening,sources-resync-recovery,skillpack-sync-guard): 64 → 64, all pass.Adversarial review
Two-round review process before submission. Round-1 patch was reviewed by an independent Claude subagent (read-only, hostile-reviewer mode) which surfaced three HIGH issues:
autowas not a strict superset ofcollectMarkdownFiles(silently droppedREADME.md/index.md/ops/**).--strategy=code(equals form) silently dropped (same class of bug being fixed).require('../core/sync.ts')was the onlyrequirein the file (vsimporteverywhere else).Plus MEDIUM (duplicated 5MB constant) and LOW (test gaps). All addressed in commit
d8e79f7. Two rounds are visible as separate commits on the branch for review history; happy to squash on merge.Codex CLI review was also attempted (
codex review --base master) but stalled in repo-exploration mode — known issue with codex 0.117 on patches against larger repos. Claude subagent review was the more productive of the two.Out of scope (separate from this PR)
gbrain sync --source <id>without--strategydoes not consultcfg.strategyfrom the source's stored config (sync.ts:1071-1101). The--allpath does. Pre-existing inconsistency; can address separately.chunk_textlike[Java] path:linebut doesn't populatesymbol_name/languagecolumns the way Python's chunker does. Separate downstream concern; affectscode-defquality but is unrelated to file walking.Environment used to develop this patch
gbrain v0.30.1 (
dffb607), engine = Postgres 16.13 (pgvector + pg_trgm), bun 1.3.10, macOS 26.3.1 (build 25D771280a). Use case: enablingmcp__gbrain__code_deffor cross-codebase symbol lookup on a single-developer NVR codebase.🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need.