sync --strategy code dropped on first sync via performFullSync #767

@rayers

Summary

gbrain sync --strategy code --source <id> silently does a markdown-only import on first sync (no anchor commit yet). The --strategy flag is parsed correctly but dropped when performFullSync calls runImport, which is hardcoded to walk markdown only via collectMarkdownFiles. As a result, registering a fresh code source and running first sync produces 0 code pages — no error, just silent under-coverage. code-def / code-refs / code-callers / code-callees then return 0 hits for every symbol in the repo.

Steps to reproduce

# Start with a fresh code source
gbrain sources add gstack-code-myrepo --path /path/to/myrepo --federated

# First sync with --strategy code
cd /path/to/myrepo
gbrain sync --source gstack-code-myrepo --strategy code --no-embed

# Output: 0 pages imported, N skipped (where N = the number of .md files
#         that already exist as duplicates in default source)
# Expected: thousands of code pages from .ts/.py/.java/.c/etc.

Verify:

gbrain code-def <known_symbol>  # returns count: 0

Root cause

In src/commands/sync.ts:847, performFullSync calls runImport from ./import.ts:

const { runImport } = await import('./import.ts');
const importArgs = [repoPath];
if (opts.noEmbed) importArgs.push('--no-embed');
if (fullConcurrency > 1) importArgs.push('--workers', String(fullConcurrency));
const result = await runImport(engine, importArgs, { commit: headCommit });

Note that opts.strategy is in scope but never threaded into importArgs. runImport then uses collectMarkdownFiles(repoPath) which only enumerates *.md. The walkSyncableFiles(strategy) path that incremental mode uses (sync.ts:111-134) is bypassed entirely on first sync.

The dry-run branch above (sync.ts:858) drops the flag in the same way: it calls collectMarkdownFiles regardless of opts.strategy.
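To make the difference between the two paths concrete, here is a minimal sketch. The function names, the "markdown" strategy value, and the extension list are hypothetical stand-ins for illustration; they are not gbrain's actual collectMarkdownFiles or walkSyncableFiles implementations.

```typescript
// Hypothetical stand-ins for the two file-selection paths described above.
// The extension list is an assumption, not gbrain's real one.
type Strategy = "markdown" | "code";

const CODE_EXTENSIONS = new Set([".ts", ".py", ".java", ".c"]);

function extensionOf(path: string): string {
  const slash = path.lastIndexOf("/");
  const dot = path.lastIndexOf(".");
  // Ignore dots in directory names and leading dots (e.g. ".gitignore").
  return dot > slash + 1 ? path.slice(dot).toLowerCase() : "";
}

// What the full-sync path effectively does today: markdown only.
function selectMarkdownOnly(paths: string[]): string[] {
  return paths.filter((p) => extensionOf(p) === ".md");
}

// What a strategy-aware selection would do instead.
function selectForStrategy(paths: string[], strategy: Strategy): string[] {
  return paths.filter((p) => {
    const ext = extensionOf(p);
    return strategy === "code" ? CODE_EXTENSIONS.has(ext) : ext === ".md";
  });
}

const repoFiles = ["README.md", "src/main.ts", "tools/build.py", "native/io.c"];
const fullSyncToday = selectMarkdownOnly(repoFiles);        // only README.md
const strategyAware = selectForStrategy(repoFiles, "code"); // the three code files
```

With a markdown-only selection, a repo that is almost entirely code yields almost nothing to import, which matches the "0 pages imported, N skipped" output in the reproduction.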

Affected behavior

  • First-time gbrain sources add <id> --path <repo> followed by gbrain sync --strategy code produces no code pages.
  • Subsequent invocations of sync --strategy code after the first still hit the same path because last_commit was never advanced.
  • The obvious workaround, gbrain reindex-code --source <id> --yes, prints "No code pages to reindex". This is a chicken-and-egg problem: reindexing needs code pages to exist first.
  • Setting strategy=code directly in the source's config row in Postgres doesn't help either. performFullSync ignores the source config too, because it delegates to runImport.

Suggested fix

Thread opts.strategy through to runImport (or call a strategy-aware walker directly inside performFullSync instead of delegating to the markdown-only runImport). The incremental path (walkSyncableFiles(repoPath, cb, strategy)) already does the right thing; ideally performFullSync would use the same walker.

The same fix needs to be applied to the dry-run branch.
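A minimal sketch of the first option (threading the flag through) might look like the following. It mirrors the importArgs construction quoted under Root cause. The buildImportArgs helper and the FullSyncOptions type are hypothetical, and whether runImport accepts a --strategy argument at all is an assumption: it may need a matching change on the import side.

```typescript
// Sketch only. Option names mirror the snippet quoted in this issue;
// the "--strategy" argument to runImport is an assumption, not a confirmed interface.
interface FullSyncOptions {
  noEmbed?: boolean;
  strategy?: string;
}

function buildImportArgs(
  repoPath: string,
  opts: FullSyncOptions,
  fullConcurrency: number,
): string[] {
  const importArgs = [repoPath];
  if (opts.noEmbed) importArgs.push("--no-embed");
  if (fullConcurrency > 1) importArgs.push("--workers", String(fullConcurrency));
  // The missing step: forward the parsed strategy instead of dropping it.
  if (opts.strategy) importArgs.push("--strategy", opts.strategy);
  return importArgs;
}

const args = buildImportArgs("/path/to/myrepo", { noEmbed: true, strategy: "code" }, 4);
```

Pulling the argument construction into one helper would also let the dry-run branch reuse it, so the two branches cannot drift apart again.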

Environment

  • gbrain v0.30.1 (commit dffb607)
  • Engine: postgres 16.13 (pgvector + pg_trgm)
  • macOS 26.3.1 (build 25D771280a), bun 1.3.10
  • Use case: Dividia NVR repo (~7000 files: Java + Python + C + TypeScript + bash). Goal was enabling mcp__gbrain__code_def for cross-codebase symbol lookup.

Workarounds tried

  1. gbrain sources add ... --strategy code — no --strategy flag on sources add.
  2. Setting config.strategy = 'code' directly in postgres — ignored by runImport.
  3. gbrain sync --source <id> --strategy code --full --no-pull — same bug.
  4. gbrain reindex-code --yes --source <id> — needs pre-existing code pages.

Currently no clean workaround short of writing a custom TS script that imports importCodeFile from gbrain internals and walks the repo manually.
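For anyone who needs that stopgap, a rough sketch of the walking half is below. Everything in it is an assumption made for illustration: the extension list, the skipped directories, and the importOne callback, which stands in for whatever wraps importCodeFile (its real signature is not shown in this issue).

```typescript
import { readdirSync } from "node:fs";
import { extname, join } from "node:path";

// Assumed lists, for illustration only.
const CODE_EXTENSIONS = new Set([".ts", ".py", ".java", ".c", ".sh"]);
const SKIP_DIRS = new Set([".git", "node_modules"]);

interface DirEntry {
  name: string;
  isDir: boolean;
}
type ListDir = (dir: string) => DirEntry[];

// Default directory lister backed by the real filesystem.
const listFromDisk: ListDir = (dir) =>
  readdirSync(dir, { withFileTypes: true }).map((e) => ({
    name: e.name,
    isDir: e.isDirectory(),
  }));

// Recursively yields paths with a code extension under `root`.
// Anything that is not a directory is treated as a candidate file.
function* walkCodeFiles(root: string, listDir: ListDir = listFromDisk): Generator<string> {
  for (const entry of listDir(root)) {
    const fullPath = join(root, entry.name);
    if (entry.isDir) {
      if (!SKIP_DIRS.has(entry.name)) yield* walkCodeFiles(fullPath, listDir);
    } else if (CODE_EXTENSIONS.has(extname(entry.name).toLowerCase())) {
      yield fullPath;
    }
  }
}

// `importOne` is a placeholder for a wrapper around gbrain's importCodeFile.
async function importAll(
  root: string,
  importOne: (path: string) => Promise<void>,
): Promise<number> {
  let count = 0;
  for (const file of walkCodeFiles(root)) {
    await importOne(file);
    count++;
  }
  return count;
}
```

The directory lister is injectable so the walk can be checked against an in-memory tree before pointing it at a real repo.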
