bug: gbrain sync --strategy code silently imports zero code files on first-sync (full path) #744

@lanceretter

Summary

gbrain sync --strategy code --source <id> reports "X pages imported" and advances last_commit on the named source, but zero code files are actually persisted to the database. The flag is silently dropped on the full-sync code path. Reproduces consistently across multiple repos and multiple syncs.

Reproducer

# Fresh source, fresh sync
gbrain sources add gstack-code-myrepo --path /path/to/repo --federated
gbrain sync --strategy code --source gstack-code-myrepo
# → "216 pages imported, 1323 chunks created" (looks great)

# Verify
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM pages WHERE source_id='gstack-code-myrepo' AND page_kind='code';"
# → 0
psql "$DATABASE_URL" -c "SELECT COUNT(*) FROM pages WHERE page_kind='code';"
# → 0 across the entire brain
gbrain code-def someSymbol
# → { "count": 0, "results": [] }

I have three sources on the same brain (gstack-code-rv-helper, gstack-code-trashtastic-helix, gstack-code-conquest-lpr-ccfd94) all synced via this command path over the past 3 days. All three show last_commit and last_sync_at set, yet pages WHERE source_id=<X> AND page_kind='code' returns 0 for every one of them. pages WHERE page_kind='code' returns 0 globally.

Root cause

gbrain sync --strategy code --source X on a source with last_commit IS NULL:

  1. Hits the full-sync path: commands/sync.ts:performFullSync.
  2. Calls runImport(engine, [repoPath], { commit: headCommit }) at commands/sync.ts:892. Strategy is not plumbed through.
  3. runImport in commands/import.ts:31 calls collectMarkdownFiles(dir) and logs "Found ${allFiles.length} markdown files". As the function name says, it walks markdown files only.
  4. Result: 216 markdown files walked → all match existing default-source slugs by content hash → all marked skipped (unchanged). Zero code files ever touch the importCodeFile path.
  5. last_commit gets set as if synced. Next incremental run via the diff path (sync.ts:467) finds no diff and does nothing. Code files remain forever absent.
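
The steps above can be modelled in a few lines. This is a simplified sketch of the control flow as I understand it, not the actual gbrain code: collectMarkdownModel and fullSyncModel are hypothetical stand-ins for collectMarkdownFiles and performFullSync, and the file list is made up for illustration.

```typescript
// Simplified model of the first-sync control flow described above.
// All names and shapes are illustrative assumptions, not gbrain's real code.
type Strategy = "markdown" | "code";

interface SyncOpts {
  strategy: Strategy;
}

// Stand-in for the markdown-only walker: it has no notion of strategy.
function collectMarkdownModel(files: string[]): string[] {
  return files.filter((f) => f.endsWith(".md"));
}

// Stand-in for the full-sync path: the strategy is accepted but never forwarded,
// and the checkpoint is returned regardless of what was selected.
function fullSyncModel(files: string[], _syncOpts: SyncOpts, headCommit: string) {
  const selected = collectMarkdownModel(files);
  return { selected, checkpoint: headCommit };
}

const repoFiles = ["docs/guide.md", "src/index.ts", "src/util.py"];
const modelResult = fullSyncModel(repoFiles, { strategy: "code" }, "6f462d2f");

console.log(modelResult.selected);   // only docs/guide.md is selected
console.log(modelResult.checkpoint); // the checkpoint is set anyway
```

Even with strategy set to "code", the model never looks at the .ts or .py files, yet it still hands back a checkpoint, which is the behaviour the reproducer shows.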

The dry-run path at commands/sync.ts:861 has the same shape: it calls isSyncable(rel) without the strategy arg. The non-dry-run path is the bigger break, though, because it actually writes the misleading last_commit.

The incremental path at line 471 does thread strategy via syncOpts. It's only the full-sync (first-sync) path that's broken.

gbrain reindex-code is not a workaround: it walks pages WHERE p.type='code', of which there are zero post-sync, so it reports "No code pages to reindex" and exits clean.

Verbose-mode evidence

$ gbrain sync --strategy code --source gstack-code-conquest-lpr-ccfd94 --verbose
Found 216 markdown files                          ← walker is markdown-only
[import.files] 216/216 (100%) imported=0 skipped=216 errors=0
Import complete (2.2s):
  0 pages imported
  216 pages skipped (216 unchanged, 0 errors)     ← all 216 match default-source slugs
  0 chunks created
First sync complete. Checkpoint: 6f462d2f         ← last_commit advances anyway

(File counts: this repo has 223 .md and 1109 .ts/.tsx/.py/.js. The "216 markdown files" matches the .md count after skipFiles exclusion of README.md/schema.md/etc., not the code count.)
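
For anyone who wants to run the same comparison on their own repo, here is a minimal sketch that counts files by extension. It is a hypothetical helper written for this issue (countByExtension is not part of gbrain), and it knows nothing about gbrain's skipFiles rules, so expect the .md count to be slightly higher than what the walker reports.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Count regular files under `root`, grouped by lowercase extension.
// Skips .git and node_modules; files without an extension go under "(none)".
function countByExtension(root: string): Map<string, number> {
  const counts = new Map<string, number>();
  const stack: string[] = [root];
  while (stack.length > 0) {
    const dir = stack.pop() as string;
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.name === ".git" || entry.name === "node_modules") continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        stack.push(full);
      } else if (entry.isFile()) {
        const ext = path.extname(entry.name).toLowerCase() || "(none)";
        counts.set(ext, (counts.get(ext) ?? 0) + 1);
      }
    }
  }
  return counts;
}
```

Calling countByExtension("/path/to/repo") and comparing the ".md" entry against the "Found N markdown files" line shows quickly whether the walker's total tracks markdown or code.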

Suggested fix locations

The cleanest shape is probably to plumb strategy through runImport so the walker can branch:

  • commands/import.ts — accept strategy in opts; replace collectMarkdownFiles with a strategy-aware walker (or call a new collectCodeFiles for strategy='code').
  • commands/sync.ts:892 — pass opts.strategy into runImport.
  • commands/sync.ts:861 — pass { strategy: opts.strategy } into isSyncable (dry-run cosmetic fix).
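
A rough sketch of what a strategy-aware walker could look like, to make the first option concrete. Everything here is an assumption for illustration: collectFilesForStrategy and matchesStrategy are hypothetical names, the extension list is just the file types mentioned in this issue, and the real walker would also need to honour skipFiles and whatever isCodeFilePath already does.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Strategy = "markdown" | "code";

// Assumed extension list, taken from the file types mentioned in this issue.
const CODE_EXTENSIONS = new Set([".ts", ".tsx", ".js", ".py"]);

// Decide whether a file belongs to the requested strategy.
function matchesStrategy(file: string, strategy: Strategy): boolean {
  const ext = path.extname(file).toLowerCase();
  return strategy === "code" ? CODE_EXTENSIONS.has(ext) : ext === ".md";
}

// Recursively collect files for one strategy, skipping dot-entries and node_modules.
function collectFilesForStrategy(root: string, strategy: Strategy): string[] {
  const out: string[] = [];
  const walk = (dir: string): void => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.name.startsWith(".") || entry.name === "node_modules") continue;
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) walk(full);
      else if (entry.isFile() && matchesStrategy(entry.name, strategy)) out.push(full);
    }
  };
  walk(root);
  return out.sort(); // sorted for stable output
}
```

With something of this shape, runImport would only need to pick the walker from opts.strategy, and the markdown default would stay unchanged for existing callers.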

Alternatively keep runImport markdown-only and split performFullSync so strategy='code' routes to a new performFullCodeSync that walks code files via isCodeFilePath and calls importCodeFile directly.
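
The split could look roughly like the sketch below. It is written against injected dependencies so it stands alone. The FullSyncDeps shape, performFullSyncSketch, and the signatures given to isCodeFilePath and importCodeFile are my assumptions, not the real gbrain signatures.

```typescript
type Strategy = "markdown" | "code";

// Hypothetical dependency bundle; the real functions may take different arguments.
interface FullSyncDeps {
  listFiles: () => string[];
  isCodeFilePath: (file: string) => boolean;
  importCodeFile: (file: string) => Promise<void>;
  importMarkdown: (files: string[]) => Promise<number>;
}

interface FullSyncResult {
  imported: number;
  kind: Strategy;
}

// Route the first sync by strategy instead of always taking the markdown path.
async function performFullSyncSketch(
  strategy: Strategy,
  deps: FullSyncDeps,
): Promise<FullSyncResult> {
  if (strategy === "code") {
    const codeFiles = deps.listFiles().filter((f) => deps.isCodeFilePath(f));
    for (const file of codeFiles) {
      await deps.importCodeFile(file); // one file at a time, in listing order
    }
    return { imported: codeFiles.length, kind: "code" };
  }
  const markdownFiles = deps.listFiles().filter((f) => f.endsWith(".md"));
  const imported = await deps.importMarkdown(markdownFiles);
  return { imported, kind: "markdown" };
}
```

The upside of this shape is that runImport stays untouched. The cost is a second walker to maintain next to the markdown one.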

Either fix is bigger than a one-liner — happy to send a PR if you want, but flagging as an issue first since the right shape is your call.

Companion to recent PRs

This was discovered during the v0.26.6 → v0.30.0 upgrade dig in #740 + #741. The same upgrade exercise revealed three bugs that are now fixed (deps, UNSAFE_TRANSACTION, missing connect) plus the bootstrap gap. This is the fourth real bug from that exercise. It does not block the upgrade, since markdown semantic search works fine, but gbrain code-def, code-refs, and code-callers are unusable for any user who has gotten this far without realizing their code source has zero pages.

🤖 Reported by Claude Code
