
fix: code-source first-sync (#744) + class-method code-def #749

Open
lanceretter wants to merge 2 commits into garrytan:master from lanceretter:fix/sync-strategy-and-code-def-class-methods


lanceretter commented May 8, 2026

Summary

Two atomic commits that together make `gbrain sync --strategy code` actually work end-to-end on first sync and make `gbrain code-def` resolve class methods that the existing tree-sitter chunker already extracts.

Discovered while upgrading a real PlanetScale Postgres brain from v0.26.6 to v0.30.0 (companion to #740, #741, #744).

1. commands/sync.ts + commands/import.ts — strategy threading (closes #744)

Fixes issue #744, which I reproduced directly. `performFullSync` called `runImport(repoPath)` without the strategy flag, so `runImport` walked markdown only via `collectMarkdownFiles`. The first sync of a fresh `gstack-code-*` source reported "X pages imported" but persisted zero code files. Three different repos on the same brain showed the identical broken state.
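In simplified form, the bug is an option dropped on the way from `performFullSync` to `runImport`. The sketch below assumes both functions take an options object; the `describeWalk` stand-in and the injectable `walk` parameter are hypothetical, added only so the effect is observable, and the real signatures in commands/sync.ts and commands/import.ts carry more fields.

```typescript
type Strategy = "markdown" | "code" | "auto";
type Walker = (dir: string, strategy: Strategy) => string[];

interface ImportOpts {
  strategy?: Strategy; // pre-strategy callers omit this
}
interface SyncOpts {
  strategy?: Strategy;
}

// Hypothetical stand-in for the real walker: reports what it was asked to collect.
const describeWalk: Walker = (dir, strategy) => [`${strategy}:${dir}`];

// Defaulting to 'markdown' keeps existing runImport(repoPath) call sites unchanged.
export function runImport(repoPath: string, opts: ImportOpts = {}, walk: Walker = describeWalk): string[] {
  const strategy = opts.strategy ?? "markdown";
  return walk(repoPath, strategy);
}

// Before the fix: opts.strategy never reached runImport, so it fell back to markdown.
export function performFullSyncBefore(repoPath: string, _opts: SyncOpts = {}): string[] {
  return runImport(repoPath);
}

// After the fix: the strategy is threaded through on the full-sync path.
export function performFullSync(repoPath: string, opts: SyncOpts = {}): string[] {
  return runImport(repoPath, { strategy: opts.strategy });
}
```

With this sketch, `performFullSyncBefore("/repo", { strategy: "code" })` yields `["markdown:/repo"]`, while `performFullSync("/repo", { strategy: "code" })` yields `["code:/repo"]`.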

Three small changes, all additive:

  • Extract a shared collectFiles(dir, strategy) walker in commands/import.ts that branches on strategy. Add collectCodeFiles and collectFilesAuto wrappers. collectMarkdownFiles keeps its current behavior + signature for backward compat (no caller changes needed).
  • runImport accepts strategy in opts and picks the right walker. Default 'markdown' preserves pre-strategy callers.
  • performFullSync threads opts.strategy through to runImport (and into the dry-run filter at line 861, which dropped strategy too).
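A minimal sketch of the strategy-branching walker described above, assuming a synchronous recursive walk. The extension lists, skipping of dot-entries, and treating `'auto'` as "markdown or code" are illustrative assumptions, not the code in commands/import.ts.

```typescript
import { readdirSync } from "node:fs";
import { extname, join } from "node:path";

// Hypothetical strategy union mirroring the PR description.
export type Strategy = "markdown" | "code" | "auto";

// Illustrative extension sets (assumption); the real lists may differ.
const MARKDOWN_EXTS = [".md", ".mdx"];
const CODE_EXTS = [".ts", ".tsx", ".js", ".jsx", ".py", ".go", ".rs"];

const EXTS_BY_STRATEGY: Record<Strategy, ReadonlySet<string>> = {
  markdown: new Set(MARKDOWN_EXTS),
  code: new Set(CODE_EXTS),
  auto: new Set([...MARKDOWN_EXTS, ...CODE_EXTS]),
};

// One shared recursive walker; the strategy only decides which files are kept.
export function collectFiles(dir: string, strategy: Strategy): string[] {
  const exts = EXTS_BY_STRATEGY[strategy];
  const out: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    if (entry.name.startsWith(".")) continue; // skip .git and other dot-entries
    const full = join(dir, entry.name);
    if (entry.isDirectory()) {
      out.push(...collectFiles(full, strategy));
    } else if (entry.isFile() && exts.has(extname(entry.name).toLowerCase())) {
      out.push(full);
    }
  }
  return out.sort();
}

// Thin wrappers: existing collectMarkdownFiles callers keep the same signature.
export const collectMarkdownFiles = (dir: string): string[] => collectFiles(dir, "markdown");
export const collectCodeFiles = (dir: string): string[] => collectFiles(dir, "code");
export const collectFilesAuto = (dir: string): string[] => collectFiles(dir, "auto");
```

Keeping a single walker means the directory-traversal rules (and any future ignore logic) live in one place, and only the keep-this-file predicate varies by strategy.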

Verified end-to-end across three repos:

| Repo | Code files imported | Status |
|---|---|---|
| conquest-lpr (a 5-app monorepo) | 858 | |
| rv-helper (Expo + Supabase) | 360 | |
| trashtastic-helix | partial | 2 legitimate failures (5.6MB minified `dist/` JS; Supabase `types.ts` with NUL bytes), both correctly refused, not bugs |

2. commands/code-def.ts — extend DEF_TYPES for class members

gbrain code-def <symbol> returned 0 results for any class method even when chunks existed with the correct symbol_name, because the WHERE filter only allowed:

['function', 'class', 'interface', 'type', 'enum', 'struct', 'trait', 'module', 'contract', 'export statement']

The tree-sitter chunker emits more granular symbol_type values than that. Real distribution on my brain after a full sync:

```
function                  | 10897
declaration               |  5926
method definition         |  3201   ← was filtered out
export statement          |   818
import                    |   375
class                     |   343
interface                 |   215
field definition          |   124   ← was filtered out
variable assignment       |   108
struct specifier          |    60
public field definition   |    57   ← was filtered out
type                      |    40
method signature          |    14   ← was filtered out
```

Concretely: Database.calculateDeviceStatuses (a TypeScript class method) returned {count: 0} from code-def even though the chunk existed at apps/worker/src/db.ts with the right symbol_name and symbol_type='method definition'.
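As a quick cross-check, the distribution above can be compared against the pre-fix filter list. This throwaway sketch hard-codes the numbers reported in this PR rather than querying the brain; `excludedTypes` is a hypothetical helper.

```typescript
// symbol_type counts copied from the distribution above.
const observed: Record<string, number> = {
  "function": 10897,
  "declaration": 5926,
  "method definition": 3201,
  "export statement": 818,
  "import": 375,
  "class": 343,
  "interface": 215,
  "field definition": 124,
  "variable assignment": 108,
  "struct specifier": 60,
  "public field definition": 57,
  "type": 40,
  "method signature": 14,
};

// The WHERE filter list before this PR, as quoted above.
const OLD_DEF_TYPES = new Set([
  "function", "class", "interface", "type", "enum", "struct",
  "trait", "module", "contract", "export statement",
]);

// Every observed type the filter excludes, largest count first.
export function excludedTypes(counts: Record<string, number>, allowed: Set<string>): [string, number][] {
  return Object.entries(counts)
    .filter(([type]) => !allowed.has(type))
    .sort((a, b) => b[1] - a[1]);
}

const dropped = excludedTypes(observed, OLD_DEF_TYPES);
for (const [type, n] of dropped) console.log(`${type.padEnd(24)} ${n}`);
```

It lists the eight observed types the old filter excluded, led by `declaration` and `method definition`. This PR adds the four class-member types among them; `declaration` stays out on purpose, and `import`, `variable assignment`, and `struct specifier` are not part of this change.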

Fix: add four multi-word symbol_types representing class members. Additive only:

```diff
 const DEF_TYPES = [
   'function', 'class', 'interface', 'type', 'enum', 'struct', 'trait', 'module', 'contract',
   'export statement',
+  'method definition', 'method signature', 'field definition', 'public field definition',
 ];
```

'declaration' is left out (too generic — covers imports, type aliases, plain bindings).
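If the list reaches Postgres as a single array parameter, extending DEF_TYPES never touches the SQL text. A sketch of that shape, where the `content_chunks` table and the `buildCodeDefQuery` helper are hypothetical and only `symbol_name`, `symbol_type`, and the type strings come from this PR:

```typescript
// DEF_TYPES after this PR; 'export statement' was already in the pre-fix filter.
export const DEF_TYPES = [
  "function", "class", "interface", "type", "enum", "struct", "trait", "module", "contract",
  "export statement",
  "method definition", "method signature", "field definition", "public field definition",
] as const;

export interface Query {
  text: string;
  values: unknown[];
}

// Hypothetical helper: the table name is illustrative, not gbrain's schema.
export function buildCodeDefQuery(symbol: string): Query {
  return {
    text:
      "SELECT symbol_name, symbol_type FROM content_chunks " +
      "WHERE symbol_name = $1 AND symbol_type = ANY($2::text[])",
    values: [symbol, [...DEF_TYPES]],
  };
}
```

The `{ text, values }` pair is the shape node-postgres's `client.query(text, values)` accepts, and node-postgres serializes a JS array parameter as a Postgres array, so `= ANY($2::text[])` matches any listed type.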

After: gbrain code-def calculateDeviceStatuses returns 3 hits (the real source plus a couple of compiled-JS build artifacts that are themselves a separate walker concern).

Test plan

  • Build: bun install && bun run build — passes.
  • Typecheck: bun run typecheck — passes.
  • Existing tests: bun test test/schema-bootstrap-coverage.test.ts — 5/5 pass (unchanged).
  • Smoke test: fresh gstack-code-* source on Postgres → gbrain sync --strategy code --source <id> actually imports code files; gbrain code-def <classMethod> returns hits.
  • CI on this PR — open to suggestions if there's a specific test you'd want for either fix.

Notes on follow-up bugs surfaced during this dig

Not in this PR; flagging for future:

  • The walker doesn't respect .gitignore: tmp/, dist/, and other build artifacts get indexed alongside source. Worth a .gitignore-aware skip-list or an explicit --ignore glob.
  • --source <id> is recorded on sources.last_commit/last_sync_at but pages still land under source_id='default' rather than the named source. That's why gbrain sources list shows page_count=0 on the new sources even after a successful sync. Reasonable next dig — happy to PR if you want.
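For the first follow-up, a minimal sketch of an explicit skip-list a walker could consult before indexing a file. It is deliberately not .gitignore-aware (no globs, negation, or nested ignore files); the directory names and the 1 MB cap are illustrative assumptions, and the size and NUL-byte guards only mirror the kinds of files trashtastic-helix already refused.

```typescript
// Illustrative defaults (assumption): directories that usually hold build output or vendored code.
const SKIP_DIRS = new Set(["node_modules", "dist", "build", "tmp", ".next", "coverage"]);
// Illustrative size cap (assumption); gbrain's real limit may differ.
const MAX_BYTES = 1_000_000;

// True if any directory segment of the relative path is on the skip-list.
export function inSkippedDir(relPath: string): boolean {
  return relPath
    .split(/[\\/]/)
    .slice(0, -1) // ignore the file name itself
    .some((seg) => SKIP_DIRS.has(seg));
}

// Cheap guards: skipped directories, oversized files (often minified bundles), binary-looking content.
export function shouldIndex(relPath: string, content: Uint8Array): boolean {
  if (inSkippedDir(relPath)) return false;
  if (content.byteLength > MAX_BYTES) return false;
  return !content.includes(0); // a NUL byte usually means binary data
}
```

Checking the path first means skipped directories never have their contents read; the content guards then catch oversized or binary files that live in ordinary source directories.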

🤖 Generated with Claude Code



…#744)

`gbrain sync --strategy code --source <id>` reports "X pages imported"
on first sync but persists ZERO code files. The full-sync path
(`performFullSync` at commands/sync.ts:847) called `runImport(repoPath)`
without the strategy flag, and `runImport` walked markdown only via
`collectMarkdownFiles`. Result: code files never reached the importer
on first sync; `last_commit` got bumped as if synced; subsequent
incremental runs found no diff and did nothing. Reproduced across
three different repos on the same brain.

Three small changes:

1. commands/import.ts — extract a shared `collectFiles(dir, strategy)`
   walker that branches on strategy. Add `collectCodeFiles` and
   `collectFilesAuto` wrappers. `collectMarkdownFiles` keeps its
   current behavior + signature for backward compat.

2. commands/import.ts:runImport — accept `strategy` in opts and pick
   the right walker. Default 'markdown' preserves pre-strategy callers.

3. commands/sync.ts — thread `opts.strategy` through to `runImport`
   on the full-sync path; also fix the dry-run filter at
   commands/sync.ts:861 which dropped strategy too.

Verified locally on a real PlanetScale Postgres brain across three
repos: conquest-lpr (858 code files imported), rv-helper (360),
trashtastic-helix (partial — 2 legitimate failures: a 5.6MB minified
JS in dist/ and a Supabase types.ts with NUL bytes). Code files now
populate `pages WHERE page_kind='code'`; `gbrain code-def` returns
real hits (modulo a separately tracked issue with class-method symbol
types; see follow-up commit).

`gbrain code-def <symbol>` returned 0 results for any class method
even when chunks existed with the right `symbol_name`, because the
WHERE filter (`commands/code-def.ts:56`) only allowed
`['function', 'class', 'interface', 'type', 'enum', 'struct',
'trait', 'module', 'contract', 'export statement']`.

The tree-sitter chunker emits more granular symbol_type values than
that list covers. On a real Postgres brain after a full code sync:

  function                  | 10897
  declaration               |  5926
  method definition         |  3201   ← previously filtered out
  export statement          |   818
  import                    |   375
  class                     |   343
  interface                 |   215
  field definition          |   124   ← previously filtered out
  variable assignment       |   108
  struct specifier          |    60
  public field definition   |    57   ← previously filtered out
  type                      |    40
  method signature          |    14   ← previously filtered out

Concretely: `Database.calculateDeviceStatuses` (a class method)
returned `{count: 0}` even though the chunk existed at
`apps/worker/src/db.ts` with symbol_name='calculateDeviceStatuses'
and symbol_type='method definition'.

Fix: extend DEF_TYPES with the four multi-word symbol_types that
represent class members:
  - method definition         (TS/JS class methods, Python methods)
  - method signature          (TS interface methods)
  - field definition          (class fields)
  - public field definition   (TS public fields)

This is additive only. Existing callers continue to work; class
members now resolve. `declaration` stays out (too generic — covers
imports, type aliases, and bindings).

Verified locally:
  $ gbrain code-def calculateDeviceStatuses
  → 3 hits (real source `apps-worker-src-db-ts` + 2 build artifacts)
  $ gbrain code-def runFifteenMinuteTick
  → still works (export statement, was already covered)

Development

Successfully merging this pull request may close these issues.

bug: gbrain sync --strategy code silently imports zero code files on first-sync (full path)
