Code chunks land in DB with NULL language / symbol_name / symbol_type across all languages

## Symptoms

After `gbrain sync --strategy code` walks a multi-language repo (1882 source files: Java + Python + C + TypeScript + JavaScript + bash + CSS + HTML + YAML + JSON, etc.), the chunks land in `content_chunks` with the chunk text correctly tagged in the body (e.g., `[Java] dview/.../DstreamView.java:5-5 import java.awt.C`), but the `language`, `symbol_name`, `symbol_type`, and `symbol_name_qualified` columns are NULL on essentially every chunk.

`code-def <symbol>` returns 0 hits as a result — the query filters on `symbol_name` which is unpopulated. `code-refs` still works because it's a text-scan over `chunk_text`.

## Concrete numbers from a freshly-synced nvr repo (~31k chunks)

State of `content_chunks` after `gbrain sync --source <id> --strategy code` finished successfully, the autopilot daemon ran a few maintenance cycles, and I ran a manual `gbrain extract all`:

| Source language (inferred from slug) | chunks | with language | with symbol_name | with symbol_type |
|--------------------------------------|--------|---------------|------------------|------------------|
| java | 16,663 | 0 | 0 | 0 |
| python | 10,541 | 0 | 0 | 0 |
| c | 3,024 | 0 | 0 | 0 |
| javascript | 692 | 0 | 0 | 0 |
| c-header | 461 | 0 | 0 | 0 |
| bash | 234 | 0 | 0 | 0 |
| json | 117 | 104 | 0 | 104 |
| html | 59 | 0 | 0 | 0 |
| yaml | 52 | 0 | 0 | 0 |
| css | 50 | 0 | 0 | 0 |
| cpp | 14 | 0 | 0 | 0 |
| typescript | 1 | 0 | 0 | 0 |

JSON is the only language with any consistent metadata: 104 of its 117 chunks have `language` and `symbol_type` set, though `symbol_name` is still empty. Every other language has nothing populated.

The chunker code path looks plausible:
- `src/core/chunkers/code.ts:542-579` calls `extractSymbolName(node)` and `normalizeSymbolType(node.type)` to populate metadata.
- `src/core/import-file.ts:127, 470` reads `c.metadata.symbolName` and writes to the `symbol_name` column.
- The chunkable-node Sets at `src/core/chunkers/code.ts:296-300` for Java look right (`method_declaration`, `class_declaration`, `interface_declaration`, etc.).

So one of the following seems to be happening:
- the chunker computes metadata but it is not persisted (some pipeline disconnect);
- the chunker isn't actually invoking the language grammar for code files (silent fallback to a non-AST chunker);
- a downstream process (extraction, autopilot maintenance) rewrites chunks and strips metadata.

## Reproduction

1. Register a code source pointing at a real multi-language repo:
   ```bash
   gbrain sources add my-code --path /path/to/repo --federated
   ```
2. Set `strategy=code` in the source's stored config (until #767's PR lands, this is needed to make sync walk code files):
   ```sql
   UPDATE sources SET config = '{"federated":true,"strategy":"code"}' WHERE id='my-code';
   ```
3. First sync:
   ```bash
   cd /path/to/repo
   gbrain sync --source my-code --strategy code --no-embed
   ```
4. Verify chunks landed:
   ```sql
   SELECT COUNT(*) FROM content_chunks WHERE chunk_text LIKE '[Java] %';
   -- many
   SELECT COUNT(*) FROM content_chunks WHERE language = 'java' OR symbol_name IS NOT NULL;
   -- zero or near-zero
   ```
5. Confirm `code-def` is empty for known symbols:
   ```bash
   gbrain code-def <a-class-name-from-the-repo>
   # {"symbol": "...", "count": 0, "results": []}
   ```

## Suspected interaction with `gbrain extract all` / autopilot

A pre-extract snapshot showed `__init__: 325, os: 133, stdMsg: 102` populated in Python `symbol_name` (from `gbrain stats` with strategy filter). After running `gbrain extract all` plus a few autopilot maintenance cycles, those counts dropped to single digits or zero. Suspect the extract / autopilot path may be rewriting `content_chunks` rows (e.g., for backlink reconciliation or timeline extraction) without preserving the original symbol metadata.
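
To make the before/after comparison repeatable, something like the following could diff two snapshots of per-symbol counts. This is only a sketch: `SymbolCounts` and `droppedSymbols` are hypothetical names, not part of gbrain, and it assumes both snapshots are available as plain name-to-count maps (for example, built from the same `symbol_name` count query run before and after `gbrain extract all`).

```typescript
// Hypothetical helper, not part of gbrain: compares two snapshots of
// symbol_name counts and reports the symbols whose count went down.
type SymbolCounts = Record<string, number>;

interface SymbolDrop {
  symbol: string;
  before: number;
  after: number;
}

function droppedSymbols(before: SymbolCounts, after: SymbolCounts): SymbolDrop[] {
  return Object.entries(before)
    // A symbol missing from the later snapshot counts as 0.
    .map(([symbol, count]) => ({ symbol, before: count, after: after[symbol] ?? 0 }))
    .filter((row) => row.after < row.before)
    // Largest absolute drop first.
    .sort((a, b) => (b.before - b.after) - (a.before - a.after));
}
```

Feeding it the pre-extract snapshot above as `before` and a post-extract snapshot as `after` would list exactly which symbols lost rows, which should help tell a wholesale rewrite apart from a partial one.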

## Environment

- gbrain v0.30.1 (`dffb607`) with PR #768 applied locally (the strategy fix; doesn't touch chunkers or import-file)
- Engine: Postgres 16.13 (pgvector + pg_trgm) via Docker
- bun 1.3.10
- macOS 26.3.1 (build 25D771280a)

## Why I'm filing as an observation rather than a fix

Root-causing this requires instrumenting the chunker output before DB write to see whether `c.metadata.symbolName` is populated on the way in (chunker bug vs persistence bug) and tracing the extract / autopilot paths to see if either rewrites chunks without metadata. The chunker pipeline is yours; faster for you to triage than for me to debug-explore. Happy to run any diagnostic queries or share more data.
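
For the first half of that (chunker bug vs persistence bug), a minimal sketch of the kind of instrumentation I have in mind is below. Only `metadata.symbolName` comes from the code paths cited above; the `ChunkLike` shape, the other field names, and `summarizeMetadata` itself are assumptions for illustration and may not match the real chunk type.

```typescript
// Hypothetical chunk shape. Only `metadata.symbolName` is taken from the
// issue text; `language` and `symbolType` are assumed field names.
interface ChunkLike {
  text: string;
  metadata?: {
    language?: string | null;
    symbolName?: string | null;
    symbolType?: string | null;
  };
}

interface MetadataCoverage {
  total: number;
  withLanguage: number;
  withSymbolName: number;
  withSymbolType: number;
}

// Counts how many chunks carry each metadata field, treating null,
// undefined, and the empty string as "not populated".
function summarizeMetadata(chunks: ChunkLike[]): MetadataCoverage {
  const has = (v: string | null | undefined): boolean => v != null && v !== "";
  const coverage: MetadataCoverage = {
    total: 0,
    withLanguage: 0,
    withSymbolName: 0,
    withSymbolType: 0,
  };
  for (const c of chunks) {
    coverage.total += 1;
    if (has(c.metadata?.language)) coverage.withLanguage += 1;
    if (has(c.metadata?.symbolName)) coverage.withSymbolName += 1;
    if (has(c.metadata?.symbolType)) coverage.withSymbolType += 1;
  }
  return coverage;
}
```

Logging this per file just before the DB write would separate the cases: non-zero counts here with NULL columns afterwards points at persistence or a later rewrite, while zeros here point at the chunker itself.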
