Conversation JSONL bloats ~10× when chat uploads include binary files #54

@mgoldsborough

Description

Symptom

Conversations that include chat-uploaded files persist to JSONL at roughly 10× the size they should. A single 100 KB image balloons its conversation file by ~1 MB.

Evidence: conv_ce7995f740af44d4.jsonl is 8.8 MB and mostly binary-as-dict serializations of uploaded images/PDFs.

Root cause

Uploaded file bytes reach runtime.chat() as Buffer instances on the contentParts array of the user-message event. The conversation event store calls JSON.stringify on the event as-is, and the binary payload serializes to an object with one numeric key per byte:

```json
{"type":"image","image":{"0":137,"1":80,"2":78,"3":71, ...}, "mimeType":"image/png"}
```

That is roughly 10× overhead vs. the raw bytes, because each byte produces `"NN":NNN,` (6 or more characters, growing with the index digits) instead of 1 byte.

Cost framing

  • Disk/IOPS today — conversation files grow faster than they should.
  • LLM tokens on reload — if the full conversation is ever rehydrated into a prompt (e.g., resume, export, analytics), the bloated payload burns tokens. Even if today's reload path doesn't feed these parts back to the model, any future path that does will pay the inflated cost.
  • Bandwidth on reads — frontends streaming conversation events pay the bloat.

Fix

Strip binary-bearing content parts before persist; rehydrate from the workspace file store on reload via the `fileRefs` that are already on the user-message metadata.

  • The bytes are already stored durably at `/workspaces//files/` (after the file-store unification in PR #52, "Unify chat-upload and files__* tool file stores (bug 4)").
  • The conversation event only needs a reference by file id — `fileRefs` is the existing field.
  • On reload, only rehydrate if a caller actually needs the bytes. Most replay paths (history to the LLM) can work off the extracted-text content part, which is already persisted separately and is cheap.
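The strip-before-persist step above can be sketched roughly as follows; the `ContentPart`/`PersistedPart` shapes and the `fileId` field are hypothetical stand-ins for the real types in `src/runtime/runtime.ts`:

```typescript
// Hypothetical shapes -- the real runtime types will differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; image: Uint8Array; mimeType: string; fileId?: string };

type PersistedPart =
  | { type: "text"; text: string }
  | { type: "image-ref"; fileId: string; mimeType: string };

// Replace binary-bearing parts with a reference into the workspace file
// store before the event is appended; the bytes never reach JSONL.
function stripBinaryParts(parts: ContentPart[]): PersistedPart[] {
  return parts.map((part): PersistedPart => {
    if (part.type === "image") {
      if (!part.fileId) {
        throw new Error("image part has no file id to rehydrate from");
      }
      return { type: "image-ref", fileId: part.fileId, mimeType: part.mimeType };
    }
    return part;
  });
}
```

Rehydration would be the inverse: a caller that genuinely needs bytes looks up `fileId` in the workspace file store; everyone else works off the extracted-text part.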

Files likely to touch

  • `src/runtime/runtime.ts` — around where the user message is assembled before the store append (see ~line 519 for the `userContent` construction from `request.contentParts`).
  • `src/conversation/event-sourced-store.ts` (and memory/jsonl siblings) — assert that the event being appended has no `Buffer`/`Uint8Array` payloads. Throw loudly if it does, so this regression can't silently come back.
  • New test in `test/unit/conversation/` — round-trip a user message with a `contentParts` image; assert the on-disk JSONL does not contain `"0":`, `"1":`, etc.
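The event-store assertion can be a simple deep scan; a sketch, with the function name and wiring assumed rather than taken from the codebase:

```typescript
// Walk an event before append and throw loudly if any binary payload
// survived stripping. Buffer is a Uint8Array subclass in Node, so the
// single instanceof check catches both.
function assertNoBinaryPayload(value: unknown, path = "event"): void {
  if (value instanceof Uint8Array) {
    throw new Error(`binary payload at ${path}; strip it before persisting`);
  }
  if (Array.isArray(value)) {
    value.forEach((item, i) => assertNoBinaryPayload(item, `${path}[${i}]`));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      assertNoBinaryPayload(child, `${path}.${key}`);
    }
  }
}
```

The unit test can then round-trip a user message through the store and grep the resulting JSONL for `"0":`-style keys, per the bullet above.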

Out of scope

  • Replay UX changes (e.g., lazy-loading file bytes in the web client). Track separately if we need it.
  • Re-writing existing bloated JSONL files. If desired, a one-shot cleanup can come later — needs decisions about whether to mutate historical events.

Labels

enhancement