## Symptom

Conversations that include chat-uploaded files persist to JSONL at roughly 10× the size they should. A single 100 KB image balloons its conversation file by ~1 MB.
Evidence: `conv_ce7995f740af44d4.jsonl` is 8.8 MB and mostly binary-as-dict serializations of uploaded images/PDFs.
## Root cause
Uploaded file bytes reach `runtime.chat()` as `Buffer` instances on the `contentParts` array of the user-message event. The conversation event store calls `JSON.stringify` on the event as-is; `Buffer` serializes to a one-key-per-byte dict shape:

```json
{"type":"image","image":{"0":137,"1":80,"2":78,"3":71, ...}, "mimeType":"image/png"}
```
That is ~10× overhead vs. the raw bytes, because each byte produces `"NN":NNN,` (anywhere from 6 to ~12 characters as the index and value grow) instead of 1 byte.
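To make the blowup concrete, here is a small standalone sketch (not from the codebase). One caveat: Node's `Buffer` proper defines `toJSON()` and serializes as `{"type":"Buffer","data":[...]}`; the index-keyed dict shape above is what you get from a plain `Uint8Array`, or a `Buffer` that lost its prototype across a clone/IPC boundary.

```typescript
// Standalone sketch: how a byte payload inflates under JSON.stringify.
// A plain Uint8Array stringifies index-by-index ({"0":171,"1":171,...}).
const bytes = new Uint8Array(1024).fill(0xab); // stand-in for 1 KB of image data

const part = { type: "image", image: bytes, mimeType: "image/png" };
const json = JSON.stringify(part);

console.log(json.slice(0, 48)); // {"type":"image","image":{"0":171,"1":171,...
console.log((json.length / bytes.length).toFixed(1)); // ~10x the raw byte count
```

The ratio lands around 10× for kilobyte-scale payloads and only gets worse as indices grow longer, matching the observed file sizes.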
## Cost framing
- Disk/IOPS today — conversation files grow faster than they should.
- LLM tokens on reload — if the full conversation is ever rehydrated into a prompt (e.g., resume, export, analytics), the bloated payload burns tokens. Even if today's reload path doesn't feed these parts back to the model, any future path that does will pay the inflated cost.
- Bandwidth on reads — frontends streaming conversation events pay the bloat.
## Fix
- Strip binary-bearing content parts before persisting; rehydrate from the workspace file store on reload via the `fileRefs` that are already on the user-message metadata.
- The conversation event only needs a reference by file id; `fileRefs` is the existing field.
- On reload, rehydrate only if a caller actually needs the bytes. Most replay paths (history to the LLM) can work off the extracted-text content part, which is already persisted separately and is cheap.
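A minimal sketch of the persist-side strip, with a hypothetical content-part type and function name (the real `contentParts` shape in `src/runtime/runtime.ts` may differ): drop the byte field and keep the `fileRef`-style reference.

```typescript
// Hypothetical content-part shape; the real one in runtime.ts may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; image?: Uint8Array; mimeType: string; fileRef?: string };

// Drop binary payloads before the event is appended. The bytes stay in the
// workspace file store, reachable again through fileRef on reload.
function stripBinaryParts(parts: ContentPart[]): ContentPart[] {
  return parts.map((part) => {
    if (part.type === "image" && part.image) {
      const { image, ...persistable } = part;
      return persistable;
    }
    return part;
  });
}

const stripped = stripBinaryParts([
  { type: "text", text: "what is in this picture?" },
  { type: "image", image: new Uint8Array([137, 80]), mimeType: "image/png", fileRef: "file_abc" },
]);
console.log(JSON.stringify(stripped[1]));
// {"type":"image","mimeType":"image/png","fileRef":"file_abc"}
```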
## Files likely to touch
- `src/runtime/runtime.ts` — around where the user message is assembled before the store append (see ~line 519 for the `userContent` construction from `request.contentParts`).
- `src/conversation/event-sourced-store.ts` (and its memory/JSONL siblings) — assert that the event being appended has no `Buffer`/`Uint8Array` payloads, and throw loudly if it does, so this regression can't silently come back.
- A new test in `test/unit/conversation/` — round-trip a user message with a `contentParts` image and assert that the on-disk JSONL does not contain `"0":`, `"1":`, etc.
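The store-side guard could be a small recursive walk (function name hypothetical). `Buffer` extends `Uint8Array` in Node, so a single `instanceof` check covers both.

```typescript
// Hypothetical guard for the event store: refuse to append any event that
// still carries raw bytes. Buffer extends Uint8Array, so this covers both.
function assertNoBinaryPayloads(value: unknown, path = "event"): void {
  if (value instanceof Uint8Array) {
    throw new Error(`binary payload at ${path}; strip it before persisting`);
  }
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      assertNoBinaryPayloads(child, `${path}.${key}`);
    }
  }
}

// A clean event passes; one with stray bytes throws with the offending path.
assertNoBinaryPayloads({ type: "user-message", contentParts: [{ type: "text", text: "hi" }] });
```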
## Out of scope
- Replay UX changes (e.g., lazy-loading file bytes in the web client). Track separately if we need it.
- Rewriting existing bloated JSONL files. If desired, a one-shot cleanup can come later; it needs a decision about whether to mutate historical events.
## References
- `BUGFIX_4_FILE_STORE_SPLIT.md`, Step 4 "Conversation JSONL bloat (bonus)"