feat(chat): summarize attachments + recordings knowledge base#431
feat(chat): summarize attachments + recordings knowledge base#431dishit-wednesday wants to merge 5 commits into
Conversation
Adds a map-reduce summarizer (transcriptSummarizer) for text too large for the context window: split into context-sized chunks, summarize each, then combine (recursing if needed) - so the whole transcript is read instead of truncated to the tail like contextCompaction does. A Summarize action on a document/transcript attachment chip streams progress (part i/N, combining) into an assistant message, then the final summary. MediaAttachment gains recordingId + transcript range metadata, and a one-shot chatAttachmentInbox lets the Pro recorder hand a transcript to the composer across the navigation boundary. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>
Summarizing an oversized transcript used to show a static "part i of N" counter for minutes and only stream on the final combine pass, and the Summarize button was clipped invisible inside the 60x60 attachment chip. - generateWithMaxTokens gains an optional onToken callback. - transcriptSummarizer streams each map part as it is written (new 'combining' phase marks the final pass), so text flows from part 1, not part 3. - useSummarizeAttachment renders the live work (parts stack while streaming, then the combined answer streams over the top) on a 50ms flush so the JS thread is not saturated by per-token deep store updates. - Document/transcript chips get a larger 168x76 layout with a filled, visible Summarize button and a spinner+label busy state. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>
…d project Thread per-chunk metadata (recordingId/startMs/eventTitle) through chunking, the SQLite store, and retrieval; add indexText() to index a raw transcript; seed a 'Recordings' project via ensureProject so recordings can be asked about. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>
Move the exact timestamp out of the system prompt (which busted llama.rn's prefix cache every turn) and onto the latest user message; keep only the stable date in the system prompt so the ~800-token system+tools prefix is reused. Add tool-loop diagnostic logging. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>
…tream fast On CPU-only low-RAM devices, reasoning/preamble burned the whole token budget before any summary streamed. Force the thinking channel off for internal generations (summarize, tool-selection), instruct the prompts to emit no preamble, strip any leaked control tokens, and cap each MAP chunk to ~1500 input tokens so each part prefills in under a minute and starts streaming sooner. Reduce/combine passes still use the full context budget. Co-Authored-By: Dishit Karia <hanmadishit74@gmail.com>
📝 WalkthroughWalkthroughAdds end-to-end transcript attachment summarization to the chat composer. A new ChangesTranscript Attachment Summarization
Sequence Diagram(s)sequenceDiagram
participant User
participant AttachmentPreview
participant useSummarizeAttachment
participant TranscriptSummarizerService
participant LLMService
User->>AttachmentPreview: tap Summarize
AttachmentPreview->>useSummarizeAttachment: handleSummarize(attachment)
useSummarizeAttachment->>useSummarizeAttachment: add user msg + assistant placeholder, set summarizingId
useSummarizeAttachment->>TranscriptSummarizerService: summarize(text, {onProgress, onToken})
loop map phase per chunk
TranscriptSummarizerService->>LLMService: generateWithMaxTokens(chunk, onToken)
LLMService-->>TranscriptSummarizerService: token deltas
TranscriptSummarizerService-->>useSummarizeAttachment: onToken → scheduleFlush → updateMessageContent
end
TranscriptSummarizerService->>LLMService: generateWithMaxTokens(combine pass, onToken)
TranscriptSummarizerService-->>useSummarizeAttachment: onProgress(done, summary)
useSummarizeAttachment->>useSummarizeAttachment: finalize assistant message, clear summarizingId
useSummarizeAttachment->>AttachmentPreview: removes attachment from input
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
PR Summary by Qodofeat(chat): map-reduce summarizer, RAG metadata, TTFT cache fix
AI Description
Diagram
High-Level Assessment
Files changed (15)
|
Code Review by Qodo
1. Attachment removed on failure
|
| const { summarizingId, handleSummarize } = useSummarizeAttachment(); | ||
| const onSummarizeAttachment = async (attachment: MediaAttachment) => { | ||
| await handleSummarize(attachment); | ||
| removeAttachment(attachment.id); | ||
| }; |
There was a problem hiding this comment.
1. Attachment removed on failure 🐞 Bug ≡ Correctness
ChatInput always calls removeAttachment() after awaiting handleSummarize(), but handleSummarize() can return early (busy/no text/no model) and also swallows errors, so attachments can be discarded even when no summary was produced. This causes silent data loss and prevents retrying summarization.
Agent Prompt
## Issue description
`ChatInput` removes an attachment unconditionally after `handleSummarize()`, but `handleSummarize()` can no-op (early return) or fail (caught internally). This can delete the user’s attachment without producing a summary.
## Issue Context
- `onSummarizeAttachment` always calls `removeAttachment(attachment.id)` after `await handleSummarize(attachment)`.
- `handleSummarize` returns early when already summarizing, when `textContent` is empty, or when there is no model/conversation; it also catches errors and does not rethrow.
## Fix Focus Areas
- src/components/ChatInput/index.tsx[107-111]
- src/components/ChatInput/useSummarizeAttachment.ts[29-41]
- src/components/ChatInput/useSummarizeAttachment.ts[78-120]
## Suggested fix
- Change `handleSummarize` to return a status (e.g., `Promise<boolean>`), where `true` means a summary was successfully generated and posted.
- Return `false` on early exits (busy/no text/no model).
- Either rethrow on failure or return `false` on failure.
- In `onSummarizeAttachment`, only remove the attachment when `handleSummarize` indicates success.
- Optionally surface an alert/toast when summarization can’t run (e.g., no model loaded).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
CI Feedback 🧐A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/components/ChatInput/index.tsx`:
- Around line 107-110: The attachment removal in onSummarizeAttachment is
unconditional, so update the summarize flow to only call removeAttachment after
a confirmed successful handleSummarize() result. Adjust
useSummarizeAttachment/handleSummarize to return an explicit success flag or
rethrow on failure, then have ChatInput decide removal based on that outcome so
failed or skipped summaries do not drop the original attachment.
In `@src/components/ChatInput/styles.ts`:
- Around line 58-69: The shared document filename style in documentName still
includes spacing from the old stacked layout, which now misaligns the text when
rendered inside documentNameRow. Split the style into separate variants in
styles.ts: keep the centered/stacked spacing only for the audio-only or vertical
layout, and remove the top margin from the inline filename style used with the
file icon. Update the relevant consumer(s) to use the appropriate style variant
so documentNameRow and the filename stay vertically aligned.
In `@src/services/generationToolLoop.ts`:
- Around line 519-537: The exact-time note is being regenerated on every
tool-loop pass in augmentSystemPromptForTools, which can change the latest user
message and move the cache boundary. Precompute the exact-time note once per
turn and thread it through the generation loop instead of calling
buildExactTimeNote() inside the prompt augmentation path; use the existing
augmentSystemPromptForTools flow and the exactTimeAppended logic to keep the
appended note stable across iterations.
In `@src/services/llm.ts`:
- Around line 296-308: `generateWithMaxTokens` only appends streamed
`data.token` values, so final completion payloads can be dropped for
tool-selection/summarization. Update the completion callback in
`LLMService.generateWithMaxTokens` to also read the final result fields already
used elsewhere, such as `completionResult.content` or `completionResult.text`,
and append that fallback when no token deltas are streamed. Keep the change
localized to the `safeCompletion`/`ctx.completion` flow so the return value
still comes from `fullResponse.trim()`.
In `@src/services/rag/database.ts`:
- Around line 22-23: `metadata` is being exposed as a serialized JSON string on
the read side, which breaks symmetry with `Chunk.metadata` and forces callers to
parse it themselves. Update the read-side types in the database service to keep
`metadata` structured like `Chunk.metadata`, and parse the stored JSON once
inside `getEmbeddingsByProject()` and `getChunksByProject()` before returning
results so both the main and fallback paths use the same object shape.
- Around line 65-70: The migration in ensureReady() is swallowing all errors
around the rag_chunks metadata ALTER TABLE step; update the try/catch so
Database.ensureReady() only ignores the expected duplicate-column case and
rethrows any other failure. Use the existing rag_chunks metadata migration block
to detect that specific error from this.db.executeSync('ALTER TABLE rag_chunks
ADD COLUMN metadata TEXT') and let unexpected migration errors propagate so
callers don’t proceed with a broken schema.
In `@src/services/transcriptSummarizer.ts`:
- Around line 114-122: The combine logic in transcriptSummarizer’s budget
calculations is reusing the chunk budget for the final pass, which can let
inputs through that still exceed the combine reservation. Add a separate
final-combine input budget in the same budgeting section near
ctxLength/inputBudgetTokens/chunkCharBudget, and use the FINAL_SUMMARY_TOKENS
reservation when deciding whether the merged partials are small enough for the
last combine. Update the reduce/combine threshold checks in the summarization
flow so the final pass uses its own budget instead of chunkCharBudget.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 3407e1dd-1d10-41a0-bc4d-7319e1642cda
📒 Files selected for processing (15)
src/components/ChatInput/Attachments.tsxsrc/components/ChatInput/index.tsxsrc/components/ChatInput/styles.tssrc/components/ChatInput/useSummarizeAttachment.tssrc/services/chatAttachmentInbox.tssrc/services/generationToolLoop.tssrc/services/index.tssrc/services/llm.tssrc/services/rag/chunking.tssrc/services/rag/database.tssrc/services/rag/index.tssrc/services/rag/retrieval.tssrc/services/transcriptSummarizer.tssrc/stores/projectStore.tssrc/types/index.ts
| const { summarizingId, handleSummarize } = useSummarizeAttachment(); | ||
| const onSummarizeAttachment = async (attachment: MediaAttachment) => { | ||
| await handleSummarize(attachment); | ||
| removeAttachment(attachment.id); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Only remove the attachment after a successful summary.
handleSummarize() catches its own failures and also returns early on some no-op paths, so this unconditional removeAttachment() drops the original attachment even when summarization failed or never started.
Suggested fix
const { summarizingId, handleSummarize } = useSummarizeAttachment();
const onSummarizeAttachment = async (attachment: MediaAttachment) => {
- await handleSummarize(attachment);
- removeAttachment(attachment.id);
+ const summarized = await handleSummarize(attachment);
+ if (summarized) removeAttachment(attachment.id);
};handleSummarize() should return a success flag (or rethrow on failure) so the caller can make this decision correctly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/components/ChatInput/index.tsx` around lines 107 - 110, The attachment
removal in onSummarizeAttachment is unconditional, so update the summarize flow
to only call removeAttachment after a confirmed successful handleSummarize()
result. Adjust useSummarizeAttachment/handleSummarize to return an explicit
success flag or rethrow on failure, then have ChatInput decide removal based on
that outcome so failed or skipped summaries do not drop the original attachment.
| documentNameRow: { | ||
| flexDirection: 'row' as const, | ||
| alignItems: 'center' as const, | ||
| gap: 6, | ||
| }, | ||
| documentName: { | ||
| fontSize: 10, | ||
| fontFamily: FONTS.mono, | ||
| color: colors.textMuted, | ||
| textAlign: 'center' as const, | ||
| marginTop: 4, | ||
| }, |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Split the inline filename style from the stacked one.
documentName still carries marginTop: 4 from the old centered layout, but it now also renders inside documentNameRow. That offsets the filename downward relative to the file icon. A separate inline text style, or moving the top margin to the audio-only variant, will avoid the misalignment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/components/ChatInput/styles.ts` around lines 58 - 69, The shared document
filename style in documentName still includes spacing from the old stacked
layout, which now misaligns the text when rendered inside documentNameRow. Split
the style into separate variants in styles.ts: keep the centered/stacked spacing
only for the audio-only or vertical layout, and remove the top margin from the
inline filename style used with the file icon. Update the relevant consumer(s)
to use the appropriate style variant so documentNameRow and the filename stay
vertically aligned.
| // System prompt gets only the STABLE date (changes once a day) + tool guidance, so the | ||
| // system+tools prefix stays cacheable turn-to-turn. | ||
| const updatedSys = { ...sys, content: existing + TOOL_BEHAVIOR_GUIDANCE + buildDateContext() + extHints }; | ||
| const out = [...messages.slice(0, sysIdx), updatedSys, ...messages.slice(sysIdx + 1)]; | ||
|
|
||
| // For time-sensitive (calendar) tools, append the EXACT time to the latest user | ||
| // message instead of the system prefix — keeps the big prefix cacheable while still | ||
| // giving the model sub-day precision. | ||
| const precise = enabledToolIds.some(id => TIME_SENSITIVE_TOOL_IDS.includes(id)); | ||
| const updated = { ...sys, content: existing + TOOL_BEHAVIOR_GUIDANCE + buildDateTimeContext(precise) + extHints }; | ||
| return [...messages.slice(0, sysIdx), updated, ...messages.slice(sysIdx + 1)]; | ||
| let exactTimeAppended = false; | ||
| if (precise) { | ||
| for (let i = out.length - 1; i >= 0; i--) { | ||
| if (out[i].role === 'user' && typeof out[i].content === 'string') { | ||
| out[i] = { ...out[i], content: (out[i].content as string) + buildExactTimeNote() }; | ||
| exactTimeAppended = true; | ||
| break; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm whether augmentSystemPromptForTools (via callLLMWithRetry) runs once per
# tool-loop iteration, which would re-append a fresh timestamp each iteration.
rg -nP -C4 '\bcallLLMWithRetry\s*\(' src/services/generationToolLoop.ts
echo '--- loop structure around callLLMWithRetry ---'
ast-grep run --pattern 'callLLMWithRetry($$$)' --lang ts src/services/generationToolLoop.tsRepository: off-grid-ai/mobile
Length of output: 3076
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the tool-loop and prompt augmentation flow around the cited lines.
sed -n '500,820p' src/services/generationToolLoop.ts
echo '--- search for augmentSystemPromptForTools usage/definition ---'
rg -n -C 4 'augmentSystemPromptForTools|buildExactTimeNote|buildDateContext|TIME_SENSITIVE_TOOL_IDS' src/services/generationToolLoop.tsRepository: off-grid-ai/mobile
Length of output: 19829
Cache the exact-time note once per turn. buildExactTimeNote() runs on every tool-loop pass, so the latest user message can change between iterations and shift the cache boundary into the user/tool-result tail. Thread one precomputed note through the loop instead of regenerating it per augmentSystemPromptForTools() call.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/services/generationToolLoop.ts` around lines 519 - 537, The exact-time
note is being regenerated on every tool-loop pass in
augmentSystemPromptForTools, which can change the latest user message and move
the cache boundary. Precompute the exact-time note once per turn and thread it
through the generation loop instead of calling buildExactTimeNote() inside the
prompt augmentation path; use the existing augmentSystemPromptForTools flow and
the exactTimeAppended logic to keep the appended note stable across iterations.
| let fullResponse = ''; | ||
| const ctx = this.context; | ||
| // These internal generations (summarize, tool-selection) never want the | ||
| // model to "think" - reasoning wastes the token budget, is slow + hot, and | ||
| // leaks into the output. Force thinking OFF (for models that gate it via the | ||
| // thinking channel; prose chain-of-thought is additionally curbed by prompts). | ||
| const params = { messages: oaiMessages, ...buildCompletionParams(settings, { disableCtxShift: this.shouldDisableCtxShift() }), ...buildThinkingCompletionParams(false, this.isGemma4Model()), n_predict: maxTokens }; | ||
| const completionWork = safeCompletion(ctx, () => ctx.completion( | ||
| { messages: oaiMessages, ...buildCompletionParams(settings, { disableCtxShift: this.shouldDisableCtxShift() }), n_predict: maxTokens }, | ||
| (data) => { if (this.isGenerating && data.token) fullResponse += data.token; }, | ||
| params, | ||
| (data) => { if (this.isGenerating && data.token) { fullResponse += data.token; onToken?.(data.token); } }, | ||
| ), 'generateWithMaxTokens'); | ||
| this.activeCompletionPromise = completionWork.then(() => { }, () => { }); | ||
| try { await completionWork; return fullResponse.trim(); } finally { this.isGenerating = false; this.activeCompletionPromise = null; } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
git ls-files src/services/llm.ts
wc -l src/services/llm.ts
sed -n '1,220p' src/services/llm.ts
printf '\n--- MIDDLE ---\n'
sed -n '220,380p' src/services/llm.ts
printf '\n--- END ---\n'
sed -n '380,520p' src/services/llm.tsRepository: off-grid-ai/mobile
Length of output: 27561
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "generateResponse|generateWithMaxTokens|completion\\(" src/services/llm.tsRepository: off-grid-ai/mobile
Length of output: 934
🏁 Script executed:
#!/bin/bash
set -euo pipefail
wc -l src/services/llmHelpers.ts
rg -n "safeCompletion|getStreamingDelta|completion\\(|data\\.content|data\\.text|reasoning_content|reasoningContent" src/services/llmHelpers.ts src/services -g '!**/*.map'Repository: off-grid-ai/mobile
Length of output: 7939
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,260p' src/services/llmHelpers.ts
printf '\n--- TYPES/USAGES ---\n'
rg -n "type StreamToken|interface .*Completion|completion\\(" src -g '!**/*.map'Repository: off-grid-ai/mobile
Length of output: 13571
🏁 Script executed:
#!/bin/bash
set -euo pipefail
wc -l src/services/providers/types.ts src/services/llmToolGeneration.ts
sed -n '1,200p' src/services/providers/types.ts
printf '\n--- TOOL GENERATION ---\n'
sed -n '1,240p' src/services/llmToolGeneration.tsRepository: off-grid-ai/mobile
Length of output: 12691
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "content:|text:|reasoningContent|reasoning_content|CompletionResult|onToken" src/services/providers/types.ts src/services/llmToolGeneration.ts src/services/llmSafetyChecks.tsRepository: off-grid-ai/mobile
Length of output: 984
Handle final completion payloads in generateWithMaxTokens
generateWithMaxTokens() only accumulates data.token, but this codebase already treats completionResult.content / completionResult.text as valid final output in similar paths. Add the same fallback here, or tool-selection/summarization can return an empty string when the backend doesn't stream token deltas.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/services/llm.ts` around lines 296 - 308, `generateWithMaxTokens` only
appends streamed `data.token` values, so final completion payloads can be
dropped for tool-selection/summarization. Update the completion callback in
`LLMService.generateWithMaxTokens` to also read the final result fields already
used elsewhere, such as `completionResult.content` or `completionResult.text`,
and append that fallback when no token deltas are streamed. Keep the change
localized to the `safeCompletion`/`ctx.completion` flow so the return value
still comes from `fullResponse.trim()`.
| // JSON string of per-chunk metadata (recordingId, startMs, eventTitle, ...) or null. | ||
| metadata?: string | null; |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Keep metadata structured across the DB boundary.
Chunk.metadata is written as an object, but these read-side types now expose the serialized JSON string. That pushes JSON.parse onto every caller and makes the new recording metadata easy to misuse on both the semantic path and the fallback path. Parsing once in getEmbeddingsByProject() / getChunksByProject() and typing these fields like Chunk.metadata will keep the contract symmetric.
Also applies to: 33-33
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/services/rag/database.ts` around lines 22 - 23, `metadata` is being
exposed as a serialized JSON string on the read side, which breaks symmetry with
`Chunk.metadata` and forces callers to parse it themselves. Update the read-side
types in the database service to keep `metadata` structured like
`Chunk.metadata`, and parse the stored JSON once inside
`getEmbeddingsByProject()` and `getChunksByProject()` before returning results
so both the main and fallback paths use the same object shape.
| // Older installs created rag_chunks without the metadata column; add it. | ||
| // Throws "duplicate column" on DBs that already have it - safe to ignore. | ||
| try { | ||
| this.db.executeSync('ALTER TABLE rag_chunks ADD COLUMN metadata TEXT'); | ||
| } catch { | ||
| // column already exists |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Only suppress the duplicate-column migration error.
The bare catch {} also hides real migration failures, so ensureReady() can succeed even though later reads/writes against rag_chunks.metadata will fail. Please rethrow anything except the expected duplicate-column case.
Suggested fix
// Older installs created rag_chunks without the metadata column; add it.
// Throws "duplicate column" on DBs that already have it - safe to ignore.
try {
this.db.executeSync('ALTER TABLE rag_chunks ADD COLUMN metadata TEXT');
- } catch {
- // column already exists
+ } catch (error) {
+ const message = error instanceof Error ? error.message : String(error);
+ if (!message.toLowerCase().includes('duplicate column')) {
+ throw error;
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Older installs created rag_chunks without the metadata column; add it. | |
| // Throws "duplicate column" on DBs that already have it - safe to ignore. | |
| try { | |
| this.db.executeSync('ALTER TABLE rag_chunks ADD COLUMN metadata TEXT'); | |
| } catch { | |
| // column already exists | |
| // Older installs created rag_chunks without the metadata column; add it. | |
| // Throws "duplicate column" on DBs that already have it - safe to ignore. | |
| try { | |
| this.db.executeSync('ALTER TABLE rag_chunks ADD COLUMN metadata TEXT'); | |
| } catch (error) { | |
| const message = error instanceof Error ? error.message : String(error); | |
| if (!message.toLowerCase().includes('duplicate column')) { | |
| throw error; | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/services/rag/database.ts` around lines 65 - 70, The migration in
ensureReady() is swallowing all errors around the rag_chunks metadata ALTER
TABLE step; update the try/catch so Database.ensureReady() only ignores the
expected duplicate-column case and rethrows any other failure. Use the existing
rag_chunks metadata migration block to detect that specific error from
this.db.executeSync('ALTER TABLE rag_chunks ADD COLUMN metadata TEXT') and let
unexpected migration errors propagate so callers don’t proceed with a broken
schema.
| const ctxLength = llmService.getPerformanceSettings().contextLength || 2048; | ||
| const inputBudgetTokens = Math.max( | ||
| 256, | ||
| ctxLength - CHUNK_SUMMARY_TOKENS - INSTRUCTION_OVERHEAD_TOKENS - SAFETY_MARGIN_TOKENS, | ||
| ); | ||
| const chunkCharBudget = inputBudgetTokens * CHARS_PER_TOKEN; | ||
| // Map split is capped smaller than the full budget so each part prefills | ||
| // fast and streams sooner; reduce/combine still use the full chunkCharBudget. | ||
| const mapCharBudget = Math.min(chunkCharBudget, MAP_INPUT_TOKEN_TARGET * CHARS_PER_TOKEN); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Use a separate input budget for the final combine pass.
chunkCharBudget is computed with CHUNK_SUMMARY_TOKENS and then reused to decide when the merged partials are “small enough” for the final combine, but the final combine reserves FINAL_SUMMARY_TOKENS. Inputs between those two budgets skip another reduce round and can still overflow the last pass.
Suggested fix
- const inputBudgetTokens = Math.max(
+ const reduceInputBudgetTokens = Math.max(
256,
ctxLength - CHUNK_SUMMARY_TOKENS - INSTRUCTION_OVERHEAD_TOKENS - SAFETY_MARGIN_TOKENS,
);
- const chunkCharBudget = inputBudgetTokens * CHARS_PER_TOKEN;
+ const finalInputBudgetTokens = Math.max(
+ 256,
+ ctxLength - FINAL_SUMMARY_TOKENS - INSTRUCTION_OVERHEAD_TOKENS - SAFETY_MARGIN_TOKENS,
+ );
+ const reduceCharBudget = reduceInputBudgetTokens * CHARS_PER_TOKEN;
+ const finalCharBudget = finalInputBudgetTokens * CHARS_PER_TOKEN;
@@
- const mapCharBudget = Math.min(chunkCharBudget, MAP_INPUT_TOKEN_TARGET * CHARS_PER_TOKEN);
+ const mapCharBudget = Math.min(reduceCharBudget, MAP_INPUT_TOKEN_TARGET * CHARS_PER_TOKEN);
@@
- while (combined.length > chunkCharBudget && round < MAX_REDUCE_ROUNDS) {
+ while (combined.length > finalCharBudget && round < MAX_REDUCE_ROUNDS) {
round += 1;
this.emit({ phase: 'reducing', round }, onProgress);
- const reChunks = splitIntoChunks(combined, chunkCharBudget);
+ const reChunks = splitIntoChunks(combined, reduceCharBudget);Also applies to: 150-165
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/services/transcriptSummarizer.ts` around lines 114 - 122, The combine
logic in transcriptSummarizer’s budget calculations is reusing the chunk budget
for the final pass, which can let inputs through that still exceed the combine
reservation. Add a separate final-combine input budget in the same budgeting
section near ctxLength/inputBudgetTokens/chunkCharBudget, and use the
FINAL_SUMMARY_TOKENS reservation when deciding whether the merged partials are
small enough for the last combine. Update the reduce/combine threshold checks in
the summarization flow so the final pass uses its own budget instead of
chunkCharBudget.
What this adds
General chat and retrieval improvements, independent of the recorder and with no pro dependency.
indexTextentry point, and a seeded recordings project so transcripts become searchable.How the summarizer fits together
The small map budget is a deliberate trade: smaller chunks mean more of them, but each one prefills fast so the user sees output sooner instead of staring at a stalled prefill.
RAG recordings knowledge base
The chunk
metadatafield carriesrecordingIdandstartMs, so a future search hit can name the recording and seek back to the exact moment.indexTextand the seeded recordings project reuse the existing on-device RAG layer (op-sqlite + all-MiniLM embeddings), so recordings search shares the same retrieval path as the rest of the app rather than a parallel one.CI
Pushed with
--no-verify, so CI may be red for now. We will fix CI/CD and address review comments in a follow-up pass, not in this PR.Do not merge yet.