fix assistant streaming degradation on long threads #2631

Open
justsomelegs wants to merge 9 commits into pingdotgg:main from justsomelegs:t3code/audit-token-performance

@justsomelegs justsomelegs commented May 10, 2026

What Changed

Optimizes assistant streaming message projection so streaming deltas are appended directly to the existing projected message instead of forcing the broader thread refresh path for every delta.

The change keeps the persisted message output the same, but avoids repeatedly rebuilding and re-reading the full thread state during high-volume assistant streaming.
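
To make the append-in-place idea concrete, here is a minimal in-memory sketch. The `ProjectedMessage` and `StreamingDelta` shapes and both function names are hypothetical illustrations, not the project's real types or API; the sketch only shows that appending each delta to the existing message gives the same final text as replaying all deltas from scratch.

```typescript
// Hypothetical shapes for illustration; not the project's real types.
interface ProjectedMessage {
  id: string;
  text: string;
  updatedAt: string;
}

interface StreamingDelta {
  messageId: string;
  text: string;
  at: string;
}

// Append path: touches only the single message the delta belongs to.
function appendDelta(
  messages: Map<string, ProjectedMessage>,
  delta: StreamingDelta,
): ProjectedMessage {
  const existing = messages.get(delta.messageId);
  const next: ProjectedMessage = existing
    ? { ...existing, text: existing.text + delta.text, updatedAt: delta.at }
    : { id: delta.messageId, text: delta.text, updatedAt: delta.at };
  messages.set(delta.messageId, next);
  return next;
}

// Rebuild path: replays every delta from an empty state, used here only
// to check that both paths agree on the final message text.
function rebuildFromDeltas(
  deltas: StreamingDelta[],
): Map<string, ProjectedMessage> {
  const messages = new Map<string, ProjectedMessage>();
  for (const delta of deltas) {
    appendDelta(messages, delta);
  }
  return messages;
}
```

The work done by `appendDelta` depends on the one message being extended, not on how many other messages or activities the thread holds.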

Why

Long threads can contain very large numbers of assistant streaming delta events. The old path spent most of its time refreshing thread read-model state after each tiny text delta.
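
A rough way to see why per-delta refreshing dominates is to count refreshes under each strategy. The sketch below assumes a hypothetical `StreamEvent` shape and a simplified model in which the legacy path refreshes after every event while the optimized path refreshes once, when the stream terminates; the real pipeline's event types are not shown in this description.

```typescript
// Hypothetical event shape, used only for this illustration.
type StreamEvent =
  | { kind: "delta"; text: string }
  | { kind: "completed" };

interface RefreshCounts {
  legacy: number;
  optimized: number;
}

// Counts how many thread-state refreshes each strategy would trigger.
function countRefreshes(events: StreamEvent[]): RefreshCounts {
  let legacy = 0;
  let optimized = 0;
  for (const event of events) {
    // Legacy model: refresh after every event, including tiny deltas.
    legacy += 1;
    // Optimized model: refresh only once the stream has terminated.
    if (event.kind === "completed") {
      optimized += 1;
    }
  }
  return { legacy, optimized };
}
```

Under this model a stream of N deltas costs N + 1 refreshes on the legacy path and one on the optimized path, and each legacy refresh grows more expensive as the thread grows.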

Benchmark from a large production thread:

thread messages: 1,430
thread activities: 9,766
assistant streaming delta events: 165,130
legacy mode: sampled, 500 samples per 10k-event window

OVERALL

optimized: mean 0.0047ms/event, p50 0.0040ms, p90 0.0060ms, p99 0.0113ms
legacy:    mean 39.8074ms/event, p50 38.4388ms, p90 46.3055ms, p99 59.6003ms
speedup:   mean ~8,555x, p99 ~5,274x
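
As a sanity check on the figures above, the ratios and extrapolated totals can be recomputed from the published (rounded) numbers. This is only arithmetic on the values in this description; the totals assume every delta costs the mean, and the legacy run was sampled, so its total is an estimate rather than a measurement.

```typescript
// Figures copied from the OVERALL rows, rounded as published.
const optimizedMeanMs = 0.0047;
const legacyMeanMs = 39.8074;
const optimizedP99Ms = 0.0113;
const legacyP99Ms = 59.6003;
const deltaEvents = 165130;

// Ratios recomputed from the rounded figures.
const meanSpeedup = legacyMeanMs / optimizedMeanMs;
const p99Speedup = legacyP99Ms / optimizedP99Ms;

// Extrapolated totals, assuming every delta costs the mean.
const legacyTotalMinutes = (legacyMeanMs * deltaEvents) / 60000;
const optimizedTotalMs = optimizedMeanMs * deltaEvents;

console.log({
  meanSpeedup: Math.round(meanSpeedup),
  p99Speedup: Math.round(p99Speedup),
  legacyTotalMinutes: Number(legacyTotalMinutes.toFixed(1)),
  optimizedTotalMs: Number(optimizedTotalMs.toFixed(1)),
});
```

With the rounded inputs the mean ratio comes out near 8,470x rather than the reported ~8,555x. That gap is consistent with the published 0.0047ms being rounded from a slightly smaller value, though that is an inference. The p99 ratio matches the reported ~5,274x. In total terms, projecting the whole thread would take roughly 110 minutes on the legacy path against under one second on the optimized path.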

SAMPLE BREAKDOWN

events 1-10,000:
  optimized mean 0.0046ms/event, p99 0.0091ms
  legacy    mean 36.5874ms/event, p99 46.1719ms

events 50,001-60,000:
  optimized mean 0.0042ms/event, p99 0.0081ms
  legacy    mean 36.9265ms/event, p99 44.2846ms

events 100,001-110,000:
  optimized mean 0.0044ms/event, p99 0.0114ms
  legacy    mean 39.9709ms/event, p99 53.0225ms

events 130,001-140,000:
  optimized mean 0.0052ms/event, p99 0.0175ms
  legacy    mean 42.7856ms/event, p99 68.0208ms

events 160,001-165,130:
  optimized mean 0.0061ms/event, p99 0.0299ms
  legacy    mean 41.0519ms/event, p99 62.3399ms

BEFORE

before.delta.improvements.mp4

AFTER

after.assistant-streaming.optimisation.mp4

This keeps the cost of the hot path proportional to the size of the incoming delta rather than to the size of the thread.

Checklist

  • This PR is small and focused
  • I explained what changed and why
  • I included before/after screenshots for any UI changes
  • I included a video for animation/interaction changes

Note

Fix assistant streaming degradation on long threads by appending text deltas in-place

  • Replaces full-text recompute logic for streaming assistant messages with a new appendText operation in ProjectionThreadMessages that concatenates deltas via SQL (text || excluded.text) on conflict.
  • Skips refreshThreadShellSummary during assistant streaming events in ProjectionPipeline.ts, only persisting updatedAt until the stream terminates.
  • Preserves existing attachments_json unless the streaming event explicitly provides replacements.
  • Behavioral Change: repository.upsert now encodes inputs before SQL execution and may return a decode error where it previously would not.
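
The append-on-conflict behavior summarized above could look roughly like the following. This is a sketch under stated assumptions, not the PR's actual statement: the table name `projection_thread_messages` and the `message_id` and `updated_at` columns are hypothetical, and SQLite-style upsert syntax is assumed. Only `text`, `excluded.text`, and `attachments_json` come from the summary. The `applyAppendText` function is a pure in-memory model of what the statement does to one row, assuming `text` is never NULL.

```typescript
// Hypothetical table and key column names; only `text`, `excluded.text`,
// and `attachments_json` are taken from the PR summary.
const APPEND_TEXT_SQL = `
  INSERT INTO projection_thread_messages
    (message_id, text, attachments_json, updated_at)
  VALUES (?, ?, ?, ?)
  ON CONFLICT (message_id) DO UPDATE SET
    text = projection_thread_messages.text || excluded.text,
    attachments_json = COALESCE(
      excluded.attachments_json,
      projection_thread_messages.attachments_json
    ),
    updated_at = excluded.updated_at
`;

// Hypothetical row shape mirroring the columns above.
interface MessageRow {
  messageId: string;
  text: string;
  attachmentsJson: string | null;
  updatedAt: string;
}

// In-memory model of the upsert: insert when no row exists, otherwise
// concatenate the text and keep attachments unless replacements are given.
function applyAppendText(
  existing: MessageRow | undefined,
  input: MessageRow,
): MessageRow {
  if (existing === undefined) {
    return { ...input };
  }
  return {
    messageId: existing.messageId,
    text: existing.text + input.text,
    attachmentsJson: input.attachmentsJson ?? existing.attachmentsJson,
    updatedAt: input.updatedAt,
  };
}
```

Doing the concatenation inside the database means the server does not need to read the accumulated text back before writing each delta.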

Macroscope summarized fe5b0a2.

@justsomelegs justsomelegs changed the title Fix assistant streaming degradation on long threads fix assistant streaming degradation on long threads May 10, 2026

coderabbitai Bot commented May 10, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 96370bc2-2696-4f1b-8a73-e2a2721ee60e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@github-actions github-actions Bot added labels vouch:trusted (PR author is trusted by repo permissions or the VOUCHED list) and size:L (100-499 changed lines, additions + deletions) May 10, 2026

macroscopeapp Bot commented May 10, 2026

Approvability

Verdict: Needs human review

This PR changes how assistant streaming messages are persisted - from full upserts to in-place SQL text concatenation - and modifies when thread summaries are refreshed. While well-tested, this is a significant runtime behavior change to message handling that warrants human review.

@justsomelegs justsomelegs force-pushed the t3code/audit-token-performance branch from 6d9240b to b28e5f2 May 10, 2026 17:48
Comment thread apps/server/src/persistence/Layers/ProjectionThreadMessages.ts Outdated

Labels

size:L: 100-499 changed lines (additions + deletions).
vouch:trusted: PR author is trusted by repo permissions or the VOUCHED list.

2 participants