fix: prevent mid-stream retry from duplicating published tokens (#5923)#6541
Open
EsraaKamel11 wants to merge 1 commit intoaden-hive:mainfrom
Open
fix: prevent mid-stream retry from duplicating published tokens (#5923)#6541EsraaKamel11 wants to merge 1 commit intoaden-hive:mainfrom
EsraaKamel11 wants to merge 1 commit intoaden-hive:mainfrom
Conversation
…ent bus Fixes aden-hive#5923 LiteLLMProvider.stream() retried on transient errors and rate limits without checking whether TextDeltaEvents had already been yielded and published to the event bus. When an error fired after K chunks had streamed, the retry replayed the full response from token 1 — permanently concatenating the partial first attempt with the complete second attempt in the client UI stream. EventBus.publish() is fire-and-forget with no retract mechanism, making the corruption irreversible. With RATE_LIMIT_MAX_RETRIES=10, up to 11 concatenated partial attempts could reach the client before a terminal error. Tool-call-only streams were unaffected (tool deltas are buffered, never yielded as TextDeltaEvents). Fix: add a guard in both exception handlers — if accumulated_text is non-empty when an error fires, yield StreamErrorEvent(recoverable=True) and return instead of retrying. EventLoopNode._do_stream() commits the partial text to conversation history and does not trigger an outer retry (line 1706 condition requires accumulated_text == '' to raise ConnectionError). Clean restart without touching the already-published stream. Guard uses accumulated_text only, not tool_calls_acc — tool deltas are buffered locally and never published before stream completion, so mid-tool-stream errors remain safe to retry internally. Tests added (5 new, 74/74 passing): - test_mid_stream_error_no_duplicate_deltas_3_chunks: 3 chunks + error -> exactly 3 deltas on bus, no outer retry - test_mid_stream_error_no_duplicate_deltas_50_chunks: 50 chunks + error -> exactly 50 deltas, no outer retry - test_mid_stream_error_at_chunk_0_triggers_outer_retry: error before first chunk -> outer retry fires, exactly 2 deltas from success path - test_mid_stream_tool_only_error_inner_retry_unaffected: tool-only error -> inner retry safe, no duplication - test_mid_stream_recoverable_error_partial_text_committed: partial text committed to history, call_index == 1
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #5923 — when the LiteLLM streaming layer retried after a mid-stream
RateLimitErroror transient error, it re-streamed from token 1, duplicating content that had already been yielded to callers and published to the event bus.Since published events cannot be recalled, the retry must be abandoned when a partial stream has already been emitted.
Root Cause
LiteLLMProvider.stream()has an internal retry loop for transient errors. Both theRateLimitErrorhandler and the generic transient-error handler would unconditionallycontinue— restarting the entire stream from the beginning — even whenaccumulated_textwas non-empty (i.e., chunks had already been yielded upstream and emitted on the event bus).Before — both handlers did this unconditionally:
The event bus publishes token deltas eagerly as they stream in. There is no mechanism to retract already-published events, so retrying produced a second copy of every token the client had already received.
Fix
In both exception handlers, check
accumulated_textbefore retrying. If any text has already been yielded, emit arecoverable=TrueStreamErrorEventand return immediately.EventLoopNode's existing empty-response guard at line 1706 detects the non-emptyaccumulated_textand suppresses the outer retry, preserving the partial turn.Changes
core/framework/llm/litellm.pyaccumulated_textguard toRateLimitErrorhandler (L989) and transientExceptionhandler (L1012)core/tests/test_event_loop_node.pyPartialStreamThenErrorLLMhelper andTestMidStreamRetryNoDuplicationclass with 5 new testsTests
5 new automated tests added to
TestMidStreamRetryNoDuplicationincore/tests/test_event_loop_node.py— all passing (74/74 total):test_mid_stream_error_no_duplicate_deltas_3_chunks— 3 chunks + error -> exactly 3 deltas on bus, no outer retrytest_mid_stream_error_no_duplicate_deltas_50_chunks— 50 chunks + error -> exactly 50 deltas, no outer retrytest_mid_stream_error_at_chunk_0_triggers_outer_retry— error before first chunk -> outer retry fires correctly, no duplicationtest_mid_stream_tool_only_error_inner_retry_unaffected— tool-only error -> inner retry safe, guard does not blocktest_mid_stream_recoverable_error_partial_text_committed— partial text committed to history,_call_index == 1Notes
accumulated_textonly, nottool_calls_acc— tool deltas are buffered locally and never published before stream completion, so mid-tool-stream errors remain safe to retry internallynot has_contentand is unaffectedEventLoopNodeempty-response guard atruntime_logger.py:1706is the cooperating mechanism that absorbs the early exit without crashing the turn