fix: buffer non-streaming upstream responses for currency e2e demo #952
Draft
Spherrrical wants to merge 1 commit into
Co-authored-by: Musa <musa@spherrrical.dev>
Problem
The `e2e-demo-currency` CI job fails on `advanced/currency_exchange` hurl tests even though the demo flow mostly works. Logs show:
- `llm_gateway`: `upstream response parse error: JSON parsing error: EOF while parsing a string` on non-streaming OpenAI responses (body truncated mid-JSON)
- `prompt_gateway`: `response body empty, chunk_start: 0, chunk_size: 0` during streaming
- `transfer closed with outstanding read data remaining` on the chat completions request

Root cause
Both WASM filters parsed non-streaming upstream bodies on the first response chunk. When Envoy delivered the JSON across multiple chunks, partial bodies were forwarded/parsed, causing JSON EOF errors and broken client streams.
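To illustrate the buffering idea, here is a minimal sketch in plain Rust. It is not the code from this PR: `BodyBuffer` and `push` are hypothetical names, and the real filters run inside Envoy's WASM host rather than on plain byte slices. The sketch only shows the core rule: hold every chunk back until the end-of-stream flag arrives, then hand the complete body to the parser.

```rust
/// Hypothetical accumulator for a non-streaming response body.
/// The names are illustrative and are not taken from the actual filters.
#[derive(Default)]
pub struct BodyBuffer {
    bytes: Vec<u8>,
}

impl BodyBuffer {
    /// Append one chunk of the upstream body.
    ///
    /// Returns `None` while more data is expected, so the caller can
    /// suppress the intermediate chunk instead of parsing or forwarding it.
    /// Returns `Some(full_body)` once `end_of_stream` is true, which is the
    /// only point where parsing the JSON is safe.
    pub fn push(&mut self, chunk: &[u8], end_of_stream: bool) -> Option<Vec<u8>> {
        self.bytes.extend_from_slice(chunk);
        if end_of_stream {
            // Hand back the accumulated body and leave the buffer empty.
            Some(std::mem::take(&mut self.bytes))
        } else {
            None
        }
    }
}
```

With this shape, a JSON body split as `{"choices":[{"mess` and `age":"ok"}]}` yields `None` for each partial chunk, and the complete body is returned only on the final call, even if that final chunk is empty.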
Fix
- `llm_gateway`: Accumulate non-streaming response chunks (suppress intermediate chunks) and parse only when the stream ends. Flush any remaining SSE processor/buffer bytes on a final empty EOS chunk for streaming responses.
- `prompt_gateway`: Same buffering for non-streaming tool-call responses that enrich metadata before returning to the client.
- Add `advanced/currency_exchange` to the hurl retry list (3 attempts), matching `preference_based_routing`, to tolerate transient external API flakiness.

Testing
- `cargo +1.93.0 clippy -p llm_gateway -p prompt_gateway -- -D warnings`
- `cargo +1.93.0 test -p llm_gateway -p prompt_gateway --lib`
- `cargo +1.93.0 build --release --target=wasm32-wasip1 -p llm_gateway -p prompt_gateway`

Full e2e requires CI (`e2e-demo-currency`) with API keys and the plano Docker image rebuild.