Skip to content

fix(anthropic-sse): finalize stream on mid-stream socket close (never-hang)#143

Open
jsboige wants to merge 1 commit into
MadAppGang:mainfrom
jsboige:fix/anthropic-sse-midstream-socket-close
Open

fix(anthropic-sse): finalize stream on mid-stream socket close (never-hang)#143
jsboige wants to merge 1 commit into
MadAppGang:mainfrom
jsboige:fix/anthropic-sse-midstream-socket-close

Conversation

@jsboige

@jsboige jsboige commented Jun 13, 2026

Copy link
Copy Markdown

Problem

Anthropic-compatible providers that speak native Anthropic SSE (Z.AI, MiniMax, Kimi) intermittently close the TCP socket mid-stream under load / burst limits. When that happens, reader.read() in the passthrough loop rejects, the rejection escapes to the outer catch (e) block, which does a bare controller.close() with no terminal message_stop event.

The Claude Code client then reports:

API Error: The socket connection was closed unexpectedly. For more information, pass verbose: true in the second argument to fetch()

…and freezes the agent turn — a hung agent, the worst proxy failure mode.

Root cause

In anthropic-sse.ts, the read loop's outer catch can only raw-close the controller. There was no path to emit the terminal message_stop the client requires to end a turn, because a finalization helper was never in scope.

Fix

Two changes, both additive:

  1. finalizeWithError(errMsg, path) helper — declared inside the outer try, above the read loop. Emits a valid terminal Claude message so the client ends the turn cleanly:

    • no content reached the client yet → synthetic message_start + a content block carrying a short notice, so the closing events are well-formed
    • content already streamingmessage_delta(end_turn) + message_stop (the client tolerates a missing content_block_stop on early termination far better than a missing message_stop, which hangs the turn)
  2. Inner try/catch around the read loop — a mid-stream reader rejection is now caught where finalizeWithError is in scope and routed through it, instead of escaping to the bare outer close. The outer catch is preserved as a last-resort safety net.

Whatever streamed before the cutoff is preserved — no content loss.

Why this matters

A frozen agent is far worse than a clean error. With providers that drop connections under load (observed in production with Z.AI serving GLM models), this turns a transient network blip into a permanently stuck session that the user must manually reset. The fix degrades it to a clean end-of-turn.

Test

Regression test added to format-translation.test.ts:

never-hang: upstream socket close mid-stream still emits terminal message_stop

Replays an upstream body that emits message_start + a text delta, then errors (simulating socket close). Asserts:

  • the partial content survives (not lost to the cutoff)
  • a terminal message_stop is present
  • a synthesized stop_reason: end_turn

Fails without the fix (no message_stop reaches the client), passes with it.

Scope

  • 2 files, +148 lines, additions only — no behavior change on the happy path or the normal end-of-stream path
  • Touched file: packages/cli/src/handlers/shared/stream-parsers/anthropic-sse.ts
  • No dependencies on other PRs

🤖 Generated with Claude Code

…-hang)

Z.AI and other Anthropic-compatible providers (MiniMax, Kimi) intermittently
close the TCP socket mid-stream under load / burst limits. The reader.read()
in the passthrough loop rejected, escaped to the outer `catch (e)` block,
which did a bare `controller.close()` with NO terminal message_stop event.
The Claude Code client then reported "API Error: The socket connection was
closed unexpectedly" and froze the turn — a hung agent, the worst proxy
failure mode.

Wrap the read loop in an inner try/catch that has a new `finalizeWithError`
helper in scope (declared in the same outer try). On a reader rejection it
emits a valid terminal Claude message (message_delta with end_turn +
message_stop), preserving whatever streamed before the cutoff so the client
ends the turn cleanly instead of freezing.

The helper handles both cases:
  - no content reached the client yet → synthetic message_start + content
    block + the notice, so message_stop is well-formed
  - content already streaming → message_delta + message_stop (the client
    tolerates a missing content_block_stop on early termination far better
    than a missing message_stop, which hangs the turn)

Regression test: upstream body that emits message_start + a text delta then
errors. Asserts the partial content survives AND a terminal message_stop is
emitted. Fails without the fix (no message_stop reaches the client), passes
with it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant