fix: stream hang prevention, thinking block strip, response capture, web search interception#137

Open
jsboige wants to merge 21 commits into MadAppGang:main from jsboige:main

@jsboige jsboige commented May 31, 2026

Summary

Multiple bug fixes and features accumulated on the fork, all backwards-compatible:

Bug fixes

  • fix(stream-parsers): prevent openai-sse stream hang (34bac1f) — finalize() referenced undefined opts.modelName, causing a ReferenceError before controller.close(). The error was swallowed, leaving the ReadableStream permanently open and the client hanging forever. Fixed by using the correct target parameter.
  • fix(composed-handler): strip thinking blocks from message history for non-native providers (e1c9753) — Opus thinking blocks (with Anthropic signatures) were forwarded verbatim to Z.AI/GLM/MiniMax, causing corruption or API rejection.
  • fix(anthropic-sse): suppress server_tool_use blocks and handle empty responses (0cb986a) — Z.AI sends server_tool_use blocks that the Anthropic client doesn't understand. Also emits synthetic message_start/message_stop when the provider returns an empty response.
  • fix(composed-handler): strip inline system messages for Anthropic-transport providers (032919c) — Claude Code v2.1.153+ injects role: "system" inline. Anthropic-compat providers reject these.

Features

  • feat(stream-parsers): intercept WebSearch/WebFetch via SearXNG (24ec4da) — Replaces the "web search unsupported" warning with actual execution via a local SearXNG instance. Two interception paths: structured tool_call and GLM <searchWeb> tags.
  • feat(request-logger): add full-body capture mode for diagnostics (122b681) — Gated by CLAUDISH_CAPTURE_DIR env var. No-op in production.

Infrastructure

  • Response-side SSE capture (response-capture.ts) — diagnostic helper gated by CLAUDISH_CAPTURE_DIR, wired into all three stream parsers
  • Docker: bump the Bun base image to 1.3-alpine and reference SearXNG by DNS name

Testing

  • 2000+ response captures during live testing: none recorded closed=false (every stream closed cleanly)
  • Tested 3 sub-agent types (code-explorer, Explore, general-purpose) under GLM — all completed
  • Tested Opus sub-agent with 35 tool_use blocks — completed in 8.8s
  • All existing tests pass (bun test)

🤖 Generated with Claude Code

jsboigeEpita and others added 21 commits May 14, 2026 01:09
…creds

- Proxy auth middleware validates authorization/x-api-key/x-proxy-key
  headers against proxyKey from config (no custom header needed)
- NativeHandler falls back to stored ANTHROPIC_API_KEY when client
  doesn't provide auth (cluster scenario)
- Custom endpoints registered at runtime pass credential checks
- loadConfig() preserves proxyKey field
- Proxy binds to configurable hostname (0.0.0.0 for Docker)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OAuth tokens (sk-ant-oat01-) are rejected by Anthropic when sent as
x-api-key. Detect the prefix and use authorization: Bearer instead.
Also pass proxy key to NativeHandler so it can distinguish proxy auth
from genuine client tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
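
The prefix check described in the commit above can be sketched as follows. This is a minimal illustration, not the fork's actual code: the helper name `buildAuthHeaders` and the constant `OAUTH_PREFIX` are hypothetical, and only the `sk-ant-oat01-` prefix and the two header names come from the commit message.

```typescript
// Hypothetical helper: pick the upstream auth header based on the token prefix.
const OAUTH_PREFIX = "sk-ant-oat01-";

function buildAuthHeaders(token: string): Record<string, string> {
  if (token.startsWith(OAUTH_PREFIX)) {
    // OAuth tokens are sent as a Bearer credential.
    return { authorization: `Bearer ${token}` };
  }
  // Any other token is treated as a regular API key.
  return { "x-api-key": token };
}
```

Keeping the decision in one pure function makes it easy to reuse for both the messages and count_tokens paths, assuming both build their headers the same way.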
…tion

Forward ALL incoming headers to api.anthropic.com instead of selectively
forwarding only auth/beta headers. Claude Code sends internal headers that
make Max subscription work for Opus/Sonnet — the proxy must not drop them.

Proxy key override still works: when proxyKey is configured and matches
client auth, it's replaced with the stored Anthropic key. When no proxyKey
is set (pass-through mode), everything flows through unmodified.

Same change applied to the count_tokens endpoint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The standalone proxy now reads the default profile from ~/.claudish/config.json
and passes its model mapping to createProxyServer(). This means:
- opus/sonnet/haiku role requests get remapped to the profile's models
- Profile "default" maps: opus→glm-5.1, sonnet→glm-4.7, haiku→glm-4.7-flash
- Without a profile mapping, all models pass through unchanged

Startup logs now show the active profile name and role mappings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…elMap

- Add /v1/models endpoint for Claude Code gateway discovery
  (CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1)
- Returns all models from config.json routing rules + custom endpoints
- Standalone proxy passes no modelMap — every model routes via config rules
- No role remapping: opus→claude-opus-4-7, sonnet→glm-5.1, haiku→qwen3.6

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ropic providers

Claude Code injects `x-anthropic-billing-header: cc_version=...; cch=XXXXX;`
into the system prompt body. The cch= token changes every request, which
invalidates vLLM prefix caching (strict block hash). Strip this line from
the prompt for all non-Anthropic handlers — only Anthropic uses it.

Expected impact: vLLM TTFT from 30-67s → 1-3s on conversation turns 2+.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
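
A rough sketch of the strip described above, assuming the billing line sits on its own line of a plain-string system prompt. The function name `stripBillingHeader` is hypothetical; the real prompt may be structured differently (for example as an array of content blocks).

```typescript
// Hypothetical helper: drop every line that begins with the billing header name,
// so the per-request cch= token no longer changes the prompt prefix.
function stripBillingHeader(systemPrompt: string): string {
  return systemPrompt
    .split("\n")
    .filter((line) => !line.trimStart().startsWith("x-anthropic-billing-header:"))
    .join("\n");
}
```

With the line removed, two requests that differ only in their `cch=` value produce identical prompt text, which is the property a prefix cache needs.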
- Dockerfile: Bun alpine, standalone proxy, healthcheck
- docker-compose.yml: volume mount for config, port 3000
- .dockerignore: exclude traces, node_modules, dist
- start-claudish-proxy.ps1: Windows service wrapper with logging

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Direct connections (no IIS ARR) had no x-forwarded-for/x-real-ip
headers, so all LAN machines appeared as "direct" in logs. Capture
remote address via Bun's server context and include in request logging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…utocompact

All non-Anthropic stream parsers (openai-sse, gemini-sse, ollama-jsonl,
openai-responses-sse) hardcoded input_tokens: 100 in message_start and
omitted input_tokens from message_delta. This broke Claude Code's
autocompact because tokenCountWithEstimation() accumulated usage from
the SSE stream — seeing input_tokens=100 forever meant the context
appeared nearly empty, so condensation never triggered.

Changes:
- message_start: input_tokens: 100 → 0 (placeholder, replaced by delta)
- message_delta: include actual input_tokens from provider response
- openai-sse fallback: remove hardcoded 100 in onTokenUpdate

Affected models: GLM-5.1, Qwen, Gemini, Ollama, Codex — all non-Anthropic
providers using these parsers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
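
The usage change above might look roughly like this for a parser that receives OpenAI-style usage. The function names are hypothetical; `prompt_tokens` and `completion_tokens` are the OpenAI usage fields, and other providers will need their own field mapping.

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// message_start carries a zero placeholder; the real count arrives later.
function messageStartUsage(): Usage {
  return { input_tokens: 0, output_tokens: 0 };
}

// message_delta reports the provider's actual token counts.
function messageDeltaEvent(
  stopReason: string,
  providerUsage: { prompt_tokens: number; completion_tokens: number },
) {
  return {
    type: "message_delta" as const,
    delta: { stop_reason: stopReason },
    usage: {
      input_tokens: providerUsage.prompt_tokens,
      output_tokens: providerUsage.completion_tokens,
    },
  };
}
```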
Extract all fork-specific code (proxy auth, model discovery, billing
header strip, request logger, hostname binding, standalone proxy) into
dedicated modules under packages/cli/src/fork/. proxy-server.ts drops
from 715 to 597 lines, and the fork diff is now ~30 lines (import +
registration call). This makes future upstream syncs straightforward: fork
code lives in one directory and is easy to rebase.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e errors

When thinking blocks are suppressed, subsequent content block indices can
jump ahead of what the client expects. Clamp indices to highestSeen+1 and
log the adjustment for debugging. Also pass CLAUDISH_PROXY_KEY via
docker-compose environment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
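
One way to picture the clamp is the sketch below. The class name `IndexClamper` and the remap table are assumptions added so that later events for the same upstream block reuse the same emitted index; the commit message only states that indices are clamped to highestSeen+1 and that the adjustment is logged.

```typescript
// Hypothetical sketch: keep emitted content block indices contiguous.
class IndexClamper {
  private highestSeen = -1;
  private remapped = new Map<number, number>();

  clamp(upstreamIndex: number): number {
    // Deltas and stops for an already-seen block reuse its emitted index.
    const known = this.remapped.get(upstreamIndex);
    if (known !== undefined) return known;

    // A new block may never jump past highestSeen + 1.
    const emitted = Math.min(upstreamIndex, this.highestSeen + 1);
    if (emitted !== upstreamIndex) {
      console.log(`content block index clamped: ${upstreamIndex} -> ${emitted}`);
    }
    this.remapped.set(upstreamIndex, emitted);
    this.highestSeen = Math.max(this.highestSeen, emitted);
    return emitted;
  }
}
```

For example, if a thinking block at upstream index 0 was suppressed and the first forwarded block arrives with index 1, the client sees it as index 0.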
Some Anthropic-compat providers (z.ai, GLM) send message_delta with
stop_reason but never emit the message_stop event. Claude Code requires
message_stop as the terminal event — without it, the client reports
"API returned an empty or malformed response (HTTP 200)".

Track sawMessageStop separately from stopReason so the synthetic
finalization fires even when stopReason was received but message_stop
was not.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
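
The separate sawMessageStop flag can be illustrated on a buffered list of events. This is a simplified sketch: the real parser works on a live stream, and the function name `withSyntheticStop` and the minimal `SseEvent` shape are hypothetical.

```typescript
interface SseEvent {
  type: string;
}

// Append a synthetic message_stop when the provider never sent one,
// regardless of whether a stop_reason was already seen in message_delta.
function withSyntheticStop(events: SseEvent[]): SseEvent[] {
  let sawMessageStop = false;
  const out: SseEvent[] = [];
  for (const ev of events) {
    if (ev.type === "message_stop") sawMessageStop = true;
    out.push(ev);
  }
  if (!sawMessageStop) out.push({ type: "message_stop" });
  return out;
}
```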
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the "web search unsupported" warning with actual execution via
a local SearXNG instance. Two interception paths:

1. Structured tool_call: WebSearch/WebFetch tool calls are suppressed
   from the stream and replaced with SearXNG results as a text block.
2. GLM <searchWeb> tags: GLM models emit <searchWeb><query>...</query>
   </searchWeb> in text — these are intercepted at finalize and replaced
   with SearXNG results.

Also intercepts sub-agent web search/fetch requests at the proxy level
(Claude Code sends "Perform a web search for the query: X" as a single
user message).

Additional improvements:
- ToolState.suppressed flag for clean tool call suppression
- Empty-response error handling: providers returning no content now emit
  a structured api_error event instead of a malformed empty message
- Finalize logging with model, text length, tool count
- Proxy logs tool names in incoming requests for debugging

Requires SEARXNG_URL env var. Graceful fallback when unavailable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
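
The second interception path (GLM tags) could be sketched as below. Both helper names are hypothetical, and the URL assumes a SearXNG instance with the JSON output format enabled; the network call and result formatting are left out so the example stays self-contained.

```typescript
// Matches <searchWeb><query>...</query></searchWeb>, allowing whitespace
// (including newlines) around the inner <query> element.
const SEARCH_TAG = /<searchWeb>\s*<query>([\s\S]*?)<\/query>\s*<\/searchWeb>/g;

// Hypothetical helper: collect every query found in the model's text output.
function extractSearchQueries(text: string): string[] {
  const queries: string[] = [];
  const re = new RegExp(SEARCH_TAG.source, "g"); // fresh regex, no shared lastIndex
  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    queries.push((match[1] ?? "").trim());
  }
  return queries;
}

// Hypothetical helper: build the SearXNG request URL for one query.
function searxngUrl(baseUrl: string, query: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/search?q=${encodeURIComponent(query)}&format=json`;
}
```

In this sketch `baseUrl` would come from the `SEARXNG_URL` env var; when it is unset, the caller can skip interception and fall back to the previous behavior.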
…responses

Z.AI/GLM providers emit server_tool_use content blocks that the Anthropic
SDK does not understand (it expects only text, tool_use, thinking). These
blocks are now silently suppressed from the stream with index tracking.

When the stream ends without a message_start (empty response from provider),
emit a synthetic message sequence (message_start → text block with error
message → message_stop) so the client SDK receives a well-formed response
instead of a malformed empty one.

Also includes minor whitespace reformatting for consistency.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
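
Suppression with index tracking might look like the following on a buffered event list. This is a sketch under assumptions: the `BlockEvent` shape is trimmed to the fields needed here, and `suppressServerToolUse` is a hypothetical name; the actual parser operates incrementally on the stream.

```typescript
interface BlockEvent {
  type: "content_block_start" | "content_block_delta" | "content_block_stop";
  index: number;
  content_block?: { type: string };
}

// Drop every event belonging to a server_tool_use block and renumber the
// remaining blocks so the client sees contiguous indices starting at 0.
function suppressServerToolUse(events: BlockEvent[]): BlockEvent[] {
  const suppressed = new Set<number>();
  const remap = new Map<number, number>();
  const out: BlockEvent[] = [];
  for (const ev of events) {
    if (ev.type === "content_block_start" && ev.content_block?.type === "server_tool_use") {
      suppressed.add(ev.index);
      continue;
    }
    if (suppressed.has(ev.index)) continue; // deltas/stops of a suppressed block
    if (!remap.has(ev.index)) remap.set(ev.index, remap.size);
    out.push({ ...ev, index: remap.get(ev.index)! });
  }
  return out;
}
```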
…nsport providers

Claude Code v2.1.153+ injects role:"system" messages inline in the
messages array (e.g. system-reminders). Anthropic-compatible providers
(Z.AI, MiniMax, Kimi) reject role:"system" in messages — only
"user"/"assistant" are accepted.

The fix strips inline system messages from the payload and merges their
content into the top-level system prompt field. Applied at two levels:

1. ComposedHandler: strips from requestPayload.messages for
   anthropic-sse transport
2. openai-messages converter: handles the OpenAI format path by merging
   into the system message at index 0

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
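
The strip-and-merge step for the Anthropic-transport path can be sketched as follows. The types are simplified (string content only, whereas real payloads may carry arrays of content blocks), and `hoistInlineSystem` is a hypothetical name.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface Payload {
  system?: string;
  messages: Message[];
}

// Move inline role:"system" messages into the top-level system field,
// leaving only user/assistant entries in the messages array.
function hoistInlineSystem(payload: Payload): Payload {
  const inline = payload.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content);
  if (inline.length === 0) return payload;

  const system = [payload.system, ...inline]
    .filter((s): s is string => !!s)
    .join("\n\n");
  return {
    ...payload,
    system,
    messages: payload.messages.filter((m) => m.role !== "system"),
  };
}
```

Joining with a blank line is an arbitrary choice for this sketch; the commit message does not say how the merged content is separated.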
Add CLAUDISH_CAPTURE_DIR env-gated diagnostic capture that writes the
full request body to JSON files. Disabled by default (no-op when env
var is unset), so zero overhead in normal production use.

Purpose: when a hang or malformed response is reported, enable the env
var to capture the exact request payload for offline replay against
different bun versions or configurations.

Files are written to the capture dir with pattern:
  req-{pid}-{seq}-{timestamp}-{source}.json

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
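
The env gating and file naming could be sketched like this. The function `capturePath` is hypothetical, the timestamp is assumed to be epoch milliseconds (the commit message does not specify its format), and the caller is assumed to pass in the value of `CLAUDISH_CAPTURE_DIR` and the process id.

```typescript
let captureSeq = 0; // per-process sequence number

// Returns the file path for the next capture, or null when capture is disabled.
function capturePath(
  captureDir: string | undefined,
  pid: number,
  source: string,
  timestampMs: number,
): string | null {
  if (!captureDir) return null; // env var unset: no-op
  captureSeq += 1;
  return `${captureDir}/req-${pid}-${captureSeq}-${timestampMs}-${source}.json`;
}
```

Returning null before touching the sequence counter keeps the disabled path free of side effects.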
…rtifacts

- Bump Dockerfile base image from oven/bun:1.2-alpine to 1.3-alpine
  (pending Docker Hub egress fix for rebuild validation)
- Switch SEARXNG_URL from IP:port to search.myia.io DNS name
- Add .gitignore entries for diagnostic capture/, trace files,
  deploy notes, and monitoring scripts

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…andling, and diagnostic capture

Document three new subsystems:
- Web Search Interception (v7.1+): SearXNG backend, two interception
  paths (tool_call suppression + GLM searchWeb tags), sub-agent handling
- Inline System Message Handling: strip role:system for Anthropic-transport
  providers (Claude Code v2.1.153+ compatibility)
- Diagnostic Body Capture: CLAUDISH_CAPTURE_DIR env-gated full request
  body capture for offline hang reproduction

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…de capture

The finalize() function in openai-sse.ts referenced opts.modelName which
does not exist in that scope (the parameter is target). This caused a
ReferenceError thrown AFTER state.finalized=true but BEFORE
controller.close(). The catch block re-invoked finalize() which returned
immediately (already finalized), swallowing the error and leaving the
ReadableStream permanently open - the client HTTP connection hung forever.

Fix: replace opts.modelName with target (lines 160-161, 350).

Also adds response-side SSE capture infrastructure (gated by
CLAUDISH_CAPTURE_DIR env var, no-op when unset):
- response-capture.ts: new diagnostic helper
- Wired into openai-sse, anthropic-sse, and native-handler
- request-logger: add machine= tag from x-claudish-machine header

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
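
The failure mode described above can also be guarded against defensively. The sketch below is not the fix in this commit (which replaces opts.modelName with target); it shows one assumed alternative, closing the controller in a finally block so that a throw between `state.finalized = true` and `controller.close()` cannot leave the stream open. The `Closable` and `State` interfaces and the `emitTail` callback are hypothetical.

```typescript
interface Closable {
  close(): void;
}

interface State {
  finalized: boolean;
}

function finalize(state: State, controller: Closable, emitTail: () => void): void {
  if (state.finalized) return; // re-entry from a catch block is a no-op
  state.finalized = true;
  try {
    emitTail(); // may throw, e.g. on a reference to an undefined variable
  } finally {
    controller.close(); // runs even when emitTail throws
  }
}
```

The error still propagates to the caller, so it can be logged rather than swallowed, but the client connection is released either way.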
… non-native providers

ComposedHandler is never used for native Anthropic (that's NativeHandler),
so every request here goes to a provider that doesn't understand Anthropic
thinking signatures. Previously, Opus thinking blocks (with signatures)
flowed through to Z.AI/GLM/MiniMax/etc. unmodified, causing either silent
corruption or API rejection on subsequent turns.

The strip removes thinking blocks (type: 'thinking') from assistant messages
in the request payload before forwarding to the provider. Native Anthropic
requests via NativeHandler are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
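
A minimal sketch of the strip, assuming messages follow the Anthropic shape where assistant content is either a string or an array of typed blocks. The function name `stripThinkingBlocks` and the trimmed-down types are hypothetical.

```typescript
interface ContentBlock {
  type: string;
  [key: string]: unknown;
}

interface Msg {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

// Remove type:"thinking" blocks from assistant messages; everything else
// (user messages, string content, other block types) passes through unchanged.
function stripThinkingBlocks(messages: Msg[]): Msg[] {
  return messages.map((m) => {
    if (m.role !== "assistant" || typeof m.content === "string") return m;
    return { ...m, content: m.content.filter((b) => b.type !== "thinking") };
  });
}
```

Note that an assistant message consisting only of thinking blocks ends up with an empty content array in this sketch; whether such a message should then be dropped entirely is left open here.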