fix: stream hang prevention, thinking block strip, response capture, web search interception by jsboige · Pull Request #137 · MadAppGang/claudish

jsboige · 2026-05-31T23:41:03Z

Summary

Multiple bug fixes and features accumulated on the fork, all backwards-compatible:

Bug fixes

fix(stream-parsers): prevent openai-sse stream hang (34bac1f) — finalize() referenced undefined opts.modelName, causing a ReferenceError before controller.close(). The error was swallowed, leaving the ReadableStream permanently open and the client hanging forever. Fixed by using the correct target parameter.
fix(composed-handler): strip thinking blocks from message history for non-native providers (e1c9753) — Opus thinking blocks (with Anthropic signatures) were forwarded verbatim to Z.AI/GLM/MiniMax, causing corruption or API rejection.
fix(anthropic-sse): suppress server_tool_use blocks and handle empty responses (0cb986a) — Z.AI sends server_tool_use blocks that the Anthropic client doesn't understand. Also emits synthetic message_start/message_stop when the provider returns an empty response.
fix(composed-handler): strip inline system messages for Anthropic-transport providers (032919c) — Claude Code v2.1.153+ injects role: "system" messages inline in the messages array, and Anthropic-compatible providers reject them.

Features

feat(stream-parsers): intercept WebSearch/WebFetch via SearXNG (24ec4da) — Replaces the "web search unsupported" warning with actual execution via a local SearXNG instance. Two interception paths: structured tool_call and GLM <searchWeb> tags.
feat(request-logger): add full-body capture mode for diagnostics (122b681) — Gated by the CLAUDISH_CAPTURE_DIR env var; a no-op when the variable is unset (the default).

Infrastructure

Response-side SSE capture (response-capture.ts) — diagnostic helper gated by CLAUDISH_CAPTURE_DIR, wired into the openai-sse and anthropic-sse stream parsers and into native-handler.
Docker: bump the base image to oven/bun:1.3-alpine; use the SearXNG DNS name instead of IP:port.

Testing

2000+ response captures during live testing: 0 with closed=false (all streams closed cleanly)
Tested 3 sub-agent types (code-explorer, Explore, general-purpose) under GLM — all completed
Tested Opus sub-agent with 35 tool_use blocks — completed in 8.8s
All existing tests pass (bun test)

🤖 Generated with Claude Code

…creds
- Proxy auth middleware validates authorization/x-api-key/x-proxy-key headers against proxyKey from config (no custom header needed)
- NativeHandler falls back to stored ANTHROPIC_API_KEY when client doesn't provide auth (cluster scenario)
- Custom endpoints registered at runtime pass credential checks
- loadConfig() preserves proxyKey field
- Proxy binds to configurable hostname (0.0.0.0 for Docker)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

OAuth tokens (sk-ant-oat01-) are rejected by Anthropic when sent as x-api-key. Detect the prefix and use authorization: Bearer instead. Also pass proxy key to NativeHandler so it can distinguish proxy auth from genuine client tokens. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
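A minimal sketch of the prefix check, assuming the token has already been extracted as a string; `buildAuthHeaders` is a hypothetical name, not the fork's actual function:

```typescript
// Prefix that identifies Anthropic OAuth tokens (taken from the commit message).
const OAUTH_PREFIX = "sk-ant-oat01-";

// Hypothetical helper: pick the upstream auth header for a stored credential.
function buildAuthHeaders(token: string): Record<string, string> {
  if (token.startsWith(OAUTH_PREFIX)) {
    // OAuth tokens are rejected as x-api-key, so send them as a Bearer token.
    return { authorization: `Bearer ${token}` };
  }
  // Regular API keys keep using the x-api-key header.
  return { "x-api-key": token };
}
```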

…tion Forward ALL incoming headers to api.anthropic.com instead of selectively forwarding only auth/beta headers. Claude Code sends internal headers that make Max subscription work for Opus/Sonnet — the proxy must not drop them. Proxy key override still works: when proxyKey is configured and matches client auth, it's replaced with the stored Anthropic key. When no proxyKey is set (pass-through mode), everything flows through unmodified. Same change applied to the count_tokens endpoint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
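The forwarding rule can be sketched as below. This assumes headers arrive as a lowercase-keyed record and that the proxy key may be presented in any of the three headers named in the proxy-auth commit; `forwardHeaders` is hypothetical, and the sketch ignores the OAuth Bearer case:

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical helper: forward every incoming header, swapping in the stored
// Anthropic key only when the client authenticated with the proxy key.
function forwardHeaders(incoming: HeaderMap, proxyKey: string | undefined, storedKey: string): HeaderMap {
  // Copy everything unchanged, including Claude Code's internal headers.
  const out: HeaderMap = { ...incoming };
  if (!proxyKey) return out; // pass-through mode: nothing is rewritten

  const bearer = incoming["authorization"]?.replace(/^Bearer\s+/i, "");
  const presented = incoming["x-api-key"] ?? incoming["x-proxy-key"] ?? bearer;

  if (presented === proxyKey) {
    // Client used the proxy key: replace it with the real upstream credential.
    delete out["authorization"];
    delete out["x-proxy-key"];
    out["x-api-key"] = storedKey;
  }
  return out;
}
```

A client presenting its own token (not the proxy key) keeps that token, which is what lets subscription credentials flow through unmodified.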

The standalone proxy now reads the default profile from ~/.claudish/config.json and passes its model mapping to createProxyServer(). This means:
- opus/sonnet/haiku role requests get remapped to the profile's models
- Profile "default" maps: opus→glm-5.1, sonnet→glm-4.7, haiku→glm-4.7-flash
- Without a profile mapping, all models pass through unchanged

Startup logs now show the active profile name and role mappings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
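A sketch of role-based remapping, assuming the role is detected by a substring match on the requested model id (the actual matching logic is not shown in this PR); `remapModel` and `defaultProfile` are illustrative names:

```typescript
type Role = "opus" | "sonnet" | "haiku";
type ModelMap = Partial<Record<Role, string>>;
const ROLES: readonly Role[] = ["opus", "sonnet", "haiku"];

// Hypothetical helper: map a requested model id to the profile's target model.
function remapModel(requested: string, map?: ModelMap): string {
  if (!map) return requested; // no profile mapping: pass through unchanged
  for (const role of ROLES) {
    const target = map[role];
    // Assumption: a model id containing "opus" etc. belongs to that role.
    if (target && requested.includes(role)) return target;
  }
  return requested; // unknown role: pass through unchanged
}

// The "default" profile mapping described above.
const defaultProfile: ModelMap = { opus: "glm-5.1", sonnet: "glm-4.7", haiku: "glm-4.7-flash" };
```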

…elMap
- Add /v1/models endpoint for Claude Code gateway discovery (CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1)
- Returns all models from config.json routing rules + custom endpoints
- Standalone proxy passes no modelMap — every model routes via config rules
- No role remapping: opus→claude-opus-4-7, sonnet→glm-5.1, haiku→qwen3.6

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ropic providers Claude Code injects `x-anthropic-billing-header: cc_version=...; cch=XXXXX;` into the system prompt body. The cch= token changes every request, which invalidates vLLM prefix caching (strict block hash). Strip this line from the prompt for all non-Anthropic handlers — only Anthropic uses it. Expected impact: vLLM TTFT from 30-67s → 1-3s on conversation turns 2+. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
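A sketch of the strip, assuming the billing header occupies its own line in the system prompt text; `stripBillingHeader` is a hypothetical helper:

```typescript
const BILLING_PREFIX = "x-anthropic-billing-header:";

// Hypothetical helper: drop the per-request billing line so the rest of the
// prompt stays byte-identical across turns (keeping prefix caches valid).
function stripBillingHeader(systemPrompt: string): string {
  return systemPrompt
    .split("\n")
    .filter((line) => !line.trimStart().startsWith(BILLING_PREFIX))
    .join("\n");
}
```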

- Dockerfile: Bun alpine, standalone proxy, healthcheck
- docker-compose.yml: volume mount for config, port 3000
- .dockerignore: exclude traces, node_modules, dist
- start-claudish-proxy.ps1: Windows service wrapper with logging

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Direct connections (no IIS ARR) had no x-forwarded-for/x-real-ip headers, so all LAN machines appeared as "direct" in logs. Capture remote address via Bun's server context and include in request logging. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…utocompact

All non-Anthropic stream parsers (openai-sse, gemini-sse, ollama-jsonl, openai-responses-sse) hardcoded input_tokens: 100 in message_start and omitted input_tokens from message_delta. This broke Claude Code's autocompact because tokenCountWithEstimation() accumulated usage from the SSE stream — seeing input_tokens=100 forever meant the context appeared nearly empty, so condensation never triggered.

Changes:
- message_start: input_tokens: 100 → 0 (placeholder, replaced by delta)
- message_delta: include actual input_tokens from provider response
- openai-sse fallback: remove hardcoded 100 in onTokenUpdate

Affected models: GLM-5.1, Qwen, Gemini, Ollama, Codex — all non-Anthropic providers using these parsers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
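The corrected event shapes might look like this, assuming the provider reports OpenAI-style `prompt_tokens`/`completion_tokens`; the helper names are hypothetical:

```typescript
// Assumed provider usage shape (OpenAI-style field names).
interface ProviderUsage { prompt_tokens?: number; completion_tokens?: number }

// message_start carries a zero placeholder instead of the old hardcoded 100.
function messageStart(id: string, model: string) {
  return {
    type: "message_start",
    message: {
      id, type: "message", role: "assistant", model, content: [],
      stop_reason: null, stop_sequence: null,
      usage: { input_tokens: 0, output_tokens: 0 },
    },
  };
}

// message_delta reports the provider's real counts, so a client that sums
// usage from the stream sees the true context size.
function messageDelta(stopReason: string, usage: ProviderUsage) {
  return {
    type: "message_delta",
    delta: { stop_reason: stopReason, stop_sequence: null },
    usage: {
      input_tokens: usage.prompt_tokens ?? 0,
      output_tokens: usage.completion_tokens ?? 0,
    },
  };
}
```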

Extract all fork-specific code (proxy auth, model discovery, billing header strip, request logger, hostname binding, standalone proxy) into dedicated modules under packages/cli/src/fork/. proxy-server.ts drops from 715 to 597 lines, with fork diff now ~30 lines (import + registration call). This makes future upstream syncs straightforward — fork code is visible in one directory and easily rebased. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…e errors When thinking blocks are suppressed, subsequent content block indices can jump ahead of what the client expects. Clamp indices to highestSeen+1 and log the adjustment for debugging. Also pass CLAUDISH_PROXY_KEY via docker-compose environment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
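One way to apply the clamp, assuming the clamped index is decided at `content_block_start` and then reused for that block's deltas and stop event (otherwise later deltas for the same block could be clamped to a different index); `IndexClamp` is a hypothetical class:

```typescript
// Hypothetical helper: keep content block indices contiguous for the client.
class IndexClamp {
  private highestSeen = -1;
  private mapping = new Map<number, number>();

  // On content_block_start: clamp to at most highestSeen + 1 and remember it.
  onStart(upstream: number): number {
    const clamped = Math.min(upstream, this.highestSeen + 1);
    if (clamped !== upstream) console.debug(`[index-clamp] ${upstream} -> ${clamped}`);
    this.mapping.set(upstream, clamped);
    this.highestSeen = Math.max(this.highestSeen, clamped);
    return clamped;
  }

  // On content_block_delta / content_block_stop: reuse the index chosen at start.
  lookup(upstream: number): number {
    return this.mapping.get(upstream) ?? upstream;
  }
}
```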

Some Anthropic-compat providers (z.ai, GLM) send message_delta with stop_reason but never emit the message_stop event. Claude Code requires message_stop as the terminal event — without it, the client reports "API returned an empty or malformed response (HTTP 200)". Track sawMessageStop separately from stopReason so the synthetic finalization fires even when stopReason was received but message_stop was not. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
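A sketch of the finalization decision with `sawMessageStop` tracked separately from `stopReason`; the state shape and function name are assumptions, and only the names of the events still to emit are returned:

```typescript
// Assumed per-stream state kept by the parser.
interface StreamState {
  stopReason: string | null; // set when a message_delta with stop_reason arrives
  sawMessageStop: boolean;   // set only when message_stop itself arrives
}

// Hypothetical helper: which synthetic terminal events still need sending.
function synthesizeTail(state: StreamState): string[] {
  if (state.sawMessageStop) return []; // provider already finished cleanly
  const events: string[] = [];
  // No stop_reason seen yet: the client also needs a message_delta.
  if (state.stopReason === null) events.push("message_delta");
  // message_stop is always the required terminal event.
  events.push("message_stop");
  return events;
}
```

Keying only on `stopReason` would skip finalization in exactly the case the commit describes: a stop_reason arrived but message_stop never did.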

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replace the "web search unsupported" warning with actual execution via a local SearXNG instance. Two interception paths:
1. Structured tool_call: WebSearch/WebFetch tool calls are suppressed from the stream and replaced with SearXNG results as a text block.
2. GLM <searchWeb> tags: GLM models emit <searchWeb><query>...</query> </searchWeb> in text — these are intercepted at finalize and replaced with SearXNG results.

Also intercepts sub-agent web search/fetch requests at the proxy level (Claude Code sends "Perform a web search for the query: X" as a single user message).

Additional improvements:
- ToolState.suppressed flag for clean tool call suppression
- Empty-response error handling: providers returning no content now emit a structured api_error event instead of a malformed empty message
- Finalize logging with model, text length, tool count
- Proxy logs tool names in incoming requests for debug

Requires SEARXNG_URL env var. Graceful fallback when unavailable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
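A sketch of the `<searchWeb>` path: pull the queries out of the finalized text and build SearXNG request URLs. It assumes the SearXNG instance has its JSON output format enabled; both function names are hypothetical:

```typescript
// Matches <searchWeb><query>...</query></searchWeb>, tolerating whitespace.
const SEARCH_WEB_RE = /<searchWeb>\s*<query>([\s\S]*?)<\/query>\s*<\/searchWeb>/g;

// Hypothetical helper: collect queries and remove the tags from the text.
function extractSearchWebQueries(text: string): { queries: string[]; remaining: string } {
  const queries: string[] = [];
  const remaining = text.replace(SEARCH_WEB_RE, (_match: string, query: string) => {
    queries.push(query.trim());
    return "";
  });
  return { queries, remaining: remaining.trim() };
}

// Hypothetical helper: SearXNG search URL requesting JSON results.
function searxngSearchUrl(baseUrl: string, query: string): string {
  const base = baseUrl.replace(/\/$/, "");
  return `${base}/search?q=${encodeURIComponent(query)}&format=json`;
}
```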

…responses Z.AI/GLM providers emit server_tool_use content blocks that the Anthropic SDK does not understand (it expects only text, tool_use, thinking). These blocks are now silently suppressed from the stream with index tracking. When the stream ends without a message_start (empty response from provider), emit a synthetic message sequence (message_start → text block with error message → message_stop) so the client SDK receives a well-formed response instead of a malformed empty one. Also includes minor whitespace reformatting for consistency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
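A sketch of suppression with index tracking, assuming each upstream block index is remapped to a contiguous downstream index when its `content_block_start` arrives; `ServerToolUseFilter` is hypothetical and does not cover the empty-response path:

```typescript
// Minimal view of an Anthropic SSE event payload.
interface BlockEvent { type: string; index?: number; content_block?: { type: string } }

// Hypothetical filter: drop server_tool_use blocks, renumber the rest.
class ServerToolUseFilter {
  private suppressed = new Set<number>();
  private remap = new Map<number, number>();
  private nextIndex = 0;

  // Returns the event to forward (with a rewritten index), or null to drop it.
  process(ev: BlockEvent): BlockEvent | null {
    if (ev.index === undefined) return ev; // message-level events pass through
    if (ev.type === "content_block_start") {
      if (ev.content_block?.type === "server_tool_use") {
        this.suppressed.add(ev.index);
        return null;
      }
      this.remap.set(ev.index, this.nextIndex++);
    }
    if (this.suppressed.has(ev.index)) return null; // deltas/stop of a dropped block
    const index = this.remap.get(ev.index);
    return index === undefined ? null : { ...ev, index };
  }
}
```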

…nsport providers

Claude Code v2.1.153+ injects role:"system" messages inline in the messages array (e.g. system-reminders). Anthropic-compatible providers (Z.AI, MiniMax, Kimi) reject role:"system" in messages — only "user"/"assistant" are accepted. The fix strips inline system messages from the payload and merges their content into the top-level system prompt field. Applied at two levels:
1. ComposedHandler: strips from requestPayload.messages for anthropic-sse transport
2. openai-messages converter: handles the OpenAI format path by merging into the system message at index 0

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
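A sketch of the ComposedHandler-side merge, assuming a string-valued top-level `system` field (the Anthropic API also accepts an array of text blocks there); the names are hypothetical:

```typescript
interface TextBlock { type: string; text?: string }
interface Message { role: "user" | "assistant" | "system"; content: string | TextBlock[] }
interface Payload { system?: string; messages: Message[] }

// Flatten string or block content into plain text.
function contentToText(content: Message["content"]): string {
  return typeof content === "string" ? content : content.map((b) => b.text ?? "").join("");
}

// Hypothetical helper: move inline role:"system" messages into `system`.
function hoistInlineSystem(payload: Payload): Payload {
  const inline = payload.messages
    .filter((m) => m.role === "system")
    .map((m) => contentToText(m.content));
  if (inline.length === 0) return payload; // nothing to do

  return {
    ...payload,
    // Append inline system text after any existing top-level prompt.
    system: [payload.system ?? "", ...inline].filter((s) => s.length > 0).join("\n\n"),
    // Only user/assistant messages remain, which these providers accept.
    messages: payload.messages.filter((m) => m.role !== "system"),
  };
}
```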

Add CLAUDISH_CAPTURE_DIR env-gated diagnostic capture that writes the full request body to JSON files. Disabled by default (no-op when env var is unset), so zero overhead in normal production use. Purpose: when a hang or malformed response is reported, enable the env var to capture the exact request payload for offline replay against different bun versions or configurations. Files are written to the capture dir with pattern: req-{pid}-{seq}-{timestamp}-{source}.json Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
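A sketch of the env gate and file-name pattern. It takes the directory, pid, and sequence number as parameters so it stays pure (in the proxy these would presumably come from `process.env.CLAUDISH_CAPTURE_DIR`, `process.pid`, and a counter), and the timestamp format is an assumption; `captureFileName` is hypothetical:

```typescript
// Hypothetical helper: build req-{pid}-{seq}-{timestamp}-{source}.json,
// or return null when capture is disabled (no directory configured).
function captureFileName(
  dir: string | undefined,
  pid: number,
  seq: number,
  source: string,
  now: Date = new Date(),
): string | null {
  if (!dir) return null; // env var unset: capture is a no-op
  // Make the ISO timestamp filesystem-safe (":" and "." replaced by "-").
  const timestamp = now.toISOString().replace(/[:.]/g, "-");
  return `${dir}/req-${pid}-${seq}-${timestamp}-${source}.json`;
}
```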

…rtifacts
- Bump Dockerfile base image from oven/bun:1.2-alpine to 1.3-alpine (pending Docker Hub egress fix for rebuild validation)
- Switch SEARXNG_URL from IP:port to search.myia.io DNS name
- Add .gitignore entries for diagnostic capture/, trace files, deploy notes, and monitoring scripts

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…andling, and diagnostic capture

Document three new subsystems:
- Web Search Interception (v7.1+): SearXNG backend, two interception paths (tool_call suppression + GLM searchWeb tags), sub-agent handling
- Inline System Message Handling: strip role:system for Anthropic-transport providers (Claude Code v2.1.153+ compatibility)
- Diagnostic Body Capture: CLAUDISH_CAPTURE_DIR env-gated full request body capture for offline hang reproduction

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…de capture

The finalize() function in openai-sse.ts referenced opts.modelName, which does not exist in that scope (the parameter is target). This caused a ReferenceError thrown AFTER state.finalized=true but BEFORE controller.close(). The catch block re-invoked finalize(), which returned immediately (already finalized), swallowing the error and leaving the ReadableStream permanently open; the client HTTP connection hung forever.

Fix: replace opts.modelName with target (lines 160-161, 350).

Also adds response-side SSE capture infrastructure (gated by CLAUDISH_CAPTURE_DIR env var, no-op when unset):
- response-capture.ts: new diagnostic helper
- Wired into openai-sse, anthropic-sse, and native-handler
- request-logger: add machine= tag from x-claudish-machine header

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
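Beyond the one-line fix, the general guard against this class of hang is to make closing the stream unconditional once finalization starts. A minimal sketch with a stand-in controller interface (the real code uses a `ReadableStream` controller; `makeFinalizer` is hypothetical):

```typescript
// Stand-in for the parts of a stream controller the sketch needs.
interface Controller { enqueue(chunk: string): void; close(): void }

// Hypothetical helper: an idempotent finalize() that always closes the stream.
function makeFinalizer(controller: Controller, target: string): () => void {
  let finalized = false;
  return function finalize(): void {
    if (finalized) return; // second call is a no-op, as in openai-sse.ts
    finalized = true;
    try {
      controller.enqueue('event: message_stop\ndata: {"type":"message_stop"}\n\n');
      // The buggy version referenced opts.modelName here; `target` is in scope.
      console.debug(`[finalize] model=${target}`);
    } finally {
      // Runs even if enqueue or logging throws, so the client never hangs.
      controller.close();
    }
  };
}
```

With `try/finally`, an exception in the body still propagates (so it can be logged) but can no longer leave the stream open.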

… non-native providers ComposedHandler is never used for native Anthropic (that's NativeHandler), so every request here goes to a provider that doesn't understand Anthropic thinking signatures. Previously, Opus thinking blocks (with signatures) flowed through to Z.AI/GLM/MiniMax/etc. unmodified, causing either silent corruption or API rejection on subsequent turns. The strip removes thinking blocks (type: 'thinking') from assistant messages in the request payload before forwarding to the provider. Native Anthropic requests via NativeHandler are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
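A sketch of the strip, assuming Anthropic-style messages whose `content` is either a string or an array of typed blocks; `stripThinkingBlocks` is hypothetical:

```typescript
interface ContentBlock { type: string; [key: string]: unknown }
interface Msg { role: string; content: string | ContentBlock[] }

// Hypothetical helper: remove type "thinking" blocks from assistant turns
// before the payload goes to a non-Anthropic provider.
function stripThinkingBlocks(messages: Msg[]): Msg[] {
  return messages.map((m) => {
    // User messages and plain-string content are left untouched.
    if (m.role !== "assistant" || typeof m.content === "string") return m;
    return { ...m, content: m.content.filter((b) => b.type !== "thinking") };
  });
}
```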

jsboigeEpita and others added 21 commits May 14, 2026 01:09

fix(docker): use correct SearXNG URL (port 8181 on ai-01)

dfc1eba

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jsboige mentioned this pull request Jun 2, 2026: fix(native-handler): strip unsigned thinking blocks to fix mixed-session 400 #140 (Open)

jsboige wants to merge 21 commits into MadAppGang:main from jsboige:main