fix: stream hang prevention, thinking block strip, response capture, web search interception #137
Open
jsboige wants to merge 21 commits into
…creds

- Proxy auth middleware validates authorization/x-api-key/x-proxy-key headers against proxyKey from config (no custom header needed)
- NativeHandler falls back to stored ANTHROPIC_API_KEY when client doesn't provide auth (cluster scenario)
- Custom endpoints registered at runtime pass credential checks
- loadConfig() preserves proxyKey field
- Proxy binds to configurable hostname (0.0.0.0 for Docker)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OAuth tokens (sk-ant-oat01-) are rejected by Anthropic when sent as x-api-key. Detect the prefix and use authorization: Bearer instead. Also pass proxy key to NativeHandler so it can distinguish proxy auth from genuine client tokens. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
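The prefix check described in this commit could look roughly like the sketch below. The helper name `buildAuthHeaders` and the constant name are hypothetical and not taken from the PR; only the `sk-ant-oat01-` prefix and the two header names come from the commit message.

```typescript
// Hypothetical helper illustrating the header choice; not the PR's actual code.
const OAUTH_PREFIX = "sk-ant-oat01-";

function buildAuthHeaders(token: string): Record<string, string> {
  if (token.startsWith(OAUTH_PREFIX)) {
    // OAuth tokens go in a Bearer authorization header.
    return { authorization: `Bearer ${token}` };
  }
  // Any other key is sent as x-api-key.
  return { "x-api-key": token };
}
```

Keeping the decision in one small function means the messages and count_tokens paths can share it.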
…tion Forward ALL incoming headers to api.anthropic.com instead of selectively forwarding only auth/beta headers. Claude Code sends internal headers that make Max subscription work for Opus/Sonnet — the proxy must not drop them. Proxy key override still works: when proxyKey is configured and matches client auth, it's replaced with the stored Anthropic key. When no proxyKey is set (pass-through mode), everything flows through unmodified. Same change applied to the count_tokens endpoint. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The standalone proxy now reads the default profile from ~/.claudish/config.json and passes its model mapping to createProxyServer(). This means:

- opus/sonnet/haiku role requests get remapped to the profile's models
- Profile "default" maps: opus→glm-5.1, sonnet→glm-4.7, haiku→glm-4.7-flash
- Without a profile mapping, all models pass through unchanged

Startup logs now show the active profile name and role mappings.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…elMap

- Add /v1/models endpoint for Claude Code gateway discovery (CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1)
- Returns all models from config.json routing rules + custom endpoints
- Standalone proxy passes no modelMap: every model routes via config rules
- No role remapping: opus→claude-opus-4-7, sonnet→glm-5.1, haiku→qwen3.6

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ropic providers Claude Code injects `x-anthropic-billing-header: cc_version=...; cch=XXXXX;` into the system prompt body. The cch= token changes every request, which invalidates vLLM prefix caching (strict block hash). Strip this line from the prompt for all non-Anthropic handlers — only Anthropic uses it. Expected impact: vLLM TTFT from 30-67s → 1-3s on conversation turns 2+. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
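A minimal sketch of the strip, assuming the billing header occupies its own line in the system prompt text. The function name `stripBillingHeader` is hypothetical; the PR's actual matching logic is not shown here.

```typescript
// Hypothetical helper: drops any line that starts with the billing header
// so the remaining prompt prefix is byte-identical across requests.
function stripBillingHeader(systemText: string): string {
  return systemText
    .split("\n")
    .filter((line) => !line.trimStart().startsWith("x-anthropic-billing-header:"))
    .join("\n");
}
```

With the per-request `cch=` token gone, two consecutive turns share the same prompt prefix, which is what block-hash prefix caching needs.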
- Dockerfile: Bun alpine, standalone proxy, healthcheck
- docker-compose.yml: volume mount for config, port 3000
- .dockerignore: exclude traces, node_modules, dist
- start-claudish-proxy.ps1: Windows service wrapper with logging

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Direct connections (no IIS ARR) had no x-forwarded-for/x-real-ip headers, so all LAN machines appeared as "direct" in logs. Capture remote address via Bun's server context and include in request logging. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…utocompact

All non-Anthropic stream parsers (openai-sse, gemini-sse, ollama-jsonl, openai-responses-sse) hardcoded input_tokens: 100 in message_start and omitted input_tokens from message_delta. This broke Claude Code's autocompact because tokenCountWithEstimation() accumulated usage from the SSE stream: seeing input_tokens=100 forever meant the context appeared nearly empty, so condensation never triggered.

Changes:
- message_start: input_tokens: 100 → 0 (placeholder, replaced by delta)
- message_delta: include actual input_tokens from provider response
- openai-sse fallback: remove hardcoded 100 in onTokenUpdate

Affected models: GLM-5.1, Qwen, Gemini, Ollama, Codex (all non-Anthropic providers using these parsers).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
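The usage change can be illustrated with two small builders. This is a sketch only: the function names are hypothetical, and it assumes the provider reports OpenAI-style `prompt_tokens` / `completion_tokens` counters, which will differ for other parsers.

```typescript
// Hypothetical shapes; the real parsers may carry more fields.
interface ProviderUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
}

interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
}

// message_start: placeholder of 0 instead of the old hardcoded 100.
function messageStartUsage(): AnthropicUsage {
  return { input_tokens: 0, output_tokens: 0 };
}

// message_delta: carry the provider's real counts once they are known.
function messageDeltaUsage(usage: ProviderUsage): AnthropicUsage {
  return {
    input_tokens: usage.prompt_tokens ?? 0,
    output_tokens: usage.completion_tokens ?? 0,
  };
}
```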
Extract all fork-specific code (proxy auth, model discovery, billing header strip, request logger, hostname binding, standalone proxy) into dedicated modules under packages/cli/src/fork/. proxy-server.ts drops from 715 to 597 lines, with fork diff now ~30 lines (import + registration call). This makes future upstream syncs straightforward — fork code is visible in one directory and easily rebased. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e errors When thinking blocks are suppressed, subsequent content block indices can jump ahead of what the client expects. Clamp indices to highestSeen+1 and log the adjustment for debugging. Also pass CLAUDISH_PROXY_KEY via docker-compose environment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
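One way to realize the clamp is sketched below. The class name and the memo map are assumptions added for illustration: the commit only states that indices are clamped to highestSeen+1. The map is there so that later delta and stop events for the same upstream block resolve to the same client-facing index.

```typescript
// Hypothetical sketch of index clamping after suppressed blocks.
class BlockIndexClamper {
  private highestSeen = -1;
  private readonly assigned = new Map<number, number>();

  // Returns the index to send to the client for a given upstream index.
  resolve(upstreamIndex: number): number {
    const known = this.assigned.get(upstreamIndex);
    if (known !== undefined) return known;
    // Never jump more than one past the highest index already emitted.
    const clamped = Math.min(upstreamIndex, this.highestSeen + 1);
    this.assigned.set(upstreamIndex, clamped);
    this.highestSeen = Math.max(this.highestSeen, clamped);
    return clamped;
  }
}
```

If upstream block 1 was a suppressed thinking block, upstream block 2 is emitted to the client as index 1, and its deltas follow it there.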
Some Anthropic-compat providers (z.ai, GLM) send message_delta with stop_reason but never emit the message_stop event. Claude Code requires message_stop as the terminal event — without it, the client reports "API returned an empty or malformed response (HTTP 200)". Track sawMessageStop separately from stopReason so the synthetic finalization fires even when stopReason was received but message_stop was not. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
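The separate tracking can be sketched as follows. The type and function names are hypothetical, and the `end_turn` default used when no stop reason arrived is an assumption, not something the commit specifies.

```typescript
// Hypothetical stream state; the real parser tracks more than this.
interface StreamState {
  stopReason: string | null;
  sawMessageStop: boolean;
}

function sse(event: string, data: object): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Record what the provider actually sent.
function observe(state: StreamState, eventType: string, stopReason?: string): void {
  if (eventType === "message_delta" && stopReason) state.stopReason = stopReason;
  if (eventType === "message_stop") state.sawMessageStop = true;
}

// Events to append when the upstream stream ends.
function syntheticTail(state: StreamState): string[] {
  if (state.sawMessageStop) return [];
  const events: string[] = [];
  if (state.stopReason === null) {
    // Assumed default when the provider never reported a stop reason.
    events.push(
      sse("message_delta", {
        type: "message_delta",
        delta: { stop_reason: "end_turn", stop_sequence: null },
        usage: { output_tokens: 0 },
      }),
    );
  }
  events.push(sse("message_stop", { type: "message_stop" }));
  return events;
}
```

The key point is that the decision hinges on `sawMessageStop` alone, so a stream that delivered `stop_reason` but no terminal event still gets its `message_stop`.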
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the "web search unsupported" warning with actual execution via a local SearXNG instance. Two interception paths:

1. Structured tool_call: WebSearch/WebFetch tool calls are suppressed from the stream and replaced with SearXNG results as a text block.
2. GLM <searchWeb> tags: GLM models emit <searchWeb><query>...</query></searchWeb> in text. These are intercepted at finalize and replaced with SearXNG results.

Also intercepts sub-agent web search/fetch requests at the proxy level (Claude Code sends "Perform a web search for the query: X" as a single user message).

Additional improvements:
- ToolState.suppressed flag for clean tool call suppression
- Empty-response error handling: providers returning no content now emit a structured api_error event instead of a malformed empty message
- Finalize logging with model, text length, tool count
- Proxy logs tool names in incoming requests for debug

Requires SEARXNG_URL env var. Graceful fallback when unavailable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
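For the second path, extracting the queries from accumulated text might look like the sketch below. The function name and the regular expression are assumptions for illustration; the actual interception code and the SearXNG call are not reproduced here.

```typescript
// Hypothetical extractor for GLM-style <searchWeb> tags in model text.
const SEARCH_TAG = /<searchWeb>\s*<query>([\s\S]*?)<\/query>\s*<\/searchWeb>/g;

function extractSearchQueries(text: string): string[] {
  // matchAll works on a clone of the regex, so no lastIndex state leaks between calls.
  return Array.from(text.matchAll(SEARCH_TAG), (match) => (match[1] ?? "").trim());
}
```

Each extracted query would then be sent to the search backend and the tag replaced with the formatted results at finalize time.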
…responses Z.AI/GLM providers emit server_tool_use content blocks that the Anthropic SDK does not understand (it expects only text, tool_use, thinking). These blocks are now silently suppressed from the stream with index tracking. When the stream ends without a message_start (empty response from provider), emit a synthetic message sequence (message_start → text block with error message → message_stop) so the client SDK receives a well-formed response instead of a malformed empty one. Also includes minor whitespace reformatting for consistency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nsport providers

Claude Code v2.1.153+ injects role:"system" messages inline in the messages array (e.g. system-reminders). Anthropic-compatible providers (Z.AI, MiniMax, Kimi) reject role:"system" in messages; only "user"/"assistant" are accepted.

The fix strips inline system messages from the payload and merges their content into the top-level system prompt field. Applied at two levels:

1. ComposedHandler: strips from requestPayload.messages for anthropic-sse transport
2. openai-messages converter: handles the OpenAI format path by merging into the system message at index 0

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
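A simplified sketch of the strip-and-merge step. It assumes string-only message content and a string `system` field, and the function name `hoistInlineSystem` is hypothetical; real payloads may carry content as arrays of blocks, which this sketch does not handle.

```typescript
// Hypothetical, simplified payload shapes.
interface Message {
  role: string;
  content: string;
}

interface Payload {
  system?: string;
  messages: Message[];
}

function hoistInlineSystem(payload: Payload): Payload {
  const inline = payload.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content);
  if (inline.length === 0) return payload;

  // Append inline system text after any existing top-level system prompt.
  const parts = payload.system ? [payload.system, ...inline] : inline;
  return {
    ...payload,
    system: parts.join("\n\n"),
    messages: payload.messages.filter((m) => m.role !== "system"),
  };
}
```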
Add CLAUDISH_CAPTURE_DIR env-gated diagnostic capture that writes the
full request body to JSON files. Disabled by default (no-op when env
var is unset), so zero overhead in normal production use.
Purpose: when a hang or malformed response is reported, enable the env
var to capture the exact request payload for offline replay against
different bun versions or configurations.
Files are written to the capture dir with pattern:
req-{pid}-{seq}-{timestamp}-{source}.json
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
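The env gate and file naming could be sketched like this. The function and type names are hypothetical, the timestamp is assumed to be epoch milliseconds, and the path is joined with a plain `/` for brevity; only the `CLAUDISH_CAPTURE_DIR` variable and the `req-{pid}-{seq}-{timestamp}-{source}.json` pattern come from the commit message.

```typescript
// Hypothetical helper that decides where (and whether) to write a capture.
interface CaptureId {
  pid: number;
  seq: number;
  timestamp: number; // assumed: epoch milliseconds
  source: string;
}

function capturePath(
  env: Record<string, string | undefined>,
  id: CaptureId,
): string | null {
  const dir = env["CLAUDISH_CAPTURE_DIR"];
  // Unset or empty means capture is disabled: callers skip all I/O.
  if (!dir) return null;
  return `${dir}/req-${id.pid}-${id.seq}-${id.timestamp}-${id.source}.json`;
}
```

Returning `null` early keeps the disabled path free of serialization and file system work.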
…rtifacts

- Bump Dockerfile base image from oven/bun:1.2-alpine to 1.3-alpine (pending Docker Hub egress fix for rebuild validation)
- Switch SEARXNG_URL from IP:port to search.myia.io DNS name
- Add .gitignore entries for diagnostic capture/, trace files, deploy notes, and monitoring scripts

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…andling, and diagnostic capture

Document three new subsystems:

- Web Search Interception (v7.1+): SearXNG backend, two interception paths (tool_call suppression + GLM searchWeb tags), sub-agent handling
- Inline System Message Handling: strip role:system for Anthropic-transport providers (Claude Code v2.1.153+ compatibility)
- Diagnostic Body Capture: CLAUDISH_CAPTURE_DIR env-gated full request body capture for offline hang reproduction

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…de capture

The finalize() function in openai-sse.ts referenced opts.modelName which does not exist in that scope (the parameter is target). This caused a ReferenceError thrown AFTER state.finalized=true but BEFORE controller.close(). The catch block re-invoked finalize() which returned immediately (already finalized), swallowing the error and leaving the ReadableStream permanently open, so the client HTTP connection hung forever.

Fix: replace opts.modelName with target (lines 160-161, 350).

Also adds response-side SSE capture infrastructure (gated by CLAUDISH_CAPTURE_DIR env var, no-op when unset):
- response-capture.ts: new diagnostic helper
- Wired into openai-sse, anthropic-sse, and native-handler
- request-logger: add machine= tag from x-claudish-machine header

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
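The failure mode generalizes: any exception between setting the finalized flag and closing the controller leaves the stream open. The sketch below is not the fix in this commit (which only replaces the wrong identifier); it shows a complementary defensive pattern using try/finally, with hypothetical names and a minimal `Closable` interface standing in for the stream controller.

```typescript
// Hypothetical stand-ins for the parser state and stream controller.
interface Closable {
  close(): void;
}

interface FinalizeState {
  finalized: boolean;
}

function finalize(
  state: FinalizeState,
  controller: Closable,
  emitSummary: () => void,
): void {
  if (state.finalized) return; // idempotent: a second call is a no-op
  state.finalized = true;
  try {
    emitSummary(); // logging or final events; may throw
  } finally {
    // Runs even if emitSummary throws, so the stream cannot stay open.
    controller.close();
  }
}
```

With this shape, a bug in the summary step still surfaces as an error, but the client connection is released instead of hanging.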
… non-native providers ComposedHandler is never used for native Anthropic (that's NativeHandler), so every request here goes to a provider that doesn't understand Anthropic thinking signatures. Previously, Opus thinking blocks (with signatures) flowed through to Z.AI/GLM/MiniMax/etc. unmodified, causing either silent corruption or API rejection on subsequent turns. The strip removes thinking blocks (type: 'thinking') from assistant messages in the request payload before forwarding to the provider. Native Anthropic requests via NativeHandler are unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
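A minimal sketch of the strip, assuming the Anthropic message shape where assistant content is either a string or an array of typed blocks. The type and function names are hypothetical, and this sketch does not decide what to do if an assistant message ends up with no blocks.

```typescript
// Hypothetical, simplified message shapes.
type ContentBlock = { type: string; [key: string]: unknown };

interface ChatMessage {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

function stripThinkingBlocks(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((msg) => {
    // Only assistant messages with block arrays can carry thinking blocks.
    if (msg.role !== "assistant" || typeof msg.content === "string") return msg;
    return {
      ...msg,
      content: msg.content.filter((block) => block.type !== "thinking"),
    };
  });
}
```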
Summary
Multiple bug fixes and features accumulated on the fork, all backwards-compatible:
Bug fixes
- `finalize()` referenced undefined `opts.modelName`, causing a `ReferenceError` before `controller.close()`. The error was swallowed, leaving the ReadableStream permanently open and the client hanging forever. Fixed by using the correct `target` parameter.
- Suppresses `server_tool_use` blocks that the Anthropic client doesn't understand. Also emits synthetic `message_start`/`message_stop` when the provider returns an empty response.
- Strips messages sent with `role: "system"` inline and merges their content into the top-level system prompt. Anthropic-compat providers reject these.

Features
- Web search interception via SearXNG, covering structured tool calls and GLM `<searchWeb>` tags.
- Request body capture gated by the `CLAUDISH_CAPTURE_DIR` env var. No-op in production.

Infrastructure
- Response-side SSE capture (`response-capture.ts`): diagnostic helper gated by `CLAUDISH_CAPTURE_DIR`, wired into all three stream parsers

Testing
- … `closed=false` (all streams close cleanly)
- … `bun test`)

🤖 Generated with Claude Code