fix: strip x-anthropic-billing-header to fix vLLM prefix caching by mtparet · Pull Request #101 · MadAppGang/claudish

mtparet · 2026-04-07T15:23:50Z

Summary

Strip x-anthropic-billing-header from system prompts in filterIdentity to fix vLLM prefix caching
Add unit tests for the identity filter

Problem

Claude Code injects x-anthropic-billing-header: cc_version=...; cch=<hash> into the system prompt on every turn. The cch (conversation context hash) changes per turn, so the system prompt tokens differ between requests. This breaks vLLM's prefix cache for the entire ~31k-token prompt, reducing cache hits from 99% to 0.15% (only 48 tokens cached).

Root Cause Analysis

Sent 4 identical "test" messages via claudish → MiniMax M2.5 on vLLM
VictoriaMetrics showed prefix cache metrics were healthy (~50% cluster-wide)
But per-request cached_tokens was stuck at 48 out of ~31k
Dumped full request payloads and diffed between turns
Found the only difference: cch=8ae40 → cch=4e20a → cch=ad80e in the system prompt
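The diffing step can be approximated with a shared-prefix check. This is a minimal sketch, not the tooling actually used for the analysis: the prompt text, `commonPrefixLength`, and the variable names are hypothetical, and only the cch values are taken from the dumps above. It compares characters, whereas vLLM caches on token blocks, so the result only approximates where the cache stops matching.

```typescript
// Length of the shared leading run of two strings, in UTF-16 code units.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// Hypothetical system prompts; only the cch values come from the PR.
const preamble = "You are a coding assistant.\n";
const headerStart = "x-anthropic-billing-header: cc_version=...; cch=";
const body = "...long, unchanged instructions...\n";

const turn1 = `${preamble}${headerStart}8ae40\n${body}`;
const turn2 = `${preamble}${headerStart}4e20a\n${body}`;

// Everything after the first differing character cannot be reused as a prefix,
// even though `body` is identical in both turns.
const shared = commonPrefixLength(turn1, turn2);
console.log(`shared prefix: ${shared} of ${turn1.length} characters`);
```

Because the header sits near the start of the system prompt, the reusable prefix ends at the hash and the rest of the prompt is recomputed on every turn.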

Fix

Added .replace(/x-anthropic-billing-header:[^\n]*\n?/g, "") as the first filter in filterIdentity(). This is safe because filterIdentity only runs for non-Anthropic providers (OpenAI-compat, Gemini, LiteLLM, OpenRouter, local) — never for native Anthropic passthrough where the header is meaningful.
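The effect of that filter step can be shown in isolation. This is a sketch under stated assumptions: the regex is the one quoted above, but `stripBillingHeader` and the sample prompts are hypothetical stand-ins, not the real body of filterIdentity().

```typescript
// Regex quoted from the PR description: matches the header line up to and
// including its trailing newline, if one is present.
const BILLING_HEADER = /x-anthropic-billing-header:[^\n]*\n?/g;

// Hypothetical stand-in for the first step of filterIdentity().
function stripBillingHeader(systemPrompt: string): string {
  return systemPrompt.replace(BILLING_HEADER, "");
}

// Hypothetical prompts that differ only in the cch value.
const promptA = "intro\nx-anthropic-billing-header: cc_version=...; cch=8ae40\nrest of prompt\n";
const promptB = "intro\nx-anthropic-billing-header: cc_version=...; cch=4e20a\nrest of prompt\n";

console.log(stripBillingHeader(promptA) === stripBillingHeader(promptB)); // true
console.log(JSON.stringify(stripBillingHeader(promptA))); // "intro\nrest of prompt\n"
```

Once the header line is removed, the two prompts are byte-identical, which is what a prefix cache needs to reuse the earlier computation.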

Validation

Before fix: cached_tokens = 48 / 31k on every turn (0.15%)
After fix: cached_tokens = 31,728 / 31,749 by 4th turn (99.9%)

Turn	Before Fix	After Fix
1st	48 (0.15%)	32 (cold)
2nd	48 (0.15%)	24,864 (79%)
3rd	48 (0.15%)	31,296 (99%)
4th	48 (0.15%)	31,728 (99.9%)
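The percentages follow from dividing cached tokens by prompt tokens. A minimal sketch of that arithmetic, where `cacheHitPercent` is a hypothetical helper and using 31,749 as the denominator for the "before" case is an assumption (the PR only states ~31k there):

```typescript
// Share of prompt tokens served from the prefix cache, as a percentage
// rounded to two decimal places.
function cacheHitPercent(cachedTokens: number, promptTokens: number): number {
  if (promptTokens <= 0) return 0; // avoid dividing by zero on empty prompts
  return Math.round((cachedTokens / promptTokens) * 10000) / 100;
}

console.log(cacheHitPercent(31728, 31749)); // 99.93 (4th turn after the fix)
console.log(cacheHitPercent(48, 31749)); // 0.15 (every turn before the fix)
```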

Test plan

bun test identity-filter.test.ts — 9 tests pass
bun run build — compiles clean
Manual: run claudish with --debug, send "test" twice, verify cached_tokens grows

🤖 Generated with Claude Code

Claude Code injects `x-anthropic-billing-header: ...cch=<hash>` into the system prompt, where the cch value changes every turn. This breaks vLLM prefix caching since the system prompt tokens differ between requests, reducing cache hits from 99% to ~0.15% (48 tokens out of ~31k).

Strip this header in filterIdentity (which only runs for non-Anthropic providers) to keep system prompts stable across turns. Validated to restore prefix cache hit rates to 99.9%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mtparet · 2026-04-07T15:26:16Z

cc @erudenko: is the CI failure expected?

mtparet mentioned this pull request May 19, 2026

fix(proxy): strip Claude Code billing header from prompt for non-Anthropic providers #126

Open

mtparet wants to merge 1 commit into MadAppGang:main from blackfuel-ai:fix/strip-billing-header-for-prefix-caching
