feat(claude): configurable prompt cache TTL and automatic image compression #2533
gitrvc-hub wants to merge 3 commits into router-for-me:main
Conversation
Add two improvements to the Claude executor that reduce cost and prevent hard API errors for clients that don't handle these themselves.

**Configurable prompt cache TTL**

The proxy already auto-injects `cache_control` breakpoints when the client sends none. The TTL was hardcoded to the default 5-minute lifetime. This change adds a `prompt-cache-ttl` option (accepted values: `""` for the default 5 min, or `"1h"` for the extended lifetime) at two levels:

- Global: `prompt-cache-ttl` in the root config applies to all Claude requests, including OAuth credentials.
- Per-credential: `prompt-cache-ttl` under `claude-api-key` entries overrides the global value for that key.

The `prompt-caching-scope-2026-01-05` beta required for the 1h TTL is already included in the default `Anthropic-Beta` header.

**Automatic image compression**

Claude enforces a 5 MB per-image limit and rejects images wider or taller than 8000px. Clients that pass through screenshots or high-res attachments hit hard 400 errors with no automatic recovery. The executor now compresses base64-encoded images in the messages payload before sending upstream. Images within limits are untouched. Oversized ones are re-encoded as JPEG at decreasing quality levels (85→70→50→30) until they fit. No new dependencies: only Go stdlib image packages are used.

Both features include unit tests.
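As a rough illustration of the TTL option described above, the injected `cache_control` object could be built as follows. This is a minimal sketch: the PR names a `buildCacheControl` helper, but its real signature is not shown here, so the one below is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildCacheControl is a hypothetical stand-in for the helper named in the PR.
// An empty ttl omits the field, so the upstream default (5 minutes) applies;
// a non-empty ttl such as "1h" is passed through as-is.
func buildCacheControl(ttl string) map[string]string {
	cc := map[string]string{"type": "ephemeral"}
	if ttl != "" {
		cc["ttl"] = ttl
	}
	return cc
}

func main() {
	for _, ttl := range []string{"", "1h"} {
		out, err := json.Marshal(buildCacheControl(ttl))
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

Leaving the `ttl` field out for the empty value, instead of sending an explicit 5-minute value, keeps the default request shape identical to what the proxy sent before this change.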
Code Review
This pull request introduces configurable prompt cache TTLs and automatic image compression for Claude API requests. It adds a prompt-cache-ttl setting to global and per-credential configurations, enabling extended one-hour cache lifetimes. Additionally, the proxy now automatically compresses images exceeding Anthropic's size or dimension limits. Review feedback suggests validating the new configuration values, optimizing JSON manipulation and image processing performance, and refining the dimension checks used during image downscaling.
    // one-hour cache lifetime supported by the prompt-caching-scope-2026-01-05 beta.
    // This only affects breakpoints that the proxy injects; client-supplied cache_control
    // blocks are left untouched.
    PromptCacheTTL string `yaml:"prompt-cache-ttl,omitempty" json:"prompt-cache-ttl,omitempty"`
dataPath := fmt.Sprintf("messages.%d.content.%d.source.data", msgIdx, contentIdx)
typePath := fmt.Sprintf("messages.%d.content.%d.source.media_type", msgIdx, contentIdx)
raw := block.Get("source.data").String()
mediaType := block.Get("source.media_type").String()

compressed, newType, ok := maybeCompressImage(raw, mediaType)
if ok {
    payload, _ = sjson.SetBytes(payload, dataPath, compressed)
    payload, _ = sjson.SetBytes(payload, typePath, newType)
}
Repeatedly calling sjson.SetBytes on the same large payload is inefficient as each call involves re-parsing the JSON and allocating a new buffer. Since the source object for a base64 image in Claude's API only contains type, media_type, and data, you can update the entire object in a single sjson.SetBytes call to improve performance, especially for payloads with multiple images.
sourcePath := fmt.Sprintf("messages.%d.content.%d.source", msgIdx, contentIdx)
raw := block.Get("source.data").String()
mediaType := block.Get("source.media_type").String()
compressed, newType, ok := maybeCompressImage(raw, mediaType)
if ok {
    payload, _ = sjson.SetBytes(payload, sourcePath, map[string]string{
        "type":       "base64",
        "media_type": newType,
        "data":       compressed,
    })
}

// or dimension limits. Returns the (possibly compressed) base64 data, the resulting
// media type, and true if any changes were made. Returns ("", "", false) on any error
// or when no compression is needed.
func maybeCompressImage(b64data, mediaType string) (string, string, bool) {

    }

    // Further downscale for very large files to help compression.
    if needsCompress && img.Bounds().Dx() > 2048 {
The heuristic for further downscaling only checks the image width (Dx()). For very tall but narrow images (e.g., long screenshots), this check will be bypassed even if the file size is still significantly over the limit. It is safer to check both dimensions.
- if needsCompress && img.Bounds().Dx() > 2048 {
+ if needsCompress && (img.Bounds().Dx() > 2048 || img.Bounds().Dy() > 2048) {
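To make the both-dimensions point concrete, here is a small sketch of an aspect-preserving fit computation. The PR's tests mention a `scaleToFit` helper, but its signature is not shown, so the functions and the `maxSide` parameter below are assumptions for illustration only.

```go
package main

import "fmt"

// needsDownscale reports whether either side exceeds maxSide.
// Checking only the width would miss tall, narrow screenshots.
func needsDownscale(w, h, maxSide int) bool {
	return w > maxSide || h > maxSide
}

// scaleToFit (hypothetical signature) returns dimensions that fit inside a
// maxSide x maxSide box while preserving the aspect ratio. Sides are clamped
// to at least 1 pixel so extreme ratios never yield an empty image.
func scaleToFit(w, h, maxSide int) (int, int) {
	if !needsDownscale(w, h, maxSide) {
		return w, h
	}
	if w >= h {
		nh := h * maxSide / w
		if nh < 1 {
			nh = 1
		}
		return maxSide, nh
	}
	nw := w * maxSide / h
	if nw < 1 {
		nw = 1
	}
	return nw, maxSide
}

func main() {
	// A long, narrow screenshot: the width alone would not trigger a resize.
	w, h := scaleToFit(600, 12000, 2048)
	fmt.Println(w, h)
}
```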
func resizeNearest(src image.Image, dstW, dstH int) image.Image {
    b := src.Bounds()
    srcW, srcH := b.Dx(), b.Dy()
    dst := image.NewRGBA(image.Rect(0, 0, dstW, dstH))
    for y := 0; y < dstH; y++ {
        sy := b.Min.Y + y*srcH/dstH
        for x := 0; x < dstW; x++ {
            sx := b.Min.X + x*srcW/dstW
            dst.Set(x, y, src.At(sx, sy))
        }
    }
    return dst
}
Using src.At and dst.Set is the slowest way to process images in Go as it involves interface overhead and color conversion for every pixel. Since the destination is always an *image.RGBA, you can significantly speed this up by using type assertions and direct pixel access if the source is also a common type like *image.RGBA or *image.YCbCr.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5af2f68fc1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
if ttl := strings.TrimSpace(ck.PromptCacheTTL); ttl != "" {
    attrs["prompt_cache_ttl"] = ttl
Preserve empty per-key prompt cache TTL overrides
This conditional drops empty prompt-cache-ttl values from per-key config, so a credential configured with prompt-cache-ttl: "" cannot override a global prompt-cache-ttl: "1h" back to the default 5-minute behavior. In that setup the attribute is omitted here and resolvePromptCacheTTL falls back to the global value, which breaks the documented ""/"1h" semantics for per-credential configuration.
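The distinction at stake is key presence versus value presence. A minimal sketch, assuming the attributes are a plain `map[string]string` and using a hypothetical signature for `resolvePromptCacheTTL`:

```go
package main

import "fmt"

// resolvePromptCacheTTL (hypothetical signature) prefers the per-credential
// attribute whenever the key is present, even if its value is empty, and
// falls back to the global setting only when the key is absent.
func resolvePromptCacheTTL(attrs map[string]string, globalTTL string) string {
	if ttl, ok := attrs["prompt_cache_ttl"]; ok {
		return ttl
	}
	return globalTTL
}

func main() {
	global := "1h"
	// Key present but empty: explicit reset to the default 5-minute TTL.
	fmt.Printf("%q\n", resolvePromptCacheTTL(map[string]string{"prompt_cache_ttl": ""}, global))
	// Key absent: inherit the global value.
	fmt.Printf("%q\n", resolvePromptCacheTTL(map[string]string{}, global))
}
```

This only helps if the code that builds the attributes writes the key for an explicitly empty value, which is exactly what the conditional quoted above prevents.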
    return "", "", false
}

img, _, err := image.Decode(bytes.NewReader(raw))
Bound image metadata before full decode
This decodes untrusted base64 payloads with image.Decode before any dimension/pixel-count guard, so a decompression-bomb image (small upload but enormous logical dimensions) can trigger very large allocations and CPU work during decode, potentially causing OOM or severe latency in production. Since this path now runs on all Claude image blocks, it should inspect metadata first (e.g., DecodeConfig + max pixel cap) before full decode/resizing.
…cleanup
- resolvePromptCacheTTL: switch from value-presence to key-presence check so an explicit per-key prompt-cache-ttl: "" can override a global "1h" back to the default 5-minute TTL
- maybeCompressImage: call image.DecodeConfig before image.Decode to get dimensions cheaply; reject images exceeding 50 MP pixel limit before allocating the full pixel buffer (decompression bomb guard)
- compressImagesInPayload: batch the two sjson.SetBytes calls into a single source-object write to avoid double payload serialization
- drop unused mediaType parameter from maybeCompressImage
- both-dimension check in downscale heuristic (Dy as well as Dx)
- resizeNearest: *image.RGBA fast path using SetRGBA/RGBAAt to avoid interface dispatch in the inner loop
- update tests to reflect new key-presence override semantics

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81e21aeb28
if hdr.Width*hdr.Height > maxImagePixels {
    log.Warnf("[claude] image %dx%d exceeds pixel limit (%d MP), skipping compression",
        hdr.Width, hdr.Height, maxImagePixels/1_000_000)
    return "", "", false
Keep resize path for high-megapixel Claude images
The early maxImagePixels return skips all compression for images above 50 MP, so payloads like 8000x8000 (or any >50 MP image) are forwarded unchanged even when they exceed Claude’s per-image size/dimension limits. In those cases this new auto-compression feature does not recover the request and users still get upstream 400s. This guard should not bypass the resize/compress path for legitimate high-resolution inputs.
50 MP was below Claude's own practical ceiling (7680×7680 ≈ 59 MP), so images that legitimately need resizing were being forwarded unchanged and triggering upstream 400s. 150 MP still guards against decompression bombs while keeping the resize path reachable for all real-world inputs.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
luispater left a comment
Thanks for the PR — the cache TTL option and image compression look useful, and the targeted tests/build pass locally.
I found one blocking correctness issue in the per-credential TTL override path:
- `resolvePromptCacheTTL` now treats `auth.Attributes["prompt_cache_ttl"]` key presence as an explicit override, including the empty string case.
- But `synthesizeClaudeKeys` only materializes `prompt_cache_ttl` when `ck.PromptCacheTTL` is non-empty.
- Because `ClaudeKey.PromptCacheTTL` is a plain `string` populated via `yaml.Unmarshal`, `prompt-cache-ttl: ""` and an omitted field collapse to the same value before synthesis.
In practice, a claude-api-key entry cannot reset a global prompt-cache-ttl: "1h" back to the default 5-minute TTL, even though the new comments/tests claim that this should work.
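The collapse described above can be reproduced with any Go unmarshaller. The sketch below uses `encoding/json` as a stdlib stand-in for YAML, and a `*string` field as one possible way to keep "omitted" and "explicitly empty" apart. Both the struct names and the assumption that the project's YAML library treats pointers the same way are illustrative, not taken from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plainKey mirrors the current shape: a plain string cannot tell an omitted
// field from an explicit empty string.
type plainKey struct {
	PromptCacheTTL string `json:"prompt-cache-ttl,omitempty"`
}

// pointerKey is a hypothetical alternative: nil means "not set", while a
// non-nil pointer to "" means "explicitly reset to the default".
type pointerKey struct {
	PromptCacheTTL *string `json:"prompt-cache-ttl,omitempty"`
}

// describe decodes doc into both shapes and reports the plain value plus
// whether the pointer form saw the key at all.
func describe(doc string) (plain string, explicit bool, err error) {
	var p plainKey
	if err = json.Unmarshal([]byte(doc), &p); err != nil {
		return "", false, err
	}
	var q pointerKey
	if err = json.Unmarshal([]byte(doc), &q); err != nil {
		return "", false, err
	}
	return p.PromptCacheTTL, q.PromptCacheTTL != nil, nil
}

func main() {
	docs := []string{`{}`, `{"prompt-cache-ttl": ""}`, `{"prompt-cache-ttl": "1h"}`}
	for _, doc := range docs {
		plain, explicit, err := describe(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-28s plain=%q explicit=%v\n", doc, plain, explicit)
	}
}
```

With the plain string, the first two documents are indistinguishable. With the pointer, the synthesizer could write the attribute key whenever the pointer is non-nil, which would make the key-presence check meaningful.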
Test plan:
- `go test ./internal/runtime/executor/... ./internal/watcher/synthesizer/... ./internal/config/...`
- `go build -o test-output ./cmd/server && rm -f test-output`
What this does

Two improvements to the Claude executor that address real pain points when running the proxy in front of clients that don't handle caching or image limits themselves.

Configurable prompt cache TTL

The proxy already auto-injects `cache_control` breakpoints when the client sends none. The TTL has always been hardcoded to the default 5-minute lifetime, though.

This adds a `prompt-cache-ttl` option that accepts `""` (keep the current behaviour, 5 min) or `"1h"` for the extended cache lifetime. It can be set at two levels:

- Global: `prompt-cache-ttl` in the root config applies to all Claude requests, including OAuth credentials.
- Per-credential: `prompt-cache-ttl` under `claude-api-key` entries overrides the global value for that key.

The `prompt-caching-scope-2026-01-05` beta that gates the 1h TTL is already in the default `Anthropic-Beta` header, so no header changes are needed.

Automatic image compression

Claude enforces a hard 5 MB per-image cap and rejects anything wider or taller than 8000px. Clients that forward screenshots or high-res images hit opaque 400 errors with no recovery path.

The executor now checks every base64 image block in the messages payload before sending upstream. Images within limits pass through unchanged. Oversized ones are re-encoded as JPEG at decreasing quality levels (85 → 70 → 50 → 30) until they fit. No new dependencies: only Go stdlib image packages are used (`image/jpeg`, `image/png`, `image/gif`).

Changes

- `internal/config/config.go`: `PromptCacheTTL` field on both root `Config` and `ClaudeKey`
- `internal/watcher/synthesizer/config.go`: propagate per-key TTL to auth attributes
- `internal/runtime/executor/claude_executor.go`: `resolvePromptCacheTTL`, `buildCacheControl`, updated inject functions, `compressImagesInPayload`
- `config.example.yaml`: documented both options
- tests covering `buildCacheControl`, inject behaviour, image compression helpers

Testing

All existing tests pass. New tests cover:

- `buildCacheControl` with and without `ttl`
- `resolvePromptCacheTTL` precedence (per-key over global, nil fallback)
- `ensureCacheControl` injects correct TTL
- `scaleToFit` edge cases
- `maybeCompressImage` (small image unchanged, oversized compressed, invalid base64 safe)
- `compressImagesInPayload` end-to-end