
feat(claude): configurable prompt cache TTL and automatic image compression #2533

Open
gitrvc-hub wants to merge 3 commits into router-for-me:main from
E4Dev-Solutions:feat/claude-cache-ttl-image-compress

Conversation

@gitrvc-hub

What this does

Two improvements to the Claude executor that address real pain points when running the proxy in front of clients that don't handle caching or image limits themselves.

Configurable prompt cache TTL

The proxy already auto-injects cache_control breakpoints when the client sends none — which is great. Until now, though, the TTL has been hardcoded to the default 5-minute lifetime.

This adds a prompt-cache-ttl option that accepts "" (keep the current behaviour, 5 min) or "1h" for the extended cache lifetime. It can be set at two levels:

# Global default for all Claude requests (OAuth + api-key)
prompt-cache-ttl: "1h"

# Per-credential override under claude-api-key
claude-api-key:
  - api-key: "sk-..."
    prompt-cache-ttl: "1h"

The prompt-caching-scope-2026-01-05 beta that gates the 1h TTL is already in the default Anthropic-Beta header, so no header changes needed.

Automatic image compression

Claude enforces a hard 5 MB per-image cap and rejects anything wider or taller than 8000px. Clients that forward screenshots or high-res images hit opaque 400 errors with no recovery path.

The executor now checks every base64 image block in the messages payload before sending upstream. Images within limits pass through unchanged. Oversized ones are re-encoded as JPEG at decreasing quality levels (85 → 70 → 50 → 30) until they fit. No new dependencies — uses only Go stdlib image packages (image/jpeg, image/png, image/gif).

Changes

  • internal/config/config.go — PromptCacheTTL field on both root Config and ClaudeKey
  • internal/watcher/synthesizer/config.go — propagate per-key TTL to auth attributes
  • internal/runtime/executor/claude_executor.go — resolvePromptCacheTTL, buildCacheControl, updated inject functions, compressImagesInPayload
  • config.example.yaml — documented both options
  • Tests for TTL resolution, buildCacheControl, inject behaviour, image compression helpers

Testing

go test ./internal/runtime/executor/... ./internal/watcher/synthesizer/... ./internal/config/...

All existing tests pass. New tests cover:

  • buildCacheControl with and without ttl
  • resolvePromptCacheTTL precedence (per-key over global, nil fallback)
  • ensureCacheControl injects correct TTL
  • scaleToFit edge cases
  • maybeCompressImage (small image unchanged, oversized compressed, invalid base64 safe)
  • compressImagesInPayload end-to-end

Add two improvements to the Claude executor that reduce cost and prevent
hard API errors for clients that don't handle these themselves.

**Configurable prompt cache TTL**

The proxy already auto-injects cache_control breakpoints when the client
sends none. The TTL was hardcoded to the default 5-minute lifetime.
This change adds a `prompt-cache-ttl` option (accepted values: `""` for
the default 5 min, or `"1h"` for the extended lifetime) at two levels:

- Global: `prompt-cache-ttl` in the root config applies to all Claude
  requests including OAuth credentials.
- Per-credential: `prompt-cache-ttl` under `claude-api-key` entries
  overrides the global value for that key.

The `prompt-caching-scope-2026-01-05` beta required for 1h TTL is
already included in the default `Anthropic-Beta` header.

**Automatic image compression**

Claude enforces a 5 MB per-image limit and rejects images wider or
taller than 8000px. Clients that pass through screenshots or high-res
attachments hit hard 400 errors with no automatic recovery.

The executor now compresses base64-encoded images in the messages
payload before sending upstream. Images within limits are untouched.
Oversized ones are re-encoded as JPEG at decreasing quality levels
(85→70→50→30) until they fit. No new dependencies — uses only Go
stdlib image packages.

Both features include unit tests.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces configurable prompt cache TTLs and automatic image compression for Claude API requests. It adds a prompt-cache-ttl setting to global and per-credential configurations, enabling extended one-hour cache lifetimes. Additionally, the proxy now automatically compresses images exceeding Anthropic's size or dimension limits. Review feedback suggests validating the new configuration values, optimizing JSON manipulation and image processing performance, and refining the dimension checks used during image downscaling.

// one-hour cache lifetime supported by the prompt-caching-scope-2026-01-05 beta.
// This only affects breakpoints that the proxy injects; client-supplied cache_control
// blocks are left untouched.
PromptCacheTTL string `yaml:"prompt-cache-ttl,omitempty" json:"prompt-cache-ttl,omitempty"`

medium

The PromptCacheTTL field should be validated to ensure it only contains supported values (e.g., "" or "1h"). This prevents configuration errors from being silently ignored or causing unexpected behavior in the executor. Consider adding a sanitization step in SanitizeClaudeKeys.

Comment on lines +2057 to +2066
dataPath := fmt.Sprintf("messages.%d.content.%d.source.data", msgIdx, contentIdx)
typePath := fmt.Sprintf("messages.%d.content.%d.source.media_type", msgIdx, contentIdx)
raw := block.Get("source.data").String()
mediaType := block.Get("source.media_type").String()

compressed, newType, ok := maybeCompressImage(raw, mediaType)
if ok {
payload, _ = sjson.SetBytes(payload, dataPath, compressed)
payload, _ = sjson.SetBytes(payload, typePath, newType)
}

medium

Repeatedly calling sjson.SetBytes on the same large payload is inefficient as each call involves re-parsing the JSON and allocating a new buffer. Since the source object for a base64 image in Claude's API only contains type, media_type, and data, you can update the entire object in a single sjson.SetBytes call to improve performance, especially for payloads with multiple images.

sourcePath := fmt.Sprintf("messages.%d.content.%d.source", msgIdx, contentIdx)
raw := block.Get("source.data").String()
mediaType := block.Get("source.media_type").String()

compressed, newType, ok := maybeCompressImage(raw, mediaType)
if ok {
	payload, _ = sjson.SetBytes(payload, sourcePath, map[string]string{
		"type":       "base64",
		"media_type": newType,
		"data":       compressed,
	})
}

// or dimension limits. Returns the (possibly compressed) base64 data, the resulting
// media type, and true if any changes were made. Returns ("", "", false) on any error
// or when no compression is needed.
func maybeCompressImage(b64data, mediaType string) (string, string, bool) {

medium

The mediaType parameter is currently unused within the maybeCompressImage function. Consider removing it to simplify the function signature.

}

// Further downscale for very large files to help compression.
if needsCompress && img.Bounds().Dx() > 2048 {

medium

The heuristic for further downscaling only checks the image width (Dx()). For very tall but narrow images (e.g., long screenshots), this check will be bypassed even if the file size is still significantly over the limit. It is safer to check both dimensions.

Suggested change
if needsCompress && img.Bounds().Dx() > 2048 {
if needsCompress && (img.Bounds().Dx() > 2048 || img.Bounds().Dy() > 2048) {

Comment on lines +2151 to +2163
func resizeNearest(src image.Image, dstW, dstH int) image.Image {
b := src.Bounds()
srcW, srcH := b.Dx(), b.Dy()
dst := image.NewRGBA(image.Rect(0, 0, dstW, dstH))
for y := 0; y < dstH; y++ {
sy := b.Min.Y + y*srcH/dstH
for x := 0; x < dstW; x++ {
sx := b.Min.X + x*srcW/dstW
dst.Set(x, y, src.At(sx, sy))
}
}
return dst
}

medium

Using src.At and dst.Set is the slowest way to process images in Go as it involves interface overhead and color conversion for every pixel. Since the destination is always an *image.RGBA, you can significantly speed this up by using type assertions and direct pixel access if the source is also a common type like *image.RGBA or *image.YCbCr.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5af2f68fc1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +119 to +120
if ttl := strings.TrimSpace(ck.PromptCacheTTL); ttl != "" {
attrs["prompt_cache_ttl"] = ttl

P2: Preserve empty per-key prompt cache TTL overrides

This conditional drops empty prompt-cache-ttl values from per-key config, so a credential configured with prompt-cache-ttl: "" cannot override a global prompt-cache-ttl: "1h" back to the default 5-minute behavior. In that setup the attribute is omitted here and resolvePromptCacheTTL falls back to the global value, which breaks the documented ""/"1h" semantics for per-credential configuration.


return "", "", false
}

img, _, err := image.Decode(bytes.NewReader(raw))

P1: Bound image metadata before full decode

This decodes untrusted base64 payloads with image.Decode before any dimension/pixel-count guard, so a decompression-bomb image (small upload but enormous logical dimensions) can trigger very large allocations and CPU work during decode, potentially causing OOM or severe latency in production. Since this path now runs on all Claude image blocks, it should inspect metadata first (e.g., DecodeConfig + max pixel cap) before full decode/resizing.


…cleanup

- resolvePromptCacheTTL: switch from value-presence to key-presence check
  so an explicit per-key prompt-cache-ttl: "" can override a global "1h"
  back to the default 5-minute TTL
- maybeCompressImage: call image.DecodeConfig before image.Decode to get
  dimensions cheaply; reject images exceeding 50 MP pixel limit before
  allocating the full pixel buffer (decompression bomb guard)
- compressImagesInPayload: batch the two sjson.SetBytes calls into a
  single source-object write to avoid double payload serialization
- drop unused mediaType parameter from maybeCompressImage
- both-dimension check in downscale heuristic (Dy as well as Dx)
- resizeNearest: *image.RGBA fast path using SetRGBA/RGBAAt to avoid
  interface dispatch in the inner loop
- update tests to reflect new key-presence override semantics

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81e21aeb28


Comment on lines +2102 to +2105
if hdr.Width*hdr.Height > maxImagePixels {
log.Warnf("[claude] image %dx%d exceeds pixel limit (%d MP), skipping compression",
hdr.Width, hdr.Height, maxImagePixels/1_000_000)
return "", "", false

P2: Keep resize path for high-megapixel Claude images

The early maxImagePixels return skips all compression for images above 50 MP, so payloads like 8000x8000 (or any >50 MP image) are forwarded unchanged even when they exceed Claude’s per-image size/dimension limits. In those cases this new auto-compression feature does not recover the request and users still get upstream 400s. This guard should not bypass the resize/compress path for legitimate high-resolution inputs.


50 MP was below Claude's own practical ceiling (7680×7680 ≈ 59 MP), so
images that legitimately needed resizing were being forwarded unchanged and
triggering upstream 400s. 150 MP still guards against decompression bombs
while keeping the resize path reachable for all real-world inputs.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

@luispater left a comment


Thanks for the PR — the cache TTL option and image compression look useful, and the targeted tests/build pass locally.

I found one blocking correctness issue in the per-credential TTL override path:

  • resolvePromptCacheTTL now treats auth.Attributes["prompt_cache_ttl"] key presence as an explicit override, including the empty string case.
  • But synthesizeClaudeKeys only materializes prompt_cache_ttl when ck.PromptCacheTTL is non-empty.
  • Because ClaudeKey.PromptCacheTTL is a plain string populated via yaml.Unmarshal, prompt-cache-ttl: "" and an omitted field collapse to the same value before synthesis.

In practice, a claude-api-key entry cannot reset a global prompt-cache-ttl: "1h" back to the default 5-minute TTL, even though the new comments/tests claim that this should work.

Test plan:

  • go test ./internal/runtime/executor/... ./internal/watcher/synthesizer/... ./internal/config/...
  • go build -o test-output ./cmd/server && rm -f test-output
