Conversation

markpollack (Member)

Implement comprehensive prompt caching support for Anthropic Claude models in Spring AI:

Core Implementation:

  • Add AnthropicCacheStrategy enum with 4 strategic options: NONE, SYSTEM_ONLY, SYSTEM_AND_TOOLS, CONVERSATION_HISTORY
  • Implement strategic cache placement with automatic 4-breakpoint limit enforcement via CacheBreakpointTracker
  • Support configurable TTL durations: "5m" (default) and "1h" (requires beta header)
  • Add cache_control support to system messages, tools, and conversation history based on strategy

API Changes:

  • Extend AnthropicChatOptions with cacheStrategy() and cacheTtl() builder methods
  • Update AnthropicApi.Tool record to support cache_control field
  • Add cache usage tracking via cacheCreationInputTokens() and cacheReadInputTokens()
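
For illustration, a minimal usage sketch of the options above (a sketch only; the names follow this description and the exact signatures may differ):

  // Cache the system prompt and tool definitions with a 1-hour TTL
  AnthropicChatOptions options = AnthropicChatOptions.builder()
      .cacheStrategy(AnthropicCacheStrategy.SYSTEM_AND_TOOLS)
      .cacheTtl("1h") // "5m" is the default; "1h" requires the beta header
      .build();

  ChatResponse response = chatModel.call(new Prompt(List.of(systemMessage, userMessage), options));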

Testing & Quality:

  • Add comprehensive integration tests with real-world scenarios
  • Add extensive mock test coverage with complex multi-breakpoint scenarios
  • Fix all checkstyle violations and test failures
  • Add cache breakpoint limit warning for production debugging

Documentation:

  • Complete API documentation with practical examples and best practices
  • Add real-world use cases: legal document analysis, batch code review, customer support
  • Include cost optimization guidance demonstrating up to 90% savings
  • Document future enhancement roadmap for advanced scenarios

Signed-off-by: Mark Pollack [email protected]

sobychacko and others added 3 commits September 3, 2025 17:00
…tOptions

- Add cacheControl field to AnthropicChatOptions with builder method
- Create AnthropicCacheType enum with EPHEMERAL type for type-safe cache creation
- Update AnthropicChatModel.createRequest() to apply cache control from options to user message ContentBlocks
- Extend ContentBlock record with cacheControl parameter and constructor for API compatibility
- Update Usage record to include cacheCreationInputTokens and cacheReadInputTokens fields
- Update StreamHelper to handle new Usage constructor with cache token parameters
- Add AnthropicApiIT.chatWithPromptCache() test for low-level API validation
- Add AnthropicChatModelIT.chatWithPromptCacheViaOptions() integration test
- Add comprehensive unit tests for AnthropicChatOptions cache control functionality
- Update documentation with cacheControl() method examples and usage patterns

Cache control is configured through AnthropicChatOptions rather than message classes
to maintain provider portability. The cache control gets applied during request creation
in AnthropicChatModel when building ContentBlocks for user messages.
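
As a rough sketch of the resulting usage (the builder method and enum are named in this commit; the cache-control factory call shown here is illustrative and may differ):

  AnthropicChatOptions options = AnthropicChatOptions.builder()
      .cacheControl(AnthropicCacheType.EPHEMERAL.cacheControl()) // illustrative factory; actual creation API may differ
      .build();

  // AnthropicChatModel.createRequest() then copies this cache control onto the user message ContentBlocks
  ChatResponse response = chatModel.call(new Prompt("...large, reusable context...", options));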

Original implementation provided by @Claudio-code (Claudio Silva Junior)
See spring-projects@15e5026

Fixes spring-projects#1403

Signed-off-by: Soby Chacko <[email protected]>
// Create cache control with TTL if specified, otherwise use default 5m
if (cacheTtl != null && !cacheTtl.equals("5m")) {
    cacheControl = new ChatCompletionRequest.CacheControl("ephemeral", cacheTtl);
    logger.info("Created cache control with TTL: type={}, ttl={}", "ephemeral", cacheTtl);
}
markpollack (Member Author)

these need to be changed to 'debug'

adase11 (Contributor) commented Sep 4, 2025

I see that allowing more fine-grained control of the caching strategy (1h vs. 5m) is listed as a follow-up; that makes sense. I also liked one additional feature I included in sobychacko#2, where I could optimize cache block usage by providing a minimum content size parameter, so that we're not attempting to cache messages that will have very little impact (or none at all, if they're under the minimum size Anthropic will even let you cache).

The two biggest benefits I saw when testing my approach, measured by cache utilization (tokens cached as a percentage of the total tokens in the request), were fine-grained control over TTL based on message type and a configurable minimum content size that a message must reach to be eligible for caching.

In a production application, system messages are very often the same across many conversations, while user and assistant messages are unique to an individual conversation. The 1-hour cache is therefore beneficial for system messages but less so for user and assistant messages, where the 5-minute cache is a better fit because it typically covers a user's multi-turn conversation. This differentiation is why more fine-grained control over the caching TTL is beneficial.

The minimum message size is pretty self-explanatory: it lets the caching strategy focus on the messages that will have the largest impact on tokens used in a conversation. I found it useful to be able to segment this by message type as well, but that's not as critical.
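
To illustrate the kind of shape I have in mind (the names here are purely illustrative, not the actual API from sobychacko#2):

  // Hypothetical configuration: per-message-type TTLs plus a minimum size for cache eligibility
  AnthropicCacheOptions cacheOptions = AnthropicCacheOptions.builder()
      .systemMessageTtl("1h")        // system prompts are stable across conversations
      .conversationMessageTtl("5m")  // user/assistant turns only need to survive a multi-turn chat
      .minContentLength(1024)        // skip content too small to be worth caching (or eligible at all)
      .build();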

I'm happy to help out in any way I can if you choose to pick up either of these items. @markpollack

sobychacko (Contributor)

This PR was merged via 5afd2d2.

@adase11, we can continue the conversation here, although this PR is closed. When appropriate, we can create a new issue to follow up on the concerns you raised.

sobychacko closed this Sep 5, 2025
adase11 (Contributor) commented Sep 5, 2025

@sobychacko thanks for coordinating; sounds like a plan. How would you like me to move forward with my suggestions? I'm happy to discuss further, or I can set up an example branch demonstrating some of what I'm talking about.

sobychacko (Contributor)

@adase11, could you create a new issue and capture all your feedback there? Then we can proceed based on that issue. Perhaps you could copy/paste the above comments, as well as the ones you added on the PR branch? Once you create the issue, please notify us here so that we can prioritize it.

adase11 (Contributor) commented Sep 5, 2025

Perfect, can do
