Add Anthropic prompt caching via AnthropicChatOptions #4300
Conversation
Add Anthropic prompt caching via AnthropicChatOptions

- Add cacheControl field to AnthropicChatOptions with builder method
- Create AnthropicCacheType enum with EPHEMERAL type for type-safe cache creation
- Update AnthropicChatModel.createRequest() to apply cache control from options to user message ContentBlocks
- Extend ContentBlock record with cacheControl parameter and constructor for API compatibility
- Update Usage record to include cacheCreationInputTokens and cacheReadInputTokens fields
- Update StreamHelper to handle the new Usage constructor with cache token parameters
- Add AnthropicApiIT.chatWithPromptCache() test for low-level API validation
- Add AnthropicChatModelIT.chatWithPromptCacheViaOptions() integration test
- Add comprehensive unit tests for AnthropicChatOptions cache control functionality
- Update documentation with cacheControl() method examples and usage patterns

Cache control is configured through AnthropicChatOptions rather than message classes to maintain provider portability. The cache control is applied during request creation in AnthropicChatModel when building ContentBlocks for user messages (see the usage sketch after this message).

Original implementation provided by @Claudio-code (Claudio Silva Junior). See spring-projects@15e5026

Fixes spring-projects#1403

Signed-off-by: Soby Chacko <[email protected]>
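For orientation, a minimal usage sketch of the option-based approach described in this commit is shown below. The builder method cacheControl(), the AnthropicCacheType enum, its package location, and the EPHEMERAL.cacheControl() accessor are assumptions taken from the commit message; the model id and prompt text are placeholders.

```java
import java.util.List;

import org.springframework.ai.anthropic.AnthropicChatOptions;
import org.springframework.ai.anthropic.api.AnthropicCacheType;
import org.springframework.ai.chat.messages.SystemMessage;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;

class PromptCacheExample {

	// Send a large, reusable system prompt with an ephemeral cache breakpoint applied
	// via AnthropicChatOptions, as described in the commit message above.
	ChatResponse callWithCache(ChatModel chatModel, String largeSystemPrompt, String question) {
		AnthropicChatOptions options = AnthropicChatOptions.builder()
			.model("claude-3-7-sonnet-latest") // placeholder model id
			.cacheControl(AnthropicCacheType.EPHEMERAL.cacheControl())
			.build();
		return chatModel.call(
				new Prompt(List.of(new SystemMessage(largeSystemPrompt), new UserMessage(question)), options));
	}

}
```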
This commit implements comprehensive prompt caching support for Anthropic Claude models in Spring AI.

Core Implementation:
- Add AnthropicCacheStrategy enum with 4 strategic options: NONE, SYSTEM_ONLY, SYSTEM_AND_TOOLS, CONVERSATION_HISTORY
- Implement strategic cache placement with automatic 4-breakpoint limit enforcement via CacheBreakpointTracker
- Support configurable TTL durations: "5m" (default) and "1h" (requires beta header)
- Add cache_control support to system messages, tools, and conversation history based on strategy

API Changes:
- Extend AnthropicChatOptions with cacheStrategy() and cacheTtl() builder methods
- Update AnthropicApi.Tool record to support cache_control field
- Add cache usage tracking via cacheCreationInputTokens() and cacheReadInputTokens()

Testing & Quality:
- Add comprehensive integration tests with real-world scenarios
- Add extensive mock test coverage with complex multi-breakpoint scenarios
- Fix all checkstyle violations and test failures
- Add cache breakpoint limit warning for production debugging

Documentation:
- Complete API documentation with practical examples and best practices
- Add real-world use cases: legal document analysis, batch code review, customer support
- Include cost optimization guidance demonstrating up to 90% savings
- Document future enhancement roadmap for advanced scenarios

Signed-off-by: Mark Pollack <[email protected]>
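Based on the builder methods named in this commit, a strategy-based configuration might look like the following sketch. The cacheStrategy() and cacheTtl() names come from the commit message; the exact signatures and the model id are assumptions.

```java
// Sketch only: strategy-based cache configuration as described in the commit above.
AnthropicChatOptions options = AnthropicChatOptions.builder()
	.model("claude-3-7-sonnet-latest") // placeholder model id
	.cacheStrategy(AnthropicCacheStrategy.SYSTEM_AND_TOOLS) // cache the system prompt and tool definitions
	.cacheTtl("1h") // "1h" requires the extended-TTL beta header; the default is "5m"
	.build();
```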
// Create cache control with TTL if specified, otherwise use default 5m
if (cacheTtl != null && !cacheTtl.equals("5m")) {
    cacheControl = new ChatCompletionRequest.CacheControl("ephemeral", cacheTtl);
    logger.info("Created cache control with TTL: type={}, ttl={}", "ephemeral", cacheTtl);
these need to be changed to 'debug'
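That is, the quoted statement would become:

```java
// Reviewer's suggestion: log at debug level instead of info.
logger.debug("Created cache control with TTL: type={}, ttl={}", "ephemeral", cacheTtl);
```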
I see that allowing for more fine-grained control of the caching strategy (1h vs 5m) is listed as a follow-up; that makes sense. I also liked one additional feature I included in sobychacko#2, where I could optimize cache block usage by providing a minimum content size parameter, so that we're not attempting to cache messages that will have very little impact (or none at all if they're under the minimum size Anthropic will even let you cache). The two biggest benefits I saw when testing my approach, in terms of cache utilization (the number of tokens cached as a percentage of the total tokens in the request), were fine-grained control over TTL based on message type and a configurable minimum content size for a message to be eligible for caching. In a production application, system messages are very often the same across many conversations, while user and assistant messages are unique to an individual conversation. The 1-hour cache is therefore beneficial for system messages but less so for user and assistant messages (the 5-minute cache is better there because it typically covers a user's multi-turn conversation). This difference is why finer-grained control over the caching TTL is beneficial. The minimum message size is pretty self-explanatory: it lets the caching strategy focus on the messages that will have the largest impact on tokens used in a conversation. I found it useful to be able to segment this by message type as well, but that's less critical. I'm happy to help out in any way I can if you choose to pick up either of these items. @markpollack
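To make that suggestion concrete, here is a purely hypothetical sketch of per-message-type TTLs combined with a minimum cacheable content size. None of these option names (messageTypeCacheTtl, minimumCacheableLength) exist in Spring AI; they only illustrate the idea proposed in sobychacko#2.

```java
// Hypothetical API, for illustration only.
AnthropicChatOptions options = AnthropicChatOptions.builder()
	.cacheStrategy(AnthropicCacheStrategy.CONVERSATION_HISTORY)
	.messageTypeCacheTtl(MessageType.SYSTEM, "1h") // system prompts are shared across many conversations
	.messageTypeCacheTtl(MessageType.USER, "5m")   // only needs to span one multi-turn conversation
	.minimumCacheableLength(1024)                  // skip content too small to be worth a cache breakpoint
	.build();
```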
@sobychacko thanks for coordinating, sounds like a plan. How would you like me to move forward with my suggestions? I'm happy to discuss further, or I can set up an example branch demonstrating some of what I'm talking about.
@adase11, could you create a new issue and capture all your feedback there? Then we can proceed based on that issue. Perhaps you could copy/paste the above comments, as well as the ones you added on the PR branch? Once you create the issue, please notify us here so that we can prioritize it.
Perfect, can do.