feat(custom-providers): add context_cache setting to disable context length caching #3547
Open
gkraker04 wants to merge 2 commits into NousResearch:main from
…length caching

Add context_cache boolean field to custom_providers config that controls whether detected context lengths are persisted to ~/.hermes/context_length_cache.yaml.

When context_cache is False, the persistent cache lookup is skipped and Hermes performs fresh detection on every startup via endpoint queries, local server APIs, or models.dev registry.

Use case: Dynamic local server configurations where num_ctx or models change frequently (e.g., Ollama with custom num_ctx, containerized deployments).

Default: true (backward compatible, existing behavior unchanged)

Files changed:
- agent/model_metadata.py: Added context_cache parameter
- agent/context_compressor.py: Passed through to get_model_context_length()
- run_agent.py: Read from custom_providers config (provider and model level)
- tests/agent/test_context_cache.py: 7 new tests

Tests: All 75 existing tests pass + 7 new tests added
What does this PR do?
Adds a context_cache boolean field to the custom_providers config that controls whether detected context lengths are persisted to ~/.hermes/context_length_cache.yaml.

Problem: When using custom providers (local llama.cpp, Ollama, vLLM servers), Hermes caches the detected context length to avoid repeated API calls on startup. While efficient, this causes issues when the server configuration changes between startups (for example, the num_ctx parameter).

Solution: Add a context_cache field (default true for backward compatibility) that can be set at the provider level or the model level.

When context_cache: false, the persistent cache lookup is skipped and Hermes performs fresh detection on every startup via endpoint queries, local server APIs, or the models.dev registry.

Related Issue
No existing issue - this is a new feature request from community feedback.
Type of Change
Changes Made
- agent/model_metadata.py (+13, -3): Added context_cache parameter to get_model_context_length(), updated cache lookup logic
- agent/context_compressor.py (+2): Added parameter and passed through to get_model_context_length()
- run_agent.py (+16, -2): Read context_cache from custom_providers config (provider and model level), passed to ContextCompressor
- tests/agent/test_context_cache.py (new, 184 lines): 7 comprehensive tests covering all scenarios

How to Test
All 75 existing tests pass + 7 new tests added.
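The behaviour these tests target can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: the function name `get_context_length`, the in-memory `cache` dict (standing in for ~/.hermes/context_length_cache.yaml), and the `detect` callback are all hypothetical. It also assumes that the detected value is not written back when context_cache is disabled, in line with the description that the setting controls persistence.

```python
from typing import Callable, Dict


def get_context_length(
    model: str,
    cache: Dict[str, int],
    detect: Callable[[str], int],
    context_cache: bool = True,
) -> int:
    """Return the context length for `model`, optionally bypassing the cache."""
    # Hypothetical stand-in for the persistent cache lookup.
    if context_cache and model in cache:
        return cache[model]
    # Fresh detection (endpoint query, local server API, registry, ...).
    length = detect(model)
    # Assumption: the value is only persisted when caching is enabled.
    if context_cache:
        cache[model] = length
    return length


def detect(model: str) -> int:
    """Pretend the server now reports a larger context window."""
    return 8192


# A stale cached value versus what the server reports now.
cache = {"llama3": 4096}

cached = get_context_length("llama3", cache, detect)
fresh = get_context_length("llama3", cache, detect, context_cache=False)
```

With the default setting the stale cached value is returned; with `context_cache=False` the lookup is skipped and `detect` is consulted instead, leaving the cache untouched.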
Usage example:
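As a hedged illustration, the snippet below shows the setting on a provider entry and on a single model, written as the Python structure a YAML config might load into. Only custom_providers and context_cache come from this PR. The other keys (`name`, `base_url`, `models`), the example values, the helper `resolve_context_cache`, and the rule that a model-level value overrides the provider-level one are assumptions made for the sketch.

```python
from typing import Any, Dict

# Hypothetical config shape; only `custom_providers` and `context_cache`
# are named in this PR.
config: Dict[str, Any] = {
    "custom_providers": [
        {
            "name": "local-ollama",
            "base_url": "http://localhost:11434/v1",
            "context_cache": False,  # provider level
            "models": {
                "stable-model": {"context_cache": True},  # model level
            },
        }
    ]
}


def resolve_context_cache(provider: Dict[str, Any], model: str) -> bool:
    """Resolve the effective setting, defaulting to True when unset."""
    default = provider.get("context_cache", True)
    model_cfg = provider.get("models", {}).get(model, {})
    # Assumption: a model-level value takes precedence over the provider's.
    return bool(model_cfg.get("context_cache", default))


provider = config["custom_providers"][0]
stable_setting = resolve_context_cache(provider, "stable-model")
other_setting = resolve_context_cache(provider, "some-other-model")
```

Under these assumptions, `stable-model` keeps caching enabled while every other model on the provider is re-detected on each startup.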
Checklist
Code
feat(scope):, etc.)