
feat(custom-providers): add context_cache setting to disable context length caching#3547

Open
gkraker04 wants to merge 2 commits into NousResearch:main from gkraker04:feature/context-cache-setting

Conversation

@gkraker04

What does this PR do?

Adds a context_cache boolean field to the custom_providers config that controls whether detected context lengths are persisted to ~/.hermes/context_length_cache.yaml.

Problem: When using custom providers (local llama.cpp, Ollama, vLLM servers), Hermes caches the detected context length to avoid repeated API calls on startup. While efficient, this causes issues when:

  • Server configuration changes (e.g., Ollama num_ctx parameter)
  • Models are swapped with different context limits
  • Users are debugging context detection issues
  • Running in containerized environments where context size varies between deployments

Solution: Add a context_cache field (default true for backward compatibility) that can be set at:

  1. Provider level - affects all models on that endpoint
  2. Model level - granular control per model, overrides provider default

When context_cache: false, the persistent cache lookup is skipped and Hermes performs fresh detection on every startup via endpoint queries, local server APIs, or models.dev registry.

Related Issue

No existing issue - this is a new feature request from community feedback.

Type of Change

  • ✨ New feature (non-breaking change that adds functionality)

Changes Made

  • agent/model_metadata.py (+13, -3): Added context_cache parameter to get_model_context_length(), updated cache lookup logic
  • agent/context_compressor.py (+2): Added parameter and passed through to get_model_context_length()
  • run_agent.py (+16, -2): Read context_cache from custom_providers config (provider and model level), passed to ContextCompressor
  • tests/agent/test_context_cache.py (new, 184 lines): 7 comprehensive tests covering all scenarios

How to Test

```shell
# Run new tests
python -m pytest tests/agent/test_context_cache.py -v

# Verify no regression in existing tests
python -m pytest tests/agent/test_model_metadata.py -v
```

All 75 existing tests pass + 7 new tests added.

Usage example:

```yaml
custom_providers:
  - name: "local-dev"
    base_url: "http://localhost:8080/v1"
    context_cache: false  # Always re-detect all models
    models:
      "qwen3.5:27b":      # quoted: key contains a colon
        context_cache: true  # Per-model override
```
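The two-level resolution in this example can be sketched as follows. The resolve_context_cache helper is hypothetical (the actual lookup lives in run_agent.py and may differ); it only illustrates the override order described above.

```python
# Hypothetical helper illustrating the override order: a model-level
# context_cache wins over the provider-level value, and both default
# to True so existing configs keep their current behavior.
def resolve_context_cache(provider: dict, model_name: str) -> bool:
    provider_default = provider.get("context_cache", True)
    model_cfg = provider.get("models", {}).get(model_name) or {}
    return model_cfg.get("context_cache", provider_default)

provider = {
    "name": "local-dev",
    "base_url": "http://localhost:8080/v1",
    "context_cache": False,  # provider level: re-detect all models
    "models": {"qwen3.5:27b": {"context_cache": True}},  # per-model override
}

assert resolve_context_cache(provider, "qwen3.5:27b") is True   # override wins
assert resolve_context_cache(provider, "some-other-model") is False
```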

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this feature (no unrelated commits)
  • I've run `pytest tests/`

feat(custom-providers): add context_cache setting to disable context length caching

Add context_cache boolean field to custom_providers config that controls
whether detected context lengths are persisted to ~/.hermes/context_length_cache.yaml.

When context_cache is False, the persistent cache lookup is skipped and Hermes
performs fresh detection on every startup via endpoint queries, local server
APIs, or models.dev registry.

Use case: Dynamic local server configurations where num_ctx or models change
frequently (e.g., Ollama with custom num_ctx, containerized deployments).

Default: true (backward compatible, existing behavior unchanged)

Files changed:
- agent/model_metadata.py: Added context_cache parameter
- agent/context_compressor.py: Passed through to get_model_context_length()
- run_agent.py: Read from custom_providers config (provider and model level)
- tests/agent/test_context_cache.py: 7 new tests

Tests: All 75 existing tests pass + 7 new tests added