
[codex] fix model cache invalidation on config reload#1311

Closed
lost9999 wants to merge 3 commits into nesquena:master from lost9999:codex/fix-model-cache-mtime

Conversation

@lost9999
Contributor

Summary

Fix /api/models returning stale model metadata after config.yaml changes outside the WebUI default-model endpoint.

Root Cause

get_available_models() detects a config.yaml mtime change and calls reload_config(), but the first mtime check happens before the fast-path cache logic. After reload_config() updates _cfg_mtime, the later cache invalidation path no longer sees a config change, so the 24h in-memory _available_models_cache can keep serving old default_model and model groups.

Changes

  • Invalidate the available-models cache immediately after the early config mtime reload.
  • Replace the existing mtime cache test with a real temporary config.yaml edit so it catches stale in-memory cache reuse.
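The ordering problem and the fix can be sketched as follows. This is a simplified stand-in, not the code in api/config.py: the `invalidate_on_reload` flag, the `demo()` helper, and the one-line config parser are hypothetical, and only the names `get_available_models`, `reload_config`, `_cfg_mtime`, and `_available_models_cache` come from the description above. The demo edits a real temporary config.yaml, in the spirit of the replacement test.

```python
import os
import tempfile

# Hypothetical stand-ins for the module-level state named in the description.
_cfg_mtime = 0
_available_models_cache = None


def reload_config(path):
    """Record the file's current mtime, as reload_config() is described to do."""
    global _cfg_mtime
    _cfg_mtime = os.stat(path).st_mtime_ns


def _read_default_model(path):
    # Toy parser for a single "default_model: <name>" line (assumption).
    with open(path) as fh:
        return fh.read().split(":", 1)[1].strip()


def get_available_models(path, invalidate_on_reload):
    global _available_models_cache
    # Early check: sees the edit and reloads, which also updates _cfg_mtime.
    if os.stat(path).st_mtime_ns != _cfg_mtime:
        reload_config(path)
        if invalidate_on_reload:
            _available_models_cache = None  # the fix: drop the cache right here
    # Later check: after the early reload the mtimes match again, so this
    # branch never fires and cannot invalidate the cache on its own.
    if os.stat(path).st_mtime_ns != _cfg_mtime:
        _available_models_cache = None
    if _available_models_cache is None:
        _available_models_cache = {"default_model": _read_default_model(path)}
    return _available_models_cache


def demo(invalidate_on_reload):
    """Edit a temporary config.yaml between two calls and report both results."""
    global _cfg_mtime, _available_models_cache
    _cfg_mtime, _available_models_cache = 0, None
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.yaml")
        with open(path, "w") as fh:
            fh.write("default_model: model-a\n")
        os.utime(path, ns=(1_000_000_000, 1_000_000_000))  # pin a known mtime
        first = get_available_models(path, invalidate_on_reload)["default_model"]
        with open(path, "w") as fh:
            fh.write("default_model: model-b\n")
        os.utime(path, ns=(2_000_000_000, 2_000_000_000))  # force a different mtime
        second = get_available_models(path, invalidate_on_reload)["default_model"]
    return first, second
```

Without the early invalidation, `demo(False)` returns the stale `model-a` on the second call; with it, `demo(True)` picks up `model-b`. Pinning the mtime with `os.utime` keeps the demo independent of filesystem timestamp resolution.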

Validation

  • HERMES_HOME=/home/lost9999/.hermes HERMES_WEBUI_STATE_DIR=/home/lost9999/.hermes/webui /home/lost9999/.hermes/hermes-agent/venv/bin/python -m pytest tests/test_ttl_cache.py -q

Refs #1240.

@nesquena nesquena added the hold label Apr 30, 2026
This was referenced Apr 30, 2026
sunnysktsang pushed a commit to sunnysktsang/hermes-webui that referenced this pull request May 3, 2026
…sorbed

CHANGELOG, ROADMAP, TESTING bumped (3936 → 3946).

8 constituent PRs:
- nesquena#1523 (@franksong2702) branch indicator codepoint fix
- nesquena#1519 (@franksong2702) onboarding API-key focus loss fix
- nesquena#1518 (@franksong2702) voice-mode toggle-off recognizer stop
- nesquena#1516 (@franksong2702) YAML newline CSS rules
- nesquena#1517 (@franksong2702) __CACHE_VERSION__ → __WEBUI_VERSION__ rename
- nesquena#1532 (@ai-ag2026) state.db WebUI session recovery
- nesquena#1525 (@ai-ag2026) stale stream state proactive cleanup
- nesquena#1526 (@ai-ag2026) max_tokens forwarding + OpenRouter quota classifier

Opus MUST-FIX absorbed: sw.js conflict-marker cleanup + regression guard.
Opus SHOULD-FIX deferred to follow-up nesquena#1533 (race in _clear_stale_stream_state).

2 closed as duplicates: nesquena#1528 (identical to nesquena#1517), nesquena#1529 (superseded by nesquena#1516).
1 maintainer-review label: nesquena#1531 (Asunfly stowaway change in force-push).
5 stay on hold: nesquena#1418 nesquena#1464 nesquena#1404 nesquena#1353 nesquena#1311.
@Michaelyklam
Contributor

Reconciled this against current origin/master while working the #1240 model/provider cleanup slice.

Findings:

  • The cache-invalidation part here is already superseded on origin/master: api/config.py now stamps /api/models disk cache entries with _source_fingerprint, _schema_version, and _webui_version, and rejects both memory and disk caches when the current config.yaml or auth.json fingerprint changes. See api/config.py::_models_cache_source_fingerprint(), _is_loadable_disk_cache(), _get_fresh_memory_models_cache(), and _save_models_cache_to_disk().
  • Current regression coverage exists in tests/test_issue1699_model_cache_source_fingerprint.py for auth/config source drift and tests/test_issue1633_models_cache_version_stamp.py for disk cache version/schema invalidation.
  • I found one still-useful behavior in this PR that was not fully present on origin/master: routing model IDs listed only under custom_providers[].models back to the named custom provider. I extracted that as a fresh Michael-owned narrow PR: fix: route custom provider models dict selections #1752.
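For readers unfamiliar with the stamping approach in the first finding, here is a minimal sketch of the idea. It is an illustration under assumptions, not the code on origin/master: the helpers `source_fingerprint`, `stamp`, `is_fresh`, and `demo` are hypothetical, as are the hash inputs (path, size, mtime) and the `SCHEMA_VERSION` value. Only the stamp keys `_source_fingerprint`, `_schema_version`, and `_webui_version` are taken from the comment above.

```python
import hashlib
import os
import tempfile

SCHEMA_VERSION = 1  # hypothetical value


def source_fingerprint(paths):
    """Hash path, size, and mtime of each source file; a missing file still counts."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        try:
            st = os.stat(path)
            digest.update(f"{path}:{st.st_size}:{st.st_mtime_ns}\n".encode())
        except FileNotFoundError:
            digest.update(f"{path}:missing\n".encode())
    return digest.hexdigest()


def stamp(payload, paths, webui_version):
    """Attach the three stamps to a cache entry before it is stored."""
    return {
        **payload,
        "_source_fingerprint": source_fingerprint(paths),
        "_schema_version": SCHEMA_VERSION,
        "_webui_version": webui_version,
    }


def is_fresh(entry, paths, webui_version):
    """Accept a cache entry only if every stamp matches the current state."""
    return (
        entry.get("_source_fingerprint") == source_fingerprint(paths)
        and entry.get("_schema_version") == SCHEMA_VERSION
        and entry.get("_webui_version") == webui_version
    )


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        cfg = os.path.join(tmp, "config.yaml")
        auth = os.path.join(tmp, "auth.json")
        for path, text in ((cfg, "default_model: model-a\n"), (auth, "{}")):
            with open(path, "w") as fh:
                fh.write(text)
        sources = [cfg, auth]
        entry = stamp({"default_model": "model-a"}, sources, "v1")
        unchanged = is_fresh(entry, sources, "v1")
        after_version_bump = is_fresh(entry, sources, "v2")
        with open(auth, "w") as fh:
            fh.write('{"key": "x"}')  # size changes, so the fingerprint changes
        after_auth_edit = is_fresh(entry, sources, "v1")
    return unchanged, after_version_bump, after_auth_edit
```

Because freshness is recomputed from the source files on every read, it does not matter which code path noticed the change first, which is the ordering hazard this PR was working around.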

Verification from the fresh branch:

  • Targeted model/cache suites: 51 passed in 48.98s
  • Full isolated suite: 4597 passed, 2 skipped, 3 xpassed, 1 warning, 8 subtests passed in 378.87s

Recommendation: close this conflicting PR in favor of the already-merged cache work plus #1752 for the remaining custom-provider routing slice.

@nesquena-hermes
Collaborator

Closing — superseded on master

Thanks @lost9999 — closing per @Michaelyklam's reconciliation comment from May 06.

The cache-invalidation slice from this PR is already on master: api/config.py now stamps /api/models disk cache entries with _source_fingerprint, _schema_version, and a TTL, and the on-config-reload invalidation path runs through the centralized invalidate_models_cache() helper that set_provider_key and remove_provider_key both call. That landed across the v0.50.282 / v0.50.288 / v0.51.x release cycle.

The branch here was draft + DIRTY for several weeks while the master-side cleanup absorbed the same fix shape. Rather than ask you to rebase a draft against the now-superseded code, closing this out so the queue stays clean. Appreciate getting it on our radar early — the canonical fix on master is structurally similar to your approach.

If you spot a residual cache-invalidation gap that the master-side invalidate_models_cache() plumbing doesn't cover, please open a fresh issue with a repro — happy to triage.

nesquena-hermes added a commit that referenced this pull request May 9, 2026
CHANGELOG, ROADMAP, TESTING refresh for v0.51.31 stage release covering
12 contributor PRs:

Added (2 PRs):
- #1956 JKJameson — persistent composer draft (server-side, cross-client)
- #1957 hermes-gimmethebeans — configurable session TTL via env + settings

Fixed (10 PRs):
- #1939 ai-ag2026 — theme-color + sw cache regression coverage
- #1941 ai-ag2026 — preserve chat scroll across final render
- #1945 franksong2702 — localize session jump controls (#1938)
- #1947 happy5318 — show same model from different custom providers
  (Co-authored-by hacker1e7 for #1874 close)
- #1949 Sanjays2402 — close #1937 endless-scroll vs Start-jump race
  with generation-token + mutex
  (Co-authored-by franksong2702 + Michaelyklam)
- #1950 franksong2702 — mute stale stopped gateway heartbeat (#1944)
- #1951 amlyczz — gate goal hook on goal-related turns (#1932)
  (Co-authored-by franksong2702 for #1946 close)
- #1953 lucky-yonug — skip provider peel for custom host:port slugs
- #1960 Michaelyklam — translate hidden-files workspace label (#1841)
- #1961 sbe27 — respect image_input_mode (#1959)

Closed in favor of canonical: #1942, #1962, #1946, #1874, #1311.

Stage-326 hotfixes (per Opus advisor):
- CRITICAL #1951 PENDING_GOAL_CONTINUATION race fix (removed finally
  discard that race-erased the marker before consumer could read it)
- #1956 composer-draft input validation (50 KB text / 50 file clamp +
  type coercion to prevent unbounded session-JSON bloat)
- #1957 SESSION_TTL constant preserved as named fallback (existing
  regression tests pin it; #1957 originally deleted it)

Tests: 5006 → 5028 (+51 net new) — 0 regressions, 142.61s runtime.