fix: context-window indicator broken on older sessions (#1436)#1437
Conversation
Fix two-layer bug where `/api/session` returned `context_length=0` for sessions that pre-date #1318, then the frontend silently fell back to cumulative `input_tokens` and the 128K JS default, producing nonsense indicators like "100" capped from "890% used (context exceeded), 1.2M / 131.1k tokens used". Empirical impact: 23 of 75 sessions on dev server rendered >100% before this fix. #1356 fixed the same symptom on the live SSE path but missed the GET /api/session load path that older sessions go through. Two-layer fix: 1. Backend (api/routes.py:1295-1313) — resolve context_length via agent.model_metadata.get_model_context_length() when the persisted value is 0. Mirrors api/streaming.py:2333-2342. 2. Frontend (static/ui.js:1269) — drop the cumulative `input_tokens` fallback. When last_prompt_tokens is missing, render "·" + "tokens used" (existing !hasPromptTok branch) instead of computing a percentage from the cumulative total. 10 regression tests in tests/test_issue1436_context_indicator_load_path.py covering both layers + the empty-model edge case (avoids the 256K default-for-unknown-model trap that get_model_context_length('') returns). Verified live: claude-opus-4-7 session with input_tokens=5,226,479 now renders "·" + "5.3M tokens used" instead of "100" + "3987% used". Reported by @AvidFuturist. Closes #1436.
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approve, two-layer defense verified)
Note: PR #1436 doesn't exist as a PR number —
1436is the issue. The PR fixing it is #1437 ("fix: context-window indicator broken on older sessions (#1436)"). Reviewing #1437 here.
Surgical fix for the issue's symptom: older sessions (pre-#1318) loaded their context-window indicator showing 100 on the ring with a tooltip claiming 890% used (context exceeded), 1.2M / 131.1k tokens used. The bug was a missed coverage of the GET /api/session load path — #1356 (Apr 30) fixed the same symptom on the live SSE path but didn't carry the fallback to the load path.
What this ships
Two-layer defense for the indicator:
Layer 1 — Backend fallback at api/routes.py:1295-1311:
_persisted_cl = getattr(s, "context_length", 0) or 0
if not _persisted_cl:
_model_for_lookup = (getattr(s, "model", "") or effective_model or "").strip()
if _model_for_lookup:
try:
from agent.model_metadata import get_model_context_length as _get_cl
_fb_cl = _get_cl(_model_for_lookup, "") or 0
if _fb_cl:
_persisted_cl = _fb_cl
except Exception:
passThis mirrors the SSE-path fallback in api/streaming.py:2333-2342 exactly. The if _model_for_lookup: empty-model guard is critical — get_model_context_length('', '') returns 256K (the unknown-default per the resolution chain at /tmp/hermes-agent-fresh/agent/model_metadata.py:1250), which would silently overwrite legitimate zero values when a session has no persisted model.
Layer 2 — Frontend defense in depth at static/ui.js:1274:
// Old: const promptTok = usage.last_prompt_tokens || usage.input_tokens || 0;
// New: const promptTok = usage.last_prompt_tokens || 0;Cumulative input_tokens is fundamentally wrong for "context window % used" — it sums input across all turns. Dividing 5M cumulative input by a 128K context window gives rawPct = 3987%, which then displays as 100 on the ring (capped via Math.min(100, rawPct) at line 1289) and 3987% used (context exceeded) in the tooltip. The fix renders · + 5.2M tokens used instead — honest no-data through the existing !hasPromptTok branch at line 1287.
Behavioural harness — 6 scenarios
✅ Older session ctx=0 → resolved 1M, lpt=0, input=5.2M
center='·' tooltip='5250879 tokens used' ctxWindow=1000000
✅ Older session ctx=0, fallback skipped, lpt=0
center='·' tooltip='5250879 tokens used' ctxWindow=131072 (128K default)
✅ Healthy session, lpt=29217, ctx=1M
center='3' tooltip='3% used (97% left)' ctxWindow=1000000
✅ Pre-fix scenario (the bug)
center='·' tooltip='1230000 tokens used' ctxWindow=131072
↑ would have been: center='100' tooltip='890% used (context exceeded)'
✅ Fresh session, no data
hidden=true
✅ Single turn, lpt < ctx
center='3' tooltip='3% used (97% left)' ctxWindow=200000
The frontend defense alone (Layer 2) is sufficient to prevent the 890% display even when the backend fallback fails. Belt-and-suspenders.
Traced against upstream hermes-agent
get_model_context_length() signature verified at /tmp/hermes-agent-fresh/agent/model_metadata.py:1229 — (model, base_url, api_key, config_context_length, provider, custom_providers). The PR passes only the first two; rest default. Same call shape as the existing SSE-path fallback in streaming.py:2333-2342. The 9-stage resolution chain ends in 256K default, so even unknown models get a non-zero number — but the empty-model guard skips the call entirely when there's nothing to look up.
End-to-end trace
Pre-fix path (older session, click from sidebar):
- Frontend POST
GET /api/session?session_id=<old-sid>. - Backend at api/routes.py:1265 loads via
get_session(sid). - Returns
context_length: getattr(s, "context_length", 0) or 0= 0 (pre-#1318 session). - Frontend
_syncCtxIndicatorat static/ui.js:1264:promptTok = usage.last_prompt_tokens(0) || usage.input_tokens(5.2M) || 0= 5.2MctxWindow = usage.context_length(0) || DEFAULT_CTX= 131,072rawPct = 5.2M / 131K = ~3987→pct = min(100, 3987) = 100- Ring shows
100, tooltip shows3987% used (context exceeded). Bug.
Post-fix path:
1-2. Same.
3. NEW: Backend at api/routes.py:1299-1311 checks if persisted context_length=0, looks up via get_model_context_length(s.model, ""), returns 1,000,000 for claude-sonnet-4.6.
4. Frontend _syncCtxIndicator:
- NEW:
promptTok = usage.last_prompt_tokens(0) || 0= 0 ctxWindow = 1,000,000(resolved by Layer 1)hasPromptTok = false→ ring center shows·, tooltip shows5.2M tokens used. ✅
Cross-tool consistency
- ✅
get_model_context_lengthis read-only from agent's metadata cache. No CLI write contention. - ✅
/api/sessionresponse shape: only the value ofcontext_lengthchanged; the field is webui-only and not sent back to the agent. - ✅ No
config.yamlwrites. - ✅ No new endpoints.
Race / lock analysis
- Backend fallback is read-only via
get_model_context_length(cached lookup with possible fallback to network probe on first hit). May briefly hold an internal cache lock, but no shared mutable state in webui process. ✅ - Frontend change is single-threaded JS event-loop. ✅
Security audit
- ✅
get_model_context_lengthis read-only — no SSRF amplification, no SQL. - ✅
s.modelis loaded from the session JSON file (developer-controlled); not user-controllable via the GET request. - ✅
effective_modelfallback comes from the_resolve_effective_session_model_for_display(s)helper which reads cached config — not directly from request input. - ✅ No XSS surface added (frontend just changes a numeric calculation).
- ✅ Auth gate unchanged —
/api/sessionis gated bycheck_authinserver.py:76. - ✅ Exception swallowing is appropriately scoped (only swallows on the metadata import/call path, doesn't mask other errors).
Edge-case matrix
| Scenario | Pre-fix | Post-fix |
|---|---|---|
| Older session pre-#1318 with cumulative input > 128K | 100 / "890% used" misleading | · / "tokens used" honest ✅ |
| Older session, model unknown to metadata | Same bug | Falls back to 256K default → still avoids overflow display via Layer 2 ✅ |
| Older session, model field empty | Same bug | Layer 1 skips the call (empty-model guard); Layer 2 renders honest no-data ✅ |
Older session, agent build without get_model_context_length |
Same bug | try/except Exception swallows; Layer 2 renders honest no-data ✅ |
Healthy session with last_prompt_tokens set |
"X% used (Y% left)" | Same — unchanged ✅ |
| New session pre-first-turn | hidden=true | Same ✅ |
Session with context_length already set (post-#1318) |
"X% used" | Same — Layer 1 only fires on _persisted_cl == 0 ✅ |
Single user message session, lpt < ctx |
"3% used" | Same ✅ |
| Session where input >> ctx but lpt is set | "100 / 890% used" | "100% used (0% left)" — wait, NO — looking again, healthy lpt > ctx still shows real pct and rawPct > 100 shows (context exceeded) correctly. Edge: lpt > ctx (one user message bigger than context window) is handled by existing #1356 logic. ✅ |
Tests
test_issue1436_context_indicator_load_path.py— 10/10 pass:- 5 backend tests (passthrough, zero→fallback, model-passing, empty-model-skip, exception-swallow)
- 3 frontend tests (no input_tokens fallback, last_prompt_tokens-only assignment, no-data branch renders dot)
- 2 source markers (import location, fix-comment references issue number)
- Full suite: 3605 passed, 55 skipped, 3 xpassed, 0 failed in 17.55s on the PR branch.
- CI: 3.11/3.12/3.13 all green.
- Behavioural harness: 6-case Node simulation confirms 890% scenario now renders
·+ tokens.
(PR description claims 3658; my count 3605. Counting drift consistent with prior batches.)
Other audit — what I checked
- ✅
effective_modelis in scope at line 1302 (defined at line 1267). - ✅ Mirror of SSE-path fallback at
api/streaming.py:2333-2342— identical signature, same exception-swallow shape. - ✅ Empty-model guard correctly handles the "unknown" 256K default trap.
- ✅
Exception(broad) is appropriate here — coversImportError(older agent build),AttributeError(signature mismatch),OSError(network probe failure), etc. Same pattern as the SSE path. - ✅ Frontend test asserts the negative pattern (no
usage.input_tokensin promptTok assignment) — prevents silent revert. - ✅
!hasPromptTokbranch already existed at line 1287 — this PR just stops bypassing it via theinput_tokensfallback. No new code path needed.
Minor observations (non-blocking)
effective_modelmay beNoneifresolve_model=False. Theor effective_model or ""chain handles this. Safe.- Unknown-model 256K default: when a session has a non-empty but unrecognized model name, the resolver returns 256K as the unknown-default. For sessions with cumulative input > 256K, a misleading display is still possible IF the frontend layer 2 weren't there. The two-layer defense is the right answer.
- PR title references "#1436": that's the issue number, correctly cited. The PR itself is #1437. No confusion in the actual code references.
- Test count drift: 3605 local vs 3658 reported. Same pattern as prior batches.
- Frontend defense subsumes the backend fallback: even without Layer 1, the frontend's
last_prompt_tokens || 0(no input_tokens fallback) would prevent the 890% display by rendering·+ tokens. Layer 1 is still valuable because it gives sessions withlast_prompt_tokens > 0(post-first-turn but pre-#1318) a correct percentage, but if a future regression dropped Layer 1, the user would still see honest "no data" instead of misleading "890% used".
Recommendation
✅ Approved. Two-layer fix correctly mirrors the SSE-path pattern from #1356 to the GET /api/session load path. Backend resolves missing context_length via get_model_context_length() with the same empty-model guard as the SSE path. Frontend drops the cumulative input_tokens fallback that produced the misleading >100% display. Behavioral harness confirms the bug is fixed across 6 scenarios; healthy sessions are unaffected.
Parked at approval — ready for the release agent's merge/tag pipeline.
#1436 — context-window indicator broken on older sessions
Closes #1436. Reported by @AvidFuturist in Discord ("the 100 comes up way too often").
What was broken
Older sessions (pre-#1318, when
context_lengthwas added toSession) showed the context-window indicator as "100" on the ring with a tooltip claiming "890% used (context exceeded), 1.2M / 131.1k tokens used". The "compress now" hint kept appearing for sessions users couldn't actually compress.#1356 (closed Apr 30) fixed the same symptom on the live SSE path. This bug is the same shape on the GET /api/session load path that every older session goes through when clicked from the sidebar — the fix wasn't carried over.
Empirical impact: 23 of 75 sessions on the dev server were rendering >100% before this fix. With models like
claude-opus-4-7(1M context) and long sessions (input_tokens > 5M), the displayedrawPctwas as high as 11,000%.Root cause (two cascading fallbacks)
When an older session is loaded:
Backend
api/routes.py:1302-1304returnedcontext_length,threshold_tokens,last_prompt_tokensstraight from the persisted Session object — all0for pre-fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 sessions:Frontend
static/ui.js:1269, 1273then triggered two more cascading fallbacks:Result:
rawPct = cumulative_input / 128K = ~890-11000%, capped to 100 on the ring per:1284(Math.min(100, rawPct)), with the tooltip showing the raw % per:1313(overflowed?${rawPct}% used (context exceeded)).This is exactly the symptom #1356 fixed on the SSE path, repeated on the load path because the same fallback was never added to
api/routes.py.Fix shape — two-layer defense
Layer 1: backend fallback (
api/routes.py:1295-1313)Mirror the SSE-path fallback from
api/streaming.py:2333-2342. When persistedcontext_lengthis 0 and the session has a model name, look up the model's static context window fromagent.model_metadata.get_model_context_length():The empty-model guard matters:
get_model_context_length('', '')returns 256K (the "unknown" default), which would silently overwrite legitimate zero values. Skipping the call avoids that trap.Layer 2: frontend defense in depth (
static/ui.js:1269)Drop the
usage.input_tokensfallback entirely:Cumulative
input_tokensis fundamentally wrong for "context window % used" — it sums input across all turns. Better to render·(no last-prompt data) than wrong data. The existing!hasPromptTokbranch already handles this case by switching to${tokens} tokens usedin the tooltip and the·glyph in the ring center; this PR removes the misleading numeric fallback that was bypassing it.This second change is the more important one: it prevents this specific failure mode from recurring even if a future code path forgets the backend fallback.
Verified live (before/after)
Patched the running dev server (port 8787) temporarily to compare:
Before (production v0.50.262):
After (this PR):
Healthy sessions are unaffected. Older sessions now render an honest no-data marker instead of a misleading >100% percentage.
Tests — 10 new regression tests
tests/test_issue1436_context_indicator_load_path.pypins both layers:Backend (5 tests) — exercise the actual
/api/sessionroute viaroutes.handle_get:test_persisted_context_length_passed_through_unchanged— non-zero values are NOT overwrittentest_zero_context_length_falls_back_to_model_metadata— the bug fixtest_fallback_called_with_persisted_model— verify it queries withSession.modeltest_empty_model_skips_fallback— avoid the 256K default-for-unknown traptest_fallback_exception_does_not_break_response— older agent builds without the helperFrontend (3 tests) — pin the no-input_tokens-fallback invariant:
test_promptTok_does_not_fall_back_to_input_tokenstest_promptTok_assignment_uses_last_prompt_tokens_onlytest_no_data_branch_renders_dotStatic markers (2 tests) — pin the comment + import location for future maintainers:
test_routes_load_path_imports_get_model_context_lengthtest_routes_load_path_marks_fix_with_issue_numberCross-links
context_lengthfield to Session), fix(streaming): fallback to model_metadata for context_length when compressor missing (#1318 follow-up) #1348 (fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 follow-up — model_metadata fallback in session save), fix(ui): show context indicator percentage without explicit context_length #1349 (UI: show percentage without explicit context_length)Scope
api/routes.py(+19 LOC) +static/ui.js(+9 -2 LOC) + tests (+298 LOC) + CHANGELOG. Surgical fix, no API contract changes.