fix: context-window indicator broken on older sessions (#1436) by nesquena-hermes · Pull Request #1437 · nesquena/hermes-webui

nesquena-hermes · 2026-05-02T01:43:53Z

#1436 — context-window indicator broken on older sessions

Closes #1436. Reported by @AvidFuturist in Discord ("the 100 comes up way too often").

What was broken

Older sessions (pre-#1318, when context_length was added to Session) showed the context-window indicator as "100" on the ring with a tooltip claiming "890% used (context exceeded), 1.2M / 131.1k tokens used". The "compress now" hint kept appearing for sessions users couldn't actually compress.

#1356 (closed Apr 30) fixed the same symptom on the live SSE path. This bug is the same shape on the GET /api/session load path that every older session goes through when clicked from the sidebar — the fix wasn't carried over.

Empirical impact: 23 of 75 sessions on the dev server were rendering >100% before this fix. With models like claude-opus-4-7 (1M context) and long sessions (input_tokens > 5M), the displayed rawPct was as high as 11,000%.

Root cause (two cascading fallbacks)

When an older session is loaded:

Backend api/routes.py:1302-1304 returned context_length, threshold_tokens, last_prompt_tokens straight from the persisted Session object — all 0 for pre-fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 sessions:
```
"context_length": getattr(s, "context_length", 0) or 0,
"threshold_tokens": getattr(s, "threshold_tokens", 0) or 0,
"last_prompt_tokens": getattr(s, "last_prompt_tokens", 0) or 0,
```

Frontend static/ui.js:1269, 1273 then triggered two more cascading fallbacks:

const promptTok = usage.last_prompt_tokens || usage.input_tokens || 0;  // → cumulative input
const ctxWindow = usage.context_length || DEFAULT_CTX;                  // → 131,072 (128K)

Result: rawPct = cumulative_input / 128K = ~890-11000%, capped to 100 on the ring per :1284 (Math.min(100, rawPct)), with the tooltip showing the raw % per :1313 (overflowed?${rawPct}% used (context exceeded)).

This is exactly the symptom #1356 fixed on the SSE path, repeated on the load path because the same fallback was never added to api/routes.py.

Fix shape — two-layer defense

Layer 1: backend fallback (`api/routes.py:1295-1313`)

Mirror the SSE-path fallback from api/streaming.py:2333-2342. When persisted context_length is 0 and the session has a model name, look up the model's static context window from agent.model_metadata.get_model_context_length():

# Resolve effective context_length with model-metadata fallback so
# older sessions (pre-#1318) that have context_length=0 persisted
# still render a meaningful indicator on load.  Mirrors the
# SSE-path fallback in api/streaming.py:2333-2342.  Fixes #1436.
_persisted_cl = getattr(s, "context_length", 0) or 0
if not _persisted_cl:
    _model_for_lookup = (
        getattr(s, "model", "") or effective_model or ""
    ).strip()
    if _model_for_lookup:
        try:
            from agent.model_metadata import get_model_context_length as _get_cl
            _fb_cl = _get_cl(_model_for_lookup, "") or 0
            if _fb_cl:
                _persisted_cl = _fb_cl
        except Exception:
            pass

The empty-model guard matters: get_model_context_length('', '') returns 256K (the "unknown" default), which would silently overwrite legitimate zero values. Skipping the call avoids that trap.

Layer 2: frontend defense in depth (`static/ui.js:1269`)

Drop the usage.input_tokens fallback entirely:

// Old:
const promptTok = usage.last_prompt_tokens || usage.input_tokens || 0;
// New:
const promptTok = usage.last_prompt_tokens || 0;

Cumulative input_tokens is fundamentally wrong for "context window % used" — it sums input across all turns. Better to render · (no last-prompt data) than wrong data. The existing !hasPromptTok branch already handles this case by switching to ${tokens} tokens used in the tooltip and the · glyph in the ring center; this PR removes the misleading numeric fallback that was bypassing it.

This second change is the more important one: it prevents this specific failure mode from recurring even if a future code path forgets the backend fallback.

Verified live (before/after)

Patched the running dev server (port 8787) temporarily to compare:

Before (production v0.50.262):

sid	ctx	lpt	input	model	rendered
5adb769c0974	0	0	5,226,479	claude-opus-4-7	"100" / "3987% used (context exceeded)"
695f76f9c246	0	0	14,550,872	claude-sonnet-4.6	"100" / "11101% used (context exceeded)"
a0bccfc8f372	1,000,000	29,217	58,097	claude-sonnet-4.6	"3" / "3% used (97% left)" ← unchanged baseline

After (this PR):

sid	ctx	lpt	input	rendered
5adb769c0974	1,000,000	0	5,226,479	"·" / "5.3M tokens used" — In: 5.2M · Out: 24.4k
695f76f9c246	1,000,000	0	14,550,872	"·" / "14.6M tokens used"
a0bccfc8f372	1,000,000	29,217	58,097	"3" / "3% used (97% left)" — unchanged ✓

Healthy sessions are unaffected. Older sessions now render an honest no-data marker instead of a misleading >100% percentage.

Tests — 10 new regression tests

tests/test_issue1436_context_indicator_load_path.py pins both layers:

Backend (5 tests) — exercise the actual /api/session route via routes.handle_get:

test_persisted_context_length_passed_through_unchanged — non-zero values are NOT overwritten
test_zero_context_length_falls_back_to_model_metadata — the bug fix
test_fallback_called_with_persisted_model — verify it queries with Session.model
test_empty_model_skips_fallback — avoid the 256K default-for-unknown trap
test_fallback_exception_does_not_break_response — older agent builds without the helper

Frontend (3 tests) — pin the no-input_tokens-fallback invariant:

test_promptTok_does_not_fall_back_to_input_tokens
test_promptTok_assignment_uses_last_prompt_tokens_only
test_no_data_branch_renders_dot

Static markers (2 tests) — pin the comment + import location for future maintainers:

test_routes_load_path_imports_get_model_context_length
test_routes_load_path_marks_fix_with_issue_number

$ pytest tests/test_issue1436_context_indicator_load_path.py -v
============ 10 passed in 3.04s ============
$ pytest -q --timeout=60
3658 passed, 2 skipped, 3 xpassed in 88.30s

Cross-links

Predecessor: fix: context window indicator shows 100% when context_length is missing from SSE payload #1356 (closed Apr 30 2026) — same symptom on SSE path, fix didn't cover load path
Related: fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 (added context_length field to Session), fix(streaming): fallback to model_metadata for context_length when compressor missing (#1318 follow-up) #1348 (fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 follow-up — model_metadata fallback in session save), fix(ui): show context indicator percentage without explicit context_length #1349 (UI: show percentage without explicit context_length)

Scope

api/routes.py (+19 LOC) + static/ui.js (+9 -2 LOC) + tests (+298 LOC) + CHANGELOG. Surgical fix, no API contract changes.

Fix two-layer bug where `/api/session` returned `context_length=0` for sessions that pre-date #1318, then the frontend silently fell back to cumulative `input_tokens` and the 128K JS default, producing nonsense indicators like "100" capped from "890% used (context exceeded), 1.2M / 131.1k tokens used". Empirical impact: 23 of 75 sessions on dev server rendered >100% before this fix. #1356 fixed the same symptom on the live SSE path but missed the GET /api/session load path that older sessions go through. Two-layer fix: 1. Backend (api/routes.py:1295-1313) — resolve context_length via agent.model_metadata.get_model_context_length() when the persisted value is 0. Mirrors api/streaming.py:2333-2342. 2. Frontend (static/ui.js:1269) — drop the cumulative `input_tokens` fallback. When last_prompt_tokens is missing, render "·" + "tokens used" (existing !hasPromptTok branch) instead of computing a percentage from the cumulative total. 10 regression tests in tests/test_issue1436_context_indicator_load_path.py covering both layers + the empty-model edge case (avoids the 256K default-for-unknown-model trap that get_model_context_length('') returns). Verified live: claude-opus-4-7 session with input_tokens=5,226,479 now renders "·" + "5.3M tokens used" instead of "100" + "3987% used". Reported by @AvidFuturist. Closes #1436.

nesquena

Review — end-to-end ✅ (clean approve, two-layer defense verified)

Note: PR #1436 doesn't exist as a PR number — 1436 is the issue. The PR fixing it is #1437 ("fix: context-window indicator broken on older sessions (#1436)"). Reviewing #1437 here.

Surgical fix for the issue's symptom: older sessions (pre-#1318) loaded their context-window indicator showing 100 on the ring with a tooltip claiming 890% used (context exceeded), 1.2M / 131.1k tokens used. The bug was a missed coverage of the GET /api/session load path — #1356 (Apr 30) fixed the same symptom on the live SSE path but didn't carry the fallback to the load path.

What this ships

Two-layer defense for the indicator:

Layer 1 — Backend fallback at api/routes.py:1295-1311:

_persisted_cl = getattr(s, "context_length", 0) or 0
if not _persisted_cl:
    _model_for_lookup = (getattr(s, "model", "") or effective_model or "").strip()
    if _model_for_lookup:
        try:
            from agent.model_metadata import get_model_context_length as _get_cl
            _fb_cl = _get_cl(_model_for_lookup, "") or 0
            if _fb_cl:
                _persisted_cl = _fb_cl
        except Exception:
            pass

This mirrors the SSE-path fallback in api/streaming.py:2333-2342 exactly. The if _model_for_lookup: empty-model guard is critical — get_model_context_length('', '') returns 256K (the unknown-default per the resolution chain at /tmp/hermes-agent-fresh/agent/model_metadata.py:1250), which would silently overwrite legitimate zero values when a session has no persisted model.

Layer 2 — Frontend defense in depth at static/ui.js:1274:

// Old:  const promptTok = usage.last_prompt_tokens || usage.input_tokens || 0;
// New:  const promptTok = usage.last_prompt_tokens || 0;

Cumulative input_tokens is fundamentally wrong for "context window % used" — it sums input across all turns. Dividing 5M cumulative input by a 128K context window gives rawPct = 3987%, which then displays as 100 on the ring (capped via Math.min(100, rawPct) at line 1289) and 3987% used (context exceeded) in the tooltip. The fix renders · + 5.2M tokens used instead — honest no-data through the existing !hasPromptTok branch at line 1287.

Behavioural harness — 6 scenarios

✅ Older session ctx=0 → resolved 1M, lpt=0, input=5.2M
   center='·'    tooltip='5250879 tokens used'     ctxWindow=1000000
✅ Older session ctx=0, fallback skipped, lpt=0
   center='·'    tooltip='5250879 tokens used'     ctxWindow=131072 (128K default)
✅ Healthy session, lpt=29217, ctx=1M
   center='3'    tooltip='3% used (97% left)'      ctxWindow=1000000
✅ Pre-fix scenario (the bug)
   center='·'    tooltip='1230000 tokens used'     ctxWindow=131072
   ↑ would have been: center='100' tooltip='890% used (context exceeded)'
✅ Fresh session, no data
   hidden=true
✅ Single turn, lpt < ctx
   center='3'    tooltip='3% used (97% left)'      ctxWindow=200000

The frontend defense alone (Layer 2) is sufficient to prevent the 890% display even when the backend fallback fails. Belt-and-suspenders.

Traced against upstream hermes-agent

get_model_context_length() signature verified at /tmp/hermes-agent-fresh/agent/model_metadata.py:1229 — (model, base_url, api_key, config_context_length, provider, custom_providers). The PR passes only the first two; rest default. Same call shape as the existing SSE-path fallback in streaming.py:2333-2342. The 9-stage resolution chain ends in 256K default, so even unknown models get a non-zero number — but the empty-model guard skips the call entirely when there's nothing to look up.

End-to-end trace

Pre-fix path (older session, click from sidebar):

Frontend POST GET /api/session?session_id=<old-sid>.
Backend at api/routes.py:1265 loads via get_session(sid).
Returns context_length: getattr(s, "context_length", 0) or 0 = 0 (pre-#1318 session).
Frontend _syncCtxIndicator at static/ui.js:1264:
- promptTok = usage.last_prompt_tokens(0) || usage.input_tokens(5.2M) || 0 = 5.2M
- ctxWindow = usage.context_length(0) || DEFAULT_CTX = 131,072
- rawPct = 5.2M / 131K = ~3987 → pct = min(100, 3987) = 100
- Ring shows 100, tooltip shows 3987% used (context exceeded). Bug.

Post-fix path:
1-2. Same.
3. NEW: Backend at api/routes.py:1299-1311 checks if persisted context_length=0, looks up via get_model_context_length(s.model, ""), returns 1,000,000 for claude-sonnet-4.6.
4. Frontend _syncCtxIndicator:

NEW: promptTok = usage.last_prompt_tokens(0) || 0 = 0
ctxWindow = 1,000,000 (resolved by Layer 1)
hasPromptTok = false → ring center shows ·, tooltip shows 5.2M tokens used. ✅

Cross-tool consistency

✅ get_model_context_length is read-only from agent's metadata cache. No CLI write contention.
✅ /api/session response shape: only the value of context_length changed; the field is webui-only and not sent back to the agent.
✅ No config.yaml writes.
✅ No new endpoints.

Race / lock analysis

Backend fallback is read-only via get_model_context_length (cached lookup with possible fallback to network probe on first hit). May briefly hold an internal cache lock, but no shared mutable state in webui process. ✅
Frontend change is single-threaded JS event-loop. ✅

Security audit

✅ get_model_context_length is read-only — no SSRF amplification, no SQL.
✅ s.model is loaded from the session JSON file (developer-controlled); not user-controllable via the GET request.
✅ effective_model fallback comes from the _resolve_effective_session_model_for_display(s) helper which reads cached config — not directly from request input.
✅ No XSS surface added (frontend just changes a numeric calculation).
✅ Auth gate unchanged — /api/session is gated by check_auth in server.py:76.
✅ Exception swallowing is appropriately scoped (only swallows on the metadata import/call path, doesn't mask other errors).

Edge-case matrix

Scenario	Pre-fix	Post-fix
Older session pre-#1318 with cumulative input > 128K	100 / "890% used" misleading	`·` / "tokens used" honest ✅
Older session, model unknown to metadata	Same bug	Falls back to 256K default → still avoids overflow display via Layer 2 ✅
Older session, model field empty	Same bug	Layer 1 skips the call (empty-model guard); Layer 2 renders honest no-data ✅
Older session, agent build without `get_model_context_length`	Same bug	`try/except Exception` swallows; Layer 2 renders honest no-data ✅
Healthy session with `last_prompt_tokens` set	"X% used (Y% left)"	Same — unchanged ✅
New session pre-first-turn	hidden=true	Same ✅
Session with `context_length` already set (post-#1318)	"X% used"	Same — Layer 1 only fires on `_persisted_cl == 0` ✅
Single user message session, `lpt < ctx`	"3% used"	Same ✅
Session where input >> ctx but lpt is set	"100 / 890% used"	"100% used (0% left)" — wait, NO — looking again, healthy lpt > ctx still shows real `pct` and `rawPct > 100` shows `(context exceeded)` correctly. Edge: lpt > ctx (one user message bigger than context window) is handled by existing #1356 logic. ✅

Tests

test_issue1436_context_indicator_load_path.py — 10/10 pass:
- 5 backend tests (passthrough, zero→fallback, model-passing, empty-model-skip, exception-swallow)
- 3 frontend tests (no input_tokens fallback, last_prompt_tokens-only assignment, no-data branch renders dot)
- 2 source markers (import location, fix-comment references issue number)
Full suite: 3605 passed, 55 skipped, 3 xpassed, 0 failed in 17.55s on the PR branch.
CI: 3.11/3.12/3.13 all green.
Behavioural harness: 6-case Node simulation confirms 890% scenario now renders · + tokens.

(PR description claims 3658; my count 3605. Counting drift consistent with prior batches.)

Other audit — what I checked

✅ effective_model is in scope at line 1302 (defined at line 1267).
✅ Mirror of SSE-path fallback at api/streaming.py:2333-2342 — identical signature, same exception-swallow shape.
✅ Empty-model guard correctly handles the "unknown" 256K default trap.
✅ Exception (broad) is appropriate here — covers ImportError (older agent build), AttributeError (signature mismatch), OSError (network probe failure), etc. Same pattern as the SSE path.
✅ Frontend test asserts the negative pattern (no usage.input_tokens in promptTok assignment) — prevents silent revert.
✅ !hasPromptTok branch already existed at line 1287 — this PR just stops bypassing it via the input_tokens fallback. No new code path needed.

Minor observations (non-blocking)

effective_model may be None if resolve_model=False. The or effective_model or "" chain handles this. Safe.
Unknown-model 256K default: when a session has a non-empty but unrecognized model name, the resolver returns 256K as the unknown-default. For sessions with cumulative input > 256K, a misleading display is still possible IF the frontend layer 2 weren't there. The two-layer defense is the right answer.
PR title references "#1436": that's the issue number, correctly cited. The PR itself is #1437. No confusion in the actual code references.
Test count drift: 3605 local vs 3658 reported. Same pattern as prior batches.
Frontend defense subsumes the backend fallback: even without Layer 1, the frontend's last_prompt_tokens || 0 (no input_tokens fallback) would prevent the 890% display by rendering · + tokens. Layer 1 is still valuable because it gives sessions with last_prompt_tokens > 0 (post-first-turn but pre-#1318) a correct percentage, but if a future regression dropped Layer 1, the user would still see honest "no data" instead of misleading "890% used".

Recommendation

✅ Approved. Two-layer fix correctly mirrors the SSE-path pattern from #1356 to the GET /api/session load path. Backend resolves missing context_length via get_model_context_length() with the same empty-model guard as the SSE path. Frontend drops the cumulative input_tokens fallback that produced the misleading >100% display. Behavioral harness confirms the bug is fixed across 6 scenarios; healthy sessions are unaffected.

Parked at approval — ready for the release agent's merge/tag pipeline.

nesquena-hermes added bug Something isn't working sprint-candidate Strong candidate for next sprint labels May 2, 2026

nesquena-hermes mentioned this pull request May 2, 2026

bug(context-indicator): older sessions show '100' / '890% used' on load — #1356 fix didn't cover GET /api/session path, falls back to cumulative input_tokens vs 128K default #1436

Closed

nesquena approved these changes May 2, 2026

View reviewed changes

docs: v0.50.263 release notes and version bump

51552e8

nesquena-hermes merged commit 9d0d86b into master May 2, 2026
3 checks passed

nesquena-hermes deleted the fix/issue-1436-context-indicator-load-path branch May 2, 2026 01:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: context-window indicator broken on older sessions (#1436)#1437

fix: context-window indicator broken on older sessions (#1436)#1437
nesquena-hermes merged 2 commits into
masterfrom
fix/issue-1436-context-indicator-load-path

nesquena-hermes commented May 2, 2026

Uh oh!

nesquena left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

nesquena-hermes commented May 2, 2026

#1436 — context-window indicator broken on older sessions

What was broken

Root cause (two cascading fallbacks)

Fix shape — two-layer defense

Layer 1: backend fallback (api/routes.py:1295-1313)

Layer 2: frontend defense in depth (static/ui.js:1269)

Verified live (before/after)

Tests — 10 new regression tests

Cross-links

Scope

Uh oh!

nesquena left a comment

Choose a reason for hiding this comment

Review — end-to-end ✅ (clean approve, two-layer defense verified)

What this ships

Behavioural harness — 6 scenarios

Traced against upstream hermes-agent

End-to-end trace

Cross-tool consistency

Race / lock analysis

Security audit

Edge-case matrix

Tests

Other audit — what I checked

Minor observations (non-blocking)

Recommendation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Layer 1: backend fallback (`api/routes.py:1295-1313`)

Layer 2: frontend defense in depth (`static/ui.js:1269`)