Skip to content

fix: context-window indicator broken on older sessions (#1436)#1437

Merged
nesquena-hermes merged 2 commits into
masterfrom
fix/issue-1436-context-indicator-load-path
May 2, 2026
Merged

fix: context-window indicator broken on older sessions (#1436)#1437
nesquena-hermes merged 2 commits into
masterfrom
fix/issue-1436-context-indicator-load-path

Conversation

@nesquena-hermes

Copy link
Copy Markdown
Collaborator

#1436 — context-window indicator broken on older sessions

Closes #1436. Reported by @AvidFuturist in Discord ("the 100 comes up way too often").

What was broken

Older sessions (pre-#1318, when context_length was added to Session) showed the context-window indicator as "100" on the ring with a tooltip claiming "890% used (context exceeded), 1.2M / 131.1k tokens used". The "compress now" hint kept appearing for sessions users couldn't actually compress.

#1356 (closed Apr 30) fixed the same symptom on the live SSE path. This bug is the same shape on the GET /api/session load path that every older session goes through when clicked from the sidebar — the fix wasn't carried over.

Empirical impact: 23 of 75 sessions on the dev server were rendering >100% before this fix. With models like claude-opus-4-7 (1M context) and long sessions (input_tokens > 5M), the displayed rawPct was as high as 11,000%.

Root cause (two cascading fallbacks)

When an older session is loaded:

  1. Backend api/routes.py:1302-1304 returned context_length, threshold_tokens, last_prompt_tokens straight from the persisted Session object — all 0 for pre-fix: context window indicator not updating + responsive sidebar + chat area width at 769px #1318 sessions:

    "context_length": getattr(s, "context_length", 0) or 0,
    "threshold_tokens": getattr(s, "threshold_tokens", 0) or 0,
    "last_prompt_tokens": getattr(s, "last_prompt_tokens", 0) or 0,
  2. Frontend static/ui.js:1269, 1273 then triggered two more cascading fallbacks:

    const promptTok = usage.last_prompt_tokens || usage.input_tokens || 0;  // → cumulative input
    const ctxWindow = usage.context_length || DEFAULT_CTX;                  // → 131,072 (128K)
  3. Result: rawPct = cumulative_input / 128K = ~890-11000%, capped to 100 on the ring per :1284 (Math.min(100, rawPct)), with the tooltip showing the raw % per :1313 (overflowed?${rawPct}% used (context exceeded)).

This is exactly the symptom #1356 fixed on the SSE path, repeated on the load path because the same fallback was never added to api/routes.py.

Fix shape — two-layer defense

Layer 1: backend fallback (api/routes.py:1295-1313)

Mirror the SSE-path fallback from api/streaming.py:2333-2342. When persisted context_length is 0 and the session has a model name, look up the model's static context window from agent.model_metadata.get_model_context_length():

# Resolve effective context_length with model-metadata fallback so
# older sessions (pre-#1318) that have context_length=0 persisted
# still render a meaningful indicator on load.  Mirrors the
# SSE-path fallback in api/streaming.py:2333-2342.  Fixes #1436.
_persisted_cl = getattr(s, "context_length", 0) or 0
if not _persisted_cl:
    _model_for_lookup = (
        getattr(s, "model", "") or effective_model or ""
    ).strip()
    if _model_for_lookup:
        try:
            from agent.model_metadata import get_model_context_length as _get_cl
            _fb_cl = _get_cl(_model_for_lookup, "") or 0
            if _fb_cl:
                _persisted_cl = _fb_cl
        except Exception:
            pass

The empty-model guard matters: get_model_context_length('', '') returns 256K (the "unknown" default), which would silently overwrite legitimate zero values. Skipping the call avoids that trap.

Layer 2: frontend defense in depth (static/ui.js:1269)

Drop the usage.input_tokens fallback entirely:

// Old:
const promptTok = usage.last_prompt_tokens || usage.input_tokens || 0;
// New:
const promptTok = usage.last_prompt_tokens || 0;

Cumulative input_tokens is fundamentally wrong for "context window % used" — it sums input across all turns. Better to render · (no last-prompt data) than wrong data. The existing !hasPromptTok branch already handles this case by switching to ${tokens} tokens used in the tooltip and the · glyph in the ring center; this PR removes the misleading numeric fallback that was bypassing it.

This second change is the more important one: it prevents this specific failure mode from recurring even if a future code path forgets the backend fallback.

Verified live (before/after)

Patched the running dev server (port 8787) temporarily to compare:

Before (production v0.50.262):

sid ctx lpt input model rendered
5adb769c0974 0 0 5,226,479 claude-opus-4-7 "100" / "3987% used (context exceeded)"
695f76f9c246 0 0 14,550,872 claude-sonnet-4.6 "100" / "11101% used (context exceeded)"
a0bccfc8f372 1,000,000 29,217 58,097 claude-sonnet-4.6 "3" / "3% used (97% left)" ← unchanged baseline

After (this PR):

sid ctx lpt input rendered
5adb769c0974 1,000,000 0 5,226,479 "·" / "5.3M tokens used" — In: 5.2M · Out: 24.4k
695f76f9c246 1,000,000 0 14,550,872 "·" / "14.6M tokens used"
a0bccfc8f372 1,000,000 29,217 58,097 "3" / "3% used (97% left)" — unchanged ✓

Healthy sessions are unaffected. Older sessions now render an honest no-data marker instead of a misleading >100% percentage.

Tests — 10 new regression tests

tests/test_issue1436_context_indicator_load_path.py pins both layers:

Backend (5 tests) — exercise the actual /api/session route via routes.handle_get:

  • test_persisted_context_length_passed_through_unchanged — non-zero values are NOT overwritten
  • test_zero_context_length_falls_back_to_model_metadata — the bug fix
  • test_fallback_called_with_persisted_model — verify it queries with Session.model
  • test_empty_model_skips_fallback — avoid the 256K default-for-unknown trap
  • test_fallback_exception_does_not_break_response — older agent builds without the helper

Frontend (3 tests) — pin the no-input_tokens-fallback invariant:

  • test_promptTok_does_not_fall_back_to_input_tokens
  • test_promptTok_assignment_uses_last_prompt_tokens_only
  • test_no_data_branch_renders_dot

Static markers (2 tests) — pin the comment + import location for future maintainers:

  • test_routes_load_path_imports_get_model_context_length
  • test_routes_load_path_marks_fix_with_issue_number
$ pytest tests/test_issue1436_context_indicator_load_path.py -v
============ 10 passed in 3.04s ============
$ pytest -q --timeout=60
3658 passed, 2 skipped, 3 xpassed in 88.30s

Cross-links

Scope

api/routes.py (+19 LOC) + static/ui.js (+9 -2 LOC) + tests (+298 LOC) + CHANGELOG. Surgical fix, no API contract changes.

Fix two-layer bug where `/api/session` returned `context_length=0` for
sessions that pre-date #1318, then the frontend silently fell back to
cumulative `input_tokens` and the 128K JS default, producing nonsense
indicators like "100" capped from "890% used (context exceeded), 1.2M
/ 131.1k tokens used".

Empirical impact: 23 of 75 sessions on dev server rendered >100% before
this fix. #1356 fixed the same symptom on the live SSE path but missed
the GET /api/session load path that older sessions go through.

Two-layer fix:
  1. Backend (api/routes.py:1295-1313) — resolve context_length via
     agent.model_metadata.get_model_context_length() when the persisted
     value is 0. Mirrors api/streaming.py:2333-2342.
  2. Frontend (static/ui.js:1269) — drop the cumulative `input_tokens`
     fallback. When last_prompt_tokens is missing, render "·" + "tokens
     used" (existing !hasPromptTok branch) instead of computing a
     percentage from the cumulative total.

10 regression tests in tests/test_issue1436_context_indicator_load_path.py
covering both layers + the empty-model edge case (avoids the 256K
default-for-unknown-model trap that get_model_context_length('') returns).

Verified live: claude-opus-4-7 session with input_tokens=5,226,479 now
renders "·" + "5.3M tokens used" instead of "100" + "3987% used".

Reported by @AvidFuturist.
Closes #1436.

@nesquena nesquena left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — end-to-end ✅ (clean approve, two-layer defense verified)

Note: PR #1436 doesn't exist as a PR number — 1436 is the issue. The PR fixing it is #1437 ("fix: context-window indicator broken on older sessions (#1436)"). Reviewing #1437 here.

Surgical fix for the issue's symptom: older sessions (pre-#1318) loaded their context-window indicator showing 100 on the ring with a tooltip claiming 890% used (context exceeded), 1.2M / 131.1k tokens used. The bug was a missed coverage of the GET /api/session load path#1356 (Apr 30) fixed the same symptom on the live SSE path but didn't carry the fallback to the load path.

What this ships

Two-layer defense for the indicator:

Layer 1 — Backend fallback at api/routes.py:1295-1311:

_persisted_cl = getattr(s, "context_length", 0) or 0
if not _persisted_cl:
    _model_for_lookup = (getattr(s, "model", "") or effective_model or "").strip()
    if _model_for_lookup:
        try:
            from agent.model_metadata import get_model_context_length as _get_cl
            _fb_cl = _get_cl(_model_for_lookup, "") or 0
            if _fb_cl:
                _persisted_cl = _fb_cl
        except Exception:
            pass

This mirrors the SSE-path fallback in api/streaming.py:2333-2342 exactly. The if _model_for_lookup: empty-model guard is critical — get_model_context_length('', '') returns 256K (the unknown-default per the resolution chain at /tmp/hermes-agent-fresh/agent/model_metadata.py:1250), which would silently overwrite legitimate zero values when a session has no persisted model.

Layer 2 — Frontend defense in depth at static/ui.js:1274:

// Old:  const promptTok = usage.last_prompt_tokens || usage.input_tokens || 0;
// New:  const promptTok = usage.last_prompt_tokens || 0;

Cumulative input_tokens is fundamentally wrong for "context window % used" — it sums input across all turns. Dividing 5M cumulative input by a 128K context window gives rawPct = 3987%, which then displays as 100 on the ring (capped via Math.min(100, rawPct) at line 1289) and 3987% used (context exceeded) in the tooltip. The fix renders · + 5.2M tokens used instead — honest no-data through the existing !hasPromptTok branch at line 1287.

Behavioural harness — 6 scenarios

✅ Older session ctx=0 → resolved 1M, lpt=0, input=5.2M
   center='·'    tooltip='5250879 tokens used'     ctxWindow=1000000
✅ Older session ctx=0, fallback skipped, lpt=0
   center='·'    tooltip='5250879 tokens used'     ctxWindow=131072 (128K default)
✅ Healthy session, lpt=29217, ctx=1M
   center='3'    tooltip='3% used (97% left)'      ctxWindow=1000000
✅ Pre-fix scenario (the bug)
   center='·'    tooltip='1230000 tokens used'     ctxWindow=131072
   ↑ would have been: center='100' tooltip='890% used (context exceeded)'
✅ Fresh session, no data
   hidden=true
✅ Single turn, lpt < ctx
   center='3'    tooltip='3% used (97% left)'      ctxWindow=200000

The frontend defense alone (Layer 2) is sufficient to prevent the 890% display even when the backend fallback fails. Belt-and-suspenders.

Traced against upstream hermes-agent

get_model_context_length() signature verified at /tmp/hermes-agent-fresh/agent/model_metadata.py:1229(model, base_url, api_key, config_context_length, provider, custom_providers). The PR passes only the first two; rest default. Same call shape as the existing SSE-path fallback in streaming.py:2333-2342. The 9-stage resolution chain ends in 256K default, so even unknown models get a non-zero number — but the empty-model guard skips the call entirely when there's nothing to look up.

End-to-end trace

Pre-fix path (older session, click from sidebar):

  1. Frontend POST GET /api/session?session_id=<old-sid>.
  2. Backend at api/routes.py:1265 loads via get_session(sid).
  3. Returns context_length: getattr(s, "context_length", 0) or 0 = 0 (pre-#1318 session).
  4. Frontend _syncCtxIndicator at static/ui.js:1264:
    • promptTok = usage.last_prompt_tokens(0) || usage.input_tokens(5.2M) || 0 = 5.2M
    • ctxWindow = usage.context_length(0) || DEFAULT_CTX = 131,072
    • rawPct = 5.2M / 131K = ~3987pct = min(100, 3987) = 100
    • Ring shows 100, tooltip shows 3987% used (context exceeded). Bug.

Post-fix path:
1-2. Same.
3. NEW: Backend at api/routes.py:1299-1311 checks if persisted context_length=0, looks up via get_model_context_length(s.model, ""), returns 1,000,000 for claude-sonnet-4.6.
4. Frontend _syncCtxIndicator:

  • NEW: promptTok = usage.last_prompt_tokens(0) || 0 = 0
  • ctxWindow = 1,000,000 (resolved by Layer 1)
  • hasPromptTok = false → ring center shows ·, tooltip shows 5.2M tokens used. ✅

Cross-tool consistency

  • get_model_context_length is read-only from agent's metadata cache. No CLI write contention.
  • /api/session response shape: only the value of context_length changed; the field is webui-only and not sent back to the agent.
  • No config.yaml writes.
  • No new endpoints.

Race / lock analysis

  • Backend fallback is read-only via get_model_context_length (cached lookup with possible fallback to network probe on first hit). May briefly hold an internal cache lock, but no shared mutable state in webui process. ✅
  • Frontend change is single-threaded JS event-loop. ✅

Security audit

  • get_model_context_length is read-only — no SSRF amplification, no SQL.
  • s.model is loaded from the session JSON file (developer-controlled); not user-controllable via the GET request.
  • effective_model fallback comes from the _resolve_effective_session_model_for_display(s) helper which reads cached config — not directly from request input.
  • No XSS surface added (frontend just changes a numeric calculation).
  • Auth gate unchanged — /api/session is gated by check_auth in server.py:76.
  • Exception swallowing is appropriately scoped (only swallows on the metadata import/call path, doesn't mask other errors).

Edge-case matrix

Scenario Pre-fix Post-fix
Older session pre-#1318 with cumulative input > 128K 100 / "890% used" misleading · / "tokens used" honest ✅
Older session, model unknown to metadata Same bug Falls back to 256K default → still avoids overflow display via Layer 2 ✅
Older session, model field empty Same bug Layer 1 skips the call (empty-model guard); Layer 2 renders honest no-data ✅
Older session, agent build without get_model_context_length Same bug try/except Exception swallows; Layer 2 renders honest no-data ✅
Healthy session with last_prompt_tokens set "X% used (Y% left)" Same — unchanged ✅
New session pre-first-turn hidden=true Same ✅
Session with context_length already set (post-#1318) "X% used" Same — Layer 1 only fires on _persisted_cl == 0
Single user message session, lpt < ctx "3% used" Same ✅
Session where input >> ctx but lpt is set "100 / 890% used" "100% used (0% left)" — wait, NO — looking again, healthy lpt > ctx still shows real pct and rawPct > 100 shows (context exceeded) correctly. Edge: lpt > ctx (one user message bigger than context window) is handled by existing #1356 logic. ✅

Tests

  • test_issue1436_context_indicator_load_path.py — 10/10 pass:
    • 5 backend tests (passthrough, zero→fallback, model-passing, empty-model-skip, exception-swallow)
    • 3 frontend tests (no input_tokens fallback, last_prompt_tokens-only assignment, no-data branch renders dot)
    • 2 source markers (import location, fix-comment references issue number)
  • Full suite: 3605 passed, 55 skipped, 3 xpassed, 0 failed in 17.55s on the PR branch.
  • CI: 3.11/3.12/3.13 all green.
  • Behavioural harness: 6-case Node simulation confirms 890% scenario now renders · + tokens.

(PR description claims 3658; my count 3605. Counting drift consistent with prior batches.)

Other audit — what I checked

  • effective_model is in scope at line 1302 (defined at line 1267).
  • Mirror of SSE-path fallback at api/streaming.py:2333-2342 — identical signature, same exception-swallow shape.
  • Empty-model guard correctly handles the "unknown" 256K default trap.
  • Exception (broad) is appropriate here — covers ImportError (older agent build), AttributeError (signature mismatch), OSError (network probe failure), etc. Same pattern as the SSE path.
  • Frontend test asserts the negative pattern (no usage.input_tokens in promptTok assignment) — prevents silent revert.
  • !hasPromptTok branch already existed at line 1287 — this PR just stops bypassing it via the input_tokens fallback. No new code path needed.

Minor observations (non-blocking)

  • effective_model may be None if resolve_model=False. The or effective_model or "" chain handles this. Safe.
  • Unknown-model 256K default: when a session has a non-empty but unrecognized model name, the resolver returns 256K as the unknown-default. For sessions with cumulative input > 256K, a misleading display is still possible IF the frontend layer 2 weren't there. The two-layer defense is the right answer.
  • PR title references "#1436": that's the issue number, correctly cited. The PR itself is #1437. No confusion in the actual code references.
  • Test count drift: 3605 local vs 3658 reported. Same pattern as prior batches.
  • Frontend defense subsumes the backend fallback: even without Layer 1, the frontend's last_prompt_tokens || 0 (no input_tokens fallback) would prevent the 890% display by rendering · + tokens. Layer 1 is still valuable because it gives sessions with last_prompt_tokens > 0 (post-first-turn but pre-#1318) a correct percentage, but if a future regression dropped Layer 1, the user would still see honest "no data" instead of misleading "890% used".

Recommendation

Approved. Two-layer fix correctly mirrors the SSE-path pattern from #1356 to the GET /api/session load path. Backend resolves missing context_length via get_model_context_length() with the same empty-model guard as the SSE path. Frontend drops the cumulative input_tokens fallback that produced the misleading >100% display. Behavioral harness confirms the bug is fixed across 6 scenarios; healthy sessions are unaffected.

Parked at approval — ready for the release agent's merge/tag pipeline.

@nesquena-hermes nesquena-hermes merged commit 9d0d86b into master May 2, 2026
3 checks passed
@nesquena-hermes nesquena-hermes deleted the fix/issue-1436-context-indicator-load-path branch May 2, 2026 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working sprint-candidate Strong candidate for next sprint

Projects

None yet

3 participants