Problem
Switching between large sessions can still feel slow even though Hermes WebUI already windows the transcript and caches rendered HTML for some idle sessions.
This issue tracks measured, small-slice improvements to session-switch latency. It should not become a broad session renderer rewrite or a standalone instrumentation project.
Evidence so far
Local 8787 measurement on three real idle sessions showed duplicate transcript post-processing before #2166:
- `highlightCode`: 3 calls per switch
- `addCopyButtons`: 2 calls per switch
- PDF / HTML / Mermaid / KaTeX helpers: 2 calls per switch
After #2166, the same helper set runs once per switch.
Measured wall-clock numbers from the same local 8787 pass:
| Session | Messages | Before first pass | After first pass | Before second pass | After second pass |
| --- | --- | --- | --- | --- | --- |
| 20260513_112418_89ca08 | 1793 | 5266.7 ms | 3037.5 ms | 2732.5 ms | 1350.3 ms |
| 20260504_160031_1fd2bc | 855 | 4198.1 ms | 1668.0 ms | 2215.1 ms | 1298.4 ms |
| 20260503_152835_357902 | 724 | 3251.5 ms | 1182.4 ms | 1116.9 ms | 1249.9 ms |
The wall-clock values include API, filesystem, sidebar, workspace, and browser runtime variance. The stable invariant is the duplicate helper-call reduction. This tracking issue should avoid turning those wall-clock values into fake precision.
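To keep reported numbers honest, repeated samples can be summarized with a median rather than a single run. A minimal sketch (not part of the Hermes harness; the function name is illustrative):

```python
import statistics


def summarize_ms(samples: list[float]) -> dict:
    """Summarize repeated wall-clock samples in milliseconds.

    The median resists outliers from API, filesystem, and browser
    runtime variance better than a single measurement does.
    """
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "n": len(samples),
    }
```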
Current slices
#2166 — consolidate transcript post-render processing
#2166 removes the most obvious duplicate frontend work:
- cached transcript path
- fresh transcript rebuild path
- idle `loadSession()` path, which no longer runs an extra `highlightCode()`
This is intentionally narrow and does not claim to fix every source of session-switch latency.
#2170 — skip CLI metadata lookup for native sessions
#2170 addresses the next measured backend bottleneck after #2166.
Local direct-route profiling showed native WebUI session metadata loads were still paying an unnecessary Agent/CLI metadata scan. Native WebUI sessions already have local metadata, so the CLI/messaging fallback path can be skipped for those sessions while remaining in place for imported CLI and messaging-backed sessions.
Measured local route timing for `GET /api/session?messages=0&resolve_model=0` on `20260513_112418_89ca08` improved from a median of 662.82 ms to 1.71 ms.
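The fast path can be sketched roughly as follows. All names here (`Session`, `origin`, `load_cli_metadata`) are illustrative stand-ins, not the actual Hermes API:

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    origin: str                        # e.g. "webui", "cli", "messaging"
    local_metadata: dict = field(default_factory=dict)


def load_cli_metadata(session: Session) -> dict:
    # Stand-in for the expensive Agent/CLI metadata scan
    # (the ~662 ms median cost measured before #2170).
    return {"source": "cli-scan"}


def load_session_metadata(session: Session) -> dict:
    """Skip the CLI scan for native WebUI sessions; keep it as a fallback."""
    if session.origin == "webui":
        # Native WebUI sessions already carry local metadata.
        return session.local_metadata
    # Imported CLI and messaging-backed sessions still use the scan.
    return load_cli_metadata(session)
```

The design point is that the fallback path is preserved, not removed: only sessions that provably have local metadata take the short-circuit.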
#2171 — trim bounded tail response overhead
#2171 addresses tail-window response overhead for `GET /api/session?messages=1&resolve_model=0&msg_limit=30`.
Local profiling showed the response was inflated by historical `session.tool_calls` and spent avoidable time in response redaction. The slice keeps redaction and legacy tool-call fallback behavior, but:
- skips the full redactor for strings without likely credential markers
- omits historical `tool_calls` when returned messages already carry per-message tool metadata
Measured local route timing, with #2170's metadata scan removed in the harness, reduced representative tail-window responses from hundreds of milliseconds and payloads in the hundreds-of-KB-to-MB range down to low milliseconds and much smaller payloads.
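A rough sketch of the two trims; the function names and the credential-marker pattern below are hypothetical, not taken from the Hermes source:

```python
import re

# Illustrative credential markers; the real redactor's patterns are assumed.
_MARKERS = re.compile(r"(?i)api[_-]?key|secret|token|password|PRIVATE KEY")


def full_redact(text: str) -> str:
    # Stand-in for the full, comparatively slow redactor.
    return _MARKERS.sub("[REDACTED]", text)


def redact_fast(text: str) -> str:
    """Fast path: skip the full redactor when no likely marker is present."""
    if _MARKERS.search(text) is None:
        return text
    return full_redact(text)


def build_tail_response(session: dict, messages: list[dict]) -> dict:
    """Omit historical session-level tool_calls when every returned
    message already carries its own tool metadata."""
    response = {"messages": messages}
    if not all("tool_calls" in m for m in messages):
        # Legacy fallback: older messages without per-message metadata
        # still get the session-level tool_calls.
        response["tool_calls"] = session.get("tool_calls", [])
    return response
```

Note the fast path is conservative: any string that might contain a credential still goes through the full redactor, so behavior only changes for provably clean strings.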
#2171 currently has an unrelated Python 3.13 CI failure in `tests/test_ctl_script.py`; #2172 / #2173 track and fix that separate `ctl.sh` lifecycle issue.
Follow-up policy
Further work should remain profiling-first and fix-first:
- Use local or temporary profiling to identify the next real bottleneck.
- Do not add standalone debug/timing UI to the main repository unless it directly supports a concrete fix and maintainer feedback asks for it.
- Each follow-up PR should name the measured bottleneck it addresses.
- Each follow-up PR should include before/after evidence or an invariant regression test.
- Perf claims should distinguish deterministic call-count / payload reductions from noisy wall-clock measurements.
Possible future slices
These are candidates only if measurement justifies them:
- Reduce avoidable side effects during session switches, for example workspace refresh or adjacent sidebar work that does not affect the visible switch result.
- Investigate active/in-flight session cache behavior by caching only stable historical transcript content while keeping the live assistant segment separate.
- Lazy-initialize heavy inline components such as Mermaid, PDF, HTML iframe, Excalidraw, or KaTeX only if artifact-heavy sessions prove this is the common slow path.
- Consider Markdown rendering changes only if profiling proves Markdown conversion is the dominant bottleneck.
Non-goals