perf(session): track measured session-switch latency improvements #2168

@franksong2702

Description

Problem

Switching between large sessions can still feel slow even though Hermes WebUI already windows the transcript and caches rendered HTML for some idle sessions.

This issue tracks measured, small-slice improvements to session-switch latency. It should not become a broad session renderer rewrite or a standalone instrumentation project.

Evidence so far

Local 8787 measurement on three real idle sessions showed duplicate transcript post-processing before #2166:

  • highlightCode: 3 calls per switch
  • addCopyButtons: 2 calls per switch
  • PDF / HTML / Mermaid / KaTeX helpers: 2 calls per switch

After #2166, the same helper set runs once per switch.

Measured wall-clock numbers from the same local 8787 pass:

Session                  Messages   Before (1st pass)   After (1st pass)   Before (2nd pass)   After (2nd pass)
20260513_112418_89ca08   1793       5266.7 ms           3037.5 ms          2732.5 ms           1350.3 ms
20260504_160031_1fd2bc   855        4198.1 ms           1668.0 ms          2215.1 ms           1298.4 ms
20260503_152835_357902   724        3251.5 ms           1182.4 ms          1116.9 ms           1249.9 ms

The wall-clock values include API, filesystem, sidebar, workspace, and browser runtime variance. The stable invariant is the duplicate helper-call reduction. This tracking issue should avoid turning those wall-clock values into fake precision.
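The distinction above can be made concrete in a small harness: assert on the deterministic invariant (call counts), and report wall-clock time only as a median over several runs. This is an illustrative sketch, not the harness actually used for the numbers above; `simulated_switch` is a stand-in for a real session switch.

```python
# Sketch of the measurement discipline described above: assert on the
# deterministic call count, report only a median for the noisy wall clock.
# All names here are illustrative, not Hermes code.
import statistics
import time

def measure(fn, runs=5):
    """Run fn several times; return (median wall-clock ms, per-run call counts)."""
    timings, counts = [], []
    for _ in range(runs):
        start = time.perf_counter()
        calls = fn()
        timings.append((time.perf_counter() - start) * 1000.0)
        counts.append(calls)
    return statistics.median(timings), counts

def simulated_switch():
    # Stand-in for a session switch; returns how many helper calls it made,
    # which is the stable invariant worth asserting on.
    return 1

median_ms, counts = measure(simulated_switch)
# counts is deterministic; median_ms is reported but never treated as exact.
```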

Current slices

#2166 — consolidate transcript post-render processing

#2166 removes the most obvious duplicate frontend work:

  • the cached transcript path
  • the fresh transcript rebuild path
  • the idle loadSession() path, which no longer runs an extra highlightCode()

This is intentionally narrow and does not claim to fix every source of session-switch latency.
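The pattern behind this slice is to funnel every load path through a single post-render consolidation point so each helper runs exactly once per switch. The WebUI code is JavaScript; this Python sketch only illustrates the pattern, and the helper names mirror (but are not) the real ones.

```python
# Illustrative sketch of the #2166 consolidation: all load paths (cached,
# rebuilt, idle) call one finalize step, so each helper runs once per switch.
CALL_COUNTS = {}

def counted(fn):
    """Wrap a helper so tests can assert it runs exactly once per switch."""
    def wrapper(html):
        CALL_COUNTS[fn.__name__] = CALL_COUNTS.get(fn.__name__, 0) + 1
        return fn(html)
    wrapper.__name__ = fn.__name__
    return wrapper

@counted
def highlight_code(html):
    return html  # stand-in for syntax highlighting

@counted
def add_copy_buttons(html):
    return html  # stand-in for copy-button injection

POST_RENDER_HELPERS = [highlight_code, add_copy_buttons]

def finalize_transcript(html):
    """Single consolidation point: helpers run here and nowhere else."""
    for helper in POST_RENDER_HELPERS:
        html = helper(html)
    return html

def switch_session(html, cached):
    # Both the cached path and the fresh rebuild path end at the same
    # finalize step, instead of invoking helpers ad hoc along the way.
    return finalize_transcript(html)
```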

#2170 — skip CLI metadata lookup for native sessions

#2170 addresses the next measured backend bottleneck after #2166.

Local direct-route profiling showed that native WebUI session metadata loads were still paying for an unnecessary Agent/CLI metadata scan. Native WebUI sessions already have local metadata, so the CLI/messaging fallback path can be skipped for those sessions while being preserved for imported CLI and messaging-backed sessions.

Measured local route timing for GET /api/session?messages=0&resolve_model=0 on 20260513_112418_89ca08 improved from a median of 662.82 ms to 1.71 ms.
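The shape of this guard is simple: branch on the session's origin before reaching the fallback scan. The field name `origin` and the helpers below are hypothetical, not the actual Hermes API.

```python
# Hedged sketch of the #2170 idea: skip the CLI metadata scan for sessions
# native to the WebUI. Field and function names here are hypothetical.

def scan_cli_metadata(session):
    # Stand-in for the slower Agent/CLI/messaging metadata lookup.
    return {"source": "cli-scan", "id": session["id"]}

def load_session_metadata(session):
    if session.get("origin") == "native":
        # Native WebUI sessions already carry local metadata, so the
        # expensive fallback scan is unnecessary.
        return session["local_metadata"]
    # Imported CLI and messaging-backed sessions keep the fallback path.
    return scan_cli_metadata(session)
```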

#2171 — trim bounded tail response overhead

#2171 addresses tail-window response overhead for GET /api/session?messages=1&resolve_model=0&msg_limit=30.

Local profiling showed the response was inflated by historical session.tool_calls and spent avoidable time in response redaction. The slice keeps redaction and legacy tool-call fallback behavior, but:

  • skips the full redactor for strings without likely credential markers
  • omits historical tool_calls when returned messages already carry per-message tool metadata

Measured local route timing, with #2170's metadata scan removed in the harness, reduced representative tail-window responses from hundreds of milliseconds and payloads in the hundreds-of-KB-to-MB range down to low milliseconds and much smaller payloads.
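The first bullet above amounts to a cheap pre-filter in front of the full redactor. The marker pattern and `redact()` below are hypothetical; the real redaction rules live in the Hermes backend.

```python
# Illustrative sketch of the #2171 fast path: run the full redactor only when
# a string contains likely credential markers. Names are hypothetical.
import re

# Cheap pre-filter: substrings that suggest a secret might be present.
CREDENTIAL_MARKERS = re.compile(
    r"(api[_-]?key|secret|token|password|BEGIN [A-Z ]*PRIVATE KEY)",
    re.IGNORECASE,
)

def redact(text):
    # Stand-in for the full (slow) redactor.
    return CREDENTIAL_MARKERS.sub("[REDACTED]", text)

def maybe_redact(text):
    """Skip the full redactor for strings with no likely credential markers;
    fall through to it otherwise, preserving redaction behavior."""
    if not CREDENTIAL_MARKERS.search(text):
        return text  # fast path: nothing resembling a credential
    return redact(text)
```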

#2171 currently has an unrelated Python 3.13 CI failure in tests/test_ctl_script.py; #2172 / #2173 track and fix that separate ctl.sh lifecycle issue.

Follow-up policy

Further work should remain profiling-first and fix-first:

  • Use local or temporary profiling to identify the next real bottleneck.
  • Do not add standalone debug/timing UI to the main repository unless it directly supports a concrete fix and maintainer feedback asks for it.
  • Each follow-up PR should name the measured bottleneck it addresses.
  • Each follow-up PR should include before/after evidence or an invariant regression test.
  • Perf claims should distinguish deterministic call-count / payload reductions from noisy wall-clock measurements.

Possible future slices

These are candidates only if measurement justifies them:

  1. Reduce avoidable side effects during session switches, for example workspace refresh or adjacent sidebar work that does not affect the visible switch result.
  2. Investigate active/in-flight session cache behavior by caching only stable historical transcript content while keeping the live assistant segment separate.
  3. Lazy-initialize heavy inline components such as Mermaid, PDF, HTML iframe, Excalidraw, or KaTeX only if artifact-heavy sessions prove this is the common slow path.
  4. Consider Markdown rendering changes only if profiling proves Markdown conversion is the dominant bottleneck.
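Candidate 2 above could, if measurement ever justifies it, look roughly like a cache keyed on the stable historical prefix, with the live segment always re-rendered. This is a speculative design sketch under those assumptions, not Hermes code.

```python
# Speculative sketch of candidate slice 2: cache rendered HTML for the stable
# historical prefix of a transcript, re-render only the live tail segment.
class TranscriptCache:
    def __init__(self):
        self._cache = {}  # session_id -> (history message count, rendered HTML)

    def render(self, session_id, messages, render_fn):
        """Reuse cached HTML for the historical prefix; re-render only the
        final (possibly in-flight) message."""
        history, live = messages[:-1], messages[-1:]
        cached = self._cache.get(session_id)
        if cached and cached[0] == len(history):
            history_html = cached[1]  # stable prefix unchanged: reuse it
        else:
            history_html = render_fn(history)
            self._cache[session_id] = (len(history), history_html)
        return history_html + render_fn(live)
```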

Non-goals

  • A broad session renderer rewrite.
  • A standalone instrumentation or debug/timing UI project.


    Labels

    performance (Performance, speed, memory, virtual scroll), tracking (Tracking issue for follow-up work)
