Redesign live-to-final assistant replies for running agent sessions #3400

## Summary

Hermes WebUI should redesign the live-to-final assistant reply experience for running agent sessions.

This is one of the most important UX surfaces in Hermes WebUI: when an agent is running, the user needs to understand what is happening now, what has already happened, and where the final answer begins. The current experience has repeatedly exposed edge cases where live progress, tool activity, replay/recovery, auto-compression, and final transcript rendering compete for the same visual surface.

The goal is not just to add a Worklog widget. The goal is to make the assistant reply lifecycle coherent:

- During a running turn, process text should remain the primary timeline.
- Tool activity should be visible, quiet, and close to the prose that caused it.
- When the turn settles, implementation detail should collapse into a compact activity summary above the final answer.
- The final answer must remain readable, complete, and clearly separate from the worklog.
- Recovery, session switching, and replay should reconstruct the same structure rather than creating a different UI state.

## Why this matters

Running agent sessions are the core Hermes WebUI experience. Users spend most of their time watching an agent reason, inspect files, run tools, recover from reconnects, and eventually produce a final answer. If this surface is noisy or unstable, the product feels unreliable even when the backend is working correctly.

Mature agent clients such as Codex and Claude Code generally treat intermediate tool/progress details as supporting activity rather than as the final answer itself. They make running state visible, but they do not force every internal lifecycle marker to remain in the final transcript. Hermes WebUI should follow the same product direction while preserving its own stronger browser affordances: replay, session switching, resumability, and structured tool details.

## Historical context

This issue intentionally cross-references prior Hermes WebUI problems because this UX has been revisited many times from different angles:

- Early working/final-answer separation: #536 established the direction that working output and final answer should not blur together.
- Manual and automatic compression UX: #469, #619, #699, #1142, #1316, #2355, #2357, #2973, and #3079 show repeated attempts to represent compression state without confusing the user.
- Live/replay/reconnect durability: #765, #1525, #1533, #1584, #2283, #2341, #2342, #2344, #2347, #2924, #2925, #3005, #3371, and #3391 cover the long-running theme that a browser client must be able to recover the running transcript from durable stream state.
- Activity/tool/thinking/progress rendering: #1298, #3014, and #3015 cover the need for visible progress while avoiding noisy thinking/tool surfaces.
- Terminal/no-final edge cases: #3315 and #3316 show that compression exhaustion and tool-tail termination need explicit terminal classification rather than pretending a turn completed normally.
- Stream ownership and cancellation: #3344 and #3345 cover duplicate ownership/cancel interactions that directly affect whether the user sees the correct active stream.
- Sidebar/session awareness: #856, #1012, #1341, #1358, #1370, #1436, and related session-list work show that running state also needs to remain coherent when users switch sessions.

These are not separate random bugs. They are symptoms of one product surface: running agent sessions need one consistent live-to-final assistant reply model.

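## Product model

The model uses three levels of detail. L1 is the compact Activity summary on a settled turn, L2 is an individual tool row, and L3 is the expanded detail behind a tool row (full command, args, output, and long payloads).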

### Live phase

A running assistant turn should show:

- Visible prose/progress as the primary timeline.
- Quiet L2 tool rows near the prose that triggered them.
- Collapsed tool rows/groups by default.
- L3 details only after expansion: full command, args, output, and long payloads.
- A bottom live status/timer within the active assistant turn, not at the top of the transcript.
- Running-only lifecycle markers, such as automatic compression, as transient status rows/dividers rather than persistent transcript content.
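
The live-phase structure above can be sketched as a small data model. This is a minimal illustration, not Hermes WebUI code: the names `Prose`, `ToolRow`, `StatusMarker`, `LiveTurn`, and `render_live` are all hypothetical, and the plain-text rendering stands in for whatever the real UI draws.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Prose:
    """Visible process text: the primary live timeline."""
    text: str


@dataclass
class ToolRow:
    """An L2 tool row; L3 detail stays hidden until expanded."""
    label: str              # short, readable label shown at L2
    command: str            # full command, L3 only
    output: str = ""        # full output, L3 only
    expanded: bool = False  # collapsed by default


@dataclass
class StatusMarker:
    """Running-only lifecycle marker, e.g. automatic compression."""
    text: str


LiveItem = Union[Prose, ToolRow, StatusMarker]


@dataclass
class LiveTurn:
    items: List[LiveItem] = field(default_factory=list)
    status: str = "Running"  # bottom live status within the turn


def render_live(turn: LiveTurn) -> List[str]:
    """Render the live turn as plain lines, in timeline order."""
    lines: List[str] = []
    for item in turn.items:
        if isinstance(item, Prose):
            lines.append(item.text)
        elif isinstance(item, ToolRow):
            lines.append(f"  > {item.label}")
            if item.expanded:  # L3 detail appears only after expansion
                lines.append(f"      $ {item.command}")
                lines.extend(f"      {o}" for o in item.output.splitlines())
        else:
            lines.append(f"  ~ {item.text}")  # transient status row
    lines.append(f"[{turn.status}]")  # status goes last, not first
    return lines
```

Keeping tool rows in the same ordered list as prose is what lets a row sit next to the text that triggered it; the status line is appended after the loop so it always lands at the bottom of the active turn.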

### Final phase

A settled assistant turn should show:

- One compact L1 Activity summary at the top of the assistant reply.
- The L1 summary collapsed by default.
- Expanded L1 showing prose/tool Worklog details when the user asks for them.
- The final answer below the L1 summary, as normal assistant prose.
- No leftover running-only lifecycle markers that do not help the user interpret the final answer.
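
One way to picture the transition from live to final is a single "settle" step. The sketch below assumes a flat list of timeline items tagged by kind; the kind names, `SettledTurn`, `settle`, and `render_final` are hypothetical and the summary wording is only a placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical item kinds; these names are assumptions for the sketch.
PROSE, TOOL, STATUS = "prose", "tool", "status"


@dataclass
class SettledTurn:
    summary: str                   # collapsed L1 label
    worklog: List[Dict[str, str]]  # shown only when L1 is expanded
    final_answer: str              # normal assistant prose below L1
    expanded: bool = False         # L1 is collapsed by default


def settle(items: List[Dict[str, str]], final_answer: str) -> SettledTurn:
    """Collapse a live timeline into one L1 summary plus the final answer."""
    # Running-only markers (e.g. automatic compression) are dropped here.
    worklog = [i for i in items if i["kind"] in (PROSE, TOOL)]
    tools = sum(1 for i in worklog if i["kind"] == TOOL)
    notes = sum(1 for i in worklog if i["kind"] == PROSE)
    summary = f"Activity: {tools} tool call(s), {notes} note(s)"
    return SettledTurn(summary=summary, worklog=worklog, final_answer=final_answer)


def render_final(turn: SettledTurn) -> List[str]:
    """Render the settled turn: L1 summary, optional worklog, then the answer."""
    lines = [turn.summary]
    if turn.expanded:
        lines.extend(f"  {i['text']}" for i in turn.worklog)
    lines.append(turn.final_answer)  # always rendered outside the worklog
    return lines
```

Storing the final answer in its own field, rather than as the last worklog item, is one simple way to guarantee that collapsing or expanding L1 can never hide it.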

## Design requirements

1. Process prose is the primary live timeline.
2. Tool rows are supporting activity, visually quieter than prose.
3. Tool groups and individual tool rows are collapsed by default.
4. Full command/output details are L3-only.
5. The final answer must not be swallowed by the Worklog.
6. Session switching and refresh must rebuild the same live/final structure from durable state.
7. Automatic compression should be visible while it is happening, but should not persist as final transcript text.
8. Recovery-control/internal lifecycle messages must not leak into the visible transcript.
9. Duplicate stream ownership must not create a hidden active stream that steals live/final events.
10. Terminal edge cases should be classified explicitly instead of producing misleading final UI.
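
Requirements 6 through 8 suggest treating the turn as a pure function of its durable event log. The following is a rough sketch under that assumption; the event shape (`seq`, `type`, `text`) and the type names are invented for illustration and may not match the real stream schema.

```python
from typing import Any, Dict, Iterable

# Hypothetical event types; the real stream schema may differ.
VISIBLE = {"prose", "tool"}
RUNNING_ONLY = {"compression"}


def rebuild(events: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold a durable event log into one turn structure.

    The same log yields the same structure whether it arrives live,
    after a refresh, or after switching sessions.
    """
    turn: Dict[str, Any] = {"phase": "live", "items": [], "status": None, "final": None}
    seen = set()
    for ev in sorted(events, key=lambda e: e["seq"]):
        if ev["seq"] in seen:  # replay may redeliver events
            continue
        seen.add(ev["seq"])
        kind = ev["type"]
        if kind in VISIBLE:
            turn["items"].append({"kind": kind, "text": ev["text"]})
        elif kind in RUNNING_ONLY:
            turn["status"] = ev["text"]  # transient, never a transcript item
        elif kind == "final":
            turn["phase"] = "final"
            turn["final"] = ev["text"]
            turn["status"] = None  # running-only markers do not persist
        # Anything else (recovery control, internal lifecycle) stays hidden.
    return turn
```

Because the function orders by sequence number and skips repeats, out-of-order or duplicated delivery during reattach produces the same result as the original live stream. Unknown event types are ignored by default, so new internal markers do not leak into the transcript unless they are explicitly made visible.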

## Non-goals for the first PR

The first implementation slice does not need to solve every related edge case:

- Queue composer behavior during compression can be handled later.
- A more explicit degraded/rebuild indicator during slow reattach can be handled later.
- Native SSE `Last-Event-ID` support can remain a follow-up.
- Compression-exhausted/no-final-answer and max-tool-call-limit terminal taxonomy can be refined in follow-up issues/PRs.
- Broader sidebar/session awareness improvements can remain separate.

## Acceptance criteria

A first implementation slice is acceptable if it demonstrates:

- Live prose and tool rows interleave in the main assistant timeline.
- The live phase does not prematurely show the final L1 summary.
- The final phase shows one L1 Activity summary above the answer.
- L1 is collapsed by default and expands to prose plus tool rows.
- L2 tool rows are quieter than prose and use readable short labels.
- L3 expansion contains full command/args/output.
- Automatic compression shows only as a running/live status and disappears from final settled content.
- Switching away and back to a running session can rebuild prose/tool content from replay rather than showing only an empty Running shell.
- Recovery-control/internal lifecycle messages remain hidden from the visible transcript.
- Duplicate same-session stream starts are prevented or safely reused.
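
The last criterion, preventing or safely reusing duplicate same-session stream starts, could be approached with a per-session ownership registry. This is only a sketch of the idea: `StreamRegistry`, its methods, and the stream ID format are hypothetical and do not describe the existing backend.

```python
import itertools
import threading
from typing import Dict, Tuple


class StreamRegistry:
    """Tracks at most one active stream per session (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def start(self, session_id: str) -> Tuple[str, bool]:
        """Return (stream_id, created), reusing the active stream if one exists."""
        with self._lock:
            existing = self._active.get(session_id)
            if existing is not None:
                return existing, False  # safe reuse, no hidden second stream
            stream_id = f"{session_id}-stream-{next(self._ids)}"
            self._active[session_id] = stream_id
            return stream_id, True

    def finish(self, session_id: str, stream_id: str) -> bool:
        """Release ownership only if the caller still owns the session."""
        with self._lock:
            if self._active.get(session_id) != stream_id:
                return False  # stale owner; leave the current stream alone
            del self._active[session_id]
            return True
```

A second `start` call for the same session returns the existing stream ID instead of creating a competing stream, and `finish` checks ownership so a stale or cancelled caller cannot tear down the stream the user is actually watching.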

## Suggested PR shape

The PR implementing this should stay focused on the live-to-final assistant reply lifecycle. Supporting fixes such as automatic compression display and duplicate stream ownership may be included when they are required to make the running-session experience coherent, but they should be framed as supporting edge-case fixes rather than the headline.

Use `Refs` rather than auto-close keywords (such as `Closes` or `Fixes`) unless the PR fully resolves the broader design space.

