Fix replay state for long-lived unified exec commands #187

Open
jflam wants to merge 3 commits into zed-industries:main from jflam:fix/unified-exec-process-terminality

jflam commented Mar 13, 2026

I've been running the fix below locally for a few days and it feels solid. All the important parts were done by gpt-5.4/xhigh :)

Summary

This fixes a replay/resume bug for long-lived unified-exec commands in codex-acp and adds regression tests that capture the failure mode.

At a high level:

  • Replay now rehydrates unfinished unified-exec command state instead of treating startup output as terminal
  • write_stdin polling is routed back onto the original exec tool call during replay
  • Live terminal output / terminal end can attach to the replayed active command again
  • Regression tests cover both the replay-only path and the replay-then-live-terminal-end path

Problem

A long-lived exec_command started through unified exec persists a nonterminal startup response like Process running with session ID <id>. That command can then be polled via write_stdin and eventually complete with a real terminal exit.

When a session containing such a command is resumed through ACP replay:

  1. Replay reconstructs shell tool calls from ResponseItem::FunctionCall / ResponseItem::FunctionCallOutput
  2. The nonterminal unified-exec startup output is treated as if it were a completed tool result
  3. The original exec_command is effectively completed too early during replay
  4. The original active command state is not kept alive
  5. Later write_stdin poll output and/or the eventual live ExecCommandEnd have no active command to attach to

This leaves any ACP client in an indeterminate state: the command is still running, but replay has already collapsed the tool call, so later output and terminal completion cannot be mapped back to the original command.
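
To make the failure mode concrete, here is a minimal sketch of a naive replay loop. It is not the codex-acp code: `ReplayItem` and `naive_replay` are hypothetical stand-ins, and the real `ResponseItem` variants carry more fields. The sketch only shows how treating every output as a final result marks a still-running command as finished.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the replayed items; the real
// ResponseItem type carries more fields than shown here.
enum ReplayItem {
    FunctionCall { call_id: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Naive replay: every output is stored as the call's final result.
/// `None` means "still active", `Some(text)` means "treated as completed".
fn naive_replay(items: &[ReplayItem]) -> HashMap<String, Option<String>> {
    let mut calls = HashMap::new();
    for item in items {
        match item {
            ReplayItem::FunctionCall { call_id } => {
                calls.insert(call_id.clone(), None);
            }
            // No distinction between startup output and a terminal exit.
            ReplayItem::FunctionCallOutput { call_id, output } => {
                calls.insert(call_id.clone(), Some(output.clone()));
            }
        }
    }
    calls
}
```

Replaying an `exec_command` call followed by its "Process running with session ID ..." output through this loop leaves the call marked as completed, so a later poll or live end event finds nothing active to attach to.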

Root cause

The root cause is in replay/resume, not the live unified-exec event contract.

The backend already uses a single real ExecCommandEnd for the long-lived PTY lifecycle. The problem is that replay rebuilds command state from ResponseItems in a way that cannot distinguish:

  • A startup/nonterminal unified-exec result (Process running with session ID ...)
  • A true terminal command completion

So replay synthesizes terminal completion too early and drops the state that later live events still need.
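
The distinction replay needs can be sketched as a small classifier. This is an illustration under assumptions, not the parser in this PR: the "Process running with session ID" prefix comes from the description above, while the "Process exited with code" prefix, the `ReplayedExecOutput` enum, and `classify_exec_output` are hypothetical, and the real payload may be structured rather than plain text.

```rust
/// Hypothetical classification of a replayed unified-exec output payload.
#[derive(Debug, PartialEq, Eq)]
enum ReplayedExecOutput {
    /// Startup/nonterminal result: the process is still alive.
    StillRunning { session_id: String },
    /// True terminal completion.
    Exited { exit_code: i32 },
    /// Neither marker was found; the caller decides how to treat it.
    Unknown,
}

fn classify_exec_output(payload: &str) -> ReplayedExecOutput {
    const RUNNING: &str = "Process running with session ID ";
    // Assumed wording for the terminal case; not confirmed by the source.
    const EXITED: &str = "Process exited with code ";

    for line in payload.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix(RUNNING) {
            let id = rest.split_whitespace().next().unwrap_or("");
            if !id.is_empty() {
                return ReplayedExecOutput::StillRunning {
                    session_id: id.to_string(),
                };
            }
        }
        if let Some(rest) = line.strip_prefix(EXITED) {
            if let Some(Ok(code)) = rest.split_whitespace().next().map(|s| s.parse::<i32>()) {
                return ReplayedExecOutput::Exited { exit_code: code };
            }
        }
    }
    ReplayedExecOutput::Unknown
}
```

Returning an explicit `Unknown` variant rather than defaulting to "completed" keeps the ambiguous case visible to the caller, which is the case the pre-fix replay path got wrong.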

Fix

Treat replayed unified-exec commands as real in-flight command state:

  1. Replay keeps a retained PromptState for unfinished turns even without a live response channel
  2. Replay recognizes unified-exec exec_command calls and rehydrates the original active command
  3. Replay parses unified-exec FunctionCallOutput payloads to distinguish nonterminal startup output (with a session id) from terminal completion (with an exit code)
  4. Replay maps write_stdin polling back onto the original exec tool call by session/process id instead of materializing it as a separate logical command
  5. Replay only completes the tool call when it sees an actual terminal exit
  6. If replay is followed by live ExecCommandOutputDelta / ExecCommandEnd, those live events attach to the same replayed active command

This keeps the command lifecycle coherent across initial startup, intermediate polling, replay after interruption, and final live completion.
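
Steps 2, 4, and 5 can be sketched as a small piece of bookkeeping keyed by session id. All names here (`ReplayState`, `ActiveCommand`, `Status`, and the three methods) are hypothetical and much simpler than the retained `PromptState` the PR describes; the sketch only illustrates routing polls to the original call and completing it solely on a terminal exit.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
enum Status {
    InProgress,
    Completed { exit_code: i32 },
}

#[derive(Debug)]
struct ActiveCommand {
    output: String,
    status: Status,
}

/// Hypothetical replay bookkeeping, not the real PromptState.
#[derive(Default)]
struct ReplayState {
    /// Commands keyed by the original exec_command call id.
    commands: HashMap<String, ActiveCommand>,
    /// Session/process id -> owning exec_command call id.
    session_to_call: HashMap<String, String>,
}

impl ReplayState {
    /// Replayed exec_command call: register it as in progress.
    fn start_exec(&mut self, call_id: &str) {
        self.commands.insert(
            call_id.to_string(),
            ActiveCommand { output: String::new(), status: Status::InProgress },
        );
    }

    /// Nonterminal startup output: remember which call owns the session.
    fn bind_session(&mut self, call_id: &str, session_id: &str) {
        self.session_to_call
            .insert(session_id.to_string(), call_id.to_string());
    }

    /// Replayed write_stdin poll result. Output is appended to the original
    /// command; the command completes only when an exit code is present.
    /// Returns the call id the poll was routed to, or None if unknown.
    fn record_poll(
        &mut self,
        session_id: &str,
        chunk: &str,
        exit_code: Option<i32>,
    ) -> Option<String> {
        let call_id = self.session_to_call.get(session_id)?.clone();
        let cmd = self.commands.get_mut(&call_id)?;
        cmd.output.push_str(chunk);
        if let Some(code) = exit_code {
            cmd.status = Status::Completed { exit_code: code };
            self.session_to_call.remove(session_id);
        }
        Some(call_id)
    }
}
```

In this sketch a poll never creates a new entry in `commands`, which mirrors the goal of not materializing `write_stdin` as a separate logical command.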

Regression tests

test_replay_unified_exec_routes_write_stdin_to_original_command

Replays an exec_command startup response, a write_stdin poll on the same session, and a terminal poll response with an exit code. Asserts that only the original exec_command tool call is treated as the command, that polls attach to it, and that completion lands on it.

test_replay_unified_exec_preserves_active_command_for_live_terminal_end

Replays a nonterminal unified-exec startup response, then feeds live ExecCommandOutputDelta and ExecCommandEnd events. Asserts that the replayed command remains active, live output lands on it, and the live terminal end completes it.
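
The behaviour this test asserts can be illustrated with a trimmed-down model. The `LiveEvent` variants below borrow the event names from the description, but their fields, the `Command` struct, and `apply_live_event` are assumptions made for the sketch and do not reflect the real event types.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down versions of the live events named above.
enum LiveEvent {
    ExecCommandOutputDelta { call_id: String, chunk: String },
    ExecCommandEnd { call_id: String, exit_code: i32 },
}

#[derive(Debug, Default, PartialEq, Eq)]
struct Command {
    output: String,
    /// None while the process is still running.
    exit_code: Option<i32>,
}

/// Apply one live event to the commands rehydrated by replay.
/// Returns false when no still-active command matches the event.
fn apply_live_event(active: &mut HashMap<String, Command>, event: LiveEvent) -> bool {
    match event {
        LiveEvent::ExecCommandOutputDelta { call_id, chunk } => match active.get_mut(&call_id) {
            Some(cmd) if cmd.exit_code.is_none() => {
                cmd.output.push_str(&chunk);
                true
            }
            _ => false,
        },
        LiveEvent::ExecCommandEnd { call_id, exit_code } => match active.get_mut(&call_id) {
            Some(cmd) if cmd.exit_code.is_none() => {
                cmd.exit_code = Some(exit_code);
                true
            }
            _ => false,
        },
    }
}
```

If replay has left a `Command` with `exit_code: None` under the original call id, both the delta and the end event land on it. If replay had already dropped or completed that entry, both calls would return `false`, which corresponds to the pre-fix behaviour.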

Both regression tests fail on pre-fix code and pass with the fix applied. The full test suite passes (19/19).

jflam and others added 3 commits March 11, 2026 08:21
Resolves merge conflicts between the unified exec replay fix and
upstream additions (abort_pending_interactions, exec approval tests,
MCP elicitation tests, blocked approval tests, shutdown tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jflam marked this pull request as ready for review March 13, 2026 18:24