Post-restart native session continuity fails: prompt timeout after restart and after resumeSession (opencode, 0.2.0) #189

@handfuloflight

Summary

After restarting any one of sandbox-agent, the gateway, or rivetkit, native session continuity appears broken for prompt execution in our Sprite environment:

  • promptSameHandleAfterRestart times out
  • resumeSession + getEvents can succeed
  • promptAfterResume still times out

Environment

  • sandbox-agent CLI/server: 0.2.0
  • SDK: sandbox-agent TypeScript package 0.2.0
  • Agent under test: opencode
  • Sandbox provider: Sprites (Fly)
  • Gateway route: https://<sprite>.sprites.app/sandbox

Repro (phase-level)

  1. Create session and prompt successfully.
  2. Restart one target (sandbox-agent OR gateway OR rivetkit).
  3. Prompt on same session handle (promptSameHandleAfterRestart) -> timeout.
  4. Reconnect + call getEvents -> success.
  5. Call resumeSession -> success.
  6. Prompt after resume (promptAfterResume) -> timeout.
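
A minimal sketch of how each phase above could be wrapped so that a hung prompt is reported as a named timeout (matching the "timed out after 60000ms" messages in the logs) rather than blocking the run. The helper names `withTimeout` and `runPhase` and the `PhaseResult` shape are hypothetical harness code, not part of the sandbox-agent SDK; the actual SDK call would be passed in as the `work` callback.

```typescript
// Hypothetical harness helpers; none of these names come from the sandbox-agent SDK.
type PhaseResult =
  | { phase: string; ok: true }
  | { phase: string; ok: false; error: string };

// Reject with a phase-named error if `work` has not settled within `ms`.
function withTimeout<T>(phase: string, work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${phase} timed out after ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run one phase and record the outcome instead of throwing, so later
// phases (reconnect, resumeSession, post-resume prompt) still execute.
async function runPhase(
  phase: string,
  work: () => Promise<unknown>,
  ms = 60_000,
): Promise<PhaseResult> {
  try {
    await withTimeout(phase, work(), ms);
    return { phase, ok: true };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { phase, ok: false, error };
  }
}
```

Recording failures rather than throwing is what lets a single run show that getEvents and resumeSession succeed even though both prompt phases time out.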

Observed signatures

  • TypeError: terminated (unhandled rejection around restart window)
  • ACP client is closed

Results (inline)

Native continuity diagnostic (1 run/target)

  • total: 0/3 success
  • failure boundary phase: promptSameHandleAfterRestart in 3/3
  • per target:
    • sandbox-agent: same-handle prompt ❌, reconnect/getEvents ✅, resumeSession ✅, post-resume prompt ❌
    • gateway: same-handle prompt ❌, reconnect/getEvents ✅, resumeSession ✅, post-resume prompt ❌
    • rivet: same-handle prompt ❌, reconnect/getEvents ✅, resumeSession ✅, post-resume prompt ❌
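
For reference, a sketch of how the "failure boundary phase" figure above can be derived from per-phase pass/fail results. The phase order follows the repro steps; the `RunOutcome` record shape and the function names are assumptions made for illustration, and the input values are simply the ✅/❌ results listed above.

```typescript
// Phase order taken from the repro steps; the record shape is an assumption.
const PHASES = [
  "promptSameHandleAfterRestart",
  "getEvents",
  "resumeSession",
  "promptAfterResume",
] as const;
type Phase = (typeof PHASES)[number];
type RunOutcome = { target: string; results: Record<Phase, boolean> };

// The boundary is the first phase, in order, that failed for a run.
function failureBoundary(run: RunOutcome): Phase | null {
  return PHASES.find((p) => !run.results[p]) ?? null;
}

// A run counts as a success only if every phase passed.
function summarize(runs: RunOutcome[]) {
  const boundaries = new Map<Phase, number>();
  let success = 0;
  for (const run of runs) {
    const boundary = failureBoundary(run);
    if (boundary === null) success += 1;
    else boundaries.set(boundary, (boundaries.get(boundary) ?? 0) + 1);
  }
  return { success, total: runs.length, boundaries };
}

// The same pattern was observed for all three restart targets.
const observed = {
  promptSameHandleAfterRestart: false,
  getEvents: true,
  resumeSession: true,
  promptAfterResume: false,
};
const runs: RunOutcome[] = ["sandbox-agent", "gateway", "rivet"].map(
  (target) => ({ target, results: { ...observed } }),
);
const summary = summarize(runs);
console.log(`total: ${summary.success}/${summary.total} success`);
// prints "total: 0/3 success"
```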

Fallback contrast (recreate + rehydrate)

If we do not reuse the pre-restart session handle, and instead:

  • create a fresh post-restart session
  • rehydrate from a client-side transcript envelope

then the matrix result is 6/6 success (sandbox-agent 2/2, gateway 2/2, rivet 2/2).
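
To make the fallback concrete, here is a rough sketch of what "rehydrate from a client-side transcript envelope" can look like: the client keeps its own record of prior turns and folds them into the first prompt of the fresh session. The envelope shape, the prompt wording, and the name `buildRehydratePrompt` are all hypothetical; the format actually used in our harness is not shown here, and no SDK calls are involved.

```typescript
// Hypothetical envelope shape kept on the client; not an SDK type.
interface TranscriptTurn {
  role: "user" | "assistant";
  text: string;
}
interface TranscriptEnvelope {
  sessionId: string; // id of the pre-restart session, for reference only
  turns: TranscriptTurn[];
}

// Build the first prompt for a fresh post-restart session by replaying
// the most recent turns as plain text ahead of the new user prompt.
function buildRehydratePrompt(
  envelope: TranscriptEnvelope,
  nextPrompt: string,
  maxTurns = 20,
): string {
  const recent = maxTurns > 0 ? envelope.turns.slice(-maxTurns) : [];
  const history = recent.map((t) => `${t.role}: ${t.text}`).join("\n");
  return [
    `Context restored from prior session ${envelope.sessionId}:`,
    history,
    "Continue from the conversation above.",
    `user: ${nextPrompt}`,
  ].join("\n\n");
}
```

Because this path never touches the pre-restart handle, it does not exercise native rebind at all, which is why it serves only as a contrast to the failing case.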

Minimal log excerpts

Error: promptSameHandleAfterRestart timed out after 60000ms
TypeError: terminated
ACP write error: Error: ACP client is closed
Error: promptAfterResume timed out after 60000ms
(recreate + rehydrate path)
createAfter: ok
promptRehydrate: ok (stopReason=end_turn)

Note on prior links

The previous issue body included links to a private repo by mistake. I removed those and pasted the key evidence inline here.

Ask

Can you confirm whether this is expected in 0.2.0 during restart transport churn, or a bug in native session continuity/rebind after restart? If helpful, I can provide a standalone public repro script with no private dependencies.
