Skip to content

fix(acp): defer remote ACP init until run#2655

Closed
simonrosenberg wants to merge 1 commit intofix/always-install-node22-for-acpfrom
validate/pr2652-acp-send-message
Closed

fix(acp): defer remote ACP init until run#2655
simonrosenberg wants to merge 1 commit intofix/always-install-node22-for-acpfrom
validate/pr2652-acp-send-message

Conversation

@simonrosenberg
Copy link
Copy Markdown
Collaborator

Summary

  • defer ACP agent initialization until run() instead of send_message()
  • keep the conversation event POST path from blocking on ACP bootstrap
  • add a regression test that verifies ACP init is deferred until run()

Why

This is a follow-up to the runtime timeout investigation in OpenHands/runtime-api#458.

The failing POST /api/conversations/{id}/events request was reaching the runtime pod, but ACP eager initialization inside send_message() could stall that request path while the ACP subprocess bootstrapped. Moving ACP init to run() removes that synchronous dependency from the event POST.

This PR is intentionally stacked on top of #2652 because the branch currently being validated includes both:

Validation

  • PYTHONPATH=/Users/simonrosenberg/repositories/software-agent-sdk/openhands-sdk pytest -q /Users/simonrosenberg/repositories/software-agent-sdk/tests/sdk/conversation/local/test_conversation_send_message.py
  • targeted swebenchmultimodal validation on branch validate/pr2652-acp-send-message
  • correlation id: F15C96B3
  • instances: markedjs__marked-1262, processing__p5.js-3068, diegomura__react-pdf-1178

Current validation result:

  • the old The read operation timed out symptom has not reproduced
  • the run progressed into ACP initialization / prompt send
  • the run later hit a separate Runtime not yet ready (status: not found) issue on markedjs__marked-1262, so this validates removal of the original failure mode but is not yet a clean end-to-end green run

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py4042693%292, 297, 325, 368, 386, 402, 467, 660–661, 664, 816, 824, 826, 830–831, 842, 844–846, 871, 943, 1069, 1073, 1143, 1150–1151
TOTAL21366623170% 

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taste Rating: 🟡 Acceptable

This solves a real production problem (HTTP timeouts from ACP subprocess bootstrap blocking send_message). The fix is pragmatic and the validation on swebench instances shows you're taking eval impact seriously.

However: This changes agent initialization timing, which falls into the "could affect benchmark performance" category per repo guidelines. Leaving as COMMENT for human maintainer to validate with lightweight evals before merge.

Key Insight: The special-case logic for ACPAgent in the conversation layer is a pragmatic trade-off. The cleaner design would be a defer_initialization property on AgentBase, but that might be over-engineering for a single use case. Sometimes solving real problems means living with a little ugliness.

subprocess. Deferring that work to run() keeps send_message() fast and
avoids HTTP client read timeouts on the remote conversation endpoint.
"""
agent_kind = getattr(self.agent, "kind", self.agent.__class__.__name__)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable: Using getattr() with string comparison violates the repo guideline to "avoid getattr/hasattr guards and instead enforce type correctness."

The cleaner approach would be:

# In AgentBase:
defer_initialization: bool = False

# In ACPAgent:
defer_initialization: bool = True

# Here:
return not self.agent.defer_initialization

However, your pragmatic choice is reasonable if:

  1. The kind attribute is stable (used for serialization)
  2. You're avoiding import cycles
  3. Adding a base property for one use case feels like over-engineering

This is the kind of trade-off where theory and practice clash. I'd lean toward the base property for maintainability, but I won't block on it given this solves a real production issue.

Comment on lines +171 to +181
conv.state.execution_status = ConversationExecutionStatus.FINISHED

with (
patch.object(ACPAgent, "init_state", autospec=True) as mock_init_state,
patch.object(
ACPAgent,
"step",
autospec=True,
side_effect=_finish_immediately,
) as mock_step,
):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Acceptable: This test is entirely mock-based - you're not exercising real ACP initialization, just verifying the control flow.

Per repo guidelines, "prefer tests that exercise real code paths and assert on outputs/state." However, this is probably acceptable because:

  1. Real ACP subprocess initialization would be slow/flaky in unit tests
  2. The test does prove the key behavior: init is deferred to run()
  3. You're doing manual validation on real swebench instances

The test serves its purpose - it would catch if someone accidentally removed the deferral logic.

Comment on lines +487 to +493
def _should_initialize_agent_on_send_message(self) -> bool:
"""Return whether send_message() should eagerly initialize the agent.

ACPAgent startup is substantially heavier than regular agent
initialization because it launches and handshakes with an external ACP
subprocess. Deferring that work to run() keeps send_message() fast and
avoids HTTP client read timeouts on the remote conversation endpoint.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Documentation: The docstring explains why we defer ACP init (performance, HTTP timeouts) but doesn't explain the consequence.

Consider adding: "For ACP agents, the agent will not be initialized until run() is called. The message is enqueued in conversation state, but agent internals will not be accessible until after run() begins."

This matters if anyone tries to inspect agent state between send_message() and run() - they'll see different behavior for ACP vs other agents.

@simonrosenberg
Copy link
Copy Markdown
Collaborator Author

Superseded by #2656, which carries the same validated code as a single PR against main instead of a stacked PR on top of #2652.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants