fix(acp): defer remote ACP init until run#2655
fix(acp): defer remote ACP init until run#2655simonrosenberg wants to merge 1 commit intofix/always-install-node22-for-acpfrom
Conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable
This solves a real production problem (HTTP timeouts from ACP subprocess bootstrap blocking send_message). The fix is pragmatic and the validation on swebench instances shows you're taking eval impact seriously.
However: This changes agent initialization timing, which falls into the "could affect benchmark performance" category per repo guidelines. Leaving as COMMENT for human maintainer to validate with lightweight evals before merge.
Key Insight: The special-case logic for ACPAgent in the conversation layer is a pragmatic trade-off. The cleaner design would be a defer_initialization property on AgentBase, but that might be over-engineering for a single use case. Sometimes solving real problems means living with a little ugliness.
| subprocess. Deferring that work to run() keeps send_message() fast and | ||
| avoids HTTP client read timeouts on the remote conversation endpoint. | ||
| """ | ||
| agent_kind = getattr(self.agent, "kind", self.agent.__class__.__name__) |
There was a problem hiding this comment.
🟡 Acceptable: Using getattr() with string comparison violates the repo guideline to "avoid getattr/hasattr guards and instead enforce type correctness."
The cleaner approach would be:
# In AgentBase:
defer_initialization: bool = False
# In ACPAgent:
defer_initialization: bool = True
# Here:
return not self.agent.defer_initializationHowever, your pragmatic choice is reasonable if:
- The
kindattribute is stable (used for serialization) - You're avoiding import cycles
- Adding a base property for one use case feels like over-engineering
This is the kind of trade-off where theory and practice clash. I'd lean toward the base property for maintainability, but I won't block on it given this solves a real production issue.
| conv.state.execution_status = ConversationExecutionStatus.FINISHED | ||
|
|
||
| with ( | ||
| patch.object(ACPAgent, "init_state", autospec=True) as mock_init_state, | ||
| patch.object( | ||
| ACPAgent, | ||
| "step", | ||
| autospec=True, | ||
| side_effect=_finish_immediately, | ||
| ) as mock_step, | ||
| ): |
There was a problem hiding this comment.
🟢 Acceptable: This test is entirely mock-based - you're not exercising real ACP initialization, just verifying the control flow.
Per repo guidelines, "prefer tests that exercise real code paths and assert on outputs/state." However, this is probably acceptable because:
- Real ACP subprocess initialization would be slow/flaky in unit tests
- The test does prove the key behavior: init is deferred to run()
- You're doing manual validation on real swebench instances
The test serves its purpose - it would catch if someone accidentally removed the deferral logic.
| def _should_initialize_agent_on_send_message(self) -> bool: | ||
| """Return whether send_message() should eagerly initialize the agent. | ||
|
|
||
| ACPAgent startup is substantially heavier than regular agent | ||
| initialization because it launches and handshakes with an external ACP | ||
| subprocess. Deferring that work to run() keeps send_message() fast and | ||
| avoids HTTP client read timeouts on the remote conversation endpoint. |
There was a problem hiding this comment.
🟡 Documentation: The docstring explains why we defer ACP init (performance, HTTP timeouts) but doesn't explain the consequence.
Consider adding: "For ACP agents, the agent will not be initialized until run() is called. The message is enqueued in conversation state, but agent internals will not be accessible until after run() begins."
This matters if anyone tries to inspect agent state between send_message() and run() - they'll see different behavior for ACP vs other agents.
Summary
run()instead ofsend_message()run()Why
This is a follow-up to the runtime timeout investigation in OpenHands/runtime-api#458.
The failing
POST /api/conversations/{id}/eventsrequest was reaching the runtime pod, but ACP eager initialization insidesend_message()could stall that request path while the ACP subprocess bootstrapped. Moving ACP init torun()removes that synchronous dependency from the event POST.This PR is intentionally stacked on top of #2652 because the branch currently being validated includes both:
Validation
PYTHONPATH=/Users/simonrosenberg/repositories/software-agent-sdk/openhands-sdk pytest -q /Users/simonrosenberg/repositories/software-agent-sdk/tests/sdk/conversation/local/test_conversation_send_message.pyswebenchmultimodalvalidation on branchvalidate/pr2652-acp-send-messageF15C96B3markedjs__marked-1262,processing__p5.js-3068,diegomura__react-pdf-1178Current validation result:
The read operation timed outsymptom has not reproducedRuntime not yet ready (status: not found)issue onmarkedjs__marked-1262, so this validates removal of the original failure mode but is not yet a clean end-to-end green run