Fix swebenchmultimodal timeout for large repo images by simonrosenberg · Pull Request #579 · OpenHands/benchmarks

simonrosenberg · 2026-03-26T21:35:29Z

Summary

Increases api_timeout from the default 60s to 300s for swebenchmultimodal remote workspaces
Makes the timeout configurable via the REMOTE_API_TIMEOUT env var
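
The diff itself is not shown on this page, so the following is only a minimal sketch of what an env var passthrough like this could look like. The helper name resolve_api_timeout and its validation rules are hypothetical; only REMOTE_API_TIMEOUT and the 300s default come from the summary above.

```python
import os

DEFAULT_API_TIMEOUT = 300.0  # seconds; the new default proposed in this PR


def resolve_api_timeout(env=None):
    """Return the API timeout in seconds, honoring REMOTE_API_TIMEOUT if set.

    Hypothetical helper for illustration; not taken from the PR diff.
    """
    env = os.environ if env is None else env
    raw = env.get("REMOTE_API_TIMEOUT")
    if raw is None or raw.strip() == "":
        return DEFAULT_API_TIMEOUT
    value = float(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError(f"REMOTE_API_TIMEOUT must be positive, got {raw!r}")
    return value
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise without mutating the process environment.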

Problem

The agent-server performs lazy initialization on the first send_message() call: it spawns the ACP subprocess, performs the protocol handshake, and creates a session inside _ensure_agent_ready(). For large prebuilt images (wp-calypso, p5.js, marked), this initialization consistently exceeds the default 60s api_timeout, causing httpx.ReadTimeout on the client side.

Evidence from eval runs on 2026-03-26 (eval-23598243168-claude-son, eval-23600690440-claude-4-6):

Repo	Timeout failure rate	Notes
Automattic/wp-calypso	100% (37/37)	Init always >> 60s
processing/p5.js	100% (16/16)	Init always >> 60s
markedjs/marked	100% (11/11)	Init always >> 60s
diegomura/react-pdf	~56% (5/9)	Init ~60s (borderline)
chartjs/Chart.js	~29% (10/34)	Init usually < 60s
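
As a quick check of the percentages in the table, the rates can be recomputed from the failure and total counts. This is a small sketch; the counts are copied from the table and nothing else is assumed.

```python
# (timeout failures, total instances) per repo, copied from the table above
runs = {
    "Automattic/wp-calypso": (37, 37),
    "processing/p5.js": (16, 16),
    "markedjs/marked": (11, 11),
    "diegomura/react-pdf": (5, 9),
    "chartjs/Chart.js": (10, 34),
}


def failure_rate(failed, total):
    """Timeout failure rate as a percentage."""
    return 100.0 * failed / total


# Rounded to whole percent, matching the precision used in the table
rates = {repo: round(failure_rate(f, t)) for repo, (f, t) in runs.items()}
```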

Key observations:

Zero recovery across retries (each retry creates a fresh pod → same timeout)
resource_factor escalation (1→2→4→8) does not help because it's a time issue, not a resource issue
The /health endpoint passes (it's a no-op that returns "OK") but the server can't handle requests yet
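
The observations above can be illustrated with a toy model. This is a simulation under stated assumptions, not the agent-server code: the class and function names are hypothetical, and the 120s init time is an illustrative value rather than a measurement.

```python
from dataclasses import dataclass


@dataclass
class SimulatedAgentServer:
    """Toy model: /health is instant, the first message pays the init cost."""

    init_seconds: float
    ready: bool = False

    def health(self) -> str:
        return "OK"  # no-op: says nothing about agent readiness

    def send_message_duration(self) -> float:
        """Return how long this call would take, in simulated seconds."""
        if not self.ready:
            self.ready = True
            return self.init_seconds  # lazy init happens on the first call
        return 0.0


def first_message_succeeds(init_seconds, api_timeout, retries=3):
    """Each retry gets a fresh pod, so each attempt pays the full init again."""
    for _ in range(retries):
        server = SimulatedAgentServer(init_seconds)  # fresh pod per retry
        assert server.health() == "OK"  # health check passes regardless
        if server.send_message_duration() <= api_timeout:
            return True
    return False


# Illustrative numbers only: a 120s init against the old and new timeouts
fails_at_60 = first_message_succeeds(init_seconds=120, api_timeout=60)
passes_at_300 = first_message_succeeds(init_seconds=120, api_timeout=300)
```

In this model no number of retries helps at 60s, because the init cost is never amortized across pods; only a longer timeout (or moving init to startup, as the SDK issue proposes) changes the outcome.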

SDK issue for the proper fix (move init to startup): OpenHands/software-agent-sdk#2588

Test plan

Run swebenchmultimodal with the ACP agent on large-image instances (wp-calypso, p5.js, marked) and verify they no longer time out during send_message
Verify REMOTE_API_TIMEOUT env var override works

🤖 Generated with Claude Code

SDK v1.15.0 changed the agent-server Dockerfile builder stage from --managed-python (self-contained venv) to --python-preference only-system (venv symlinks to system Python). This broke eval image builds because Dockerfile.agent-layer copies /agent-server from the builder into an eval-base image that doesn't have Python 3.13 at /usr/local/bin/, causing "no such file or directory" on every runtime pod. Fix: copy the Python 3.13 binary, stdlib, and shared library from the builder stage into the final image so the venv symlinks resolve. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
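
The failure mode described in this commit message is a dangling symlink: the venv's interpreter link points at a system Python that does not exist in the final image. A minimal sketch of how such a link could be detected follows; the helper name is hypothetical, and the temporary files stand in for the real interpreter paths.

```python
import os
import tempfile


def dangling_symlink_target(path):
    """Return the unresolved target if `path` is a dangling symlink, else None."""
    # os.path.exists follows symlinks, so it is False when the target is missing
    if os.path.islink(path) and not os.path.exists(path):
        return os.readlink(path)
    return None


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "python3.13")  # stand-in for the system Python
    link = os.path.join(tmp, "venv-python")    # stand-in for the venv's interpreter link
    os.symlink(missing, link)
    broken = dangling_symlink_target(link)     # target path: link does not resolve

    open(missing, "w").close()                 # simulate copying the interpreter in
    fixed = dangling_symlink_target(link)      # None: the link now resolves
```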

ldconfig fails in eval-base images that run as a non-root user. Use LD_LIBRARY_PATH=/usr/local/lib instead to make libpython3.13 discoverable without needing root privileges. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
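
The commit applies this in the image itself. For a process launched from Python, a comparable effect could be achieved by passing an adjusted environment to the child, since the dynamic loader reads LD_LIBRARY_PATH when the process starts. The helper below is a hypothetical sketch, not code from the PR.

```python
import os


def with_lib_dir(env, lib_dir="/usr/local/lib"):
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    new_env = dict(env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)  # searched before any existing entries
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return new_env
```

The result can be passed as the env argument of subprocess.run; unlike ldconfig, this needs no write access to the system linker cache.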

The default api_timeout of 60s is too short for the ACP agent lazy initialization that happens on the first send_message() call. For large repo images (wp-calypso, p5.js, marked), the ACP subprocess startup consistently exceeds 60s, causing 100% failure rates even at resource_factor=8. Raising the timeout to 300s (configurable via the REMOTE_API_TIMEOUT env var) gives the agent-server enough time to complete lazy init for large images. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

all-hands-bot

🟡 Acceptable - Pragmatic timeout fix for a real, well-documented problem

Analysis:

Code is clean: simple env var passthrough following the existing startup_timeout pattern
No complexity, no breaking changes, no security issues
Problem evidence is solid (100% failure rates on large images)
Acknowledged as a workaround (SDK #2588 for proper fix)

Issue:

Test plan checkboxes are unchecked - no proof the fix actually works

Recommendation:
Before merging, run eval on at least one problematic repo (wp-calypso, p5.js, or marked) with REMOTE_API_TIMEOUT=300 and add evidence to the PR description showing it succeeds. For a 2-line timeout change the code is obviously correct, but proving it solves the real-world problem is not optional.

Verdict: ✅ Worth merging after verification

Key Insight: Obvious solution to a timeout problem - just prove it works in production.

simonrosenberg · 2026-04-02T19:19:36Z

OpenHands/software-agent-sdk#2656

Debug Agent and others added 3 commits March 26, 2026 16:33

simonrosenberg mentioned this pull request Mar 26, 2026

Move agent lazy initialization from send_message() to server startup OpenHands/software-agent-sdk#2588

Open

all-hands-bot reviewed Mar 26, 2026

Merge remote-tracking branch 'origin/fix/copy-python-runtime-for-eval…

5158fa3

…-images' into fix/swebenchmultimodal-api-timeout

simonrosenberg closed this Apr 2, 2026

simonrosenberg wants to merge 4 commits into main from fix/swebenchmultimodal-api-timeout
