Fix swebenchmultimodal timeout for large repo images #579
Closed
simonrosenberg wants to merge 4 commits into main from
SDK v1.15.0 changed the agent-server Dockerfile builder stage from --managed-python (self-contained venv) to --python-preference only-system (venv symlinks to system Python). This broke eval image builds because Dockerfile.agent-layer copies /agent-server from the builder into an eval-base image that doesn't have Python 3.13 at /usr/local/bin/, causing "no such file or directory" on every runtime pod. Fix: copy the Python 3.13 binary, stdlib, and shared library from the builder stage into the final image so the venv symlinks resolve. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
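The failure only surfaces as "no such file or directory" at exec time, so a quick scan for dangling symlinks inside the copied venv can confirm the diagnosis before rebuilding images. This is a minimal Python sketch and is not part of this PR. `find_dangling_symlinks` is a hypothetical helper, and scanning `/agent-server` assumes the venv lives somewhere under that directory.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_dangling_symlinks(root: str) -> list[str]:
    """Return symlinks under `root` whose targets do not exist.

    A venv built with a system-Python preference has bin/python* symlinks
    pointing at an interpreter outside the venv. If that interpreter is
    missing from the final image, the link dangles and exec fails with
    "no such file or directory".
    """
    dangling = []
    # os.walk does not follow directory symlinks by default; a dangling link
    # is reported in `filenames` because its target is not a directory.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            # Path.exists() follows the link, so False means the target is gone.
            if path.is_symlink() and not path.exists():
                dangling.append(str(path))
    return sorted(dangling)


if __name__ == "__main__":
    # Hypothetical location; point this at wherever the venv actually lives.
    for link in find_dangling_symlinks("/agent-server"):
        print(f"dangling: {link} -> {os.readlink(link)}")
```

Run inside a broken runtime image, a dangling `python` link pointing at `/usr/local/bin/python3.13` would match the symptom described in the commit message.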
ldconfig fails in eval-base images that run as a non-root user. Use LD_LIBRARY_PATH=/usr/local/lib instead to make libpython3.13 discoverable without root privileges. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
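The commit sets `LD_LIBRARY_PATH=/usr/local/lib` for the image. When a process is instead launched from Python, the equivalent non-root approach is to prepend the directory in the child's environment. A minimal sketch; `with_library_path` is a hypothetical helper, not something this PR adds.

```python
from __future__ import annotations

import os


def with_library_path(extra_dir: str, env: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of `env` with `extra_dir` prepended to LD_LIBRARY_PATH.

    Unlike ldconfig, which rewrites /etc/ld.so.cache and needs root,
    LD_LIBRARY_PATH is read from the process environment, so a non-root
    user can set it.
    """
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries (the loader treats them as the current directory)
    # and any existing copy of extra_dir so repeated calls stay idempotent.
    parts = [p for p in current.split(":") if p and p != extra_dir]
    new_env["LD_LIBRARY_PATH"] = ":".join([extra_dir, *parts])
    return new_env
```

Usage would look like `subprocess.run(cmd, env=with_library_path("/usr/local/lib"))`. The glibc loader reads the variable at process startup, so it has to be in the environment of the process being launched; changing `os.environ` inside an already-running interpreter does not help that interpreter.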
The default api_timeout of 60s is too short for the ACP agent's lazy initialization, which happens on the first send_message() call. For large repo images (wp-calypso, p5.js, marked), ACP subprocess startup consistently exceeds 60s, causing 100% failure rates even at resource_factor=8. Raising it to 300s (configurable via the REMOTE_API_TIMEOUT env var) gives the agent-server enough time to complete lazy init for large images. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
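The env-var override described in this commit can be sketched as follows. This assumes a plain numeric value in seconds; `resolve_api_timeout` is a hypothetical name, and how the result is wired into the remote workspace's `api_timeout` is not shown on this page.

```python
from __future__ import annotations

import os
from typing import Mapping

# Default and variable name taken from the commit message above.
DEFAULT_API_TIMEOUT_S = 300.0
ENV_VAR = "REMOTE_API_TIMEOUT"


def resolve_api_timeout(env: Mapping[str, str] | None = None) -> float:
    """Return the remote API timeout in seconds.

    Falls back to 300s when REMOTE_API_TIMEOUT is unset or blank. Raises
    ValueError for a non-numeric, non-positive, or NaN value instead of
    silently using the default.
    """
    source = os.environ if env is None else env
    raw = source.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_API_TIMEOUT_S
    value = float(raw)  # raises ValueError on e.g. "5m"
    if not value > 0:  # also rejects NaN
        raise ValueError(f"{ENV_VAR} must be a positive number, got {raw!r}")
    return value
```

Failing loudly on a malformed value is a deliberate choice here: falling back to 300s would hide a typo such as `REMOTE_API_TIMEOUT=5m`.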
all-hands-bot (Collaborator) left a comment
🟡 Acceptable - Pragmatic timeout fix for a real, well-documented problem
Analysis:
- Code is clean: simple env var passthrough following the existing `startup_timeout` pattern
- No complexity, no breaking changes, no security issues
- Problem evidence is solid (100% failure rates on large images)
- Acknowledged as a workaround (SDK #1095 for proper fix)
Issue:
- Test plan checkboxes are unchecked - no proof the fix actually works
Recommendation:
Before merging, run eval on at least one problematic repo (wp-calypso, p5.js, or marked) with REMOTE_API_TIMEOUT=300 and add evidence to the PR description showing it succeeds. For a 2-line timeout change, the code is obviously correct, but proving it solves the real-world problem is not optional.
Verdict: ✅ Worth merging after verification
Key Insight: Obvious solution to a timeout problem - just prove it works in production.
…-images' into fix/swebenchmultimodal-api-timeout
Summary
- Increase `api_timeout` from the default 60s to 300s for swebenchmultimodal remote workspaces
- Make it configurable via the `REMOTE_API_TIMEOUT` env var

Problem
The agent-server performs lazy initialization on the first `send_message()` call: it spawns the ACP subprocess, performs the protocol handshake, and creates a session inside `_ensure_agent_ready()`. For large prebuilt images (wp-calypso, p5.js, marked), this initialization consistently exceeds the default 60s `api_timeout`, causing `httpx.ReadTimeout` on the client side.

Evidence from eval runs on 2026-03-26 (`eval-23598243168-claude-son`, `eval-23600690440-claude-4-6`):

Key observations:
- `resource_factor` escalation (1→2→4→8) does not help because it's a time issue, not a resource issue
- The `/health` endpoint passes (it's a no-op `return "OK"`) but the server can't handle requests yet

SDK issue for the proper fix (move init to startup): OpenHands/software-agent-sdk#2588
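The failure mode above can be reproduced in miniature with asyncio. This is a hypothetical toy, not agent-server code: `ToyAgentServer`, `init_delay`, and `call_with_timeout` are invented names, a single `asyncio.sleep` stands in for ACP spawn, handshake, and session creation, and `asyncio.TimeoutError` plays the role of `httpx.ReadTimeout`. It contrasts lazy init (cost paid by the first request) with eager init at startup, the direction proposed in the SDK issue, and shows that a no-op health check says nothing about readiness.

```python
from __future__ import annotations

import asyncio


class ToyAgentServer:
    """Hypothetical stand-in for the agent-server; init_delay models ACP startup."""

    def __init__(self, init_delay: float, eager: bool = False) -> None:
        self.init_delay = init_delay
        self.eager = eager
        self._ready = asyncio.Event()
        self._lock = asyncio.Lock()

    async def start(self) -> None:
        # Eager mode pays the init cost before any request is accepted.
        if self.eager:
            await self._ensure_ready()

    async def _ensure_ready(self) -> None:
        # The lock makes concurrent first calls share one initialization.
        async with self._lock:
            if not self._ready.is_set():
                await asyncio.sleep(self.init_delay)
                self._ready.set()

    def health(self) -> str:
        # Liveness only, like a no-op endpoint: always "OK".
        return "OK"

    def ready(self) -> bool:
        return self._ready.is_set()

    async def send_message(self, text: str) -> str:
        # Lazy mode: the first call absorbs the full init delay.
        await self._ensure_ready()
        return f"ack: {text}"


async def call_with_timeout(server: ToyAgentServer, timeout: float) -> str:
    # Client-side deadline on the first request.
    return await asyncio.wait_for(server.send_message("hi"), timeout)


async def demo() -> dict[str, str]:
    outcomes: dict[str, str] = {}
    cases = [
        ("lazy, short timeout", False, 0.05),
        ("lazy, long timeout", False, 1.0),
        ("eager, short timeout", True, 0.05),
    ]
    for label, eager, timeout in cases:
        server = ToyAgentServer(init_delay=0.2, eager=eager)
        await server.start()
        try:
            outcomes[label] = await call_with_timeout(server, timeout)
        except asyncio.TimeoutError:
            outcomes[label] = "timeout"
    return outcomes


if __name__ == "__main__":
    for label, outcome in asyncio.run(demo()).items():
        print(f"{label}: {outcome}")
```

Raising the client timeout (this PR) corresponds to the "lazy, long timeout" case; moving init to startup (the SDK issue) corresponds to the "eager" case, where even a short client timeout succeeds.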
Test plan
- [ ] `send_message`
- [ ] `REMOTE_API_TIMEOUT` env var override works

🤖 Generated with Claude Code