Summary
The diegomura__react-pdf-1178 instance in swebenchmultimodal-dev consistently fails
runtime init across multiple models and runs, even after the runtime-api / sdk fix for
#523 (OpenHands/software-agent-sdk#2656). This is deterministic, not transient: every
attempt in every run fails with the same 502→503 runtime init pattern.
Two sibling instances (react-pdf-1280, react-pdf-1552) also fail frequently but
intermittently — they sometimes succeed on retry.
Evidence
Run 1: 2026-04-06, opus4.6 + sonnet4.5 (evaluation run 24040981135)
| Model | Job | react-pdf-1178 | react-pdf-1280 | react-pdf-1552 |
| --- | --- | --- | --- | --- |
| claude-4.6-opus | eval-24040981135-claude-4-6 | ❌ failed 4x | ✅ (on retry) | ✅ (on retry) |
| claude-sonnet-4-5-20250929 | eval-24040981135-claude-son | ❌ failed 4x | ❌ failed 4x | ❌ failed 4x |
Run 2: 2026-04-04, gemini-3.1 (jobs eval-23983344719-gemini-3-1, eval-23968391657-gemini-3-1)
react-pdf-1178 failed on both runs (Datadog logs confirm "failed after 4 attempts" in each).
So: 3 distinct models × at least 3 eval runs over 3 days — react-pdf-1178 failed every
time.
Error pattern
From kube_job:eval-24040981135-claude-4-6 (Opus), instance diegomura__react-pdf-1178:
2026-04-06 19:43:17 ERROR HTTP request failed (502 Bad Gateway): ...
url: https://vqjiqojpacbxmmaw.eval-runtime.all-hands.dev/api/acp/conversations/9e52265f-...
2026-04-06 19:43:19 ERROR HTTP request failed (503 Service Unavailable): ...
... (20+ seconds of 1/s polling with 503s) ...
2026-04-06 19:43:37 WARNING [worker] runtime init failure instance=diegomura__react-pdf-1178
attempt=1 retry=1 runtime_id=vqjiqojpacbxmmaw
error=Conversation run failed ...: Remote conversation ended with error
... (3 more attempts with different runtime_ids, same pattern) ...
2026-04-06 21:59:16 ERROR [worker] Instance diegomura__react-pdf-1178 failed after 4 attempts.
Each attempt spins up a fresh runtime pod (new runtime_id), hits 502/503 on
/api/acp/conversations/..., polls for 20+ seconds, gives up, and moves on. After 4 such
attempts the worker marks it failed.
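To check the pattern across a whole job log rather than by eye, the worker lines can be tallied per instance. The following is a minimal sketch, assuming the log format matches the excerpt above; the function name `summarize_init_failures` and the regular expressions are illustrative and not part of any existing tooling.

```python
import re
from collections import defaultdict

# Patterns derived from the log excerpt above. The key=value fields may be
# wrapped onto a following line, so \s+ is used between them.
INIT_FAILURE = re.compile(
    r"runtime init failure instance=(\S+)\s+attempt=(\d+)\s+retry=(\d+)\s+runtime_id=(\S+)"
)
EXHAUSTED = re.compile(r"Instance (\S+) failed after (\d+) attempts")


def summarize_init_failures(log_text):
    """Return (runtime_ids_by_instance, attempts_by_exhausted_instance)."""
    runtime_ids = defaultdict(list)
    for match in INIT_FAILURE.finditer(log_text):
        instance, runtime_id = match.group(1), match.group(4)
        runtime_ids[instance].append(runtime_id)
    exhausted = {
        match.group(1): int(match.group(2))
        for match in EXHAUSTED.finditer(log_text)
    }
    return dict(runtime_ids), exhausted
```

An instance that appears in the second mapping in every run, each time with a different list of runtime IDs, is a deterministic failure; one that appears only in the first mapping recovered on retry.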
Suspected root cause
Likely the same family as #352 (still OPEN): the
react-pdf-1178 image is slow to boot (or has a startup issue), the agent-server startup
probe on port 60000 fails, the pod is force-stopped, and the client sees 502/503 on
/api/acp/conversations/....
Unlike #523 (which was about large wp-calypso/p5.js images and was fixed by sdk#2656),
this one is narrower — a single react-pdf instance that consistently fails. Possibilities:
- Broken image: react-pdf-1178 Dockerfile produces an image that never becomes
healthy (e.g. node version regression; see root_cause_acp_node.md in memory: Node
v12–v14 crashes claude-agent-acp).
- Unreasonable startup time: image starts but takes longer than the startup probe
window.
- ACP-specific init bug: the instance repo requires setup that ACP agents trip on.
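One way to tell the first two possibilities apart is to time how long the agent-server takes to answer once the container is running. The sketch below assumes the image has been started locally with port 60000 published; the helper names, the `localhost` URL, and the 120-second deadline are illustrative assumptions, not values taken from the runtime configuration.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all count as "not up yet".
        return False


def time_until_healthy(probe, deadline_s=120.0, interval_s=1.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call `probe` until it returns True; return elapsed seconds, or None on deadline."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= deadline_s:
            return None
        sleep(interval_s)


if __name__ == "__main__":
    # Assumed local URL; adjust to wherever the container's port 60000 is exposed.
    elapsed = time_until_healthy(lambda: http_ok("http://localhost:60000/server_info"))
    print("never became healthy" if elapsed is None else f"healthy after {elapsed:.1f}s")
```

A `None` result points toward a broken image; a finite value longer than the startup probe window points toward slow startup.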
Repro
/trigger-benchmarks swebenchmultimodal agent_type=acp-claude model_ids=claude-4.6-opus \
eval_limit=500 reason="repro react-pdf-1178 failure"
Expected: react-pdf-1178 fails after 4 runtime init attempts.
Proposed next steps
- Inspect the SWE-bench-M image for diegomura/react-pdf-1178: manually pull it,
start agent-server in it, and verify that /server_info on port 60000 comes up.
- Check if Node.js is old in that image (see known ACP/Node issue); force Node 22
install if missing.
- If the image is fundamentally broken, consider blacklisting the 3 affected instances
(1178, 1280, 1552) or filing upstream with SWE-bench-M.
- Add a fast-fail path in the runtime init polling: if 5xx persists for >5s on the
first attempt, abort that attempt instead of polling for 20s (saves ~60s per failed
instance — currently ~90s of wall-clock wasted on retries).
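The fast-fail idea in the last item could look roughly like the following. This is a sketch under stated assumptions, not the worker's actual polling code: `get_status` is a hypothetical callable returning the HTTP status of one poll, and the thresholds mirror the 5s and 20s figures above.

```python
import time
from enum import Enum


class PollResult(Enum):
    READY = "ready"
    FAST_FAIL = "fast_fail"
    TIMEOUT = "timeout"


def poll_runtime(get_status, fast_fail_s=5.0, max_poll_s=20.0, interval_s=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll until 2xx, aborting early if 5xx responses persist past fast_fail_s."""
    start = clock()
    streak_start = None  # time of the first response in the current 5xx streak
    while True:
        status = get_status()
        now = clock()
        if 200 <= status < 300:
            return PollResult.READY
        if status >= 500:
            if streak_start is None:
                streak_start = now
            if now - streak_start > fast_fail_s:
                return PollResult.FAST_FAIL
        else:
            streak_start = None  # any non-5xx response resets the streak
        if now - start >= max_poll_s:
            return PollResult.TIMEOUT
        sleep(interval_s)
```

The caller would treat `FAST_FAIL` like any other failed attempt and move on to the next runtime. Resetting the streak on non-5xx responses is a design choice of this sketch, so that a runtime which is making progress is still given the full polling window.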
References
software-agent-sdk#2656)
E698D406)