waitForHealthy polled /global/health with no per-request timeout, so a single fetch to a not-yet-listening opencode port could block on the OS-level TCP connect timeout and blow past the 10s budget (the loop only re-checks the deadline between awaits). This caused recurring ~5-minute "Timed out waiting for /global/health: fetch failed" failures on the macOS-14 CI runner. Bound each request with AbortSignal.timeout, raise the overall budget for slow CI cold starts, and capture server stdout + exit info so future failures are diagnosable instead of showing empty stderr.
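
The stdout and exit-info capture mentioned above might look roughly like the sketch below. Only the stdio setting and the getStdout()/getExitInfo() accessor names come from this PR; the helper name spawnWithCapture, the getStderr() accessor, and the shape of the exit-info object are assumptions made for illustration, not the actual code in _util.mjs.

```javascript
import { spawn } from "node:child_process";

// Hypothetical helper: spawns a server process and keeps its output in memory
// so a failing test can print it. Not the real implementation in _util.mjs.
function spawnWithCapture(command, args = []) {
  // stdin is ignored; stdout and stderr are piped so they can be buffered.
  const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });

  let stdout = "";
  let stderr = "";
  let exitInfo = null; // stays null while the process is still running

  child.stdout.setEncoding("utf8");
  child.stderr.setEncoding("utf8");
  child.stdout.on("data", (chunk) => { stdout += chunk; });
  child.stderr.on("data", (chunk) => { stderr += chunk; });

  // Record how the process ended: an exit code, or the signal that killed it.
  child.on("exit", (code, signal) => { exitInfo = { code, signal }; });
  // A spawn failure (for example, a missing binary) emits "error" instead.
  child.on("error", (error) => {
    exitInfo = { code: null, signal: null, error: error.message };
  });

  return {
    child,
    getStdout: () => stdout,
    getStderr: () => stderr,
    getExitInfo: () => exitInfo,
  };
}
```

A failing health check can then append getStdout() and getExitInfo() to its error message, so an early server crash is visible in the CI log rather than hidden behind an empty stderr.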
cubic review: no issues found across 5 files.
Summary
Fixes the recurring "Timed out waiting for /global/health: fetch failed" failures in the OpenWork Tests workflow (macOS-14 runner). Example failing run: https://github.com/different-ai/openwork/actions/runs/26646854880
Root cause
In apps/app/scripts/_util.mjs, waitForHealthy() polled the server with no per-request timeout.
The while loop only re-checks the deadline between awaits. When opencode serve was slow to accept connections on the macOS runner, a single fetch to the not-yet-listening port blocked on the OS-level TCP connect timeout (tens of seconds, stacking), blowing far past the 10s budget. That's why failures showed a ~5-minute gap before erroring rather than ~10s. Linux runners reject/connect faster, so they passed. Empty stderr made it undiagnosable because server stdout was discarded.
Changes
- _util.mjs: bound each health request with AbortSignal.timeout (5s); raise overall budget to 30s for slow CI cold starts.
- _util.mjs: capture server stdout (stdio: ["ignore", "pipe", "pipe"]) and expose getStdout() / getExitInfo().
- e2e.mjs, session-switch.mjs, fs-engine.mjs, browser-entry.mjs: include stdout + serverExit in failure output for debuggability.
Testing
Ran the full chain locally with the pinned opencode v1.15.12.
Result: all scripts pass (local-file-path, e2e, session-switch, fs-engine, browser-entry) in ~20s. Could not reproduce the macOS-runner hang locally (the connect behavior differs), but the per-request timeout makes waitForHealthy respect its budget regardless of connect latency.
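
To illustrate why a per-request timeout keeps the poll loop inside its budget, here is a minimal sketch of a bounded health poll. The 5s per-request cap and 30s overall budget are the values stated in this PR; the function signature, the option names, the 250ms retry interval, and the injectable fetchImpl parameter are assumptions added for illustration and do not reproduce the real waitForHealthy() in _util.mjs.

```javascript
// Hypothetical sketch of a bounded health poll; not the code from _util.mjs.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForHealthy(baseUrl, options = {}) {
  const {
    timeoutMs = 30_000,       // overall budget (value from the PR)
    requestTimeoutMs = 5_000, // per-request cap (value from the PR)
    intervalMs = 250,         // assumed pause between attempts
    fetchImpl = fetch,        // assumed hook so tests can stub the network
  } = options;

  const deadline = Date.now() + timeoutMs;
  let lastError = null;

  while (Date.now() < deadline) {
    // Never let one request outlive the remaining budget.
    const remaining = deadline - Date.now();
    const cap = Math.max(1, Math.min(requestTimeoutMs, remaining));
    try {
      const response = await fetchImpl(`${baseUrl}/global/health`, {
        // Aborts the request, including a stalled TCP connect, after `cap` ms.
        signal: AbortSignal.timeout(cap),
      });
      if (response.ok) return;
      lastError = new Error(`HTTP ${response.status}`);
    } catch (error) {
      lastError = error; // connection refused, aborted, and similar failures
    }
    // Pause before retrying, but never sleep past the deadline.
    await sleep(Math.min(intervalMs, Math.max(0, deadline - Date.now())));
  }

  const reason = lastError?.message ?? "no response";
  throw new Error(`Timed out waiting for /global/health: ${reason}`);
}
```

Because each await is now capped, the deadline check at the top of the loop runs at least once every requestTimeoutMs + intervalMs, so the worst-case overrun is small rather than dependent on the OS connect timeout.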