fix(app): bound e2e health checks with per-request timeout #2005

Merged: src-opn merged 1 commit into dev from fix-tests on May 29, 2026


src-opn (Collaborator) commented May 29, 2026

Summary

Fixes the recurring "Timed out waiting for /global/health: fetch failed" failures in the OpenWork Tests workflow (macOS-14 runner). Example failing run: https://github.com/different-ai/openwork/actions/runs/26646854880

Root cause

In apps/app/scripts/_util.mjs, waitForHealthy() polled the server with no per-request timeout:

while (Date.now() - start < timeoutMs) {   // 10s
  const health = await client.global.health();  // no timeout
}

The while loop only re-checks the deadline between awaits. When opencode serve was slow to accept connections on the macOS runner, a single fetch to the not-yet-listening port blocked on the OS-level TCP connect timeout (tens of seconds per attempt, accumulating across attempts), blowing far past the 10s budget. That is why failures showed a ~5-minute gap before erroring rather than ~10s. Linux runners refuse or complete the connection faster, so they passed. The failures were undiagnosable because stderr was empty and server stdout was discarded.
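
To make the overshoot concrete, here is a minimal sketch of the same loop shape. It is not the code from _util.mjs: slowRequest and pollWithoutRequestTimeout are hypothetical names, and a timer stands in for a fetch that hangs on connect.

```javascript
// Hypothetical stand-in for a health request that hangs for `ms` milliseconds.
const slowRequest = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Same loop shape as the old waitForHealthy: the deadline is only evaluated
// at the top of each iteration, never while a request is in flight.
async function pollWithoutRequestTimeout(timeoutMs, requestMs) {
  const start = Date.now();
  let attempts = 0;
  while (Date.now() - start < timeoutMs) {
    // Nothing interrupts this await, so it can outlive the whole budget.
    await slowRequest(requestMs);
    attempts += 1;
  }
  return { attempts, elapsedMs: Date.now() - start };
}
```

Calling pollWithoutRequestTimeout(50, 200) makes a single attempt and returns only after roughly 200 ms, four times the 50 ms budget. With connect timeouts measured in tens of seconds, the same effect scales up to the gap seen in CI.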

Changes

  • _util.mjs: bound each health request with a 5s AbortSignal.timeout; raise the overall budget from 10s to 30s for slow CI cold starts.
  • _util.mjs: capture server stdout (stdio: ["ignore", "pipe", "pipe"]) and expose getStdout() / getExitInfo().
  • e2e.mjs, session-switch.mjs, fs-engine.mjs, browser-entry.mjs: include stdout + serverExit in failure output for debuggability.
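
A rough sketch of the bounded polling described in the first bullet, under stated assumptions: this is not the actual diff. The check callback, the option names, and the pollMs pause are illustrative, and check is assumed to resolve to a truthy value when the server is healthy and to honor the AbortSignal it receives.

```javascript
// Sketch only: `check` is a hypothetical callback that performs one health
// request and rejects when the AbortSignal it is given aborts.
async function waitForHealthy(
  check,
  { timeoutMs = 30_000, requestTimeoutMs = 5_000, pollMs = 250 } = {},
) {
  const start = Date.now();
  let lastError;
  while (Date.now() - start < timeoutMs) {
    try {
      // AbortSignal.timeout(ms) returns a signal that aborts after `ms`,
      // so one hung request can no longer consume the whole budget.
      const health = await check(AbortSignal.timeout(requestTimeoutMs));
      if (health) return health;
      lastError = new Error("server reported unhealthy");
    } catch (error) {
      lastError = error;
    }
    // Assumed short pause between attempts; not taken from the PR.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  const reason = lastError?.message ?? "no response";
  throw new Error(`Timed out waiting for health: ${reason}`);
}
```

With plain fetch, the callback could be something like (signal) => fetch(healthUrl, { signal }).then((res) => res.ok), where healthUrl is a placeholder. In this sketch the worst-case overshoot of the overall budget is about one request timeout plus one poll pause, rather than an unbounded connect wait.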

Testing

Ran the full chain locally with the pinned opencode v1.15.12:

pnpm --filter @openwork/app test:e2e

Result: all scripts pass (local-file-path, e2e, session-switch, fs-engine, browser-entry) in ~20s. Could not reproduce the macOS-runner hang locally (the connect behavior differs), but the per-request timeout makes waitForHealthy respect its budget regardless of connect latency.

waitForHealthy polled /global/health with no per-request timeout, so a
single fetch to a not-yet-listening opencode port could block on the
OS-level TCP connect timeout and blow past the 10s budget (the loop only
re-checks the deadline between awaits). This caused recurring ~5-minute
"Timed out waiting for /global/health: fetch failed" failures on the
macOS-14 CI runner.

Bound each request with AbortSignal.timeout, raise the overall budget for
slow CI cold starts, and capture server stdout + exit info so future
failures are diagnosable instead of showing empty stderr.
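
The stdout and exit-info capture mentioned above might look roughly like the following. This is a sketch, not the helper from _util.mjs: startServer and the shape of its return value are assumptions; only the stdio setting and the getStdout() / getExitInfo() names come from the PR description.

```javascript
// Sketch only: startServer is a hypothetical name for the spawn helper.
async function startServer(command, args) {
  // Dynamic import keeps this snippet loadable as either ESM or CommonJS.
  const { spawn } = await import("node:child_process");

  // stdin ignored; stdout and stderr piped so both can be buffered.
  const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });

  let stdout = "";
  let stderr = "";
  let exitInfo = null;
  child.stdout.setEncoding("utf8");
  child.stderr.setEncoding("utf8");
  child.stdout.on("data", (chunk) => { stdout += chunk; });
  child.stderr.on("data", (chunk) => { stderr += chunk; });
  // Record how the process ended so a failed health wait can report it.
  child.on("exit", (code, signal) => { exitInfo = { code, signal }; });

  return {
    child,
    getStdout: () => stdout,
    getStderr: () => stderr,
    getExitInfo: () => exitInfo, // null while the process is still running
  };
}
```

A test script can then append getStdout() and getExitInfo() to its failure message, so a server that exits early or logs only to stdout no longer produces an empty report.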

vercel Bot (Contributor) commented May 29, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
openwork-app | Ready | Preview, Comment | May 29, 2026 4:16pm
openwork-den | Ready | Preview, Comment | May 29, 2026 4:16pm
openwork-den-worker-proxy | Ready | Preview, Comment | May 29, 2026 4:16pm
openwork-landing | Ready | Preview, Comment, Open in v0 | May 29, 2026 4:16pm

cubic-dev-ai Bot left a comment

No issues found across 5 files

src-opn merged commit 9e5031d into dev on May 29, 2026
11 checks passed