Fix ZMQ env timeouts #1355

Merged: xeophon merged 1 commit into main from fix/zmq-env-request-timeout on May 12, 2026

xeophon (Member) commented May 12, 2026

Summary

  • make the ZMQ env request timeout configurable internally, with a 24h default
  • derive the env-server request timeout from existing timeout_minutes when present, or from hosted-runner PRIME_EVAL_TIMEOUT_MINUTES when running from CLI flags
  • record ZMQ env request timeouts as failed samples with stop_condition="env_request_timeout" instead of aborting the whole eval, while leaving unrelated TimeoutErrors untouched
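
As a rough illustration of the third bullet, the sketch below turns a timed-out env request into a failed sample instead of aborting the whole run. Only the `stop_condition="env_request_timeout"` value comes from this PR; `EnvRequestTimeout`, `send_request`, `run_sample`, and the result dict shape are hypothetical stand-ins, and `asyncio.sleep` replaces the real ZMQ round trip.

```python
from __future__ import annotations

import asyncio


class EnvRequestTimeout(Exception):
    """Hypothetical marker for a timed-out env-server request."""


async def send_request(delay: float, timeout: float | None) -> dict:
    """Stand-in for a ZMQ round trip; sleeps instead of talking to a server."""
    try:
        await asyncio.wait_for(asyncio.sleep(delay), timeout=timeout)
    except asyncio.TimeoutError as exc:
        # Wrap only the timeout raised by the request itself, so unrelated
        # TimeoutErrors elsewhere keep their original type.
        raise EnvRequestTimeout(f"no reply within {timeout}s") from exc
    return {"completed": True, "stop_condition": "done"}


async def run_sample(delay: float, timeout: float | None) -> dict:
    """Convert a request timeout into a failed sample; other errors propagate."""
    try:
        return await send_request(delay, timeout)
    except EnvRequestTimeout as exc:
        return {
            "completed": False,
            "stop_condition": "env_request_timeout",
            "error": str(exc),
        }


async def run_eval() -> list[dict]:
    # The second sample times out; the other two still complete.
    return await asyncio.gather(
        run_sample(0.0, 0.2),
        run_sample(1.0, 0.05),
        run_sample(0.0, 0.2),
    )


results = asyncio.run(run_eval())
```

Wrapping the timeout in a dedicated exception type is one way to keep the "unrelated TimeoutErrors untouched" property: only the wrapper is caught at the sample level.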

A companion platform change is prepared locally to set PRIME_EVAL_TIMEOUT_MINUTES in hosted eval runner sandboxes from the existing timeout_minutes field.
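
The precedence described in the summary (explicit `timeout_minutes` first, then `PRIME_EVAL_TIMEOUT_MINUTES`, then a default) could be sketched as follows. The helper name `resolve_env_request_timeout`, its signature, and the constant name are assumptions made for illustration; the 24h value follows the PR description, whereas the commit reviewed by Bugbot sets the default to `None`.

```python
from __future__ import annotations

import os
from typing import Mapping

# 24h in seconds, as stated in the PR summary (assumed constant name).
DEFAULT_ENV_REQUEST_TIMEOUT_S = 24 * 60 * 60


def resolve_env_request_timeout(
    timeout_minutes: float | None = None,
    environ: Mapping[str, str] | None = None,
) -> float:
    """Hypothetical helper: pick the env request timeout in seconds."""
    environ = os.environ if environ is None else environ
    if timeout_minutes is not None:
        # An explicit eval timeout budget takes precedence.
        return float(timeout_minutes) * 60
    raw = environ.get("PRIME_EVAL_TIMEOUT_MINUTES")
    if raw:
        # Hosted runners are expected to export the budget in minutes.
        return float(raw) * 60
    return float(DEFAULT_ENV_REQUEST_TIMEOUT_S)
```

For example, `resolve_env_request_timeout(30, {})` yields 1800.0 seconds, and an empty environment with no explicit value falls back to 86400.0 seconds.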

Verification

  • uv run pytest tests/test_eval_cli.py tests/test_env_server.py tests/test_environment_extra.py — 104 passed
  • uv run ruff check verifiers/types.py verifiers/utils/eval_utils.py verifiers/scripts/eval.py tests/test_eval_cli.py tests/test_env_server.py tests/test_environment_extra.py docs/evaluation.md docs/reference.md
  • an earlier full run of uv run pytest tests/ had 1418 passed, 29 skipped, and 3 unrelated failures: a missing OpenAI key for toxicity_explanation and wiki_search, and an unsatisfiable bfcl_v3 dependency on mistralai

Note

Medium Risk
Changes ZMQ env request behavior by removing the 10h hardcoded default timeout, which can alter how long evals wait for env responses (potentially hanging if callers don’t supply a timeout). Scope is small and isolated to the ZMQ client.

Overview
Removes the hardcoded 10h DEFAULT_REQUEST_TIMEOUT in ZMQEnvClient and replaces it with None, making env request timeouts unlimited by default unless an explicit timeout is passed to send_request.

Reviewed by Cursor Bugbot for commit 64f2b92.

Comment thread verifiers/scripts/eval.py Outdated
Comment thread docs/reference.md Outdated
xeophon force-pushed the fix/zmq-env-request-timeout branch from 99f0ba8 to 33caa2f on May 12, 2026 12:41
Comment thread verifiers/scripts/eval.py Outdated
Comment thread verifiers/scripts/eval.py Outdated
xeophon (Member, Author) commented May 12, 2026

Addressed the review comments in f37b555:

  • documented timeout_minutes in docs/evaluation.md, including how hosted runner PRIME_EVAL_TIMEOUT_MINUTES maps to env-server request timeout behavior
  • updated skills/evaluate-environments/SKILL.md with the hosted timeout workflow and distinction from per-rollout --timeout
  • the original --env-request-timeout CLI flag comment is stale because that flag was removed in favor of the existing hosted timeout budget

Verification:

  • uv run pytest tests/test_eval_cli.py tests/test_env_server.py tests/test_environment_extra.py — 104 passed
  • uv run ruff check verifiers/types.py verifiers/utils/eval_utils.py verifiers/scripts/eval.py tests/test_eval_cli.py tests/test_env_server.py tests/test_environment_extra.py — passed

xeophon changed the title from "Handle ZMQ env request timeouts gracefully" to "Fix ZMQ env timeouts" on May 12, 2026
xeophon force-pushed the fix/zmq-env-request-timeout branch from f37b555 to 64f2b92 on May 12, 2026 16:24
cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

"""ZMQ-based environment client."""

- DEFAULT_REQUEST_TIMEOUT = 36_000  # 10h
+ DEFAULT_REQUEST_TIMEOUT: float | None = None
Default timeout removed, requests can hang forever

High Severity

DEFAULT_REQUEST_TIMEOUT was changed from 36_000 (10 hours) to None, removing the safety-net timeout entirely. The PR description states "with a 24h default," but None means asyncio.wait_for waits indefinitely. Since all callers (EnvClient.run_rollout and run_group) pass timeout=None, and nothing in the codebase overrides DEFAULT_REQUEST_TIMEOUT on instances, every ZMQ request now blocks forever if the server fails to respond.
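
The concern above rests on documented `asyncio.wait_for` behavior: with `timeout=None` it waits until the awaitable completes. A minimal sketch of a safety-net fallback, assuming a hypothetical `wait_for_reply` helper and a deliberately tiny fallback so the demo finishes quickly (none of these names come from the codebase):

```python
from __future__ import annotations

import asyncio

# Tiny value so the demo returns quickly; a real safety net would be hours.
FALLBACK_TIMEOUT_S = 0.05


async def wait_for_reply(
    reply: asyncio.Future,
    timeout: float | None,
    default: float | None,
) -> str:
    """Hypothetical helper: use the caller's timeout, else a fallback default."""
    # If both are None, wait_for would block until the future resolves.
    effective = default if timeout is None else timeout
    try:
        return await asyncio.wait_for(reply, timeout=effective)
    except asyncio.TimeoutError:
        return "timed out"


async def demo() -> str:
    # A future that is never resolved models a server that never responds.
    reply = asyncio.get_running_loop().create_future()
    # The caller passes timeout=None, as the review says current callers do.
    return await wait_for_reply(reply, timeout=None, default=FALLBACK_TIMEOUT_S)


outcome = asyncio.run(demo())
```

With `default=None` in place of `FALLBACK_TIMEOUT_S`, the same call would never return, which is the hang the review describes.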

xeophon merged commit cf0d9ac into main on May 12, 2026
8 checks passed