Reduce EngineCore idle polling #552
janhilgard
left a comment
Code Review
CI all green. Clean diff (+108/-3), narrowly scoped to idle polling behavior.
What it does
Replaces the tight 1ms asyncio.sleep(step_interval) idle loop with a 100ms asyncio.Event.wait(timeout) pattern. When the scheduler has no requests, the engine loop waits up to 100ms but wakes immediately when add_request() signals new work via the event.
Assessment
Correctness: The _wait_for_idle_or_request helper correctly handles all edge cases:
- request_event is None → falls back to asyncio.sleep (backward compat for engines constructed without an event)
- timeout <= 0 → yields with asyncio.sleep(0)
- Normal path → asyncio.wait_for(event.wait(), timeout) with TimeoutError suppression
- Event is cleared in a finally block to avoid stale signals
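The branches listed above can be sketched roughly as follows. This is a minimal standalone sketch, not the PR's code: the free function wait_for_idle_or_request, its signature, and its boolean return value are assumptions made for illustration, since the real helper is a method on EngineCore whose body is not quoted in this conversation.

```python
import asyncio
from typing import Optional


async def wait_for_idle_or_request(
    request_event: Optional[asyncio.Event], timeout: float
) -> bool:
    """Hypothetical stand-in for the helper; returns True if woken by the event."""
    if request_event is None:
        # Fallback for engines built without an event: plain timed sleep.
        await asyncio.sleep(max(timeout, 0.0))
        return False
    if timeout <= 0:
        # Nothing to wait for: yield to the event loop once and return.
        await asyncio.sleep(0)
        return False
    try:
        # Normal path: block until the event is set or the timeout expires.
        await asyncio.wait_for(request_event.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        # Timing out is the expected idle outcome, so it is suppressed.
        return False
    finally:
        # Clear on every exit so a consumed signal cannot wake the next wait.
        request_event.clear()
```

Returning a flag is only a convenience for the sketch; it makes the "woken by event" and "timed out" outcomes easy to tell apart in a test.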
Latency impact: Zero impact on active generation — the idle interval only applies when scheduler.has_requests() returns False. New requests wake the loop immediately via _set_request_event, so request latency is bounded by event loop scheduling, not the 100ms timeout.
Defensive getattr: The code uses getattr(self, "_request_event", None) at all call sites. This handles the case where an EngineCore was constructed before this change (e.g., deserialized or subclassed without __init__). Slightly defensive but not wrong.
Resource savings: Reduces idle wakeups from ~1000/s to ~10/s with instant wake on new work. Meaningful for battery/thermal on Apple Silicon when the server is idle.
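The ~1000/s and ~10/s figures follow directly from the two intervals. A small sketch of the arithmetic, assuming each idle iteration costs exactly one timer wakeup and ignoring scheduling overhead (the function name is hypothetical):

```python
def idle_wakeups_per_second(interval_s: float) -> float:
    """Upper bound on timer wakeups for a loop that sleeps interval_s per pass."""
    return 1.0 / interval_s


before = idle_wakeups_per_second(0.001)  # 1 ms active step_interval
after = idle_wakeups_per_second(0.100)   # 100 ms idle interval
reduction = before / after               # factor by which idle wakeups drop
```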
Tests: Two focused tests — one verifying the idle interval is used (not the active 1ms), one verifying add_request sets the event for immediate wake.
Minor notes (non-blocking)
- The step_interval variable is no longer read in _engine_loop after this change (the active path uses asyncio.sleep(0), not asyncio.sleep(step_interval)). The step_interval field on EngineConfig is now effectively unused. Consider deprecating or documenting this in a follow-up.
- The _clear_request_event call at the top of the has_requests() branch is correct: it discards a signal whose work is already queued, so a stale event cannot make the next idle wait return immediately, and that wait actually blocks.
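To make the interaction between the clear call and the idle wait concrete, here is a toy loop under stated assumptions. ToyEngine and measure_wake_latency are hypothetical names, a deque stands in for the scheduler, and none of this is the real EngineCore; it only mirrors the shape described in this review.

```python
import asyncio
import time
from collections import deque


class ToyEngine:
    """Hypothetical miniature of the engine loop; not the real EngineCore."""

    def __init__(self, idle_interval: float) -> None:
        self.idle_interval = idle_interval
        self.queue: deque = deque()
        self.done: list = []
        self.event = asyncio.Event()

    def add_request(self, item: str) -> None:
        self.queue.append(item)
        self.event.set()  # wake the idle wait immediately

    async def loop(self) -> None:
        while True:
            if self.queue:
                # Busy branch: the queued work covers any signal raised so far,
                # so clear it before it can leak into the next idle wait.
                self.event.clear()
                self.done.append((self.queue.popleft(), time.monotonic()))
                await asyncio.sleep(0)
            else:
                try:
                    await asyncio.wait_for(self.event.wait(), self.idle_interval)
                except asyncio.TimeoutError:
                    pass
                finally:
                    self.event.clear()


async def measure_wake_latency(idle_interval: float = 5.0) -> float:
    engine = ToyEngine(idle_interval)
    task = asyncio.create_task(engine.loop())
    await asyncio.sleep(0.05)  # give the loop time to park in the idle wait
    t0 = time.monotonic()
    engine.add_request("r1")
    while not engine.done:
        await asyncio.sleep(0)
    latency = engine.done[0][1] - t0
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return latency
```

With a deliberately long idle interval, the measured latency should stay far below that interval if the wakeup is delivered, which is the property the event-driven path is meant to provide.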
LGTM.
The event logic looks right; I went hunting for a lost-wakeup race and didn't find one ([…]).

First, the tests never exercise the new event-driven path. This test forces the fallback:

    # test_engine_loop_uses_idle_interval_when_scheduler_is_empty
    engine._request_event = None  # forces the plain-sleep fallback,
                                  # so _wait_for_idle_or_request's event wait is never covered

and […]

    async def test_idle_loop_wakes_on_add_request():
        engine = EngineCore(..., idle_step_interval=5.0)  # long idle sleep on purpose
        await engine.start()
        await asyncio.sleep(0.05)  # loop is now parked in the idle wait
        t0 = time.monotonic()
        await engine.add_request(request)  # should wake it immediately
        _ = await get_first_output(engine, request_id)
        assert time.monotonic() - t0 < 1.0  # would be ~5s if the wakeup were lost

Second, this changes […]

One minor note while you're in there: […]
Summary
Implements the light-tier idle polling part of #508.
- EngineConfig.idle_step_interval with a 100ms default for the empty-scheduler path.
- add_request() sets the request event so the idle loop wakes immediately when new work arrives.
Observed behavior
The current empty-scheduler branch sleeps on the active step_interval value, defaulting to 1ms. That keeps the serve loop waking roughly 1000 times per second even when there is no queued work.
Expected behavior
When the scheduler is empty, the loop should wait on a lower-frequency idle interval while still waking promptly on a new request.
Minimal patch shape
This is intentionally limited to EngineCore idle-loop behavior and regression tests. It does not add a CLI flag or new public API surface beyond the EngineConfig field.
Validation
- uv run --python 3.12 --extra dev pytest tests/test_engine_core_idle_polling.py tests/test_engine_core_thread_streams.py tests/test_batching.py::TestEngineAsync::test_engine_lifecycle tests/test_batching.py::TestEngineAsync::test_engine_context_manager tests/test_batching.py::TestEngineAsync::test_stream_outputs_consumer_break_after_finished_does_not_abort -q -> 7 passed
- uv run --python 3.12 --extra dev ruff check vllm_mlx/engine_core.py tests/test_engine_core_idle_polling.py --select E,F,W --ignore E402,E501,E731,F811,F841 -> passed
- uv run --python 3.12 --extra dev black --check --target-version py312 vllm_mlx/engine_core.py tests/test_engine_core_idle_polling.py -> passed
- python3 /opt/ai-runtime/bin/lint-upstream-claims --root /opt/ai-runtime/worktrees/vllm-mlx/issue-508-adaptive-idle-polling vllm_mlx/engine_core.py tests/test_engine_core_idle_polling.py -> passed
- git diff --check -> passed
Not claimed
[…] /health state transitions, or memory release semantics.