Reduce EngineCore idle polling by Thump604 · Pull Request #552 · waybarrios/vllm-mlx

Thump604 · 2026-05-20T00:31:18Z

Summary

Implements the light-tier idle polling part of #508.

Adds EngineConfig.idle_step_interval with a 100ms default for the empty-scheduler path.
Adds a request event that add_request() sets so the idle loop wakes immediately when new work arrives.
Keeps the active generation path unchanged: scheduler steps still run as before when requests are present.

Observed behavior

The current empty-scheduler branch sleeps on the active step_interval value, defaulting to 1ms. That keeps the serve loop waking roughly 1000 times per second even when there is no queued work.

Expected behavior

When the scheduler is empty, the loop should wait on a lower-frequency idle interval while still waking promptly on a new request.

Minimal patch shape

This is intentionally limited to EngineCore idle-loop behavior and regression tests. It does not add a CLI flag or new public API surface beyond the EngineConfig field.

Validation

uv run --python 3.12 --extra dev pytest tests/test_engine_core_idle_polling.py tests/test_engine_core_thread_streams.py tests/test_batching.py::TestEngineAsync::test_engine_lifecycle tests/test_batching.py::TestEngineAsync::test_engine_context_manager tests/test_batching.py::TestEngineAsync::test_stream_outputs_consumer_break_after_finished_does_not_abort -q -> 7 passed
uv run --python 3.12 --extra dev ruff check vllm_mlx/engine_core.py tests/test_engine_core_idle_polling.py --select E,F,W --ignore E402,E501,E731,F811,F841 -> passed
uv run --python 3.12 --extra dev black --check --target-version py312 vllm_mlx/engine_core.py tests/test_engine_core_idle_polling.py -> passed
python3 /opt/ai-runtime/bin/lint-upstream-claims --root /opt/ai-runtime/worktrees/vllm-mlx/issue-508-adaptive-idle-polling vllm_mlx/engine_core.py tests/test_engine_core_idle_polling.py -> passed
git diff --check -> passed

Not claimed

Does not implement lazy load, idle unload, /health state transitions, or memory release semantics.
Does not change model loading, decoding, scheduling policy, streaming semantics, or request output behavior.
Does not claim a measured CPU reduction; it only changes the empty-scheduler wait behavior and proves request wakeup semantics in unit tests.

janhilgard

Code Review

CI all green. Clean diff (+108/-3), narrowly scoped to idle polling behavior.

What it does

Replaces the tight 1ms asyncio.sleep(step_interval) idle loop with a 100ms asyncio.Event.wait(timeout) pattern. When the scheduler has no requests, the engine loop waits up to 100ms but wakes immediately when add_request() signals new work via the event.

Assessment

Correctness: The _wait_for_idle_or_request helper correctly handles all edge cases:

request_event is None → falls back to asyncio.sleep (backward compat for engines constructed without event)
timeout <= 0 → yields with asyncio.sleep(0)
Normal path → asyncio.wait_for(event.wait(), timeout) with TimeoutError suppression
Event is cleared in finally block to avoid stale signals

Latency impact: Zero impact on active generation — the idle interval only applies when scheduler.has_requests() returns False. New requests wake the loop immediately via _set_request_event, so request latency is bounded by event loop scheduling, not the 100ms timeout.

Defensive getattr: The code uses getattr(self, "_request_event", None) at all call sites. This handles the case where an EngineCore was constructed before this change (e.g., deserialized or subclassed without __init__). Slightly defensive but not wrong.

Resource savings: Reduces idle wakeups from ~1000/s to ~10/s with instant wake on new work. Meaningful for battery/thermal on Apple Silicon when the server is idle.

Tests: Two focused tests — one verifying the idle interval is used (not the active 1ms), one verifying add_request sets the event for immediate wake.

Minor notes (non-blocking)

The step_interval variable is no longer read in _engine_loop after this change (the active path uses asyncio.sleep(0), not asyncio.sleep(step_interval)). The step_interval field on EngineConfig is now effectively unused. Consider deprecating or documenting this in a follow-up.
The _clear_request_event call at the top of the has_requests() branch is correct — it prevents a stale event from being set during processing, ensuring the next idle wait actually blocks.

LGTM.

waybarrios · 2026-06-11T21:05:14Z

The event logic looks right, I went hunting for a lost-wakeup race and didn't find one (asyncio.Event.wait() returns immediately if the event is already set, so the idle-to-active transition is safe). Two things before this can merge though.

First, the tests never exercise the new event-driven path. This test forces the fallback:

# test_engine_loop_uses_idle_interval_when_scheduler_is_empty
engine._request_event = None   # forces the plain-sleep fallback,
                               # so _wait_for_idle_or_request's event wait is never covered

and test_add_request_wakes_idle_engine_loop only checks engine._request_event.is_set() without the loop running, so nothing verifies the loop actually wakes early. Something like this would cover the real behavior:

async def test_idle_loop_wakes_on_add_request():
    engine = EngineCore(..., idle_step_interval=5.0)  # long idle sleep on purpose
    await engine.start()
    await asyncio.sleep(0.05)            # loop is now parked in the idle wait
    t0 = time.monotonic()
    await engine.add_request(request)    # should wake it immediately
    _ = await get_first_output(engine, request_id)
    assert time.monotonic() - t0 < 1.0   # would be ~5s if the wakeup were lost

Second, this changes _engine_loop, which is hot path, so it needs measured numbers per the repo's PR rules: TTFT and decode tok/s for main vs this branch on the same hardware (batch=1 and batch=8), plus the idle CPU% delta since that's the headline claim. The PR body currently opts out of measurement, which doesn't fly for this file.

One minor note while you're in there: abort_request doesn't set the event, so an abort that empties the scheduler leaves the loop sleeping for up to 100ms (it used to notice in about 1ms). Probably harmless, but worth a code comment so nobody trips on it later.

Reduce EngineCore idle polling

481ca22

Thump604 requested a review from janhilgard May 20, 2026 00:31

Thump604 assigned janhilgard May 20, 2026

janhilgard approved these changes May 20, 2026

View reviewed changes

Thump604 requested a review from waybarrios May 20, 2026 12:08

Thump604 assigned waybarrios May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce EngineCore idle polling#552

Reduce EngineCore idle polling#552
Thump604 wants to merge 1 commit into
waybarrios:mainfrom
Thump604:604/issue-508-adaptive-idle-polling

Thump604 commented May 20, 2026

Uh oh!

janhilgard left a comment

Uh oh!

waybarrios commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Thump604 commented May 20, 2026

Summary

Observed behavior

Expected behavior

Minimal patch shape

Validation

Not claimed

Uh oh!

janhilgard left a comment

Choose a reason for hiding this comment

Code Review

What it does

Assessment

Minor notes (non-blocking)

Uh oh!

waybarrios commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants