fix(ssd-cache): inline LRU unlinks so eviction frees queue capacity #1451
Merged
``_enforce_size_limit_for_new_block`` enqueues evicted file unlinks as
``("unlink", path)`` items onto ``_write_queue`` — the same bounded
queue that carries pending writes. Combined with the pre-eviction
``_write_queue.full()`` short-circuit at the top of ``save_block``, this
creates a deadlock under sustained save pressure:
1. Writer is saturated → ``_write_queue`` is full.
2. ``save_block``'s pre-eviction check sees ``full()`` → returns False
immediately, BEFORE calling ``_enforce_size_limit_for_new_block``.
3. Eviction never runs → cache stays at the size cap.
4. Every subsequent save drops; ``ssd_write_drops`` climbs forever
while ``_total_size`` sits pinned at the cap.
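A minimal toy model reproduces steps 1 to 4 above. ``PreFixCache``, its list-based LRU, and the "writer never drains" setup are hypothetical. Only the method and attribute names are borrowed from the PR, and the real ``save_block`` does more than this:

```python
import queue


class PreFixCache:
    """Hypothetical toy of the pre-fix save path; names borrowed from the PR."""

    def __init__(self, max_size: int, queue_size: int) -> None:
        self.max_size = max_size
        self._write_queue: queue.Queue = queue.Queue(maxsize=queue_size)
        self._lru: list[int] = []  # block sizes, oldest first
        self._total_size = 0
        self.ssd_write_drops = 0
        self.evictions = 0

    def _enforce_size_limit_for_new_block(self, size: int) -> None:
        while self._lru and self._total_size + size > self.max_size:
            self._total_size -= self._lru.pop(0)
            self.evictions += 1
            try:
                # Deferred unlink competes with pending writes for queue slots.
                self._write_queue.put_nowait(("unlink", "evicted-block-path"))
            except queue.Full:
                pass  # toy simplification: the unlink request is simply lost

    def save_block(self, size: int) -> bool:
        # Step 2: the short-circuit runs BEFORE eviction gets a chance.
        if self._write_queue.full():
            self.ssd_write_drops += 1
            return False
        self._enforce_size_limit_for_new_block(size)
        try:
            self._write_queue.put_nowait(("write", size))
        except queue.Full:  # an unlink may have taken the last free slot
            self.ssd_write_drops += 1
            return False
        self._lru.append(size)
        self._total_size += size
        return True


# Step 1: the writer thread never drains the queue in this toy.
cache = PreFixCache(max_size=4, queue_size=8)
for _ in range(20):
    cache.save_block(1)
print(cache.ssd_write_drops, cache._total_size, cache.evictions)  # prints: 14 4 2
```

After two evictions, both unlink requests occupy queue slots next to writes nobody drains. Every later save drops while ``_total_size`` stays at the cap.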
Inline the unlinks on the eviction-calling thread instead. Eviction
typically removes a single block per save (``evict_until_size`` stops
as soon as ``total_size <= target``), so this is one syscall per save
in steady state. The deferred-unlink justification ("avoid blocking
the inference thread with N file delete syscalls") doesn't materialise
under normal load, and inlining removes the bounded-queue contention
entirely.
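The post-fix shape can be sketched as follows, assuming an index that keeps entries in LRU order and exposes ``evict_until_size(target)`` returning the evicted ``(path, size)`` pairs. ``ToyIndex``, ``enforce_size_limit_inline``, and the ``max_size - new_size`` target are illustrative assumptions, not the real ``_enforce_size_limit_for_new_block``:

```python
import tempfile
from collections import OrderedDict
from pathlib import Path


class ToyIndex:
    """Hypothetical LRU index; not the real cache index."""

    def __init__(self) -> None:
        self.entries = OrderedDict()  # path -> size, oldest first
        self.total_size = 0

    def add(self, path: Path, size: int) -> None:
        self.entries[path] = size
        self.total_size += size

    def evict_until_size(self, target: int) -> list:
        """Pop oldest entries, stopping as soon as total_size <= target."""
        evicted = []
        while self.entries and self.total_size > target:
            path, size = self.entries.popitem(last=False)
            self.total_size -= size
            evicted.append((path, size))
        return evicted


def enforce_size_limit_inline(index: ToyIndex, max_size: int, new_size: int) -> int:
    """Post-fix shape (sketch): unlink evicted files on the calling thread."""
    evicted = index.evict_until_size(max_size - new_size)
    for path, _size in evicted:
        path.unlink()  # one syscall per evicted block; the write queue is untouched
    return len(evicted)


with tempfile.TemporaryDirectory() as tmp:
    index = ToyIndex()
    blocks = [Path(tmp) / f"block{i}.bin" for i in range(4)]
    for path in blocks:
        path.write_bytes(b"x")
        index.add(path, 1)
    removed = enforce_size_limit_inline(index, max_size=4, new_size=1)
    survivors = [p.name for p in blocks if p.exists()]
    print(removed, index.total_size, survivors)
```

In steady state only the oldest block goes, so the calling thread pays one ``unlink`` per save and no ``("unlink", ...)`` item ever reaches the bounded queue.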
Bounded inline burst. The ENOSPC-recovery path invalidates the 30 s
disk-usage cache, which can shrink ``_get_effective_max_size``
sharply on the next save — ``evict_until_size`` would then return
hundreds of entries at once and the inline-unlink loop would stall
the inference thread on a syscall storm. Cap the burst at
``_MAX_INLINE_UNLINKS_PER_SAVE = 32`` and reinsert the deferred
metadata into the index so subsequent saves drain the remainder.
Bounds per-call latency at the cost of taking multiple saves to fully
reconverge.
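The burst cap can be sketched as a slice plus reinsert. ``ToyIndex`` and ``evict_with_burst_cap`` are hypothetical, and the ``unlink`` callback stands in for ``Path.unlink``. Putting deferred entries back at the oldest end, so they keep their LRU position, is this sketch's choice:

```python
from collections import OrderedDict
from typing import Callable

_MAX_INLINE_UNLINKS_PER_SAVE = 32


class ToyIndex:
    """Hypothetical LRU index (key -> size, oldest first)."""

    def __init__(self) -> None:
        self.entries = OrderedDict()
        self.total_size = 0

    def add(self, key: str, size: int) -> None:
        self.entries[key] = size
        self.total_size += size

    def evict_until_size(self, target: int) -> list:
        evicted = []
        while self.entries and self.total_size > target:
            key, size = self.entries.popitem(last=False)
            self.total_size -= size
            evicted.append((key, size))
        return evicted


def evict_with_burst_cap(
    index: ToyIndex, target: int, unlink: Callable[[str], None]
) -> tuple[int, int]:
    """Unlink at most _MAX_INLINE_UNLINKS_PER_SAVE entries; put the rest back."""
    evicted = index.evict_until_size(target)
    now = evicted[:_MAX_INLINE_UNLINKS_PER_SAVE]
    deferred = evicted[_MAX_INLINE_UNLINKS_PER_SAVE:]
    for key, _size in now:
        unlink(key)
    # Reinsert newest-first at the oldest end so deferred entries keep their
    # original LRU position and are the first candidates on the next save.
    for key, size in reversed(deferred):
        index.entries[key] = size
        index.entries.move_to_end(key, last=False)
        index.total_size += size
    return len(now), len(deferred)


index = ToyIndex()
for i in range(100):
    index.add(f"block{i}", 1)
removed: list[str] = []
# A sudden drop in the effective max size asks for everything to go at once.
print(evict_with_burst_cap(index, target=0, unlink=removed.append))  # (32, 68)
```

Calling it again with the same target drains the remainder 32 entries at a time (68, then 36, then 4 left), which is the multi-save reconvergence the commit message trades for bounded latency.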
``evict_unlink_failures`` stats counter. Eviction now decrements the
index before the on-disk unlink, so if ``Path.unlink`` raises ``OSError``
the on-disk size drifts above what the index reports. The previous
"log a warning and move on" pattern lost that signal silently;
surfacing the counter lets operators see the drift.
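A sketch of the counter, assuming a plain dataclass holds the stats. ``EvictionStats`` and ``unlink_evicted`` are hypothetical stand-ins; the PR names only the counter itself:

```python
import logging
import tempfile
from dataclasses import dataclass
from pathlib import Path

logger = logging.getLogger("ssd_cache_sketch")


@dataclass
class EvictionStats:
    """Hypothetical stats holder."""
    evict_unlink_failures: int = 0


def unlink_evicted(paths: list[Path], stats: EvictionStats) -> int:
    """Unlink evicted block files inline; count failures instead of only logging."""
    removed = 0
    for path in paths:
        try:
            path.unlink()
            removed += 1
        except OSError as exc:
            # The index was already decremented, so disk usage now exceeds it.
            stats.evict_unlink_failures += 1
            logger.warning("failed to unlink evicted block %s: %s", path, exc)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "block0.bin"
    present.write_bytes(b"x")
    missing = Path(tmp) / "block1.bin"  # never created: unlink raises FileNotFoundError
    stats = EvictionStats()
    removed = unlink_evicted([present, missing], stats)
    print(removed, stats.evict_unlink_failures, present.exists())  # prints: 1 1 False
```

``FileNotFoundError`` is a subclass of ``OSError``, so a file already gone from disk is counted the same way as a permission or I/O error.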
Tests (tests/test_paged_ssd_cache.py::TestInlineLRUUnlinks):
- test_eviction_does_not_enqueue_unlink_tasks: sentinel-patch on
``put_nowait`` asserts no ``("unlink", ...)`` items ever enter the
queue.
- test_eviction_frees_capacity_under_pressure: with the writer busy,
eviction still keeps ``_index.total_size`` near the configured cap.
- test_inline_eviction_burst_is_capped: forced mass-eviction removes
at most ``_MAX_INLINE_UNLINKS_PER_SAVE`` entries; the rest reinsert
so subsequent saves can drain.
- test_unlink_failure_increments_counter: a patched ``OSError`` from
``Path.unlink`` increments ``evict_unlink_failures``.
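The sentinel-patch idea behind ``test_eviction_does_not_enqueue_unlink_tasks`` can be sketched against a hypothetical toy cache, since the real fixture is not shown here. ``unittest.mock.patch.object`` wraps ``put_nowait`` so every enqueued item is recorded before being forwarded, and the test then asserts that no ``"unlink"`` kind appears:

```python
import queue
from unittest import mock


class InlineEvictCache:
    """Hypothetical toy cache whose eviction unlinks inline (no real files)."""

    def __init__(self, max_blocks: int, queue_size: int) -> None:
        self.max_blocks = max_blocks
        self._write_queue: queue.Queue = queue.Queue(maxsize=queue_size)
        self._lru: list[str] = []      # block ids, oldest first
        self.unlinked: list[str] = []  # what Path.unlink would have removed

    def save_block(self, block_id: str) -> bool:
        if self._write_queue.full():
            return False
        while len(self._lru) >= self.max_blocks:
            self.unlinked.append(self._lru.pop(0))  # inline "unlink" on this thread
        self._write_queue.put_nowait(("write", block_id))
        self._lru.append(block_id)
        return True


def test_eviction_does_not_enqueue_unlink_tasks() -> None:
    cache = InlineEvictCache(max_blocks=2, queue_size=16)
    seen: list[tuple] = []
    real_put_nowait = cache._write_queue.put_nowait

    def sentinel(item):
        seen.append(item)  # record every item before forwarding it
        return real_put_nowait(item)

    with mock.patch.object(cache._write_queue, "put_nowait", side_effect=sentinel):
        for i in range(5):
            assert cache.save_block(f"b{i}")

    assert cache.unlinked == ["b0", "b1", "b2"]  # eviction did run
    assert [kind for kind, _ in seen] == ["write"] * 5  # only writes were queued
```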
The dead writer-thread ``("unlink", file_path)`` dispatch branch is
removed since no path enqueues such tuples anymore.
87 existing paged_ssd_cache tests + 34 hot_cache tests + 4 new tests
pass.
Owner
Thanks for tracking this down. The root cause makes sense: eviction was sharing the bounded write queue with pending writes, so once the queue saturated, cache pressure could not drain cleanly. This looks good to me, and I am going to merge it. I verified tests/test_paged_ssd_cache.py and tests/test_hot_cache.py locally, and I will fold two small follow-ups into main after merge: preserving LRU order by sorting _lru by last_access after capped inline eviction reinserts deferred entries, and threading evict_unlink_failures through PagedSSDCacheStats so it shows up in the runtime/admin stats path.
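The first follow-up can be sketched as follows, assuming ``_lru`` is a list of metadata records that carry a ``last_access`` timestamp. ``BlockMeta`` and ``reinsert_deferred`` are hypothetical. Appending the deferred entries alone would make the oldest blocks look most recently used:

```python
from dataclasses import dataclass


@dataclass
class BlockMeta:
    """Hypothetical per-block metadata; only last_access matters here."""
    block_id: str
    size: int
    last_access: float


def reinsert_deferred(lru: list[BlockMeta], deferred: list[BlockMeta]) -> None:
    """Put deferred entries back, then restore oldest-first order."""
    lru.extend(deferred)  # on its own, this would mark them most recently used
    lru.sort(key=lambda meta: meta.last_access)  # stable: ties keep current order


lru = [BlockMeta("c", 1, 50.0), BlockMeta("d", 1, 60.0)]
deferred = [BlockMeta("a", 1, 10.0), BlockMeta("b", 1, 20.0)]
reinsert_deferred(lru, deferred)
print([meta.block_id for meta in lru])  # prints: ['a', 'b', 'c', 'd']
```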
jundot added a commit that referenced this pull request on Jun 6, 2026.
cfbraun added a commit to cfbraun/omlx that referenced this pull request on Jun 7, 2026.
Resolves conflicts in favor of the merged upstream shape for:
- jundot#1628 (max_context_window_policy): upstream merged the reworked separate-nullable-field design, not the original "reuse max_context_window" design that still sat on this branch. Drop the 1_000_000 default and adopt the 32768 fallback + optional policy cap jundot ultimately wanted.
- jundot#1451 (inline LRU unlinks): upstream's slice+reinsert variant of the burst cap landed (with observability counters from 021162b). Drop the local push-cap-into-evict_until_size variant.
- Test suites realigned to the merged shapes (TestInlineLRUUnlinks, policy-cap test names, 32768 fallback assertions).
Summary
``_enforce_size_limit_for_new_block`` enqueues evicted file unlinks as ``("unlink", path)`` items onto ``_write_queue``, the same bounded queue that carries pending writes. Combined with the pre-eviction ``_write_queue.full()`` short-circuit at the top of ``save_block``, this creates a deadlock under sustained save pressure:
1. Writer is saturated → ``_write_queue`` is full.
2. ``save_block``'s pre-eviction check sees ``full()`` → returns False immediately, before calling ``_enforce_size_limit_for_new_block``.
3. Eviction never runs → cache stays at the size cap.
4. Every subsequent save drops; ``ssd_write_drops`` climbs forever while ``_total_size`` sits pinned at the cap.
Fix: inline the unlinks on the eviction-calling thread instead. Eviction typically removes a single block per save (``evict_until_size`` stops as soon as ``total_size <= target``), so this is one syscall per save in steady state. The deferred-unlink justification ("avoid blocking the inference thread with N file delete syscalls") doesn't materialise under normal load, and inlining removes the bounded-queue contention entirely.
Bounded inline burst. The ENOSPC-recovery path invalidates the 30 s disk-usage cache, which can shrink ``_get_effective_max_size`` sharply; ``evict_until_size`` would then return hundreds of entries at once and the inline-unlink loop would stall the inference thread on a syscall storm. Cap the burst at ``_MAX_INLINE_UNLINKS_PER_SAVE = 32`` and reinsert the deferred metadata into the index so subsequent saves drain the remainder.
Also adds an ``evict_unlink_failures`` stats counter. Eviction now decrements the index before the on-disk unlink, so if ``Path.unlink`` raises ``OSError``, the counter lets operators see when on-disk size has drifted above what the index reports.
The dead writer-thread ``("unlink", file_path)`` dispatch branch is removed since no path enqueues such tuples anymore.
Test plan
- ``pytest tests/test_paged_ssd_cache.py::TestInlineLRUUnlinks``: 4 passed
- ``pytest tests/test_paged_ssd_cache.py tests/test_hot_cache.py``: 125 passed (4 new + 121 existing)