@zeruniverse

Rate Limiter Deadlock Fix

Problem Description

The original rate limiter implementation in Triton could deadlock when Business Logic Scripting (BLS) calls were made with rate limiting enabled. The deadlock occurred as follows:

  1. BLS Model B, which requires rate limiter resource R1, makes calls to Model A.
  2. BLS Model C, which also requires R1, makes calls to Model A.
  3. B and C receive concurrent requests. Suppose B acquires R1 and starts executing, while C cannot get R1 and is staged.
  4. Model A requires no resources, but it must still go through the rate limiter to be scheduled, so B's call to A is staged as well.
  5. The rate limiter checks only the top instance in the staged queue, which in this case is C.
  6. Because C cannot be allocated while B holds R1, no other staged instance is checked.
  7. The result is a deadlock: B waits for A, A cannot be scheduled because C sits at the top of the queue even though A needs no resources, and C waits for B to release R1.
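
The scenario can be replayed with a small stand-alone model. This is only a sketch for illustration: `StagedEntry`, `AttemptTopOnly`, and `ReplayScenarioTopOnly` are hypothetical names, the staged queue is modeled as a `std::deque` whose front plays the role of the top of the priority queue, and R1 is assumed to have a single unit that B already holds.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-in for a staged model instance (not Triton's real type):
// a name plus how many units of the single resource R1 it needs.
struct StagedEntry {
  std::string name;
  int required_units;
};

// Simplified top-only policy: inspect only the front of the staged queue.
// Returns the name of the instance that was scheduled, or "" if none was.
std::string AttemptTopOnly(std::deque<StagedEntry>& staged, int& free_units)
{
  assert(free_units >= 0);
  if (staged.empty() || staged.front().required_units > free_units) {
    return "";  // The front is blocked, so nothing behind it is considered.
  }
  StagedEntry entry = staged.front();
  staged.pop_front();
  free_units -= entry.required_units;
  return entry.name;
}

// Replays the scenario above: B is running and holds the only unit of R1,
// C is staged at the front, and B's call to A (needing nothing) is behind it.
std::string ReplayScenarioTopOnly()
{
  int free_units = 0;
  std::deque<StagedEntry> staged = {{"C", 1}, {"A", 0}};
  return AttemptTopOnly(staged, free_units);
}
```

In this model `ReplayScenarioTopOnly()` returns an empty string: A is never scheduled even though it needs no units, which is the head-of-line blocking that leaves B waiting indefinitely.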

Root Cause

The issue was in the AttemptAllocation() method in src/rate_limiter.cc (lines 576-587). The original implementation only attempted to allocate resources for the top instance in the priority queue:

// Original problematic code
void RateLimiter::AttemptAllocation()
{
  std::lock_guard<std::recursive_mutex> lk(staged_instances_mtx_);
  if (!staged_instances_.empty()) {
    ModelInstanceContext* instance = staged_instances_.top();
    if (resource_manager_->AllocateResources(instance)) {
      staged_instances_.pop();
      instance->Allocate();
    }
  }
}

Proposed Solution: Greedy Scheduling

The fix implements a greedy scheduling approach that:

  1. First attempts to allocate the highest priority instance (preserving priority when possible)
  2. If that fails, iterates through ALL staged instances to find ANY instance that can be allocated
  3. Allocates the first instance with sufficient resources (greedy approach)
  4. Recursively attempts more allocations after each successful allocation

This ensures that instances with no resource requirements (like Model A in BLS calls), or instances whose required resources are currently available, can proceed even when higher-priority instances are blocked.
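
The steps above can be sketched on a simplified model. This is not the code from this PR: `StagedInstance` and `AttemptAllocationGreedy` are hypothetical names, the staged set is assumed to be a `std::vector` kept in priority order (highest first) so that it can be iterated, a single integer stands in for resource R1, and a loop replaces the recursive retry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a staged model instance (not Triton's real type).
struct StagedInstance {
  std::string name;
  int required_units;  // units of the single modeled resource R1
};

// Greedy pass over the staged instances, which are assumed to be ordered
// from highest to lowest priority. Returns the names scheduled, in order.
std::vector<std::string> AttemptAllocationGreedy(
    std::vector<StagedInstance>& staged, int& free_units)
{
  assert(free_units >= 0);
  std::vector<std::string> scheduled;
  bool progressed = true;
  while (progressed) {  // retry after every successful allocation
    progressed = false;
    // The scan starts at the highest-priority instance, so priority is
    // preserved whenever that instance fits.
    for (auto it = staged.begin(); it != staged.end(); ++it) {
      if (it->required_units <= free_units) {
        free_units -= it->required_units;
        scheduled.push_back(it->name);
        staged.erase(it);  // iterator is invalid now, so leave the loop
        progressed = true;
        break;  // restart from the highest-priority instance
      }
    }
  }
  return scheduled;
}
```

With C (needing one unit) staged ahead of A (needing none) and no free units, this sketch schedules A and leaves C staged; once B releases its unit, a second call schedules C. The trade-off is that lower-priority instances can bypass a blocked higher-priority one, so strict priority ordering is kept only when the top instance can actually be allocated.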

@zeruniverse changed the title from "Fix rate limiter head-of-line blocking deadlock" to "fix: rate limiter head-of-line blocking deadlock" on Aug 18, 2025
@zeruniverse changed the title from "fix: rate limiter head-of-line blocking deadlock" to "fix: rate limiter deadlock due to head-of-line blocking" on Aug 18, 2025