fix: rate limiter deadlock due to head‑of‑line blocking #448

zeruniverse · 2025-08-18T05:18:26Z

Rate Limiter Deadlock Fix

Problem Description

The original rate limiter implementation in Triton had a deadlock issue when BLS (Business Logic Scripting) models were used with rate limiting enabled. The deadlock occurred when:

BLS Model B (requiring rate limiter resource R1) makes calls to Model A.
BLS Model C (requiring rate limiter resource R1) makes calls to Model A.
B and C are called concurrently: B acquires R1 and executes, while C lacks resources and is staged.
Model A requires no resources but still needs to be scheduled.
The rate limiter only checked the top instance in the staged queue (in this case, C).
If the top instance could not be allocated due to insufficient resources, no other instances were checked.
This created a deadlock: Model B was waiting for Model A, but Model A could not be scheduled (because C was at the top of the queue) even though it needed no resources.

Root Cause

The issue was in the AttemptAllocation() method in src/rate_limiter.cc (lines 576-587). The original implementation only attempted to allocate resources for the top instance in the priority queue:

// Original problematic code
void RateLimiter::AttemptAllocation()
{
  std::lock_guard<std::recursive_mutex> lk(staged_instances_mtx_);
  if (!staged_instances_.empty()) {
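    // Only the top (highest-priority) staged instance is examined.
    ModelInstanceContext* instance = staged_instances_.top();
    if (resource_manager_->AllocateResources(instance)) {
      staged_instances_.pop();
      instance->Allocate();
    }
    // If the top instance cannot be allocated, nothing else is tried, so
    // every instance queued behind it waits (head-of-line blocking).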
  }
}

Proposed Solution: Greedy Scheduling

The fix implements a greedy scheduling approach that:

First attempts to allocate the highest-priority instance (preserving priority order when possible)
If that fails, iterates through ALL staged instances to find ANY instance that can be allocated
Allocates the first instance with sufficient resources (greedy approach)
Recursively attempts more allocations after each successful allocation

This ensures that instances with no resource requirements (like Model A in BLS calls) or instances with available resources can proceed even when higher-priority instances are blocked.

Fix rate limiter head‑of‑line blocking deadlock

d8c4e3d

zeruniverse mentioned this pull request Aug 18, 2025

DeadLock when BLS model requires resources triton-inference-server/server#8358

Open

zeruniverse changed the title ~~Fix rate limiter head‑of‑line blocking deadlock~~ fix: rate limiter head‑of‑line blocking deadlock Aug 18, 2025

zeruniverse changed the title ~~fix: rate limiter head‑of‑line blocking deadlock~~ fix: rate limiter deadlock due to head‑of‑line blocking Aug 18, 2025
