
@pskiran1 (Member) commented Oct 13, 2025

This PR adds support for a max_inflight_requests parameter to prevent unbounded memory growth in ensemble models by implementing backpressure control. The feature limits concurrent in-flight responses from ensemble steps to downstream consumers.

Problem

When a fast decoupled producer (e.g., a DALI video decoder that emits 200 frames almost instantly) feeds a slow consumer (e.g., an image classifier that takes 200 ms per frame), responses pile up in memory while they wait to be processed. Memory therefore grows without bound (25-35 GB was observed for a single request).

Solution

The new parameter applies backpressure: once the downstream consumer has reached the configured limit of pending responses, the producer blocks until that count drops below the limit. Example configuration:

ensemble_scheduling {
  max_inflight_requests: 4
  step [
    {
  ...
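The blocking behavior described above amounts to a counting gate between a producer step and its consumer. The C++ sketch below illustrates the idea under stated assumptions: `InflightGate` and `RunDemo` are hypothetical names, not Triton's ensemble scheduler code, the limit is assumed to be at least 1 (a limit of 0 would block the producer forever), and the demo's 1 ms sleep merely stands in for a slow consumer such as the 200 ms classifier.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper, not Triton's actual class: caps how many responses a
// producer step may have pending at its downstream consumer.
class InflightGate {
 public:
  explicit InflightGate(std::size_t limit) : limit_(limit) {}

  // Producer side: blocks while `limit_` responses are already pending.
  void Acquire() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return inflight_ < limit_; });
    ++inflight_;
  }

  // Consumer side: marks one response as processed and wakes a blocked producer.
  void Release() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(inflight_ > 0 && "Release() without matching Acquire()");
      --inflight_;
    }
    cv_.notify_one();
  }

  std::size_t Inflight() const {
    std::lock_guard<std::mutex> lock(mu_);
    return inflight_;
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  const std::size_t limit_;
  std::size_t inflight_ = 0;
};

// Demo: a fast producer emits `frames` responses while a slower consumer
// drains them. Returns the peak number of responses buffered at once.
std::size_t RunDemo(std::size_t limit, int frames) {
  InflightGate gate(limit);
  std::mutex q_mu;
  std::condition_variable q_cv;
  std::queue<int> pending;  // stands in for buffered step outputs
  std::size_t peak = 0;

  std::thread producer([&] {
    for (int i = 0; i < frames; ++i) {
      gate.Acquire();  // backpressure: waits here once the limit is reached
      {
        std::lock_guard<std::mutex> lock(q_mu);
        pending.push(i);
        peak = std::max<std::size_t>(peak, pending.size());
      }
      q_cv.notify_one();
    }
  });

  for (int done = 0; done < frames; ++done) {
    std::unique_lock<std::mutex> lock(q_mu);
    q_cv.wait(lock, [&] { return !pending.empty(); });
    pending.pop();
    lock.unlock();
    std::this_thread::sleep_for(std::chrono::milliseconds(1));  // slow consumer
    assert(gate.Inflight() <= limit);
    gate.Release();
  }
  producer.join();
  return peak;
}
```

Because the producer calls `Acquire()` before buffering a response and the consumer calls `Release()` only after removing one, the buffer can never hold more than `limit` responses, regardless of how fast the producer is.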

CI: triton-inference-server/server#8458

@pskiran1 pskiran1 requested a review from Copilot October 13, 2025 17:21

Copilot AI left a comment


Pull Request Overview

This PR adds support for a max_ensemble_inflight_responses parameter to prevent unbounded memory growth in ensemble models by implementing backpressure control. The feature limits concurrent inflight responses from ensemble steps to downstream consumers.

  • Adds backpressure configuration parameter parsing with validation
  • Implements producer blocking mechanism when downstream consumers are overloaded
  • Tracks inflight response counts per step with proper synchronization
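Per-step tracking can be pictured as one counter with its own mutex and condition variable for each ensemble step. The following is a hypothetical sketch, not the PR's actual code: `PerStepInflight` is an illustrative name, and a single limit is assumed to apply independently to every step.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical sketch: one counter, mutex, and condition variable per
// ensemble step, so producers of different steps never share a lock.
class PerStepInflight {
 public:
  PerStepInflight(std::size_t num_steps, std::size_t limit) : limit_(limit) {
    // std::mutex is neither copyable nor movable, so each Step lives on the heap.
    for (std::size_t i = 0; i < num_steps; ++i) {
      steps_.push_back(std::make_unique<Step>());
    }
  }

  // Blocks the producer of `step` while that step is at its limit.
  void Acquire(std::size_t step) {
    Step& s = *steps_.at(step);
    std::unique_lock<std::mutex> lock(s.mu);
    s.cv.wait(lock, [&] { return s.inflight < limit_; });
    ++s.inflight;
  }

  // Called once the consumer has processed one response from `step`.
  void Release(std::size_t step) {
    Step& s = *steps_.at(step);
    {
      std::lock_guard<std::mutex> lock(s.mu);
      assert(s.inflight > 0 && "Release() without matching Acquire()");
      --s.inflight;
    }
    s.cv.notify_one();  // only producers of this step wait on s.cv
  }

  std::size_t Inflight(std::size_t step) const {
    const Step& s = *steps_.at(step);
    std::lock_guard<std::mutex> lock(s.mu);
    return s.inflight;
  }

 private:
  struct Step {
    mutable std::mutex mu;
    std::condition_variable cv;
    std::size_t inflight = 0;
  };
  const std::size_t limit_;
  std::vector<std::unique_ptr<Step>> steps_;
};
```

Since each condition variable only ever has waiters for its own step, `notify_one()` is sufficient here; the cost is one mutex/condition-variable pair per step.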

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Files changed:
  • src/ensemble_scheduler/ensemble_scheduler.h: Adds max_inflight_responses_ field to EnsembleInfo struct
  • src/ensemble_scheduler/ensemble_scheduler.cc: Implements backpressure logic with tracking, blocking, and configuration parsing


@whoisj (Contributor) left a comment

I have concerns here. This change creates an array of mutex + condition-variable pairs that independently track what I assume are producer/consumer channels. This seems overly complex to me.

Why not use a simple integer to track the active count against capacity, and a single mutex + cv to handle interactions with those values?

Finally, does this guard against output overflows, where too many requests have completed but downstream models are incapable of consuming those outputs?
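One possible reading of the single-mutex suggestion in this review keeps a plain integer counter per step but guards all of them with one mutex and one condition variable. This is a hypothetical sketch (`SharedInflight` is an illustrative name), not code from the PR, and it assumes the same per-step limit as before.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: plain integer counters for every step, all guarded
// by a single mutex and a single condition variable.
class SharedInflight {
 public:
  SharedInflight(std::size_t num_steps, std::size_t capacity)
      : capacity_(capacity), inflight_(num_steps, 0) {}

  // Blocks the producer of `step` while that step is at capacity.
  void Acquire(std::size_t step) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return inflight_.at(step) < capacity_; });
    ++inflight_.at(step);
  }

  void Release(std::size_t step) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(inflight_.at(step) > 0 && "Release() without matching Acquire()");
      --inflight_.at(step);
    }
    // Producers of different steps share cv_, so wake them all; each waiter
    // re-checks its own step's count and goes back to sleep if still full.
    cv_.notify_all();
  }

  std::size_t Inflight(std::size_t step) const {
    std::lock_guard<std::mutex> lock(mu_);
    return inflight_.at(step);
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  const std::size_t capacity_;
  std::vector<std::size_t> inflight_;
};
```

The bookkeeping is simpler, but every `Release()` wakes all blocked producers and all steps contend on one lock; `notify_one()` would be unsafe here because it might wake a producer of a different, still-full step instead of the one that can proceed. A literal single integer (one counter for the whole ensemble) would instead cap total in-flight responses rather than each step's.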

@pskiran1 pskiran1 requested review from tanmayv25 and yinggeh October 14, 2025 05:44
@yinggeh (Contributor) commented Oct 15, 2025

Needs documentation and an example showing the use case.

@pskiran1 pskiran1 changed the title feat: Add support for max_ensemble_inflight_responses parameter to prevent unbounded memory growth in ensemble models feat: Add support for max_inflight_responses parameter to prevent unbounded memory growth in ensemble models Oct 17, 2025
@pskiran1 pskiran1 requested a review from yinggeh October 17, 2025 21:08
…into spolisetty/tri-26-triton-dali-ensemble-model-memory-issue
whoisj previously approved these changes Oct 24, 2025
@pskiran1 pskiran1 changed the title feat: Add support for max_inflight_responses parameter to prevent unbounded memory growth in ensemble models feat: Add support for max_inflight_requests parameter to prevent unbounded memory growth in ensemble models Oct 24, 2025
@pskiran1 pskiran1 requested a review from yinggeh October 24, 2025 16:00
@pskiran1 pskiran1 requested a review from yinggeh October 28, 2025 17:26
@pskiran1 pskiran1 requested a review from yinggeh October 30, 2025 05:08
@pskiran1 pskiran1 requested a review from yinggeh October 31, 2025 11:36
@pskiran1 pskiran1 requested a review from yinggeh October 31, 2025 20:01

4 participants