LukeAVanDrie
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR introduces initial Prometheus metrics for the experimental Flow Control layer within the EPP. These metrics provide crucial visibility into the performance and behavior of the flow control mechanisms, such as queue times and buffered in-flight requests.

The key changes include:

  1. New Metrics:

    • inference_extension_flow_control_request_queue_duration_seconds: A histogram to track the total time requests spend in the Flow Control layer, from the moment they enter EnqueueAndWait until a final outcome. Labels: fairness_id, priority, outcome.
    • inference_extension_flow_control_queue_size: A gauge to track the instantaneous number of requests being actively managed by the Flow Control layer. Labels: fairness_id, priority.
  2. Instrumentation:

    • pkg/epp/flowcontrol/controller/controller.go: Increments/decrements queue size at the entry/exit of EnqueueAndWait.
    • pkg/epp/flowcontrol/controller/internal/item.go: Records queue duration upon item finalization.
  3. Documentation:

    • Updated site-src/guides/metrics-and-observability.md to include descriptions and label information for the new flow control metrics.
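
The instrumentation pattern described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from this PR: `recorder`, `labels`, `durationKey`, and this simplified `enqueueAndWait` signature are hypothetical stand-ins for the real Prometheus gauge and histogram, and the sketch records the duration in the same function, whereas the PR records it at item finalization in `item.go`.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
	"time"
)

// labels holds the two label values shared by both metrics.
type labels struct {
	fairnessID string
	priority   string
}

// durationKey adds the outcome label used only by the duration histogram.
type durationKey struct {
	labels
	outcome string
}

// recorder is a hypothetical in-memory stand-in for the gauge and histogram.
type recorder struct {
	mu        sync.Mutex
	queueSize map[labels]int
	durations map[durationKey][]time.Duration
}

func newRecorder() *recorder {
	return &recorder{
		queueSize: make(map[labels]int),
		durations: make(map[durationKey][]time.Duration),
	}
}

// addQueueSize adjusts the gauge for one label set by delta (+1 or -1).
func (r *recorder) addQueueSize(l labels, delta int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.queueSize[l] += delta
}

// observe records one queue-duration sample for a label set and outcome.
func (r *recorder) observe(l labels, outcome string, d time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := durationKey{labels: l, outcome: outcome}
	r.durations[k] = append(r.durations[k], d)
}

// size returns the current gauge value for a label set.
func (r *recorder) size(l labels) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.queueSize[l]
}

// samples returns how many durations were recorded for a label set and outcome.
func (r *recorder) samples(l labels, outcome string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.durations[durationKey{labels: l, outcome: outcome}])
}

// enqueueAndWait increments the gauge on entry, decrements it on exit, and
// records the elapsed time under the final outcome. The wait callback is an
// assumption standing in for the real queuing and dispatch logic.
func enqueueAndWait(r *recorder, fairnessID string, priority int, wait func() string) string {
	l := labels{fairnessID: fairnessID, priority: strconv.Itoa(priority)}
	start := time.Now()
	r.addQueueSize(l, 1)
	defer r.addQueueSize(l, -1)

	outcome := wait()
	r.observe(l, outcome, time.Since(start))
	return outcome
}

func main() {
	r := newRecorder()
	outcome := enqueueAndWait(r, "tenant-a", 1, func() string { return "dispatched" })
	fmt.Println(outcome, r.size(labels{fairnessID: "tenant-a", priority: "1"}))
}
```

Using `defer` for the decrement keeps the gauge balanced on every return path, which is the property the entry/exit instrumentation relies on.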

Which issue(s) this PR fixes:
Tracks #1708

Does this PR introduce a user-facing change?:

feat: Add initial Prometheus metrics for the experimental Flow Control layer.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 13, 2025

netlify bot commented Oct 13, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 369ae1f
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/68eea91af43f9f00084dddc9
😎 Deploy Preview https://deploy-preview-1714--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 13, 2025
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Oct 13, 2025
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 13, 2025
@LukeAVanDrie
Contributor Author

/cc @JeffLuoo

Hey Jeff,

In this PR, I'm introducing the first set of Prometheus metrics for the new experimental Flow Control layer:

  • inference_extension_flow_control_request_queue_duration_seconds
  • inference_extension_flow_control_queue_size

I wanted to give some context on the label choices, as they differ slightly from the inference_model_* metrics initially considered.

The Flow Control layer operates on a concept called FlowKey, which consists of a fairness_id and a priority. This key represents a distinct stream of traffic that the layer manages for fairness and queuing.

  • fairness_id: This typically corresponds to a tenant, user, or a specific model. It's the primary dimension for isolation and fairness.
  • priority: This allows for different levels of service within the same fairness_id.

Given this, I have labeled the new metrics with fairness_id and priority as these are the natural dimensions the Flow Control layer uses for its decisions. I also include an outcome label on the duration histogram to distinguish between dispatched, rejected, and evicted requests.
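
As a rough sketch of how a FlowKey could map onto these label values: the `FlowKey` struct, the `Outcome` constants, and both helper functions below are hypothetical names chosen for illustration, not the types from the EPP codebase. The only details taken from the description above are the two key fields and the three outcome values.

```go
package main

import (
	"fmt"
	"strconv"
)

// FlowKey is a hypothetical mirror of the key described above: a fairness ID
// plus a priority level.
type FlowKey struct {
	FairnessID string
	Priority   int
}

// Outcome is the final disposition of a request in the Flow Control layer.
type Outcome string

const (
	OutcomeDispatched Outcome = "dispatched"
	OutcomeRejected   Outcome = "rejected"
	OutcomeEvicted    Outcome = "evicted"
)

// queueSizeLabels returns label values in the order (fairness_id, priority).
func queueSizeLabels(k FlowKey) []string {
	return []string{k.FairnessID, strconv.Itoa(k.Priority)}
}

// queueDurationLabels returns (fairness_id, priority, outcome); only the
// duration histogram carries the outcome label.
func queueDurationLabels(k FlowKey, o Outcome) []string {
	return append(queueSizeLabels(k), string(o))
}

func main() {
	key := FlowKey{FairnessID: "tenant-a", Priority: 1}
	fmt.Println(queueSizeLabels(key))
	fmt.Println(queueDurationLabels(key, OutcomeEvicted))
}
```

Deriving both label sets from the same key keeps the gauge and the histogram aligned on `fairness_id` and `priority`, so the two metrics can be joined on those labels in queries.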

This is different from the model_name and target_model_name used in the inference_objective_* metrics. While a fairness_id might often be a model_name, the Flow Control layer is designed to be more general, and these labels may be orthogonal. For instance, a single model could have multiple priority bands, each represented by a different FlowKey and thus different label values in these new metrics.
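
To make the multiple-priority-band case concrete, the short sketch below counts the distinct label combinations produced by a set of requests. The `flowKey` type, the `distinctSeries` helper, and the fairness ID value `model-a` are all hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// flowKey is a hypothetical stand-in for the (fairness_id, priority) pair.
type flowKey struct {
	fairnessID string
	priority   int
}

// distinctSeries counts the unique (fairness_id, priority) combinations,
// which is how many separate gauge series these requests would populate.
func distinctSeries(keys []flowKey) int {
	seen := make(map[flowKey]struct{})
	for _, k := range keys {
		seen[k] = struct{}{}
	}
	return len(seen)
}

func main() {
	// One model used as the fairness ID, served in two priority bands.
	requests := []flowKey{
		{fairnessID: "model-a", priority: 0},
		{fairnessID: "model-a", priority: 10},
		{fairnessID: "model-a", priority: 0},
	}
	fmt.Println(distinctSeries(requests))
}
```

Here the three requests share one fairness ID but fall into two priority bands, so they map to two label combinations rather than one.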

I believe these labels (fairness_id, priority, outcome) provide the most direct insight into the Flow Control layer's behavior and queueing dynamics. The existing inference_objective_* metrics still provide the end-to-end view per "inference objective," while these new metrics zoom in on the time spent specifically within the flow control queuing and dispatching logic.

Please let me know if you have any thoughts or concerns. We could also expand the label set to include model_name and target_model_name if desired.

@k8s-ci-robot k8s-ci-robot requested a review from JeffLuoo October 13, 2025 22:33
@ahg-g
Contributor

ahg-g commented Oct 14, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 14, 2025
Introduces initial Prometheus metrics for the experimental Flow Control
layer in EPP.

This change adds the following metrics:
- inference_extension_flow_control_request_queue_duration_seconds:
  A histogram to track the total time requests spend in the Flow
  Control layer, from invocation of EnqueueAndWait to final outcome.
- inference_extension_flow_control_queue_size:
  A gauge to track the number of requests currently being managed by
  the Flow Control layer.

These metrics are labeled by fairness_id, priority, and outcome (for
the duration metric).
@LukeAVanDrie LukeAVanDrie force-pushed the feat/flow-control-metrics branch from 48cd678 to 369ae1f Compare October 14, 2025 19:48
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2025
@ahg-g
Contributor

ahg-g commented Oct 14, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 14, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2025
@k8s-ci-robot k8s-ci-robot merged commit bd614af into kubernetes-sigs:main Oct 14, 2025
11 checks passed
@LukeAVanDrie LukeAVanDrie deleted the feat/flow-control-metrics branch October 14, 2025 21:04
