feat: Add initial Flow Control metrics #1714
Hi @LukeAVanDrie. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
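/cc @JeffLuoo
Hey Jeff, in this PR I'm introducing the first set of Prometheus metrics for the new experimental Flow Control layer:
- inference_extension_flow_control_request_queue_duration_seconds
- inference_extension_flow_control_queue_size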
I wanted to give some context on the label choices, as they differ slightly from the inference_model_* metrics initially considered. The Flow Control layer operates on a concept called FlowKey, which consists of a fairness_id and a priority. This key represents a distinct stream of traffic that the layer manages for fairness and queuing.
Given this, I have labeled the new metrics with fairness_id and priority, as these are the natural dimensions the Flow Control layer uses for its decisions. I also include an outcome label on the duration histogram to distinguish between dispatched, rejected, and evicted requests.
This is different from the model_name and target_model_name labels used in the inference_objective_* metrics. While a fairness_id might often be a model_name, the Flow Control layer is designed to be more general, and these labels may be orthogonal. For instance, a single model could have multiple priority bands, each represented by a different FlowKey and thus different label values in these new metrics.
I believe these labels (fairness_id, priority, outcome) provide the most direct insight into the Flow Control layer's behavior and queueing dynamics. The existing inference_objective_* metrics still provide the end-to-end view per "inference objective," while these new metrics zoom in on the time spent specifically within the flow control queuing and dispatching logic.
Please let me know if you have any thoughts or concerns. We could also expand the label set to include model_name and target_model_name if desired.
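To make the label reasoning above concrete, here is a minimal Go sketch of how a FlowKey could map to label values. The FlowKey struct, its field names, the LabelValues helper, and the example model name are all assumptions made for illustration; they are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"strconv"
)

// FlowKey is a hypothetical stand-in for the Flow Control layer's key:
// a fairness ID plus a priority. Field names are assumed, not from the PR.
type FlowKey struct {
	FairnessID string
	Priority   int
}

// LabelValues returns the values for the (fairness_id, priority) labels,
// in that order. Metric label values are strings, so the priority is formatted.
func (k FlowKey) LabelValues() []string {
	return []string{k.FairnessID, strconv.Itoa(k.Priority)}
}

func main() {
	// One model served under two priority bands: the fairness_id is shared,
	// but each band is its own FlowKey and so its own label combination.
	interactive := FlowKey{FairnessID: "example-model", Priority: 10}
	batch := FlowKey{FairnessID: "example-model", Priority: 0}

	fmt.Println(interactive.LabelValues())
	fmt.Println(batch.LabelValues())
}
```

Under these assumptions, the two bands produce separate time series even though they share a fairness_id, which is the case where a model_name label alone would not distinguish them.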
/ok-to-test
Introduces initial Prometheus metrics for the experimental Flow Control layer in EPP. This change adds the following metrics:
- inference_extension_flow_control_request_queue_duration_seconds: A histogram to track the total time requests spend in the Flow Control layer, from invocation of EnqueueAndWait to final outcome.
- inference_extension_flow_control_queue_size: A gauge to track the number of requests currently being managed by the Flow Control layer.
These metrics are labeled by fairness_id and priority; the duration metric also carries an outcome label.
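The instrumentation pattern described above can be sketched roughly as follows. This is a stdlib-only illustration, not the PR's implementation: the recorder type is a hypothetical in-memory stand-in for a Prometheus gauge and histogram, and the sketch records the duration inside enqueueAndWait for brevity, whereas the PR records it when the queue item is finalized.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// flowLabels mirrors the fairness_id and priority labels named in the PR.
type flowLabels struct {
	FairnessID string
	Priority   string
}

// durationKey adds the outcome label used only by the duration metric.
type durationKey struct {
	flowLabels
	Outcome string
}

// recorder is a hypothetical in-memory stand-in for the gauge and histogram.
type recorder struct {
	mu        sync.Mutex
	queueSize map[flowLabels]int
	durations map[durationKey][]time.Duration
}

func newRecorder() *recorder {
	return &recorder{
		queueSize: make(map[flowLabels]int),
		durations: make(map[durationKey][]time.Duration),
	}
}

func (r *recorder) addQueueSize(l flowLabels, delta int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.queueSize[l] += delta
}

func (r *recorder) observe(l flowLabels, outcome string, d time.Duration) {
	r.mu.Lock()
	defer r.mu.Unlock()
	k := durationKey{flowLabels: l, Outcome: outcome}
	r.durations[k] = append(r.durations[k], d)
}

func (r *recorder) size(l flowLabels) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.queueSize[l]
}

func (r *recorder) count(l flowLabels, outcome string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.durations[durationKey{flowLabels: l, Outcome: outcome}])
}

// enqueueAndWait sketches the pattern: the gauge goes up on entry and down on
// exit, and the elapsed time is recorded together with the final outcome.
func enqueueAndWait(r *recorder, l flowLabels, process func() string) string {
	start := time.Now()
	r.addQueueSize(l, 1)
	defer r.addQueueSize(l, -1)

	outcome := process() // e.g. "dispatched", "rejected", or "evicted"
	r.observe(l, outcome, time.Since(start))
	return outcome
}

func main() {
	r := newRecorder()
	l := flowLabels{FairnessID: "tenant-a", Priority: "0"}
	outcome := enqueueAndWait(r, l, func() string {
		fmt.Println("managed while waiting:", r.size(l))
		return "dispatched"
	})
	fmt.Println(outcome, "managed after exit:", r.size(l))
}
```

Using defer for the decrement keeps the gauge balanced on every exit path, which matters because a request can leave the layer through any of the three outcomes.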
Force-pushed from 48cd678 to 369ae1f
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: ahg-g, LukeAVanDrie
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces initial Prometheus metrics for the experimental Flow Control layer within the EPP. These metrics provide crucial visibility into the performance and behavior of the flow control mechanisms, such as queue times and buffered in-flight requests.
The key changes include:
New Metrics:
- inference_extension_flow_control_request_queue_duration_seconds: A histogram to track the total time requests spend in the Flow Control layer, from the moment they enter EnqueueAndWait until a final outcome. Labels: fairness_id, priority, outcome.
- inference_extension_flow_control_queue_size: A gauge to track the instantaneous number of requests being actively managed by the Flow Control layer. Labels: fairness_id, priority.
Instrumentation:
- pkg/epp/flowcontrol/controller/controller.go: Increments/decrements queue size at the entry/exit of EnqueueAndWait.
- pkg/epp/flowcontrol/controller/internal/item.go: Records queue duration upon item finalization.
Documentation:
- Updated site-src/guides/metrics-and-observability.md to include descriptions and label information for the new flow control metrics.
Which issue(s) this PR fixes:
Tracks #1708
Does this PR introduce a user-facing change?: