Prefer kube-scheduler's resource metrics to kube-state-metrics' #815
base: master
Conversation
As much as I like this change, I can see a problem with it in managed solutions (like EKS) where access to kube-scheduler is forbidden. In those cases, alerts and dashboards that are based on kube-scheduler data won't be useful at all. Given that, can we use OR statements instead of deprecating the kube-state-metrics data? I think something like kube_pod_resource_request OR kube_pod_container_resource_request should do the trick.
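A minimal sketch of the fallback being suggested here, with illustrative job selectors standing in for the mixin's configurable %(kubeSchedulerSelector)s and %(kubeStateMetricsSelector)s placeholders:

```promql
# Prefer the scheduler's view of the request; fall back to kube-state-metrics on
# clusters where kube-scheduler cannot be scraped (e.g. managed control planes).
# The job label values are assumptions for illustration only.
kube_pod_resource_request{resource="cpu", job="kube-scheduler"}
  or
kube_pod_container_resource_requests{resource="cpu", job="kube-state-metrics"}
```

Note that the two metrics don't carry identical label sets (kube_pod_container_resource_requests has a container label, for one), which is what the review discussion further down is about.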
Note to self: Revisit this issue once prometheus/prometheus#9624 is implemented.
Just wondering if there's a better (or any) way to format embedded PromQL expressions in
Force-pushed from 8102d0c to eb73d96.
Force-pushed from ce7c3f8 to 817b784.
This PR has been automatically marked as stale because it has not had recent activity. The next time this stale check runs, the stale label will be added. Thank you for your contributions!
Rebasing.
Force-pushed from 6997e03 to 86d83ae.
| kube_pod_container_resource_requests{resource="memory",%(kubeStateMetricsSelector)s} | ||
| ) * on(namespace, pod, %(clusterLabel)s) group_left() max by (namespace, pod, %(clusterLabel)s) ( | ||
| kube_pod_status_phase{phase=~"Pending|Running"} == 1 | ||
| kube_pod_resource_request{resource="memory",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="memory",%(kubeStateMetricsSelector)s} |
I'm not quite sure that it will do the right thing, since kube_pod_resource_request and kube_pod_container_resource_requests don't have exactly the same labels, if I understand correctly.
I could be wrong here, but wouldn't the differing label sets (the kube-scheduler metrics potentially carrying the additional scheduler and priority labels) after the or be reduced to the set of labels specified in the max aggregation anyway, so that adding an ignoring (scheduler, priority) would have no effect on the final result?
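A minimal sketch of that argument, assuming the extra labels on the scheduler series are scheduler and priority and that the surrounding rule aggregates with max by (namespace, pod, cluster) as in the diff above:

```promql
# Both alternatives end up under the same aggregation; `max by (...)` keeps only
# namespace, pod and the cluster label, so labels such as `scheduler` and
# `priority` on the kube-scheduler series are discarded either way.
# Selectors are illustrative; `cluster` stands in for %(clusterLabel)s.
max by (namespace, pod, cluster) (
    kube_pod_resource_request{resource="memory", job="kube-scheduler"}
  or
    kube_pod_container_resource_requests{resource="memory", job="kube-state-metrics"}
)
```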
Since they are more accurate.
I've gone over and tried to address all outstanding reviews from before. PLMK if I missed something!
(bump)
Thanks for your patience! I'm going to create another PR to get the kube-scheduler metrics into the new local dev env; it looks like I'm missing the scrape config there to validate this.
EDIT: Here's the follow-up PR: #1116. Please note this requires a new scrape job, which might affect your job labels?
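For reference, a hypothetical scrape job of the kind referred to here; the port, TLS, and discovery details are assumptions that vary by distribution, and the resulting job label has to line up with whatever %(kubeSchedulerSelector)s expects:

```yaml
# Hypothetical Prometheus scrape job for kube-scheduler. On managed control
# planes (e.g. EKS) the scheduler may not be reachable at all, in which case
# the kube-state-metrics fallback above is what keeps the rules populated.
scrape_configs:
  - job_name: kube-scheduler           # must match %(kubeSchedulerSelector)s
    scheme: https
    tls_config:
      insecure_skip_verify: true
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [kube-system]
    relabel_configs:
      # keep only pods labelled as the scheduler (assumes component=kube-scheduler)
      - source_labels: [__meta_kubernetes_pod_label_component]
        regex: kube-scheduler
        action: keep
      # point the target at the scheduler's secure metrics port
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10259
```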
  promql_expr_test:
  - eval_time: 0m
-     expr: namespace_cpu:kube_pod_container_resource_requests:sum
+     expr: namespace_cpu:kube_pod_resource_request_or_kube_pod_container_resource_requests:sum
Now that you are keeping the existing rules, the existing tests probably need to be restored, so that we end up with both the existing and the new tests alongside each other.
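Something like the following is presumably what's meant, keeping the original test and adding one for the new rule next to it (the namespace labels and expected values below are placeholders, not the mixin's real fixtures):

```yaml
promql_expr_test:
  # existing test, restored as-is
  - eval_time: 0m
    expr: namespace_cpu:kube_pod_container_resource_requests:sum
    exp_samples:
      - labels: 'namespace_cpu:kube_pod_container_resource_requests:sum{namespace="kube-system"}'
        value: 0.5
  # new test for the or-based rule, added alongside
  - eval_time: 0m
    expr: namespace_cpu:kube_pod_resource_request_or_kube_pod_container_resource_requests:sum
    exp_samples:
      - labels: 'namespace_cpu:kube_pod_resource_request_or_kube_pod_container_resource_requests:sum{namespace="kube-system"}'
        value: 0.5
```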
{
  record: 'cluster:namespace:pod_cpu:active:kube_pod_resource_request_or_kube_pod_container_resource_requests',
  expr: |||
    (kube_pod_resource_request{resource="memory",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="cpu",%(kubeStateMetricsSelector)s})
| (kube_pod_resource_request{resource="memory",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="cpu",%(kubeStateMetricsSelector)s}) | |
| (kube_pod_resource_request{resource="cpu",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="cpu",%(kubeStateMetricsSelector)s}) |
  ||| % $._config,
},
{
  record: 'namespace_memory:kube_pod_resource_request_or_kube_pod_container_resource_requests:sum',
I notice the new rules don't use kube_pod_status_phase; is that intentional/implied somehow in the scheduler version of the metrics?
Use kube-scheduler's metrics instead of kube-state-metrics', as they are more precise. Refer to the links below for more details.
Also, refactor the kube_pod_status_phase usage, since statuses other than "Pending" or "Running" are excluded or deprecated.