
Conversation

@rexagod rexagod (Collaborator) commented Jan 10, 2023

Use kube-scheduler's metrics instead of kube-state-metrics, as they are more precise.

Refer to the links below for more details.


Also, refactor kube_pod_status_phase, since statuses other than "Pending" or "Running" are excluded or deprecated.

@paulfantom paulfantom (Member) left a comment
As much as I like this change, I can see a problem with it in managed solutions (like EKS) where access to kube-scheduler is forbidden. In those cases, alerts and dashboards based on kube-scheduler data won't be useful at all. Given that, can we use OR statements instead of deprecating the kube-state-metrics data? I think something like kube_pod_resource_request OR kube_pod_container_resource_request should do the trick.
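
A minimal sketch of that fallback, assuming illustrative job selectors (the rendered mixin rules would use the %(kubeSchedulerSelector)s and %(kubeStateMetricsSelector)s placeholders instead):

```
# Prefer the scheduler's pod-level request metric; fall back to the
# kube-state-metrics per-container metric where the scheduler endpoint is not
# scrapeable (e.g. managed control planes such as EKS).
# Both operands are aggregated to (namespace, pod) first so the `or` matches
# on identical label sets and pods present on both sides are not duplicated.
  max by (namespace, pod) (
    kube_pod_resource_request{resource="cpu", job="kube-scheduler"}
  )
or
  sum by (namespace, pod) (
    kube_pod_container_resource_requests{resource="cpu", job="kube-state-metrics"}
  )
```

Aggregating each operand before the `or` is only one way to sidestep the label mismatch raised further down in this review.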

@rexagod rexagod (Collaborator, Author) commented Feb 14, 2023

Note to self: Revisit this issue once prometheus/prometheus#9624 is implemented.

@rexagod rexagod (Collaborator, Author) commented Feb 14, 2023

Just wondering if there's a better (or any) way to format embedded PromQL expressions in *sonnet files? The existing tooling seems to ignore the PromQL expressions within such files.

@rexagod rexagod force-pushed the mon-2823 branch 2 times, most recently from 8102d0c to eb73d96 on March 11, 2024 23:13
@rexagod rexagod changed the title from "Use kube-scheduler's metrics instead of kube-state-metrics" to "Use kube-scheduler's metrics instead of kube-state-metrics'" on Mar 12, 2024
@rexagod rexagod force-pushed the mon-2823 branch 2 times, most recently from ce7c3f8 to 817b784 on March 12, 2024 14:06
@github-actions (bot) commented:

This PR has been automatically marked as stale because it has not had any activity in the past 30 days.

The next time this stale check runs, the stale label will be removed if there is new activity. The issue will be closed in 7 days if there is no new activity.

Thank you for your contributions!

@github-actions github-actions bot added the stale label Sep 15, 2024
@github-actions github-actions bot closed this Sep 23, 2024
@rexagod rexagod reopened this Sep 23, 2024
@rexagod rexagod requested a review from povilasv as a code owner September 23, 2024 10:59
@rexagod rexagod (Collaborator, Author) commented Sep 23, 2024

Rebasing.

@rexagod rexagod force-pushed the mon-2823 branch 5 times, most recently from 6997e03 to 86d83ae on September 23, 2024 23:04
@rexagod rexagod changed the title from "Use kube-scheduler's metrics instead of kube-state-metrics'" to "Prefer kube-scheduler's resource metrics to kube-state-metrics'" on Sep 23, 2024
@github-actions github-actions bot removed the stale label Sep 24, 2024
```
  kube_pod_container_resource_requests{resource="memory",%(kubeStateMetricsSelector)s}
) * on(namespace, pod, %(clusterLabel)s) group_left() max by (namespace, pod, %(clusterLabel)s) (
  kube_pod_status_phase{phase=~"Pending|Running"} == 1
kube_pod_resource_request{resource="memory",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="memory",%(kubeStateMetricsSelector)s}
```
A Contributor commented:
I'm not quite sure that it will do the right thing since kube_pod_resource_request and kube_pod_container_resource_requests don't have the exact same labels if I understand correctly.

@rexagod rexagod (Collaborator, Author) commented Sep 25, 2024
I could be wrong here, but wouldn't the differing labelsets after the or (the kube-scheduler metrics potentially carrying the additional scheduler and priority labels) be sanitized down to the set of labels specified in the max operation, so that adding an ignoring(scheduler, priority) would ultimately have no effect on the final result?
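
A rough illustration of that reasoning (the series and label values below are invented for illustration, not taken from the PR):

```
# The inner `or` can yield series with heterogeneous label sets, e.g.
#   {namespace="a", pod="p", scheduler="default-scheduler", priority="0"}  <- kube-scheduler
#   {namespace="a", pod="p", container="c"}                                <- kube-state-metrics
# but the outer aggregation keeps only the listed grouping labels, so any
# extra scheduler/priority (or container) labels never reach the recorded series.
max by (namespace, pod) (
    kube_pod_resource_request{resource="memory"}
  or
    kube_pod_container_resource_requests{resource="memory"}
)
```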

@github-actions (bot) commented:

(Same automated stale notice as above.)

@github-actions github-actions bot added the stale label Oct 28, 2024
@github-actions github-actions bot closed this Nov 5, 2024
@rexagod rexagod (Collaborator, Author) commented Aug 17, 2025

I've gone over and tried to address all outstanding reviews from before.

PLMK if I missed something!

@github-actions github-actions bot removed the stale label Aug 18, 2025
@rexagod rexagod requested a review from simonpasquier August 20, 2025 08:49
@github-actions (bot) commented:

(Same automated stale notice as above.)

@github-actions github-actions bot added the stale label Sep 20, 2025
@rexagod rexagod (Collaborator, Author) commented Sep 20, 2025

(bump)

@github-actions github-actions bot removed the stale label Sep 21, 2025
@github-actions (bot) commented:

(Same automated stale notice as above.)

@github-actions github-actions bot added the stale label Oct 22, 2025
@skl skl added the keepalive label ("Use to prevent automatic closing") and removed the stale label Oct 22, 2025
@skl skl (Collaborator) left a comment

Thanks for your patience! I'm going to create another PR to get the kube-scheduler metrics into the new local dev env; it looks like I'm missing the scrape config there to validate this.

EDIT: Here's the follow-up PR: #1116. Please note this requires a new scrape job, which might affect your job labels?

```
promql_expr_test:
  - eval_time: 0m
    expr: namespace_cpu:kube_pod_container_resource_requests:sum
    expr: namespace_cpu:kube_pod_resource_request_or_kube_pod_container_resource_requests:sum
```
A Collaborator commented:
Now that you are keeping the existing rules, the existing tests probably need to be restored, so that we end up with both the existing and the new tests alongside each other.

```
{
  record: 'cluster:namespace:pod_cpu:active:kube_pod_resource_request_or_kube_pod_container_resource_requests',
  expr: |||
    (kube_pod_resource_request{resource="memory",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="cpu",%(kubeStateMetricsSelector)s})
```

A Collaborator suggested a change:

```
-    (kube_pod_resource_request{resource="memory",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="cpu",%(kubeStateMetricsSelector)s})
+    (kube_pod_resource_request{resource="cpu",%(kubeSchedulerSelector)s} or kube_pod_container_resource_requests{resource="cpu",%(kubeStateMetricsSelector)s})
```

```
  ||| % $._config,
},
{
  record: 'namespace_memory:kube_pod_resource_request_or_kube_pod_container_resource_requests:sum',
```
A Collaborator commented:
I notice the new rules don't use kube_pod_status_phase; is that intentional, or implied somehow in the scheduler version of the metrics?
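
For reference, the existing kube-state-metrics-based rule gates on active pods roughly like this (a simplified sketch of the pattern visible in the diff earlier in this thread, with the cluster label placeholder dropped):

```
# Only count requests for pods that are Pending or Running, by joining against
# kube_pod_status_phase. The question above is whether kube_pod_resource_request
# from the scheduler already excludes terminated pods, making this join unnecessary.
kube_pod_container_resource_requests{resource="memory"}
  * on (namespace, pod) group_left ()
    max by (namespace, pod) (
      kube_pod_status_phase{phase=~"Pending|Running"} == 1
    )
```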

@skl skl added the question label ("Further information is requested") Oct 23, 2025