
Add OTEL Tracing to Scheduling/Consolidation #2005

Open
jonathan-innis opened this issue Feb 18, 2025 · 12 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@jonathan-innis
Member

Description

What problem are you trying to solve?

Right now, there isn't a good way to trace through what scheduling or consolidation is doing, particularly when there is no output from the scheduling or consolidation loops. It would be really nice if we built out OTEL-based tracing that generated spans for the different function blocks and recorded important information, like which nodes were attempted within each block.
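
As a rough illustration only (not a settled design), a span around a scheduling pass could look something like the following with OpenTelemetry-Go. The package, function, span, event, and attribute names here are placeholders, not Karpenter's actual internals:

```go
package scheduling

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("karpenter/scheduling")

// Solve stands in for a single scheduling pass; the span, event, and
// attribute names are illustrative, not an agreed-upon convention.
func Solve(ctx context.Context, candidateNodes []string) error {
	// A real implementation would pass the returned context down so nested
	// steps can create child spans under this one.
	_, span := tracer.Start(ctx, "scheduling.Solve",
		trace.WithAttributes(attribute.Int("nodes.candidates", len(candidateNodes))))
	defer span.End()

	for _, name := range candidateNodes {
		// Record each node that was attempted as a span event so the trace
		// shows what the solver actually considered.
		span.AddEvent("node.attempted", trace.WithAttributes(
			attribute.String("node.name", name)))
	}
	return nil
}
```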

How important is this feature to you?

This could provide critical insight into how the application is running and would help users debug for themselves what Karpenter is doing, without having to dive deep into the code internals and guess at what the scheduler is doing.

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@jonathan-innis added the kind/feature label on Feb 18, 2025
@k8s-ci-robot added the needs-triage and needs-priority labels on Feb 18, 2025
@jonathan-innis
Member Author

/priority important-soon

@k8s-ci-robot added the priority/important-soon label and removed the needs-priority label on Feb 18, 2025
@jonathan-innis
Member Author

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Feb 18, 2025
@jonathan-innis
Member Author

One other thing that we've talked about is recording the current cluster state and periodically pushing it to some logging backend. That might work, though it's unclear how much data this would involve. That option would at least let us reconstruct the inputs to our consolidation and provisioning loops, but it would require re-running the functions rather than giving us a historical record of what has already happened.
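
Purely as a sketch of what periodic state logging could look like (the Snapshot type and field names are hypothetical placeholders for whatever subset of cluster state would actually need to be recorded):

```go
package state

import (
	"context"
	"log/slog"
	"time"
)

// Snapshot is a stand-in for whatever subset of cluster state (nodes, pods,
// nodeclaims, etc.) would actually be needed to replay a decision.
type Snapshot struct {
	Nodes int
	Pods  int
}

// PeriodicallyLogState pushes a snapshot to the logging backend on a fixed
// interval. How much data this produces per tick is exactly the open
// question raised above.
func PeriodicallyLogState(ctx context.Context, interval time.Duration, snapshot func() Snapshot) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			s := snapshot()
			slog.InfoContext(ctx, "cluster state snapshot", "nodes", s.Nodes, "pods", s.Pods)
		}
	}
}
```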

@jonathan-innis
Member Author

There is also an example of this being done for K8s system components; the one of particular interest (which would probably have traces closest to Karpenter's) is kubelet tracing.

@dashpole

Hey, I'm one of the TLs for SIG Instrumentation, and I have been working on the tracing integration in the kubelet and API server. I'm not very familiar with Karpenter, but I'm assuming it follows the normal "operator" pattern rather than serving requests directly.

The good aspect of tracing for operators is that it provides very detailed information about the operator's behavior, especially if that behavior is complex (e.g. multiple steps, parallelism) or involves making requests to external systems (e.g. cloud provider APIs).

The challenges are:

  • Sampling is random today. A low rate is ideal for leaving on in production, where it will capture a random set of things the operator did, but it is not ideal if you are trying to debug a particular issue, since there isn't an established pattern for "forcing" a particular operator action to be traced.
  • K8s controllers have a lot of loops where they decide no changes need to be made. You need to make sure those aren't traced, or they will dominate your sampled traces (one way to skip those spans is sketched after this comment). On the flip side, tracing isn't a good tool to understand why a decision to do nothing was made (e.g. why didn't it notice my unschedulable pod?).

I'm also a maintainer of the OpenTelemetry-Go project, so if you have any general questions about it, I'm happy to help.
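
As a sketch of the "don't trace no-op loops" point above (the names and structure here are hypothetical, not Karpenter's actual reconcile code), one option is to start a span only after the loop has decided it will act:

```go
package consolidation

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("karpenter/consolidation")

// Action and planActions are placeholders for however the real loop computes
// its decisions; they exist only to make the sketch self-contained.
type Action struct{ Node string }

func planActions(ctx context.Context) []Action { return nil }

// reconcileOnce starts a span only after the loop has decided it will act,
// so "nothing to do" iterations never show up in the sampled traces.
func reconcileOnce(ctx context.Context) {
	actions := planActions(ctx)
	if len(actions) == 0 {
		return // no span for no-op iterations
	}
	// A real implementation would thread the returned context through the
	// work so that nested steps can create child spans.
	_, span := tracer.Start(ctx, "consolidation.reconcile",
		trace.WithAttributes(attribute.Int("actions.count", len(actions))))
	defer span.End()
}
```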

@jonathan-innis
Member Author

jonathan-innis commented Feb 18, 2025

> tracing isn't a good tool to understand why a decision to do nothing was made

I think the question is less about Karpenter itself not doing anything and more about Karpenter reacting to something and then deciding that nothing needs to change on the cluster. In that scenario, I feel like tracing would be appropriate. The big problem today is that Karpenter logs when something is executed, but it doesn't log when nothing is done. As it stands, there are so many operations taking place that logging may not be an effective option for noting when nothing happened. I would be interested to hear what you perceive as the trade-offs between using tracing and using logging with respect to tracking Karpenter's decision making.

> Sampling is random today

Can't you force it to always sample? My understanding was that sampling is opt-in and that the default out-of-the-box experience with OTEL is that it keeps track of and forwards all spans from the application. I was referring back to this documentation here. I guess it's mostly a question of the trade-off between the performance impact of having this data and wanting things to scale well in production.

@dashpole

Right. Your only option if you really want to sample something is to turn it up to 100% sampling. That is probably OK if you are developing or testing, but depending on the number of spans you generate, it might be too much for prod (or maybe it isn't!).
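
For reference, switching between always-on sampling and ratio-based sampling in OpenTelemetry-Go is a one-line choice when constructing the tracer provider. This is a minimal sketch, and the 1% ratio is an arbitrary example:

```go
package tracing

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newTracerProvider picks between always-on sampling (useful while
// reproducing a specific issue in dev/test) and ratio-based sampling
// (a safer default for production). The 1% ratio is an arbitrary example.
func newTracerProvider(debug bool) *sdktrace.TracerProvider {
	sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.01))
	if debug {
		sampler = sdktrace.AlwaysSample()
	}
	return sdktrace.NewTracerProvider(sdktrace.WithSampler(sampler))
}
```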

Logs can be structured, and can also attach a trace context (although this is odd to do without tracing), so the main differences between spans and logs are:

  • structure (traces give a tree-view, vs the flat view of logs)
  • cost control: tracing uses sampling to lower costs. Logging uses severity levels.

You should also consider using Kubernetes events. If what you are trying to expose is relatively high-level, many tools/UIs already integrate well with events, and they integrate nicely with kubectl.
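
As a small sketch of the "logs can attach a trace context" point, here is one way to emit a structured log line that carries the active trace and span IDs, using Go's log/slog purely as an example backend:

```go
package logging

import (
	"context"
	"log/slog"

	"go.opentelemetry.io/otel/trace"
)

// logWithTraceContext emits a structured log line and, when a span is active
// in the context, attaches its trace and span IDs so the log line can be
// correlated with the sampled trace.
func logWithTraceContext(ctx context.Context, msg string) {
	args := []any{}
	if sc := trace.SpanContextFromContext(ctx); sc.IsValid() {
		args = append(args,
			"trace_id", sc.TraceID().String(),
			"span_id", sc.SpanID().String())
	}
	slog.InfoContext(ctx, msg, args...)
}
```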

@jonathan-innis
Member Author

Yeah, we already have events enabled, but we are looking for something that we could give to folks who want to really understand the behavior and decision making of the system. If we were to fire all of that as events, it would be too much and would overwhelm the apiserver.

@dashpole

Tracing sounds like a reasonable fit for your needs, then.

@jonathan-innis
Member Author

/assign jonathan-innis

@jonathan-innis
Member Author

I'm POC-ing something and if anyone has thoughts or wants to help me out, let me know!

@jonathan-innis
Member Author

Something interesting and relevant to the conversation as well from Kubecon: https://www.youtube.com/watch?v=kzXT0WlTBpw
