
Add OTEL Tracing to Scheduling/Consolidation #2005

Open
jonathan-innis opened this issue Feb 18, 2025 · 12 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@jonathan-innis
Member

Description

What problem are you trying to solve?

Right now, there isn't a good way to trace through what scheduling or consolidation is doing, particularly when there is no output from the scheduling or consolidation loops. It would be really nice if we built out OTEL-based tracing that generated spans for the different function blocks and recorded important information, like which nodes were attempted within each block.
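
As a rough illustration only (not a settled design), a span around a scheduling pass could look something like the following with OpenTelemetry-Go. The package, function, span, event, and attribute names here are placeholders, not Karpenter's actual internals:

```go
package scheduling

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("karpenter/scheduling")

// Solve stands in for a single scheduling pass; the span, event, and
// attribute names are illustrative, not an agreed-upon convention.
func Solve(ctx context.Context, candidateNodes []string) error {
	// A real implementation would pass the returned context down so nested
	// steps can create child spans under this one.
	_, span := tracer.Start(ctx, "scheduling.Solve",
		trace.WithAttributes(attribute.Int("nodes.candidates", len(candidateNodes))))
	defer span.End()

	for _, name := range candidateNodes {
		// Record each node that was attempted as a span event so the trace
		// shows what the solver actually considered.
		span.AddEvent("node.attempted", trace.WithAttributes(
			attribute.String("node.name", name)))
	}
	return nil
}
```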

How important is this feature to you?

This could provide critical insight into how the application is running and would help users debug for themselves what Karpenter is doing, without having to dive deep into the code internals and guess at what the scheduler is doing.

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@jonathan-innis added the kind/feature label on Feb 18, 2025
@k8s-ci-robot added the needs-triage and needs-priority labels on Feb 18, 2025
@jonathan-innis
Member Author

/priority important-soon

@k8s-ci-robot added the priority/important-soon label and removed the needs-priority label on Feb 18, 2025
@jonathan-innis
Member Author

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Feb 18, 2025
@jonathan-innis
Member Author

One other thing that we've talked about is recording the current cluster state and periodically pushing it to some logging backend. That might work, though it's unclear how much data this would involve. That option would at least let us reconstruct the inputs to our consolidation and provisioning loops, but it would require re-running the functions rather than giving us a historical record of what has already happened.
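
Purely as a sketch of what periodic state logging could look like (the Snapshot type and field names are hypothetical placeholders for whatever subset of cluster state would actually need to be recorded):

```go
package state

import (
	"context"
	"log/slog"
	"time"
)

// Snapshot is a stand-in for whatever subset of cluster state (nodes, pods,
// nodeclaims, etc.) would actually be needed to replay a decision.
type Snapshot struct {
	Nodes int
	Pods  int
}

// PeriodicallyLogState pushes a snapshot to the logging backend on a fixed
// interval. How much data this produces per tick is exactly the open
// question raised above.
func PeriodicallyLogState(ctx context.Context, interval time.Duration, snapshot func() Snapshot) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			s := snapshot()
			slog.InfoContext(ctx, "cluster state snapshot", "nodes", s.Nodes, "pods", s.Pods)
		}
	}
}
```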

@jonathan-innis
Member Author

There is also an example of this being done for K8s system components; the one of particular interest (which would probably have traces closest to Karpenter's) is kubelet tracing.

@dashpole

Hey, I'm one of the TLs for SIG Instrumentation, and I have been working on the tracing integration in the kubelet and API server. I'm not very familiar with Karpenter, but I'm assuming it follows the normal "operator" pattern rather than serving requests directly.

The good aspect of tracing for operators is that it provides very detailed information about the operator's behavior, especially if that behavior is complex (e.g. multiple steps, parallelism) or involves making requests to external systems (e.g. cloud provider APIs).

The challenges are:

  • Sampling is random today. A low rate is ideal for leaving on in production, where it will capture a random set of things the operator did, but it is not ideal if you are trying to debug a particular issue, since there isn't an established pattern for "forcing" a particular operator action to be traced.
  • K8s controllers have a lot of loops where they decide no changes need to be made. You need to make sure those aren't traced, or they will dominate your sampled traces (one way to skip those spans is sketched after this comment). On the flip side, tracing isn't a good tool to understand why a decision to do nothing was made (e.g. why didn't it notice my unschedulable pod?).

I'm also a maintainer of the OpenTelemetry-Go project, so if you have any general questions about it, I'm happy to help.
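
As a sketch of the "don't trace no-op loops" point above (the names and structure here are hypothetical, not Karpenter's actual reconcile code), one option is to start a span only after the loop has decided it will act:

```go
package consolidation

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("karpenter/consolidation")

// Action and planActions are placeholders for however the real loop computes
// its decisions; they exist only to make the sketch self-contained.
type Action struct{ Node string }

func planActions(ctx context.Context) []Action { return nil }

// reconcileOnce starts a span only after the loop has decided it will act,
// so "nothing to do" iterations never show up in the sampled traces.
func reconcileOnce(ctx context.Context) {
	actions := planActions(ctx)
	if len(actions) == 0 {
		return // no span for no-op iterations
	}
	// A real implementation would thread the returned context through the
	// work so that nested steps can create child spans.
	_, span := tracer.Start(ctx, "consolidation.reconcile",
		trace.WithAttributes(attribute.Int("actions.count", len(actions))))
	defer span.End()
}
```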

@jonathan-innis
Member Author

jonathan-innis commented Feb 18, 2025

> tracing isn't a good tool to understand why a decision to do nothing was made

I think the question is less about Karpenter itself not doing anything and more about Karpenter reacting to something and then deciding that nothing needs to change on the cluster. In that scenario, I feel like tracing would be appropriate. The big problem today is that Karpenter logs when something is executed, but it doesn't log when nothing is done. As it stands, there are so many operations taking place that logging may not be an effective option for noting when nothing happened. I would be interested to hear what you perceive as the trade-offs between using tracing and using logging with respect to tracking Karpenter's decision making.

> Sampling is random today

Can't you force it to always sample? My understanding was that sampling is opt-in and that the default out-of-the-box experience with OTEL is that it keeps track of and forwards all spans from the application. I was referring back to this documentation here. I guess it's mostly a question of the trade-off between the performance impact of having this data and wanting things to scale well in production.

@dashpole

Right. Your only option if you really want to sample something is to turn it up to 100% sampling. That is probably OK if you are developing or testing, but depending on the number of spans you generate, it might be too much for prod (or maybe it isn't!).
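
For reference, switching between always-on sampling and ratio-based sampling in OpenTelemetry-Go is a one-line choice when constructing the tracer provider. This is a minimal sketch, and the 1% ratio is an arbitrary example:

```go
package tracing

import (
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newTracerProvider picks between always-on sampling (useful while
// reproducing a specific issue in dev/test) and ratio-based sampling
// (a safer default for production). The 1% ratio is an arbitrary example.
func newTracerProvider(debug bool) *sdktrace.TracerProvider {
	sampler := sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.01))
	if debug {
		sampler = sdktrace.AlwaysSample()
	}
	return sdktrace.NewTracerProvider(sdktrace.WithSampler(sampler))
}
```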

Logs can be structured, and can also attach a trace context (although this is odd to do without tracing), so the main differences between spans and logs are:

  • structure (traces give a tree-view, vs the flat view of logs)
  • cost control: tracing uses sampling to lower costs. Logging uses severity levels.

You should also consider using Kubernetes events. If what you are trying to expose is relatively high-level, many tools/UIs already integrate well with events, and they integrate nicely with kubectl.
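
As a small sketch of the "logs can attach a trace context" point, here is one way to emit a structured log line that carries the active trace and span IDs, using Go's log/slog purely as an example backend:

```go
package logging

import (
	"context"
	"log/slog"

	"go.opentelemetry.io/otel/trace"
)

// logWithTraceContext emits a structured log line and, when a span is active
// in the context, attaches its trace and span IDs so the log line can be
// correlated with the sampled trace.
func logWithTraceContext(ctx context.Context, msg string) {
	args := []any{}
	if sc := trace.SpanContextFromContext(ctx); sc.IsValid() {
		args = append(args,
			"trace_id", sc.TraceID().String(),
			"span_id", sc.SpanID().String())
	}
	slog.InfoContext(ctx, msg, args...)
}
```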

@jonathan-innis
Member Author

Yeah, we already have events enabled, but we are looking for something that we could give to folks who want to really understand the behavior and decision making of the system. If we were to fire all of that as events, it would be too much and would overwhelm the apiserver.

@dashpole

Tracing sounds like a reasonable fit for your needs, then.

@jonathan-innis
Member Author

/assign jonathan-innis

@jonathan-innis
Member Author

I'm POC-ing something and if anyone has thoughts or wants to help me out, let me know!

@jonathan-innis
Member Author

Something interesting and relevant to the conversation as well from Kubecon: https://www.youtube.com/watch?v=kzXT0WlTBpw
