The units of kubernetes.cpu.usage.total have changed from agent5 #1552
Comments
@nimeshksingh thank you for submitting this issue. You're right, some metrics collected from the kubelet endpoint have changed. We are aware of that and we're working on this.
Cool, good to hear. If it's a known issue it may be worth adding to this page: https://github.com/DataDog/datadog-agent/blob/master/docs/agent/changes.md#kubernetes-support
I was going to report this too... is there any combination of metrics I can use until a fix is available?
This is the ongoing fix to keep the metric consistent with agent5.
@JulienBalestra, it would be very good if we had an option to keep the new behavior. Agent 5 scaled metrics require nearly every graph to have a scaling formula applied.
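The graph-level workaround being complained about above amounts to multiplying (or dividing) each series by 1000 in every graph formula. A sketch of the arithmetic, with purely illustrative numbers rather than real metric readings:

```shell
# Illustrative only: bridging the ~1000x gap between agent6 and agent5
# scales. Values are hypothetical, not real metric readings.
agent6_value=250000                      # hypothetical agent6 reading
agent5_scale=$((agent6_value * 1000))    # the formula applied per graph
echo "$agent5_scale"                     # prints 250000000
```

Having to repeat that formula on every dashboard is exactly the maintenance burden the comment is describing.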
Just to confirm - the difference is in the units reported by agent5 vs agent6?
Actually, the more I look at this, the more it seems that the kubernetes.cpu and kubernetes.mem metrics are just wrong a lot of the time with agent6. I'm seeing things like two pods that have the same cpu usage according to the kubernetes dashboard showing wildly different cpu usages in datadog. Also, for processes with very steady memory usage, memory in datadog just jumps around in ways that don't match the kubernetes dashboard's view. Should I open a bug in integrations-core?
@nimeshksingh the scaling factor is a first step to allow the metric to keep its agent5 scale. We can keep this issue open to continue to track this problem. We need to continue to investigate this.
Okay, cool. In the meantime is there a good way to turn off just the kubelet check in agent6, so I can run it alongside an agent5 that just gets these metrics? I don't want to just set KUBERNETES=false for agent6 because I still want the kubelet to be used for autodiscovery and labels for other checks. |
It's possible by deleting the associated config file. Currently it's set up by the entrypoint: https://github.com/DataDog/datadog-agent/blob/master/Dockerfiles/agent/entrypoint/50-kubernetes.sh#L15
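A minimal sketch of that deletion, assuming the default agent6 Docker image layout (the conf.d path below is an assumption, not confirmed in this thread):

```shell
# Sketch: disable only the kubelet check by removing the config file the
# entrypoint generates. Path assumes the default agent6 image layout.
KUBELET_CONF="/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default"
rm -f "$KUBELET_CONF"   # -f: no error if the file is already absent
```

Because only the check config is removed, the kubelet is still reachable for autodiscovery and tagging by other checks.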
Hello, Running two agents on the same host is not supported, as host metadata will get mangled, and this is not a use case we are planning to support. I am currently working on a port of the cadvisor collection logic into the new kubelet check.
For reference, I am actually solving the problem temporarily by running both agents in parallel and using DD_AC_INCLUDE and DD_AC_EXCLUDE to get the new agent to ignore most containers. We're not running a 'legacy' version of k8s as far as what that PR is talking about - it's 1.8. I'm not really concerned about what particular units are being used, more that the data is actually correct. As I said above, the new prometheus-based check just seems to be wrong a lot of the time. I'm nervous about having to use a 'legacy' check as the solution to get correct data. Are there any tests that compare the data produced by the cadvisor check and the data produced by the prometheus check?
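The filter setup described above might look like this in the v6 daemonset's environment. The image patterns are hypothetical, and the exclude-everything-then-include pattern is my reading of how these filters are meant to be combined, not something confirmed in this thread:

```shell
# Sketch of the parallel-agent workaround: make the v6 agent ignore most
# containers via autodiscovery filters. Image patterns are hypothetical.
export DD_AC_EXCLUDE="image:.*"          # exclude every container...
export DD_AC_INCLUDE="image:my-app.*"    # ...then re-include what's needed
```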
Output of the info page (if this is a bug)
Describe what happened:
I have a daemonset running datadog agents in a kubernetes 1.8 cluster. After adding a new daemonset with a v6 agent and deleting the v5 agent daemonset, the kubernetes.cpu metrics have changed units. kubernetes.cpu.limits and kubernetes.cpu.requests are now 1/1000 of what they were before. kubernetes.cpu.usage.total is now a much smaller number. It usually seems approximately 1/1000 of what it was before, but not quite right.
Describe what you expected:
The cpu metrics should be unchanged.
Steps to reproduce the issue:
Upgrade from dd-agent 5 to datadog-agent 6.
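The reproduction step can be sketched with kubectl, assuming a daemonset named datadog-agent (the daemonset name, container name, and image tag below are hypothetical):

```shell
# Sketch: swap the daemonset image from the v5 to the v6 agent.
# Daemonset/container names and the tag are hypothetical.
kubectl set image daemonset/datadog-agent \
  datadog-agent=datadog/agent:6.1.0
```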
Additional environment details (Operating System, Cloud provider, etc):
Kubernetes 1.8, datadog-agent jmx docker image, GKE