Incorrect Broker CPU utilization values on Kubernetes #1747
On second thought, the formula should still hold and provide correct values within the interval [0.0, 1.0] so long as the behavior of … In the example above, I'll continue investigating. In any case, we should still switch to using the new "container aware" Java methods exclusively.
https://bugs.openjdk.java.net/browse/JDK-8269851 changed the behaviour, making getProcessCpuLoad() container aware as well.
Thanks @amuraru, that is exactly what I was looking for.
I can't think of many situations where someone would want to use an unpatched Java version, so maybe it makes sense that we deprecate
I agree. What we could do to improve this is have CC provide an "official" Docker image that downstream users can consume, one that takes care of selecting the right JVM version as its base image.
A bug was introduced in the original work on broker CPU utilization reporting for Kubernetes mode [1]. The formula used to calculate CPU utilization for a Kafka broker running in a container can produce values outside the allowed interval [0.0, 1.0]. Note that this formula was intended to adjust the values from
((com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean()).getProcessCpuLoad()
for container boundaries. However, under certain conditions, the formula breaks and produces values greater than 1.0. For example, for the following values:
These incorrect CPU utilization values cause Cruise Control to overestimate CPU usage and prevent Cruise Control from executing partition rebalances when the cluster has CPU resources available.
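To illustrate how a container adjustment can push the result above 1.0, here is a minimal sketch. It assumes (this is not copied from ContainerMetricUtils lines 90 to 108) that the adjustment has the shape `load * hostCpus / (cfsQuota / cfsPeriod)`, and that on a JVM patched by JDK-8269851 getProcessCpuLoad() already reports load relative to the container's CPU limit. The class name, method name, and all numbers are hypothetical; they are not the values from this report.

```java
// Hypothetical illustration only; not the actual ContainerMetricUtils code.
public final class ContainerCpuLoadSketch {

  /**
   * Assumed shape of the container adjustment: turn the JVM-reported load into
   * "CPUs in use" by scaling with the host's logical CPU count, then divide by the
   * container's CPU limit (CFS quota / CFS period).
   */
  static double adjustForContainer(double reportedLoad, int hostCpus,
                                   double cfsQuotaUs, double cfsPeriodUs) {
    double cpuLimit = cfsQuotaUs / cfsPeriodUs;  // e.g. 200000 / 100000 = 2.0 CPUs
    double cpusInUse = reportedLoad * hostCpus;  // only valid if reportedLoad is host-relative
    return cpusInUse / cpuLimit;
  }

  public static void main(String[] args) {
    int hostCpus = 16;         // hypothetical node size
    double quotaUs = 200_000;  // hypothetical 2-CPU container limit
    double periodUs = 100_000;

    // Host-relative input (unpatched JVM): 2 busy CPUs out of 16 = 0.125.
    System.out.println(adjustForContainer(0.125, hostCpus, quotaUs, periodUs)); // prints 1.0

    // Container-relative input (container-aware JVM): 1 busy CPU out of 2 = 0.5,
    // but the same formula scales it by 16 / 2 again and overshoots.
    System.out.println(adjustForContainer(0.5, hostCpus, quotaUs, periodUs)); // prints 4.0
  }
}
```

Under these assumptions the adjustment is only correct when its input is host-relative; once the JVM itself is container aware, the extra scaling double-counts the container limit.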
This ContainerMetricUtils class [2] was originally created as a substitute for getSystemCpuLoad() [3] while we waited for the backports that make getSystemCpuLoad() "container aware" in OpenJDK 8, 11, 13, and 14. Since those backports were completed long ago, I suggest it is time we replace the ContainerMetricUtils class with getSystemCpuLoad(), as we originally planned!
Let me know what you think! If you agree, feel free to assign this to me and I will send in a fix!
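Reading the values directly from the JDK, without a custom adjustment, could look roughly like the sketch below. It assumes a JVM that carries the container-awareness fixes (JDK-8226575, and JDK-8269851 for getProcessCpuLoad()). The class and helper names are hypothetical, and mapping "unavailable" to NaN and capping at 1.0 are illustrative defensive choices, not Cruise Control's behavior.

```java
import java.lang.management.ManagementFactory;

// Sketch only: rely on the JDK's own (container-aware on patched JVMs) CPU load
// values instead of post-processing them.
public final class JvmCpuLoadSketch {

  /**
   * The MXBean returns a negative value when the load is not available yet;
   * report that as NaN rather than 0% usage, and cap anything above 1.0.
   */
  static double sanitizeLoad(double load) {
    if (Double.isNaN(load) || load < 0.0) {
      return Double.NaN;
    }
    return Math.min(load, 1.0);
  }

  @SuppressWarnings("deprecation") // getSystemCpuLoad() is deprecated since JDK 14 in favor of getCpuLoad()
  public static void main(String[] args) {
    com.sun.management.OperatingSystemMXBean os =
        (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

    double processLoad = sanitizeLoad(os.getProcessCpuLoad()); // load of this JVM process
    double systemLoad = sanitizeLoad(os.getSystemCpuLoad());   // load of the whole (container) environment

    System.out.printf("process=%.3f system=%.3f%n", processLoad, systemLoad);
  }
}
```

Keeping getSystemCpuLoad() rather than getCpuLoad() lets the same code compile on JDK 8 and 11, at the cost of a deprecation warning on newer JDKs.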
cc @amuraru
[1] #1277
[2] cruise-control/cruise-control-metrics-reporter/src/main/java/com/linkedin/kafka/cruisecontrol/metricsreporter/metric/ContainerMetricUtils.java, lines 90 to 108 in 4e5927b
[3] https://bugs.openjdk.java.net/browse/JDK-8226575