KAFKA-19606: Fix anomaly of JMX metrics RequestHandlerAvgIdlePercent in kraft combined mode #20481
Conversation
Thanks for the changes @0xffff-zhiyan. Left a review of the code changes.
time: Time,
nodeName: String = "broker"
nodeName: String = "broker",
val perPoolIdleMeter: Meter,
Can we group this with aggregateIdleMeter in the class header?
Makes sense. Fixed.
}

object KafkaRequestHandlerPool {
  val sharedAggregateTotalThreads = new AtomicInteger(0)
sharedAggregateTotalThreads is redundant. We can just name this totalThreads or aggregateThreads.
renamed it to aggregateThreads
this.logIdent = s"[data-plane Kafka Request Handler on ${nodeName.capitalize} $brokerId] "
val runnables = new mutable.ArrayBuffer[KafkaRequestHandler](numThreads)
// when using shared aggregate counter, register this pool's threads
sharedAggregateTotalThreads.addAndGet(numThreads)
Let's move this into the synchronized method createHandler and call incrementAndGet when each thread is created.
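A rough sketch of that suggestion (the KafkaRequestHandler constructor arguments and their order are abbreviated/assumed for illustration, not taken from the final diff):

```scala
// Sketch: bump the shared counter as each handler is created, inside the already-
// synchronized createHandler, instead of a bulk addAndGet in the pool constructor.
def createHandler(id: Int): Unit = synchronized {
  KafkaRequestHandlerPool.aggregateThreads.incrementAndGet()
  runnables += new KafkaRequestHandler(id, brokerId, aggregateIdleMeter, perPoolIdleMeter,
    threadPoolSize, apis, time, nodeName) // argument list assumed for illustration
  KafkaThread.daemon("data-plane-kafka-request-handler-" + id, runnables(id)).start()
}
```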
fixed
Thanks for the changes @0xffff-zhiyan. Left some comments:
apis: ApiRequestHandler,
time: Time,
nodeName: String = "broker"
nodeName: String = "broker",
This comma is not necessary.
val apis: ApiRequestHandler,
time: Time,
numThreads: Int,
requestHandlerAvgIdleMetricName: String,
Can we clean this up from the constructor? Its usages always resolve to RequestHandlerAvgIdlePercent. Let's make this an object-level constant, since its value comes from aggregateThreads.
val threadPoolSize: AtomicInteger = new AtomicInteger(numThreads)
/* a meter to track the average free capacity of the request handlers */
/* Per-pool idle meter (broker-only or controller-only) */
private val perPoolIdleMeterName = nodeName + "RequestHandlerAvgIdlePercent"
We need to capitalize nodeName to match the metrics naming convention, which is camel case.
fixed
val topic2 = "topic2"
val brokerTopicMetrics: BrokerTopicMetrics = brokerTopicStats.topicStats(topic)
val allTopicMetrics: BrokerTopicMetrics = brokerTopicStats.allTopicsStats
KafkaRequestHandlerPool.sharedAggregateTotalThreads.set(1)
Why do we do a set here? We explicitly set this value to 0 in testGlobalSharedThreadCounter().
Because aggregateThreads is updated by KafkaRequestHandlerPool, but in this test we don't create the pool, so aggregateThreads defaults to 0. When the RequestHandler runs aggregateIdleMeter.mark(idleTime / aggregateThreads.get), we would see an error because the divisor is 0.
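A self-contained sketch of the failure mode being avoided here (plain Scala, not the test code):

```scala
import java.util.concurrent.atomic.AtomicInteger

object DivByZeroSketch extends App {
  val aggregateThreads = new AtomicInteger(0) // default when no pool was created
  val idleTime: Long = 5000000L               // nanoseconds of idle time to mark

  // idleTime / aggregateThreads.get would throw ArithmeticException: / by zero,
  // which is what the handler's mark(...) call would hit in this test.
  aggregateThreads.set(1)                     // what the test setup does instead
  println(idleTime / aggregateThreads.get)    // now safe
}
```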
while (System.currentTimeMillis() < deadline && value == 0.0) {
  Thread.sleep(200)
  value = brokerAggregateMeter.oneMinuteRate()
}
// Verify that the aggregate meter shows reasonable idle percentage
// Since both pools are hitting the same global counter (8 threads), the rate should be normalized
assertTrue(value >= 0.0 && value <= 1.05, s"aggregate idle percent should be within [0,1], got $value")
I still think having this as assertTrue(value >= 0.0 && value <= 1.05) is wrong. The value for both of these meters should never be greater than 1. If you set this to value <= 1, is the test failing? And with what value?
I have a suspicion that the while check on L752 is problematic, and the marking logic of meters is not accurate given the small amount of time we are measuring for.
Yes, I agree. I think the sleep(200) is the reason I observed the value slightly greater than 1 (like 1.00xx, only seen once). The test samples oneMinuteRate() every 200ms, but the algorithm (an EWMA) needs some time to properly digest new data points. During idle periods, when large idle time values are suddenly recorded, the algorithm goes through a brief overshoot phase before stabilizing. A longer sleep interval (1-2 seconds) would likely eliminate this issue.
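For context, a toy model of the one-minute EWMA behind oneMinuteRate(), assuming a 5-second tick as in Yammer/Dropwizard-style meters (not Kafka code); it shows how the very first tick can seed the rate slightly above 1.0 before later ticks pull it back down:

```scala
object EwmaSketch extends App {
  val tickSeconds = 5.0
  val alpha = 1.0 - math.exp(-tickSeconds / 60.0) // one-minute smoothing factor
  var initialized = false
  var rate = 0.0                                  // "events" (idle seconds) per second

  def tick(eventsInTick: Double): Unit = {
    val instantRate = eventsInTick / tickSeconds
    if (initialized) rate += alpha * (instantRate - rate)
    else { rate = instantRate; initialized = true } // first tick seeds the rate directly
  }

  // If marks land just after a tick boundary, the first tick can capture slightly
  // more than 5s worth of idle time, briefly pushing the smoothed rate above 1.0.
  tick(5.2); println(f"after 1st tick: $rate%.3f") // 1.040
  tick(5.0); println(f"after 2nd tick: $rate%.3f") // ~1.037
}
```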
try {
  // Get the aggregate meter from broker pool using reflection
  val aggregateMeterField = classOf[KafkaRequestHandlerPool].getDeclaredField("aggregateIdleMeter")
Can we also test the meter value for the perPoolIdle metrics? Those should always be within [0,1] too.
val idleTime = endTime - startSelectTime
aggregateIdleMeter.mark(idleTime / totalHandlerThreads.get)
// Per-pool idle ratio uses the pool's own thread count as denominator
perPoolIdleMeter.mark(idleTime / totalHandlerThreads.get)
Can we rename totalHandlerThreads to something more appropriate? Maybe poolHandlerThreads?
// Verify that the aggregate meter shows reasonable idle percentage
// Since both pools are hitting the same global counter (8 threads), the rate should be normalized
assertTrue(value >= 0.0 && value <= 1.05, s"aggregate idle percent should be within [0,1], got $value")
In this test, can we also resize the pools? For example, let's shrink one and expand the other, and check the denominator values of what the meters WOULD be marking (i.e. aggregateThreads.get and totalHandlerThreads.get).
Thanks for the changes @0xffff-zhiyan. Left a few more comments regarding the tests.
}

@Test
def testGlobalSharedThreadCounter(): Unit = {
Can we rename this test to something more appropriate? Maybe testRequestThreadMetrics?
brokerPerPoolValue = brokerPerPoolIdleMeter.oneMinuteRate()
controllerPerPoolValue = controllerPerPoolIdleMeter.oneMinuteRate()
}
print(s"Aggregate: $aggregateValue, Broker PerPool: $brokerPerPoolValue, Controller PerPool: $controllerPerPoolValue")
Can we remove this print statement?
while (System.currentTimeMillis() < deadline && (aggregateValue == 0.0 || brokerPerPoolValue == 0.0 || controllerPerPoolValue == 0.0)) {
  Thread.sleep(2000)
  aggregateValue = aggregateMeter.oneMinuteRate()
  brokerPerPoolValue = brokerPerPoolIdleMeter.oneMinuteRate()
  controllerPerPoolValue = controllerPerPoolIdleMeter.oneMinuteRate()
}
This loop is confusing. The predicate you have exits once all the ...Value variables have been set to a non-zero value, so the while condition never evaluates to true again. I think it is sufficient just to sleep for some time, and then assign aggregateValue, brokerPerPoolValue, etc. using oneMinuteRate().
Yeah, makes sense. Fixed.
Thread.sleep(1000)

} finally {
  controllerPool.shutdown()
Can we check the threadPoolSize + aggregateThreads after each pool is shut down?
shutdown() will not change the threadPoolSize of the pool. Do we need to add such logic?
Apologies, let's just check aggregateThreads.
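A sketch of the suggested checks, assuming the 2-thread broker pool and 6-thread controller pool from this test, and assuming shutdown() releases each pool's share of the counter (per the PR description's "counter cleanup after pool shutdown"):

```scala
// Sketch: the shared counter should drop by each pool's thread count on shutdown.
brokerPool.shutdown()
assertEquals(6, KafkaRequestHandlerPool.aggregateThreads.get)
controllerPool.shutdown()
assertEquals(0, KafkaRequestHandlerPool.aggregateThreads.get)
```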
assertEquals(2, brokerPool.threadPoolSize.get)
assertEquals(6, controllerPool.threadPoolSize.get)
assertEquals(8, KafkaRequestHandlerPool.aggregateThreads.get)
Thread.sleep(1000)
I think we can remove this Thread.sleep call too, right?
Actually we can't. Because when resizeThreadPool() shrinks the pool, it only sets a stopped flag on the removed threads, but they're still running. If shutdown() is called immediately, it can deadlock because the removed threads (still executing receiveRequest()) may hold locks or resources that the remaining threads need to shut down cleanly. Thread.sleep(1000) gives the removed threads time to fully exit before shutdown begins.
Can we describe why this is necessary in a comment? Thanks for the info!
When we resize the pool and then shut it down immediately, the process might get stuck. I tested it several times locally.
I'm not sure if it's a bug or not, but it can happen without my changes.
Yeah, I understand. I'm saying: can you add a comment in the code describing the phenomenon mentioned here, so other readers of the code can understand why you added the sleep call.
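For example, something along these lines (wording illustrative, not the author's final comment):

```scala
// resizeThreadPool() only flags removed handler threads as stopped; they may still be
// inside receiveRequest(). Calling shutdown() immediately afterwards has been observed
// to hang locally, so give the removed threads time to exit before shutting down.
Thread.sleep(1000)
```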
Changes LGTM. @jsancio for committer review.
Thanks for the feature @0xffff-zhiyan . Here are some of my comments so far.
var brokerPerPoolValue = 0.0
var controllerPerPoolValue = 0.0

Thread.sleep(2000)
Please do not call Thread.sleep in tests. This slows down tests and software development, and can make tests unreliable. Kafka has a mocked Time that you can use to finely control the time returned to objects that rely on Time.
This means that instead of using val time = Time.SYSTEM, you can use val time = new MockTime().
Please remove all calls to Thread.sleep() in this PR.
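A sketch of that substitution (MockTime here is org.apache.kafka.common.utils.MockTime; how it gets wired into the pools is left to the PR):

```scala
import org.apache.kafka.common.utils.MockTime

// Construct the pools with a mocked clock and advance it explicitly
// instead of blocking the test thread with Thread.sleep.
val time = new MockTime()
// ... create the broker/controller pools passing `time` instead of Time.SYSTEM ...
time.sleep(2000) // advances the mocked clock by 2s without real waiting
```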
removed
val aggregateThreads = new AtomicInteger(0)
val requestHandlerAvgIdleMetricName = "RequestHandlerAvgIdlePercent"
In Scala, "constants" at the object level use upper camel case by convention, e.g. RequestHandlerAvgIdleMetricName.
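For example, the object from the diff above might look like this (sketch only):

```scala
object KafkaRequestHandlerPool {
  // Upper camel case for the object-level constant, per Scala convention.
  val RequestHandlerAvgIdleMetricName = "RequestHandlerAvgIdlePercent"
  val aggregateThreads = new AtomicInteger(0)
}
```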
val perPoolIdleMeter: Meter,
val poolHandlerThreads: AtomicInteger,
Let's use similar naming conventions, e.g. poolIdleMeter and poolHandlerThreads.
// Per-pool idle ratio uses the pool's own thread count as denominator
perPoolIdleMeter.mark(idleTime / poolHandlerThreads.get)
// Aggregate idle ratio uses the total threads across all pools as denominator
aggregateIdleMeter.mark(idleTime / aggregateThreads.get)
It is an inconsistent design that the aggregateIdleMeter is defined in the constructor while the aggregateThreads is defined in an object. Why is that? Why not pass the aggregateThreads through the constructor?
The aggregateThreads must be defined in a singleton object because all pool instances need to increment/decrement the exact same AtomicInteger instance, whereas aggregateIdleMeter can be instance-specific because each pool creates its own Meter that reports to the same metric name, and the metrics registry automatically aggregates data from meters with identical names.
Passing aggregateThreads through the constructor would require manually ensuring all pools receive the same reference, while the singleton pattern guarantees this by design.
aggregateThreads is a shared mutable counter that all pool instances must modify together, while aggregateIdleMeter is an independent reporter in each pool instance that happens to report to the same metric name. So they have different purposes.
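A standalone illustration of the split being described (names mirror the PR, but the types are simplified stand-ins rather than Kafka code):

```scala
import java.util.concurrent.atomic.AtomicInteger

object SharedState {
  // One process-wide counter: every pool instance mutates this same AtomicInteger.
  val aggregateThreads = new AtomicInteger(0)
}

class PoolSketch(numThreads: Int) {
  // Per-instance state: each pool owns its own denominator (and its own meter,
  // which, per the discussion above, reports under a shared metric name).
  val poolHandlerThreads = new AtomicInteger(numThreads)
  SharedState.aggregateThreads.addAndGet(numThreads)
}

object Demo extends App {
  val broker = new PoolSketch(2)
  val controller = new PoolSketch(6)
  // The shared counter sees both pools; each per-pool counter sees only its own.
  assert(SharedState.aggregateThreads.get == 8)
  assert(broker.poolHandlerThreads.get == 2 && controller.poolHandlerThreads.get == 6)
}
```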
time,
nodeName,
)
aggregateThreads.getAndIncrement()
It is an inconsistent design that the aggregate threads are incremented through createHandler but decremented through resizeThreadPool. You should be able to remove this inconsistency by fixing the constructor and calling resizeThreadPool in the constructor.
You can also enforce this by making createHandler private and removing the synchronized. You could also add a private method (internalResizeThreadPool) that doesn't synchronize, which is called by both the constructor and resizeThreadPool, with the latter doing the synchronization.
I think we could add a deleteHandler() method where we remove a handler and decrement the aggregate threads, and then resizeThreadPool() would simply call createHandler() and deleteHandler() as needed.
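A sketch of that shape (method bodies abbreviated, names as proposed in this thread):

```scala
// Creation and deletion each own their side of the counter update; resize just calls them.
private def createHandler(id: Int): Unit = synchronized {
  // ... construct and start the handler ...
  KafkaRequestHandlerPool.aggregateThreads.incrementAndGet()
}

private def deleteHandler(): Unit = synchronized {
  // ... stop and remove the last handler ...
  KafkaRequestHandlerPool.aggregateThreads.decrementAndGet()
}

def resizeThreadPool(newSize: Int): Unit = synchronized {
  val currentSize = threadPoolSize.get
  if (newSize > currentSize) (currentSize until newSize).foreach(createHandler)
  else (newSize until currentSize).foreach(_ => deleteHandler())
  threadPoolSize.set(newSize)
}
```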
val topic2 = "topic2"
val brokerTopicMetrics: BrokerTopicMetrics = brokerTopicStats.topicStats(topic)
val allTopicMetrics: BrokerTopicMetrics = brokerTopicStats.allTopicsStats
KafkaRequestHandlerPool.aggregateThreads.set(1)
This code smell is a good hint that something is not right with the design and implementation.
Because usually KafkaRequestHandler instances are created through the pool, and the pool's creation process increments aggregateThreads, but in our unit test the KafkaRequestHandler is created independently, so the aggregateThreads value remains 0, which is why we set it to 1.
This PR implements KIP-1207
https://issues.apache.org/jira/browse/KAFKA-19606
This PR implements a global shared thread counter mechanism to properly calculate the RequestHandlerAvgIdlePercent metric across all KafkaRequestHandlerPool instances within the same JVM process in KRaft combined mode. This ensures accurate idle percentage calculations, especially in combined KRaft mode where both broker and controller request handler pools coexist.

Previously, each KafkaRequestHandlerPool calculated idle percentages independently, using only its own thread count as the denominator. In combined KRaft mode, this led to the anomalous RequestHandlerAvgIdlePercent values described in KAFKA-19606.

Core Changes
- Added sharedAggregateTotalThreads as a global AtomicInteger in KafkaRequestHandlerPool
- Per-pool metric: uses the local thread count (totalHandlerThreads.get)
- Aggregate metric: uses the global thread count (sharedAggregateTotalThreads.get); see the sketch below
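As a sketch of the two mark calls described in the list above (names as in this description; later review comments rename them to poolHandlerThreads / aggregateThreads):

```scala
// The only difference between the two metrics is the denominator.
val idleTime = endTime - startSelectTime
// Per-pool metric: this pool's own thread count.
perPoolIdleMeter.mark(idleTime / totalHandlerThreads.get)
// Aggregate metric: the shared, process-wide thread count.
aggregateIdleMeter.mark(idleTime / sharedAggregateTotalThreads.get)
```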
Test
- Added a perPoolIdleMeter parameter to all KafkaRequestHandler instantiations
- Added global counter initialization (KafkaRequestHandlerPool.sharedAggregateTotalThreads.set(1)) in test class setup
- Added a new unit test that verifies:
  1. Global counter accumulation across multiple pools
  2. Proper idle percentage calculation within the [0, 1.05] range
  3. Counter cleanup after pool shutdown
POC locally (in KRaft combined mode):

