KAFKA-17472: Speed Up DescribeConsumerGroupTest #17117

m1a2st · 2024-09-06T13:04:51Z

Jira: https://issues.apache.org/jira/browse/KAFKA-17472
The DescribeConsumerGroupTest is slow, we should add some config to speed up this test

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

m1a2st · 2024-09-06T13:22:45Z

If add the group.initial.rebalance.delay.ms to 1000

test name	type	time(Before)	time(After)
DescribeConsumerGroupTest		1m 3s	53s 989ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers	[1] Type=Raft-Combined, consumerGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT	55s 928ms	39s 888ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers	[2] Type=ZK, classicGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT	7s 7ms	14s 81ms

m1a2st · 2024-09-06T13:35:05Z

If add the max.poll.interval.ms to 1000 and group.initial.rebalance.delay.ms to 1000

test name	type	time(Before)	time(After)
DescribeConsumerGroupTest		53s 989ms	33s 567ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers	[1] Type=Raft-Combined, consumerGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT	39s 888ms	24s 899ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers	[2] Type=ZK, classicGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT	14s 81ms	8s 666ms

chia7712 · 2024-09-08T18:51:58Z

@m1a2st could you please merge trunk to run CI again?

m1a2st · 2024-09-09T09:00:30Z

This testDescribeWithMultiPartitionTopicAndMultipleConsumers is not stable for 30 seconds in my local after rebased trunk, I should think another way to speed up it.

chia7712 · 2024-09-09T12:10:21Z

tools/src/test/java/org/apache/kafka/tools/consumer/group/DescribeConsumerGroupTest.java

@@ -987,6 +987,7 @@ private Map<String, Object> composeConfigs(String groupId, String groupProtocol,
        configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        configs.put(ConsumerConfig.GROUP_PROTOCOL_CONFIG, groupProtocol);
        configs.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RangeAssignor.class.getName());
+        configs.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "1000");


Pardon me, not sure why we need to add this config

Also, it seems to me group.consumer.heartbeat.interval.ms is the true critical config, as they needs some HB cycles to reach the stable group.

Pardon me, not sure why we need to add this config

The document write that

A new config, group.initial.rebalance.delay.ms, was introduced. This config specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance. The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms. The default value for this is 3 seconds. During development and testing it might be desirable to set this to 0 in order to not delay test execution time.

This config will work in test, thus I think that it is useful.

The config I didn't catch is group.initial.rebalance.delay.ms rather than max.poll.interval.ms. max.poll.interval.ms the the upper limit of "timeout" and in testing we don't expect to see timeout, so I can't understand the purpose of reducing the value.

You are right, the test should not expect timeout, I misunderstanding the max.poll.interval.ms config

m1a2st · 2024-09-09T14:38:48Z

test name	type	time(Before)	time(After)
DescribeConsumerGroupTest		1m 18m	33s 663ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers	[1] Type=Raft-Combined, consumerGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT	1m 1s	20s 201ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers	[2] Type=ZK, classicGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT	16s 497ms	13s 462ms

chenyulin0719 · 2024-09-11T03:40:53Z

tools/src/test/java/org/apache/kafka/tools/consumer/group/ConsumerGroupCommandTestUtils.java

@@ -85,6 +88,9 @@ static List<ClusterConfig> forKRaftGroupCoordinator() {
        Map<String, String> serverProperties = new HashMap<>();
        serverProperties.put(OFFSETS_TOPIC_PARTITIONS_CONFIG, "1");
        serverProperties.put(OFFSETS_TOPIC_REPLICATION_FACTOR_CONFIG, "1");
+        serverProperties.put(GROUP_INITIAL_REBALANCE_DELAY_MS_CONFIG, "1000");


Hi @m1a2st, one question, based on the document of group.initial.rebalance.delay.ms. (Link)

The default value for this is 3 seconds. During development and testing it might be desirable to set this to 0 in order to not delay test execution time.

I think setting group.initial.rebalance.delay.ms to zero might be better. Is there have any reason we use 1000 ms?
I can see some tests are using 1000ms, but some others are using 0ms.

Hi @chenyulin0719, You can see this discussion, we set the group.initial.rebalance.delay.ms from 3000 to 1000 to not only keep the initial rebalance delay but also speed up the test.

I see. Comparing to set group.initial.rebalance.delay.ms to 0 ms, adding some delay could speed up the metadata synchronization in Kraft mode. Let's a suprising finding to me.

Thanks for sharing the discussion.

I think setting group.initial.rebalance.delay.ms to zero might be better.

That is good, but we want to have delay in testing. That can "reduce" the difference between test and production. Also, 1.5s is good enough in speedup.

chia7712 · 2024-09-15T18:44:29Z

@m1a2st please don't use "force push" again. My previous comment about using CONSUMER_GROUP_HEARTBEAT_INTERVAL_MS_CONFIG is removed due to force push. I can't share the comment to @chenyulin0719 ...

chia7712 · 2024-09-15T18:49:51Z

please don't use "force push" again. My previous comment about using CONSUMER_GROUP_HEARTBEAT_INTERVAL_MS_CONFIG is removed due to force push.

sorry that please ignore this comment. I looked at incorrect PR 😢

Reviewers: Chia-Ping Tsai <[email protected]>

m1a2st force-pushed the gh-KAFKA-17472 branch from 639a6bc to 50464cd Compare September 6, 2024 13:23

m1a2st marked this pull request as ready for review September 6, 2024 13:32

chia7712 approved these changes Sep 8, 2024

View reviewed changes

m1a2st added 2 commits September 9, 2024 08:10

add GroupCoordinatorConfig in ConsumerGroupCommandTestUtils

0829390

add MAX_POLL_INTERVAL_MS_CONFIG in DescribeConsumerGroupTest

17bf18d

m1a2st force-pushed the gh-KAFKA-17472 branch from d9aca00 to 17bf18d Compare September 9, 2024 00:11

chia7712 reviewed Sep 9, 2024

View reviewed changes

m1a2st added 2 commits September 9, 2024 21:22

add new config for broker

8b18308

remove MAX_POLL_INTERVAL_MS_CONFIG in consumer

4268f6c

chenyulin0719 reviewed Sep 11, 2024

View reviewed changes

chia7712 merged commit 95b734d into apache:trunk Sep 17, 2024
6 of 7 checks passed

chia7712 mentioned this pull request Sep 17, 2024

KAFKA-16813: Add global timeout for @ClusterTemplate, @ClusterTest and @ClusterTests #16957

Merged

3 tasks

tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025

KAFKA-17472 Speed Up DescribeConsumerGroupTest (apache#17117)

ccca87b

Reviewers: Chia-Ping Tsai <[email protected]>

KAFKA-17472: Speed Up DescribeConsumerGroupTest #17117

KAFKA-17472: Speed Up DescribeConsumerGroupTest #17117

Uh oh!

Conversation

m1a2st commented Sep 6, 2024

Committer Checklist (excluded from commit message)

Uh oh!

m1a2st commented Sep 6, 2024

Uh oh!

m1a2st commented Sep 6, 2024

Uh oh!

chia7712 commented Sep 8, 2024

Uh oh!

m1a2st commented Sep 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chia7712 Sep 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chia7712 Sep 9, 2024

Choose a reason for hiding this comment

Uh oh!

m1a2st Sep 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chia7712 Sep 9, 2024

Choose a reason for hiding this comment

Uh oh!

m1a2st Sep 9, 2024

Choose a reason for hiding this comment

Uh oh!

m1a2st commented Sep 9, 2024

Uh oh!

chenyulin0719 Sep 11, 2024

Choose a reason for hiding this comment

Uh oh!

m1a2st Sep 11, 2024

Choose a reason for hiding this comment

Uh oh!

chenyulin0719 Sep 11, 2024

Choose a reason for hiding this comment

Uh oh!

chia7712 Sep 15, 2024

Choose a reason for hiding this comment

Uh oh!

chia7712 commented Sep 15, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chia7712 commented Sep 15, 2024

Uh oh!

Uh oh!

Uh oh!

m1a2st commented Sep 9, 2024 •

edited

Loading

chia7712 Sep 9, 2024 •

edited

Loading

m1a2st Sep 9, 2024 •

edited

Loading

chia7712 commented Sep 15, 2024 •

edited

Loading