Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-17472: Speed Up DescribeConsumerGroupTest #17117

Merged
merged 4 commits into from
Sep 17, 2024

Conversation

m1a2st
Copy link
Contributor

@m1a2st m1a2st commented Sep 6, 2024

Jira: https://issues.apache.org/jira/browse/KAFKA-17472
The DescribeConsumerGroupTest is slow, we should add some config to speed up this test

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@m1a2st
Copy link
Contributor Author

m1a2st commented Sep 6, 2024

If add the group.initial.rebalance.delay.ms to 1000

test name type time(Before) time(After)
DescribeConsumerGroupTest 1m 3s 53s 989ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers [1] Type=Raft-Combined, consumerGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT 55s 928ms 39s 888ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers [2] Type=ZK, classicGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT 7s 7ms 14s 81ms

@m1a2st m1a2st marked this pull request as ready for review September 6, 2024 13:32
@m1a2st
Copy link
Contributor Author

m1a2st commented Sep 6, 2024

If add the max.poll.interval.ms to 1000 and group.initial.rebalance.delay.ms to 1000

test name type time(Before) time(After)
DescribeConsumerGroupTest 53s 989ms 33s 567ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers [1] Type=Raft-Combined, consumerGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT 39s 888ms 24s 899ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers [2] Type=ZK, classicGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT 14s 81ms 8s 666ms

@chia7712
Copy link
Member

chia7712 commented Sep 8, 2024

@m1a2st could you please merge trunk to run CI again?

@m1a2st
Copy link
Contributor Author

m1a2st commented Sep 9, 2024

This testDescribeWithMultiPartitionTopicAndMultipleConsumers is not stable for 30 seconds in my local after rebased trunk, I should think another way to speed up it.

@@ -987,6 +987,7 @@ private Map<String, Object> composeConfigs(String groupId, String groupProtocol,
configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
configs.put(ConsumerConfig.GROUP_PROTOCOL_CONFIG, groupProtocol);
configs.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RangeAssignor.class.getName());
configs.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "1000");
Copy link
Member

@chia7712 chia7712 Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon me, not sure why we need to add this config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it seems to me group.consumer.heartbeat.interval.ms is the true critical config, as they needs some HB cycles to reach the stable group.

Copy link
Contributor Author

@m1a2st m1a2st Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon me, not sure why we need to add this config

The document write that

A new config, group.initial.rebalance.delay.ms, was introduced. This config specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance. The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms. The default value for this is 3 seconds. During development and testing it might be desirable to set this to 0 in order to not delay test execution time.

This config will work in test, thus I think that it is useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config I didn't catch is group.initial.rebalance.delay.ms rather than max.poll.interval.ms. max.poll.interval.ms the the upper limit of "timeout" and in testing we don't expect to see timeout, so I can't understand the purpose of reducing the value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the test should not expect timeout, I misunderstanding the max.poll.interval.ms config

@m1a2st
Copy link
Contributor Author

m1a2st commented Sep 9, 2024

test name type time(Before) time(After)
DescribeConsumerGroupTest 1m 18m 33s 663ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers [1] Type=Raft-Combined, consumerGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT 1m 1s 20s 201ms
testDescribeWithMultiPartitionTopicAndMultipleConsumers [2] Type=ZK, classicGroupCoordinator,MetadataVersion=4.0-IV2,Security=PLAINTEXT 16s 497ms 13s 462ms

@@ -85,6 +88,9 @@ static List<ClusterConfig> forKRaftGroupCoordinator() {
Map<String, String> serverProperties = new HashMap<>();
serverProperties.put(OFFSETS_TOPIC_PARTITIONS_CONFIG, "1");
serverProperties.put(OFFSETS_TOPIC_REPLICATION_FACTOR_CONFIG, "1");
serverProperties.put(GROUP_INITIAL_REBALANCE_DELAY_MS_CONFIG, "1000");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @m1a2st, one question, based on the document of group.initial.rebalance.delay.ms. (Link)

The default value for this is 3 seconds. During development and testing it might be desirable to set this to 0 in order to not delay test execution time.

I think setting group.initial.rebalance.delay.ms to zero might be better. Is there have any reason we use 1000 ms?
I can see some tests are using 1000ms, but some others are using 0ms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chenyulin0719, You can see this discussion, we set the group.initial.rebalance.delay.ms from 3000 to 1000 to not only keep the initial rebalance delay but also speed up the test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Comparing to set group.initial.rebalance.delay.ms to 0 ms, adding some delay could speed up the metadata synchronization in Kraft mode. Let's a suprising finding to me.

Thanks for sharing the discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting group.initial.rebalance.delay.ms to zero might be better.

That is good, but we want to have delay in testing. That can "reduce" the difference between test and production. Also, 1.5s is good enough in speedup.

@chia7712
Copy link
Member

chia7712 commented Sep 15, 2024

@m1a2st please don't use "force push" again. My previous comment about using CONSUMER_GROUP_HEARTBEAT_INTERVAL_MS_CONFIG is removed due to force push. I can't share the comment to @chenyulin0719 ...

@chia7712
Copy link
Member

please don't use "force push" again. My previous comment about using CONSUMER_GROUP_HEARTBEAT_INTERVAL_MS_CONFIG is removed due to force push.

sorry that please ignore this comment. I looked at incorrect PR 😢

@chia7712 chia7712 merged commit 95b734d into apache:trunk Sep 17, 2024
6 of 7 checks passed
tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants