KAFKA-19829: Implement group-level initial rebalance delay #20755
base: trunk
Pull Request Overview
This PR introduces a new group-level configuration for streams groups to delay the initial rebalance. This helps prevent the first joining member from being assigned all active tasks and then slowly revoking them when additional members join, which was causing performance issues particularly with slow or overloaded members.
Key changes:
- Added the `STREAMS_GROUP_INITIAL_REBALANCE_DELAY_MS_CONFIG` configuration with a default value of 3000 ms at both the group coordinator and group config levels
- Implemented accessor methods to retrieve the initial rebalance delay setting
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| GroupCoordinatorConfig.java | Added broker-level configuration constant, field, initialization, and accessor for streams group initial rebalance delay |
| GroupConfig.java | Added group-level configuration constant, field, initialization, and accessor for streams initial rebalance delay |
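To illustrate how the two levels in the table could interact, here is a minimal sketch, assuming a group-level value overrides the broker-level default. The broker key and the 3000 ms default come from this PR; the group-level key `streams.initial.rebalance.delay.ms`, the class, and its methods are hypothetical, and plain maps stand in for Kafka's real config classes.

```java
import java.util.Map;

// Hypothetical sketch: resolve the effective initial rebalance delay for one streams group.
// Plain maps stand in for the broker and group configs; this is not Kafka's ConfigDef API.
public class InitialRebalanceDelayResolution {

    // Broker-level key and default, as introduced in GroupCoordinatorConfig by this PR.
    static final String BROKER_KEY = "group.streams.initial.rebalance.delay.ms";
    static final int BROKER_DEFAULT_MS = 3000;

    // ASSUMPTION: group-level key name; the PR excerpt does not show the value used in GroupConfig.
    static final String GROUP_KEY = "streams.initial.rebalance.delay.ms";

    /** Parses a delay and rejects negative values. */
    static int parseDelayMs(String raw) {
        int value = Integer.parseInt(raw);
        if (value < 0) {
            throw new IllegalArgumentException("Delay must be >= 0 ms, got " + value);
        }
        return value;
    }

    /** Broker-level value: the explicit broker setting if present, else the 3000 ms default. */
    static int brokerDelayMs(Map<String, String> brokerProps) {
        String raw = brokerProps.get(BROKER_KEY);
        return raw == null ? BROKER_DEFAULT_MS : parseDelayMs(raw);
    }

    /** Effective value for one group: the group override if set, else the broker-level value. */
    static int effectiveDelayMs(Map<String, String> brokerProps, Map<String, String> groupProps) {
        String raw = groupProps.get(GROUP_KEY);
        return raw == null ? brokerDelayMs(brokerProps) : parseDelayMs(raw);
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelayMs(Map.of(), Map.of()));                                 // 3000
        System.out.println(effectiveDelayMs(Map.of(BROKER_KEY, "5000"), Map.of()));               // 5000
        System.out.println(effectiveDelayMs(Map.of(BROKER_KEY, "5000"), Map.of(GROUP_KEY, "0"))); // 0
    }
}
```

In this sketch, setting the group-level value to 0 disables the delay for that group only, while other groups keep the broker-level value.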
Looks good to me as a first step, but I am a bit uneasy about introducing a config that is not used anywhere, not even in tests. Maybe we should go ahead and extend this PR to a working subset of the feature.
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorConfig.java
```java
public static final String STREAMS_GROUP_MAX_STANDBY_REPLICAS_DOC = "The maximum allowed value for the group-level configuration of " + GroupConfig.STREAMS_NUM_STANDBY_REPLICAS_CONFIG;

public static final String STREAMS_GROUP_INITIAL_REBALANCE_DELAY_MS_CONFIG = "group.streams.initial.rebalance.delay.ms";
public static final int STREAMS_GROUP_INITIAL_REBALANCE_DELAY_MS_DEFAULT = 3000;
```
@mjsax Are we good with a default of 3 seconds for delaying the initial rebalance?
@lucasbru Hi Lucas, I'm not going to merge this yet; I'll add more to it. I just wanted you to take a quick look to make sure I'm heading in the right direction!
```java
 *
 * @return An empty result.
 */
private CoordinatorResult<Void, CoordinatorRecord> fireStreamsInitialRebalance(String groupId) {
```
If this is not doing anything, I think we can inline this.
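For context on what `fireStreamsInitialRebalance(String groupId)` is the callback for, here is a minimal sketch of a per-group, one-shot delayed operation, assuming it is scheduled when the first member joins an empty group. The class `InitialRebalanceTimerSketch`, its methods, and the manual clock are hypothetical stand-ins; the coordinator's actual timer API is not shown in this PR excerpt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a coordinator timer, driven by a manual clock for determinism.
public class InitialRebalanceTimerSketch {
    private long nowMs = 0;
    // groupId -> deadline in ms; at most one pending initial rebalance per group
    private final Map<String, Long> deadlines = new HashMap<>();
    private final Consumer<String> onFire; // plays the role of fireStreamsInitialRebalance(groupId)

    InitialRebalanceTimerSketch(Consumer<String> onFire) {
        this.onFire = onFire;
    }

    /** Schedules only if nothing is pending, so later joiners do not push the deadline out. */
    boolean scheduleIfAbsent(String groupId, long delayMs) {
        return deadlines.putIfAbsent(groupId, nowMs + delayMs) == null;
    }

    /** Advances the clock and fires (then forgets) every deadline that has been reached. */
    void advance(long ms) {
        nowMs += ms;
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> e : deadlines.entrySet()) {
            if (e.getValue() <= nowMs) {
                due.add(e.getKey());
            }
        }
        for (String groupId : due) {
            deadlines.remove(groupId);
            onFire.accept(groupId);
        }
    }

    public static void main(String[] args) {
        List<String> fired = new ArrayList<>();
        InitialRebalanceTimerSketch timer = new InitialRebalanceTimerSketch(fired::add);
        timer.scheduleIfAbsent("app-1", 3000); // first member joins at t=0
        timer.advance(1000);
        timer.scheduleIfAbsent("app-1", 3000); // second member joins at t=1000: no reschedule
        timer.advance(1999);
        System.out.println(fired);             // [] at t=2999
        timer.advance(1);
        System.out.println(fired);             // [app-1] at t=3000
    }
}
```

Keeping the first deadline bounds how long the first assignment can be delayed; extending the deadline on every join is the obvious alternative, at the cost of a potentially unbounded wait unless it is capped.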
```java
// Actually bump the group epoch
int groupEpoch = group.groupEpoch();
boolean isInitialRebalance = group.isEmpty();
```
A group being empty does not mean this is the initial rebalance, right? The group could have become empty again.
```java
if (bumpGroupEpoch) {
    groupEpoch += 1;
    if (isInitialRebalance) {
        groupEpoch = 2;
```
If the group becomes empty at epoch 9, we are going back in time here to epoch 2, right?
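One way to address both review comments above, sketched under stated assumptions: decide "initial" from the group epoch rather than from `group.isEmpty()`, and never let the epoch decrease. This assumes a brand-new group starts at epoch 0; the method names are hypothetical, and the jump to epoch 2 for a new group is kept only because the PR uses it (the excerpt does not say why).

```java
// Hypothetical sketch: compute the next group epoch without going back in time.
public class GroupEpochSketch {

    /** ASSUMPTION: a brand-new streams group starts at epoch 0, so only epoch 0 means "initial". */
    static boolean isInitialRebalance(int currentGroupEpoch) {
        return currentGroupEpoch == 0;
    }

    /** Returns the epoch after this heartbeat; never smaller than the current epoch. */
    static int nextGroupEpoch(int currentGroupEpoch, boolean bumpGroupEpoch) {
        if (!bumpGroupEpoch) {
            return currentGroupEpoch;
        }
        if (isInitialRebalance(currentGroupEpoch)) {
            return 2; // value kept from the PR; only reached from epoch 0, so it never decreases
        }
        return currentGroupEpoch + 1; // a group that merely became empty again keeps counting up
    }

    public static void main(String[] args) {
        System.out.println(nextGroupEpoch(0, true));  // 2
        System.out.println(nextGroupEpoch(9, true));  // 10
        System.out.println(nextGroupEpoch(9, false)); // 9
    }
}
```

With this rule, a group that became empty at epoch 9 and is rejoined moves to epoch 10 instead of back to 2. It also means such a group does not get the initial delay again; whether it should is a separate design question.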
This makes sense to me (except for the oddity around the initial epoch that I commented on), but I am now wondering whether we could compute a new target assignment when the timer fires. This could be especially useful once we have offloaded assignments (in the future), since otherwise we might not even get an assignment on the next heartbeat.
Maybe @squah-confluent could take a quick look at how this would interact with the offloaded assignment code that he is going to implement in AK.
I think that inside fireStreamsInitialRebalance, if group.epoch() > group.assignmentEpoch(), we could simply create the target assignment builder for the current group ID. All the information about the group is available from the groups map instance variable.
I can think of one corner case: if configuredTopology is not defined, we should skip computing the target assignment; it will be computed on the next heartbeat.
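The decision suggested in this comment can be sketched as follows. `GroupView`, `Action`, and the class are hypothetical stand-ins (not the coordinator's `StreamsGroup`), a plain map plays the role of the `groups` instance variable, and the unknown-group check is an added assumption for a group deleted while the timer was pending.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the timer callback's decision; no real target assignment is built here.
public class FireInitialRebalanceSketch {

    /** Minimal stand-in for the parts of a streams group the decision needs. */
    record GroupView(int groupEpoch, int assignmentEpoch, Optional<String> configuredTopology) {}

    enum Action { COMPUTE_TARGET_ASSIGNMENT, SKIP_UP_TO_DATE, SKIP_NO_TOPOLOGY, SKIP_UNKNOWN_GROUP }

    // Stand-in for the coordinator's groups map instance variable.
    private final Map<String, GroupView> groups = new HashMap<>();

    void put(String groupId, GroupView view) {
        groups.put(groupId, view);
    }

    Action onTimerFired(String groupId) {
        GroupView group = groups.get(groupId);
        if (group == null) {
            return Action.SKIP_UNKNOWN_GROUP;    // ASSUMPTION: group deleted while the timer was pending
        }
        if (group.groupEpoch() <= group.assignmentEpoch()) {
            return Action.SKIP_UP_TO_DATE;       // target assignment already matches the group epoch
        }
        if (group.configuredTopology().isEmpty()) {
            return Action.SKIP_NO_TOPOLOGY;      // corner case from review: computed on the next heartbeat
        }
        return Action.COMPUTE_TARGET_ASSIGNMENT; // the real code would run the target assignment builder here
    }

    public static void main(String[] args) {
        FireInitialRebalanceSketch c = new FireInitialRebalanceSketch();
        c.put("ready", new GroupView(2, 0, Optional.of("topology-v1")));
        c.put("no-topology", new GroupView(2, 0, Optional.empty()));
        c.put("up-to-date", new GroupView(2, 2, Optional.of("topology-v1")));
        System.out.println(c.onTimerFired("ready"));       // COMPUTE_TARGET_ASSIGNMENT
        System.out.println(c.onTimerFired("no-topology")); // SKIP_NO_TOPOLOGY
        System.out.println(c.onTimerFired("up-to-date"));  // SKIP_UP_TO_DATE
        System.out.println(c.onTimerFired("missing"));     // SKIP_UNKNOWN_GROUP
    }
}
```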
Let's make sure we don't forget to update the docs for the new config.
During testing, an artifact of the new rebalance protocol showed up. In some cases, the first joining member is assigned all active tasks and is slow to revoke them after more members have joined the group. This particularly affects cases where the first member is slow (possibly overloaded, as in the cloudlimits benchmarks) and there are many tasks to assign.
To help with this situation, we want to introduce a new group-specific configuration to delay the initial rebalance.
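To make the motivation concrete, here is a toy simulation, assuming members join at fixed times and tasks are spread round-robin over whoever is present when the first assignment is computed. Round-robin is a deliberate simplification, not the Streams task assignor, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: who owns which tasks after the first assignment, with and without an initial delay.
public class InitialDelaySimulation {

    /** Round-robins tasks over the members that joined by (first join time + delayMs). */
    static Map<String, List<Integer>> firstAssignment(Map<String, Long> joinTimesMs, int numTasks, long delayMs) {
        long firstJoin = joinTimesMs.values().stream().mapToLong(Long::longValue).min().orElseThrow();
        long assignAt = firstJoin + delayMs;
        List<String> present = new ArrayList<>();
        joinTimesMs.forEach((member, joinedAt) -> {
            if (joinedAt <= assignAt) {
                present.add(member);
            }
        });
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String member : present) {
            result.put(member, new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            result.get(present.get(task % present.size())).add(task);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> joins = new LinkedHashMap<>(); // insertion order = join order
        joins.put("m1", 0L);
        joins.put("m2", 500L);
        joins.put("m3", 1200L);
        System.out.println(firstAssignment(joins, 6, 0));    // {m1=[0, 1, 2, 3, 4, 5]}
        System.out.println(firstAssignment(joins, 6, 3000)); // {m1=[0, 3], m2=[1, 4], m3=[2, 5]}
    }
}
```

Without the delay, `m1` owns every task and must later revoke four of them; with a 3000 ms delay, all three members are present and each starts with two tasks.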