

@RaidenE1 (Contributor) commented Oct 23, 2025

During testing, an artifact of the new rebalance protocol showed up. In some cases, the first member to join is assigned all active tasks and is slow to revoke them after more members have joined the group. This particularly affects cases where the first member is slow (possibly because it is overloaded, as in the cloudlimits benchmarks) and there are many tasks to assign.

To help with this situation, we want to introduce a new group-specific
configuration to delay the initial rebalance.
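A minimal sketch of the idea, assuming the delay is measured from the moment the first member joins an empty group and that later joins do not restart the timer. The class and method names (`InitialRebalanceDelay`, `onMemberJoined`, `canAssign`) are hypothetical illustrations, not code from this PR.

```java
// Hypothetical sketch, not the PR's implementation: tracks when the first member
// joined an empty group and reports whether the initial rebalance delay has elapsed.
final class InitialRebalanceDelay {
    private final long delayMs;
    private long firstJoinMs = -1L; // -1 means no member has joined yet

    InitialRebalanceDelay(long delayMs) {
        if (delayMs < 0) {
            throw new IllegalArgumentException("delay must be non-negative: " + delayMs);
        }
        this.delayMs = delayMs;
    }

    // Only the first join starts the timer; later joins do not extend it.
    void onMemberJoined(long nowMs) {
        if (firstJoinMs < 0) {
            firstJoinMs = nowMs;
        }
    }

    // Tasks may be assigned once the delay has elapsed since the first join.
    boolean canAssign(long nowMs) {
        return firstJoinMs >= 0 && nowMs - firstJoinMs >= delayMs;
    }
}
```

With a 3000 ms delay, a member joining at t=0 and another at t=1000 would both be considered for the first assignment at t=3000. Not restarting the timer on later joins means a steady stream of joining members cannot postpone the first assignment indefinitely; whether the PR chooses the same behavior is not visible here.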

@github-actions github-actions bot added triage PRs from the community group-coordinator small Small PRs labels Oct 23, 2025
@lucasbru lucasbru requested review from Copilot and lucasbru October 23, 2025 07:48
@lucasbru lucasbru self-assigned this Oct 23, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces a new group-level configuration for streams groups to delay the initial rebalance. This helps prevent the first joining member from being assigned all active tasks and then slowly revoking them when additional members join, which was causing performance issues particularly with slow or overloaded members.

Key changes:

  • Added STREAMS_GROUP_INITIAL_REBALANCE_DELAY_MS_CONFIG configuration with a default value of 3000ms at both the group coordinator and group config levels
  • Implemented accessor methods to retrieve the initial rebalance delay setting
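The two configuration levels can be sketched as a lookup in which a group-level value, when set, overrides the broker-level value, which in turn falls back to the 3000 ms default. The broker key and default are the ones quoted in the review below; the group-level key `streams.initial.rebalance.delay.ms` and the override order are assumptions for illustration, since the excerpt does not show the `GroupConfig` constant.

```java
import java.util.Map;

// Hypothetical sketch of resolving the effective initial rebalance delay.
final class StreamsRebalanceDelayConfig {
    // Broker-level key and default, as quoted in the review.
    static final String BROKER_KEY = "group.streams.initial.rebalance.delay.ms";
    static final int BROKER_DEFAULT_MS = 3000;

    // Hypothetical group-level key; the PR's actual GroupConfig name is not shown.
    static final String GROUP_KEY = "streams.initial.rebalance.delay.ms";

    // Group value wins if present; otherwise the broker value; otherwise the default.
    static int effectiveDelayMs(Map<String, String> brokerProps, Map<String, String> groupProps) {
        int brokerValue = parse(brokerProps.get(BROKER_KEY), BROKER_DEFAULT_MS);
        return parse(groupProps.get(GROUP_KEY), brokerValue);
    }

    // Parses a non-negative integer, using the fallback when the key is absent.
    private static int parse(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException("delay must be non-negative: " + value);
        }
        return value;
    }
}
```

Rejecting negative values up front keeps the timer logic from ever having to handle a delay that is already "in the past".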

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • GroupCoordinatorConfig.java: added the broker-level configuration constant, field, initialization, and accessor for the streams group initial rebalance delay
  • GroupConfig.java: added the group-level configuration constant, field, initialization, and accessor for the streams initial rebalance delay


@lucasbru (Member) left a comment

Looks good to me as a first step, but I am a bit uneasy about introducing a config that is not used at all, not even in tests. Maybe we should just go ahead and extend this PR to a working subset of the feature.

public static final String STREAMS_GROUP_MAX_STANDBY_REPLICAS_DOC = "The maximum allowed value for the group-level configuration of " + GroupConfig.STREAMS_NUM_STANDBY_REPLICAS_CONFIG;

public static final String STREAMS_GROUP_INITIAL_REBALANCE_DELAY_MS_CONFIG = "group.streams.initial.rebalance.delay.ms";
public static final int STREAMS_GROUP_INITIAL_REBALANCE_DELAY_MS_DEFAULT = 3000;
Member

@mjsax Are we good with a default of 3 seconds for delaying the initial rebalance?

@RaidenE1 (Contributor, Author)

@lucasbru Hi Lucas, I'm not going to merge this yet; I'll add more to it. I just wanted you to take a quick look to make sure I'm heading in the right direction!

@github-actions github-actions bot removed small Small PRs triage PRs from the community labels Oct 24, 2025
*
* @return An empty result.
*/
private CoordinatorResult<Void, CoordinatorRecord> fireStreamsInitialRebalance(String groupId) {
Member

If this is not doing anything, I think we can inline this.


// Actually bump the group epoch
int groupEpoch = group.groupEpoch();
boolean isInitialRebalance = group.isEmpty();
Member

A group being empty does not mean this is the initial rebalance, right? The group could have become empty again.

if (bumpGroupEpoch) {
groupEpoch += 1;
if (isInitialRebalance) {
groupEpoch = 2;
Member

If the group becomes empty at epoch 9, we are going back in time here to epoch 2, right?
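One possible way to address both points, sketched under a hypothetical convention: treat a rebalance as "initial" only when the group epoch is still 0 (the group has never been rebalanced), instead of using `group.isEmpty()`. The special epoch 2 from the diff is kept, but because it only applies when the current epoch is 0, a group that empties out at epoch 9 moves forward to 10 rather than back to 2. The `GroupEpochs` class and its methods are illustrative names, not the PR's code.

```java
// Hypothetical sketch: epoch bumping that never moves backwards.
final class GroupEpochs {
    // Assumed convention: epoch 0 means the group has never been rebalanced,
    // unlike isEmpty(), which is also true for a group that emptied out later.
    static boolean isInitialRebalance(int currentEpoch) {
        return currentEpoch == 0;
    }

    static int nextGroupEpoch(int currentEpoch, boolean bumpGroupEpoch) {
        if (!bumpGroupEpoch) {
            return currentEpoch;
        }
        // Epoch 2 for the first rebalance mirrors the diff; it is only used when
        // the current epoch is 0, so it is always a forward move.
        return isInitialRebalance(currentEpoch) ? 2 : currentEpoch + 1;
    }
}
```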

@lucasbru (Member) left a comment

I think this makes sense to me (except for the weirdness around the initial epoch that I commented on), but I am now wondering whether we could compute a new target assignment when the timer fires. This could be especially useful once we have offloaded assignments (in the future), since otherwise we might not even get an assignment on the next heartbeat.

Maybe @squah-confluent could take a quick look at how this would interact with the offloaded assignment code that he is going to implement in AK.

Inside fireStreamsInitialRebalance, I think we could just create the targetAssignmentbuilder for the current group ID if group.epoch() > group.assignmentEpoch(). You can get all the information about the group from the groups map instance variable.

I can think of one corner case: If configuredTopology is not defined, we should just skip computing the target assignment. It will be computed on the next heartbeat.
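The suggested control flow for the timer callback can be sketched with simplified stand-in types. `StreamsGroupSnapshot`, `InitialRebalanceTimerHandler`, the boolean return value, and `computeTargetAssignment` are hypothetical; the real method returns a `CoordinatorResult` and would run the coordinator's target assignment builder, which this sketch only counts. The three early returns correspond to a deleted group, an assignment that is already current, and the unconfigured-topology corner case.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-in for the coordinator's streams group state.
final class StreamsGroupSnapshot {
    final int groupEpoch;
    final int assignmentEpoch;
    final Optional<String> configuredTopology; // empty until the topology is configured

    StreamsGroupSnapshot(int groupEpoch, int assignmentEpoch, Optional<String> configuredTopology) {
        this.groupEpoch = groupEpoch;
        this.assignmentEpoch = assignmentEpoch;
        this.configuredTopology = configuredTopology;
    }
}

final class InitialRebalanceTimerHandler {
    // Mirrors the "groups map instance variable" mentioned in the review.
    final Map<String, StreamsGroupSnapshot> groups = new HashMap<>();
    int targetAssignmentsComputed = 0;

    // Returns true if a new target assignment was computed when the timer fired.
    boolean fireStreamsInitialRebalance(String groupId) {
        StreamsGroupSnapshot group = groups.get(groupId);
        if (group == null) {
            return false; // the group was deleted before the timer fired
        }
        if (group.groupEpoch <= group.assignmentEpoch) {
            return false; // target assignment already matches the group epoch
        }
        if (!group.configuredTopology.isPresent()) {
            return false; // corner case: leave it to the next heartbeat
        }
        computeTargetAssignment(groupId, group);
        return true;
    }

    // Placeholder for building the target assignment; this sketch only counts calls.
    private void computeTargetAssignment(String groupId, StreamsGroupSnapshot group) {
        targetAssignmentsComputed++;
    }
}
```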

@mjsax mjsax added the streams label Oct 24, 2025
@mjsax (Member) commented Oct 25, 2025

Let's make sure we do not forget to update the docs for the new config.

