Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-18655: Implement the consumer group size counter with scheduled task #18717

Open
wants to merge 5 commits into
base: trunk
Choose a base branch
from

Conversation

dongnuo123
Copy link
Collaborator

During testing we discovered that the empty group count is not updated in group conversion, but when the new group is transition to other state, the empty group count is decremented. This could result in negative empty group count.

We can have a new consumer group count implementation that follows the pattern we did for the classic group count. The timeout task periodically refreshes the metrics based on the current groups soft state.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions github-actions bot added the triage PRs from the community label Jan 27, 2025
@@ -71,7 +71,7 @@ public TimelineGaugeCounter(TimelineLong timelineLong, AtomicLong atomicLong) {
/**
* Consumer group size gauge counters keyed by the metric name.
*/
private final Map<ConsumerGroupState, TimelineGaugeCounter> consumerGroupGauges;
private volatile Map<ConsumerGroupState, Long> consumerGroupGauges;
Copy link
Collaborator Author

@dongnuo123 dongnuo123 Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the timeline gauge counter because we recompute the metric from scratch every 60 second, so the result will eventually be consistent despite any rollbacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirming the concurrency pattern here:

  • we have only one reader (metrics collector) and writer (runtime) thread for a shard at any given time.
  • we are atomically updating the map to a new map

does this sound right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes correct

@github-actions github-actions bot removed the triage PRs from the community label Jan 28, 2025
@jeffkbkim jeffkbkim added the KIP-848 The Next Generation of the Consumer Rebalance Protocol label Jan 29, 2025
Copy link
Contributor

@jeffkbkim jeffkbkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and the simplification. left some comments

@@ -71,7 +71,7 @@ public TimelineGaugeCounter(TimelineLong timelineLong, AtomicLong atomicLong) {
/**
* Consumer group size gauge counters keyed by the metric name.
*/
private final Map<ConsumerGroupState, TimelineGaugeCounter> consumerGroupGauges;
private volatile Map<ConsumerGroupState, Long> consumerGroupGauges;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

confirming the concurrency pattern here:

  • we have only one reader (metrics collector) and writer (runtime) thread for a shard at any given time.
  • we are atomically updating the map to a new map

does this sound right?

Copy link
Contributor

@jeffkbkim jeffkbkim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some minor comments. lgtm otherwise

Comment on lines +136 to +138
* Set the number of consumer groups. The method is the only way to update
* the map and is called by the scheduled task that updates the metrics
* in {@link org.apache.kafka.coordinator.group.GroupCoordinatorShard}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "This method should be the only way.." and add "..breaking this will result in inconsistent behavior" as well as for setClassicGroupGauges

*
* @param state the consumer group state.
* @param consumerGroupGauges The map counting the number of consumer groups in each state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we update the classicGroupGauges parameter similarly to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
KIP-848 The Next Generation of the Consumer Rebalance Protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants