kafka/server: add metrics and config for consumer lag reporting #24977
base: dev
CI test results: build#61359, build#61509
nice!
        "Sum of consumer group lag for all partitions in a group"),
      labels),
    sm::make_gauge(
      "lag_max",
      [this] { return _lag_metrics.max; },
      sm::description(
        "Maximum consumer group lag across all partitions in a group"),
Maybe one for docs, but topic-partitions might be more easily understood.
Like this?
-        "Sum of consumer group lag for all partitions in a group"),
-      labels),
-    sm::make_gauge(
-      "lag_max",
-      [this] { return _lag_metrics.max; },
-      sm::description(
-        "Maximum consumer group lag across all partitions in a group"),
+        "Sum of consumer group lag for all topic-partitions"),
+      labels),
+    sm::make_gauge(
+      "lag_max",
+      [this] { return _lag_metrics.max; },
+      sm::description(
+        "Maximum consumer group lag across topic-partitions"),
I agree.
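For readers following the wording discussion, here is a minimal sketch of what "sum" and "max" across topic-partitions mean. It assumes lag is the partition high watermark minus the group's committed offset, clamped at zero; the struct and function names are hypothetical and are not taken from this PR, which only adds the metric scaffolding.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical per-topic-partition snapshot; names are illustrative only.
struct partition_offsets {
    int64_t high_watermark;   // end of the readable log for the partition
    int64_t committed_offset; // offset the group has committed
};

struct lag_summary {
    int64_t sum{0};
    int64_t max{0};
};

// Assumption: lag = high_watermark - committed_offset, never negative.
inline lag_summary
summarize_lag(const std::vector<partition_offsets>& partitions) {
    lag_summary out;
    for (const auto& p : partitions) {
        const int64_t lag = std::max<int64_t>(
          0, p.high_watermark - p.committed_offset);
        out.sum += lag;                   // feeds a "lag_sum" style gauge
        out.max = std::max(out.max, lag); // feeds a "lag_max" style gauge
    }
    return out;
}
```

With three topic-partitions lagging by 10, 0 and 25, this yields a sum of 35 and a max of 25.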
@@ -196,6 +196,7 @@ struct configuration final : public config_store {
     property<bool> disable_metrics;
     property<bool> disable_public_metrics;
     property<bool> aggregate_metrics;
+    property<bool> enable_consumer_group_lag_metrics;
I wonder if we should have more than a bool here.
I see a use for:
- partition-level metrics (what we have with redpanda_kafka_consumer_group_committed_offset)
- consumer_lag (this)
- both
enable_group_metrics is hard-coded as on (off in some tests), so it could be wrapped up.
Perhaps enable_group_metrics with options partition, group. Not sure how to spell the option both, if, perhaps, we wanted to add topic one day. Maybe the accepted values could be something like partition|lag.
This sounds like something maybe we can tackle in future metrics rework?
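To make the partition|lag idea above concrete, here is a rough sketch of how such a value could be parsed into a set of flags. Everything in it is hypothetical: the enum, the option names and the parser are not part of this PR or of the existing config code, and a real implementation would go through the config property machinery.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical flag set illustrating the "partition|lag" suggestion.
enum class group_metrics : uint8_t {
    none = 0,
    partition = 1u << 0, // per-partition committed offset metrics
    lag = 1u << 1,       // consumer group lag metrics
};

constexpr group_metrics operator|(group_metrics a, group_metrics b) {
    return static_cast<group_metrics>(
      static_cast<uint8_t>(a) | static_cast<uint8_t>(b));
}

constexpr bool has(group_metrics set, group_metrics flag) {
    return (static_cast<uint8_t>(set) & static_cast<uint8_t>(flag)) != 0;
}

// Parses a '|'-separated list such as "partition|lag".
// Returns std::nullopt for an unknown or empty token.
inline std::optional<group_metrics> parse_group_metrics(std::string_view s) {
    group_metrics out = group_metrics::none;
    while (true) {
        const auto pos = s.find('|');
        const auto token = s.substr(0, pos);
        if (token == "partition") {
            out = out | group_metrics::partition;
        } else if (token == "lag") {
            out = out | group_metrics::lag;
        } else {
            return std::nullopt;
        }
        if (pos == std::string_view::npos) {
            return out;
        }
        s.remove_prefix(pos + 1);
    }
}
```

A flag set leaves room for a later value such as topic without needing a separate spelling for "both".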
Force-pushed from 76291af to 53aa756
config: Add consumer lag metric config

Note that this commit contains only the metric infrastructure, i.e. the probe and the mechanism to dynamically enable/disable these metrics. A subsequent commit will implement the logic to populate the consumer lag metrics data.
Force-pushed from 53aa756 to b880e01
Changes in force-push:
  , enable_consumer_group_lag_metrics(
      *this,
      "enable_consumer_group_lag_metrics",
      "Enable registering metrics for consumer group lag exposed on "
-      "Enable registering metrics for consumer group lag exposed on "
+      "Enable metrics for consumer group lag exposed on "
Implements: https://redpandadata.atlassian.net/browse/CORE-8914
Introduce "enable_consumer_group_lag_metrics" which controls whether the consumer lag metrics are active. This can be changed without needing a restart.
Introduce the metrics scaffolding needed to have metrics that can be enabled/disabled at runtime.
redpanda_kafka_consumer_group_lag_max: group, shard
redpanda_kafka_consumer_group_lag_sum: group, shard
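As an illustration of metrics that can be enabled and disabled at runtime, the sketch below registers two gauges when a flag turns on and removes them when it turns off. It is a simplified stand-in under stated assumptions: metric_registry and lag_probe are hypothetical types written for this example, not the Seastar metrics API or the probe added in this PR, and the config-change hook is reduced to a plain set_enabled call.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Stand-in for a metrics registry; hypothetical, not the Seastar API.
class metric_registry {
public:
    using gauge_fn = std::function<int64_t()>;
    void add_gauge(const std::string& name, gauge_fn fn) {
        _gauges[name] = std::move(fn);
    }
    void remove_gauge(const std::string& name) { _gauges.erase(name); }
    bool contains(const std::string& name) const {
        return _gauges.count(name) != 0;
    }
    int64_t read(const std::string& name) const { return _gauges.at(name)(); }

private:
    std::map<std::string, gauge_fn> _gauges;
};

// Hypothetical probe whose gauges exist only while the feature is enabled.
class lag_probe {
public:
    explicit lag_probe(metric_registry& r) : _registry(r) {}
    lag_probe(const lag_probe&) = delete; // gauges capture `this`
    lag_probe& operator=(const lag_probe&) = delete;
    ~lag_probe() { set_enabled(false); }

    // Meant to be called whenever the config value changes; no restart needed.
    void set_enabled(bool enabled) {
        if (enabled == _enabled) {
            return;
        }
        if (enabled) {
            _registry.add_gauge("lag_sum", [this] { return _sum; });
            _registry.add_gauge("lag_max", [this] { return _max; });
        } else {
            _registry.remove_gauge("lag_sum");
            _registry.remove_gauge("lag_max");
        }
        _enabled = enabled;
    }

    // Stores the latest values; gauges read them lazily on scrape.
    void update(int64_t sum, int64_t max) {
        _sum = sum;
        _max = max;
    }

private:
    metric_registry& _registry;
    bool _enabled{false};
    int64_t _sum{0};
    int64_t _max{0};
};
```

Keeping the values in the probe and registering only the accessors means toggling the flag never loses state; it just controls whether the gauges are visible.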
Backports Required
Release Notes