Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-16524: Metrics for KIP-853 #18304

Merged
merged 22 commits into from
Jan 30, 2025
Merged

KAFKA-16524: Metrics for KIP-853 #18304

merged 22 commits into from
Jan 30, 2025

Conversation

kevin-wu24
Copy link
Contributor

@kevin-wu24 kevin-wu24 commented Dec 23, 2024

This PR implements the following gauges in KafkaRaftMetrics, BrokerServerMetrics, and ControllerServerMetrics:

KafkaRaftMetrics:

  • number-of-voters: number of voters for a KRaft topic partition. All of the replicas will report this metric.
  • number-of-observers: number of observers being tracked by the KRaft topic partition leader.
  • uncommitted-voter-change: 1 if there is a voter change that has not been committed, 0 otherwise.

number-of-observers and uncommitted-voter-change will only be present on the active controller, since it does not make sense for other replicas to report these metrics. If number-of-offline-voters is implemented, it will also only be reported by the leader.

In order to make these two metrics thread-safe, KafkaRaftMetrics needs to be passed into LeaderState, and therefore QuorumState. This introduces a circularity since the KafkaRaftMetrics constructor takes in QuorumState. To break the circularity for now, the logic using QuorumState will be moved to a KafkaRaftMetrics#initialize method.

BrokerServerMetrics:

  • ignored-static-voters: 1 if reading the voter set from the log, 0 if the voter set comes from static configuration.

ControllerServerMetrics:

  • IgnoredStaticVoters: 1 if reading the voter set from the log, 0 if the voter set comes from static configuration. (same as the metric above but for controllers)
  • IsObserver: 1 if the controller node is not a voter for the KRaft cluster metadata partition, 0 otherwise. This metric will not be implemented in this PR. Instead, the current-state metric in conjunction with the presence of kafka.server.controller metrics can be used to provide the same information.

To implement the ignored-static-voters metrics, this PR introduces the ExternalKRaftMetrics class, which allows for higher layer metrics objects to be accessible within the raft module. number-of-voters and ignored-static-voters will be updated in KRaftControlRecordStateMachine#handleBatch when a new VotersRecord is read from log.

Testing:

  • Unit tests in KRaftControlRecordStateMachineTest and KafkaRaftMetricsTest.
  • Added/modified tests in KafkaRaftClientReconfigTest to verify new metrics behavior.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions github-actions bot added triage PRs from the community core Kafka Broker kraft labels Dec 23, 2024
Copy link

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

@jsancio jsancio self-assigned this Jan 2, 2025
Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @kevin-wu24 . Quick review of src/main.

@github-actions github-actions bot removed the triage PRs from the community label Jan 3, 2025
Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your improvements @kevin-wu24

@github-actions github-actions bot added the build Gradle build or GitHub Actions label Jan 5, 2025
@kevin-wu24
Copy link
Contributor Author

kevin-wu24 commented Jan 5, 2025

Pushed some commits. Changes of note:

  • Moved DefaultExternalKRaftMetrics to /core/src/main/scala/kafka/raft to avoid circular dependency and avoid BrokerServerMetrics and ControllerMetadataMetrics needing to implement another interface.
  • I'm not sure whether or not it makes sense to provide a mock implementation of ExternalKRaftMetrics for the purpose of testing. The most recent changes just use the DefaultExternalKRaftMetrics implementation directly, since we want to validate that the metrics values in the brokerServerMetrics and controllerMetadataMetrics objects contained in the implementation are actually being updated. This has the side effect of requiring dependencies being added in the build file for raft tests and allowing the relevant imports.
  • Added tests in KafkaMetricsTest for the newly-added number-of-voters, number-of-observers, and uncommitted-voter-change.
  • Modified tests in KRaftControlRecordStateMachineTest to test the two ignored-static-voters metrics.
  • Added testLeaderMetricsAreReset in KafkaRaftClientReconfigTest to cover the case where a leader loses leadership and regains it, checking that number-of-observers and uncommitted-voter-change are not using the values from previous leadership.

Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. I reviewed src/main.

build.gradle Outdated Show resolved Hide resolved
checkstyle/import-control.xml Outdated Show resolved Hide resolved
core/src/main/scala/kafka/raft/RaftManager.scala Outdated Show resolved Hide resolved
core/src/main/scala/kafka/tools/TestRaftServer.scala Outdated Show resolved Hide resolved
raft/src/main/java/org/apache/kafka/raft/VoterSet.java Outdated Show resolved Hide resolved
@kevin-wu24
Copy link
Contributor Author

Thanks for the review @jsancio. Pushed some commits:

  • Added MockExternalKRaftMetrics to prevent additional raft module dependencies. Added DefaultExternalKRaftMetricsTest since raft layer tests use the mock instead.
  • The staticVoterSet passed to KRaftControlRecordStateMachine can never be null. Instead the default value for controller.quorum.voters is the empty list in both implementation and testing. Fixed updating logic for the ignored-static-voters metric so that it always stays false when the mentioned config is not set. Only when that config is actually set can the metric's value ever go from false to true.
  • In the case of truncateNewEntries truncating the entire log, if controller.quorum.voters is set, ignored-static-voters metric can go from true back to false. Implementation now covers for this case.
  • KRaftControlRecordStateMachineTest checks both ignored-static-voters and number-of-voters metric values.

Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Submitted a quick review.

Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @kevin-wu24 . Let's make sure that the metrics that represent a boolean expose the value as 1 or 0 and not as true or false.

Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jsancio jsancio merged commit 184b891 into apache:trunk Jan 30, 2025
9 checks passed
cmccabe pushed a commit that referenced this pull request Feb 4, 2025
This change implement some of the metrics enumerated in KIP-853.

The KafkaRaftMetrics object now exposes number-of-voters, number-of-observers and uncommitted-voter-change. The number-of-observers and uncommitted-voter-change metrics are only present on the active controller or leader, since it does not make sense for other replicas to report these metrics.

In order to make these two metrics thread-safe, KafkaRaftMetrics needs to be passed into LeaderState, and therefore QuorumState. This introduces a circularity since the KafkaRaftMetrics constructor takes in QuorumState. To break the circularity for now, the logic using QuorumState will be moved to the KafkaRaftMetrics#initialize method.

The BrokerServerMetrics object now exposes ignored-static-voters. The ControllerServerMetrics object now exposes IgnoredStaticVoters. To implement both metrics for "ignored static voters", this PR introduces the ExternalKRaftMetrics interface, which allows for higher layer metrics objects to be accessible within the raft module.

Reviewers: José Armando García Sancio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Gradle build or GitHub Actions ci-approved core Kafka Broker kraft
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants