Fix ServerStatusService.statusWatchers concurrent modification. #12884

Merged


weixiang1862
Member

Fix ServerStatusService.statusWatchers concurrent modification.

  • Add a unit test to verify that the fix works.
  • Explain briefly why the bug exists and how to fix it.
  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.
  • Update the CHANGES log.

ServerStatusService#registerWatcher is called in the main thread, while ServerStatusService#rebalancedCluster is called in
the cluster coordinator thread. This may cause concurrent modification of the statusWatchers ArrayList while the OAP server is bootstrapping.
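
To illustrate the failure mode described above, here is a minimal sketch, not the actual SkyWalking code. The `Watcher` interface and `registerWhileNotifying` helper are hypothetical stand-ins. Registering a watcher while the list is being iterated is simulated in a single thread so the result is deterministic; with two real threads the same exception can appear intermittently. `CopyOnWriteArrayList` is shown only as one common alternative, not as the fix this pull request applies.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class WatcherListSketch {
    // Hypothetical stand-in for ServerStatusWatcher.
    interface Watcher {
        void onRebalanced();
    }

    // Iterates the list as a notifier would and registers one more watcher
    // in the middle of the iteration. Returns true if the iteration failed
    // with ConcurrentModificationException.
    static boolean registerWhileNotifying(List<Watcher> watchers) {
        watchers.add(() -> { });
        watchers.add(() -> { });
        try {
            for (Watcher w : watchers) {
                w.onRebalanced();
                // Simulates registerWatcher running while notification is in progress.
                watchers.add(() -> { });
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayList failed: "
            + registerWhileNotifying(new ArrayList<>()));
        System.out.println("CopyOnWriteArrayList failed: "
            + registerWhileNotifying(new CopyOnWriteArrayList<>()));
    }
}
```

The `ArrayList` iterator detects the structural change and throws, while the `CopyOnWriteArrayList` iterator works on a snapshot and completes normally.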

ccdee70023589eacb6b37e5648848ace

Comment on lines 67 to 69
if (!bootingStatus.isBooted()) {
    return;
}
Member

I think we should wait here rather than return? If we skip this step, then rebalancedTime and the telemetry metric of cluster_rebalanced_time are not being updated. Worse, ServerStatusWatcher implementations would not get any notification.

Member Author

ServerStatusWatcher will be notified by the method ServerStatusService#bootedNow right after bootstrap completes.

I think notifying once is enough during the bootstrap stage. If you are concerned about the cluster_rebalanced_time telemetry metric, how about moving this return check after the cluster_rebalanced_time metric is created?

Member

OK, thanks for the explanation. Then you need to move bootingStatus.setBooted(true); to the last line of the bootedNow method. Otherwise, the race condition could still happen, right?

> I think notifying once is enough during the bootstrap stage. If you are concerned about the cluster_rebalanced_time telemetry metric, how about moving this return check after the cluster_rebalanced_time metric is created?

That is not correct, because we don't actually do rebalancing notifications.
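
The ordering discussed in this thread (return early from rebalancedCluster until booted, and set the booted flag as the last step of bootedNow) can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the class `BootOrderSketch`, the `Watcher` interface, and the `volatile boolean` flag are hypothetical, and the sketch assumes watchers are only registered before bootedNow runs.

```java
import java.util.ArrayList;
import java.util.List;

class BootOrderSketch {
    // Hypothetical stand-in for ServerStatusWatcher.
    interface Watcher {
        void onEvent(String event);
    }

    private final List<Watcher> watchers = new ArrayList<>();
    // Assumption: a volatile flag stands in for bootingStatus.isBooted().
    private volatile boolean booted = false;

    // Assumed to be called from the main thread during bootstrap only.
    void registerWatcher(Watcher watcher) {
        watchers.add(watcher);
    }

    // Called once when bootstrap completes.
    void bootedNow() {
        for (Watcher w : watchers) {
            w.onEvent("booted");
        }
        // Set the flag last, so rebalancedCluster cannot start iterating
        // the list while bootstrap work on it is still in progress.
        booted = true;
    }

    // Called from the cluster coordinator thread.
    void rebalancedCluster() {
        if (!booted) {
            return; // skip until bootstrap has fully completed
        }
        for (Watcher w : watchers) {
            w.onEvent("rebalanced");
        }
    }

    public static void main(String[] args) {
        BootOrderSketch service = new BootOrderSketch();
        service.registerWatcher(event -> System.out.println("event: " + event));
        service.rebalancedCluster(); // ignored, not booted yet
        service.bootedNow();
        service.rebalancedCluster();
    }
}
```

If the flag were set at the start of bootedNow instead, the coordinator thread could pass the check while the main thread is still working with the list, which is the remaining race the reviewer points out.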

@wu-sheng added the bug (Something isn't working and you are sure it's a bug!) and backend (OAP backend related.) labels on Dec 20, 2024
@wu-sheng added this to the 10.2.0 milestone on Dec 20, 2024
@wu-sheng merged commit 6d262cc into apache:master on Dec 20, 2024
168 checks passed