
HBASE-29376 ReplicationLogCleaner.preClean/getDeletableFiles should return early when asyncClusterConnection closes during HMaster stopping #7071


Open: wants to merge 2 commits into master

guluo2016 (Contributor)

For details, see HBASE-29376.

After applying this patch, with hbase.master.cleaner.interval set to 10000 ms, stopping the HMaster service produces the following logs:

2025-06-05T07:30:03,108 INFO  [RS:0;localhost:16020] regionserver.HRegionServer: Closing user regions
2025-06-05T07:30:08,773 WARN  [master/localhost:16000.Chore.1] master.ReplicationLogCleaner: Rpc client has been stopped.
2025-06-05T07:30:09,124 INFO  [RS:0;localhost:16020] regionserver.HRegionServer: ***** STOPPING region server 'localhost,16020,1749079746517' *****


Apache9 requested a review from Copilot on June 5, 2025, 02:55

Copilot AI left a comment


Pull Request Overview

This PR implements early return checks in ReplicationLogCleaner methods when the async cluster connection is closed, ensuring that no file deletion occurs during HMaster shutdown. Key changes include:

  • Adding early return logic in preClean() and getDeletableFiles() based on asyncClusterConnection status.
  • Updating the ReplicationLogCleaner initialization to track MasterServices via a new masterService field.
  • Introducing new tests in TestReplicationLogCleaner.java to verify the early return behavior when the async connection is closed.
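The early-return behavior described in the overview can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from the patch: `ClusterConnection`, `Master`, and `CleanerSketch` are hypothetical stand-ins for HBase's `AsyncClusterConnection`, `MasterServices`, and `ReplicationLogCleaner`, file names are plain strings rather than Hadoop `FileStatus` objects, and the helper mirrors the null-safe check shown in the review excerpt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for HBase's AsyncClusterConnection; only isClosed() is modeled.
interface ClusterConnection {
  boolean isClosed();
}

// Hypothetical stand-in for MasterServices; only the connection accessor is modeled.
interface Master {
  ClusterConnection getAsyncClusterConnection();
}

// Hypothetical sketch of a cleaner delegate that bails out once the connection is closed.
class CleanerSketch {
  private final Master masterService;
  // Set when preClean() gets past the early-return check (for demonstration only).
  boolean preCleanDidWork = false;

  CleanerSketch(Master masterService) {
    this.masterService = masterService;
  }

  // Shared helper, so preClean() and getDeletableFiles() do not duplicate the check.
  private boolean isAsyncClusterConnectionClosed() {
    ClusterConnection conn = masterService.getAsyncClusterConnection();
    return conn != null && conn.isClosed();
  }

  void preClean() {
    if (isAsyncClusterConnectionClosed()) {
      // The master is stopping: skip the preparation work entirely.
      return;
    }
    preCleanDidWork = true; // placeholder for the real preparation work
  }

  Iterable<String> getDeletableFiles(Iterable<String> files) {
    if (isAsyncClusterConnectionClosed()) {
      // Report nothing as deletable, so no WAL is removed during shutdown.
      return Collections.emptyList();
    }
    List<String> deletable = new ArrayList<>();
    for (String f : files) {
      // Placeholder: every file passes through; the real cleaner consults replication state.
      deletable.add(f);
    }
    return deletable;
  }

  public static void main(String[] args) {
    ClusterConnection closed = () -> true;
    CleanerSketch cleaner = new CleanerSketch(() -> closed);
    cleaner.preClean();
    System.out.println(cleaner.preCleanDidWork); // false
    System.out.println(cleaner.getDeletableFiles(List.of("wal.1")).iterator().hasNext()); // false
  }
}
```

Returning an empty result, rather than throwing, keeps the chore quiet during shutdown: the next run simply finds nothing to delete.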

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Changed files:
  • hbase-server/src/main/java/org/apache/hadoop/hbase/replication/master/ReplicationLogCleaner.java: Added early-return checks for when the async connection is closed, and stored the MasterServices instance in masterService.
  • hbase-server/src/test/java/org/apache/hadoop/hbase/replication/master/TestReplicationLogCleaner.java: Added tests to validate that preClean() and getDeletableFiles() return early when the async connection is closed.

@@ -77,6 +78,12 @@ public void preClean() {
    if (this.getConf() == null) {
      return;
    }

    if (masterService.getAsyncClusterConnection().isClosed()) {

Copilot AI Jun 5, 2025


[nitpick] The asyncClusterConnection closure check is duplicated in both preClean() and getDeletableFiles(). Consider extracting this logic into a separate private helper method to reduce code duplication and improve maintainability.


  */
  private boolean isAsyncClusterConnectionClosed() {
    AsyncClusterConnection asyncClusterConnection = masterService.getAsyncClusterConnection();
    return asyncClusterConnection != null && asyncClusterConnection.isClosed();
Contributor Author


I added a null check for asyncClusterConnection here, because HMaster performs the same check:

// HMaster.shutdown() 
if (this.asyncClusterConnection != null) {
    this.asyncClusterConnection.close();
}

Contributor


If asyncClusterConnection is null, should we also skip the deletion check?

Contributor Author


I'm a bit busy right now; I'll recheck the code this weekend.
Thanks for your review.

Contributor Author


My mistake: the check should be skipped when asyncClusterConnection == null.
Thanks.
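The difference between the two null-handling choices discussed in this thread can be shown with a small comparison. This is a hedged sketch, not the merged code: `ClusterConnection` is a hypothetical stand-in for `AsyncClusterConnection`, and `shouldSkip` is a hypothetical helper that assumes the agreed fix treats a null connection the same way as a closed one.

```java
// Hypothetical stand-in for HBase's AsyncClusterConnection; only isClosed() is modeled.
interface ClusterConnection {
  boolean isClosed();
}

class ConnectionChecks {
  // Helper as shown in the review excerpt: a null connection counts as "not closed",
  // so the cleaner would go on to evaluate files for deletion.
  static boolean isClosedOriginal(ClusterConnection conn) {
    return conn != null && conn.isClosed();
  }

  // Hypothetical revised helper: a null connection is treated like a closed one,
  // so callers skip the deletion check in both cases.
  static boolean shouldSkip(ClusterConnection conn) {
    return conn == null || conn.isClosed();
  }

  public static void main(String[] args) {
    ClusterConnection open = () -> false;
    ClusterConnection closed = () -> true;
    System.out.println(isClosedOriginal(null) + " " + shouldSkip(null)); // false true
    System.out.println(isClosedOriginal(closed) + " " + shouldSkip(closed)); // true true
    System.out.println(isClosedOriginal(open) + " " + shouldSkip(open)); // false false
  }
}
```

Only the null row differs: with the revised check, a master whose connection was never created (or was already cleared) does not delete any files.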


guluo2016 (Contributor Author)

TestCleanerChore passes locally. Let's try a rebuild.

[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.547 s - in org.apache.hadoop.hbase.master.cleaner.TestCleanerChore
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 14, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO]
[INFO] --- surefire:3.1.0:test (secondPartTestsExecution) @ hbase-server ---
[INFO] Tests are skipped.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache HBase 4.0.0-alpha-1-SNAPSHOT:

guluo2016 force-pushed the hbase_HBASE-29376 branch from f9cd7da to 090de30 on June 5, 2025, 15:42

guluo2016 force-pushed the hbase_HBASE-29376 branch from 090de30 to 8c0e8d6 on June 29, 2025, 13:02
guluo2016 (Contributor Author)

Trigger a rebuild.


guluo2016 added 2 commits on July 4, 2025, 22:25
…eturn early when asyncClusterConnection closes during HMaster stopping
guluo2016 force-pushed the hbase_HBASE-29376 branch from 8c0e8d6 to fc7a7b6 on July 4, 2025, 14:33
Apache-HBase
🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 27s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 37s master passed
+1 💚 compile 3m 14s master passed
+1 💚 checkstyle 0m 36s master passed
+1 💚 spotbugs 1m 34s master passed
+1 💚 spotless 0m 46s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 52s the patch passed
+1 💚 compile 3m 16s the patch passed
+1 💚 javac 3m 16s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 35s /results-checkstyle-hbase-server.txt hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 spotbugs 1m 37s the patch passed
+1 💚 hadoopcheck 11m 16s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 42s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
37m 57s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7071/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7071
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 22a3cbdf4d50 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / fc7a7b6
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7071/5/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Apache-HBase
🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 51s master passed
+1 💚 compile 0m 59s master passed
+1 💚 javadoc 0m 28s master passed
+1 💚 shadedjars 6m 6s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 10s the patch passed
+1 💚 compile 0m 57s the patch passed
+1 💚 javac 0m 57s the patch passed
+1 💚 javadoc 0m 27s the patch passed
+1 💚 shadedjars 6m 4s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 216m 18s hbase-server in the patch passed.
243m 45s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7071/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7071
Optional Tests javac javadoc unit compile shadedjars
uname Linux 69b0ba1aa348 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / fc7a7b6
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7071/5/testReport/
Max. process+thread count 5489 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7071/5/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Labels
None yet
Projects
None yet

3 participants