
HBASE-29291: Add a command to refresh/sync hbase:meta table #7058


Open

Kota-SH wants to merge 1 commit into base branch HBASE-29081

Kota-SH
Contributor

@Kota-SH Kota-SH commented Jun 2, 2025

@Kota-SH Kota-SH marked this pull request as draft June 2, 2025 20:05
@Kota-SH Kota-SH marked this pull request as ready for review June 2, 2025 20:08
Contributor

@taklwu taklwu left a comment


LGTM, with a few minor comments. Let me try using Copilot for another round of verification.

@taklwu taklwu requested a review from Copilot June 3, 2025 19:10

@Copilot Copilot AI left a comment


Pull Request Overview

This pull request implements a new command to refresh/sync the hbase:meta table, addressing Jira ticket HBASE-29291. The changes include adding a refreshMeta method across various Admin interfaces and RPC endpoints, integration and unit tests for the new procedure, and corresponding protocol buffer definitions and updates.

  • Added refreshMeta methods in HBase admin and client classes.
  • Introduced new tests and integration procedures for the meta table refresh workflow.
  • Updated RPC and protobuf definitions to support the refreshMeta command.

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

Summary per file:
  • hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/client/ThriftAdmin.java: Added a refreshMeta stub throwing NotImplementedException
  • hbase-server/src/test/java/org/apache/hadoop/hbase/rsgroup/VerifyingRSGroupAdmin.java: Delegated refreshMeta call to underlying admin
  • hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRefreshMetaProcedureIntegration.java: Integrated end-to-end test for meta refresh behavior
  • hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRefreshMetaProcedure.java: Added various unit tests for RefreshMetaProcedure functionality
  • hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java: Allowed deletes on system tables during read-only mode
  • hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java: Implemented RPC endpoint for refreshMeta
  • hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java: Added refreshMeta implementation using RefreshMetaProcedure
  • hbase-server/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java: Changed deleteFromMetaTable method visibility to public
  • hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto: Added RefreshMeta state definitions
  • hbase-protocol-shaded/src/main/protobuf/server/master/Master.proto: Added RefreshMeta request/response messages
  • hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncHBaseAdmin.java: Added async refreshMeta call
  • hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncHBaseAdmin.java: Exposed async refreshMeta in client API
  • hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncAdmin.java: Added refreshMeta to AsyncAdmin interface
  • hbase-client/src/main/java/org/apache/hadoop/hbase/client/AdminOverAsyncAdmin.java: Exposed refreshMeta over AsyncAdmin
  • hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java: Added refreshMeta method in the Admin API
Comments suppressed due to low confidence (2)

hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRefreshMetaProcedureIntegration.java:168

  • [nitpick] The fixed wait time of 3000 ms might be insufficient on slower environments, potentially leading to flaky tests. Consider increasing the timeout or making it configurable.
TEST_UTIL.waitFor(3000, () -> {
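A minimal sketch of the nitpick, assuming the timeout is read from a system property. The property name `refreshmeta.test.wait.ms` and the `ConfigurableWait` class are hypothetical, not existing HBase settings. It is a plain-JDK poll loop; in the real test, the resolved value would simply replace the hard-coded `3000` passed to `TEST_UTIL.waitFor`.

```java
import java.util.function.BooleanSupplier;

final class ConfigurableWait {
  // Hypothetical property name; not an existing HBase configuration key.
  static final String WAIT_PROPERTY = "refreshmeta.test.wait.ms";
  static final long DEFAULT_WAIT_MS = 30_000L;

  /** Returns the timeout from the system property, or a generous default if it is unset. */
  static long timeoutMs() {
    return Long.getLong(WAIT_PROPERTY, DEFAULT_WAIT_MS);
  }

  /**
   * Polls the condition every intervalMs until it holds (returns true) or
   * timeoutMs elapses (returns false). An interrupt also returns false.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

On a slow CI host one could then pass, for example, `-Drefreshmeta.test.wait.ms=60000` without touching the test code.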

hbase-server/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java:745

  • The visibility of deleteFromMetaTable has been changed from private to public. Please confirm that exposing this method aligns with the overall API design and does not introduce unintended access.
public static void deleteFromMetaTable(final Connection connection, final List<Delete> deletes)

Contributor

@taklwu taklwu left a comment


nit: one more thought, we don't need to address in this PR, but assuming one cluster with 100k regions, would this refresh_meta for all tables still work?

@Kota-SH
Contributor Author

Kota-SH commented Jun 3, 2025

@taklwu - Thanks for the review!

nit: one more thought, we don't need to address in this PR, but assuming one cluster with 100k regions, would this refresh_meta for all tables still work?

Well, it becomes a heavy operation, and that's something we should test for.
Another idea is to periodically persist a region-to-HFiles mapping in storage and apply edits on demand based on that mapping. For now, it is just an idea that was discussed upstream a long time ago.

I see this refresh_meta implementation as a fallback operation, for cases where we need to refresh meta manually without depending on the active cluster persisting updates.


@kgeisz kgeisz left a comment


This looks good overall! I just have some minor comments, and I think the changes to ReadOnlyController.java can be cleaned up a bit.

@Kota-SH Kota-SH force-pushed the HBASE-29291-ref-meta branch from 6e01319 to f861d08 June 9, 2025 16:06
@Kota-SH Kota-SH requested a review from kgeisz June 9, 2025 16:37

@kgeisz kgeisz left a comment


The changes based on my feedback look good to me.

Contributor

@anmolnar anmolnar left a comment


Please find my initial review comments below.

Set<RegionInfo> latestSet = new HashSet<>(latest);

// Find regions to add (present in latest but not in current)
for (RegionInfo ri : latest) {
Contributor


Does the loop work if "latest" is null? You checked that both cannot be null at the same time, but one of them still can be.
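To illustrate the concern: a for-each loop over a `null` list throws `NullPointerException`, so any guard has to run before iteration starts. The sketch below uses plain `String` IDs in place of `RegionInfo`, and the `RegionListGuard` names are hypothetical. It validates once at the entry point and fails fast instead of treating a missing list as empty, since an empty storage-side list could look as if every region had disappeared.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RegionListGuard {
  /** Fails fast if either list is null, before any loop can throw NullPointerException. */
  static void requireBoth(List<String> current, List<String> latest) throws IOException {
    if (current == null || latest == null) {
      throw new IOException((current == null ? "current" : "latest") + " region list is null");
    }
  }

  /** Validates both inputs once, then returns IDs present in latest but absent from current. */
  static List<String> toAdd(List<String> current, List<String> latest) throws IOException {
    requireBoth(current, latest);
    Set<String> currentSet = new HashSet<>(current);
    List<String> added = new ArrayList<>();
    for (String id : latest) {
      if (!currentSet.contains(id)) {
        added.add(id);
      }
    }
    return added;
  }
}
```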

.findFirst()
.orElse(null);

if (currentRegion != null && hasBoundaryChanged(currentRegion, latestRegion)) {
Contributor


Does this logic actually work? You say that currentSet contains the latestRegion, which, based on the COMPARATOR, means that the tableName and boundaries are the same. How could they be different here?

Why don't you just rely on region IDs? You can build two hash maps where the key is the region ID and the value is the RegionInfo. Missing or extra regions are handled as above, and the comparison would then be done on boundaries.
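A sketch of the reviewer's suggestion, assuming a hypothetical `SimpleRegion` record (an ID plus string start/end keys) stands in for `RegionInfo`, which carries more fields in reality. Both sides are indexed by region ID: IDs only on storage become adds, IDs only in meta become deletes, and only IDs present on both sides get a boundary comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for RegionInfo: an identifier plus start and end row keys. */
record SimpleRegion(String id, String startKey, String endKey) {}

final class RegionDiff {
  final List<SimpleRegion> toAdd = new ArrayList<>();
  final List<SimpleRegion> toDelete = new ArrayList<>();
  final List<SimpleRegion> boundaryChanged = new ArrayList<>();

  /** Indexes regions by ID; assumes IDs are unique within one list. */
  static Map<String, SimpleRegion> byId(List<SimpleRegion> regions) {
    Map<String, SimpleRegion> map = new HashMap<>();
    for (SimpleRegion r : regions) {
      map.put(r.id(), r);
    }
    return map;
  }

  static RegionDiff compute(List<SimpleRegion> inMeta, List<SimpleRegion> onStorage) {
    Map<String, SimpleRegion> metaById = byId(inMeta);
    Map<String, SimpleRegion> storageById = byId(onStorage);
    RegionDiff diff = new RegionDiff();
    for (Map.Entry<String, SimpleRegion> e : storageById.entrySet()) {
      SimpleRegion existing = metaById.get(e.getKey());
      SimpleRegion fresh = e.getValue();
      if (existing == null) {
        diff.toAdd.add(fresh); // on storage but missing from meta
      } else if (!Objects.equals(existing.startKey(), fresh.startKey())
          || !Objects.equals(existing.endKey(), fresh.endKey())) {
        diff.boundaryChanged.add(fresh); // same ID, different boundaries
      }
    }
    for (Map.Entry<String, SimpleRegion> e : metaById.entrySet()) {
      if (!storageById.containsKey(e.getKey())) {
        diff.toDelete.add(e.getValue()); // in meta but gone from storage
      }
    }
    return diff;
  }
}
```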

Comment on lines 218 to 219
for (int i = 0; i < puts.size(); i += CHUNK_SIZE) {
int end = Math.min(puts.size(), i + CHUNK_SIZE);
Contributor


You can partition a List easily with Guava or Apache Commons libraries.

(Guava)

List<List<Put>> subSets = Lists.partition(puts, CHUNK_SIZE);

https://www.baeldung.com/java-list-split
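Guava's `Lists.partition(list, size)` returns consecutive sublists of the given size, the last one possibly smaller. If adding the dependency at this call site is not wanted, a JDK-only equivalent built on `List.subList` is short; the `Chunks` name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class Chunks {
  /** Splits a list into consecutive views of at most chunkSize elements, like Guava's Lists.partition. */
  static <T> List<List<T>> partition(List<T> items, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
    }
    List<List<T>> chunks = new ArrayList<>();
    for (int i = 0; i < items.size(); i += chunkSize) {
      // subList returns a view backed by the original list, so no elements are copied.
      chunks.add(items.subList(i, Math.min(items.size(), i + chunkSize)));
    }
    return chunks;
  }
}
```

The manual index loop over `puts` would then become `for (List<Put> chunk : Chunks.partition(puts, CHUNK_SIZE)) { ... }`.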

Comment on lines 248 to 229
for (int attempt = 1; attempt <= 3; attempt++) {
  try {
    MetaTableAccessor.deleteFromMetaTable(connection, chunk);
    LOG.debug("Successfully processed delete batch {}-{}", i, end);
    break;
  } catch (IOException e) {
    LOG.warn("Delete batch {}-{} failed on attempt {}/3", i, end, attempt, e);
    if (attempt == 3) {
      throw e;
    }
    try {
      Thread.sleep(100);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IOException("Interrupted during retry", ie);
    }
  }
Contributor


The retry logic duplicates the Put case. I'm not sure whether something like this is already available in the codebase for you to reuse, but at least you could move it into a common method.

Contributor


Shall we make the number of retries and the wait time configurable?
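Both suggestions could be combined in one helper, sketched here under assumptions: the attempt count and pause become parameters (so they could be read from configuration instead of the literals `3` and `100`), and the Put and Delete paths each pass their batch operation as a lambda. The `MetaRetry` and `IOAction` names are hypothetical, and `System.err` stands in for the class's logger.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

final class MetaRetry {
  /** One retryable I/O step, e.g. writing or deleting a single batch of meta rows. */
  @FunctionalInterface
  interface IOAction {
    void run() throws IOException;
  }

  /**
   * Runs the action up to maxAttempts times, pausing pauseMs between failed attempts.
   * The last IOException is rethrown once attempts are exhausted.
   */
  static void withRetries(String what, int maxAttempts, long pauseMs, IOAction action)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1: " + maxAttempts);
    }
    for (int attempt = 1; ; attempt++) {
      try {
        action.run();
        return;
      } catch (IOException e) {
        if (attempt >= maxAttempts) {
          throw e;
        }
        // Stand-in for LOG.warn(...) in the real procedure.
        System.err.printf("%s failed on attempt %d/%d: %s%n", what, attempt, maxAttempts, e);
      }
      try {
        Thread.sleep(pauseMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        InterruptedIOException iioe = new InterruptedIOException("Interrupted while retrying " + what);
        iioe.initCause(ie);
        throw iioe;
      }
    }
  }
}
```

The delete loop in this excerpt would then reduce to something like `MetaRetry.withRetries("delete batch", maxAttempts, pauseMs, () -> MetaTableAccessor.deleteFromMetaTable(connection, chunk));`, where `maxAttempts` and `pauseMs` are hypothetical configured values.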

Comment on lines 273 to 277
if (current == null || latest == null) {
  LOG.warn("Cannot compare null region lists - current: {}, latest: {}",
      current != null, latest != null);
  return false;
}
Contributor


You've already done this check in the calling method. I think you should do it in only one place, either here or there.

/**
* Determines if an update is needed by comparing current and latest regions.
*/
boolean needsUpdate(List<RegionInfo> current, List<RegionInfo> latest) {
Contributor


I don't think you need this method at all. Scanning the two lists, you will end up with a number of Puts and Deletes, and if both are empty, no update is needed (as you already log in the Puts.isEmpty() and Deletes.isEmpty() if branch). There's no need to do these scans upfront: since the results of the first scan aren't used, it just doubles the processing time.
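A sketch of the single-pass shape described here, reduced to hypothetical string region IDs with the boundary comparison omitted for brevity. The one scan produces the Put and Delete work lists directly, and an empty plan is itself the "no update needed" signal, so a separate pre-scan is unnecessary. The `SinglePassSync` and `Plan` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SinglePassSync {
  /** Work produced by one scan: IDs to write to meta and IDs to remove from it. */
  record Plan(List<String> puts, List<String> deletes) {
    boolean isNoop() {
      return puts.isEmpty() && deletes.isEmpty();
    }
  }

  static Plan plan(List<String> inMeta, List<String> onStorage) {
    Set<String> meta = new HashSet<>(inMeta);
    Set<String> storage = new HashSet<>(onStorage);
    List<String> puts = new ArrayList<>();
    List<String> deletes = new ArrayList<>();
    for (String id : storage) {
      if (!meta.contains(id)) {
        puts.add(id); // on storage, missing from meta
      }
    }
    for (String id : meta) {
      if (!storage.contains(id)) {
        deletes.add(id); // in meta, gone from storage
      }
    }
    return new Plan(puts, deletes);
  }
}
```

A caller would start with `if (plan.isNoop()) { return; }` and otherwise submit the two lists, so the emptiness check takes over the role of `needsUpdate`.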


}

@Test
public void testDetectBoundaryChangesInRegions() throws Exception {
Contributor

@anmolnar anmolnar Jun 10, 2025


I don't think this test exercises the code path you intended. From the logs:

2025-06-09T19:35:02,351 INFO  [Time-limited test {}] procedure.RefreshMetaProcedure(284): Region mismatch: current=2, latest=2

This indicates that the HashSet has already detected the difference between the regions, so your additional boundary check is not needed.

This is because the hash code of MutableRegionInfo is calculated based on everything in the object including region boundaries.
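The effect can be reproduced with a hypothetical value type whose `equals`/`hashCode` include the region boundaries (a Java `record` does this for every component). A region with the same ID but a moved end key is simply not found in the set, so a boundary check guarded by set membership never sees a change. This only models the behavior described for `MutableRegionInfo`; it does not use HBase classes, and `HashSetBoundaryDemo` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class HashSetBoundaryDemo {
  /** Hypothetical region value; a record's equals/hashCode cover every component, boundaries included. */
  record HashedRegion(String table, String startKey, String endKey, long regionId) {}

  /**
   * Mirrors the pattern under review: only regions already found in the current set are
   * checked for boundary changes. Returns how many boundary changes that check detects.
   */
  static int boundaryChangesSeen(List<HashedRegion> current, List<HashedRegion> latest) {
    Set<HashedRegion> currentSet = new HashSet<>(current);
    int seen = 0;
    for (HashedRegion r : latest) {
      if (!currentSet.contains(r)) {
        continue; // the set lookup already reported this region as a mismatch
      }
      for (HashedRegion c : current) {
        boolean sameId = c.regionId() == r.regionId();
        boolean boundsDiffer = !Objects.equals(c.startKey(), r.startKey())
            || !Objects.equals(c.endKey(), r.endKey());
        if (sameId && boundsDiffer) {
          seen++; // with unique region IDs, set membership means equal bounds, so this never runs
        }
      }
    }
    return seen;
  }
}
```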

@Kota-SH Kota-SH force-pushed the HBASE-29291-ref-meta branch from f861d08 to 2fa72c4 June 16, 2025 21:11

@Kota-SH Kota-SH force-pushed the HBASE-29291-ref-meta branch from 2fa72c4 to b9469a3 June 17, 2025 15:43

Change-Id: Ia04bb12cdaf580f26cb14d9a34b5963105065faa
@Kota-SH Kota-SH force-pushed the HBASE-29291-ref-meta branch from b9469a3 to ee81220 June 23, 2025 21:50
@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 buf 0m 0s buf was not available.
+0 🆗 buf 0m 0s buf was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-29081 Compile Tests _
+0 🆗 mvndep 0m 44s Maven dependency ordering for branch
+1 💚 mvninstall 4m 33s HBASE-29081 passed
+1 💚 compile 7m 11s HBASE-29081 passed
+1 💚 checkstyle 1m 48s HBASE-29081 passed
+1 💚 spotbugs 8m 20s HBASE-29081 passed
+1 💚 spotless 1m 0s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 3m 36s the patch passed
+1 💚 compile 6m 56s the patch passed
+1 💚 cc 6m 56s the patch passed
+1 💚 javac 6m 56s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 16s /buildtool-patch-checkstyle-hbase-server.txt The patch fails to run checkstyle in hbase-server
-0 ⚠️ rubocop 0m 30s /results-rubocop.txt The patch generated 5 new + 444 unchanged - 0 fixed = 449 total (was 444)
-1 ❌ spotbugs 1m 43s /new-spotbugs-hbase-server.html hbase-server generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
+1 💚 hadoopcheck 11m 59s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 hbaseprotoc 2m 11s the patch passed
-1 ❌ spotless 0m 21s patch has 30 errors when running spotless:check, run spotless:apply to fix.
_ Other Tests _
+1 💚 asflicense 0m 43s The patch does not generate ASF License warnings.
Total runtime: 67m 12s
Reason Tests
SpotBugs module:hbase-server
org.apache.hadoop.hbase.master.procedure.RefreshMetaProcedure.compareAndUpdateRegions(Map, Map, Connection, MasterProcedureEnv) makes inefficient use of keySet iterator instead of entrySet iterator At RefreshMetaProcedure.java:[line 175]
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7058/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7058
JIRA Issue HBASE-29291
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless cc buflint bufcompat hbaseprotoc rubocop
uname Linux 1ca46493f99e 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-29081 / ee81220
Default Java Eclipse Adoptium-17.0.11+9
spotless https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7058/5/artifact/yetus-general-check/output/patch-spotless.txt
Max. process+thread count 86 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7058/5/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3 rubocop=1.37.1
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.


List<Mutation> mutations = new ArrayList<>();

for (String regionId : latestMap.keySet()) {
Contributor


Iterate this way:

for (Map.Entry<String, RegionInfo> entry : latestMap.entrySet()) {
  String regionId = entry.getKey();
  RegionInfo latestRegion = entry.getValue();
...
}

private List<RegionInfo> scanRegionsInTable(FileSystem fs, List<Path> regionDirs) throws IOException {
List<RegionInfo> regions = new ArrayList<>();

for (Path regionDir : regionDirs) {
Contributor


What's the reason for using a standard single-threaded for loop here and a parallel stream one level up?

Contributor Author


I parallelized at the table level because a directory listing call may have higher latency than reading individual .regioninfo files, so within a table I just used a loop to avoid the overhead of thread coordination.
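The structure described here, parallel across tables and a plain loop within each table, might be sketched like this. `readRegionInfo` is a hypothetical function standing in for reading one `.regioninfo` file; real code would go through the Hadoop `FileSystem` API and handle `IOException`. Note that `parallelStream()` runs on the common `ForkJoinPool` by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TableScan {
  /** Tables are processed in parallel; each table's regions are read sequentially. */
  static Map<String, List<String>> scanAll(
      Map<String, List<String>> regionDirsByTable, Function<String, String> readRegionInfo) {
    return regionDirsByTable.entrySet().parallelStream()
        .collect(Collectors.toConcurrentMap(
            Map.Entry::getKey,
            e -> scanTable(e.getValue(), readRegionInfo)));
  }

  /** Sequential per-table loop: one read per region directory, no extra thread coordination. */
  static List<String> scanTable(List<String> regionDirs, Function<String, String> readRegionInfo) {
    List<String> regions = new ArrayList<>(regionDirs.size());
    for (String dir : regionDirs) {
      regions.add(readRegionInfo.apply(dir));
    }
    return regions;
  }
}
```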

Contributor


Gotcha. Makes sense to me.

Comment on lines +340 to +343
private boolean isValidTableDirectory(Path path) {
  return !(path.getName().matches("^[._-].*")) &&
      !(path.getName().startsWith(TableName.META_TABLE_NAME.getNameAsString()));
}
Contributor


Is there a similar function already in HBase which is the standard way of this check?

!(path.getName().startsWith(TableName.META_TABLE_NAME.getNameAsString()));
}

private boolean isValidRegionInfo(RegionInfo regionInfo, String expectedEncodedName) {
Contributor


Same here. Is there no standard HBase utility for that?

return true;
}

private boolean isRelevantDirectory(Path path) {
Contributor


Same here.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-29081 Compile Tests _
+0 🆗 mvndep 0m 11s Maven dependency ordering for branch
+1 💚 mvninstall 3m 49s HBASE-29081 passed
+1 💚 compile 2m 59s HBASE-29081 passed
+1 💚 javadoc 2m 4s HBASE-29081 passed
+1 💚 shadedjars 6m 57s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 4m 12s the patch passed
+1 💚 compile 3m 17s the patch passed
+1 💚 javac 3m 17s the patch passed
+1 💚 javadoc 1m 47s the patch passed
+1 💚 shadedjars 6m 58s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 0m 42s hbase-protocol-shaded in the patch passed.
+1 💚 unit 1m 54s hbase-client in the patch passed.
-1 ❌ unit 244m 31s /patch-unit-hbase-server.txt hbase-server in the patch failed.
+1 💚 unit 6m 46s hbase-thrift in the patch passed.
-1 ❌ unit 7m 46s /patch-unit-hbase-shell.txt hbase-shell in the patch failed.
Total runtime: 300m 28s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7058/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7058
JIRA Issue HBASE-29291
Optional Tests javac javadoc unit compile shadedjars
uname Linux 97f3045de941 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-29081 / ee81220
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7058/5/testReport/
Max. process+thread count 5528 (vs. ulimit of 30000)
modules C: hbase-protocol-shaded hbase-client hbase-server hbase-thrift hbase-shell U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7058/5/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

if (currentRegions == null || latestRegions == null) {
  LOG.error("Can not execute update on null lists. "
      + "Meta Table Regions - {}, Storage Regions - {}", currentRegions, latestRegions);
  throw new IOException((currentRegions == null ? "current regions" : "latest regions") + "list is null");
Contributor


nit: Add a space before "list is null". Otherwise, it will be concatenated with "current regions" or "latest regions" without a space between.

Labels: None yet
Projects: None yet

6 participants