-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29291: Add a command to refresh/sync hbase:meta table #7058
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: HBASE-29081
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and have few minor comments, let me try to use the copilot to have another round of verification
hbase-server/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RefreshMetaProcedure.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RefreshMetaProcedure.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RefreshMetaProcedure.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RefreshMetaProcedure.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request implements a new command to refresh/sync the hbase:meta table, addressing Jira ticket HBASE-29291. The changes include adding a refreshMeta method across various Admin interfaces and RPC endpoints, integration and unit tests for the new procedure, and corresponding protocol buffer definitions and updates.
- Added refreshMeta methods in HBase admin and client classes.
- Introduced new tests and integration procedures for the meta table refresh workflow.
- Updated RPC and protobuf definitions to support the refreshMeta command.
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/client/ThriftAdmin.java | Added a refreshMeta stub throwing NotImplementedException |
hbase-server/src/test/java/org/apache/hadoop/hbase/rsgroup/VerifyingRSGroupAdmin.java | Delegated refreshMeta call to underlying admin |
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRefreshMetaProcedureIntegration.java | Integrated end-to-end test for meta refresh behavior |
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRefreshMetaProcedure.java | Added various unit tests for RefreshMetaProcedure functionality |
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java | Allowed deletes on system tables during read-only mode |
hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java | Implemented RPC endpoint for refreshMeta |
hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java | Added refreshMeta implementation using RefreshMetaProcedure |
hbase-server/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java | Changed deleteFromMetaTable method visibility to public |
hbase-protocol-shaded/src/main/protobuf/server/master/MasterProcedure.proto | Added RefreshMeta state definitions |
hbase-protocol-shaded/src/main/protobuf/server/master/Master.proto | Added RefreshMeta request/response messages |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncHBaseAdmin.java | Added async refreshMeta call |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncHBaseAdmin.java | Exposed async refreshMeta in client API |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncAdmin.java | Added refreshMeta to AsyncAdmin interface |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/AdminOverAsyncAdmin.java | Exposed refreshMeta over AsyncAdmin |
hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java | Added refreshMeta method in the Admin API |
Comments suppressed due to low confidence (2)
hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRefreshMetaProcedureIntegration.java:168
- [nitpick] The fixed wait time of 3000 ms might be insufficient on slower environments, potentially leading to flaky tests. Consider increasing the timeout or making it configurable.
TEST_UTIL.waitFor(3000, () -> {
hbase-server/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java:745
- The visibility of deleteFromMetaTable has been changed from private to public. Please confirm that exposing this method aligns with the overall API design and does not introduce unintended access.
public static void deleteFromMetaTable(final Connection connection, final List<Delete> deletes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: one more thought, we don't need to address in this PR, but assuming one cluster with 100k regions, would this refresh_meta for all tables still work?
@taklwu - Thanks for the review!
Well, it becomes a heavy operation, and that's something we should test for. I'm thinking this refresh_meta implementation will be a fallback operation, in case we need to refresh_meta manually without depending on the active cluster persisting updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall! I just have some minor comments, and I think the changes to ReadOnlyController.java
can be cleaned up a bit.
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RefreshMetaProcedure.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RefreshMetaProcedure.java
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RefreshMetaProcedure.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java
Outdated
Show resolved
Hide resolved
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java
Outdated
Show resolved
Hide resolved
...-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestRefreshMetaProcedure.java
Outdated
Show resolved
Hide resolved
6e01319
to
f861d08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes based on my feedback look good to me.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please find my initial review comments below.
Set<RegionInfo> latestSet = new HashSet<>(latest); | ||
|
||
// Find regions to add (present in latest but not in current) | ||
for (RegionInfo ri : latest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the loop work if "latest" is null? You checked that both cannot be the null at the same time, but one of them still can.
.findFirst() | ||
.orElse(null); | ||
|
||
if (currentRegion != null && hasBoundaryChanged(currentRegion, latestRegion)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this logic actually work? You say that currentSet
contains the latestRegion
which - based on the COMPARATOR - means that tableName and boundaries are the same. How could they be different here?
Why don't you just rely on region IDs? You can build two hashmaps where key is RegionId and value is RegionInfo. Missing / extra regions are handled as above, comparison would go on boundaries.
for (int i = 0; i < puts.size(); i += CHUNK_SIZE) { | ||
int end = Math.min(puts.size(), i + CHUNK_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can partition a List easily with Guava or Apache Commons libraries.
(Guava)
List<List<Put>> subSets = Lists.partition(puts, CHUNK_SIZE);
for (int attempt = 1; attempt <= 3; attempt++) { | ||
try { | ||
MetaTableAccessor.deleteFromMetaTable(connection, chunk); | ||
LOG.debug("Successfully processed delete batch {}-{}", i, end); | ||
break; | ||
} catch (IOException e) { | ||
LOG.warn("Delete batch {}-{} failed on attempt {}/3", i, end, attempt, e); | ||
if (attempt == 3) { | ||
throw e; | ||
} | ||
try { | ||
Thread.sleep(100); | ||
} catch (InterruptedException ie) { | ||
Thread.currentThread().interrupt(); | ||
throw new IOException("Interrupted during retry", ie); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry logic is redundant with the Put case. I'm not sure if it's already something like this available in the codebase that you can reuse, but at least you could move it to a common method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we make the number of retries and the wait time configurable?
if (current == null || latest == null) { | ||
LOG.warn("Cannot compare null region lists - current: {}, latest: {}", | ||
current != null, latest != null); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've already done this check in the calling method. I think you should either do it here or there only.
/** | ||
* Determines if an update is needed by comparing current and latest regions. | ||
*/ | ||
boolean needsUpdate(List<RegionInfo> current, List<RegionInfo> latest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this method at all. Scanning the two lists you will end up having a number of Puts and Deletes and if both are empty no update is needed (as you already log in the Puts.isEmpty() and Deletes.isEmpty() if branch). No need to do these scans upfront. Without using the results of the first scan, it will just doubles processing time.
hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/ReadOnlyController.java
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
} | ||
|
||
@Test | ||
public void testDetectBoundaryChangesInRegions() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test validates the code path that you've intended for. From the logs:
2025-06-09T19:35:02,351 INFO [Time-limited test {}] procedure.RefreshMetaProcedure(284): Region mismatch: current=2, latest=2
Indicates that the HashSet has already detected the difference between the regions and no need for your further boundary check.
This is because the hash code of MutableRegionInfo
is calculated based on everything in the object including region boundaries.
f861d08
to
2fa72c4
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
2fa72c4
to
b9469a3
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Change-Id: Ia04bb12cdaf580f26cb14d9a34b5963105065faa
b9469a3
to
ee81220
Compare
💔 -1 overall
This message was automatically generated. |
|
||
List<Mutation> mutations = new ArrayList<>(); | ||
|
||
for (String regionId : latestMap.keySet()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Iterate this way:
for (Map.Entry<String, RegionInfo> entry : latestMap.entrySet()) {
String regionId = entry.getKey();
RegionInfo latestRegion = entry.getValue();
...
}
private List<RegionInfo> scanRegionsInTable(FileSystem fs, List<Path> regionDirs) throws IOException { | ||
List<RegionInfo> regions = new ArrayList<>(); | ||
|
||
for (Path regionDir : regionDirs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for using standard single-threaded for loop here and stream parallel one level up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I parallelized at the table level because a directory listing call might have higher latency than reading individual .regioninfo files. So I just used a loop to avoid the overhead of thread coordination.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. Makes sense to me.
private boolean isValidTableDirectory(Path path) { | ||
return !(path.getName().matches("^[._-].*")) && | ||
!(path.getName().startsWith(TableName.META_TABLE_NAME.getNameAsString())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a similar function already in HBase which is the standard way of this check?
!(path.getName().startsWith(TableName.META_TABLE_NAME.getNameAsString())); | ||
} | ||
|
||
private boolean isValidRegionInfo(RegionInfo regionInfo, String expectedEncodedName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. No HBase standard tool for that?
return true; | ||
} | ||
|
||
private boolean isRelevantDirectory(Path path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
💔 -1 overall
This message was automatically generated. |
if (currentRegions == null || latestRegions == null) { | ||
LOG.error("Can not execute update on null lists. " | ||
+ "Meta Table Regions - {}, Storage Regions - {}", currentRegions, latestRegions); | ||
throw new IOException((currentRegions == null ? "current regions" : "latest regions") + "list is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add a space before "list is null". Otherwise, it will be concatenated with "current regions" or "latest regions" without a space between.
Jira: https://issues.apache.org/jira/browse/HBASE-29291