
Conversation

@ankitsol

Non-continuous incremental backup uses the backup system table to identify which bulkload HFiles it needs to copy.

With continuous incremental backup, this change uses BulkLoadCollectorJob to identify the bulkload HFiles to copy. BulkLoadCollectorJob runs on the backed-up WALs instead of the source cluster WALs.

JIRA: https://issues.apache.org/jira/browse/HBASE-29656
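
At a high level, the continuous path looks roughly like the sketch below, assembled from the code fragments quoted later in this review; it illustrates the flow and is not the exact patch.

  // Continuous incremental backup: discover bulk-load HFiles from the backed-up WALs
  // (sketch assembled from the snippets quoted in this review; not the exact patch)
  Path collectorOutput = new Path(getBulkOutputDir(), BULKLOAD_COLLECTOR_OUTPUT);
  for (TableName table : tablesToBackup) {
    // backed-up WAL directories for this table
    String walDirsCsv = String.join(",", tablesToWALFileList.get(table));
    // scan those WALs for bulk-load events between the previous backup timestamp and
    // the replication checkpoint captured at backup start
    List<Path> bulkLoadFiles =
      BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv, collectorOutput, table, table,
        tablesToPrevBackupTs.get(table), backupInfo.getIncrCommittedWalTs());
    if (bulkLoadFiles.isEmpty()) {
      LOG.info("No bulk-load files found for table {}", table);
      continue;
    }
    mergeSplitAndCopyBulkloadedHFiles(bulkLoadFiles, table, tgtFs);
  }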

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 12s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ HBASE-28957 Compile Tests _
+1 💚 mvninstall 5m 10s HBASE-28957 passed
+1 💚 compile 0m 47s HBASE-28957 passed
-0 ⚠️ checkstyle 0m 13s /buildtool-branch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 0m 43s HBASE-28957 passed
+1 💚 spotless 1m 4s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 35s the patch passed
+1 💚 compile 0m 45s the patch passed
+1 💚 javac 0m 45s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 10s /buildtool-patch-checkstyle-hbase-backup.txt The patch fails to run checkstyle in hbase-backup
+1 💚 spotbugs 0m 50s the patch passed
+1 💚 hadoopcheck 13m 58s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 58s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 12s The patch does not generate ASF License warnings.
38m 13s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7400/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7400
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 9a0b382755a2 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / d218c57
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 71 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7400/1/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 12s Docker mode activated.
-0 ⚠️ yetus 0m 4s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ HBASE-28957 Compile Tests _
+1 💚 mvninstall 4m 33s HBASE-28957 passed
+1 💚 compile 0m 25s HBASE-28957 passed
+1 💚 javadoc 0m 19s HBASE-28957 passed
+1 💚 shadedjars 7m 23s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 38s the patch passed
+1 💚 compile 0m 26s the patch passed
+1 💚 javac 0m 26s the patch passed
+1 💚 javadoc 0m 16s the patch passed
+1 💚 shadedjars 7m 33s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 36m 7s hbase-backup in the patch passed.
63m 6s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7400/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7400
Optional Tests javac javadoc unit compile shadedjars
uname Linux 248b94c6ba12 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision HBASE-28957 / d218c57
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7400/1/testReport/
Max. process+thread count 1853 (vs. ulimit of 30000)
modules C: hbase-backup U: hbase-backup
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7400/1/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Contributor
@anmolnar anmolnar left a comment

lgtm.

@taklwu taklwu requested a review from Copilot October 21, 2025 21:38
Copilot AI left a comment

Pull Request Overview

This PR implements WAL scanning to identify bulk load operations for continuous incremental backups. Instead of relying on the backup system table to track bulk load hfiles, continuous incremental backups now use BulkLoadCollectorJob to scan backed-up WAL files directly.

Key changes:

  • BulkLoadCollectorJob now runs on backed-up WALs instead of source cluster WALs for continuous incremental backups
  • The backup system table is no longer used to track bulk loads when continuous backup is enabled
  • The replication checkpoint timestamp is captured at backup start for filtering WAL entries

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File Description
TestIncrementalBackupWithContinuous.java Updated test assertions to verify bulk load rows are not stored in system table for continuous backups
TestBackupBase.java Added empty maps to method signatures for compatibility
BulkLoadCollectorJob.java Changed constructor visibility from protected to public
TableBackupClient.java Added capture of replication checkpoint timestamp at backup start
IncrementalTableBackupClient.java Implemented WAL-based bulk load collection using BulkLoadCollectorJob for continuous backups
BackupObserver.java Modified to skip registering bulk loads in system table when continuous backup is enabled


Path archive = new Path(archiveDir, filename);
List<Path> bulkloadPaths =
BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv, collectorOutput, table, table,
tablesToPrevBackupTs.get(table), backupInfo.getIncrCommittedWalTs());
Copilot AI Oct 21, 2025

Potential NullPointerException if tablesToPrevBackupTs.get(table) returns null. The map may not contain an entry for the table if no previous backup exists, which would cause an NPE when the primitive long is expected.

Suggested change
tablesToPrevBackupTs.get(table), backupInfo.getIncrCommittedWalTs());
tablesToPrevBackupTs.get(table) != null ? tablesToPrevBackupTs.get(table) : 0L, backupInfo.getIncrCommittedWalTs());

// Continuous incremental backup: run BulkLoadCollectorJob over backed-up WALs
Path collectorOutput = new Path(getBulkOutputDir(), BULKLOAD_COLLECTOR_OUTPUT);
for (TableName table : tablesToBackup) {
String walDirsCsv = String.join(",", tablesToWALFileList.get(table));
Copilot AI Oct 21, 2025

Potential NullPointerException if tablesToWALFileList.get(table) returns null. If the table has no WAL files in the map, String.join will throw an NPE when attempting to join null.

Suggested change
String walDirsCsv = String.join(",", tablesToWALFileList.get(table));
List<String> walDirs = tablesToWALFileList.get(table);
String walDirsCsv = String.join(",", walDirs != null ? walDirs : java.util.Collections.emptyList());
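
For what it's worth, Map#getOrDefault expresses the same null-safe default a bit more compactly (assuming the map is a Map<TableName, List<String>>):

  // null-safe variant of the same lookup (sketch)
  String walDirsCsv = String.join(",",
    tablesToWALFileList.getOrDefault(table, java.util.Collections.emptyList()));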

Contributor
@taklwu taklwu left a comment

Just one minor comment about the change in walToHFiles regarding WALInputFormat.END_TIME_KEY: do we have any existing or new unit tests that cover this timestamp change?

conf.set(WALInputFormat.END_TIME_KEY, Long.toString(backupInfo.getIncrCommittedWalTs()));
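
For reference, WALInputFormat filters WAL entries by a start/end time window, so the committed-WAL timestamp becomes the upper bound of the scan, roughly as below. Only the END_TIME_KEY line is quoted from the patch; the START_TIME_KEY line and the previousBackupTs name are assumptions about the surrounding code.

  // Bound the WAL scan for the incremental pass (sketch)
  conf.set(WALInputFormat.START_TIME_KEY, Long.toString(previousBackupTs)); // assumed lower bound
  conf.set(WALInputFormat.END_TIME_KEY, Long.toString(backupInfo.getIncrCommittedWalTs()));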


if (fullyBackedUpTables.contains(tableName)) {
if (
fullyBackedUpTables.contains(tableName) && !continuousBackupTableSet.containsKey(tableName)
Contributor

[nit] Before this change, did you see a lot of entries being repeatedly registered for the same table? If so, and if this happens outside of unit tests as well, do you think it's a logic error in whatever triggers it?

Contributor

I have a suggestion. Perhaps we could add a comment stating that for continuous backup, this isn't necessary, as everything will be utilized from the WAL backup location.

Author
@ankitsol ankitsol Oct 22, 2025

Sorry, I didn't understand the question completely. BackupObserver#registerBulkLoad() is called for each bulkload operation and registers it in the backup system table.

Contributor

Adding the comment should address my concern, and yeah, !continuousBackupTableSet.containsKey(tableName) means only non-continuous backups need to register the bulkload here.
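
With the suggested comment in place, the guard might read roughly as follows (a sketch; the body of the if block is elided, as in the quoted diff):

  // Tables under continuous backup do not register bulk loads in the backup system
  // table: incremental backups discover bulk-load HFiles from the backed-up WALs
  // instead, so only classic (non-continuous) backups need this entry.
  if (
    fullyBackedUpTables.contains(tableName) && !continuousBackupTableSet.containsKey(tableName)
  ) {
    // ... register the bulk load in the backup system table (unchanged)
  }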

Comment on lines 217 to 221
if (bulkLoadFiles.isEmpty()) {
LOG.info("No bulk-load files found for table {}", table);
} else {
mergeSplitAndCopyBulkloadedHFiles(bulkLoadFiles, table, tgtFs);
}
Contributor

[nit] Using continue here would align the style with the other loop for the !backupInfo.isContinuousBackupEnabled() case.

Suggested change
if (bulkLoadFiles.isEmpty()) {
LOG.info("No bulk-load files found for table {}", table);
} else {
mergeSplitAndCopyBulkloadedHFiles(bulkLoadFiles, table, tgtFs);
}
if (bulkLoadFiles.isEmpty()) {
LOG.info("No bulk-load files found for table {}", table);
continue;
}
mergeSplitAndCopyBulkloadedHFiles(bulkLoadFiles, table, tgtFs);

Contributor
@kgeisz kgeisz left a comment

I have one minor nit. LGTM otherwise.

Comment on lines 168 to 171
if (!tablesToBackup.contains(srcTable)) {
LOG.debug("Skipping {} since it is not in tablesToBackup", srcTable);
continue;
}
Contributor

nit: Should this be moved to line 162? It looks like some variables are being set that could end up unused because of this if block.

Contributor
@vinayakphegde vinayakphegde left a comment

I have added a few comments regarding minimizing duplicate code.


// set the start timestamp of the overall backup
long startTs = EnvironmentEdgeManager.currentTime();
backupInfo.setStartTs(startTs);
if (backupInfo.getType() == BackupType.INCREMENTAL && backupInfo.isContinuousBackupEnabled()) {
Contributor

Why was this logic added to TableBackupClient? Wouldn't it be more appropriate to place it in IncrementalTableBackupClient?

}

private List<String> getBackupLogs(long startTs) throws IOException {
private List<String> getBackupLogs(long startTs, long endTs) throws IOException {
Contributor

Let's avoid duplicating code. We already have similar functionality for retrieving log files within a time range in org.apache.hadoop.hbase.backup.impl.AbstractPitrRestoreHandler#getValidWalDirs. Can we use that instead? We could move the file to a common location such as src/main/java/org/apache/hadoop/hbase/backup/util.

Path archiveDir = HFileArchiveUtil.getStoreArchivePath(conf, srcTable, regionName, fam);
Path archive = new Path(archiveDir, filename);
List<Path> bulkloadPaths =
BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv, collectorOutput, table, table,
Contributor

Rather than calling BulkFilesCollector directly, we can use the org.apache.hadoop.hbase.backup.impl.AbstractPitrRestoreHandler#collectBulkFiles() method, which serves as a higher-level approach and internally invokes BulkFilesCollector.collectFromWalDirs(). This helps us avoid duplicating code. In both restore and incremental backup scenarios, we need to extract bulkload files by reading WAL files within a given time range, so it makes sense to have a single logic for this. We should consider placing this common logic in a utility class under the util package.

Author

BulkFilesCollector#collectFromWalDirs() is itself a utility function. I have already computed the valid WAL directories via BackupUtils#getValidWalDirs() in IncrementalTableBackupClient#convertWALsToHFiles(), so here I am reusing that result. If I called AbstractPitrRestoreHandler#collectBulkFiles(), it would call BackupUtils#getValidWalDirs() again.

Contributor
@anmolnar anmolnar Oct 22, 2025

I agree with @ankitsol. This class should not make a call to an abstract class (you would have to make the method public); instead, move more logic into the utility class if you want to share more.

Contributor

> BulkFilesCollector#collectFromWalDirs() is itself a utility function. I have already computed the valid WAL directories via BackupUtils#getValidWalDirs() in IncrementalTableBackupClient#convertWALsToHFiles(), so here I am reusing that result. If I called AbstractPitrRestoreHandler#collectBulkFiles(), it would call BackupUtils#getValidWalDirs() again.

Consider passing that as a parameter. Adjust the original methods as minimally as possible to accommodate both scenarios.

> This class should not make a call to an abstract class

No, as mentioned earlier, we should move the shared elements to a utility class.

Author

> Consider passing that as a parameter. Adjust the original methods as minimally as possible to accommodate both scenarios.

Please elaborate

Contributor

> If I called AbstractPitrRestoreHandler#collectBulkFiles(), it would call BackupUtils#getValidWalDirs() again.

Instead of calling BackupUtils#getValidWalDirs() inside AbstractPitrRestoreHandler#collectBulkFiles(), take the output of BackupUtils#getValidWalDirs() as a parameter.
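
In other words, the shared helper could be shaped roughly like this (a sketch; the method location, name, and parameter list are placeholders rather than the actual API):

  // Hypothetical shared utility: callers compute the valid backed-up WAL directories once
  // (e.g. via BackupUtils#getValidWalDirs) and pass them in, so neither the restore path
  // nor the incremental-backup path has to recompute them.
  public static List<Path> collectBulkFiles(Configuration conf, List<String> validWalDirs,
      Path collectorOutput, TableName sourceTable, TableName targetTable, long startTime,
      long endTime) throws IOException {
    String walDirsCsv = String.join(",", validWalDirs);
    return BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv, collectorOutput, sourceTable,
      targetTable, startTime, endTime);
  }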

Comment on lines 73 to 75
LOG.info("Called collectFromWalDirs for source table {}, target table {}, startTime {}, endTime"
+ " {}, restoreRootDir {}", sourceTable, targetTable, startTime, endTime, restoreRootDir);

Contributor
@taklwu taklwu Oct 23, 2025

[nit] Is this a debug message, or should we delete it?
