HBASE-29656 Scan WALs to identify bulkload operations for incremental backup #7400
base: HBASE-28957
Conversation
🎊 +1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
lgtm.
Pull Request Overview
This PR implements WAL scanning to identify bulk load operations for continuous incremental backups. Instead of relying on the backup system table to track bulk load hfiles, continuous incremental backups now use BulkLoadCollectorJob to scan backed-up WAL files directly.
Key changes:
- BulkLoadCollectorJob now runs on backed-up WALs instead of source cluster WALs for continuous incremental backups
- The backup system table is no longer used to track bulk loads when continuous backup is enabled
- The replication checkpoint timestamp is captured at backup start for filtering WAL entries
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| TestIncrementalBackupWithContinuous.java | Updated test assertions to verify bulk load rows are not stored in system table for continuous backups |
| TestBackupBase.java | Added empty maps to method signatures for compatibility |
| BulkLoadCollectorJob.java | Changed constructor visibility from protected to public |
| TableBackupClient.java | Added capture of replication checkpoint timestamp at backup start |
| IncrementalTableBackupClient.java | Implemented WAL-based bulk load collection using BulkLoadCollectorJob for continuous backups |
| BackupObserver.java | Modified to skip registering bulk loads in system table when continuous backup is enabled |
```java
Path archive = new Path(archiveDir, filename);
List<Path> bulkloadPaths =
  BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv, collectorOutput, table, table,
    tablesToPrevBackupTs.get(table), backupInfo.getIncrCommittedWalTs());
```
Copilot AI (Oct 21, 2025)
Potential NullPointerException if tablesToPrevBackupTs.get(table) returns null. The map may not contain an entry for the table if no previous backup exists, which would cause an NPE when the primitive long is expected.
```diff
-       tablesToPrevBackupTs.get(table), backupInfo.getIncrCommittedWalTs());
+       tablesToPrevBackupTs.get(table) != null ? tablesToPrevBackupTs.get(table) : 0L, backupInfo.getIncrCommittedWalTs());
```
```java
// Continuous incremental backup: run BulkLoadCollectorJob over backed-up WALs
Path collectorOutput = new Path(getBulkOutputDir(), BULKLOAD_COLLECTOR_OUTPUT);
for (TableName table : tablesToBackup) {
  String walDirsCsv = String.join(",", tablesToWALFileList.get(table));
```
Copilot AI (Oct 21, 2025)
Potential NullPointerException if tablesToWALFileList.get(table) returns null. If the table has no WAL files in the map, String.join will throw an NPE when attempting to join null.
```diff
-   String walDirsCsv = String.join(",", tablesToWALFileList.get(table));
+   List<String> walDirs = tablesToWALFileList.get(table);
+   String walDirsCsv = String.join(",", walDirs != null ? walDirs : java.util.Collections.emptyList());
```
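Taken together, the two suggestions above could be folded into the per-table loop roughly as follows. This is a sketch only: variable and method names follow the quoted diff, and it assumes `java.util.List`, `java.util.Collections`, and the surrounding method context are in scope.

```java
// Sketch: per-table loop with both null-safety fixes applied (names follow the diff).
for (TableName table : tablesToBackup) {
  // Guard against tables that have no backed-up WAL files recorded in the map.
  List<String> walDirs = tablesToWALFileList.get(table);
  String walDirsCsv = String.join(",", walDirs != null ? walDirs : Collections.<String>emptyList());

  // Guard against tables with no previous backup timestamp; 0L means "from the beginning".
  Long prevBackupTs = tablesToPrevBackupTs.get(table);
  long startTime = prevBackupTs != null ? prevBackupTs : 0L;

  List<Path> bulkloadPaths = BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv,
    collectorOutput, table, table, startTime, backupInfo.getIncrCommittedWalTs());
  // ... proceed with bulkloadPaths as in the existing code ...
}
```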
Just one minor comment about the change in walToHFiles regarding WALInputFormat.END_TIME_KEY: do we have any existing or new unit tests covering this timestamp change?

```java
conf.set(WALInputFormat.END_TIME_KEY, Long.toString(backupInfo.getIncrCommittedWalTs()));
```
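For context, the end-time bound simply excludes WAL entries written after the configured timestamp. A minimal illustration of that behavior is below; it is not the actual HBase implementation, and the key string is only assumed to mirror WALInputFormat.END_TIME_KEY.

```java
import org.apache.hadoop.conf.Configuration;

// Illustration only: how an end-time bound like WALInputFormat.END_TIME_KEY is
// typically honored when scanning WAL entries. Key string and default are assumptions.
public class EndTimeBoundSketch {
  static final String END_TIME_KEY = "wal.end.time"; // assumed to mirror WALInputFormat.END_TIME_KEY

  static boolean withinBound(Configuration conf, long entryWriteTime) {
    long endTs = conf.getLong(END_TIME_KEY, Long.MAX_VALUE);
    return entryWriteTime <= endTs; // entries after the committed WAL ts are skipped
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setLong(END_TIME_KEY, 1_700_000_000_000L); // e.g. backupInfo.getIncrCommittedWalTs()
    System.out.println(withinBound(conf, 1_699_999_999_999L)); // true: included
    System.out.println(withinBound(conf, 1_700_000_000_001L)); // false: filtered out
  }
}
```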
```diff
-     if (fullyBackedUpTables.contains(tableName)) {
+     if (
+       fullyBackedUpTables.contains(tableName) && !continuousBackupTableSet.containsKey(tableName)
```
[nit] Before this change, did you see a lot of entries repeatedly registering for the same table? If so, and if this is not only in unit tests, do you think it's a logic error from that trigger?
I have a suggestion. Perhaps we could add a comment stating that for continuous backup, this isn't necessary, as everything will be utilized from the WAL backup location.
Sorry, I didn't understand the question completely. BackupObserver#registerBulkLoad() is called for each bulkload operation and registers it in the backup system table.
Adding the comment should have addressed my concerns, and yeah, !continuousBackupTableSet.containsKey(tableName) means only non-continuous backups need this bulkload registration.
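For illustration, the guard plus the suggested comment could read roughly as follows. Only the condition itself follows the quoted diff; the method shape and collection types are assumptions.

```java
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.hbase.TableName;

// Sketch of the discussed guard with the suggested explanatory comment.
public class RegisterBulkLoadGuardSketch {
  static boolean shouldRegisterBulkLoad(TableName tableName, Set<TableName> fullyBackedUpTables,
      Map<TableName, Long> continuousBackupTableSet) {
    // For continuous backup this registration isn't necessary: bulk-loaded files are
    // discovered later from the WAL backup location, so only non-continuous backups
    // record bulk loads in the backup system table.
    return fullyBackedUpTables.contains(tableName)
      && !continuousBackupTableSet.containsKey(tableName);
  }
}
```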
```java
if (bulkLoadFiles.isEmpty()) {
  LOG.info("No bulk-load files found for table {}", table);
} else {
  mergeSplitAndCopyBulkloadedHFiles(bulkLoadFiles, table, tgtFs);
}
```
[nit] Using continue would align the style with the other loop for !backupInfo.isContinuousBackupEnabled().
```diff
- if (bulkLoadFiles.isEmpty()) {
-   LOG.info("No bulk-load files found for table {}", table);
- } else {
-   mergeSplitAndCopyBulkloadedHFiles(bulkLoadFiles, table, tgtFs);
- }
+ if (bulkLoadFiles.isEmpty()) {
+   LOG.info("No bulk-load files found for table {}", table);
+   continue;
+ }
+ mergeSplitAndCopyBulkloadedHFiles(bulkLoadFiles, table, tgtFs);
```
I have one minor nit. LGTM otherwise.
```java
if (!tablesToBackup.contains(srcTable)) {
  LOG.debug("Skipping {} since it is not in tablesToBackup", srcTable);
  continue;
}
```
nit: Should this be moved to line 162? It looks like some variables are being set, but they could end up unused because of this if block.
I have added a few comments regarding minimizing duplicate code.
```java
// set the start timestamp of the overall backup
long startTs = EnvironmentEdgeManager.currentTime();
backupInfo.setStartTs(startTs);
if (backupInfo.getType() == BackupType.INCREMENTAL && backupInfo.isContinuousBackupEnabled()) {
```
Why was this logic added to TableBackupClient? Wouldn't it be more appropriate to place it in IncrementalTableBackupClient?
```diff
- private List<String> getBackupLogs(long startTs) throws IOException {
+ private List<String> getBackupLogs(long startTs, long endTs) throws IOException {
```
Let's avoid duplicating code. We already have similar functionality for retrieving log files within a time range in org.apache.hadoop.hbase.backup.impl.AbstractPitrRestoreHandler#getValidWalDirs. Can we use that instead? We could move the file to a common location such as src/main/java/org/apache/hadoop/hbase/backup/util.
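A hypothetical shape for that shared helper, if it were extracted into a util class, is sketched below. The class name, package placement, and the modification-time filter are illustrative assumptions, not the existing getValidWalDirs implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical utility sketch: one place that lists backed-up WAL dirs overlapping [startTs, endTs].
public final class WalDirTimeRangeUtilSketch {
  private WalDirTimeRangeUtilSketch() {
  }

  public static List<String> getWalDirsInRange(Configuration conf, Path walBackupRoot,
      long startTs, long endTs) throws IOException {
    FileSystem fs = walBackupRoot.getFileSystem(conf);
    List<String> result = new ArrayList<>();
    for (FileStatus dir : fs.listStatus(walBackupRoot)) {
      // Crude overlap check for illustration; the real getValidWalDirs logic may differ.
      long mtime = dir.getModificationTime();
      if (dir.isDirectory() && mtime >= startTs && mtime <= endTs) {
        result.add(dir.getPath().toString());
      }
    }
    return result;
  }
}
```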
```java
Path archiveDir = HFileArchiveUtil.getStoreArchivePath(conf, srcTable, regionName, fam);
Path archive = new Path(archiveDir, filename);
List<Path> bulkloadPaths =
  BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv, collectorOutput, table, table,
```
Rather than calling BulkFilesCollector directly, we can use the org.apache.hadoop.hbase.backup.impl.AbstractPitrRestoreHandler#collectBulkFiles() method, which serves as a higher-level approach and internally invokes BulkFilesCollector.collectFromWalDirs(). This helps us avoid duplicating code. In both restore and incremental backup scenarios, we need to extract bulkload files by reading WAL files within a given time range, so it makes sense to have a single logic for this. We should consider placing this common logic in a utility class under the util package.
BulkFilesCollector#collectFromWalDirs() is itself a utility function. I have computed the valid WAL directories using BackupUtils#getValidWalDirs() once already in IncrementalTableBackupClient#convertWALsToHFiles(), so here I am reusing that. If I call AbstractPitrRestoreHandler#collectBulkFiles(), it would call BackupUtils#getValidWalDirs() again.
I agree with @ankitsol. This class should not make a call to an abstract class (you would have to make the method public); instead, move more logic to the utility class if you want to share more.
> BulkFilesCollector#collectFromWalDirs() is itself a utility function. I have computed the valid WAL directories using BackupUtils#getValidWalDirs() once already in IncrementalTableBackupClient#convertWALsToHFiles(), so here I am reusing that. If I call AbstractPitrRestoreHandler#collectBulkFiles(), it would call BackupUtils#getValidWalDirs() again.

Consider passing that as a parameter. Adjust the original methods as minimally as possible to accommodate both scenarios.

> This class should not make a call to an abstract class

No, as mentioned earlier, we should move the shared elements to a utility class.
> Consider passing that as a parameter. Adjust the original methods as minimally as possible to accommodate both scenarios.

Please elaborate.
> If I call AbstractPitrRestoreHandler#collectBulkFiles() it would again call BackupUtils#getValidWalDirs()

Instead of calling BackupUtils#getValidWalDirs() inside AbstractPitrRestoreHandler#collectBulkFiles(), take the output of BackupUtils#getValidWalDirs() as a parameter.
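In other words, the shared entry point could accept the precomputed WAL dirs so that both restore and incremental backup pass them in once. The sketch below is hypothetical: only the collectFromWalDirs argument list follows the quoted diff, the wrapper name and throws clause are assumptions, and the BulkFilesCollector import is omitted because its package is not shown here.

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;

// Sketch of the suggested refactor: the shared helper takes already-computed WAL dirs
// instead of re-deriving them. (BulkFilesCollector import omitted; package not shown above.)
public final class BulkFilesCollectorFacadeSketch {
  private BulkFilesCollectorFacadeSketch() {
  }

  public static List<Path> collectBulkFiles(Configuration conf, String walDirsCsv,
      Path collectorOutput, TableName sourceTable, TableName targetTable, long startTime,
      long endTime) throws IOException {
    // Callers (restore and incremental backup) compute walDirsCsv once via
    // BackupUtils#getValidWalDirs (or equivalent) and pass it here.
    return BulkFilesCollector.collectFromWalDirs(conf, walDirsCsv, collectorOutput, sourceTable,
      targetTable, startTime, endTime);
  }
}
```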
| LOG.info("Called collectFromWalDirs for source table {}, target table {}, startTime {}, endTime" | ||
| + " {}, restoreRootDir {}", sourceTable, targetTable, startTime, endTime, restoreRootDir); | ||
|
|
[nit] Is this a debug message, or should we delete it?
Non-continuous incremental backup uses the backup system table to identify which bulkload hfiles it needs to copy.
With continuous incremental backup, this change uses BulkLoadCollectorJob to identify the bulkload hfiles that need to be copied. BulkLoadCollectorJob runs on the backed-up WALs instead of the source cluster WALs.
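At a high level, the difference between the two paths could be outlined as below. This is an illustrative sketch with stub methods, not the actual backup client code; everything except the names quoted in the review is an assumption.

```java
// Illustrative outline only: system-table-driven copy for non-continuous incremental
// backups vs. scanning backed-up WALs for continuous ones.
public class BulkLoadCollectionOutlineSketch {
  interface BackupContext { // stand-in for BackupInfo; hypothetical
    boolean isContinuousBackupEnabled();
  }

  void collectBulkLoadedFiles(BackupContext backup) {
    if (!backup.isContinuousBackupEnabled()) {
      // Non-continuous: bulk loads were registered by BackupObserver in the backup
      // system table; read the hfile list from there and copy it.
      copyBulkLoadsRecordedInSystemTable();
      return;
    }
    // Continuous: the system table is not used. Scan the backed-up WALs between the
    // previous backup timestamp and the checkpoint captured at backup start
    // (getIncrCommittedWalTs), collecting the bulk-loaded hfiles to copy.
    runBulkLoadCollectorOverBackedUpWals();
  }

  void copyBulkLoadsRecordedInSystemTable() { /* hypothetical stub */ }

  void runBulkLoadCollectorOverBackedUpWals() { /* hypothetical stub */ }
}
```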
JIRA: https://issues.apache.org/jira/browse/HBASE-29656