HBASE-28836 Parallize the file archival to improve the split times #6616

Open · wants to merge 2 commits into base: master from parallel_HBASE-28836
Conversation

mnpoonia (Contributor)

No description provided.

@mnpoonia (Contributor Author)

@virajjasani @stoty I am giving it another try. I hope this doesn't fail this time. I am running tests locally and they haven't finished yet, but I am optimistic this time. 🤞🏾

@stoty (Contributor) commented Jan 21, 2025

I still see some failures that are not the usual flakies.

@mnpoonia force-pushed the parallel_HBASE-28836 branch 2 times, most recently from 4bd65e7 to d3b481d on January 21, 2025 at 08:06

@mnpoonia (Contributor Author)

@stoty I created the same PR on the 2.5 branch (#6615), and I don't see any failures there. I will check the failures on this PR and try them locally after the current build finishes.


@mnpoonia (Contributor Author)

@stoty No test failures this time. Please have a look, and let me know if I am missing something here.

Queue<File> failures, String startTime) {
LOG.trace("Archiving {} files concurrently into directory: {}", files.size(), baseArchiveDir);

ExecutorService executorService = Executors.newCachedThreadPool();
Contributor:

Most similar pools are configurable.

Have you considered making the thread pool configurable?
Would it make sense to use a global pool here and limit the number of concurrent move operations?
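For illustration, a minimal sketch of a bounded, configurable executor along the lines suggested above, assuming java.util.concurrent plus Guava's ThreadFactoryBuilder; the property name follows the one that appears later in this PR, and the helper name is hypothetical:

// Sketch only, not the patch: bound and name the archiver threads instead of
// using an unbounded cached pool. createArchiveExecutor is a hypothetical helper.
private static ExecutorService createArchiveExecutor(Configuration conf) {
  int maxThreads = conf.getInt("hbase.hfilearchiver.per.region.thread.pool.max", 16);
  return Executors.newFixedThreadPool(maxThreads,
    new ThreadFactoryBuilder().setNameFormat("hfile-archive-%d").setDaemon(true).build());
}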

Contributor Author:

Considering we shut down the threads after execution, is it okay if we use a sensible constant rather than a configuration? I am of the opinion that one more configuration would not help us. I also understand that putting a cap on the maximum number of threads is an important aspect.

Contributor:

TBH I don't have a lot of experience with object storage deletion performance.

Are resolveAndArchive calls serial, or is it possible to have multiple invocations running at the same time?

What do you think, @wchevreuil, @BukrosSzabolcs?

@mnpoonia force-pushed the parallel_HBASE-28836 branch from 2063007 to 63f84cc on January 24, 2025 at 10:09
chandrasekhar-188k and others added 2 commits January 24, 2025 15:43
… methods. (apache#6500)

Signed-off-by: Duo Zhang <[email protected]>
Signed-off-by: Pankaj Kumar <[email protected]>

HBASE-28836 Parallize the file archival to improve the split times
@mnpoonia force-pushed the parallel_HBASE-28836 branch from 63f84cc to ccb7f27 on January 24, 2025 at 10:14
Map<File, Future<Boolean>> futureMap = new HashMap<>();
// Submit file archiving tasks
// default is 16, which equals the default value of hbase.hstore.blockingStoreFiles
int maxThreads = conf.getInt("hbase.hfilearchiver.per.region.thread.pool.max", 16);
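For context, a rough sketch (not the exact patch code) of how such a bounded per-region pool is typically used: submit one task per file, shut the pool down, then collect results into the failures queue. The archiveFile(...) call is a placeholder for the real per-file move logic:

// Illustrative sketch only; archiveFile(...) is a placeholder, not the patch's API.
ExecutorService executorService = Executors.newFixedThreadPool(maxThreads);
try {
  for (File file : files) {
    futureMap.put(file, executorService.submit(() -> archiveFile(file, baseArchiveDir)));
  }
} finally {
  executorService.shutdown();
}
for (Map.Entry<File, Future<Boolean>> entry : futureMap.entrySet()) {
  try {
    if (!entry.getValue().get()) {
      failures.add(entry.getKey()); // the move reported failure
    }
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    failures.add(entry.getKey());
  } catch (ExecutionException e) {
    failures.add(entry.getKey()); // the move threw an exception
  }
}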
Contributor Author:

@stoty Does this code make sense? Here we are able to put a limit on the number of threads per region dir, and it aligns with what I was initially thinking to implement as well.

Had to force-push my branch for the PR because I had messed it up. Sorry for that.

Contributor:

That is definitely an improvement.

As I said, I don't have a lot of experience with object storage, nor do I know how these archival chores are started.

On a large system we have tens of thousands of regions, so this could still be a lot of threads, which may overload AWS's servers.

If these are started on the RSs as opposed to the master, maybe it would make sense to use a per-instance pool?

I don't have enough background info to have a solid opinion, just sharing my thoughts.

@gvprathyusha6 (Contributor) commented Jan 24, 2025

> On a large system we have tens of thousands of regions, so this could still be a lot of threads

I think this could be an issue for the delete-table scenario on a large table, where it actually archives all the regions at once in a loop; we might end up creating 16 × noOfRegions threads.

@stoty @mnpoonia
How about keeping a flag on archiveFiles(boolean needsConcurrent), so that archival done as part of SplitProcedure runs concurrently, while DeleteProcedure can still run in the existing sequential mode (since it is not critical/time-bound, unlike SplitProcedure, which can affect availability)?
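For illustration only, the flag idea could look roughly like the following; the method signature and the two helper names are hypothetical, not something in the patch:

// Hypothetical sketch of the suggested flag; the helpers stand in for the existing
// sequential path and the new concurrent path.
static void archiveFiles(Configuration conf, FileSystem fs, Path baseArchiveDir,
    Collection<File> files, boolean needsConcurrent) throws IOException {
  if (needsConcurrent) {
    // e.g. the split/unassign path, where archival latency affects availability
    archiveFilesConcurrently(conf, fs, baseArchiveDir, files);
  } else {
    // e.g. the delete/truncate path, which is not time-bound
    archiveFilesSequentially(conf, fs, baseArchiveDir, files);
  }
}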

Contributor:

Yes, that would alleviate some of the potential problems.

I really don't know the control flow and parallel behaviour here, so I cannot be more definite.

I know we have seen issues where the (serial) cleanup took several days and had trouble keeping up, but I haven't followed the exact circumstances, i.e. whether it was deletion or just simple compactions.

@mnpoonia (Contributor Author) commented Jan 27, 2025

To explain more of what I know:
Here the maximum number of threads that can be created after this change is hbase.hfilearchiver.per.region.thread.pool.max * hbase.hfilearchiver.thread.pool.max, which by default comes to 8 * 16 = 128. But currently the defaults give 8 * 1 = 8 threads.
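Purely to spell out that arithmetic (property names as quoted above, defaults as stated in this comment; illustration only, not patch code):

// Worst-case concurrent archiver threads under the defaults discussed above.
int dirPoolMax = conf.getInt("hbase.hfilearchiver.thread.pool.max", 8);
int perRegionPoolMax = conf.getInt("hbase.hfilearchiver.per.region.thread.pool.max", 16);
int worstCase = dirPoolMax * perRegionPoolMax; // 8 * 16 = 128 after the change; 8 * 1 = 8 before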

Contributor:

> @stoty I think I am missing something. Are you saying that if we have 10k files in one folder, a single layer with 128 threads would perform better than having two levels of parallel threads? Can you elaborate on this a bit?

IIUC, the outer threadpool walks over directories, then feeds the archival operations into the inner threadpool.
If the outer pool has a size of 8 and the inner pools have a size of 16, then, since a single inner pool processes all files in a single directory, the parallelism will only be 16 (in the pathological example).

If the threads processing directories share one bigger threadpool (equal to the max possible in the current case, i.e. 128), then the parallelism for the archival file operations will be 128 even in the pathological case.
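To make the comparison concrete, a rough sketch of the single shared pool being described; the property name and sizing here are assumptions for illustration, not part of the patch:

// Illustration only: one shared, bounded pool for all per-file archival operations,
// so total parallelism stays at the configured cap no matter how files are spread
// across directories. The property name below is hypothetical.
ExecutorService sharedArchivePool = Executors.newFixedThreadPool(
  conf.getInt("hbase.hfilearchiver.total.thread.pool.max", 128),
  new ThreadFactoryBuilder().setNameFormat("hfile-archive-%d").setDaemon(true).build());
// Directory-walking tasks submit their individual file moves to this shared pool
// instead of creating a new pool per directory.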

@mnpoonia (Contributor Author) commented Jan 28, 2025

I see. I was going after a case where the number of regions to be cleaned is very small, maybe 2 or 3, but the number of hfiles per region is 50 to 100. Since we have parallelism at the region layer and no parallelism within a region to clean hfiles, we take a performance hit.

So clearly we have two different problems, and both are valid. I am now thinking about how we can solve both more generically.

Contributor:

@mnpoonia, I'm a bit confused about how this is actually impacting split times. My reading of the code is that this is only used from splits in the rollback case. Am I missing something here?

The other places where this would run from the master are the delete table procedure, truncate, or the region catalog janitor. In all these cases we would be creating one pool per region in the table being deleted/truncated, or per region to be cleaned by a given catalog janitor run.

@mnpoonia (Contributor Author) commented Jan 28, 2025

If we're splitting a region with store files that have been compacted but not yet archived, the region close will wait for these compacted files to be removed, which goes through HFileArchiver.archive. In one of my tests the unassign procedures were taking ~30 seconds with S3 as the root dir.
Also adding the code flow from my IDE for reference:

[Screenshot: code flow from IDE, 2025-01-28]

@mnpoonia (Contributor Author) commented Jan 29, 2025

@wchevreuil Was I able to answer what you asked?

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 26s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 3m 9s master passed
+1 💚 compile 3m 3s master passed
+1 💚 checkstyle 0m 35s master passed
+1 💚 spotbugs 1m 32s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 2m 59s the patch passed
+1 💚 compile 3m 5s the patch passed
+1 💚 javac 3m 5s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 35s the patch passed
+1 💚 spotbugs 1m 36s the patch passed
+1 💚 hadoopcheck 11m 34s Patch does not cause any errors with Hadoop 3.3.6 3.4.0.
+1 💚 spotless 0m 43s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 10s The patch does not generate ASF License warnings.
37m 11s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6616/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #6616
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 30c3f8365003 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ccb7f27
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 83 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6616/4/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 11s master passed
+1 💚 compile 0m 54s master passed
+1 💚 javadoc 0m 27s master passed
+1 💚 shadedjars 5m 48s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 1s the patch passed
+1 💚 compile 0m 57s the patch passed
+1 💚 javac 0m 57s the patch passed
+1 💚 javadoc 0m 26s the patch passed
+1 💚 shadedjars 5m 47s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 278m 27s hbase-server in the patch passed.
304m 37s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6616/4/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #6616
Optional Tests javac javadoc unit compile shadedjars
uname Linux 6325ef5cc619 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / ccb7f27
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6616/4/testReport/
Max. process+thread count 4453 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6616/4/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Labels: None yet
Projects: None yet
6 participants