HBASE-28836 Parallelize the file archival to improve the split times #6616
Conversation
@virajjasani @stoty I am giving it another try. I hope this doesn't fail this time. I am running tests locally and they haven't finished yet, but I am optimistic this time. 🤞🏾
I still see some failures that are not the usual flakies.
Force-pushed from 4bd65e7 to d3b481d
@stoty No test failure this time. Please have a look. Let me know if I am missing something here.
hbase-server/src/main/java/org/apache/hadoop/hbase/backup/HFileArchiver.java
Queue<File> failures, String startTime) {
  LOG.trace("Archiving {} files concurrently into directory: {}", files.size(), baseArchiveDir);

  ExecutorService executorService = Executors.newCachedThreadPool();
Most similar pools are configurable.
Have you considered making the thread pool size configurable?
Would it make sense to use a global pool here and limit the number of concurrent move operations?
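(For illustration only: a minimal sketch of what a bounded, host-wide pool could look like. The config key hbase.hfilearchiver.global.thread.pool.max and the holder class are hypothetical, not part of this patch.)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: one bounded pool shared by all archival calls on this host,
// sized from a made-up configuration key instead of an unbounded cached pool.
public final class ArchiveExecutorHolder {
  private static final String MAX_THREADS_KEY = "hbase.hfilearchiver.global.thread.pool.max"; // hypothetical key
  private static volatile ExecutorService pool;

  private ArchiveExecutorHolder() {
  }

  static ExecutorService get(Configuration conf) {
    if (pool == null) {
      synchronized (ArchiveExecutorHolder.class) {
        if (pool == null) {
          int maxThreads = conf.getInt(MAX_THREADS_KEY, 8);
          pool = Executors.newFixedThreadPool(maxThreads);
        }
      }
    }
    return pool;
  }
}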
Considering we shut down the threads after execution, is it okay if we use some sensible constant rather than a configuration? I am of the opinion that one more configuration would not help us. I also understand that having a max cap on the number of threads is an important aspect.
TBH I don't have a lot of experience with object storage deletion performance.
Are resolveAndArchive calls serial, or is it possible to have multiple invocations running at the same time?
What do you think, @wchevreuil, @BukrosSzabolcs?
Force-pushed from 2063007 to 63f84cc
… methods. (apache#6500) Signed-off-by: Duo Zhang <[email protected]> Signed-off-by: Pankaj Kumar <[email protected]>
HBASE-28836 Parallelize the file archival to improve the split times
Add a config to limit threads per region
Force-pushed from 63f84cc to ccb7f27
Map<File, Future<Boolean>> futureMap = new HashMap<>();
  // Submit file archiving tasks
  // default is 16, which equals the hbase.hstore.blockingStoreFiles default value
  int maxThreads = conf.getInt("hbase.hfilearchiver.per.region.thread.pool.max", 16);
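(For context, a rough sketch of how the rest of that block might use the bound; the archiveFile helper and the exception handling are placeholders I am assuming, not the exact code in the patch.)

// Sketch: cap the pool by the per-region config and by the number of files to move.
ExecutorService executorService =
  Executors.newFixedThreadPool(Math.max(1, Math.min(files.size(), maxThreads)));
try {
  for (File file : files) {
    // archiveFile(...) stands in for the per-file move logic
    futureMap.put(file, executorService.submit(() -> archiveFile(file, baseArchiveDir, startTime)));
  }
  // Collect results; any file whose task returned false is reported as a failure.
  for (Map.Entry<File, Future<Boolean>> entry : futureMap.entrySet()) {
    if (!entry.getValue().get()) {
      failures.add(entry.getKey());
    }
  }
} catch (InterruptedException | ExecutionException e) {
  LOG.warn("Failed while archiving files into {}", baseArchiveDir, e);
  if (e instanceof InterruptedException) {
    Thread.currentThread().interrupt();
  }
} finally {
  executorService.shutdown();
}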
@stoty Does this code make sense? Here we are able to put a limit on the number of threads per region dir, and it aligns with what I was initially thinking to implement as well.
I had to force-push my branch for the PR because I had messed it up. Sorry for that.
That is definitely an improvement.
As I said, I don't have a lot of experience with object storage, nor do I know how these archival chores are started.
On a large system we have tens of thousands of regions, so this could still be a lot of threads, which may overload AWS's servers.
If these are started on the RSs as opposed to the master, maybe it would make sense to use a per-instance pool?
I don't have enough background info to have a solid opinion, just sharing my thoughts.
On a large system we have tens of thousands of regions, so this could still be a lot of threads
I think this could be an issue in the delete-table scenario for a large table, where it actually archives all the regions at once in a loop; we might end up creating 16 x noOfRegions threads.
@stoty @mnpoonia
How about keeping a flag on archiveFiles(boolean needsConcurrent), so that archival done as part of SplitProcedure runs concurrently and DeleteProcedure can still run in the existing sequential mode (since it's not critical/time-bound, unlike SplitProcedure, which can affect availability)?
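(A sketch of that suggestion, just to make it concrete; the helper names archiveFilesConcurrently/archiveFilesSequentially and the parameter list are placeholders, not the actual HFileArchiver API.)

// Callers opt in to concurrency explicitly: SplitProcedure passes true because region
// close latency affects availability; DeleteTableProcedure passes false and stays serial.
static void archiveFiles(Configuration conf, Collection<File> files, Path baseArchiveDir,
    Queue<File> failures, String startTime, boolean needsConcurrent) {
  if (needsConcurrent) {
    archiveFilesConcurrently(conf, files, baseArchiveDir, failures, startTime);
  } else {
    archiveFilesSequentially(files, baseArchiveDir, failures, startTime);
  }
}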
Yes, that would alleviate some of the potential problems.
I really don't know the control flow and parallel behaviour here, so I cannot be more definite.
I know we have seen issues where the (serial) cleanup took several days and had trouble keeping up, but I haven't followed the exact circumstances, i.e. whether it was deletion or just simple compactions.
To explain more of what I know: after this change, the maximum number of threads that can be created is hbase.hfilearchiver.thread.pool.max * hbase.hfilearchiver.per.region.thread.pool.max, which by default comes to 8 * 16 = 128. Currently the defaults give 8 * 1 = 8 threads.
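(Worked out as a snippet, using the defaults quoted above; the local variable names are mine.)

// Upper bound on concurrent archiver threads after the change:
int regionDirThreads = conf.getInt("hbase.hfilearchiver.thread.pool.max", 8);              // outer pool
int perRegionThreads = conf.getInt("hbase.hfilearchiver.per.region.thread.pool.max", 16);  // inner pool
int worstCase = regionDirThreads * perRegionThreads; // 8 * 16 = 128 by default
// Before the change the inner factor was effectively 1, i.e. 8 * 1 = 8 threads.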
If we're splitting a region with store files which have been compacted but not yet archived, region close will wait for these compacted files to be removed, which goes through HFileArchiver.archive. In one of my tests the unassign procs were taking ~30 secs with S3 as the root dir.
Also adding the code flow from my IDE for reference

@wchevreuil Was I able to answer what you asked for?
@stoty @wchevreuil ping.
@petersomogyi is back now, and IIRC he also worked on file deletion/cleanup issues on S3.
Can you check this and share your opinion?
This level of thread count configuration seems enough for me. In our case the large number of files accumulated because of Guava's lazy iterator (see HBASE-27590) and frequent Master restarts.
🎊 +1 overall
Let's move this forward. Will merge after 72 hours unless there are further comments.
…6616) Signed-off-by: Andrew Purtell <[email protected]> Signed-off-by: Peter Somogyi <[email protected]>