Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HBASE-28399 region size can be wrong from RegionSizeCalculator #5700

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

frostruan
Copy link
Contributor

No description provided.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 27s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 52s master passed
+1 💚 compile 0m 16s master passed
+1 💚 shadedjars 5m 34s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 13s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 27s the patch passed
+1 💚 compile 0m 16s the patch passed
+1 💚 javac 0m 16s the patch passed
+1 💚 shadedjars 5m 34s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 12s the patch passed
_ Other Tests _
-1 ❌ unit 0m 46s hbase-mapreduce in the patch failed.
19m 35s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #5700
Optional Tests javac javadoc unit shadedjars compile
uname Linux 96916b5ab17f 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7be588e
Default Java Temurin-1.8.0_352-b08
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-mapreduce.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/testReport/
Max. process+thread count 230 (vs. ulimit of 30000)
modules C: hbase-mapreduce U: hbase-mapreduce
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 7s master passed
+1 💚 compile 0m 20s master passed
+1 💚 shadedjars 5m 11s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 17s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 49s the patch passed
+1 💚 compile 0m 20s the patch passed
+1 💚 javac 0m 20s the patch passed
+1 💚 shadedjars 5m 10s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 15s the patch passed
_ Other Tests _
-1 ❌ unit 0m 57s hbase-mapreduce in the patch failed.
20m 1s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5700
Optional Tests javac javadoc unit shadedjars compile
uname Linux 868b0960aea1 5.4.0-169-generic #187-Ubuntu SMP Thu Nov 23 14:52:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7be588e
Default Java Eclipse Adoptium-11.0.17+8
unit https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-hbase-mapreduce.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/testReport/
Max. process+thread count 236 (vs. ulimit of 30000)
modules C: hbase-mapreduce U: hbase-mapreduce
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@frostruan
Copy link
Contributor Author

Some unit tests failed. Let me try to fix them.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 8s master passed
+1 💚 compile 0m 30s master passed
+1 💚 checkstyle 0m 11s master passed
+1 💚 spotless 0m 43s branch has no errors when running spotless:check.
+1 💚 spotbugs 0m 30s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 45s the patch passed
+1 💚 compile 0m 29s the patch passed
+1 💚 javac 0m 29s the patch passed
+1 💚 checkstyle 0m 10s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 5m 4s Patch does not cause any errors with Hadoop 3.3.6.
+1 💚 spotless 0m 40s patch has no errors when running spotless:check.
+1 💚 spotbugs 0m 35s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
21m 52s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5700
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 30e6cfeb49a2 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7be588e
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 80 (vs. ulimit of 30000)
modules C: hbase-mapreduce U: hbase-mapreduce
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/1/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 27s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
_ master Compile Tests _
+1 💚 mvninstall 3m 3s master passed
+1 💚 compile 0m 30s master passed
+1 💚 checkstyle 0m 11s master passed
+1 💚 spotless 0m 41s branch has no errors when running spotless:check.
+1 💚 spotbugs 0m 28s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 39s the patch passed
+1 💚 compile 0m 27s the patch passed
+1 💚 javac 0m 27s the patch passed
+1 💚 checkstyle 0m 9s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 hadoopcheck 5m 7s Patch does not cause any errors with Hadoop 3.3.6.
+1 💚 spotless 0m 40s patch has no errors when running spotless:check.
+1 💚 spotbugs 0m 35s the patch passed
_ Other Tests _
+1 💚 asflicense 0m 9s The patch does not generate ASF License warnings.
21m 33s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #5700
Optional Tests dupname asflicense javac spotbugs hadoopcheck hbaseanti spotless checkstyle compile
uname Linux 262e146e230d 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7be588e
Default Java Eclipse Adoptium-11.0.17+8
Max. process+thread count 81 (vs. ulimit of 30000)
modules C: hbase-mapreduce U: hbase-mapreduce
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/console
versions git=2.34.1 maven=3.8.6 spotbugs=4.7.3
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 26s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 2m 21s master passed
+1 💚 compile 0m 16s master passed
+1 💚 shadedjars 5m 31s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 11s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 27s the patch passed
+1 💚 compile 0m 16s the patch passed
+1 💚 javac 0m 16s the patch passed
+1 💚 shadedjars 5m 32s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 11s the patch passed
_ Other Tests _
+1 💚 unit 14m 25s hbase-mapreduce in the patch passed.
32m 34s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile
GITHUB PR #5700
Optional Tests javac javadoc unit shadedjars compile
uname Linux ac81fbdfe395 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7be588e
Default Java Temurin-1.8.0_352-b08
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/testReport/
Max. process+thread count 2908 (vs. ulimit of 30000)
modules C: hbase-mapreduce U: hbase-mapreduce
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 38s Docker mode activated.
-0 ⚠️ yetus 0m 2s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 3s master passed
+1 💚 compile 0m 20s master passed
+1 💚 shadedjars 5m 8s branch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 17s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 2m 49s the patch passed
+1 💚 compile 0m 20s the patch passed
+1 💚 javac 0m 20s the patch passed
+1 💚 shadedjars 5m 12s patch has no errors when building our shaded downstream artifacts.
+1 💚 javadoc 0m 16s the patch passed
_ Other Tests _
+1 💚 unit 14m 58s hbase-mapreduce in the patch passed.
34m 7s
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile
GITHUB PR #5700
Optional Tests javac javadoc unit shadedjars compile
uname Linux 54fc516a6b87 5.4.0-169-generic #187-Ubuntu SMP Thu Nov 23 14:52:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 7be588e
Default Java Eclipse Adoptium-11.0.17+8
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/testReport/
Max. process+thread count 2990 (vs. ulimit of 30000)
modules C: hbase-mapreduce U: hbase-mapreduce
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-5700/2/console
versions git=2.34.1 maven=3.8.6
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@@ -82,8 +81,8 @@ private void init(RegionLocator regionLocator, Admin admin) throws IOException {
regionLocator.getName())) {

byte[] regionId = regionLoad.getRegionName();
long regionSizeBytes =
((long) regionLoad.getStoreFileSize().get(Size.Unit.MEGABYTE)) * MEGABYTE;
long regionSizeBytes = (long) regionLoad.getMemStoreSize().get(Size.Unit.BYTE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we use Unit.MEGABYTE and then multiply 1024 * 1024 in the past? Can you find any related issues about this? Seems strange...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems it was like this in the first place...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing Duo.

From my understanding, before we introduce the Size api, we got region size by getMemstoreSizeMB and getStoreFileSizeMB, so I guess this is just to keep the same when applied the new Size api ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, checked the code on how we constructor the store file size and memstore size, the default unit is MB, so it is useless to pass an unit less than MB here...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But anyway, I think the problem here is we also need to account memstore size when calculating region size?

Copy link
Contributor Author

@frostruan frostruan Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. otherwise the data in memstore will be lost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that Hadoop assumes these split sizes are in megabytes and so we follow suit.

To protect against loss of precision, when the bytes-unit value is non-0, we can apply a minimum of 1mb. We should always add this minimum when running against an online-cluster.

When running against a snapshot, I'm not sure. MR over snapshots instantiates the region in the mapper process -- I assume that also reads the WAL and populates a memstore. In that case, we need the 1mb minimum here too. If not, we can permit the 0 to pass through and give the empty split optomization a chance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To protect against loss of precision, when the bytes-unit value is non-0, we can apply a minimum of 1mb. We should always add this minimum when running against an online-cluster.

Agree. And I believe this problem has been solved in https://issues.apache.org/jira/browse/HBASE-26609

When running against a snapshot, I'm not sure. MR over snapshots instantiates the region in the mapper process -- I assume that also reads the WAL and populates a memstore. In that case, we need the 1mb minimum here too. If not, we can permit the 0 to pass through and give the empty split optomization a chance.

The value of snapshot input split length will always be 0.
https://github.com/apache/hbase/blob/rel/3.0.0-beta-1/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L185
I think maybe this should be increased to 1MB too ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants