@droudnitsky
Contributor

@droudnitsky droudnitsky commented Oct 15, 2025

https://issues.apache.org/jira/browse/HBASE-29675

Add a bounds check before the byte array comparison in BinaryComponentComparator and throw a descriptive OffsetOutOfBoundsException, which subclasses ArrayIndexOutOfBoundsException. Today the comparison is unchecked and fails with a nondescript ArrayIndexOutOfBoundsException, from which it is difficult for clients to work out the root cause.
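A minimal sketch of the bounds check described above, written as plain Java so it runs without HBase on the classpath. It assumes a comparator holding a component `value` that must match the bytes starting at a fixed `offset`, similar in spirit to BinaryComponentComparator. The class `ComponentComparatorSketch`, the message format, and the use of `Arrays.compareUnsigned` in place of HBase's `Bytes.compareTo` are illustrative assumptions, not the code in this PR.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the descriptive exception proposed in this PR. */
class OffsetOutOfBoundsException extends ArrayIndexOutOfBoundsException {
  OffsetOutOfBoundsException(String message) {
    super(message);
  }
}

/** Sketch of a component comparator that validates bounds before comparing. */
class ComponentComparatorSketch {
  private final byte[] value; // the component the user wants to match
  private final int offset; // where that component starts inside the cell bytes

  ComponentComparatorSketch(byte[] value, int offset) {
    this.value = value;
    this.offset = offset;
  }

  /** Compares value against bytes[start + offset, start + offset + value.length). */
  int compareTo(byte[] bytes, int start, int length) {
    int end = offset + value.length; // exclusive end, relative to start
    if (offset < 0 || end > length) {
      // Fail with a message naming the offending numbers instead of an unchecked array access
      throw new OffsetOutOfBoundsException("Component at offset " + offset + " with length "
        + value.length + " does not fit in " + length + " bytes of data");
    }
    // Unsigned lexicographic comparison, which is how HBase orders byte arrays
    return Arrays.compareUnsigned(value, 0, value.length, bytes, start + offset, start + end);
  }
}
```

With `offset = 1` and a 2-byte component, a 3-byte cell compares normally, while a 2-byte cell fails fast with a message that names the offset, the component length, and the available length, instead of a bare array index.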

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 7s master passed
+1 💚 compile 8m 27s master passed
+1 💚 checkstyle 1m 10s master passed
+1 💚 spotbugs 9m 39s master passed
+1 💚 spotless 0m 44s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 3m 10s the patch passed
+1 💚 compile 8m 21s the patch passed
+1 💚 javac 8m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 1m 11s /results-checkstyle-root.txt root: The patch generated 1 new + 4 unchanged - 0 fixed = 5 total (was 4)
+1 💚 xmllint 0m 0s No new issues.
+1 💚 spotbugs 10m 4s the patch passed
+1 💚 hadoopcheck 12m 0s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 spotless 0m 45s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
68m 9s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7389
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless xmllint
uname Linux c3422ad41524 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cc1d916
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 190 (vs. ulimit of 30000)
modules C: hbase-client hbase-server . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/2/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3 xmllint=20913
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for branch
+1 💚 mvninstall 3m 41s master passed
+1 💚 compile 2m 16s master passed
+1 💚 javadoc 2m 42s master passed
+1 💚 shadedjars 6m 17s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for patch
+1 💚 mvninstall 3m 2s the patch passed
+1 💚 compile 2m 14s the patch passed
+1 💚 javac 2m 14s the patch passed
+1 💚 javadoc 2m 38s the patch passed
+1 💚 shadedjars 6m 11s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 236m 28s /patch-unit-root.txt root in the patch failed.
272m 47s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7389
Optional Tests javac javadoc unit compile shadedjars
uname Linux f943f76dc312 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / cc1d916
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/2/testReport/
Max. process+thread count 4806 (vs. ulimit of 30000)
modules C: hbase-client hbase-server . U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@droudnitsky
Contributor Author

Test failures are in TestReplicationMetricsforUI and are unrelated.

    // increment the number of requests that were exceptions.
    metrics.exception(e);

    if (e instanceof DoNotRetryRuntimeException) throw new DoNotRetryIOException(e);

Contributor Author

Hi @Apache9, I would appreciate your advice here: do you think there's a better way to do this, possibly without introducing a new DoNotRetryRuntimeException?

Contributor

Let me check the related code to see if there are other ways to throw a checked exception out...
If not, maybe this is the only way?

Contributor Author

Thank you Duo for the response. The exception happens in the compareTo method from the Comparable interface, which doesn't allow a checked exception, so yes, you are correct: it's not possible to throw a checked exception directly.

One option I was considering, instead of adding a generic DoNotRetryRuntimeException that is wrapped in DoNotRetryIOException in the RpcServer layer, was to do this catching and conversion to a checked exception in a deeper layer, closer to where the comparator is applied. However, I did not find a clean place or a clean way to do this, so I think adding a generic DoNotRetryRuntimeException is the best and most maintainable way. I am also thinking it can be reused in other places where it's not possible to throw a checked exception but we need to bubble a DoNotRetryIOException up to the client.

Contributor

The problem is that a RuntimeException usually means there is a code bug, so we may fail to clean up something in the call stack, which causes problems, since we do not expect a RuntimeException there...
This is what I am actually concerned about.

Contributor Author

Yes, I see your point here: it's not good to rely on runtime exceptions in the general case of issues deep in the call stack. The issue with the comparator that this PR is trying to address stems from the design of the comparator. I think it's unique in that, as far as I'm aware, it's the only comparator which breaks (cannot function) if the shape of the data on the server clashes with the parameter the user specified when creating the comparator.

Ideally, I think this should have been designed and implemented in a way that allows the filter to skip rows/cells which are too short for the comparison, so the filter can function normally regardless of the length of the data on the server and skip where needed instead of erroring out. But because it was implemented as a comparator, AFAIK we cannot do this kind of skipping: the compareTo interface does not allow it, and the only option is to error out if the comparison is not possible in compareTo.

To your point, this seems to be a unique problem that we do not want or need a generalized solution for, given that in most cases a RuntimeException is due to a code bug and not to a normal case where there is a mismatch between the parameter the user provides and the data on the server. I am thinking to remove the generic DoNotRetryRuntimeException that I added, and in RpcServer only check for the specific OffsetOutOfBoundsException that only this comparator throws, and wrap it in DoNotRetryIOException. What do you think? This way the special runtime exception handling only applies to this specific case and nothing else.
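A rough sketch of the narrower RpcServer-side check proposed above, where only the comparator's specific exception is converted into a non-retryable checked exception and every other RuntimeException keeps its existing handling. The helper `RpcExceptionMapper.toClientException` and the minimal exception classes are hypothetical stand-ins so the sketch compiles on its own; they do not reproduce HBase's actual RpcServer call path.

```java
import java.io.IOException;

// Hypothetical stand-ins for the HBase exception types named in the discussion.
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(Throwable cause) {
    super(cause);
  }
}

class OffsetOutOfBoundsException extends ArrayIndexOutOfBoundsException {
  OffsetOutOfBoundsException(String message) {
    super(message);
  }
}

class RpcExceptionMapper {
  /** Maps only the comparator's bounds failure to a non-retryable exception. */
  static Throwable toClientException(Throwable e) {
    if (e instanceof OffsetOutOfBoundsException) {
      return new DoNotRetryIOException(e);
    }
    // Unchanged: a generic RuntimeException still signals a possible code bug
    return e;
  }
}
```

Keying on one concrete exception type, rather than a generic DoNotRetryRuntimeException, keeps the special handling from leaking to unrelated runtime failures.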

@Apache9
Contributor

Apache9 commented Oct 18, 2025

Checked the code, this is only used in CompareFilter?

Actually, all the filterXXX methods in Filter can throw an IOException, and I think throwing a RuntimeException in a Comparator implementation usually does not hurt the whole system, so maybe a possible way is to catch RuntimeException in the Filter implementation and convert it to a DoNotRetryIOException to indicate that there is a misconfigured filter or a code bug.

WDYT?

Thanks.
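A minimal sketch of the filter-layer conversion suggested above: the comparator itself cannot throw a checked exception, but the filter method calling it can declare IOException, so that is where the RuntimeException is caught and rethrown as non-retryable. `java.util.Comparator` stands in for ByteArrayComparable, and `CheckedCompare` and the local `DoNotRetryIOException` class are hypothetical stand-ins for the real HBase types.

```java
import java.io.IOException;
import java.util.Comparator;

// Hypothetical stand-in for HBase's DoNotRetryIOException.
class DoNotRetryIOException extends IOException {
  DoNotRetryIOException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical filter-layer helper: the comparator cannot throw IOException, but the filter can. */
class CheckedCompare {
  static int compare(Comparator<byte[]> comparator, byte[] expected, byte[] actual)
      throws IOException {
    try {
      return comparator.compare(expected, actual);
    } catch (RuntimeException e) {
      // Same inputs fail the same way on retry: misconfigured filter or code bug
      throw new DoNotRetryIOException(
        "Comparator " + comparator.getClass().getName() + " failed during filtering", e);
    }
  }
}
```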

@droudnitsky
Contributor Author

> Checked the code, this is only used in CompareFilter?

Yes, those are the only types of filters the problematic byte array comparator can be used with.

> Actually, all the filterXXX methods in Filter can throw an IOException, and I think throwing a RuntimeException in a Comparator implementation usually does not hurt the whole system, so maybe a possible way is to catch RuntimeException in the Filter implementation and convert it to a DoNotRetryIOException to indicate that there is a misconfigured filter or a code bug.

Ah yes, thank you. I see filter methods can throw IOException in cases such as this one: "Concrete implementers can signal a failure condition in their code by throwing an {@link IOException}". So the comparator cannot throw IOException, but the filter applying the comparator is the next best layer that can throw a checked exception, and it's the appropriate place to catch and wrap runtime exceptions coming from the comparator below. I think this is a much better generalized approach to handle issues at or below the filter layer.

I think it's safe to assume that if a runtime exception occurs during filter application, it's extremely likely to happen again if the same scan RPC is retried with the same filters and data. Do you think it's appropriate to treat any runtime exception that occurs during any filter application as a DoNotRetryIOException? Or should we limit the runtime exception handling to CompareFilter only? I think it's best to handle runtime exceptions for all filters this way, if it makes sense, in order to keep things consistent and cover all possible code bugs, misconfigured filters, and comparators that can lead to a runtime exception.

@droudnitsky
Contributor Author

I looked into the two options mentioned above:

Option 1: Treat any runtime exception that occurs specifically when applying a comparator for CompareFilter as a DoNotRetryIOException. This does not require a very big change, since all the filters which extend CompareFilter (e.g. RowFilter) use the same methods to invoke the comparator, and it limits the set of runtime exceptions that we rethrow as checked exceptions to those coming from comparators used by CompareFilter. I think it makes a lot of sense to cover all comparator runtime exceptions there. I will create a new JIRA/PR for handling this general CompareFilter case and keep this PR limited to adding a bounds check and a useful error message to BinaryComponentComparator.

Option 2: More generally, wrap any runtime exception that happens during filter application as a DoNotRetryIOException. This requires more extensive changes and would change how runtime exceptions are handled across the entire filter layer. If my assumption holds (a runtime exception during filter application is extremely likely to happen again when the same scan RPC is retried with the same filters and data), this makes sense. But then maybe we should limit it to user scans, to prevent RPC retries and give the client a clean exception, and keep the current behavior for system scans without wrapping the exception, since system scans don't suffer from the RPC retry problem and this avoids changing server behavior. @Apache9 I'm wondering what you think about this.

@droudnitsky
Contributor Author

>> Checked the code, this is only used in CompareFilter?
>
> Yes, those are the only types of filters the problematic byte array comparator can be used with.

This was not accurate: any filter which takes a ByteArrayComparable can use the problematic byte array comparator. The majority of those filters (5 of 8) extend CompareFilter, but I found 3 outlier ColumnValue filters which do not extend CompareFilter:
org.apache.hadoop.hbase.filter.ColumnValueFilter
org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter
org.apache.hadoop.hbase.filter.SingleColumnValueFilter

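So those 3 filters need special handling, and the other 5 filters can be handled through CompareFilter. I used ChatGPT + ClassGraph to identify all filters which take a ByteArrayComparable:

    import io.github.classgraph.ClassGraph;
    import io.github.classgraph.ClassInfo;
    import io.github.classgraph.ScanResult;
    import java.lang.reflect.Constructor;
    import java.util.Set;
    import java.util.TreeSet;

    public class FindComparableFilters {
      public static void main(String[] args) {
        String pkg = "org.apache.hadoop.hbase.filter"; // the package to scan
        String target = "org.apache.hadoop.hbase.filter.ByteArrayComparable";
        // TreeSet drops duplicates (a class can have several matching constructors) and sorts output
        Set<String> hits = new TreeSet<>();

        try (ScanResult scan = new ClassGraph()
          .acceptPackages(pkg)
          .enableClassInfo()
          .scan()) {
          for (ClassInfo ci : scan.getAllClasses()) {
            Class<?> cls = ci.loadClass();
            for (Constructor<?> ctor : cls.getDeclaredConstructors()) {
              for (Class<?> pt : ctor.getParameterTypes()) {
                // Match the fully qualified name only; a simple-name match could hit unrelated types
                if (pt.getName().equals(target)) {
                  hits.add(cls.getName());
                }
              }
            }
          }
        }
        System.out.println("matches: " + hits.size());
        hits.forEach(System.out::println);
      }
    }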

org.apache.hadoop.hbase.filter.ColumnValueFilter
org.apache.hadoop.hbase.filter.CompareFilter
org.apache.hadoop.hbase.filter.DependentColumnFilter
org.apache.hadoop.hbase.filter.FamilyFilter
org.apache.hadoop.hbase.filter.QualifierFilter
org.apache.hadoop.hbase.filter.RowFilter
org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter
org.apache.hadoop.hbase.filter.SingleColumnValueFilter
org.apache.hadoop.hbase.filter.ValueFilter

@droudnitsky
Contributor Author

Opened this PR to handle runtime exceptions coming from a comparator at the filter layer - https://github.com/apache/hbase/pull/7397/files

@Apache9
Contributor

Apache9 commented Oct 19, 2025

> Do you think it's appropriate to treat any runtime exception that occurs during any filter application as a DoNotRetryIOException? Or should we limit the runtime exception handling to CompareFilter only?

I think first we'd better only handle the offset out of bounds case for BinaryComponentComparator. So maybe we can still introduce a new type of RuntimeException (maybe a subclass of ArrayIndexOutOfBoundsException?), and in the filter implementation we only catch this exception and convert it to a DoNotRetryIOException. For others, maybe we can wrap them as a general HBaseIOException to let the client retry?

WDYT?

Thanks.

@droudnitsky
Contributor Author

Thank you Duo, I think your proposal is good and incrementally improves things without changing too much behavior, so I plan to:

  1. Wrap any runtime exception coming from a comparator at the filter layer as a general HBaseIOException to let the client retry. I will make a small adjustment to https://github.com/apache/hbase/pull/7397/files; this can easily be extended to treat certain exceptions as DoNotRetryIOException instead of HBaseIOException.
  2. Add a new RuntimeException which subclasses ArrayIndexOutOfBoundsException, and wrap that specific exception as DoNotRetryIOException. This will be a small extension of the work done in 1, and I will keep it in this PR.

If we find other comparators with issues like BinaryComponentComparator's, we can handle them case by case instead of treating all runtime exceptions as non-retryable.
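The two steps above can be sketched as a single wrapper around the comparator call. This assumes the filter invokes the comparison through a `Supplier`; `ComparatorFailureMapper` and the local exception classes are hypothetical stand-ins that only mirror HBase's hierarchy (in HBase, DoNotRetryIOException extends HBaseIOException). The specific OffsetOutOfBoundsException is caught first and marked non-retryable, and any other RuntimeException becomes a retryable HBaseIOException.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical stand-ins mirroring HBase's hierarchy (DoNotRetryIOException extends HBaseIOException).
class HBaseIOException extends IOException {
  HBaseIOException(String message, Throwable cause) {
    super(message, cause);
  }
}

class DoNotRetryIOException extends HBaseIOException {
  DoNotRetryIOException(String message, Throwable cause) {
    super(message, cause);
  }
}

class OffsetOutOfBoundsException extends ArrayIndexOutOfBoundsException {
  OffsetOutOfBoundsException(String message) {
    super(message);
  }
}

/** Sketch of the agreed mapping at the filter layer; names are illustrative only. */
class ComparatorFailureMapper {
  static <T> T runComparison(Supplier<T> comparison) throws IOException {
    try {
      return comparison.get();
    } catch (OffsetOutOfBoundsException e) {
      // Step 2: a known, deterministic misconfiguration, so retrying cannot succeed
      throw new DoNotRetryIOException("Comparator offset out of bounds: " + e.getMessage(), e);
    } catch (RuntimeException e) {
      // Step 1: any other comparator failure becomes a checked, retryable exception
      throw new HBaseIOException("Comparator failed during filtering", e);
    }
  }
}
```

Catching the specific subclass before the general RuntimeException is what keeps step 2 a small extension of step 1: another non-retryable case would be one more catch clause.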

@droudnitsky droudnitsky changed the title HBASE-29654 Handle offset out of bounds gracefully in BinaryComponentComparator HBASE-29675 Add bounds check/descriptive OffsetOutOfBoundsException to BinaryComponentComparator Oct 19, 2025
@droudnitsky
Contributor Author

droudnitsky commented Oct 19, 2025

So I will split this up into three smaller tasks/PRs, which should be easier to review and can be reviewed independently of one another:

  1. HBASE-29675 / this PR: Add a more descriptive OffsetOutOfBoundsException to BinaryComponentComparator, which subclasses ArrayIndexOutOfBoundsException. This provides a much clearer exception message without changing any other behavior; it is a simple change which can be reviewed and merged without depending on anything else.
  2. HBASE-29672 / HBASE-29672 Handle runtime comparison failures during filtering gracefully #7397: Only for filters which take a comparator, wrap runtime exceptions from the comparator layer in HBaseIOException with a clear message for the client; these will be retried. This PR is also ready and independent of PR 1.
  3. HBASE-29676: Handle the specific OffsetOutOfBoundsException as DoNotRetryIOException. This will be a very small PR adding this special case once 1 and 2 are merged.

@Apache9 when you have time, would you kindly take a look at this PR, which is ready, as well as #7397?

@Apache-HBase

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 12s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for branch
+1 💚 mvninstall 4m 58s master passed
+1 💚 compile 1m 50s master passed
+1 💚 javadoc 1m 3s master passed
+1 💚 shadedjars 7m 18s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 15s Maven dependency ordering for patch
+1 💚 mvninstall 4m 27s the patch passed
+1 💚 compile 1m 45s the patch passed
+1 💚 javac 1m 45s the patch passed
+1 💚 javadoc 1m 1s the patch passed
+1 💚 shadedjars 7m 11s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 2m 46s hbase-client in the patch passed.
-1 ❌ unit 18m 17s /patch-unit-hbase-server.txt hbase-server in the patch failed.
53m 21s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/3/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7389
Optional Tests javac javadoc unit compile shadedjars
uname Linux c50703a0f3f1 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a420fd0
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/3/testReport/
Max. process+thread count 775 (vs. ulimit of 30000)
modules C: hbase-client hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/3/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 47s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 18s Maven dependency ordering for branch
+1 💚 mvninstall 5m 16s master passed
+1 💚 compile 5m 35s master passed
+1 💚 checkstyle 1m 52s master passed
+1 💚 spotbugs 3m 14s master passed
+1 💚 spotless 1m 16s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
+1 💚 mvninstall 4m 58s the patch passed
+1 💚 compile 6m 28s the patch passed
+1 💚 javac 6m 28s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 55s the patch passed
+1 💚 spotbugs 4m 0s the patch passed
+1 💚 hadoopcheck 14m 58s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 spotless 1m 1s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 23s The patch does not generate ASF License warnings.
62m 9s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7389
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 288ad6066b7a 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / a420fd0
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 71 (vs. ulimit of 30000)
modules C: hbase-client hbase-server U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7389/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@droudnitsky
Contributor Author

Test failure is in org.apache.hadoop.hbase.master.http.TestMasterStatusUtil.testGetFragmentationInfoTurnedOn and is unrelated.

3 participants