Skip to content

HADOOP-19400: Expand specification and contract test coverage for InputStream reads. #7367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

cnauroth
Copy link
Contributor

@cnauroth cnauroth commented Feb 7, 2025

Description of PR

Enhance the FS specification and contract tests to cover expected semantics of the InputStream single-byte and multi-byte read methods:

  • Multi-byte read should validate the arguments passed to it, according to the patterns established in the JDK base InputStream class and HDFS DFSInputStream.
  • You should get the same bytes whether going through single-byte or multi-byte read.
  • It is legal to mix calls to single-byte and multi-byte read, and this should also yield the same bytes.

How was this patch tested?

I ran all subclasses of AbstractContractOpenTest for local, HDFS and S3A (connecting to GCS's S3-compatible XML API).

mvn -o \
    -pl hadoop-common-project/hadoop-common,hadoop-hdfs-project/hadoop-hdfs,hadoop-hdfs-project/hadoop-hdfs-rbf,hadoop-tools/hadoop-aws test \
    -Dtest=TestFTPContractOpen,TestHDFSContractOpen,TestRouterHDFSContractOpenSecure,TestRouterWebHDFSContractOpen,TestRouterHDFSContractOpen,ITestS3AContractOpen,TestLocalFSContractOpen,TestRawlocalContractOpen

These new tests also pass for GoogleHadoopFileSystem.

./mvnw -o test \
    -Dtest=TestGoogleContractOpen,TestInMemoryGoogleContractOpen -Dsurefire.failIfNoSpecifiedTests=false

I generated the site locally and confirmed the new specification content.

mvn -o site site:stage -Preleasedocs,docs -DstagingDirectory=/tmp/hadoop-site

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@cnauroth
Copy link
Contributor Author

cnauroth commented Feb 7, 2025

Hi @steveloughran ! This is the follow-up you requested for more specification/test coverage of input stream reads. I've confirmed this is passing on local, HDFS and S3A (GCS as the back-end). I can't easily check the others unfortunately.

@cnauroth
Copy link
Contributor Author

cnauroth commented Feb 7, 2025

These new tests also pass for GoogleHadoopFileSystem.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 44s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 32s trunk passed
+1 💚 compile 16m 53s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 15m 33s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 1m 17s trunk passed
+1 💚 mvnsite 1m 38s trunk passed
+1 💚 javadoc 1m 18s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 54s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 2m 31s trunk passed
+1 💚 shadedclient 36m 17s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 55s the patch passed
+1 💚 compile 16m 17s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 16m 17s the patch passed
+1 💚 compile 15m 14s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 15m 14s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 11s the patch passed
+1 💚 mvnsite 1m 34s the patch passed
+1 💚 javadoc 1m 11s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 54s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 2m 38s the patch passed
+1 💚 shadedclient 36m 0s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 5m 26s hadoop-common in the patch passed.
+1 💚 asflicense 1m 4s The patch does not generate ASF License warnings.
207m 52s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/1/artifact/out/Dockerfile
GITHUB PR #7367
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux bc49a8c022f6 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 3a8608b
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/1/testReport/
Max. process+thread count 559 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all code is good; comments about the spec. Nice work!

@@ -175,6 +175,24 @@ must block until at least one byte is returned. Thus, for any data source
of length greater than zero, repeated invocations of this `read()` operation
will eventually read all the data.

#### Implementation Notes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is there a way to add these specification statements in the python statements? as that's designed to be what people write tests off.
  2. Please you the SHALL/MUST/MAYR than other terms "is expected to" etc. Yes, yours is the better prose, but we want no ambiguity here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, Steve.

  1. I pushed up a change trying to move some of this up. A few of the implementation notes remain to state the unique implementation choices of HDFS.
  2. The RFC verbiage is important. Thanks!

`IndexOutOfBoundsException` is thrown.
1. A read of `length` 0 is a no-op, and the returned `result` is 0. No exception is thrown, assuming
all other arguments are valid.
1. Reads through any method are expected to return the same data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, good one this. It's implicit in the model of data as an array of bytes, but yes, nice to call out.

dataset(len, 0x40, 0x80));
try (FSDataInputStream is = fs.openFile(path).build().get()) {
Assertions.assertThatThrownBy(() -> is.read(null, 0, 10))
.isInstanceOfAny(IllegalArgumentException.class, NullPointerException.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not seen this before, interesting.
I wonder if we could extend intercept() to take a list of classes.

I prefer intercept because it does two things assertj doesn't

  • returns the assert for future analysis
  • if there is no exception raised, returns the result of any lambda-expression which doesn't return void.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting idea. I just tried going down the path of writing an interceptAny accepting a list of exception classes. The problem we run into though is that we want to parameterize on the exception type E and return the specific type. If the list can contain any exception type, then we can't put a meaningful bound on the returned E. It seems all we can do is devolve back to Throwable.

I also don't currently have the use case (yet) of chaining additional analysis on that returned exception.

I think I'll hold off on this.

@hadoop-yetus
Copy link

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 markdownlint 0m 0s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 36m 2s trunk passed
+1 💚 compile 16m 56s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 compile 15m 33s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 checkstyle 1m 16s trunk passed
+1 💚 mvnsite 1m 39s trunk passed
+1 💚 javadoc 1m 16s trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 53s trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 2m 32s trunk passed
+1 💚 shadedclient 36m 8s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 57s the patch passed
+1 💚 compile 16m 8s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javac 16m 8s the patch passed
+1 💚 compile 15m 25s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 javac 15m 25s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 10s the patch passed
+1 💚 mvnsite 1m 34s the patch passed
+1 💚 javadoc 1m 12s the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04
+1 💚 javadoc 0m 53s the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
+1 💚 spotbugs 2m 42s the patch passed
+1 💚 shadedclient 36m 10s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 14m 54s hadoop-common in the patch passed.
+1 💚 asflicense 1m 2s The patch does not generate ASF License warnings.
205m 35s
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/2/artifact/out/Dockerfile
GITHUB PR #7367
Optional Tests dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle
uname Linux 5ca2bb3e9cca 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 617d9b5
Default Java Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/2/testReport/
Max. process+thread count 2061 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@cnauroth
Copy link
Contributor Author

Hello @anujmodi2021. Are you able to apply this patch and try a test run of ITestAbfsFileSystemContractOpen, TestAdlContractOpenLive and ITestAzureNativeContractOpen? I'd like to see if the new tests I'm adding pass on Azure. Thank you.

@cnauroth
Copy link
Contributor Author

Hello @anujmodi2021. Are you able to apply this patch and try a test run of ITestAbfsFileSystemContractOpen, TestAdlContractOpenLive and ITestAzureNativeContractOpen? I'd like to see if the new tests I'm adding pass on Azure. Thank you.

@anujmodi2021 , gentle reminder on this request. Thank you!

@anujmodi2021
Copy link
Contributor

Hello @anujmodi2021. Are you able to apply this patch and try a test run of ITestAbfsFileSystemContractOpen, TestAdlContractOpenLive and ITestAzureNativeContractOpen? I'd like to see if the new tests I'm adding pass on Azure. Thank you.

@anujmodi2021 , gentle reminder on this request. Thank you!

Pardon me for the delay in response. Was not getting time to get to it. Will look into this today and get back to you soon.

@cnauroth
Copy link
Contributor Author

Hello @anujmodi2021. Are you able to apply this patch and try a test run of ITestAbfsFileSystemContractOpen, TestAdlContractOpenLive and ITestAzureNativeContractOpen? I'd like to see if the new tests I'm adding pass on Azure. Thank you.

@anujmodi2021 , gentle reminder on this request. Thank you!

Pardon me for the delay in response. Was not getting time to get to it. Will look into this today and get back to you soon.

Thank you, totally understand it, and I appreciate the response!

Copy link
Contributor

@anujmodi2021 anujmodi2021 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
Tests run fine on Azure.

@cnauroth
Copy link
Contributor Author

+1 Tests run fine on Azure.

Awesome, thanks very much @anujmodi2021 !

@steveloughran , do you feel like we're at a +1 for this?

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 from me

@cnauroth
Copy link
Contributor Author

cnauroth commented Mar 3, 2025

Thanks very much, @steveloughran . I'm going to hold off committing this for now to see if Anuj can help with a test run on Azure.

@cnauroth
Copy link
Contributor Author

Thanks very much, @steveloughran . I'm going to hold off committing this for now to see if Anuj can help with a test run on Azure.

Please disregard this. I forgot Anuj had already responded on this a few comments back. I'll proceed with committing.

@cnauroth cnauroth closed this in 32da0d6 Mar 11, 2025
cnauroth added a commit that referenced this pull request Mar 11, 2025
…utStream reads.

Closes #7367

Signed-off-by: Anuj Modi <[email protected]>
Signed-off-by: Steve Loughran <[email protected]>
(cherry picked from commit 32da0d6)
@cnauroth
Copy link
Contributor Author

I committed this to trunk and branch-3.4. Thanks for the reviews, @anujmodi2021 and @steveloughran .

YanivKunda pushed a commit to YanivKunda/hadoop that referenced this pull request Mar 23, 2025
…utStream reads.

Closes apache#7367

Signed-off-by: Anuj Modi <[email protected]>
Signed-off-by: Steve Loughran <[email protected]>
@ahmarsuhail
Copy link
Contributor

@cnauroth, my team is working on an analytics focussed stream for S3A: https://github.com/awslabs/analytics-accelerator-s3/tree/main (AAL)

In the stream, when offset is negative, we do Preconditions.checkArgument(offset >= 0, "Offset is negative"); and throw IllegalArgumentException.

From your docs in this PR, for len:

If the caller passes a negative value for `length`, then an unchecked exception MUST be thrown.
The base JDK `InputStream` implementation throws `IndexOutOfBoundsException`. HDFS historically used
`IllegalArgumentException`. Implementations MAY use either of these.

Should be the case for offset as well? Without this testInputStreamReadNegativePosition fails with AAL. So i'm trying to understand if we need to change the exception AAL throws or if we can add another assertion testInputStreamReadNegativePosition

@cnauroth
Copy link
Contributor Author

Hello @ahmarsuhail . I think the answer here is that a read to a negative offset is considered an error by the caller:

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java#L430-L438

I wasn't sure if you were considering documentation updates for this. If so, I would help get that committed.

@ahmarsuhail
Copy link
Contributor

thanks @cnauroth, tracking this in: https://issues.apache.org/jira/browse/HADOOP-19556, will discuss with Steve and update docs/fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants