
HADOOP-19400: Expand specification and contract test coverage for InputStream reads. #7367

Open · wants to merge 1 commit into base: trunk
Conversation

@cnauroth (Contributor) commented Feb 7, 2025

Description of PR

Enhance the FS specification and contract tests to cover expected semantics of the InputStream single-byte and multi-byte read methods:

  • Multi-byte read should validate the arguments passed to it, according to the patterns established in the JDK base InputStream class and HDFS DFSInputStream (see the sketch after this list).
  • You should get the same bytes whether going through single-byte or multi-byte read.
  • It is legal to mix calls to single-byte and multi-byte read, and this should also yield the same bytes.
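
As an illustration only (not code from this patch), here is a minimal sketch of the argument-validation pattern referenced in the first bullet, following the conventions of `java.io.InputStream#read(byte[], int, int)`; the wrapper class and its names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch only: a wrapper applying the argument checks described above
// before delegating. The class and field names are hypothetical, not from the patch.
class ValidatingInputStream extends InputStream {
  private final InputStream wrapped;

  ValidatingInputStream(InputStream wrapped) {
    this.wrapped = wrapped;
  }

  @Override
  public int read() throws IOException {
    return wrapped.read();
  }

  @Override
  public int read(byte[] buffer, int offset, int length) throws IOException {
    if (buffer == null) {
      throw new NullPointerException("buffer is null");
    }
    if (offset < 0 || length < 0 || length > buffer.length - offset) {
      throw new IndexOutOfBoundsException(
          "offset=" + offset + " length=" + length + " buffer.length=" + buffer.length);
    }
    if (length == 0) {
      return 0;  // a zero-length read is a no-op; no exception for otherwise valid arguments
    }
    return wrapped.read(buffer, offset, length);
  }
}
```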

How was this patch tested?

I ran all subclasses of AbstractContractOpenTest for local, HDFS and S3A (connecting to GCS's S3-compatible XML API).

mvn -o \
    -pl hadoop-common-project/hadoop-common,hadoop-hdfs-project/hadoop-hdfs,hadoop-hdfs-project/hadoop-hdfs-rbf,hadoop-tools/hadoop-aws test \
    -Dtest=TestFTPContractOpen,TestHDFSContractOpen,TestRouterHDFSContractOpenSecure,TestRouterWebHDFSContractOpen,TestRouterHDFSContractOpen,ITestS3AContractOpen,TestLocalFSContractOpen,TestRawlocalContractOpen

These new tests also pass for GoogleHadoopFileSystem.

./mvnw -o test \
    -Dtest=TestGoogleContractOpen,TestInMemoryGoogleContractOpen -Dsurefire.failIfNoSpecifiedTests=false

I generated the site locally and confirmed the new specification content.

mvn -o site site:stage -Preleasedocs,docs -DstagingDirectory=/tmp/hadoop-site

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@cnauroth (Contributor, Author) commented Feb 7, 2025

Hi @steveloughran! This is the follow-up you requested for more specification/test coverage of input stream reads. I've confirmed this is passing on local, HDFS, and S3A (GCS as the back-end). I can't easily check the others, unfortunately.

@cnauroth (Contributor, Author) commented Feb 7, 2025

These new tests also pass for GoogleHadoopFileSystem.

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|:-------:|:-------:|:--------|
| +0 🆗 | reexec | 11m 44s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 1s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| | _ trunk Compile Tests _ | | | |
| +1 💚 | mvninstall | 36m 32s | | trunk passed |
| +1 💚 | compile | 16m 53s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | compile | 15m 33s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | checkstyle | 1m 17s | | trunk passed |
| +1 💚 | mvnsite | 1m 38s | | trunk passed |
| +1 💚 | javadoc | 1m 18s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 54s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 2m 31s | | trunk passed |
| +1 💚 | shadedclient | 36m 17s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 0m 55s | | the patch passed |
| +1 💚 | compile | 16m 17s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javac | 16m 17s | | the patch passed |
| +1 💚 | compile | 15m 14s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | javac | 15m 14s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 11s | | the patch passed |
| +1 💚 | mvnsite | 1m 34s | | the patch passed |
| +1 💚 | javadoc | 1m 11s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 54s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 2m 38s | | the patch passed |
| +1 💚 | shadedclient | 36m 0s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 💚 | unit | 5m 26s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 1m 4s | | The patch does not generate ASF License warnings. |
| | | 207m 52s | | |

| Subsystem | Report/Notes |
|:---------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/1/artifact/out/Dockerfile |
| GITHUB PR | #7367 |
| Optional Tests | dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle |
| uname | Linux bc49a8c022f6 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3a8608b |
| Default Java | Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/1/testReport/ |
| Max. process+thread count | 559 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@steveloughran (Contributor) left a comment


all code is good; comments about the spec. Nice work!

@@ -175,6 +175,24 @@ must block until at least one byte is returned. Thus, for any data source
of length greater than zero, repeated invocations of this `read()` operation
will eventually read all the data.

#### Implementation Notes
steveloughran (Contributor) commented:

  1. Is there a way to add these specification statements in the python statements, as that's designed to be what people write tests off of?
  2. Please use SHALL/MUST/MAY rather than other terms ("is expected to" etc.). Yes, yours is the better prose, but we want no ambiguity here.

cnauroth (Contributor, Author) replied:

Thanks for the review, Steve.

  1. I pushed up a change trying to move some of this up. A few of the implementation notes remain to state the unique implementation choices of HDFS.
  2. The RFC verbiage is important. Thanks!

`IndexOutOfBoundsException` is thrown.
1. A read of `length` 0 is a no-op, and the returned `result` is 0. No exception is thrown, assuming
all other arguments are valid.
1. Reads through any method are expected to return the same data.
steveloughran (Contributor) commented:

ooh, good one this. It's implicit in the model of data as an array of bytes, but yes, nice to call out.
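
To make the "same data through any read method" statement concrete, here is a hedged standalone sketch (plain `java.io` against a local file, not the contract-test code in this patch) comparing the single-byte and multi-byte read paths:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Illustrative sketch only: read the same file once byte-by-byte and once with the
// multi-byte read, and verify both paths observe identical data.
public final class SameDataSketch {
  public static void main(String[] args) throws Exception {
    Path file = Path.of(args[0]);              // any small test file
    byte[] expected = Files.readAllBytes(file);

    byte[] singleByte = new byte[expected.length];
    try (InputStream in = Files.newInputStream(file)) {
      for (int i = 0, b; (b = in.read()) != -1; i++) {
        singleByte[i] = (byte) b;              // single-byte read() path
      }
    }

    byte[] multiByte = new byte[expected.length];
    try (InputStream in = Files.newInputStream(file)) {
      int off = 0, n;
      while (off < multiByte.length
          && (n = in.read(multiByte, off, multiByte.length - off)) != -1) {
        off += n;                              // multi-byte read(byte[], int, int) path
      }
    }

    if (!Arrays.equals(expected, singleByte) || !Arrays.equals(expected, multiByte)) {
      throw new AssertionError("single-byte and multi-byte reads disagree");
    }
    System.out.println("single-byte and multi-byte reads returned the same data");
  }
}
```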

    dataset(len, 0x40, 0x80));
try (FSDataInputStream is = fs.openFile(path).build().get()) {
  Assertions.assertThatThrownBy(() -> is.read(null, 0, 10))
      .isInstanceOfAny(IllegalArgumentException.class, NullPointerException.class);
steveloughran (Contributor) commented:

not seen this before, interesting.
I wonder if we could extend intercept() to take a list of classes.

I prefer intercept because it does two things assertj doesn't

  • returns the assert for future analysis
  • if there is no exception raised, returns the result of any lambda-expression which doesn't return void.

cnauroth (Contributor, Author) replied:

That's an interesting idea. I just tried going down the path of writing an interceptAny accepting a list of exception classes. The problem we run into though is that we want to parameterize on the exception type E and return the specific type. If the list can contain any exception type, then we can't put a meaningful bound on the returned E. It seems all we can do is devolve back to Throwable.

I also don't currently have the use case (yet) of chaining additional analysis on that returned exception.

I think I'll hold off on this.
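
For illustration of the generics issue described above, here is a hypothetical `interceptAny` sketch (not part of Hadoop's test utilities); because the expected classes can be unrelated exception types, there is no single type parameter `E` to bind, so the return type falls back to `Throwable`:

```java
import java.util.List;
import java.util.concurrent.Callable;

public final class InterceptAnySketch {
  // Hypothetical helper, not part of LambdaTestUtils: with a heterogeneous list of
  // expected exception classes, the best common return type is Throwable.
  public static Throwable interceptAny(List<Class<? extends Throwable>> expected,
      Callable<?> eval) throws Exception {
    try {
      eval.call();
    } catch (Exception e) {
      for (Class<? extends Throwable> c : expected) {
        if (c.isInstance(e)) {
          return e;  // matched one of the expected types
        }
      }
      throw e;       // not one of the expected types: propagate
    }
    throw new AssertionError("expected one of " + expected + ", but no exception was thrown");
  }
}
```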

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|:---------:|:-------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 32s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| | _ trunk Compile Tests _ | | | |
| +1 💚 | mvninstall | 36m 2s | | trunk passed |
| +1 💚 | compile | 16m 56s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | compile | 15m 33s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | checkstyle | 1m 16s | | trunk passed |
| +1 💚 | mvnsite | 1m 39s | | trunk passed |
| +1 💚 | javadoc | 1m 16s | | trunk passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 53s | | trunk passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 2m 32s | | trunk passed |
| +1 💚 | shadedclient | 36m 8s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 0m 57s | | the patch passed |
| +1 💚 | compile | 16m 8s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javac | 16m 8s | | the patch passed |
| +1 💚 | compile | 15m 25s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | javac | 15m 25s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 1m 10s | | the patch passed |
| +1 💚 | mvnsite | 1m 34s | | the patch passed |
| +1 💚 | javadoc | 1m 12s | | the patch passed with JDK Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 |
| +1 💚 | javadoc | 0m 53s | | the patch passed with JDK Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| +1 💚 | spotbugs | 2m 42s | | the patch passed |
| +1 💚 | shadedclient | 36m 10s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 💚 | unit | 14m 54s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
| | | 205m 35s | | |

| Subsystem | Report/Notes |
|:---------:|:-------------|
| Docker | ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/2/artifact/out/Dockerfile |
| GITHUB PR | #7367 |
| Optional Tests | dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle |
| uname | Linux 5ca2bb3e9cca 5.15.0-130-generic #140-Ubuntu SMP Wed Dec 18 17:59:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 617d9b5 |
| Default Java | Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.26+4-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_442-8u442-b06us1-0ubuntu120.04-b06 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/2/testReport/ |
| Max. process+thread count | 2061 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7367/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
