ahmarsuhail
Contributor

@ahmarsuhail ahmarsuhail commented Oct 2, 2025

Description of PR

Based off of #7763

Adds stats for

  • bytes prefetched
  • footer parsing failures
  • readVectored

How was this patch tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@github-actions github-actions bot added the Common label Oct 7, 2025
@ahmarsuhail ahmarsuhail changed the title HADOOP-19364. IoStats support for AAL [DRAFT] HADOOP-19364. IoStats support for AAL. Oct 9, 2025
@apache apache deleted a comment from hadoop-yetus Oct 9, 2025
@apache apache deleted a comment from hadoop-yetus Oct 9, 2025
@apache apache deleted a comment from hadoop-yetus Oct 9, 2025
@ahmarsuhail
Contributor Author

@steveloughran this PR addresses some of your comments on the original IoStats PR.

The way the AAL code works currently means it's quite hard to report on a cache hit accurately, so I've skipped that for now. It's something we should report, but will need a bit of a rewrite on our end. I'll see how we can do that.

Durations are also quite hard to report (I couldn't think of a way, though it would be nice to have). We'd need some way to create a duration tracker when a GET request starts and close that tracker when the request finishes. But since these callbacks are implemented at the stream level, it doesn't seem possible to track durations for each individual request. Any suggestions?

Other than that this PR is now ready for another review.

@ahmarsuhail
Contributor Author

Though this will need an AAL release; the corresponding AAL PR is awslabs/analytics-accelerator-s3#358

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 57s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 7m 47s Maven dependency ordering for branch
+1 💚 mvninstall 38m 50s trunk passed
+1 💚 compile 16m 5s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 13m 52s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 2m 53s trunk passed
+1 💚 mvnsite 2m 39s trunk passed
+1 💚 javadoc 2m 12s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 47s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 3m 59s trunk passed
+1 💚 shadedclient 35m 17s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 36s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 24s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 14m 18s /patch-compile-root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 14m 18s /patch-compile-root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 12m 58s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ javac 12m 58s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 2m 51s /results-checkstyle-root.txt root: The patch generated 23 new + 5 unchanged - 0 fixed = 28 total (was 5)
-1 ❌ mvnsite 0m 49s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 javadoc 2m 5s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 45s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
-1 ❌ spotbugs 0m 48s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 36m 20s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 22m 43s hadoop-common in the patch passed.
-1 ❌ unit 0m 47s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 1m 6s The patch does not generate ASF License warnings.
232m 12s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/4/artifact/out/Dockerfile
GITHUB PR #8007
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ef2295fda23a 5.15.0-156-generic #166-Ubuntu SMP Sat Aug 9 00:02:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 76e7bfa
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/4/testReport/
Max. process+thread count 1267 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/4/console
versions git=2.25.1 maven=3.9.11 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 33s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 7m 30s Maven dependency ordering for branch
+1 💚 mvninstall 40m 50s trunk passed
+1 💚 compile 16m 9s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 13m 47s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 2m 52s trunk passed
+1 💚 mvnsite 2m 41s trunk passed
+1 💚 javadoc 2m 16s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 49s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 3m 57s trunk passed
+1 💚 shadedclient 36m 41s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 35s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 24s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 14m 27s /patch-compile-root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 14m 27s /patch-compile-root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 13m 11s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ javac 13m 11s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 2m 46s /results-checkstyle-root.txt root: The patch generated 23 new + 5 unchanged - 0 fixed = 28 total (was 5)
-1 ❌ mvnsite 0m 49s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 javadoc 2m 9s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 48s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
-1 ❌ spotbugs 0m 48s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 36m 41s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 22m 43s hadoop-common in the patch passed.
-1 ❌ unit 0m 48s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 1m 5s The patch does not generate ASF License warnings.
235m 57s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/5/artifact/out/Dockerfile
GITHUB PR #8007
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 7b061705616c 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 038a76f
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/5/testReport/
Max. process+thread count 3150 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/5/console
versions git=2.25.1 maven=3.9.11 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@steveloughran
Contributor

regarding tracking, we can tune the relevant statistic API. Maybe a factory function which returns an AutoCloseable that is passed down and closed when the request finishes. But that isn't enough to measure failure counts.

Maybe:

  • AAL adds an interface for something a bit like DurationTracker; with a failed() callback that could even take a Throwable if anyone wanted any extra instrumentation/tracking in future.
  • AAL takes a function/interface which asks for one of these before it fetches a block.
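
The two bullets above could be sketched roughly as follows. This is only an illustration of the shape being proposed, not an existing AAL or Hadoop API: `RequestTracker`, `RequestTrackerFactory`, `CountingTrackerFactory` and `TrackedFetcher` are all hypothetical names, and the "fetch" is a plain `Supplier` standing in for a real GET.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical callback a library could expose; loosely modelled on a duration tracker. */
interface RequestTracker extends AutoCloseable {
  /** Record that the tracked request failed; the cause is optional extra context. */
  void failed(Throwable cause);

  /** Marks the end of the request. Narrowed so that no checked exception is thrown. */
  @Override
  void close();
}

/** Hypothetical factory the library would ask for a tracker before each block fetch. */
interface RequestTrackerFactory {
  RequestTracker trackGet();
}

/** Toy statistics sink: counts requests and failures and sums elapsed time. */
final class CountingTrackerFactory implements RequestTrackerFactory {
  final AtomicLong requests = new AtomicLong();
  final AtomicLong failures = new AtomicLong();
  final AtomicLong totalNanos = new AtomicLong();

  @Override
  public RequestTracker trackGet() {
    final long start = System.nanoTime();
    requests.incrementAndGet();
    return new RequestTracker() {
      @Override
      public void failed(Throwable cause) {
        failures.incrementAndGet();
      }

      @Override
      public void close() {
        totalNanos.addAndGet(System.nanoTime() - start);
      }
    };
  }
}

/** Stand-in for the library side: wraps a single fetch in its own tracker. */
final class TrackedFetcher {
  static <T> T fetch(RequestTrackerFactory factory, Supplier<T> get) {
    // One tracker per request, so durations are per GET rather than per stream.
    try (RequestTracker tracker = factory.trackGet()) {
      try {
        return get.get();
      } catch (RuntimeException e) {
        tracker.failed(e); // failure is reported before close() records the duration
        throw e;
      }
    }
  }
}
```

Because the tracker is created per fetch rather than per stream, the caller never needs to correlate callbacks with individual requests; the library simply closes whatever it was handed.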

Contributor

@steveloughran steveloughran left a comment

really like this, especially how the cost tests show the savings in http IO.

some minor changes. regarding duration tracking, it would be nice, but let's not make it a blocker.

Key change is to put all new statistics you want the FS to aggregate into the Statistic enum, declaring their type. Instrumentation scans that enum and creates the FS metrics from it.

/**
 * Bytes that were prefetched by the stream.
 */
public static final String STREAM_READ_PREFETCHED_BYTES = "stream_read_prefetched_bytes";
Contributor

Add entries in org.apache.hadoop.fs.s3a.Statistic; these are scanned and used to create the full filesystem instance stats which the input stream updates in close()
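
A toy model of why the enum entry matters, under stated assumptions: `ToyStatistic`, `ToyStatisticType` and `ToyInstrumentation` are hypothetical stand-ins, not the real `Statistic` enum or S3A instrumentation, and `toy_open_streams` is an invented entry. The sketch only shows the pattern described above, where filesystem-level counters are created by scanning the enum and streams fold their counts in on close; what the real code does with an undeclared statistic is not shown in this conversation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical type tag, for illustration only. */
enum ToyStatisticType { COUNTER, GAUGE }

/** Toy stand-in for a statistics enum: every entry declares its key and its type. */
enum ToyStatistic {
  STREAM_READ_PREFETCHED_BYTES("stream_read_prefetched_bytes", ToyStatisticType.COUNTER),
  ACTION_HTTP_GET_REQUEST("action_http_get_request", ToyStatisticType.COUNTER),
  TOY_OPEN_STREAMS("toy_open_streams", ToyStatisticType.GAUGE); // invented entry

  final String key;
  final ToyStatisticType type;

  ToyStatistic(String key, ToyStatisticType type) {
    this.key = key;
    this.type = type;
  }
}

/** Toy filesystem-level instrumentation whose counters come from scanning the enum. */
final class ToyInstrumentation {
  private final Map<String, AtomicLong> counters = new TreeMap<>();

  ToyInstrumentation() {
    // Only statistics declared as counters get a filesystem-level entry.
    for (ToyStatistic s : ToyStatistic.values()) {
      if (s.type == ToyStatisticType.COUNTER) {
        counters.put(s.key, new AtomicLong());
      }
    }
  }

  /** Called when a stream closes: fold its per-stream counts into the totals. */
  void mergeOnClose(Map<String, Long> streamCounters) {
    streamCounters.forEach((key, value) -> {
      AtomicLong total = counters.get(key);
      if (total != null) { // in this toy model, undeclared keys are ignored
        total.addAndGet(value);
      }
    });
  }

  /** Returns the aggregated value, or -1 if the key was never declared as a counter. */
  long counter(String key) {
    AtomicLong total = counters.get(key);
    return total == null ? -1 : total.get();
  }
}
```

In this model a stream can report any key it likes, but only keys declared in the enum ever appear in the aggregate.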

package org.apache.hadoop.fs.s3a.impl.streams;

import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
import software.amazon.s3.analyticsaccelerator.util.RequestCallback;
Contributor

nit: import ordering

import static org.apache.hadoop.fs.s3a.S3ATestUtils.skipForAnyEncryptionExceptSSES3;
import static org.apache.hadoop.fs.contract.ContractTestUtils.*;
import static org.apache.hadoop.fs.s3a.Constants.ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.*;
Contributor

prefer not to use .* imports except where there are lots of constants; can you stop the IDE from auto-enabling it?

@MethodSource("params")
public class ITestS3AContractAnalyticsStreamVectoredRead extends AbstractContractVectoredReadTest {

private static final int ONE_KB = 1024;
Contributor

org.apache.hadoop.io.Sizes.S_1K

Configuration conf = super.createConfiguration();
// Set the coalesce tolerance to 1KB, default is 1MB.
conf.setInt(ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX +
"." + "physicalio.request.coalesce.tolerance", 10 * ONE_KB);
Contributor

create a new S_10K in Sizes for this, then use.

// request will be is 128KB. Since the file being read is 128KB, we need to use this here to demonstrate that
// separate GET requests are made for ranges that are not coalesced.
conf.setInt(ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX +
"." + "physicalio.readbuffersize", 32 * ONE_KB);
Contributor

S_32K

protected Configuration createConfiguration() {
Configuration conf = super.createConfiguration();
// Set the coalesce tolerance to 1KB, default is 1MB.
conf.setInt(ANALYTICS_ACCELERATOR_CONFIGURATION_PREFIX +
Contributor

Add new strings in Constants, and use removeBaseAndBucketOverrides to make sure there are no manual overrides there to break tests


// Total file size is: 21511173, and read starts from pos 5. Since policy is WHOLE_FILE, the whole file starts
// getting prefetched as soon as the stream to it is opened. So prefetched bytes is 21511173 - 5 = 21511168
verifyStatisticCounterValue(ioStats, STREAM_READ_PREFETCHED_BYTES, 21511168);
Contributor

explicitly do the maths in the code ("len - 5") for future maintenance, and leave that explanation in. In fact, we should plan for the nightmare scenario of "file goes away" by not having any assumptions. We also need to handle test setups where it's on a third-party store.

  • grab its length
  • if too short, fail the test meaningfully
  • calculate the relevant values
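
A minimal sketch of those three steps, assuming the file length has already been obtained at runtime (in a real test it would presumably come from the filesystem, for example via `getFileStatus(path).getLen()`). `PrefetchExpectations` and its method are hypothetical helpers, not part of the patch.

```java
/** Hypothetical helper that derives the expected statistic instead of hard-coding it. */
final class PrefetchExpectations {

  private PrefetchExpectations() {
  }

  /**
   * Expected prefetched bytes when the whole file is prefetched from readStart onwards.
   *
   * @param fileLength length of the test file, looked up at runtime
   * @param readStart offset at which the test starts reading
   * @return fileLength - readStart
   * @throws IllegalStateException if the file is too short for the test to be meaningful
   */
  static long expectedPrefetchedBytes(long fileLength, long readStart) {
    if (readStart < 0) {
      throw new IllegalArgumentException("negative read start: " + readStart);
    }
    if (fileLength <= readStart) {
      // Fail with a message that explains the problem, not with a bare counter mismatch.
      throw new IllegalStateException("test file too short: length " + fileLength
          + " but read starts at " + readStart);
    }
    return fileLength - readStart;
  }
}
```

With the numbers quoted in the snippet above, `expectedPrefetchedBytes(21511173L, 5L)` gives 21511168, so the assertion stays correct if the file is ever replaced by one of a different length.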

inputStream.seek(3 * S_1M);
inputStream.read(new byte[512 * S_1K]);

verifyStatisticCounterValue(ioStats, ACTION_HTTP_GET_REQUEST, 1);
Contributor

really nice to see this.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 56s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 8m 22s Maven dependency ordering for branch
+1 💚 mvninstall 38m 51s trunk passed
+1 💚 compile 16m 2s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 13m 45s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 2m 55s trunk passed
+1 💚 mvnsite 2m 41s trunk passed
+1 💚 javadoc 2m 7s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 48s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 3m 56s trunk passed
+1 💚 shadedclient 36m 22s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 35s Maven dependency ordering for patch
-1 ❌ mvninstall 0m 23s /patch-mvninstall-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
-1 ❌ compile 14m 13s /patch-compile-root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 14m 13s /patch-compile-root-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt root in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 13m 8s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ javac 13m 8s /patch-compile-root-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt root in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
-0 ⚠️ checkstyle 2m 54s /results-checkstyle-root.txt root: The patch generated 30 new + 5 unchanged - 0 fixed = 35 total (was 5)
-1 ❌ mvnsite 0m 49s /patch-mvnsite-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 javadoc 2m 4s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 47s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
-1 ❌ spotbugs 0m 45s /patch-spotbugs-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 shadedclient 36m 43s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 22m 52s hadoop-common in the patch passed.
-1 ❌ unit 0m 46s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch failed.
+1 💚 asflicense 1m 3s The patch does not generate ASF License warnings.
234m 22s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/6/artifact/out/Dockerfile
GITHUB PR #8007
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 8cbb62c31500 5.15.0-156-generic #166-Ubuntu SMP Sat Aug 9 00:02:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / dbc684b
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/6/testReport/
Max. process+thread count 1367 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8007/6/console
versions git=2.25.1 maven=3.9.11 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
