HADOOP-19645. [ABFS][ReadAheadV2] Improve Metrics for Read Calls to identify type of read done. #7837
base: trunk
Conversation
💔 -1 overall
This message was automatically generated.
:::: AGGREGATED TEST RESULT ::::
Pull Request Overview
This PR adds metrics to identify different types of read operations in the ABFS driver by enhancing the tracing header with operation-specific information. The main goal is to differentiate between various read types (direct, normal, prefetch, cache miss, footer, and small file reads) through the ClientRequestId header.
Key changes include:
- Adding a ReadType enum to categorize different read operations
- Updating the tracing header format to include versioning and operation-specific headers
- Modifying read operations throughout the codebase to set appropriate ReadType values
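The PR text does not show the enum body itself, but based on the six read types named above, a minimal sketch of what the ReadType enum could look like follows. The abbreviation strings here are assumptions for illustration, not taken from the actual change.

```java
// Hypothetical sketch of the ReadType enum; the real abbreviations used
// by the PR may differ.
public enum ReadType {
    DIRECT_READ("DR"),       // caller bypasses buffering and reads remote directly
    NORMAL_READ("NR"),       // regular buffered read
    PREFETCH_READ("PR"),     // read issued by the read-ahead machinery
    MISSEDCACHE_READ("MR"),  // read-ahead was on, but the block was not in cache
    FOOTER_READ("FR"),       // footer-optimized read (e.g. Parquet footers)
    SMALLFILE_READ("SR");    // whole small file read in one call

    private final String abbreviation;

    ReadType(String abbreviation) {
        this.abbreviation = abbreviation;
    }

    @Override
    public String toString() {
        // The abbreviated form is what gets embedded in the tracing header.
        return abbreviation;
    }
}
```

Short abbreviations keep the ClientRequestId header compact while still letting log analysis distinguish the read path taken.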
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| ReadType.java | New enum defining six read operation types with abbreviated string representations |
| AbfsHttpConstants.java | Added TracingHeaderVersion enum for header versioning |
| TracingContext.java | Enhanced header construction with versioning and operation-specific headers |
| Listener.java | Added interface method for updating ReadType |
| AbfsInputStream.java | Updated read operations to set appropriate ReadType values |
| ReadBufferWorker.java | Added imports for ReadType and TracingContext |
| TracingHeaderValidator.java | Updated validation logic for new header format |
| TestApacheHttpClientFallback.java | Fixed test assertions for new header structure |
| TestTracingContext.java | Updated header parsing for new format |
header += (":" + operatedBlobCount);
}
header += (":" + httpOperation.getTracingContextSuffix());
AbfsHttpConstants.TracingHeaderVersion.V1 + ":" +
The hardcoded V1 version is used in multiple places. Consider using TracingHeaderVersion.getCurrentVersion() consistently to centralize version management.
AbfsHttpConstants.TracingHeaderVersion.V1 + ":" +
clientCorrelationID + ":" + clientRequestId;
metricHeader += !(metricResults.trim().isEmpty()) ? metricResults : "";
break;
default:
//case SINGLE_ID_FORMAT
header = clientRequestId;
header =
AbfsHttpConstants.TracingHeaderVersion.V1 + ":" +
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java
@@ -81,82 +85,93 @@ public TracingHeaderValidator(String clientCorrelationId, String fileSystemId,
}

private void validateTracingHeader(String tracingContextHeader) {
String[] idList = tracingContextHeader.split(":");
String[] idList = tracingContextHeader.split(":", -1);
[nitpick] Consider defining the split limit (-1) as a named constant to improve code readability and maintainability.
String[] idList = tracingContextHeader.split(":", -1);
String[] idList = tracingContextHeader.split(":", SPLIT_NO_LIMIT);
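The `-1` limit matters because `String.split` with the default limit silently discards trailing empty strings, which would lose empty fields at the end of the tracing header and shift the field count. A small standalone demo (the header value here is made up):

```java
public class SplitLimitDemo {
    // Named constant for the negative split limit, as the nitpick suggests.
    public static final int SPLIT_NO_LIMIT = -1;

    public static void main(String[] args) {
        String header = "v1:corrId:reqId:::";
        // Default limit drops trailing empty fields:
        System.out.println(header.split(":").length);                 // 3
        // Negative limit keeps them, preserving the full field count:
        System.out.println(header.split(":", SPLIT_NO_LIMIT).length); // 6
    }
}
```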
💔 -1 overall
This message was automatically generated.
@@ -128,6 +128,7 @@ public final class AbfsHttpConstants {
public static final String STAR = "*";
public static final String COMMA = ",";
public static final String COLON = ":";
public static final String HYPHEN = "-";
We already have CHAR_HYPHEN defined for this.
@@ -265,6 +286,34 @@ private String addFailureReasons(final String header,
return String.format("%s_%s", header, previousFailure);
}

private String getRetryHeader(final String previousFailure, String retryPolicyAbbreviation) {
Please add javadoc to all newly added methods.
}

public int getFieldCount() {
return V1.fieldCount;
Shouldn't it be just return this.fieldCount?
+1
}

public String getVersion() {
return V1.version;
Same as above: it should be return this.version?
@@ -781,6 +794,132 @@ public void testDefaultReadaheadQueueDepth() throws Exception {
in.close();
}

@Test
public void testReadTypeInTracingContextHeader() throws Exception {
Javadoc missing.
🎊 +1 overall
This message was automatically generated.
receivedBytes = readRemote(position, b, offset, length, new TracingContext(tracingContext));
TracingContext tc = new TracingContext(tracingContext);
tc.setReadType(ReadType.MISSEDCACHE_READ);
receivedBytes = readRemote(position, b, offset, length, tc);
return receivedBytes;
} else {
LOG.debug("read ahead disabled, reading remote");
Should we add ReadType as normal read for this TracingContext as well?
@@ -442,6 +451,7 @@ private int optimisedRead(final byte[] b, final int off, final int len,
// bCursor that means the user requested data has not been read.
if (fCursor < contentLength && bCursor > limit) {
restorePointerState();
tracingContext.setReadType(ReadType.NORMAL_READ);
Before readOneBlock we're setting the TracingContext to NORMAL_READ both here and at line 439, and readOneBlock sets it to NORMAL_READ again. Do we need it twice? We could keep it just once, in the method.
+ position + COLON
+ operatedBlobCount + COLON
+ httpOperation.getTracingContextSuffix() + COLON
+ getOperationSpecificHeader(opType);
Should we keep the op-specific header before adding the HTTP client? That would group all request-related info together, followed by the network client.
E.g. .....:RE:1_EGR:NR:JDK
@@ -265,6 +286,34 @@ private String addFailureReasons(final String header,
return String.format("%s_%s", header, previousFailure);
}

private String getRetryHeader(final String previousFailure, String retryPolicyAbbreviation) {
We can remove the addFailureReasons method; it has no usage now.
public enum TracingHeaderVersion {

V0("", 8),
V1("v1", 13);
Since the next versions would be V1.1/V1.2, should we consider starting with V1.0/V1.1?
And with version updates, would we update the version field in V1 only, or add a new V1.1 enum?
So every time we add a new header, we need to add a new version?
.contains(readType.toString());
}

// private testReadTypeInTracingContextHeaderInternal(ReadType readType) throws Exception {
Nit: we can remove this.
@@ -77,8 +81,7 @@ public class TracingContext {
* this field shall not be set.
*/
private String primaryRequestIdForRetry;

private Integer operatedBlobCount = null;
private Integer operatedBlobCount = 1; // Only relevant for rename-delete over blob endpoint where it will be explicitly set.
Why is it changed from null to 1?
}

public static TracingHeaderVersion getCurrentVersion() {
return V1;
This needs to be updated every time a new version is introduced; can it be fetched dynamically?
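One way to avoid hand-editing getCurrentVersion on every bump, assuming the constants stay declared oldest-to-newest, is to return the last declared constant. This is a sketch addressing the review comments above (including returning the instance fields rather than V1's), not the PR's actual code:

```java
public enum TracingHeaderVersion {
    V0("", 8),
    V1("v1", 13);

    private final String version;
    private final int fieldCount;

    TracingHeaderVersion(String version, int fieldCount) {
        this.version = version;
        this.fieldCount = fieldCount;
    }

    // Instance fields, per the review comments above.
    public String getVersion() {
        return this.version;
    }

    public int getFieldCount() {
        return this.fieldCount;
    }

    // Relies on declaration order: the last constant is the newest version,
    // so adding V2 after V1 automatically makes it current.
    public static TracingHeaderVersion getCurrentVersion() {
        TracingHeaderVersion[] all = values();
        return all[all.length - 1];
    }
}
```

The trade-off is that declaration order becomes load-bearing, which is worth a comment on the enum itself.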
header += (":" + httpOperation.getTracingContextSuffix());
metricHeader += !(metricResults.trim().isEmpty()) ? metricResults : "";
case ALL_ID_FORMAT:
header = TracingHeaderVersion.V1.getVersion() + COLON
Should we use getCurrentVersion here?
+ streamID + COLON
+ opType + COLON
+ getRetryHeader(previousFailure, retryPolicyAbbreviation) + COLON
+ ingressHandler + COLON
These empty-string checks are needed.
break;
case TWO_ID_FORMAT:
header = clientCorrelationID + ":" + clientRequestId;
metricHeader += !(metricResults.trim().isEmpty()) ? metricResults : "";
header = TracingHeaderVersion.V1.getVersion() + COLON
Same as above: getCurrentVersion?
@@ -326,8 +329,8 @@ fileSystemId, FSOperationType.CREATE_FILESYSTEM, tracingHeaderFormat, new Tracin
}

private void checkHeaderForRetryPolicyAbbreviation(String header, String expectedFailureReason, String expectedRetryPolicyAbbreviation) {
String[] headerContents = header.split(":");
String previousReqContext = headerContents[6];
String[] headerContents = header.split(":", SPLIT_NO_LIMIT);
Use the COLON constant here as well, since we are changing it in other places.
numOfReadCalls += 3; // 3 blocks of 1MB each.
doReturn(false).when(spiedConfig).isReadAheadV2Enabled();
doReturn(false).when(spiedConfig).isReadAheadEnabled();
testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, NORMAL_READ, numOfReadCalls);
Should we also verify that it is NORMAL_READ for all three calls made? Currently it only verifies via contains.
numOfReadCalls += 3;
doReturn(true).when(spiedConfig).isReadAheadEnabled();
Mockito.doReturn(3).when(spiedConfig).getReadAheadQueueDepth();
testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, PREFETCH_READ, numOfReadCalls);
Same here: verify that 2 of the calls have PREFETCH_READ.
doReturn(true).when(spiedConfig).readSmallFilesCompletely();
doReturn(false).when(spiedConfig).optimizeFooterRead();
testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, SMALLFILE_READ, numOfReadCalls);
}
One test for direct read as well?
Description of PR
JIRA: https://issues.apache.org/jira/browse/HADOOP-19645
There are a number of ways in which the ABFS driver can trigger a network call to read data. We need a way to identify what type of read call was made from the client. The plan is to add an indication for this in the already present ClientRequestId header.
Following are the types of read we want to identify: direct, normal, prefetch, cache miss, footer, and small file reads.
We will add another field in the Tracing Header (Client Request Id) for each request. We can call this field the "Operation Specific Header", very similar to how we have the "Retry Header" today. As part of this change we will only use it for read operations, keeping it empty for other operations. Moving ahead, if we need to publish any operation-specific info, the same header can be used.
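As an illustration of the idea, the versioned header with an operation-specific slot could be assembled roughly as below. The field names, ordering, and the buildHeader helper are assumptions for the sketch, not the exact layout of the PR:

```java
import java.util.StringJoiner;

public class TracingHeaderSketch {
    // Builds a colon-delimited tracing header. The operation-specific field
    // is populated only for reads and left empty for other operations.
    public static String buildHeader(String clientCorrelationId,
                                     String clientRequestId,
                                     String opType,
                                     String operationSpecificHeader) {
        StringJoiner joiner = new StringJoiner(":");
        joiner.add("v1")                       // header version prefix
              .add(clientCorrelationId)
              .add(clientRequestId)
              .add(opType)
              .add(operationSpecificHeader);   // e.g. "PR" for a prefetch read, "" otherwise
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildHeader("corr-1", "req-1", "RD", "PR")); // v1:corr-1:req-1:RD:PR
        System.out.println(buildHeader("corr-1", "req-2", "CR", ""));   // v1:corr-1:req-2:CR:
    }
}
```

Note the trailing empty field in the second example: this is exactly why the validator side needs to split with a negative limit, so empty trailing components are not dropped.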
How was this patch tested?
New tests around the changes in the Tracing Header and the introduction of the read-specific header were added.
The existing test suite was run across all combinations; results added as a comment.