
HADOOP-19645. [ABFS][ReadAheadV2] Improve Metrics for Read Calls to identify type of read done. #7837


Open

wants to merge 6 commits into trunk

Conversation

anujmodi2021
Contributor

@anujmodi2021 anujmodi2021 commented Jul 30, 2025

Description of PR

JIRA: https://issues.apache.org/jira/browse/HADOOP-19645

There are a number of ways in which the ABFS driver can trigger a network call to read data. We need a way to identify what type of read call the client made. The plan is to add an indication for this in the already-present ClientRequestId header.

The following are the types of read we want to identify:

  1. Direct Read: Read from a given position in the remote file. This is a synchronous read.
  2. Normal Read: Read from the current seek position when read ahead is bypassed. This is a synchronous read.
  3. Prefetch Read: Read triggered by background threads filling the in-memory cache. This is an asynchronous read.
  4. Missed Cache Read: Read triggered after nothing was received from read ahead. This is a synchronous read.
  5. Footer Read: Read triggered as part of the footer read optimization. This is a synchronous read.
  6. Small File Read: Read triggered as part of the small file read optimization. This is a synchronous read.

We will add another field to the Tracing Header (Client Request Id) for each request. We can call this field the "Operation Specific Header", very similar to the "Retry Header" we have today. For now we will use it only for read operations, keeping it empty for other operations. Moving ahead, if we need to publish any operation-specific info, the same header can be used. A rough sketch of the read-type enum follows.
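
For illustration, a minimal sketch of what such a read-type enum could look like (a sketch only: NORMAL_READ, PREFETCH_READ, MISSEDCACHE_READ and SMALLFILE_READ appear in the diff fragments below, while DIRECT_READ, FOOTER_READ and all abbreviations are assumed):

public enum ReadType {
  DIRECT_READ("DR"),       // read from a given position, synchronous
  NORMAL_READ("NR"),       // read from current seek position, readahead bypassed
  PREFETCH_READ("PR"),     // background prefetch into the in-memory cache
  MISSEDCACHE_READ("MR"),  // fallback read after a readahead cache miss
  FOOTER_READ("FR"),       // footer read optimization
  SMALLFILE_READ("SR");    // small file read optimization

  private final String abbreviation;

  ReadType(String abbreviation) {
    this.abbreviation = abbreviation;
  }

  @Override
  public String toString() {
    return abbreviation;
  }
}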

How was this patch tested?

New tests added around the changes in the Tracing Header and the introduction of the read-specific header.
The existing test suite was run across all combinations. Results added as a comment.

@anujmodi2021 anujmodi2021 changed the title Introducing Read Type Metric HADOOP-19645. [ABFS][ReadAheadV2] Improve Metrics for Read Calls to identify type of read done. Jul 30, 2025
@anujmodi2021 anujmodi2021 marked this pull request as ready for review July 30, 2025 06:25
@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 2s codespell was not available.
+0 🆗 detsecrets 0m 2s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 45m 31s trunk passed
+1 💚 compile 0m 44s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 37s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 34s trunk passed
+1 💚 mvnsite 0m 42s trunk passed
+1 💚 javadoc 0m 42s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 10s trunk passed
+1 💚 shadedclient 40m 58s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 34s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 34s the patch passed
+1 💚 compile 0m 29s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 29s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 21s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 18 new + 1 unchanged - 0 fixed = 19 total (was 1)
+1 💚 mvnsite 0m 32s the patch passed
-1 ❌ javadoc 0m 29s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ javadoc 0m 26s /patch-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
+1 💚 spotbugs 1m 9s the patch passed
+1 💚 shadedclient 46m 42s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 42s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 35s The patch does not generate ASF License warnings.
148m 56s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/3/artifact/out/Dockerfile
GITHUB PR #7837
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 5f43a0e97825 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 224f712
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/3/testReport/
Max. process+thread count 596 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@anujmodi2021
Contributor Author


:::: AGGREGATED TEST RESULT ::::

============================================================
HNS-OAuth-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 874, Failures: 0, Errors: 0, Skipped: 223
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 34
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 23

============================================================
HNS-SharedKey-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 4
[WARNING] Tests run: 874, Failures: 0, Errors: 0, Skipped: 172
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 34
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 10

============================================================
NonHNS-SharedKey-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 858, Failures: 0, Errors: 0, Skipped: 395
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 35
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 11

============================================================
AppendBlob-HNS-OAuth-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 874, Failures: 0, Errors: 0, Skipped: 234
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 58
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-SharedKey-Blob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 858, Failures: 0, Errors: 0, Skipped: 285
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 29
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 11

============================================================
NonHNS-OAuth-DFS

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 858, Failures: 0, Errors: 0, Skipped: 400
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 35
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

============================================================
NonHNS-OAuth-Blob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 858, Failures: 0, Errors: 0, Skipped: 301
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 29
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

============================================================
AppendBlob-NonHNS-OAuth-Blob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 858, Failures: 0, Errors: 0, Skipped: 346
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 53
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

============================================================
HNS-Oauth-DFS-IngressBlob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 3
[WARNING] Tests run: 874, Failures: 0, Errors: 0, Skipped: 356
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 34
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 23

============================================================
NonHNS-OAuth-DFS-IngressBlob

[WARNING] Tests run: 177, Failures: 0, Errors: 0, Skipped: 10
[WARNING] Tests run: 858, Failures: 0, Errors: 0, Skipped: 397
[WARNING] Tests run: 182, Failures: 0, Errors: 0, Skipped: 35
[WARNING] Tests run: 269, Failures: 0, Errors: 0, Skipped: 24

@anujmodi2021 anujmodi2021 requested a review from Copilot July 31, 2025 04:09

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds metrics to identify different types of read operations in the ABFS driver by enhancing the tracing header with operation-specific information. The main goal is to differentiate between various read types (direct, normal, prefetch, cache miss, footer, and small file reads) through the ClientRequestId header.

Key changes include:

  • Adding a ReadType enum to categorize different read operations
  • Updating the tracing header format to include versioning and operation-specific headers (a hedged reconstruction of the layout follows this list)
  • Modifying read operations throughout the codebase to set appropriate ReadType values
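
For orientation, a hedged reconstruction of the v1 header layout, pieced together from the diff fragments quoted below; the exact field order and the placeholder names are assumed:

v1:<clientCorrelationId>:<clientRequestId>:<fileSystemId>:<primaryRequestId>:<streamId>:<opType>:<retryHeader>:<ingressHandler>:<position>:<operatedBlobCount>:<httpClientSuffix>:<readType>

That is 13 colon-separated fields, consistent with the V1("v1", 13) constant quoted later in this review.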

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

File Description
ReadType.java New enum defining six read operation types with abbreviated string representations
AbfsHttpConstants.java Added TracingHeaderVersion enum for header versioning
TracingContext.java Enhanced header construction with versioning and operation-specific headers
Listener.java Added interface method for updating ReadType
AbfsInputStream.java Updated read operations to set appropriate ReadType values
ReadBufferWorker.java Added imports for ReadType and TracingContext
TracingHeaderValidator.java Updated validation logic for new header format
TestApacheHttpClientFallback.java Fixed test assertions for new header structure
TestTracingContext.java Updated header parsing for new format

header += (":" + operatedBlobCount);
}
header += (":" + httpOperation.getTracingContextSuffix());
AbfsHttpConstants.TracingHeaderVersion.V1 + ":" +
Copilot AI Jul 31, 2025

The hardcoded V1 version is used in multiple places. Consider using TracingHeaderVersion.getCurrentVersion() consistently to centralize version management.
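
A minimal sketch of the centralization being suggested, using only names that appear elsewhere in this PR (getCurrentVersion(), getVersion(), COLON):

// Build the version prefix once instead of repeating the V1 literal at each call site.
String versionPrefix = TracingHeaderVersion.getCurrentVersion().getVersion() + COLON;
header = versionPrefix + clientCorrelationID + COLON + clientRequestId;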


Comment on lines 236 to 243
AbfsHttpConstants.TracingHeaderVersion.V1 + ":" +
clientCorrelationID + ":" + clientRequestId;
metricHeader += !(metricResults.trim().isEmpty()) ? metricResults : "";
break;
default:
//case SINGLE_ID_FORMAT
header = clientRequestId;
header =
AbfsHttpConstants.TracingHeaderVersion.V1 + ":" +
Copilot AI Jul 31, 2025

The hardcoded V1 version is used in multiple places. Consider using TracingHeaderVersion.getCurrentVersion() consistently to centralize version management.



@@ -81,82 +85,93 @@ public TracingHeaderValidator(String clientCorrelationId, String fileSystemId,
}

private void validateTracingHeader(String tracingContextHeader) {
String[] idList = tracingContextHeader.split(":");
String[] idList = tracingContextHeader.split(":", -1);
Copilot AI Jul 31, 2025

[nitpick] Consider defining the split limit (-1) as a named constant to improve code readability and maintainability.

Suggested change
String[] idList = tracingContextHeader.split(":", -1);
String[] idList = tracingContextHeader.split(":", SPLIT_NO_LIMIT);


@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 21s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 26m 51s trunk passed
+1 💚 compile 0m 26s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 26s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 22s trunk passed
+1 💚 mvnsite 0m 30s trunk passed
+1 💚 javadoc 0m 29s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 24s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 0m 48s trunk passed
+1 💚 shadedclient 21m 9s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 21s the patch passed
+1 💚 compile 0m 19s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 19s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 18s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 13s /results-checkstyle-hadoop-tools_hadoop-azure.txt hadoop-tools/hadoop-azure: The patch generated 6 new + 5 unchanged - 0 fixed = 11 total (was 5)
+1 💚 mvnsite 0m 21s the patch passed
+1 💚 javadoc 0m 19s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 0m 42s the patch passed
+1 💚 shadedclient 21m 13s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 23s hadoop-azure in the patch passed.
-1 ❌ asflicense 0m 27s /results-asflicense.txt The patch generated 1 ASF License warnings.
79m 44s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/4/artifact/out/Dockerfile
GITHUB PR #7837
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux ba7a35768638 5.15.0-142-generic #152-Ubuntu SMP Mon May 19 10:54:31 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0d926b1
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/4/testReport/
Max. process+thread count 555 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/4/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@@ -128,6 +128,7 @@ public final class AbfsHttpConstants {
public static final String STAR = "*";
public static final String COMMA = ",";
public static final String COLON = ":";
public static final String HYPHEN = "-";
Contributor

We already have CHAR_HYPHEN defined for this.

@@ -265,6 +286,34 @@ private String addFailureReasons(final String header,
return String.format("%s_%s", header, previousFailure);
}

private String getRetryHeader(final String previousFailure, String retryPolicyAbbreviation) {
Contributor

Please add javadoc to all newly added methods

}

public int getFieldCount() {
return V1.fieldCount;
Contributor

Shouldn't it be just return this.fieldCount?

Contributor

+1

}

public String getVersion() {
return V1.version;
Contributor

Same as above, it should be return this.version?
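
For clarity, the fix both comments point at would read the instance fields rather than pinning to V1; a sketch, assuming fieldCount and version are the enum's instance fields:

public int getFieldCount() {
  return this.fieldCount; // the instance field, not V1's
}

public String getVersion() {
  return this.version; // likewise
}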

@@ -781,6 +794,132 @@ public void testDefaultReadaheadQueueDepth() throws Exception {
in.close();
}

@Test
public void testReadTypeInTracingContextHeader() throws Exception {
Contributor

Javadoc missing.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 24s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 7 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 26m 43s trunk passed
+1 💚 compile 0m 26s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 24s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 24s trunk passed
+1 💚 mvnsite 0m 29s trunk passed
+1 💚 javadoc 0m 29s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 23s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 0m 46s trunk passed
+1 💚 shadedclient 20m 46s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 19s the patch passed
+1 💚 compile 0m 21s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javac 0m 21s the patch passed
+1 💚 compile 0m 18s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 javac 0m 18s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 13s the patch passed
+1 💚 mvnsite 0m 21s the patch passed
+1 💚 javadoc 0m 18s the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 19s the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 0m 43s the patch passed
+1 💚 shadedclient 20m 44s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 25s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 25s The patch does not generate ASF License warnings.
78m 32s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/5/artifact/out/Dockerfile
GITHUB PR #7837
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux b44a21d0bd2d 5.15.0-142-generic #152-Ubuntu SMP Mon May 19 10:54:31 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9bb6cdb
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/5/testReport/
Max. process+thread count 676 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7837/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

receivedBytes = readRemote(position, b, offset, length, new TracingContext(tracingContext));
TracingContext tc = new TracingContext(tracingContext);
tc.setReadType(ReadType.MISSEDCACHE_READ);
receivedBytes = readRemote(position, b, offset, length, tc);
return receivedBytes;
} else {
LOG.debug("read ahead disabled, reading remote");
Contributor

Should we set the read type to NORMAL_READ for this TracingContext as well?
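
A sketch of what that could look like, mirroring the MISSEDCACHE_READ handling quoted above; whether NORMAL_READ is the right type for the readahead-disabled path is exactly the open question here:

LOG.debug("read ahead disabled, reading remote");
TracingContext tc = new TracingContext(tracingContext);
tc.setReadType(ReadType.NORMAL_READ); // assumption: treat the readahead-disabled path as a normal read
receivedBytes = readRemote(position, b, offset, length, tc);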

@@ -442,6 +451,7 @@ private int optimisedRead(final byte[] b, final int off, final int len,
// bCursor that means the user requested data has not been read.
if (fCursor < contentLength && bCursor > limit) {
restorePointerState();
tracingContext.setReadType(ReadType.NORMAL_READ);
Contributor

Before readOneBlock we're setting the TracingContext to normal read both here and at line 439, and the readOneBlock method sets it to normal read again. Do we need it twice?
We could keep it only once, inside the method.

+ position + COLON
+ operatedBlobCount + COLON
+ httpOperation.getTracingContextSuffix() + COLON
+ getOperationSpecificHeader(opType);
Contributor

Should we put the op-specific header before the HTTP client suffix? That would keep all the request-related info together, followed by the network client.
E.g. .....:RE:1_EGR:NR:JDK
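
Sketching the proposed order against the concatenation quoted above (illustrative only, not the patch as written):

+ position + COLON
+ operatedBlobCount + COLON
+ getOperationSpecificHeader(opType) + COLON // request-related info first...
+ httpOperation.getTracingContextSuffix(); // ...then the network client suffix last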

@@ -265,6 +286,34 @@ private String addFailureReasons(final String header,
return String.format("%s_%s", header, previousFailure);
}

private String getRetryHeader(final String previousFailure, String retryPolicyAbbreviation) {
Contributor

We can remove the addFailureReasons method; it has no usages now.

public enum TracingHeaderVersion {

V0("", 8),
V1("v1", 13);
Contributor

Since the next versions would be V1.1/V1.2, should we consider starting with V1.0/V1.1?
And on a version update, would we bump the version field in V1 only, or add a new V1.1 enum constant?

Contributor

So every time we add a new header field, we need to add a new version?

.contains(readType.toString());
}

// private testReadTypeInTracingContextHeaderInternal(ReadType readType) throws Exception {
Contributor

Nit: we can remove this.

@@ -77,8 +81,7 @@ public class TracingContext {
* this field shall not be set.
*/
private String primaryRequestIdForRetry;

private Integer operatedBlobCount = null;
private Integer operatedBlobCount = 1; // Only relevant for rename-delete over blob endpoint where it will be explicitly set.
Contributor

Why is it changed from null to 1?

}

public static TracingHeaderVersion getCurrentVersion() {
return V1;
Contributor

@anmolanmol1234 anmolanmol1234 Aug 1, 2025

This needs to be updated every time a new version is introduced; can it be fetched dynamically?
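
One way to fetch it dynamically; a sketch that assumes the enum constants stay declared in ascending version order:

public static TracingHeaderVersion getCurrentVersion() {
  TracingHeaderVersion[] versions = values();
  return versions[versions.length - 1]; // the newest version is the last declared constant
}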

header += (":" + httpOperation.getTracingContextSuffix());
metricHeader += !(metricResults.trim().isEmpty()) ? metricResults : "";
case ALL_ID_FORMAT:
header = TracingHeaderVersion.V1.getVersion() + COLON
Contributor

Should we use getCurrentVersion here?

+ streamID + COLON
+ opType + COLON
+ getRetryHeader(previousFailure, retryPolicyAbbreviation) + COLON
+ ingressHandler + COLON
Contributor

These empty string checks are needed.

break;
case TWO_ID_FORMAT:
header = clientCorrelationID + ":" + clientRequestId;
metricHeader += !(metricResults.trim().isEmpty()) ? metricResults : "";
header = TracingHeaderVersion.V1.getVersion() + COLON
Contributor

Same as above: getCurrentVersion?

@@ -326,8 +329,8 @@ fileSystemId, FSOperationType.CREATE_FILESYSTEM, tracingHeaderFormat, new Tracin
}

private void checkHeaderForRetryPolicyAbbreviation(String header, String expectedFailureReason, String expectedRetryPolicyAbbreviation) {
String[] headerContents = header.split(":");
String previousReqContext = headerContents[6];
String[] headerContents = header.split(":", SPLIT_NO_LIMIT);
Contributor

Use the COLON constant here as well, since we are changing it at the other places.

numOfReadCalls += 3; // 3 blocks of 1MB each.
doReturn(false).when(spiedConfig).isReadAheadV2Enabled();
doReturn(false).when(spiedConfig).isReadAheadEnabled();
testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, NORMAL_READ, numOfReadCalls);
Contributor

@anmolanmol1234 anmolanmol1234 Aug 1, 2025

Should we also verify that it is NORMAL_READ for all three calls made? Currently it only verifies via contains.
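
A sketch of the stricter assertion being asked for, where capturedHeaders is a hypothetical list of the tracing headers recorded for the three calls:

for (String header : capturedHeaders) {
  // every call, not just some call, must carry the expected read type
  Assertions.assertThat(header).contains(NORMAL_READ.toString());
}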

numOfReadCalls += 3;
doReturn(true).when(spiedConfig).isReadAheadEnabled();
Mockito.doReturn(3).when(spiedConfig).getReadAheadQueueDepth();
testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, PREFETCH_READ, numOfReadCalls);
Contributor

Same here: verify that two calls have PREFETCH_READ.

doReturn(true).when(spiedConfig).readSmallFilesCompletely();
doReturn(false).when(spiedConfig).optimizeFooterRead();
testReadTypeInTracingContextHeaderInternal(spiedFs, fileSize, SMALLFILE_READ, numOfReadCalls);
}
Contributor

One test for direct read as well?
