[SPARK-51136][CORE] Set CallerContext for History Server #49858
Conversation
```
@@ -3151,9 +3152,31 @@ private[spark] object Utils
  }
}

private[util] object CallerContext extends Logging {
```
I needed to relax visibility on things here to facilitate unit testing with caller context enabled. LMK if a different approach is preferred (reflection?).
This change looks okay to me.
If approved, can this also go into branch-3.5, please? The cherry-pick would need a minor merge conflict resolution.
```
SparkHadoopUtil.get.conf.getBoolean("hadoop.caller.context.enabled", false)

private[spark] object CallerContext extends Logging {
  var callerContextEnabled: Boolean = SparkHadoopUtil.get.conf.getBoolean(
    HADOOP_CALLER_CONTEXT_ENABLED_KEY, HADOOP_CALLER_CONTEXT_ENABLED_DEFAULT)
```
If you don't mind, please revert this change. :)
No problem, I can definitely do that. :)
Can you please help me understand the reasoning though? Does the Spark codebase prefer not to reference Hadoop's configuration constants?
Yes, right. We prefer to avoid those compile-time dependencies, not only on Hadoop but also on Hive.
Thank you, and I will keep it in mind in future patches.
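The tradeoff discussed above can be illustrated with a small sketch. `FakeHadoopConf` below is a hypothetical stand-in for Hadoop's `Configuration` class (not the real one), showing how the flag can be read through its literal key, `hadoop.caller.context.enabled`, without compiling against Hadoop's `HADOOP_CALLER_CONTEXT_ENABLED_KEY` constant:

```scala
// Hypothetical minimal stand-in for Hadoop's Configuration class,
// used only to illustrate the literal-key pattern.
class FakeHadoopConf(settings: Map[String, String]) {
  def getBoolean(key: String, default: Boolean): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)
}

object CallerContextConfDemo {
  // Spelling out the key avoids a compile-time dependency on Hadoop's
  // HADOOP_CALLER_CONTEXT_ENABLED_KEY constant.
  val CallerContextEnabledKey = "hadoop.caller.context.enabled"

  def isEnabled(conf: FakeHadoopConf): Boolean =
    conf.getBoolean(CallerContextEnabledKey, false)
}
```

The cost is a string literal that can drift from Hadoop's constant; the benefit is one fewer compile-time coupling to Hadoop internals.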
```
 *
 * VisibleForTesting
 */
def withCallerContextEnabled[T](enabled: Boolean)(func: => T): T = {
```
This looks like a pure test-helper utility instead of `VisibleForTesting`. Do we have any benefit in `main` code?
There is no benefit to `main` code, other than keeping all access to `callerContextEnabled` in the same object. Otherwise, people reading the code might be confused about why it's a `var` instead of a `val`.

If you prefer, I can just leave a comment on why it's a `var` and move this to a new `CallerContextTestUtils` object under test (or some existing test utils file if you have another suggestion).

I would potentially like to reuse this in tests covering other usage of caller context, which is why I didn't put it directly in `FsHistoryProviderSuite`.
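The helper under discussion follows Scala's loan pattern: save the current flag value, override it for the duration of the block, and restore it in a `finally` so the original state survives even if the block throws. A self-contained sketch, using a stand-in object rather than the actual Spark `CallerContext`:

```scala
object CallerContextDemo {
  // Mirrors the mutable flag discussed above; a `var` only so tests can toggle it.
  @volatile var callerContextEnabled: Boolean = false

  // Test helper: runs `func` with the flag forced to `enabled`, then restores
  // the original value even if `func` throws.
  def withCallerContextEnabled[T](enabled: Boolean)(func: => T): T = {
    val original = callerContextEnabled
    callerContextEnabled = enabled
    try {
      func
    } finally {
      callerContextEnabled = original
    }
  }
}
```

The by-name parameter `func: => T` is what lets callers pass an arbitrary block; the `finally` clause is what makes the helper safe to use around assertions that may fail.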
Could you confirm whether this PR description's claims cover non-YARN Spark daemons like Spark Master, Spark Worker, Spark ThriftServer, and so on? Or is this only referring to YARN `ApplicationMaster`- and `Client`-related code?
> Other Spark processes set the CallerContext, so that additional auditing context propagates in Hadoop RPC calls.
@dongjoon-hyun, thank you for the review. That's a good point. I have only confirmed existing support for YARN-based workloads, so I will update the PR description. We can continue to check support in the other processes.
Thank you for the replies. BTW, regarding the following question: we cannot backport SPARK-51136 as-is because it's filed as an `Improvement`. If you want this in Apache Spark 3.5.5, please convert the JIRA to a `Bug`.
Force-pushed from `f919079` to `b1bdd51`.
@dongjoon-hyun, regarding the backport to 3.5, I don't think I can justify calling it a bug. It's providing an improvement, not fixing existing functionality that has been broken. I'll retract my request for the backport. I pushed up a change to stop referencing the Hadoop constants. LMK your preference on placement of the test helper. Happy to push up another change.
BTW, I'm taking this chance to investigate the relevant code while reviewing this PR, because the existing test coverage also looks suspicious: see `core/src/test/scala/org/apache/spark/util/UtilsSuite.scala`, lines 1005 to 1011 at commit `6a668cd`.
If coverage looks suspicious, then it might be that the flag is only initialized once per JVM startup, and test runs probably don't have configuration for `hadoop.caller.context.enabled`. No worries on the delay. Committers are busy! 😄
I made a PR to enable `hadoop.caller.context.enabled` always during testing, @cnauroth. Could you rebase this PR to the latest `master`?
### What changes were proposed in this pull request?

Initialize the Hadoop RPC `CallerContext` during History Server startup, before `FileSystem` access. Calls to HDFS will get tagged in the audit log as originating from the History Server.

### Why are the changes needed?

Other YARN-based Spark processes set the `CallerContext`, so that additional auditing context propagates in Hadoop RPC calls. This PR provides auditing context for calls from the History Server. Other callers provide additional information like app ID, attempt ID, etc. We don't provide that here through the History Server, which serves multiple apps/attempts.

### Does this PR introduce _any_ user-facing change?

Yes. In environments that configure `hadoop.caller.context.enabled=true`, users will now see additional information in the HDFS audit logs explicitly stating that calls originated from the History Server.

### How was this patch tested?

A new unit test has been added. All tests pass in the history package.

```
build/mvn -pl core test -Dtest=none -DmembersOnlySuites=org.apache.spark.deploy.history
```

When the changes are deployed to a running cluster, the new caller context is visible in the HDFS audit logs.

```
2025-02-07 23:00:54,657 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0012 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,683 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,699 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,715 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,729 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,743 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,755 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,767 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,779 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:01:04,160 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=listStatus src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
```

### Was this patch authored or co-authored using generative AI tooling?

No.
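As a quick illustration of what the audit entries above carry, here is a hypothetical helper (not part of this PR) that pulls the `callerContext` field out of an HDFS audit log line:

```scala
object AuditLogDemo {
  // Extracts the value of the callerContext=... field from an audit line,
  // returning None when the field is absent (e.g. caller context disabled).
  def callerContextOf(auditLine: String): Option[String] =
    auditLine.split("\\s+").collectFirst {
      case field if field.startsWith("callerContext=") =>
        field.stripPrefix("callerContext=")
    }
}
```

Run against any of the sample lines above, this returns `Some("SPARK_HISTORY")`; on a pre-change audit line without the field, it returns `None`.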
Force-pushed from `b1bdd51` to `0240cca`.
@dongjoon-hyun, I rebased to current master and removed my test helper. Thank you.
+1, LGTM. (Pending CIs).
Great, really appreciate your efforts on the testing strategy for this change! Thank you, @dongjoon-hyun.