[SPARK-52612][INFRA] Add an env `NO_PROVIDED_SPARK_JARS` to control collection behavior of `sbt/package` for `spark-avro.jar` and `spark-protobuf.jar` #51321
Conversation
It seems that `sbt/package` gathers `spark-avro-*.jar` into the `assembly/target/scala-2.13/jars` directory, while the same jar already exists under `connector/avro/target/scala-2.13`, so both copies of `spark-avro-*.jar` are added to the classpath of the benchmark. After running the benchmarks, the aforementioned phenomenon may lead to several issues, such as the `ERROR SparkContext` log quoted in the PR description. On the other hand, if `sbt/package` does not collect `spark-avro-*.jar`, the documentation generation task will be missing dependencies. Therefore, it appears that there are negative consequences regardless of whether the packaging results include or exclude `spark-avro-*.jar`. Currently, I can think of two solutions; (1) is to keep `sbt/package` consistent with Maven and, by default, not collect jars with `provided` scope into the assembly directory.
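For intuition, here is a minimal, hypothetical Scala sketch of the kind of name-to-path registry check that produces the "was already registered with a different path" failure; `AddedJarRegistry` and `register` are illustrative names, not Spark's actual internals:

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch: a file-name -> path registry that rejects the same
// jar name arriving from two different locations, mimicking the failure
// mode seen in the benchmark ERROR log.
object AddedJarRegistry {
  private val registered = new ConcurrentHashMap[String, String]()

  def register(name: String, path: String): Unit = {
    val old = registered.putIfAbsent(name, path)
    // `require` throws IllegalArgumentException("requirement failed: ...")
    require(old == null || old == path,
      s"File $name was already registered with a different path " +
        s"(old path = $old, new path = $path)")
  }
}

// With both copies on the classpath, the second registration fails:
// AddedJarRegistry.register("spark-avro_2.13-4.1.0-SNAPSHOT.jar",
//   ".../connector/avro/target/scala-2.13/spark-avro_2.13-4.1.0-SNAPSHOT.jar")
// AddedJarRegistry.register("spark-avro_2.13-4.1.0-SNAPSHOT.jar",
//   ".../assembly/target/scala-2.13/jars/spark-avro_2.13-4.1.0-SNAPSHOT.jar")
```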
For me, I prefer (1) because Maven has always been our official build system for Spark distribution. BTW, do you know when this situation started to happen?
The snippet under review in the sbt build definition:

```scala
} else if (jar.getName.contains("spark-connect") &&
    !SbtPomKeys.profiles.value.contains("noshade-connect")) {
  Files.copy(fid.toPath, destJar.toPath)
} else if (jar.getName.contains("spark-protobuf") &&
    !SbtPomKeys.profiles.value.contains("noshade-protobuf")) {
```
I've checked, and there is no profile named `noshade-protobuf` in the project, so there's no need to check for it here.
In the current PR, I've chosen not to touch the profiles that are suspected to be non-existent. If they can be cleaned up, I'll handle that in a separate PR.
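For reference, a minimal sketch of how an environment-variable guard along these lines could slot into the copy logic shown above; apart from the `NO_PROVIDED_SPARK_JARS` name, everything here (the helper `copyJarToAssembly`, the default value, and the boolean semantics) is an assumption for illustration, not the exact patch:

```scala
import java.io.File
import java.nio.file.Files

// Sketch only: skip provided-scope jars unless the caller (e.g. the
// documentation build) opts back in via NO_PROVIDED_SPARK_JARS=false.
def copyJarToAssembly(jar: File, destJar: File): Unit = {
  val providedSparkJars = Seq("spark-avro", "spark-protobuf")
  // Assumed semantics: default "true" means do NOT collect provided-scope
  // jars, matching Maven's behavior.
  val noProvidedSparkJars =
    sys.env.getOrElse("NO_PROVIDED_SPARK_JARS", "true").toBoolean

  if (providedSparkJars.exists(name => jar.getName.contains(name))) {
    if (!noProvidedSparkJars) {
      Files.copy(jar.toPath, destJar.toPath)
    }
    // else: leave the jar out of assembly/target/scala-2.13/jars
  } else {
    Files.copy(jar.toPath, destJar.toPath)
  }
}
```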
The tests should have passed, and the current solution involves relatively minor modifications. Let's run the full benchmarks to verify the effects of the changes. If everything goes well, I'll update the PR title and description tomorrow.
Thank you so much, @LuciferYang. Looking forward to seeing a good result.
All micro benchmarks can pass successfully, and there will no longer be logs similar to "File spark-avro_2.13-4.1.0-SNAPSHOT.jar was already registered with a different path".

LuciferYang changed the title to [SPARK-52612][INFRA] Add an env `NO_PROVIDED_SPARK_JARS` to control collection behavior of `sbt/package` for `spark-avro.jar` and `spark-protobuf.jar`.
Could you help review this pull request if you have time, @dongjoon-hyun? Thanks ~
+1, LGTM. Simple and nice! Thank you, @LuciferYang .
Merged commit: [SPARK-52612][INFRA] Add an env `NO_PROVIDED_SPARK_JARS` to control collection behavior of `sbt/package` for `spark-avro.jar` and `spark-protobuf.jar` (the full commit message matches the PR description below).

Closes #51321 from LuciferYang/SPARK-52612.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 591e1c3)
Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/4.0.
Thank you @dongjoon-hyun ~
### What changes were proposed in this pull request?

This PR introduces an environment variable named `NO_PROVIDED_SPARK_JARS`, which controls the behavior of the `sbt/package` command so that it only collects `spark-avro.jar` and `spark-protobuf.jar` into the `assembly/target/scala-2.13/jars` directory during documentation generation.
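As a usage sketch (the boolean reading and the default below are assumptions; the patch itself defines the exact semantics): the documentation build would export something like `NO_PROVIDED_SPARK_JARS=false` before running `sbt/package` so that the two jars are collected, while a plain `sbt/package` would leave them out of `assembly/target/scala-2.13/jars`, matching Maven.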
### Why are the changes needed?

1. To ensure that, by default, the `sbt/package` command does not collect jars with a `provided` scope, such as `spark-avro.jar` and `spark-protobuf.jar`, into the `assembly/target/scala-2.13/jars` directory, maintaining consistency with Maven's behavior (a small verification sketch follows this list).
2. To ensure that, during documentation generation, the `sbt/package` command collects the necessary jars into the `assembly/target/scala-2.13/jars` directory, so that no dependencies are missing for the documentation generation task.
3. To avoid the following error when executing benchmark tasks using GitHub Actions:

```
25/06/28 07:03:45 ERROR SparkContext: Failed to add file:///home/runner/work/spark/spark/assembly/target/scala-2.13/jars/spark-avro_2.13-4.1.0-SNAPSHOT.jar to Spark environment
java.lang.IllegalArgumentException: requirement failed: File spark-avro_2.13-4.1.0-SNAPSHOT.jar was already registered with a different path (old path = /home/runner/work/spark/spark/connector/avro/target/scala-2.13/spark-avro_2.13-4.1.0-SNAPSHOT.jar, new path = /home/runner/work/spark/spark/assembly/target/scala-2.13/jars/spark-avro_2.13-4.1.0-SNAPSHOT.jar
...
```
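To make item 1 above concrete, a quick hypothetical check (script-style Scala; the directory path comes from the log above, the script itself is illustrative) that a default build leaves the provided-scope jars out of the assembly directory:

```scala
import java.io.File

// Hypothetical check: after a default `sbt/package`, no spark-avro or
// spark-protobuf jar should have been collected into the assembly jars dir.
val assemblyJars = new File("assembly/target/scala-2.13/jars")
val leaked = Option(assemblyJars.listFiles()).toSeq.flatten.filter { f =>
  f.getName.contains("spark-avro") || f.getName.contains("spark-protobuf")
}
assert(leaked.isEmpty,
  s"provided-scope jars unexpectedly collected: ${leaked.mkString(", ")}")
```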
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?

- Passed GitHub Actions.
- Manually confirmed that benchmark tasks are not affected and that the ERROR log described above no longer appears during benchmark task execution.
### Was this patch authored or co-authored using generative AI tooling?
No