[SPARK-51158][YARN][TESTS] No longer restrict the test cases related to connect in the YarnClusterSuite to only run on GitHub Actions
### What changes were proposed in this pull request?
This PR adds more suitable `assume` conditions for the test cases related to `connect` in the `YarnClusterSuite`, so they are no longer restricted to running only on GitHub Actions. The specific changes are as follows:
1. In `SparkBuild.scala`, test compilation dependencies have been added for the `Yarn` module to ensure that `build/sbt package` is executed to collect dependencies into the `assembly` directory before running `test` or `testOnly`.
2. In the `testPySpark` function of `YarnClusterSuite.scala`, two `assume` conditions have been added when `SPARK_API_MODE` is `connect`:
- Check if `spark-connect_$scalaVersion-$SPARK_VERSION.jar` exists in the `assembly` directory. This condition is primarily for testing scenarios using Maven commands, as it cannot be guaranteed that relevant dependencies are collected into the `assembly` directory first in such cases.
- Call the `check_dependencies` function in `pyspark.sql.connect.utils` to ensure that the Connect-related Python packages have been installed.
The test cases related to `connect` in the `YarnClusterSuite` will only be executed if both of the above conditions are met.
3. Removes the `assume` condition added in #49848.
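The two `assume` checks described above can be sketched roughly as follows. This is a hedged illustration, not the suite's actual code: the helper names, the jar path layout, and the way `check_dependencies` is invoked from a subprocess are assumptions made for this sketch.

```python
import glob
import os
import subprocess


def is_spark_connect_jar_available(spark_home: str, scala_version: str = "2.13") -> bool:
    """Hypothetical helper mirroring the first check: does a spark-connect jar
    exist under the assembly output directory? (Path layout is an assumption.)"""
    pattern = os.path.join(
        spark_home, "assembly", "target", f"scala-{scala_version}", "jars",
        f"spark-connect_{scala_version}-*.jar",
    )
    return len(glob.glob(pattern)) > 0


def is_connect_python_packages_available(python: str = "python3") -> bool:
    """Hypothetical helper mirroring the second check: run
    pyspark.sql.connect.utils.check_dependencies in a subprocess and treat a
    non-zero exit (e.g. grpcio missing) as "packages unavailable"."""
    code = (
        "from pyspark.sql.connect.utils import check_dependencies; "
        "check_dependencies('pyspark.sql.connect')"
    )
    return subprocess.run([python, "-c", code], capture_output=True).returncode == 0
```

In the suite itself, both conditions are wrapped in `assume(...)`, so when either is false ScalaTest reports the connect tests as CANCELED rather than failed, as shown in the logs below.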
### Why are the changes needed?
No longer restrict the test cases related to `connect` in the `YarnClusterSuite` to only run on GitHub Actions
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GitHub Actions
- Manual checks:
SBT
Run `build/sbt clean "yarn/testOnly org.apache.spark.deploy.yarn.YarnClusterSuite" -Pyarn`
1. Python dependencies are not installed or partially installed
The relevant tests will be CANCELED:
```
Traceback (most recent call last):
File "/Users/yangjie01/SourceCode/git/spark-sbt/python/pyspark/sql/connect/utils.py", line 47, in require_minimum_grpc_version
import grpc
ModuleNotFoundError: No module named 'grpc'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/yangjie01/SourceCode/git/spark-sbt/python/pyspark/sql/connect/utils.py", line 37, in check_dependencies
require_minimum_grpc_version()
File "/Users/yangjie01/SourceCode/git/spark-sbt/python/pyspark/sql/connect/utils.py", line 49, in require_minimum_grpc_version
raise PySparkImportError(
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] grpcio >= 1.48.1 must be installed; however, it was not found.
[info] - run Python application with Spark Connect in yarn-client mode !!! CANCELED !!! (159 milliseconds)
[info] checker.isConnectPythonPackagesAvailable was false (YarnClusterSuite.scala:444)
[info] org.scalatest.exceptions.TestCanceledException:
...
[info] - run Python application with Spark Connect in yarn-cluster mode !!! CANCELED !!! (1 millisecond)
[info] checker.isConnectPythonPackagesAvailable was false (YarnClusterSuite.scala:444)
[info] org.scalatest.exceptions.TestCanceledException:
...
[info] Run completed in 4 minutes, 22 seconds.
[info] Total number of tests run: 28
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 28, failed 0, canceled 2, ignored 0, pending 0
[info] All tests passed (excluding canceled).
```
2. Python dependencies are installed
The tests succeed and no tests will be canceled:
```
[info] Run completed in 4 minutes, 51 seconds.
[info] Total number of tests run: 30
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 30, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
3. Running `build/sbt clean "yarn/test" -Pyarn` yields similar results.
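For local runs, the cancellations in scenario 1 can be avoided by installing the Connect Python dependencies first. The package list and version pins below are assumptions inferred from the error message above and PySpark's packaging metadata, not something this PR specifies:

```shell
# Install the Python packages that pyspark.sql.connect.utils.check_dependencies
# verifies; the authoritative pins live in PySpark's packaging metadata
# (these minimums are illustrative, taken from the grpcio error above).
pip install "grpcio>=1.48.1" "grpcio-status>=1.48.1" "googleapis-common-protos>=1.56.4"
```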
Maven
1. Dependencies not collected into the assembly module
```
build/mvn clean install -DskipTests -pl resource-managers/yarn -am -Pyarn
build/mvn test -pl resource-managers/yarn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -Pyarn
```
Relevant tests will be CANCELED:
```
- run Python application with Spark Connect in yarn-client mode !!! CANCELED !!!
checker.isSparkConnectJarAvailable was false (YarnClusterSuite.scala:443)
- run Python application with Spark Connect in yarn-cluster mode !!! CANCELED !!!
checker.isSparkConnectJarAvailable was false (YarnClusterSuite.scala:443)
...
Run completed in 4 minutes, 19 seconds.
Total number of tests run: 28
Suites: completed 2, aborted 0
Tests: succeeded 28, failed 0, canceled 2, ignored 0, pending 0
All tests passed (excluding canceled).
```
2. Dependencies collected into the assembly module, but Python dependencies are not installed or partially installed
```
build/mvn clean install -DskipTests -Pyarn
build/mvn test -pl resource-managers/yarn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -Pyarn
```
Relevant tests will be CANCELED:
```
Traceback (most recent call last):
File "/Users/yangjie01/SourceCode/git/spark-maven/python/pyspark/sql/connect/utils.py", line 47, in require_minimum_grpc_version
import grpc
ModuleNotFoundError: No module named 'grpc'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Users/yangjie01/SourceCode/git/spark-maven/python/pyspark/sql/connect/utils.py", line 37, in check_dependencies
require_minimum_grpc_version()
File "/Users/yangjie01/SourceCode/git/spark-maven/python/pyspark/sql/connect/utils.py", line 49, in require_minimum_grpc_version
raise PySparkImportError(
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] grpcio >= 1.48.1 must be installed; however, it was not found.
- run Python application with Spark Connect in yarn-client mode !!! CANCELED !!!
checker.isConnectPythonPackagesAvailable was false (YarnClusterSuite.scala:444)
- run Python application with Spark Connect in yarn-cluster mode !!! CANCELED !!!
checker.isConnectPythonPackagesAvailable was false (YarnClusterSuite.scala:444)
Run completed in 4 minutes, 36 seconds.
Total number of tests run: 28
Suites: completed 2, aborted 0
Tests: succeeded 28, failed 0, canceled 2, ignored 0, pending 0
All tests passed (excluding canceled).
```
3. Dependencies collected into the assembly module, and Python dependencies are installed
```
build/mvn clean install -DskipTests -Pyarn
build/mvn test -pl resource-managers/yarn -Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnClusterSuite -Pyarn
```
Tests succeed and no tests will be canceled:
```
Run completed in 4 minutes, 40 seconds.
Total number of tests run: 30
Suites: completed 2, aborted 0
Tests: succeeded 30, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
### Was this patch authored or co-authored using generative AI tooling?
NO
Closes #49884 from LuciferYang/YarnClusterSuite-reenable.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>