-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-51030][CORE][TESTS] Add a check before Utils.deleteRecursively(tempDir)
to ensure tempDir
won't be cleaned up by the ShutdownHook in afterEach
of LocalRootDirsTest
#49723
Conversation
.github/workflows/build_and_test.yml
Outdated
@@ -1291,3 +1291,16 @@ jobs: | |||
cd ui-test | |||
npm install --save-dev | |||
node --experimental-vm-modules node_modules/.bin/jest | |||
maven-test: | |||
name: "Run Maven Test on macos" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
hasRootAsShutdownDeleteDir
check for tempDir
in the afterEach
of LocalRootDirsTest
.hasShutdownDeleteDir
check for tempDir
in the afterEach
of LocalRootDirsTest
.
hasShutdownDeleteDir
check for tempDir
in the afterEach
of LocalRootDirsTest
.tempDir
before manual cleanup in the afterEach
method of LocalRootDirsTest
@@ -46,7 +46,10 @@ trait LocalRootDirsTest extends SparkFunSuite with LocalSparkContext { | |||
|
|||
override def afterEach(): Unit = { | |||
try { | |||
Utils.deleteRecursively(tempDir) | |||
if (!ShutdownHookManager.hasShutdownDeleteDir(tempDir) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tempDir = Utils.createTempDir(namePrefix = "local") |
In the beforeEach
method, tempDir = Utils.createTempDir(namePrefix = "local")
invokes ShutdownHookManager.registerShutdownDeleteDir
to register tempDir
for deletion via a shutdown hook. Therefore, if tempDir
remains unchanged during the test case, manual cleanup is unnecessary.
However, given that tempDir
is defined as protected var tempDir
, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning up tempDir
:
tempDir
has not been registered for cleanup via a shutdown hook.tempDir
is not a subpath of any path that has been registered for cleanup via a shutdown hook.
This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the tempDir
concurrently. It seems that this issue is more prevalent on macOS, so let's conduct several more rounds of testing to ensure its effectiveness.
tempDir
before manual cleanup in the afterEach
method of LocalRootDirsTest
tempDir
to ensure that it won't be cleaned up by the ShutdownHook in the afterEach
of LocalRootDirsTest
tempDir
to ensure that it won't be cleaned up by the ShutdownHook in the afterEach
of LocalRootDirsTest
Utils.deleteRecursively(tempDir)
to ensure tempDir
won't be cleaned up by the ShutdownHook in the afterEach
of LocalRootDirsTest
Utils.deleteRecursively(tempDir)
to ensure tempDir
won't be cleaned up by the ShutdownHook in the afterEach
of LocalRootDirsTest
Utils.deleteRecursively(tempDir)
to ensure tempDir
won't be cleaned up by the ShutdownHook in afterEach
of LocalRootDirsTest
Utils.deleteRecursively(tempDir)
to ensure tempDir
won't be cleaned up by the ShutdownHook in afterEach
of LocalRootDirsTest
Utils.deleteRecursively(tempDir)
to ensure tempDir
won't be cleaned up by the ShutdownHook in afterEach
of LocalRootDirsTest
…y(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` ### What changes were proposed in this pull request? This pr add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` to minimize the risk of test failures caused by race conditions in multi-threaded deletion of the `tempdir`. ### Why are the changes needed? In the daily tests on macOS, you may see test failures similar to the following: - https://github.com/apache/spark/actions/runs/13014395379/job/36325277534 ``` SslExternalShuffleServiceSuite: ... - SPARK-38640: memory only blocks can unpersist using shuffle service cache fetching SUITE ABORTED - SslExternalShuffleServiceSuite: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 java.io.IOException: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 at org.apache.spark.network.util.JavaUtils.listFilesSafely(JavaUtils.java:201) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:135) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) ... ``` This appears to be an issue caused by multi-threaded deletion of the `tempdir`. https://github.com/apache/spark/blob/42898c62b4c197e88b842eaa840986cb70d93a9c/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala#L42 Refer to the code above, in `LocalRootDirsTest#beforeEach` method, `tempDir = Utils.createTempDir(namePrefix = "local")` invokes `ShutdownHookManager.registerShutdownDeleteDir` to register `tempDir` for deletion via a shutdown hook. Therefore, if `tempDir` remains unchanged during the test case, manual cleanup is unnecessary in `LocalRootDirsTest#afterEach` method. However, given that `tempDir` is defined as `protected var tempDir`, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning up `tempDir`: 1. `tempDir` has not been registered for cleanup via a shutdown hook. 2. `tempDir` is not a subpath of any path that has been registered for cleanup via a shutdown hook. This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the `tempDir` concurrently. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #49723 from LuciferYang/LocalRootDirsTest-hasRootAsShutdownDeleteDir. Lead-authored-by: yangjie01 <[email protected]> Co-authored-by: YangJie <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 19fbeaa) Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/4.0. Thank you, @LuciferYang . |
Thank you @dongjoon-hyun |
What changes were proposed in this pull request?
This pr add a check before
Utils.deleteRecursively(tempDir)
to ensuretempDir
won't be cleaned up by the ShutdownHook inafterEach
ofLocalRootDirsTest
to minimize the risk of test failures caused by race conditions in multi-threaded deletion of thetempdir
.Why are the changes needed?
In the daily tests on macOS, you may see test failures similar to the following:
This appears to be an issue caused by multi-threaded deletion of the
tempdir
.spark/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala
Line 42 in 42898c6
Refer to the code above, in
LocalRootDirsTest#beforeEach
method,tempDir = Utils.createTempDir(namePrefix = "local")
invokesShutdownHookManager.registerShutdownDeleteDir
to registertempDir
for deletion via a shutdown hook. Therefore, iftempDir
remains unchanged during the test case, manual cleanup is unnecessary inLocalRootDirsTest#afterEach
method.However, given that
tempDir
is defined asprotected var tempDir
, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning uptempDir
:tempDir
has not been registered for cleanup via a shutdown hook.tempDir
is not a subpath of any path that has been registered for cleanup via a shutdown hook.This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the
tempDir
concurrently.Does this PR introduce any user-facing change?
No
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No