Skip to content

Commit

Permalink
[SPARK-51030][CORE][TESTS] Add a check before `Utils.deleteRecursivel…
Browse files Browse the repository at this point in the history
…y(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest`

### What changes were proposed in this pull request?
This pr add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` to minimize the risk of test failures caused by race conditions in multi-threaded deletion of the `tempdir`.

### Why are the changes needed?

In the daily tests on macOS, you may see test failures similar to the following:

- https://github.com/apache/spark/actions/runs/13014395379/job/36325277534

```
SslExternalShuffleServiceSuite:
...
- SPARK-38640: memory only blocks can unpersist using shuffle service cache fetching
SUITE ABORTED - SslExternalShuffleServiceSuite: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11
  java.io.IOException: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11
  at org.apache.spark.network.util.JavaUtils.listFilesSafely(JavaUtils.java:201)
  at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:135)
  at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123)
  at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137)
  at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123)
  at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137)
  at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123)
  at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137)
  at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123)
  at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137)
  ...
```

This appears to be an issue caused by multi-threaded deletion of the `tempdir`.

https://github.com/apache/spark/blob/42898c62b4c197e88b842eaa840986cb70d93a9c/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala#L42

Refer to the code above, in `LocalRootDirsTest#beforeEach` method, `tempDir = Utils.createTempDir(namePrefix = "local")` invokes `ShutdownHookManager.registerShutdownDeleteDir` to register `tempDir` for deletion via a shutdown hook. Therefore, if `tempDir` remains unchanged during the test case, manual cleanup is unnecessary in `LocalRootDirsTest#afterEach` method.

However, given that `tempDir` is defined as `protected var tempDir`, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning up `tempDir`:

1. `tempDir` has not been registered for cleanup via a shutdown hook.
2. `tempDir` is not a subpath of any path that has been registered for cleanup via a shutdown hook.

This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the `tempDir` concurrently.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #49723 from LuciferYang/LocalRootDirsTest-hasRootAsShutdownDeleteDir.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 19fbeaa)
Signed-off-by: Dongjoon Hyun <[email protected]>
  • Loading branch information
LuciferYang authored and dongjoon-hyun committed Jan 30, 2025
1 parent f69e508 commit e810bf6
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ package org.apache.spark
import java.io.File
import java.util.UUID

import org.apache.spark.util.Utils
import org.apache.spark.util.{ShutdownHookManager, Utils}


trait LocalRootDirsTest extends SparkFunSuite with LocalSparkContext {
Expand All @@ -46,7 +46,13 @@ trait LocalRootDirsTest extends SparkFunSuite with LocalSparkContext {

override def afterEach(): Unit = {
try {
Utils.deleteRecursively(tempDir)
// SPARK-51030: Only perform manual cleanup of the `tempDir` if it has
// not been registered for cleanup via a shutdown hook, to avoid potential
// IOException due to race conditions during multithreaded cleanup.
if (!ShutdownHookManager.hasShutdownDeleteDir(tempDir) &&
!ShutdownHookManager.hasRootAsShutdownDeleteDir(tempDir)) {
Utils.deleteRecursively(tempDir)
}
Utils.clearLocalRootDirs()
} finally {
super.afterEach()
Expand Down

0 comments on commit e810bf6

Please sign in to comment.