From 19fbeaa04abe71529a707e07a36adf14098e7741 Mon Sep 17 00:00:00 2001 From: yangjie01 Date: Thu, 30 Jan 2025 07:39:30 -0800 Subject: [PATCH] [SPARK-51030][CORE][TESTS] Add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` ### What changes were proposed in this pull request? This pr add a check before `Utils.deleteRecursively(tempDir)` to ensure `tempDir` won't be cleaned up by the ShutdownHook in `afterEach` of `LocalRootDirsTest` to minimize the risk of test failures caused by race conditions in multi-threaded deletion of the `tempdir`. ### Why are the changes needed? In the daily tests on macOS, you may see test failures similar to the following: - https://github.com/apache/spark/actions/runs/13014395379/job/36325277534 ``` SslExternalShuffleServiceSuite: ... - SPARK-38640: memory only blocks can unpersist using shuffle service cache fetching SUITE ABORTED - SslExternalShuffleServiceSuite: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 java.io.IOException: Failed to list files for dir: /Users/runner/work/spark/spark/core/target/tmp/local-6e9944e3-27c0-44bb-a535-8a7f09c1580b/55fe65f6-2d1c-4ced-af5f-abb64b135345/spark-3985444c-54a0-462c-b1d3-6df45ad55143/executor-8c55fc55-b74f-4793-bf3f-3d11a889fd8c/blockmgr-1a3455e7-bd31-4ef8-a9c6-910fde151bb3/11 at org.apache.spark.network.util.JavaUtils.listFilesSafely(JavaUtils.java:201) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:135) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:123) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:137) ... ``` This appears to be an issue caused by multi-threaded deletion of the `tempdir`. https://github.com/apache/spark/blob/42898c62b4c197e88b842eaa840986cb70d93a9c/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala#L42 Refer to the code above, in `LocalRootDirsTest#beforeEach` method, `tempDir = Utils.createTempDir(namePrefix = "local")` invokes `ShutdownHookManager.registerShutdownDeleteDir` to register `tempDir` for deletion via a shutdown hook. Therefore, if `tempDir` remains unchanged during the test case, manual cleanup is unnecessary in `LocalRootDirsTest#afterEach` method. However, given that `tempDir` is defined as `protected var tempDir`, it is possible for it to be altered within the test case. Hence, additional checks are implemented before manually cleaning up `tempDir`: 1. `tempDir` has not been registered for cleanup via a shutdown hook. 2. `tempDir` is not a subpath of any path that has been registered for cleanup via a shutdown hook. This approach helps to minimize the risk of a race condition where multiple threads attempt to clean up the `tempDir` concurrently. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #49723 from LuciferYang/LocalRootDirsTest-hasRootAsShutdownDeleteDir. Lead-authored-by: yangjie01 Co-authored-by: YangJie Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/LocalRootDirsTest.scala | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala b/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala index 3a813f4d8b53c..a7968b6f2b022 100644 --- a/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala +++ b/core/src/test/scala/org/apache/spark/LocalRootDirsTest.scala @@ -20,7 +20,7 @@ package org.apache.spark import java.io.File import java.util.UUID -import org.apache.spark.util.Utils +import org.apache.spark.util.{ShutdownHookManager, Utils} trait LocalRootDirsTest extends SparkFunSuite with LocalSparkContext { @@ -46,7 +46,13 @@ trait LocalRootDirsTest extends SparkFunSuite with LocalSparkContext { override def afterEach(): Unit = { try { - Utils.deleteRecursively(tempDir) + // SPARK-51030: Only perform manual cleanup of the `tempDir` if it has + // not been registered for cleanup via a shutdown hook, to avoid potential + // IOException due to race conditions during multithreaded cleanup. + if (!ShutdownHookManager.hasShutdownDeleteDir(tempDir) && + !ShutdownHookManager.hasRootAsShutdownDeleteDir(tempDir)) { + Utils.deleteRecursively(tempDir) + } Utils.clearLocalRootDirs() } finally { super.afterEach()