[SPARK-53413] Shuffle cleanup for commands #52157

karuppayya · 2025-08-28T06:07:10Z

What changes were proposed in this pull request?

Changes to clean up shuffle generated by running commands (e.g. writes).
This was also brought up by @cloud-fan and @ulysses-you here.

Why are the changes needed?

To clean up shuffle generated from commands.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test added

Was this patch authored or co-authored using generative AI tooling?

No

karuppayya · 2025-08-28T06:12:49Z

@cloud-fan @ulysses-you Please help review when you get a chance.

cloud-fan · 2025-08-28T10:39:39Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala

-                      case exec: ShuffleExchangeLike =>
-                        exec.shuffleId
-                    }
+                  case command: V2CommandExec =>


How about other commands?

cloud-fan · 2025-08-28T10:40:41Z

sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

@@ -150,7 +150,7 @@ class QueryExecution(
      // with the rest of processing of the root plan being just outputting command results,
      // for eagerly executed commands we mark this place as beginning of execution.
      tracker.setReadyForExecution()
-      val qe = sparkSession.sessionState.executePlan(p, mode)
+      val qe = sparkSession.sessionState.executePlan(p, mode, shuffleCleanupMode)


Instead of calling executePlan, can we construct a QueryExecution instance directly? I feel this executePlan method is a bit useless.

This seems to have been introduced here. It doesn't mention the rationale behind it. But as you mentioned, it doesn't serve any real purpose. Creating a new QueryExecution instead.

itskals · 2025-08-28T18:05:45Z

Why do we need this handling in commands? What about other flows? Or how is this different from other flows?

karuppayya · 2025-08-29T06:06:09Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala

-                      case exec: ShuffleExchangeLike =>
-                        exec.shuffleId
-                    }
+                  case command: V2CommandExec =>


I have handled DataWritingCommandExec.
I think ExecutedCommandExec/V1 commands might be tricky, since the Spark plan is created internally.
Should we enumerate each SparkPlan type and handle them?
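
To make the question above concrete, here is a minimal, self-contained sketch of collecting shuffle IDs while explicitly descending into a command's wrapped plan. Every type here (`PlanNode`, `ToyShuffle`, `ToyScan`, `ToyWriteCommand`, `ShuffleIds`) is a hypothetical stand-in and not one of Spark's classes; the real change in this PR may be structured differently.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
sealed trait PlanNode { def children: Seq[PlanNode] }

// Stand-in for a shuffle exchange node that owns a shuffle ID.
final case class ToyShuffle(shuffleId: Int, child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

// Stand-in for a leaf scan.
final case class ToyScan(table: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}

// Stand-in for a command that wraps the query plan it runs (e.g. a write).
final case class ToyWriteCommand(query: PlanNode) extends PlanNode {
  // The wrapped query is deliberately NOT exposed as a child, to mimic a
  // command whose inner plan a plain tree traversal would miss.
  val children: Seq[PlanNode] = Nil
}

object ShuffleIds {
  // Collect shuffle IDs, handling each wrapper type with its own case.
  def collect(plan: PlanNode): Seq[Int] = plan match {
    case ToyShuffle(id, child)  => id +: collect(child)
    case ToyWriteCommand(query) => collect(query)
    case other                  => other.children.flatMap(collect)
  }
}
```

In this toy model, each command type that hides its plan needs its own `case`, which is the enumeration cost the question is about.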

karuppayya · 2025-08-29T06:07:01Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

@@ -3586,7 +3586,8 @@ object SQLConf {
  val CLASSIC_SHUFFLE_DEPENDENCY_FILE_CLEANUP_ENABLED =
    buildConf("spark.sql.classic.shuffleDependency.fileCleanup.enabled")
      .doc("When enabled, shuffle files will be cleaned up at the end of classic " +
-        "SQL executions.")
+        "SQL executions. Note that this cleanup may cause stage retries and regenerate " +


This is related to this comment
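
For readers who want to try the flag documented in the diff above, a minimal sketch of enabling it when building a session follows. It assumes Spark is on the classpath; the object name, app name, and master URL are placeholders chosen for illustration, and only the configuration key comes from this PR.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; only the config key is taken from the diff.
object ShuffleCleanupConfigExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shuffle-cleanup-example") // placeholder name
      .master("local[*]")                 // placeholder master for local runs
      .config("spark.sql.classic.shuffleDependency.fileCleanup.enabled", "true")
      .getOrCreate()

    spark.stop()
  }
}
```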

karuppayya · 2025-08-29T06:33:25Z

Why do we need this handling in commands? What about other flows? Or how is this different from other flows?

All other flows were handled here.
Commands wrap the actual query execution/Spark plan needed for the command's execution, and these had not been handled.
@itskals

SPARK-53413: Shuffle cleanup for commands

3388406

Fix

github-actions bot added the SQL label Aug 28, 2025

karuppayya mentioned this pull request Aug 28, 2025

[SPARK-52777][SQL] Enable shuffle cleanup mode configuration in Spark SQL #51458

Closed

cloud-fan reviewed Aug 28, 2025

karuppayya commented Aug 29, 2025

Add logs, handle DataWritingCommandExec

25406bc

karuppayya force-pushed the SPARK-53413 branch from ef482e2 to 25406bc Compare August 29, 2025 06:25

Write test for DataWritingCommandExec

aba5f59
