karuppayya
Contributor

What changes were proposed in this pull request?

Changes to clean up shuffle data generated by running commands (e.g. writes).
This was also brought up by @cloud-fan and @ulysses-you here

Why are the changes needed?

To clean up shuffle data generated by commands

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test added

Was this patch authored or co-authored using generative AI tooling?

No

@karuppayya
Contributor Author

@cloud-fan @ulysses-you Please help review when you get a chance to.

case exec: ShuffleExchangeLike =>
exec.shuffleId
}
case command: V2CommandExec =>
Contributor

how about other commands?

Contributor Author

I have handled DataWritingCommandExec.
I think it might be tricky for ExecutedCommandExec/V1 commands, since the Spark plan is created internally.
Should we enumerate each SparkPlan type and handle them?
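
To make the question above concrete, here is a minimal sketch of the kind of traversal being discussed. It does not use Spark's real classes: `Plan`, `Scan`, `ShuffleExchange`, `WriteCommand`, and `ShuffleIds` are hypothetical stand-ins, and the sketch assumes that a command keeps the query it runs outside of `children`, which is why it needs a dedicated case.

```scala
// Toy stand-ins for physical plan nodes. These are NOT Spark's classes;
// every name here is hypothetical and exists only for illustration.
sealed trait Plan { def children: Seq[Plan] }

final case class Scan(table: String) extends Plan {
  val children: Seq[Plan] = Nil
}

final case class ShuffleExchange(shuffleId: Int, child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

// Assumption: the command builds its query internally, so the wrapped
// plan is not reachable through `children`.
final case class WriteCommand(query: Plan) extends Plan {
  val children: Seq[Plan] = Nil
}

object ShuffleIds {
  // Collect shuffle IDs top-down, also descending into the plan
  // wrapped by a command.
  def collect(plan: Plan): Seq[Int] = plan match {
    case ShuffleExchange(id, child) => id +: collect(child)
    case WriteCommand(query)        => collect(query)
    case other                      => other.children.flatMap(collect)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val plan = WriteCommand(ShuffleExchange(7, ShuffleExchange(3, Scan("t"))))
    println(ShuffleIds.collect(plan)) // prints List(7, 3)
  }
}
```

Without the `WriteCommand` case, the traversal would stop at the command node and return no IDs. Each wrapper type that hides its plan needs a case like this, which is the enumeration cost raised above.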

@@ -150,7 +150,7 @@ class QueryExecution(
// with the rest of processing of the root plan being just outputting command results,
// for eagerly executed commands we mark this place as beginning of execution.
tracker.setReadyForExecution()
- val qe = sparkSession.sessionState.executePlan(p, mode)
+ val qe = sparkSession.sessionState.executePlan(p, mode, shuffleCleanupMode)
Contributor

Instead of calling executePlan, can we construct a QueryExecution instance directly? I feel this executePlan method is a bit useless.

Contributor Author

This seems to have been introduced here. It doesn't mention the rationale behind it. But as you mentioned, it doesn't serve any real purpose. Creating a new QueryExecution.

@itskals
itskals commented Aug 28, 2025

Why do we need this handling in commands? What about other flows? Or how is this different from other flows?

@@ -3586,7 +3586,8 @@ object SQLConf {
val CLASSIC_SHUFFLE_DEPENDENCY_FILE_CLEANUP_ENABLED =
buildConf("spark.sql.classic.shuffleDependency.fileCleanup.enabled")
.doc("When enabled, shuffle files will be cleaned up at the end of classic " +
- "SQL executions.")
+ "SQL executions. Note that this cleanup may cause stage retries and regenerate " +
Contributor Author

This is related to this comment

@karuppayya
Contributor Author

Why do we need this handling in commands? What about other flows? Or how is this different from other flows?

All other flows were handled here.
Commands wrap the actual query execution/Spark plan needed for the command's execution, and had not been handled.
@itskals
