[SPARK-51008][SQL] Add ResultStage for AQE #49715
Closed
Changes from all commits (20 commits)
2f1669e  draft (liuzqt)
5bfb8e5  fix (liuzqt)
6e1fd83  fix tests (liuzqt)
4251762  fix test (liuzqt)
f13d11d  update (liuzqt)
14f4ba8  Merge remote-tracking branch 'upstream/master' into SPARK-51008 (liuzqt)
eb2875b  Update sql/core/src/main/scala/org/apache/spark/sql/execution/adaptiv… (liuzqt)
1ad4061  Update sql/core/src/main/scala/org/apache/spark/sql/execution/adaptiv… (liuzqt)
915bf39  minor (liuzqt)
4248e55  refactor createQueryStages (liuzqt)
cc82864  update (liuzqt)
d9017c4  update (liuzqt)
7ba69f4  refactor back (liuzqt)
1f376c3  minor (liuzqt)
ec58426  fix (liuzqt)
0ab3e20  dereference the result from result stage (liuzqt)
c8e25e0  nit (liuzqt)
138c46c  hide result query stage in Spark UI (liuzqt)
9b1cd00  use Statistics.DUMMY (liuzqt)
cab59fe  Update sql/core/src/main/scala/org/apache/spark/sql/execution/adaptiv… (cloud-fan)
@@ -268,9 +268,11 @@ case class AdaptiveSparkPlanExec(

   def finalPhysicalPlan: SparkPlan = withFinalPlanUpdate(identity)

-  private def getFinalPhysicalPlan(): SparkPlan = lock.synchronized {
-    if (isFinalPlan) return currentPhysicalPlan
+  /**
+   * Run `fun` on finalized physical plan
+   */
+  def withFinalPlanUpdate[T](fun: SparkPlan => T): T = lock.synchronized {
+    _isFinalPlan = false
     // In case of this adaptive plan being executed out of `withActive` scoped functions, e.g.,
     // `plan.queryExecution.rdd`, we need to set active session here as new plan nodes can be
     // created in the middle of the execution.

@@ -279,7 +281,7 @@ case class AdaptiveSparkPlanExec(
     // Use inputPlan logicalLink here in case some top level physical nodes may be removed
     // during `initialPlan`
     var currentLogicalPlan = inputPlan.logicalLink.get
-    var result = createQueryStages(currentPhysicalPlan)
+    var result = createQueryStages(fun, currentPhysicalPlan, firstRun = true)
     val events = new LinkedBlockingQueue[StageMaterializationEvent]()
     val errors = new mutable.ArrayBuffer[Throwable]()
     var stagesToReplace = Seq.empty[QueryStageExec]
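For context on the hunk above: the AQE loop drains stage-materialization events from the blocking queue and collects failures before each re-planning round. A minimal runnable sketch of that shape (toy event types modeled loosely after Spark's `StageMaterializationEvent`; everything except `LinkedBlockingQueue` is invented for illustration):

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

// Toy event hierarchy; Spark's real events carry the stage and its result.
sealed trait StageMaterializationEvent
case class StageSuccess(stageId: Int) extends StageMaterializationEvent
case class StageFailure(stageId: Int, error: Throwable) extends StageMaterializationEvent

val events = new LinkedBlockingQueue[StageMaterializationEvent]()
val errors = new mutable.ArrayBuffer[Throwable]()

// Simulate two stages materializing on background threads.
new Thread(() => events.put(StageSuccess(1))).start()
new Thread(() => events.put(StageSuccess(2))).start()

// Block until every in-flight stage has reported, collecting any failures;
// only then would the driver attempt re-optimization.
var remaining = 2
while (remaining > 0) {
  events.take() match {
    case StageSuccess(_)        => remaining -= 1
    case StageFailure(_, error) => errors += error; remaining -= 1
  }
}
```

The blocking `take()` is what lets the driver sleep between stage completions instead of polling.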
@@ -344,56 +346,53 @@ case class AdaptiveSparkPlanExec(
       if (errors.nonEmpty) {
         cleanUpAndThrowException(errors.toSeq, None)
       }

-      // Try re-optimizing and re-planning. Adopt the new plan if its cost is equal to or less
-      // than that of the current plan; otherwise keep the current physical plan together with
-      // the current logical plan since the physical plan's logical links point to the logical
-      // plan it has originated from.
-      // Meanwhile, we keep a list of the query stages that have been created since last plan
-      // update, which stands for the "semantic gap" between the current logical and physical
-      // plans. And each time before re-planning, we replace the corresponding nodes in the
-      // current logical plan with logical query stages to make it semantically in sync with
-      // the current physical plan. Once a new plan is adopted and both logical and physical
-      // plans are updated, we can clear the query stage list because at this point the two plans
-      // are semantically and physically in sync again.
-      val logicalPlan = replaceWithQueryStagesInLogicalPlan(currentLogicalPlan, stagesToReplace)
-      val afterReOptimize = reOptimize(logicalPlan)
-      if (afterReOptimize.isDefined) {
-        val (newPhysicalPlan, newLogicalPlan) = afterReOptimize.get
-        val origCost = costEvaluator.evaluateCost(currentPhysicalPlan)
-        val newCost = costEvaluator.evaluateCost(newPhysicalPlan)
-        if (newCost < origCost ||
-            (newCost == origCost && currentPhysicalPlan != newPhysicalPlan)) {
-          lazy val plans =
-            sideBySide(currentPhysicalPlan.treeString, newPhysicalPlan.treeString).mkString("\n")
-          logOnLevel(log"Plan changed:\n${MDC(QUERY_PLAN, plans)}")
-          cleanUpTempTags(newPhysicalPlan)
-          currentPhysicalPlan = newPhysicalPlan
-          currentLogicalPlan = newLogicalPlan
-          stagesToReplace = Seq.empty[QueryStageExec]
-        }
-      }
+      if (!currentPhysicalPlan.isInstanceOf[ResultQueryStageExec]) {
+        // Try re-optimizing and re-planning. Adopt the new plan if its cost is equal to or less
+        // than that of the current plan; otherwise keep the current physical plan together with
+        // the current logical plan since the physical plan's logical links point to the logical
+        // plan it has originated from.
+        // Meanwhile, we keep a list of the query stages that have been created since last plan
+        // update, which stands for the "semantic gap" between the current logical and physical
+        // plans. And each time before re-planning, we replace the corresponding nodes in the
+        // current logical plan with logical query stages to make it semantically in sync with
+        // the current physical plan. Once a new plan is adopted and both logical and physical
+        // plans are updated, we can clear the query stage list because at this point the two
+        // plans are semantically and physically in sync again.
+        val logicalPlan = replaceWithQueryStagesInLogicalPlan(currentLogicalPlan, stagesToReplace)
+        val afterReOptimize = reOptimize(logicalPlan)
+        if (afterReOptimize.isDefined) {
+          val (newPhysicalPlan, newLogicalPlan) = afterReOptimize.get
+          val origCost = costEvaluator.evaluateCost(currentPhysicalPlan)
+          val newCost = costEvaluator.evaluateCost(newPhysicalPlan)
+          if (newCost < origCost ||
+              (newCost == origCost && currentPhysicalPlan != newPhysicalPlan)) {
+            lazy val plans = sideBySide(
+              currentPhysicalPlan.treeString, newPhysicalPlan.treeString).mkString("\n")
+            logOnLevel(log"Plan changed:\n${MDC(QUERY_PLAN, plans)}")
+            cleanUpTempTags(newPhysicalPlan)
+            currentPhysicalPlan = newPhysicalPlan
+            currentLogicalPlan = newLogicalPlan
+            stagesToReplace = Seq.empty[QueryStageExec]
+          }
+        }
+      }
       // Now that some stages have finished, we can try creating new stages.
-      result = createQueryStages(currentPhysicalPlan)
+      result = createQueryStages(fun, currentPhysicalPlan, firstRun = false)
     }

Review comment on the new ResultQueryStageExec check: why do we need to skip?
Reply: Result stage is already the last step, there is nothing to reoptimize.
-    // Run the final plan when there's no more unfinished stages.
-    currentPhysicalPlan = applyPhysicalRules(
-      optimizeQueryStage(result.newPlan, isFinalStage = true),
-      postStageCreationRules(supportsColumnar),
-      Some((planChangeLogger, "AQE Post Stage Creation")))
-    _isFinalPlan = true
-    executionId.foreach(onUpdatePlan(_, Seq(currentPhysicalPlan)))
-    currentPhysicalPlan
-  }
+    _isFinalPlan = true
+    finalPlanUpdate
+    // Dereference the result so it can be GCed. After this resultStage.isMaterialized will return
+    // false, which is expected. If we want to collect result again, we should invoke
+    // `withFinalPlanUpdate` and pass another result handler and we will create a new result stage.
+    currentPhysicalPlan.asInstanceOf[ResultQueryStageExec].resultOption.getAndUpdate(_ => None)
+      .get.asInstanceOf[T]
+  }
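The dereference step above (`resultOption.getAndUpdate(_ => None)`) makes the stage result single-consumption so the collected rows can be garbage-collected. The idiom can be sketched with a hypothetical `OneShotResult` holder (not a Spark class, just an illustration of the `AtomicReference` pattern):

```scala
import java.util.concurrent.atomic.AtomicReference

// A holder that hands out its value exactly once. After take(), isMaterialized
// flips back to false, mirroring how the result stage looks "unmaterialized"
// again once AdaptiveSparkPlanExec has dereferenced its result.
final class OneShotResult[T] {
  private val resultOption = new AtomicReference[Option[T]](None)

  def materialize(value: T): Unit = resultOption.set(Some(value))

  def isMaterialized: Boolean = resultOption.get.isDefined

  // Atomically return the stored value and clear the reference so it can be GCed.
  def take(): T = resultOption.getAndUpdate(_ => None).get
}

val r = new OneShotResult[Seq[Int]]
r.materialize(Seq(1, 2, 3))
val first = r.take()           // returns the result and clears the holder
val materializedAfter = r.isMaterialized
```

A second `take()` would throw, which is why a repeated `collect` goes through `withFinalPlanUpdate` again and builds a fresh result stage instead.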
   // Use a lazy val to avoid this being called more than once.
   @transient private lazy val finalPlanUpdate: Unit = {
-    // Subqueries that don't belong to any query stage of the main query will execute after the
-    // last UI update in `getFinalPhysicalPlan`, so we need to update UI here again to make sure
-    // the newly generated nodes of those subqueries are updated.
-    if (shouldUpdatePlan && currentPhysicalPlan.exists(_.subqueries.nonEmpty)) {
+    // Do final plan update after result stage has materialized.
+    if (shouldUpdatePlan) {
       getExecutionId.foreach(onUpdatePlan(_, Seq.empty))
     }
     logOnLevel(log"Final plan:\n${MDC(QUERY_PLAN, currentPhysicalPlan)}")
@@ -426,13 +425,6 @@ case class AdaptiveSparkPlanExec(
     }
   }

-  private def withFinalPlanUpdate[T](fun: SparkPlan => T): T = {
-    val plan = getFinalPhysicalPlan()
-    val result = fun(plan)
-    finalPlanUpdate
-    result
-  }
-
   protected override def stringArgs: Iterator[Any] = Iterator(s"isFinalPlan=$isFinalPlan")

   override def generateTreeString(
@@ -521,6 +513,66 @@ case class AdaptiveSparkPlanExec(
     this.inputPlan == obj.asInstanceOf[AdaptiveSparkPlanExec].inputPlan
   }

+  /**
+   * We separate stage creation of result and non-result stages because there are several edge
+   * cases of result stage creation:
+   * - existing ResultQueryStage created in previous `withFinalPlanUpdate`.
+   * - the root node is a non-result query stage and we have to create a result query stage on
+   *   top of it.
+   * - we create a non-result query stage as root node and the stage is immediately materialized
+   *   due to stage reuse, therefore we have to create a result stage right after.
+   *
+   * This method wraps around `createNonResultQueryStages`; the general logic is:
+   * - Early return if ResultQueryStageExec was already created before.
+   * - Create non-result query stages if possible.
+   * - Try to create a result query stage when there is no new non-result query stage created
+   *   and all stages are materialized.
+   */
+  private def createQueryStages(
+      resultHandler: SparkPlan => Any,
+      plan: SparkPlan,
+      firstRun: Boolean): CreateStageResult = {
+    plan match {
+      // 1. ResultQueryStageExec is already created, no need to create non-result stages
+      case resultStage @ ResultQueryStageExec(_, optimizedPlan, _) =>
+        assertStageNotFailed(resultStage)
+        if (firstRun) {
+          // There is already an existing ResultQueryStage created in a previous
+          // `withFinalPlanUpdate`, e.g., when we do `df.collect` multiple times. Here we create
+          // a new result stage to execute it again, as the handler function can be different.
+          val newResultStage = ResultQueryStageExec(currentStageId, optimizedPlan, resultHandler)
+          currentStageId += 1
+          setLogicalLinkForNewQueryStage(newResultStage, optimizedPlan)
+          CreateStageResult(newPlan = newResultStage,
+            allChildStagesMaterialized = false,
+            newStages = Seq(newResultStage))
+        } else {
+          // We will hit this branch after we've created the result query stage in the AQE loop;
+          // we should do nothing.
+          CreateStageResult(newPlan = resultStage,
+            allChildStagesMaterialized = resultStage.isMaterialized,
+            newStages = Seq.empty)
+        }
+      case _ =>
+        // 2. Create non-result query stages
+        val result = createNonResultQueryStages(plan)
+        var allNewStages = result.newStages
+        var newPlan = result.newPlan
+        var allChildStagesMaterialized = result.allChildStagesMaterialized
+        // 3. Create result stage
+        if (allNewStages.isEmpty && allChildStagesMaterialized) {
+          val resultStage = newResultQueryStage(resultHandler, newPlan)
+          newPlan = resultStage
+          allChildStagesMaterialized = false
+          allNewStages :+= resultStage
+        }
+        CreateStageResult(
+          newPlan = newPlan,
+          allChildStagesMaterialized = allChildStagesMaterialized,
+          newStages = allNewStages)
+    }
+  }
+
   /**
    * This method is called recursively to traverse the plan tree bottom-up and create a new query
    * stage or try reusing an existing stage if the current node is an [[Exchange]] node and all of
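The three-step dispatch described in the scaladoc above can be sketched with a toy plan model (all names below are invented for illustration, not Spark's classes):

```scala
// Toy stand-ins for SparkPlan / QueryStageExec nodes.
sealed trait ToyPlan
case class Shuffle(child: ToyPlan, materialized: Boolean) extends ToyPlan
case object Scan extends ToyPlan
case class Result(child: ToyPlan) extends ToyPlan

case class StageResult(newPlan: ToyPlan, allMaterialized: Boolean, newStages: Seq[ToyPlan])

def createStages(plan: ToyPlan, firstRun: Boolean): StageResult = plan match {
  // 1. A result stage already exists: on a fresh run, rebuild it (the result
  //    handler may differ between runs); inside the AQE loop, leave it alone.
  case r @ Result(child) =>
    if (firstRun) StageResult(Result(child), allMaterialized = false, Seq(Result(child)))
    else StageResult(r, allMaterialized = true, Seq.empty)
  case _ =>
    // 2. Non-result stages would be created bottom-up here; this toy version
    //    only checks whether everything below is materialized.
    val childrenDone = childrenMaterialized(plan)
    // 3. Only once all children are materialized, cap the plan with a result stage.
    if (childrenDone) StageResult(Result(plan), allMaterialized = false, Seq(Result(plan)))
    else StageResult(plan, allMaterialized = false, Seq.empty)
}

def childrenMaterialized(plan: ToyPlan): Boolean = plan match {
  case Shuffle(_, m) => m
  case Scan           => true
  case Result(c)      => childrenMaterialized(c)
}
```

Note how a result stage is reported as a new stage (so the loop waits on it) and as not-yet-materialized, exactly like a shuffle stage would be.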
@@ -531,7 +583,7 @@ case class AdaptiveSparkPlanExec(
    * 2) Whether the child query stages (if any) of the current node have all been materialized.
    * 3) A list of the new query stages that have been created.
    */
-  private def createQueryStages(plan: SparkPlan): CreateStageResult = plan match {
+  private def createNonResultQueryStages(plan: SparkPlan): CreateStageResult = plan match {
     case e: Exchange =>
       // First have a quick check in the `stageCache` without having to traverse down the node.
       context.stageCache.get(e.canonicalized) match {
@@ -544,7 +596,7 @@ case class AdaptiveSparkPlanExec(
             newStages = if (isMaterialized) Seq.empty else Seq(stage))

         case _ =>
-          val result = createQueryStages(e.child)
+          val result = createNonResultQueryStages(e.child)
           val newPlan = e.withNewChildren(Seq(result.newPlan)).asInstanceOf[Exchange]
           // Create a query stage only when all the child query stages are ready.
           if (result.allChildStagesMaterialized) {
@@ -588,14 +640,28 @@ case class AdaptiveSparkPlanExec(
     if (plan.children.isEmpty) {
       CreateStageResult(newPlan = plan, allChildStagesMaterialized = true, newStages = Seq.empty)
     } else {
-      val results = plan.children.map(createQueryStages)
+      val results = plan.children.map(createNonResultQueryStages)
       CreateStageResult(
         newPlan = plan.withNewChildren(results.map(_.newPlan)),
         allChildStagesMaterialized = results.forall(_.allChildStagesMaterialized),
         newStages = results.flatMap(_.newStages))
     }
   }

+  private def newResultQueryStage(
+      resultHandler: SparkPlan => Any,
+      plan: SparkPlan): ResultQueryStageExec = {
+    // Run the final plan when there are no more unfinished stages.
+    val optimizedRootPlan = applyPhysicalRules(
+      optimizeQueryStage(plan, isFinalStage = true),
+      postStageCreationRules(supportsColumnar),
+      Some((planChangeLogger, "AQE Post Stage Creation")))
+    val resultStage = ResultQueryStageExec(currentStageId, optimizedRootPlan, resultHandler)
+    currentStageId += 1
+    setLogicalLinkForNewQueryStage(resultStage, plan)
+    resultStage
+  }
+
   private def newQueryStage(plan: SparkPlan): QueryStageExec = {
     val queryStage = plan match {
       case e: Exchange =>
Review comment: so when we call df.collect multiple times, we will re-optimize the final stage multiple times, because for each call we need to wrap a new ResultQueryStageExec.

Reply: In this case we construct ResultQueryStageExec directly and won't re-optimize it: https://github.com/apache/spark/pull/49715/files#diff-ec42cd27662f3f528832c298a60fffa1d341feb04aa1d8c80044b70cbe0ebbfcR536