
[WIP][SPARK-50892][SQL] Add UnionLoopExec, physical operator for recursion, to perform execution of recursive queries #49955

Open · wants to merge 3 commits into master

Conversation

Pajaraja

What changes were proposed in this pull request?

This PR introduces UnionLoopExec, a physical operator for recursion: UnionLoop is converted to UnionLoopExec during execution.
For now, only the UNION ALL case is supported.
The execution is performed by iteratively substituting UnionLoopRef with the plan obtained in the previous step, for as long as new rows are being generated.
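
For illustration, here is a minimal standalone sketch of that iteration, written against the public DataFrame API rather than the actual UnionLoopExec internals (the object name, column names, and level limit below are purely illustrative):

import org.apache.spark.sql.SparkSession

// Sketch of the iterative semantics: each step applies the recursive term only to the
// rows produced by the previous step and unions the result into the output, mimicking
// WITH RECURSIVE t(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM t WHERE n < 10) SELECT * FROM t.
object UnionLoopSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("union-loop-sketch").getOrCreate()
    import spark.implicits._

    val anchor = Seq(1).toDF("n")   // anchor term of the recursive CTE
    var previous = anchor           // rows generated by the last step (stands in for UnionLoopRef)
    var result = anchor             // UNION ALL of all steps so far
    var level = 0
    val levelLimit = 100            // safety net, analogous to the recursion level limit

    while (previous.count() > 0 && level < levelLimit) {
      val next = previous.filter($"n" < 10).select(($"n" + 1).as("n"))  // recursive term
      result = result.union(next)
      previous = next
      level += 1
    }
    result.show()
    spark.stop()
  }
}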

In addition, small changes to Optimizer.scala push the Limit down to UnionLoopExec when one is present in the query.

Why are the changes needed?

Support for recursive CTE.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added golden file tests for various use cases of recursive CTEs: cte-recursion.sql and with.sql (run with SQLQueryTestSuite). The test outputs are checked against the outputs of the same (or syntactically slightly adapted) queries in the Snowflake and PostgreSQL engines.
Added two tests combining parameterized identifiers with recursive CTEs to ParametersSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@@ -4421,6 +4421,12 @@
],
"sqlState" : "38000"
},
"RECURSION_LEVEL_LIMIT_EXCEEDED" : {
"message" : [
"Recursion level limit <levelLimit> reached but query has not exhausted, try increasing CTE_RECURSION_LEVEL_LIMIT"
Contributor:

shall we mention the config name here? Otherwise this error message is not actionable.

@@ -1031,6 +1040,9 @@ object ColumnPruning extends Rule[LogicalPlan] {
} else {
p
}
// TODO: Pruning `UnionLoop`s needs to take into account both the outer `Project` and the inner
// `UnionLoopRef` nodes.
Contributor:

This might hurt performance a lot. Let's figure it out.

// propagated to UnionLoopExec.
// Limit node is constructed by placing GlobalLimit over LocalLimit (see the Limit apply method),
// which is why we match it this way.
case g @ GlobalLimit(IntegerLiteral(limit), l @ LocalLimit(_, p @ Project(_, ul: UnionLoop))) =>
Contributor:

nit: case Limit(...) matches a GlobalLimit wrapping a LocalLimit with the same value; we should use it instead.
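
For reference, a sketch of the suggested match, assuming the rewrite does not need the GlobalLimit and LocalLimit nodes themselves (the Limit extractor matches a GlobalLimit wrapping a LocalLimit with the same limit expression):

case Limit(IntegerLiteral(limit), p @ Project(_, ul: UnionLoop)) =>
  // same rewrite as above, with the limit pushed into the UnionLoop node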

@@ -34,7 +34,9 @@ import org.apache.spark.sql.internal.SQLConf
* @param id The id of the loop, inherited from [[CTERelationDef]] within which the Union lived.
* @param anchor The plan of the initial element of the loop.
* @param recursion The plan that describes the recursion with an [[UnionLoopRef]] node.
* @param limit An optional limit that can be pushed down to the node to stop the loop earlier.
Contributor:

I think the previous comment is good enough to describe this parameter.

* Note here: limit can be applied in the main query calling the recursive CTE, and not
* inside the recursive term of recursive CTE.
*/
case class UnionLoopExec(
Contributor:

nit: let's move it to a new file

override val output: Seq[Attribute],
limit: Option[Int] = None) extends LeafExecNode {

override def innerChildren: Seq[QueryPlan[_]] = Seq(anchor, recursion)
Contributor:

why do they have to be inner children?

plan: LogicalPlan, currentLimit: Int) = {
// In case limit is defined, we create a (global) limit node above the plan and execute
// the newly created plan.
// Note here: global limit requires coordination (shuffle) between partitions.
Contributor:

Then it's better to use local limit? It's just a best effort to reduce the generated records.
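
A minimal sketch of that suggestion, assuming plan and currentLimit are the values shown above (a LocalLimit trims each partition without the shuffle a GlobalLimit needs, so it is only a best-effort cap):

// Best-effort reduction of generated rows: limit per partition, no shuffle.
val limitedPlan = if (limit.isDefined) LocalLimit(Literal(currentLimit), plan) else plan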

var currentLevel = 1

// Main loop for obtaining the result of the recursive query.
while (prevCount > 0 && (limit.isEmpty || currentLimit > 0)) {
Contributor:

one idea: the key here is to get the row count of the current iteration, so that we can decide if we should keep iterating or not. The shuffle is only to save recomputing of the query. But for very simple queries (e.g. local scan with simple filter/project), shuffle is probably more expensive than recomputing. We should detect such case and avoid shuffle.
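
One possible shape of that idea, sketched outside the PR (the isCheapToRecompute heuristic and the countIteration helper are hypothetical, not part of this change):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, LogicalPlan}

// Hypothetical heuristic: a step plan with no join or aggregate is assumed cheap to recompute.
def isCheapToRecompute(plan: LogicalPlan): Boolean =
  plan.collectFirst { case _: Join | _: Aggregate => () }.isEmpty

// Hypothetical helper: return the iteration result plus the row count that drives the loop.
// For cheap step plans the result is recomputed instead of being materialized through a shuffle.
def countIteration(stepDF: DataFrame): (DataFrame, Long) = {
  val step =
    if (isCheapToRecompute(stepDF.queryExecution.optimizedPlan)) stepDF else stepDF.persist()
  (step, step.count())
}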
