[SPARK-50739][SQL] Recursive CTE. Analyzer changes to unravel and resolve the recursion components. #49351
…ocs.google.com/document/d/1qcEJxqoXcr5cSt6HgIQjWQSqhfkSaVYkoDHsg5oxXp4/edit . Changes in ResolveWithCTE.scala to have the analyzer grok recursive anchors. Introduction of UnionLoop and UnionLoopRef logical plan classes.
cteDef
}
} else {
if (cteDef.recursionAnchor.nonEmpty) {
how can this happen? i.e. A non-recursive CTE relation contains recursionAnchor
Why do you believe it is non-recursive? It is within the if (cteDef.recursive) block.
Oh sorry, I misread the code. Now the question becomes: why do we empty out the recursionAnchor if the CTE def is resolved?
recursionAnchor removed, comment is now moot.
* mapping between the original and reference sequences are symmetric.
*/
private def rewriteConstraints(
reference: Seq[Attribute],
nit: 4 spaces indentation. Please follow the code style of the previous code: https://github.com/apache/spark/pull/49351/files#diff-0ac2c89f8cc0d00e8fe7717b01697f36f20fe8abf2def09bcfde0ad84b30e467L552
Done.
* @param limit An optional limit that can be pushed down to the node to stop the loop earlier.
*/
case class UnionLoop(
id: Long,
nit: 4 spaces indentation
Done.
* results.
*/
case class UnionLoopRef(
loopId: Long,
ditto
Done.
After |
Please take another look.
…oop, and a recursive CTERelationRef into UnionLoopRef.
I will get back to the recursionAnchor variable in the next round. I would like to understand the benefits of removing it, since its removal is quite hairy.
…ic of detecting and storing fully and resolved CTE Definitions. Add error cases.
…ecursion anchor, instead locate it under CTE child via pattern matching when it is needed. recursiveAnchorResolved() now returns anchor iff it is resolved and can be used to populate and read from a temporary map of CTE definitions.
Please take another look. CTERelationDef does not contain the anchor any longer; when needed, it is fetched from its child via pattern matching. The code is greatly simplified and all previous convoluted questions on recursionAnchor are now moot. I added several exceptions for the unsupported cases of Union under the CTE Definition. Substitution rules for UnionLoop/Ref are added for 4 cases of Union under CTE Definition. The CTERelationDef change has broken some tests; I will work on those now.
Please update the PR title to have a full sentence.
// Project yet), leaving us with cases of SubqueryAlias->Union and SubqueryAlias->
// UnresolvedSubqueryColumnAliases->Union. The same applies to Distinct Union.
cteDef.failAnalysis(
errorClass = "INVALID_RECURSIVE_CTE",
Shall we add a new error class to error-conditions.json and a new function to QueryCompilationErrors?
Yes let's please do this
Done.
cteDefs.foreach { cteDef =>
if (cteDef.resolved) {
cteDefMap.put(cteDef.id, cteDef)
val newCTEDefs = cteDefs.map { cteDef =>
It feels like this map body should have two distinct cases - recursive and not recursive. We can rewrite it like this:
cteDefs.map {
  case cteDef if !cteDef.isRecursive =>
    ...
  case cteDef =>
    ...
}
+1
Done.
val newCTEDef = if (cteDef.recursive) {
cteDef.child match {
// Substitutions to UnionLoop and UnionLoopRef.
case a @ SubqueryAlias(_, Union(Seq(anchor, recursion), false, false)) =>
I know that developers in Catalyst really enjoy one-letter variables in matches, but it does not feel like good code health.
Addressed, though this code may be changed to address the other feedback.
val newCTEDef = if (cteDef.recursive) {
cteDef.child match {
// Substitutions to UnionLoop and UnionLoopRef.
case a @ SubqueryAlias(_, Union(Seq(anchor, recursion), false, false)) =>
Maybe we can introduce an extractor object to reduce complexity here:
object ReplaceUnionWithUnionLoop {
  def unapply(plan: LogicalPlan): Option[UnionLoop] = plan match {
    case Union(Seq(anchor, recursion), false, false) =>
      Some(UnionLoop(cteDef.id, anchor, transformRefs(recursion)))
    case Distinct(Union(Seq(anchor, recursion), false, false)) =>
      Some(UnionLoop(cteDef.id, Distinct(anchor), Except(transformRefs(recursion), UnionLoopRef(cteDef.id, cteDef.output, true), false)))
    case _ =>
      None
  }
}
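A minimal, self-contained sketch of the extractor idea, for readers who want to see the shape of it. Everything here is a toy stand-in and an assumption: the plan classes are simplified case classes rather than Catalyst's, `AsUnionLoop`, `replaceRefs` and `rewrite` are hypothetical names, and the extractor is a class parameterized by the CTE id because the suggestion above refers to `cteDef`, which would not be in scope inside a standalone object. It is not what the PR ended up doing.

```scala
object ToyUnionLoopExtractor {
  // Toy plan nodes; these only mimic the shape of Catalyst's classes.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class CteRef(cteId: Long) extends Plan
  case class Union(children: Seq[Plan]) extends Plan
  case class Distinct(child: Plan) extends Plan
  case class Except(left: Plan, right: Plan) extends Plan
  case class UnionLoop(id: Long, anchor: Plan, recursion: Plan) extends Plan
  case class UnionLoopRef(loopId: Long, accumulated: Boolean) extends Plan

  // Replace references to the CTE with non-accumulated loop references.
  def replaceRefs(cteId: Long, plan: Plan): Plan = plan match {
    case CteRef(id) if id == cteId => UnionLoopRef(cteId, accumulated = false)
    case Union(children) => Union(children.map(replaceRefs(cteId, _)))
    case Distinct(child) => Distinct(replaceRefs(cteId, child))
    case Except(l, r) => Except(replaceRefs(cteId, l), replaceRefs(cteId, r))
    case other => other
  }

  // Hypothetical extractor, parameterized by the id of the CTE being rewritten.
  class AsUnionLoop(cteId: Long) {
    def unapply(plan: Plan): Option[UnionLoop] = plan match {
      case Union(Seq(anchor, recursion)) =>
        Some(UnionLoop(cteId, anchor, replaceRefs(cteId, recursion)))
      case Distinct(Union(Seq(anchor, recursion))) =>
        // Distinct case: subtract the rows accumulated so far from each new step.
        Some(UnionLoop(cteId, Distinct(anchor),
          Except(replaceRefs(cteId, recursion), UnionLoopRef(cteId, accumulated = true))))
      case _ => None
    }
  }

  // Both Union shapes are handled by a single case at the call site.
  def rewrite(cteId: Long, body: Plan): Plan = {
    val AsLoop = new AsUnionLoop(cteId)
    body match {
      case AsLoop(loop) => loop
      case other => other
    }
  }
}
```

The call site collapses to one `case AsLoop(loop)` branch, which is the deduplication the reviewers were asking about.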
+1 this code seems duplicated, we can use a helper to dedup it
I will address this one in the next round.
The change was not possible. Regardless, the comment is now moot as we changed the underlying code.
@@ -37,21 +38,150 @@ object ResolveWithCTE extends Rule[LogicalPlan] {
}
}

// Substitute CTERelationRef with UnionLoopRef.
private def transformRefs(plan: LogicalPlan) = {
Shall we place methods in a top-down order as they are used? More natural for reading.
Renamed to be more readable and moved below.
@@ -37,21 +38,150 @@ object ResolveWithCTE extends Rule[LogicalPlan] {
}
}

// Substitute CTERelationRef with UnionLoopRef.
private def transformRefs(plan: LogicalPlan) = {
- private def transformRefs(plan: LogicalPlan) = {
+ private def replaceSimpleRefsWithUnionLoopRefs(plan: LogicalPlan) = {
Done.
@@ -37,21 +38,150 @@ object ResolveWithCTE extends Rule[LogicalPlan] {
}
}

// Substitute CTERelationRef with UnionLoopRef.
Can you please explain in the method doc why exactly we are replacing all the simple refs with union refs under a UnionLoop?
+1, let's expand all the comments in this PR heavily to give a lot of context about the total algorithm to be performed, steps taken, etc.
ref.copy(_resolved = true, output = cteDef.output, isStreaming = cteDef.isStreaming)
} else {
// In the non-recursive case, cteDefMap contains only resolved Definitions.
cteDef.failAnalysis(
It changes the non-recursive behavior - if the def is unresolved now we would throw an error. Also, completely unrelated to the problem.
In the non-recursive case:
We are pulling this def out from cteDefMap, and it is in that map precisely because it is resolved. The map insertion code above is:
if (newCTEDef.resolved) {
  cteDefMap.put(newCTEDef.id, newCTEDef)
}
This is a sanity check that is enforcing the invariant.
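The invariant described here can be illustrated with a small self-contained sketch. It assumes toy `CteDef` and `CteRef` case classes and hypothetical helpers `register` and `resolveRef`; none of these are Spark's real classes or the PR's actual code, and `require` stands in for the `failAnalysis` call.

```scala
import scala.collection.mutable

object ToyCteDefMap {
  // Toy stand-ins for CTERelationDef and CTERelationRef (assumptions, not Spark's API).
  case class CteDef(id: Long, resolved: Boolean, output: Seq[String])
  case class CteRef(cteId: Long, resolved: Boolean = false, output: Seq[String] = Nil)

  // Only resolved definitions are ever put into the map.
  def register(defs: Seq[CteDef]): mutable.Map[Long, CteDef] = {
    val cteDefMap = mutable.HashMap.empty[Long, CteDef]
    defs.foreach { d => if (d.resolved) cteDefMap.put(d.id, d) }
    cteDefMap
  }

  def resolveRef(ref: CteRef, cteDefMap: mutable.Map[Long, CteDef]): CteRef =
    cteDefMap.get(ref.cteId) match {
      case Some(d) =>
        // Sanity check: anything read from the map must already be resolved.
        require(d.resolved, s"CTE definition ${d.id} in the map must be resolved")
        ref.copy(resolved = true, output = d.output)
      case None =>
        // No resolved definition yet: leave the reference untouched for a later pass.
        ref
    }
}
```

Under this reading, the check can only fire if some other code path breaks the insertion rule, so it does not change behavior for references whose definitions are simply not resolved yet.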
Thanks for working on this!!
@@ -762,4 +762,15 @@ object QueryPlan extends PredicateHelper {
case e: AnalysisException => append(e.toString)
}
}

/**
* Generate detailed field string with different format based on type of input value
this comment and method name are a bit generic; could we expand them to mention what type of field we are referring to here, and when this might be used? can we give a brief example?
Done.
…handle errors. Simplified pattern matching. Added comments to generateFieldString. Added more comments to basicLogicalOperators.scala
I deferred a few pieces of feedback for the next round. Several changes incorporated here including
- Added error handlers in error-conditions.json and QueryCompilationErrors
- Simplified pattern matching.
- Added comments to generateFieldString.
- Added more comments to basicLogicalOperators.scala
private def transformRefs(plan: LogicalPlan) = {
plan.transformWithPruning(_.containsPattern(CTE)) {
case r: CTERelationRef if r.recursive =>
UnionLoopRef(r.cteId, r.output, false)
I think the output should be the output of recursive anchor plan.
val newCTEDef = cteDef
if (newCTEDef.resolved) {
cteDefMap.put(newCTEDef.id, newCTEDef)
}
newCTEDef
- val newCTEDef = cteDef
- if (newCTEDef.resolved) {
- cteDefMap.put(newCTEDef.id, newCTEDef)
- }
- newCTEDef
+ if (cteDef.resolved) {
+ cteDefMap.put(cteDef.id, cteDef)
+ }
+ cteDef
…hes unsupported CTE UNION placements in ResolveWithCTE.
…o catalyst/plans/logical/cteOperators.scala. Added unittest for the recursive CTE analyzer.
Added Milan's analyzer unit test to this PR. Refactored CTE operators out of basicLogicalOperators into their own file. Added more comments.
def getBeforePlan(cteDef: CTERelationDef): LogicalPlan = {
val anchor = Project(Seq(Alias(Literal(1), "1")()), OneRowRelation())

val recursionPart = Project(anchor.output,
The way the tree is decomposed here (using local variables) is very hard to read. Maybe we can actually have a solid tree structure instead of this decomposition?
As discussed offline I will do it in a follow up PR.
…mplified ResolveRecursiveCTESuite by removing several Project's.
new AnalysisException(
errorClass = "INVALID_RECURSIVE_CTE",
messageParameters = Map(
"error" -> error
we don't have this parameter any more.
Thanks, merging to master/4.0! I'll create a followup PR shortly to do some cleanup.
…olve the recursion components ### What changes were proposed in this pull request? Instruction for reviewers https://docs.google.com/document/d/1qcEJxqoXcr5cSt6HgIQjWQSqhfkSaVYkoDHsg5oxXp4/edit Introduction of UnionLoop and UnionLoopRef logical plan classes. Changes in ResolveWithCTE.scala to have the analyzer grok recursive anchors. Specifically we substitute CTERelationRef with UnionLoopRef, and Union with UnionLoop. We untangle the dead loop in resolving where recursive CTE reference is referring to an unresolved CTE definition, which itself cannot be resolved as one of its descendants is an unresolved CTE reference. ### Why are the changes needed? Support for the recursive CTE. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Proposed changes in this PR are no-op. Tested ./build/sbt "test:testOnly org.apache.spark.sql.SQLQueryTestSuite" ./build/sbt "test:testOnly *PlanParserSuite" ### Was this patch authored or co-authored using generative AI tooling? No Closes #49351 from nemanjapetr-db/nemanjapetr-db/rcte3. Authored-by: Nemanja Petrovic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 3b114af) Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? A followup of #49351 to simplify the test via dsl. ### Why are the changes needed? code cleanup ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #49557 from cloud-fan/clean. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request? A followup of #49351 to simplify the test via dsl. ### Why are the changes needed? code cleanup ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #49557 from cloud-fan/clean. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 66dd7dd) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Instruction for reviewers https://docs.google.com/document/d/1qcEJxqoXcr5cSt6HgIQjWQSqhfkSaVYkoDHsg5oxXp4/edit
Introduction of UnionLoop and UnionLoopRef logical plan classes. Changes in ResolveWithCTE.scala to have the analyzer grok recursive anchors. Specifically we substitute CTERelationRef with UnionLoopRef, and Union with UnionLoop. We untangle the dead loop in resolving where recursive CTE reference is referring to an unresolved CTE definition, which itself cannot be resolved as one of its descendants is an unresolved CTE reference.
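The substitution described above can be sketched with a toy model. This is an illustration under stated assumptions only: the case classes below are simplified stand-ins (Spark's real `UnionLoop` and `UnionLoopRef` carry more fields, such as output attributes and an optional limit), and `replaceRefs` and `rewriteDefinition` are hypothetical helper names, not functions from ResolveWithCTE.scala.

```scala
object ToyRecursiveCte {
  // Toy plan nodes; simplified stand-ins for Catalyst's logical plans.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  case class Union(left: Plan, right: Plan) extends Plan
  case class CteRef(cteId: Long, recursive: Boolean) extends Plan
  case class UnionLoop(id: Long, anchor: Plan, recursion: Plan) extends Plan
  case class UnionLoopRef(loopId: Long) extends Plan

  // Replace every recursive reference to `cteId` with a UnionLoopRef.
  def replaceRefs(cteId: Long, plan: Plan): Plan = plan match {
    case CteRef(id, true) if id == cteId => UnionLoopRef(cteId)
    case Filter(cond, child) => Filter(cond, replaceRefs(cteId, child))
    case Union(l, r) => Union(replaceRefs(cteId, l), replaceRefs(cteId, r))
    case UnionLoop(id, a, r) =>
      UnionLoop(id, replaceRefs(cteId, a), replaceRefs(cteId, r))
    case other => other
  }

  // Turn the body `anchor UNION ALL recursion` of a recursive CTE into a UnionLoop.
  // The anchor contains no self-reference, so it is left as it is.
  def rewriteDefinition(cteId: Long, body: Plan): Option[Plan] = body match {
    case Union(anchor, recursion) =>
      Some(UnionLoop(cteId, anchor, replaceRefs(cteId, recursion)))
    case _ => None // unsupported shape; the PR reports an analysis error for such cases
  }
}
```

After the rewrite, the recursive branch no longer points back at the CTE definition itself, only at the loop by id. That is one way to read how the circular dependency between the unresolved definition and its unresolved reference is broken.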
Why are the changes needed?
Support for the recursive CTE.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Proposed changes in this PR are no-op. Tested
./build/sbt "test:testOnly org.apache.spark.sql.SQLQueryTestSuite"
./build/sbt "test:testOnly *PlanParserSuite"
Was this patch authored or co-authored using generative AI tooling?
No