-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-33850][SQL] EXPLAIN FORMATTED doesn't show the plan for subqueries if AQE is enabled #30855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala
Outdated
Show resolved
Hide resolved
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #133079 has finished for PR 30855 at commit
|
Test build #133080 has finished for PR 30855 at commit
|
Kubernetes integration test starting |
Kubernetes integration test status failure |
Kubernetes integration test starting |
Kubernetes integration test status failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you, @sarutak .
Merged to master/3.1.
…ries if AQE is enabled ### What changes were proposed in this pull request? This PR fixes an issue that when AQE is enabled, EXPLAIN FORMATTED doesn't show the plan for subqueries. ```scala val df = spark.range(1, 100) df.createTempView("df") spark.sql("SELECT (SELECT min(id) AS v FROM df)").explain("FORMATTED") == Physical Plan == AdaptiveSparkPlan (3) +- Project (2) +- Scan OneRowRelation (1) (1) Scan OneRowRelation Output: [] Arguments: ParallelCollectionRDD[0] at explain at <console>:24, OneRowRelation, UnknownPartitioning(0) (2) Project Output [1]: [Subquery subquery#3, [id=#20] AS scalarsubquery()#5L] Input: [] (3) AdaptiveSparkPlan Output [1]: [scalarsubquery()#5L] Arguments: isFinalPlan=false ``` After this change, the plan for the subquerie is shown. ```scala == Physical Plan == * Project (2) +- * Scan OneRowRelation (1) (1) Scan OneRowRelation [codegen id : 1] Output: [] Arguments: ParallelCollectionRDD[0] at explain at <console>:24, OneRowRelation, UnknownPartitioning(0) (2) Project [codegen id : 1] Output [1]: [Subquery scalar-subquery#3, [id=#24] AS scalarsubquery()#5L] Input: [] ===== Subqueries ===== Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery scalar-subquery#3, [id=#24] * HashAggregate (6) +- Exchange (5) +- * HashAggregate (4) +- * Range (3) (3) Range [codegen id : 1] Output [1]: [id#0L] Arguments: Range (1, 100, step=1, splits=Some(12)) (4) HashAggregate [codegen id : 1] Input [1]: [id#0L] Keys: [] Functions [1]: [partial_min(id#0L)] Aggregate Attributes [1]: [min#7L] Results [1]: [min#8L] (5) Exchange Input [1]: [min#8L] Arguments: SinglePartition, ENSURE_REQUIREMENTS, [id=#20] (6) HashAggregate [codegen id : 2] Input [1]: [min#8L] Keys: [] Functions [1]: [min(id#0L)] Aggregate Attributes [1]: [min(id#0L)#4L] Results [1]: [min(id#0L)#4L AS v#2L] ``` ### Why are the changes needed? For better debuggability. ### Does this PR introduce _any_ user-facing change? Yes. Users can see the formatted plan for subqueries. ### How was this patch tested? New test. Closes #30855 from sarutak/fix-aqe-explain. Authored-by: Kousuke Saruta <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 70da86a) Signed-off-by: Dongjoon Hyun <[email protected]>
cc @cloud-fan , @maropu , @HyukjinKwon |
Test build #133082 has finished for PR 30855 at commit
|
.write | ||
.format("parquet") | ||
.mode("overwrite") | ||
.saveAsTable("df1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: df1
-> df
(this is a nit comment, so I think we don't need a follow-up pr to fix it)
.saveAsTable("df1") | ||
|
||
val sqlText = "EXPLAIN FORMATTED SELECT (SELECT min(id) FROM df1) as v" | ||
val expected_pattern1 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we don't the number in the name, I think. expected_pattern1
-> expected_pattern
withTable("df1") { | ||
withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") { | ||
withTable("df1") { | ||
spark.range(1, 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its better to use temporary views in tests where possible.
### What changes were proposed in this pull request? This PR mainly improves and cleans up the test code introduced in #30855 based on the comment. The test code is actually taken from another test `explain formatted - check presence of subquery in case of DPP` so this PR cleans the code too ( removed unnecessary `withTable`). ### Why are the changes needed? To keep the test code clean. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? `ExplainSuite` passes. Closes #30861 from sarutak/followup-SPARK-33850. Authored-by: Kousuke Saruta <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]>
Nice, @Ngone51 and @maryannxue who took a look at related issues FYI |
### What changes were proposed in this pull request? This PR mainly improves and cleans up the test code introduced in #30855 based on the comment. The test code is actually taken from another test `explain formatted - check presence of subquery in case of DPP` so this PR cleans the code too ( removed unnecessary `withTable`). ### Why are the changes needed? To keep the test code clean. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? `ExplainSuite` passes. Closes #30861 from sarutak/followup-SPARK-33850. Authored-by: Kousuke Saruta <[email protected]> Signed-off-by: Takeshi Yamamuro <[email protected]> (cherry picked from commit 3c8be39) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR fixes an issue that when AQE is enabled, EXPLAIN FORMATTED doesn't show the plan for subqueries.
After this change, the plan for the subquerie is shown.
Why are the changes needed?
For better debuggability.
Does this PR introduce any user-facing change?
Yes. Users can see the formatted plan for subqueries.
How was this patch tested?
New test.