[SPARK-33850][SQL] EXPLAIN FORMATTED doesn't show the plan for subqueries if AQE is enabled #30855

sarutak · 2020-12-19T17:31:33Z

What changes were proposed in this pull request?

This PR fixes an issue where EXPLAIN FORMATTED doesn't show the plan for subqueries when AQE is enabled.

val df = spark.range(1, 100)
df.createTempView("df")
spark.sql("SELECT (SELECT min(id) AS v FROM df)").explain("FORMATTED")

== Physical Plan ==
AdaptiveSparkPlan (3)
+- Project (2)
 +- Scan OneRowRelation (1)


(1) Scan OneRowRelation
Output: []
Arguments: ParallelCollectionRDD[0] at explain at <console>:24, OneRowRelation, UnknownPartitioning(0)

(2) Project
Output [1]: [Subquery subquery#3, [id=#20] AS scalarsubquery()#5L]
Input: []

(3) AdaptiveSparkPlan
Output [1]: [scalarsubquery()#5L]
Arguments: isFinalPlan=false

After this change, the plan for the subquery is shown.

== Physical Plan ==
* Project (2)
+- * Scan OneRowRelation (1)


(1) Scan OneRowRelation [codegen id : 1]
Output: []
Arguments: ParallelCollectionRDD[0] at explain at <console>:24, OneRowRelation, UnknownPartitioning(0)

(2) Project [codegen id : 1]
Output [1]: [Subquery scalar-subquery#3, [id=#24] AS scalarsubquery()#5L]
Input: []

===== Subqueries =====

Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery scalar-subquery#3, [id=#24]
* HashAggregate (6)
+- Exchange (5)
   +- * HashAggregate (4)
      +- * Range (3)


(3) Range [codegen id : 1]
Output [1]: [id#0L]
Arguments: Range (1, 100, step=1, splits=Some(12))

(4) HashAggregate [codegen id : 1]
Input [1]: [id#0L]
Keys: []
Functions [1]: [partial_min(id#0L)]
Aggregate Attributes [1]: [min#7L]
Results [1]: [min#8L]

(5) Exchange
Input [1]: [min#8L]
Arguments: SinglePartition, ENSURE_REQUIREMENTS, [id=#20]

(6) HashAggregate [codegen id : 2]
Input [1]: [min#8L]
Keys: []
Functions [1]: [min(id#0L)]
Aggregate Attributes [1]: [min(id#0L)#4L]
Results [1]: [min(id#0L)#4L AS v#2L]
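The comparison above can be reproduced outside the shell with a small standalone program. This is only a sketch under assumptions: it assumes Spark 3.x with `spark-sql` on the classpath, sets `spark.sql.adaptive.enabled` explicitly rather than relying on the default, and the names `ExplainFormattedRepro` and `formattedPlan` are hypothetical helpers, not part of this patch.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object for illustration; not part of the PR.
object ExplainFormattedRepro {

  // Runs EXPLAIN FORMATTED for the given query and returns the plan text.
  // The EXPLAIN command yields a single row with a single string column.
  def formattedPlan(spark: SparkSession, sqlText: String): String =
    spark.sql(s"EXPLAIN FORMATTED $sqlText").head().getString(0)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("explain-formatted-aqe")
      // Enable AQE explicitly so the behaviour does not depend on the default.
      .config("spark.sql.adaptive.enabled", "true")
      .getOrCreate()
    try {
      // Same data and query as in the description above.
      spark.range(1, 100).createOrReplaceTempView("df")
      val plan = formattedPlan(spark, "SELECT (SELECT min(id) AS v FROM df)")
      println(plan)
      // With this change applied, a "===== Subqueries =====" section is expected.
      println(s"has subqueries section: ${plan.contains("===== Subqueries =====")}")
    } finally {
      spark.stop()
    }
  }
}
```

Operator ids, expression ids and `splits` in the printed plan will likely differ from the output quoted above, since they depend on the session and the machine.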

Why are the changes needed?

For better debuggability.

Does this PR introduce any user-facing change?

Yes. Users can see the formatted plan for subqueries.

How was this patch tested?

New test.

sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala
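The diff to `ExplainUtils.scala` is not reproduced in this conversation, so the following is only a toy model of the kind of problem described, not the actual change. It assumes that an adaptive wrapper node keeps its real plan in a field rather than in its children, so a subquery collector that walks only children never reaches the subqueries underneath it. All types here (`Node`, `Op`, `Adaptive`, `SubqueryCollector`) are hypothetical and are not Spark classes.

```scala
// Toy plan model for illustration only; none of these are Spark classes.
sealed trait Node {
  def children: Seq[Node]
  def subqueries: Seq[Node]
}

// An ordinary operator with child operators and subqueries hosted in its expressions.
final case class Op(
    name: String,
    children: Seq[Node] = Nil,
    subqueries: Seq[Node] = Nil) extends Node

// Stand-in for an adaptive wrapper: the wrapped plan is a field, not a child
// (an assumption of this sketch).
final case class Adaptive(inner: Node) extends Node {
  val children: Seq[Node] = Nil
  val subqueries: Seq[Node] = Nil
}

object SubqueryCollector {
  // Returns (hosting operator name, subquery root) pairs in traversal order,
  // looking through adaptive wrappers instead of stopping at them.
  def collect(plan: Node): Seq[(String, Node)] = plan match {
    case Adaptive(inner) =>
      collect(inner)
    case op: Op =>
      // Subqueries hosted by this operator, plus any nested inside them.
      val own = op.subqueries.flatMap(sq => (op.name, sq) +: collect(sq))
      own ++ op.children.flatMap(collect)
  }
}
```

For a plan shaped like the example in the description, `SubqueryCollector.collect(Adaptive(Op("Project", Seq(Op("Scan")), Seq(sub))))` with `sub = Adaptive(Op("HashAggregate", Seq(Op("Range"))))` returns a single pair hosted by `"Project"`. Without the `Adaptive` case, a collector that only followed `children` would return nothing for the same input.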

SparkQA · 2020-12-19T18:15:19Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37679/

SparkQA · 2020-12-19T18:43:26Z

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37679/

SparkQA · 2020-12-19T18:52:06Z

Test build #133079 has finished for PR 30855 at commit 2ee8d19.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-12-19T19:00:23Z

Test build #133080 has finished for PR 30855 at commit 577dd8f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2020-12-19T19:20:20Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37680/

SparkQA · 2020-12-19T19:53:59Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37680/

SparkQA · 2020-12-19T21:20:58Z

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37682/

SparkQA · 2020-12-19T21:54:12Z

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37682/

dongjoon-hyun

+1, LGTM. Thank you, @sarutak .
Merged to master/3.1.

…ries if AQE is enabled

Closes #30855 from sarutak/fix-aqe-explain.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 70da86a)
Signed-off-by: Dongjoon Hyun <[email protected]>

dongjoon-hyun · 2020-12-19T22:11:59Z

cc @cloud-fan , @maropu , @HyukjinKwon

SparkQA · 2020-12-20T01:05:44Z

Test build #133082 has finished for PR 30855 at commit 13fb225.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-12-20T06:01:16Z

sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala

+            .write
+            .format("parquet")
+            .mode("overwrite")
+            .saveAsTable("df1")


nit: df1 -> df (this is a nit comment, so I think we don't need a follow-up pr to fix it)

maropu · 2020-12-20T06:01:39Z

sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala

+            .saveAsTable("df1")
+
+          val sqlText = "EXPLAIN FORMATTED SELECT (SELECT min(id) FROM df1) as v"
+          val expected_pattern1 =


nit: we don't need the number in the name, I think. expected_pattern1 -> expected_pattern

maropu · 2020-12-20T06:03:37Z

sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala

+    withTable("df1") {
+      withSQLConf(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
+        withTable("df1") {
+          spark.range(1, 100)


I think it's better to use temporary views in tests where possible.

### What changes were proposed in this pull request?

This PR mainly improves and cleans up the test code introduced in #30855 based on the comment. The test code is actually taken from another test `explain formatted - check presence of subquery in case of DPP` so this PR cleans the code too (removed unnecessary `withTable`).

### Why are the changes needed?

To keep the test code clean.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`ExplainSuite` passes.

Closes #30861 from sarutak/followup-SPARK-33850.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>

HyukjinKwon · 2020-12-21T11:40:24Z

Nice, @Ngone51 and @maryannxue who took a look at related issues FYI

Closes #30861 from sarutak/followup-SPARK-33850.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
(cherry picked from commit 3c8be39)
Signed-off-by: Dongjoon Hyun <[email protected]>

sarutak added 3 commits December 19, 2020 07:24

Fix an issue that EXPLAIN FORMATTED doesn't show subquery when APE en…

165cd91

…abled.

Add test.

35f127a

Modify test.

2ee8d19

github-actions bot added the SQL label Dec 19, 2020

dongjoon-hyun reviewed Dec 19, 2020

sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala

Add JIRA ID to the test.

6cbe6a6

dongjoon-hyun reviewed Dec 19, 2020

sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala

Change the variable name "adp" to "a"

577dd8f

Modify explain-aqe.sql.out to comply with the change.

13fb225

dongjoon-hyun approved these changes Dec 19, 2020

dongjoon-hyun closed this in 70da86a Dec 19, 2020

maropu reviewed Dec 20, 2020

sarutak mentioned this pull request Dec 20, 2020

[SPARK-33850][SQL][FOLLOWUP] Improve and cleanup the test code #30861

Closed
