[SPARK-30200][SQL] Add ExplainMode for Dataset.explain #26829

maropu · 2019-12-10T07:58:28Z

What changes were proposed in this pull request?

This PR adds ExplainMode so that a Dataset/DataFrame can be explained in a given format mode. ExplainMode has five types, matching the SQL EXPLAIN command: Simple, Extended, Codegen, Cost, and Formatted.

For example, this PR lets users explain a DataFrame/Dataset in the FORMATTED format implemented in #24759:

scala> spark.range(10).groupBy("id").count().explain(ExplainMode.Formatted)
== Physical Plan ==
* HashAggregate (3)
+- * HashAggregate (2)
   +- * Range (1)

(1) Range [codegen id : 1]
Output: [id#0L]
     
(2) HashAggregate [codegen id : 1]
Input: [id#0L]
     
(3) HashAggregate [codegen id : 1]
Input: [id#0L, count#8L]

This follows a suggestion from @cloud-fan.
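
To make the correspondence with the SQL EXPLAIN command concrete, here is a minimal standalone sketch that maps each mode to the EXPLAIN statement it mirrors. It does not depend on Spark: the nested `ExplainMode` enum is a stand-in with the same five constants, and `ExplainModeSql` and `toSql` are hypothetical names used only for illustration, not part of this patch.

```java
import java.util.Locale;

public class ExplainModeSql {
    // Stand-in enum with the five modes listed in the PR description
    // (an illustration, not the actual org.apache.spark.sql.ExplainMode source).
    enum ExplainMode { Simple, Extended, Codegen, Cost, Formatted }

    // Hypothetical helper: builds the SQL EXPLAIN statement that a mode mirrors.
    static String toSql(ExplainMode mode, String query) {
        // Simple corresponds to a bare EXPLAIN with no keyword.
        String keyword = mode == ExplainMode.Simple
            ? ""
            : mode.name().toUpperCase(Locale.ROOT) + " ";
        return "EXPLAIN " + keyword + query;
    }

    public static void main(String[] args) {
        // Print the statement for every mode.
        for (ExplainMode mode : ExplainMode.values()) {
            System.out.println(mode + " -> " + toSql(mode, "SELECT * FROM t"));
        }
    }
}
```

Deriving the keyword from the enum constant name keeps the mapping in one place, on the assumption that each non-Simple constant name matches its SQL keyword.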

Why are the changes needed?

To bring Dataset.explain in line with the SQL EXPLAIN command.

Does this PR introduce any user-facing change?

No, this is just for a new API in Dataset.

How was this patch tested?

Added tests in ExplainSuite.

ulysses-you · 2019-12-10T11:22:50Z

sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

   */
-  def explain(extended: Boolean): Unit = {
+  def explain(mode: ExplainMode): Unit = {


How about retaining the old API and adding a new one?

@ulysses-you, @maropu already did. Please see line 564.

Yea, we should keep this. Thanks for the comment, @dongjoon-hyun

Oh, I see it.
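
The outcome of this thread, keeping the Boolean overload next to the new mode-based one, can be sketched roughly as below. This is a standalone illustration under stated assumptions, not the code in Dataset.scala: `ExplainOverloads` is a hypothetical class, the `calls` list stands in for printing a plan, and the mapping of `true` to Extended and `false` to Simple is an assumption about how the old overload would delegate.

```java
import java.util.ArrayList;
import java.util.List;

public class ExplainOverloads {
    // Stand-in enum with the five modes named in the PR description.
    enum ExplainMode { Simple, Extended, Codegen, Cost, Formatted }

    // Records the mode each call resolved to, instead of printing a plan.
    final List<ExplainMode> calls = new ArrayList<>();

    // New overload: the caller chooses the mode explicitly.
    void explain(ExplainMode mode) {
        calls.add(mode);
    }

    // Old overload, retained for compatibility (delegation is an assumption).
    void explain(boolean extended) {
        explain(extended ? ExplainMode.Extended : ExplainMode.Simple);
    }

    // No-argument form, treated here as the non-extended case.
    void explain() {
        explain(false);
    }

    public static void main(String[] args) {
        ExplainOverloads ds = new ExplainOverloads();
        ds.explain(true);                  // old API
        ds.explain(ExplainMode.Formatted); // new API
        System.out.println(ds.calls);
    }
}
```

Delegating the old overloads to the new one keeps a single code path, so existing callers keep working while new callers gain access to the extra modes.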

SparkQA · 2019-12-10T11:56:49Z

Test build #115091 has finished for PR 26829 at commit e8c4af1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-12-10T17:48:41Z

sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala

@@ -132,13 +131,15 @@ case class DataWritingCommandExec(cmd: DataWritingCommand, child: SparkPlan)
 * (but do NOT actually execute it).
 *
 * {{{
- *   EXPLAIN (EXTENDED | CODEGEN) SELECT * FROM ...
+ *   EXPLAIN (EXTENDED | CODEGEN | COST | FORMATTED) SELECT * FROM ...


Thank you for fixing this together.

dongjoon-hyun

+1, LGTM. Merged to master.

maropu · 2019-12-11T05:34:22Z

Thanks for the check, @dongjoon-hyun!

ulysses-you · 2019-12-11T10:14:20Z

sql/core/src/main/java/org/apache/spark/sql/ExplainMode.java

+   */
+  Extended,
+  /**
+   * Extended mode means that when printing explain for a DataFrame, if generated codes are


Codegen mode?

Oh... I'll do a follow-up, thanks!

ulysses-you · 2019-12-11T10:14:42Z

sql/core/src/main/java/org/apache/spark/sql/ExplainMode.java

+   */
+  Codegen,
+  /**
+   * Extended mode means that when printing explain for a DataFrame, if plan node statistics are


### What changes were proposed in this pull request?
This pr is a follow-up of #26829 to fix typos in ExplainMode.

### Why are the changes needed?
For better docs.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
N/A

Closes #26851 from maropu/SPARK-30200-FOLLOWUP.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

### What changes were proposed in this pull request?
This pr intends to support explain modes implemented in #26829 for PySpark.

### Why are the changes needed?
For better debugging info. in PySpark dataframes.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Added UTs.

Closes #26861 from maropu/ExplainModeInPython.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>

Fix

e8c4af1

ulysses-you reviewed Dec 10, 2019

dongjoon-hyun added the SQL label Dec 10, 2019

dongjoon-hyun reviewed Dec 10, 2019

dongjoon-hyun approved these changes Dec 10, 2019

dongjoon-hyun closed this in 6103cf1 Dec 10, 2019

ulysses-you reviewed Dec 11, 2019

maropu mentioned this pull request Dec 11, 2019

[SPARK-30200][SQL][FOLLOWUP] Fix typo in ExplainMode #26851

Closed

maropu mentioned this pull request Dec 12, 2019

[SPARK-30231][SQL][PYTHON] Support explain mode in PySpark df.explain #26861

Closed
