
[SPARK-30200][SQL] Add ExplainMode for Dataset.explain #26829

Closed

maropu (Member) commented Dec 10, 2019

What changes were proposed in this pull request?

This PR adds ExplainMode so that a Dataset/DataFrame can be explained in a given format mode. ExplainMode has five values, matching the SQL EXPLAIN command: Simple, Extended, Codegen, Cost, and Formatted.

For example, this PR lets users explain a DataFrame/Dataset with the FORMATTED format implemented in #24759:

scala> spark.range(10).groupBy("id").count().explain(ExplainMode.Formatted)
== Physical Plan ==
* HashAggregate (3)
+- * HashAggregate (2)
   +- * Range (1)

(1) Range [codegen id : 1]
Output: [id#0L]
     
(2) HashAggregate [codegen id : 1]
Input: [id#0L]
     
(3) HashAggregate [codegen id : 1]
Input: [id#0L, count#8L]

This follows a suggestion from @cloud-fan.
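To make the mapping between the five modes and the SQL EXPLAIN keywords concrete, here is a minimal, Spark-free sketch. Every name in it (`ExplainModeSketch`, `sqlKeyword`, `fromString`, `toSql`) is a hypothetical stand-in chosen for illustration; it is not the `ExplainMode` type added by this PR, which may be defined quite differently.

```scala
// Hypothetical sketch: models the five modes listed in the PR description.
// None of these names come from Spark; the real ExplainMode may differ.
sealed trait ExplainModeSketch {
  def name: String
  // Keyword placed after EXPLAIN in SQL; empty for the plain (simple) form.
  def sqlKeyword: String
}

object ExplainModeSketch {
  case object Simple extends ExplainModeSketch {
    val name = "simple"; val sqlKeyword = ""
  }
  case object Extended extends ExplainModeSketch {
    val name = "extended"; val sqlKeyword = "EXTENDED"
  }
  case object Codegen extends ExplainModeSketch {
    val name = "codegen"; val sqlKeyword = "CODEGEN"
  }
  case object Cost extends ExplainModeSketch {
    val name = "cost"; val sqlKeyword = "COST"
  }
  case object Formatted extends ExplainModeSketch {
    val name = "formatted"; val sqlKeyword = "FORMATTED"
  }

  val all: Seq[ExplainModeSketch] = Seq(Simple, Extended, Codegen, Cost, Formatted)

  // Case-insensitive lookup by name; returns Left with a message for unknown input.
  def fromString(s: String): Either[String, ExplainModeSketch] =
    all.find(_.name.equalsIgnoreCase(s.trim)).toRight(s"Unknown explain mode: $s")

  // Builds the SQL statement that corresponds to the given mode.
  def toSql(mode: ExplainModeSketch, query: String): String =
    Seq("EXPLAIN", mode.sqlKeyword, query).filter(_.nonEmpty).mkString(" ")
}
```

Under these assumptions, `ExplainModeSketch.toSql(ExplainModeSketch.Formatted, "SELECT 1")` builds the statement `EXPLAIN FORMATTED SELECT 1`, while the `Simple` mode drops the keyword and yields a plain `EXPLAIN SELECT 1`.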

Why are the changes needed?

To keep Dataset.explain consistent with the SQL EXPLAIN command.

Does this PR introduce any user-facing change?

No, this is just for a new API in Dataset.

How was this patch tested?

Add tests in ExplainSuite.

*/
def explain(extended: Boolean): Unit = {
def explain(mode: ExplainMode): Unit = {
Contributor

How about retaining the old API and adding a new one?

Member

@ulysses-you . @maropu already did. Please see line 564.

Member Author

Yea, we should keep this. Thanks for the comment, @dongjoon-hyun

Contributor

Oh, I see it.
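The compatibility point in this thread, keeping the Boolean overload while adding a mode-based one, can be sketched without Spark as below. `ExplainableSketch`, `LegacyAwareMode`, `render`, and `modeFor` are hypothetical names for illustration only, and the delegation shown is an assumption about how such overloads could coexist, not a copy of the code in this PR.

```scala
// Hypothetical modes; a reduced stand-in for the ExplainMode discussed in this PR.
sealed trait LegacyAwareMode
object LegacyAwareMode {
  case object Simple extends LegacyAwareMode
  case object Extended extends LegacyAwareMode
  case object Formatted extends LegacyAwareMode
}

// Hypothetical stand-in for Dataset; it only holds a pre-rendered plan string.
final class ExplainableSketch(planText: String) {
  // Returns the text that explain would print, which keeps the logic testable.
  def render(mode: LegacyAwareMode): String = s"== Plan ($mode) ==\n$planText"

  // New overload: the caller picks the mode explicitly.
  def explain(mode: LegacyAwareMode): Unit = println(render(mode))

  // Old overload kept for source compatibility; it delegates to the new one.
  def explain(extended: Boolean): Unit = explain(ExplainableSketch.modeFor(extended))

  // No-argument form defaults to the simple mode.
  def explain(): Unit = explain(LegacyAwareMode.Simple)
}

object ExplainableSketch {
  // Maps the legacy Boolean flag onto a mode.
  def modeFor(extended: Boolean): LegacyAwareMode =
    if (extended) LegacyAwareMode.Extended else LegacyAwareMode.Simple
}
```

With this shape, existing callers of `explain(true)` keep compiling and get the extended output, while new callers can pass a mode directly.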

SparkQA commented Dec 10, 2019

Test build #115091 has finished for PR 26829 at commit e8c4af1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -132,13 +131,15 @@ case class DataWritingCommandExec(cmd: DataWritingCommand, child: SparkPlan)
* (but do NOT actually execute it).
*
* {{{
* EXPLAIN (EXTENDED | CODEGEN) SELECT * FROM ...
* EXPLAIN (EXTENDED | CODEGEN | COST | FORMATTED) SELECT * FROM ...
Member

Thank you for fixing this together.

dongjoon-hyun (Member) left a comment

+1, LGTM. Merged to master.

maropu (Member Author) commented Dec 11, 2019

Thanks for the check, @dongjoon-hyun !

*/
Extended,
/**
* Extended mode means that when printing explain for a DataFrame, if generated codes are
Contributor

Codegen mode ?

Member Author

oh... I'll do follow-up, thanks!

*/
Codegen,
/**
* Extended mode means that when printing explain for a DataFrame, if plan node statistics are
Contributor

The same.

dongjoon-hyun pushed a commit that referenced this pull request Dec 11, 2019
### What changes were proposed in this pull request?

This pr is a follow-up of #26829 to fix typos in ExplainMode.

### Why are the changes needed?

For better docs.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

N/A

Closes #26851 from maropu/SPARK-30200-FOLLOWUP.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Dec 13, 2019
### What changes were proposed in this pull request?

This pr intends to support explain modes implemented in #26829 for PySpark.

### Why are the changes needed?

To provide better debugging information for PySpark DataFrames.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UTs.

Closes #26861 from maropu/ExplainModeInPython.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>