[SPARK-30200][SQL] Add ExplainMode for Dataset.explain #26829
*/
def explain(extended: Boolean): Unit = {
def explain(mode: ExplainMode): Unit = {
How about retaining the old API and adding a new API?
@ulysses-you, @maropu already did. Please see line 564.
Yea, we should keep this. Thanks for the comment, @dongjoon-hyun
Oh, I see it.
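The outcome of this thread, keeping `explain(extended: Boolean)` alongside the new mode-based overload, can be sketched without Spark roughly as follows. This is a minimal illustration and not the code from the PR: `ExplainMode` here is a simplified stand-in for the type the PR adds, and `PlanHolder` and `render` are hypothetical names introduced only for the example.

```scala
// Simplified stand-in for the ExplainMode type added by the PR (assumption:
// modeled as a sealed trait so the sketch stays self-contained).
sealed trait ExplainMode
object ExplainMode {
  case object Simple extends ExplainMode
  case object Extended extends ExplainMode
  case object Codegen extends ExplainMode
  case object Cost extends ExplainMode
  case object Formatted extends ExplainMode
}

// Hypothetical class standing in for Dataset; it only labels the requested mode.
final class PlanHolder {
  // Hypothetical helper: returns a label instead of rendering a real plan.
  def render(mode: ExplainMode): String = s"== Plan ($mode) =="

  // New API: the caller picks the mode explicitly.
  def explain(mode: ExplainMode): Unit = println(render(mode))

  // Old API kept for compatibility: the Boolean maps onto two of the modes.
  def explain(extended: Boolean): Unit =
    explain(if (extended) ExplainMode.Extended else ExplainMode.Simple)

  // The no-argument form delegates to the Boolean overload.
  def explain(): Unit = explain(extended = false)
}
```

With this shape, existing callers of the Boolean overload keep compiling, and the new overload is purely additive.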
Test build #115091 has finished for PR 26829 at commit
@@ -132,13 +131,15 @@ case class DataWritingCommandExec(cmd: DataWritingCommand, child: SparkPlan)
* (but do NOT actually execute it).
*
* {{{
* EXPLAIN (EXTENDED | CODEGEN) SELECT * FROM ...
* EXPLAIN (EXTENDED | CODEGEN | COST | FORMATTED) SELECT * FROM ...
Thank you for fixing this together.
+1, LGTM. Merged to master.
Thanks for the check, @dongjoon-hyun !
*/
Extended,
/**
* Extended mode means that when printing explain for a DataFrame, if generated codes are
Codegen mode?
Oh... I'll do a follow-up, thanks!
*/
Codegen,
/**
* Extended mode means that when printing explain for a DataFrame, if plan node statistics are
The same.
### What changes were proposed in this pull request?
This PR is a follow-up of #26829 to fix typos in ExplainMode.
### Why are the changes needed?
For better docs.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A

Closes #26851 from maropu/SPARK-30200-FOLLOWUP.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

### What changes were proposed in this pull request?
This PR intends to support the explain modes implemented in #26829 for PySpark.
### Why are the changes needed?
For better debugging information in PySpark DataFrames.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UTs.

Closes #26861 from maropu/ExplainModeInPython.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
What changes were proposed in this pull request?
This PR intends to add `ExplainMode` for explaining `Dataset/DataFrame` with a given format mode (`ExplainMode`). `ExplainMode` has five types, matching the SQL EXPLAIN command: `Simple`, `Extended`, `Codegen`, `Cost`, and `Formatted`.
For example, this PR enables users to explain a DataFrame/Dataset with the `FORMATTED` format implemented in #24759.
This comes from the @cloud-fan suggestion.
Why are the changes needed?
To follow the SQL EXPLAIN command.
Does this PR introduce any user-facing change?
No, this is just for a new API in Dataset.
How was this patch tested?
Add tests in `ExplainSuite`.