[SPARK-6550][SQL] Use analyzed plan in DataFrame #5217

marmbrus · 2015-03-26T21:44:38Z

This is based on the bug and test case proposed by @viirya. See #5203 for an excellent description of the problem.

TL;DR: The problem occurs because `groupBy(String)` calls `resolve`, which returns an `AttributeReference`. That `AttributeReference` is based on an analyzed plan which is then thrown away. At execution time, we analyze the plan again. In the case of self-joins, however, each call to analyze produces a new tree for the left side of the join, which invalidates the previously returned `AttributeReference`.

As a fix, I propose we keep the analyzed plan instead of the unresolved plan inside of a DataFrame.
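To make the failure mode concrete, here is a minimal toy model in plain Python. It is not Spark code: `LazyFrame`, `AnalyzedFrame`, `analyze`, and `execute_group_by` are hypothetical names invented for illustration, and the toy analyzer mints fresh IDs on every call, whereas the description above says this only happens for the left side of a self-join. It is a sketch of the idea under those assumptions, not the actual patch.

```python
import itertools
from dataclasses import dataclass

# Global counter standing in for the analyzer's supply of fresh expression IDs.
_next_id = itertools.count(1)


@dataclass(frozen=True)
class AttributeReference:
    """Toy stand-in for a resolved column: a name plus a unique ID."""
    name: str
    expr_id: int


def analyze(column_names):
    """Toy analyzer (assumption): every call produces a new tree with new IDs."""
    return [AttributeReference(n, next(_next_id)) for n in column_names]


def resolve(analyzed, name):
    """Look up a column by name in an already analyzed plan."""
    for attr in analyzed:
        if attr.name == name:
            return attr
    raise KeyError(name)


class LazyFrame:
    """Keeps only the unresolved plan, so every use re-analyzes it."""

    def __init__(self, column_names):
        self.unresolved = list(column_names)

    def resolve(self, name):
        # The analyzed plan used here is thrown away after the lookup.
        return resolve(analyze(self.unresolved), name)

    def execute_group_by(self, attr):
        # Execution analyzes again; the old reference may no longer match.
        return attr in analyze(self.unresolved)


class AnalyzedFrame:
    """Analyzes once and keeps the analyzed plan, as the fix proposes."""

    def __init__(self, column_names):
        self.analyzed = analyze(column_names)

    def resolve(self, name):
        return resolve(self.analyzed, name)

    def execute_group_by(self, attr):
        # Same plan as the one the reference came from, so it still matches.
        return attr in self.analyzed
```

In this model, a reference obtained from `LazyFrame.resolve` never matches the plan seen at execution time, while one obtained from `AnalyzedFrame.resolve` always does, because both the lookup and the execution share a single analyzed plan.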

SparkQA · 2015-03-26T22:35:42Z

Test build #29258 has finished for PR 5217 at commit dd4dec1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-03-27T00:47:48Z

Test build #29261 has finished for PR 5217 at commit 1f98e2d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

This is based on the bug and test case proposed by viirya. See #5203 for an excellent description of the problem.

TL;DR: The problem occurs because `groupBy(String)` calls `resolve`, which returns an `AttributeReference`. That `AttributeReference` is based on an analyzed plan which is then thrown away. At execution time, we analyze the plan again. In the case of self-joins, however, each call to analyze produces a new tree for the left side of the join, which invalidates the previously returned `AttributeReference`.

As a fix, I propose we keep the analyzed plan instead of the unresolved plan inside of a `DataFrame`.

Author: Michael Armbrust <[email protected]>

Closes #5217 from marmbrus/preanalyzer and squashes the following commits:

1f98e2d [Michael Armbrust] revert change
dd4dec1 [Michael Armbrust] Use the analyzed plan in DataFrame
089c52e [Michael Armbrust] WIP

(cherry picked from commit 5d9c37c)
Signed-off-by: Michael Armbrust <[email protected]>

marmbrus added 2 commits March 26, 2015 12:13

WIP

089c52e

Use the analyzed plan in DataFrame

dd4dec1

marmbrus mentioned this pull request Mar 26, 2015

[SPARK-6550][SQL] Add PreAnalyzer to keep logical plan consistent across DataFrame #5203

Closed

revert change

1f98e2d

asfgit closed this in 5d9c37c Mar 27, 2015

viirya mentioned this pull request Mar 28, 2015

[SPARK-6586][SQL] Add the capability of retrieving original logical plan of DataFrame #5241

Closed

marmbrus deleted the preanalyzer branch August 3, 2015 22:55
