
Yinxusen spark 11893 cleanups #5


Merged: 3 commits merged into yinxusen:SPARK-11893 on Mar 28, 2016

Conversation

jkbradley

No description provided.

@jkbradley (Author)

@yinxusen Please review and merge if it looks ok to you. Thanks!

@yinxusen (Owner)

Thanks, check it now


yinxusen merged commit f9172d1 into yinxusen:SPARK-11893 on Mar 28, 2016
@yinxusen (Owner)

Merged, thanks so much!

jkbradley deleted the yinxusen-SPARK-11893 branch on March 28, 2016 at 20:20
yinxusen pushed a commit that referenced this pull request May 6, 2016
## What changes were proposed in this pull request?

This PR optimizes GroupExpressions by removing repeated (semantically equal) grouping expressions. A new rule, `RemoveRepetitionFromGroupExpressions`, is added; a sketch of the idea follows the before/after plans below.

**Before**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

**After**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```
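
For illustration, the deduplication can be expressed as a small Catalyst optimizer rule over `Aggregate` nodes built on `ExpressionSet`, which compares canonicalized expressions, so `a + 1`, `1 + a`, `A + 1`, and `1 + A` collapse into a single grouping key. The object name `RemoveRepeatedGroupingSketch` is hypothetical and this is only a sketch of the idea, not necessarily the exact code added by this commit:

```scala
import org.apache.spark.sql.catalyst.expressions.ExpressionSet
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical sketch: collapse grouping expressions that are semantically
// equal after canonicalization (a+1, 1+a, A+1, 1+A) into a single key.
object RemoveRepeatedGroupingSketch extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case agg @ Aggregate(grouping, _, _) if grouping.size > 1 =>
      // ExpressionSet keeps the first occurrence of each canonicalized
      // expression, so the remaining keys stay resolvable in the output.
      agg.copy(groupingExpressions = ExpressionSet(grouping).toSeq)
  }
}
```

Fewer grouping keys also mean a narrower `Exchange hashpartitioning` key set, which is exactly the difference visible between the two plans above.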

## How was this patch tested?

Pass the Jenkins tests (with a new test case).

Author: Dongjoon Hyun <[email protected]>

Closes apache#12590 from dongjoon-hyun/SPARK-14830.
yinxusen pushed a commit that referenced this pull request Aug 12, 2016
## What changes were proposed in this pull request?

Implements the `eval()` method for the expression `AssertNotNull` so that a local projection over a `LocalRelation` can be converted into another `LocalRelation`.

### Before change:
```
scala> import org.apache.spark.sql.catalyst.dsl.expressions._
scala> import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
scala> import org.apache.spark.sql.Column
scala> case class A(a: Int)
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain

java.lang.UnsupportedOperationException: Only code-generated evaluation is supported.
  at org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull.eval(objects.scala:850)
  ...
```

### After the change:
```
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain(true)

== Parsed Logical Plan ==
'Project [assertnotnull('_1) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]

== Analyzed Logical Plan ==
assertnotnull(_1): struct<a:int>
Project [assertnotnull(_1#2) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]

== Optimized Logical Plan ==
LocalRelation [assertnotnull(_1)#5]

== Physical Plan ==
LocalTableScan [assertnotnull(_1)#5]
```
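
The interpreted path amounts to evaluating the child expression and failing fast on null. Below is a minimal, hypothetical helper (`AssertNotNullEvalSketch`, not the PR's actual code) illustrating that logic against Catalyst's `Expression.eval(InternalRow)` contract:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression

object AssertNotNullEvalSketch {
  // Evaluate the child expression against a row; throw if it produced null,
  // otherwise pass the value through unchanged. AssertNotNull's eval() is
  // expected to behave like this on the interpreted (non-codegen) path.
  def eval(child: Expression, input: InternalRow, errMsg: String): Any = {
    val result = child.eval(input)
    if (result == null) {
      throw new NullPointerException(errMsg)
    }
    result
  }
}
```

Once `eval()` works without code generation, the optimizer's local-relation folding (for example, Spark's `ConvertToLocalRelation` rule) can execute the projection at planning time, which is why the optimized plan above collapses to a bare `LocalRelation`.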

## How was this patch tested?

Unit test.

Author: Sean Zhong <[email protected]>

Closes apache#14486 from clockfly/assertnotnull_eval.
yinxusen pushed a commit that referenced this pull request Nov 22, 2016
Refine transform from Spark Dataset to ArrowRecordBatch