
Yinxusen spark 11893 cleanups #5


Merged: 3 commits merged into yinxusen:SPARK-11893 on Mar 28, 2016

Conversation

jkbradley

No description provided.

@jkbradley (Author)

@yinxusen Please review and merge if it looks ok to you. Thanks!

@yinxusen (Owner)

Thanks, check it now


yinxusen merged commit f9172d1 into yinxusen:SPARK-11893 on Mar 28, 2016
@yinxusen (Owner)

Merged, thanks so much!

jkbradley deleted the yinxusen-SPARK-11893 branch on March 28, 2016 at 20:20
yinxusen pushed a commit that referenced this pull request May 6, 2016
## What changes were proposed in this pull request?

This PR optimizes GroupExpressions by removing repeated (semantically equal) grouping expressions. A new rule, `RemoveRepetitionFromGroupExpressions`, is added; a sketch of the idea follows the before/after plans below.

**Before**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

**After**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```
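
For illustration, the deduplication can be expressed as a small Catalyst optimizer rule over `Aggregate` nodes built on `ExpressionSet`, which compares canonicalized expressions, so `a + 1`, `1 + a`, `A + 1`, and `1 + A` collapse into a single grouping key. The object name `RemoveRepeatedGroupingSketch` is hypothetical and this is only a sketch of the idea, not necessarily the exact code added by this commit:

```scala
import org.apache.spark.sql.catalyst.expressions.ExpressionSet
import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical sketch: collapse grouping expressions that are semantically
// equal after canonicalization (a+1, 1+a, A+1, 1+A) into a single key.
object RemoveRepeatedGroupingSketch extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case agg @ Aggregate(grouping, _, _) if grouping.size > 1 =>
      // ExpressionSet keeps the first occurrence of each canonicalized
      // expression, so the remaining keys stay resolvable in the output.
      agg.copy(groupingExpressions = ExpressionSet(grouping).toSeq)
  }
}
```

Fewer grouping keys also mean a narrower `Exchange hashpartitioning` key set, which is exactly the difference visible between the two plans above.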

## How was this patch tested?

Pass the Jenkins tests (with a new test case).

Author: Dongjoon Hyun <[email protected]>

Closes apache#12590 from dongjoon-hyun/SPARK-14830.
yinxusen pushed a commit that referenced this pull request Aug 12, 2016
## What changes were proposed in this pull request?

Implements the `eval()` method for the expression `AssertNotNull` so that a local projection over a `LocalRelation` can be converted into another `LocalRelation`.

### Before change:
```
scala> import org.apache.spark.sql.catalyst.dsl.expressions._
scala> import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
scala> import org.apache.spark.sql.Column
scala> case class A(a: Int)
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain

java.lang.UnsupportedOperationException: Only code-generated evaluation is supported.
  at org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull.eval(objects.scala:850)
  ...
```

### After the change:
```
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain(true)

== Parsed Logical Plan ==
'Project [assertnotnull('_1) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]

== Analyzed Logical Plan ==
assertnotnull(_1): struct<a:int>
Project [assertnotnull(_1#2) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]

== Optimized Logical Plan ==
LocalRelation [assertnotnull(_1)#5]

== Physical Plan ==
LocalTableScan [assertnotnull(_1)#5]
```
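
The interpreted path amounts to evaluating the child expression and failing fast on null. Below is a minimal, hypothetical helper (`AssertNotNullEvalSketch`, not the PR's actual code) illustrating that logic against Catalyst's `Expression.eval(InternalRow)` contract:

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression

object AssertNotNullEvalSketch {
  // Evaluate the child expression against a row; throw if it produced null,
  // otherwise pass the value through unchanged. AssertNotNull's eval() is
  // expected to behave like this on the interpreted (non-codegen) path.
  def eval(child: Expression, input: InternalRow, errMsg: String): Any = {
    val result = child.eval(input)
    if (result == null) {
      throw new NullPointerException(errMsg)
    }
    result
  }
}
```

Once `eval()` works without code generation, the optimizer's local-relation folding (for example, Spark's `ConvertToLocalRelation` rule) can execute the projection at planning time, which is why the optimized plan above collapses to a bare `LocalRelation`.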

## How was this patch tested?

Unit test.

Author: Sean Zhong <[email protected]>

Closes apache#14486 from clockfly/assertnotnull_eval.
yinxusen pushed a commit that referenced this pull request Nov 22, 2016
Refine transform from Spark Dataset to ArrowRecordBatch