
[SPARK-9014][SQL] Allow Python spark API to use built-in exponential operator #8658


Closed
wants to merge 2 commits

Conversation

@0x0FFF 0x0FFF commented Sep 8, 2015

This PR addresses [SPARK-9014](https://issues.apache.org/jira/browse/SPARK-9014)
Added functionality: the Column object in Python now supports the exponentiation operator **
Example:

from pyspark.sql import *
df = sqlContext.createDataFrame([Row(a=2)])
df.select(3 ** df.a, df.a ** 3, df.a ** df.a).collect()

Outputs:

[Row(POWER(3.0, a)=9.0, POWER(a, 3.0)=8.0, POWER(a, a)=4.0)]

@@ -151,6 +162,8 @@ def __init__(self, jc):
__rdiv__ = _reverse_op("divide")
__rtruediv__ = _reverse_op("divide")
__rmod__ = _reverse_op("mod")
__pow__ = _bin_func_op("pow")
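
For context, the helper referenced in this hunk could look roughly like the sketch below at this stage of the review (an illustration pieced together from the lines quoted later in the thread, not the verbatim patch; the reverse flag is assumed to handle the reflected operator):

def _bin_func_op(name, reverse=False, doc="binary function"):
    def _(self, other):
        sc = SparkContext._active_spark_context
        # look up the named function on org.apache.spark.sql.functions via Py4J
        fn = getattr(sc._jvm.functions, name)
        # at this stage of the review, non-Column operands are coerced to float
        jc = other._jc if isinstance(other, Column) else float(other)
        # reverse=True swaps the operands, covering e.g. 3 ** df.a
        njc = fn(self._jc, jc) if not reverse else fn(jc, self._jc)
        return Column(njc)
    _.__doc__ = doc
    return _

__pow__ = _bin_func_op("pow")
__rpow__ = _bin_func_op("pow", reverse=True)
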
Contributor
So this doesn't quite match the Scala API (which I suppose isn't the end of the world), but would it possibly make sense to have a similar functions.py file to match the Scala API?

Contributor
We already have pow in pyspark.sql.functions.

Here, it would be easy to do it like this:

from pyspark.sql.functions import pow
__pow__ = pow
__rpow__ = lambda c, other: pow(other, c)

Contributor Author
@davies, the pow function in pyspark.sql.functions is created with the helper _create_binary_mathfunction, which uses Column internally, so it cannot simply be imported from pyspark.sql.functions.
So, as I outlined below, there are two options: do it like I did, or add pow and ** implementations to Column in Scala. In Scala you can reuse the same Pow class, as it does not depend on Column.
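
To illustrate the point, here is a rough sketch of the shape of that helper (not the verbatim Spark source): it unwraps and wraps Column objects on both sides, and pyspark.sql.functions itself imports Column from pyspark.sql.column, so importing pow back into column.py would be awkward.

def _create_binary_mathfunction(name, doc=""):
    def _(col1, col2):
        sc = SparkContext._active_spark_context
        # both operands are unwrapped from / wrapped into Column objects here
        jcol1 = col1._jc if isinstance(col1, Column) else float(col1)
        jcol2 = col2._jc if isinstance(col2, Column) else float(col2)
        return Column(getattr(sc._jvm.functions, name)(jcol1, jcol2))
    _.__doc__ = doc
    return _

pow = _create_binary_mathfunction("pow", "Returns the value of the first argument raised to the power of the second.")
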

Contributor
I see, I'd like to go with the current approach.

We could rename _bin_func_op to _pow and use it for both __pow__ and __rpow__; that would be clearer.

Contributor Author
I can rename it, but what if we later implement something like __lshift__ or __rshift__ to allow the syntax df.a << 2? Then we would have to either rename _pow back to _bin_func_op and use it, or add one more function. As it is, it's clear that _bin_func_op lets you use a binary function. What do you think?

Contributor
That makes sense, but _bin_func_op expects other to be a Column or a float; it's not in a shape that we could easily reuse for others.

Contributor Author
Agreed. What if I replace

jc = other._jc if isinstance(other, Column) else float(other)

with

jc = other._jc if isinstance(other, Column) else _create_column_from_literal(other)

Would it still be worth renaming it to _pow?

Contributor
This sounds better, go for it, thanks!

0x0FFF commented Sep 9, 2015

@holdenk, there are two ways of implementing this:

  1. The one I've done in this PR: add functionality to column.py to call a binary function from org.apache.spark.sql.functions, and use the pow() function from there for __pow__ and __rpow__. Pros: with this approach you can expose other functions from org.apache.spark.sql.functions in column.py as well. Cons: the Column APIs are not consistent between Python and Scala (to be fair, Scala does not have ** out of the box, so it is a question whether this is really an inconsistency).
  2. Add implementations of ** and pow to org.apache.spark.sql.Column and use them in column.py with the existing _bin_op and _reverse_op. Pros: the ** operation is added to the Scala org.apache.spark.sql.Column as well, and it results in slightly less new code. Cons: this covers only the pow() implementation; if in the future you need to expose another function from org.apache.spark.sql.functions in Python, you would have to change org.apache.spark.sql.Column first once again. Also, in Scala the case 3 ** col("a") will not work, as it would look for a ** implementation on Int.

In my opinion both options are viable, and the second one might even be a bit better. I can easily switch to the second option; I've already implemented it locally. What is your view?
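
For comparison, the Python side of option 2 would reduce to a couple of lines reusing the existing helpers (a hypothetical sketch; it assumes a pow method were first added to the Scala org.apache.spark.sql.Column):

# hypothetical: works only if org.apache.spark.sql.Column gains a pow method
__pow__ = _bin_op("pow")
__rpow__ = _reverse_op("pow")
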

0x0FFF commented Sep 10, 2015

@davies, could you take a look, please?

davies commented Sep 10, 2015

Jenkins, OK to test.

0x0FFF commented Sep 10, 2015

Please also check out the implementation from the last commit. In my opinion it is much more consistent. I just can't bring myself to implement _pow in column.py; it looks too much like duct tape. A general _bin_func_op function that converts the second parameter to a Column via a lit call is better, but I still think the implementation from commit 40cbcb4 is much better: the code is easier to read, more consistent, and it adds less code (only 4 lines, excluding comments and tests).

davies commented Sep 10, 2015

@0x0FFF I think ** only makes sense in Python, so we should not introduce ** into Scala (nor pow on Column).

davies commented Sep 10, 2015

cc @rxin

rxin commented Sep 10, 2015

+1 on not having this for Scala. There is already a pow function that does pow(x, y).

We should just do this for Python.

0x0FFF commented Sep 11, 2015

Agreed; with commit aecc0c2 I reverted to the first option and replaced the float conversion with _create_column_from_literal, as proposed.
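
Concretely (a sketch, not the verbatim diff), the net effect is that inside _bin_func_op the float coercion becomes a literal-Column wrap, and Column exposes both operator directions:

# inside _bin_func_op: wrap non-Column operands as literal Columns instead of coercing to float
jc = other._jc if isinstance(other, Column) else _create_column_from_literal(other)

# on Column:
__pow__ = _bin_func_op("pow")
__rpow__ = _bin_func_op("pow", reverse=True)
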

davies commented Sep 11, 2015

LGTM, waiting for tests.

SparkQA commented Sep 11, 2015

Test build #1745 has finished for PR 8658 at commit aecc0c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit closed this in c34fc19 on Sep 11, 2015