[SPARK-9014][SQL] Allow Python spark API to use built-in exponential operator #8658
@@ -151,6 +162,8 @@ def __init__(self, jc):
    __rdiv__ = _reverse_op("divide")
    __rtruediv__ = _reverse_op("divide")
    __rmod__ = _reverse_op("mod")
    __pow__ = _bin_func_op("pow")
So this doesn't quite match the Scala API (which I suppose isn't the end of the world), but would it possibly make sense to have a similar functions.py file to match the Scala API?
We already have pow in pyspark.sql.functions.
Here, it would be easy to do it like this:
    from pyspark.sql.functions import pow
    __pow__ = pow
    __rpow__ = lambda c, other: pow(other, c)
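For readers following the thread, here is a minimal sketch of how Python dispatches `**` to `__pow__` and `__rpow__`. It uses a toy `Expr` class and a hypothetical `_text` helper rather than PySpark's `Column`, so it only illustrates the operator protocol being discussed, not the code in this PR.

```python
class Expr:
    """Toy stand-in for a column expression (hypothetical, not PySpark's Column)."""

    def __init__(self, text):
        self.text = text

    def __pow__(self, other):
        # Called for `expr ** other`.
        return Expr("POWER(%s, %s)" % (self.text, _text(other)))

    def __rpow__(self, other):
        # Called for `other ** expr` when `other` cannot handle an Expr itself.
        return Expr("POWER(%s, %s)" % (_text(other), self.text))


def _text(value):
    # Render either another Expr or a plain Python literal.
    return value.text if isinstance(value, Expr) else repr(value)


a = Expr("a")
print((a ** 2).text)  # POWER(a, 2)
print((2 ** a).text)  # POWER(2, a)
```

The reflected method matters because `2 ** a` first tries `int.__pow__`, which returns `NotImplemented` for an unknown operand type; only then does Python fall back to `a.__rpow__(2)`.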
@davies, the pow function in pyspark.sql.functions is created with the function _create_binary_mathfunction, which uses Column internally, so it cannot simply be imported from pyspark.sql.functions.
So, as I outlined below, there are two options: do it like I did, or add pow and ** implementations to Column in Scala. In Scala you can reuse the same Pow class, as it does not depend on Column.
I see, I'd like to go with the current approach.
We could rename _bin_func_op to _pow and use it for __pow__ and __rpow__; that would be clearer.
I can rename it, but what if we later implement something like __lshift__ or __rshift__ to allow the syntax df.a << 2? Then we would have to either rename _pow back to _bin_func_op and use it, or add one more function. Right now it's clear that _bin_func_op lets you use any binary function. What do you think?
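The reuse argument can be sketched with a generic factory. Everything below is illustrative: the toy `Expr` class, the `reverse` parameter, and the `shiftLeft` function name are assumptions made for this example, not necessarily what the PR defines.

```python
def _bin_func_op(name, reverse=False):
    """Build an operator method that renders a call to the named binary function."""
    def _(self, other):
        rhs = other.text if isinstance(other, Expr) else repr(other)
        # Swap the operands for reflected operators such as __rpow__.
        left, right = (rhs, self.text) if reverse else (self.text, rhs)
        return Expr("%s(%s, %s)" % (name, left, right))
    return _


class Expr:
    """Toy stand-in for a column expression (hypothetical)."""

    def __init__(self, text):
        self.text = text

    __pow__ = _bin_func_op("pow")
    __rpow__ = _bin_func_op("pow", reverse=True)
    # A later operator reuses the same factory without any renaming.
    __lshift__ = _bin_func_op("shiftLeft")


a = Expr("a")
print((a ** 2).text)  # pow(a, 2)
print((2 ** a).text)  # pow(2, a)
print((a << 2).text)  # shiftLeft(a, 2)
```

A helper named `_pow` would cover only the first two lines of the class body, which is the trade-off raised in the comment above.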
That makes sense, but _bin_func_op expects other to be a Column or a float, so it's not in a shape that we could easily reuse for other operators.
Agree. What if I replace
    jc = other._jc if isinstance(other, Column) else float(other)
with
    jc = other._jc if isinstance(other, Column) else _create_column_from_literal(other)
Would it still be worth renaming to _pow?
This sounds better, go for it, thanks!
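To make the difference between the two branches concrete, here is a small sketch in plain Python. `coerce_as_literal` is a hypothetical stand-in for `_create_column_from_literal`, which is assumed here to wrap the value as a literal without changing its type; the real helper goes through the JVM and is not reproduced.

```python
def coerce_with_float(value):
    # Mirrors the original `float(other)` branch: every operand becomes a float.
    return float(value)


def coerce_as_literal(value):
    # Hypothetical stand-in: keep the value together with its original type name.
    return ("lit", type(value).__name__, value)


print(coerce_with_float(2))   # 2.0
print(coerce_as_literal(2))   # ('lit', 'int', 2)

try:
    coerce_with_float("two")
except ValueError as exc:
    # Non-numeric operands cannot pass through float() at all.
    print("float() rejected the operand:", exc)

print(coerce_as_literal("two"))  # ('lit', 'str', 'two')
```

Forcing `float` is harmless for `pow`, but it would turn the `2` in `df.a << 2` into `2.0` and reject non-numeric operands outright, which is why the literal-based branch generalizes better.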
@holdenk, there are two ways of implementing this:
In my opinion both options are possible, and the second one might even be a bit better. I can easily switch to the second option; I've already implemented it locally. What is your view?
@davies, could you take a look, please?
Jenkins, OK to test.
Please also check out the implementation from the last commit. In my opinion it is much more consistent. I just cannot implement
@0x0FFF I think
cc @rxin
+1 on not having this for Scala. There is already a pow function that does pow(x, y). We should just do this for Python.
Agree, with commit aecc0c2 I reverted to the first option and replaced
LGTM, waiting for tests.
Test build #1745 has finished for PR 8658 at commit
This PR addresses [SPARK-9014](https://issues.apache.org/jira/browse/SPARK-9014)
Added functionality:
The Column object in Python now supports the exponentiation operator **
Example:
Outputs: