[SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth. #18058
Thanks, @yanboliang.
Jenkins, test this please.
There seems to be something wrong with CI. I have seen the same unresponsiveness/delay from CI again since last month. Thanks, @yanboliang.
Jenkins, ok to test
Test build #77228 has finished for PR 18058 at commit
One minor comment, otherwise, LGTM. Thanks.
python/pyspark/ml/fpm.py
@@ -49,6 +49,32 @@ def getMinSupport(self):
        return self.getOrDefault(self.minSupport)


class HasNumPartitions(Params):
    """
    Mixin for param support.
Mixin for param numPartitions: Number of partitions (at least 1) used by parallel FP-growth.
modified.
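To make the mixin pattern under review concrete, here is a minimal, dependency-free sketch. The `Param` and `Params` classes below are hypothetical toy stand-ins that only mimic the shape of `pyspark.ml.param` (they are not the real PySpark API), and `setNumPartitions`, `_to_positive_int`, and `ToyFPGrowth` are illustrative names; only `HasNumPartitions`, `numPartitions`, and `getNumPartitions` come from the diff. The converter enforces the "at least 1" constraint stated in the suggested docstring.

```python
# Toy stand-ins (hypothetical): they mimic the shape of PySpark's
# pyspark.ml.param.Param / Params, but they are NOT the real API.


class Param:
    """A named parameter with a doc string and a value-conversion function."""

    def __init__(self, name, doc, type_converter=lambda v: v):
        self.name = name
        self.doc = doc
        self.type_converter = type_converter


class Params:
    """Minimal base class holding user-set and default values by param name."""

    def __init__(self):
        self._param_map = {}
        self._default_param_map = {}

    def _set(self, **kwargs):
        # Look up each Param declared on the class, convert/validate, store.
        for name, value in kwargs.items():
            param = getattr(type(self), name)
            self._param_map[name] = param.type_converter(value)
        return self

    def getOrDefault(self, param):
        # User-set value wins; otherwise fall back to the default.
        # Raises KeyError if the param has neither.
        if param.name in self._param_map:
            return self._param_map[param.name]
        return self._default_param_map[param.name]


def _to_positive_int(value):
    """Check the value is an int and enforce the 'at least 1' constraint."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"numPartitions must be an int, got {value!r}")
    if value < 1:
        raise ValueError(f"numPartitions must be at least 1, got {value}")
    return value


class HasNumPartitions(Params):
    """
    Mixin for param numPartitions: Number of partitions (at least 1) used by parallel FP-growth.
    """

    numPartitions = Param(
        "numPartitions",
        "Number of partitions (at least 1) used by parallel FP-growth.",
        type_converter=_to_positive_int,
    )

    def setNumPartitions(self, value):
        """Sets the value of :py:attr:`numPartitions`."""
        return self._set(numPartitions=value)

    def getNumPartitions(self):
        """Gets the value of :py:attr:`numPartitions` or its default value."""
        return self.getOrDefault(self.numPartitions)


class ToyFPGrowth(HasNumPartitions):
    """Hypothetical estimator that picks up numPartitions via the mixin."""


est = ToyFPGrowth().setNumPartitions(4)
print(est.getNumPartitions())  # prints 4
```

Keeping the param declaration, its doc, and its accessors in one mixin lets any estimator or model acquire `numPartitions` simply by inheriting from it.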
python/pyspark/ml/fpm.py
    numPartitions = Param(
        Params._dummy(),
        "numPartitions",
        """Number of partitions (at least 1) used by parallel FP-growth.
Using """ to wrap the doc here will produce \n in the generated Python API docs. You can switch to " instead; see the discussion here.
Does this need to be scrubbed? I think we use """ everywhere.
replaced.
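The reviewer's point about """ can be checked in plain Python: a triple-quoted literal that spans source lines keeps the line break (and any indentation) in the string value, whereas adjacent "..." literals inside parentheses are concatenated at compile time into one line. The second sentence of the doc below is a hypothetical placeholder, not the PR's actual text.

```python
# Doc written with triple quotes across two source lines: the line break
# and the leading indentation become part of the string value.
triple_quoted = """Number of partitions (at least 1) used by parallel FP-growth.
    Second sentence of the doc (hypothetical placeholder)."""

# The same text as adjacent single-line literals: Python joins them at
# compile time, so the value contains no newline.
concatenated = ("Number of partitions (at least 1) used by parallel FP-growth. "
                "Second sentence of the doc (hypothetical placeholder).")

print(repr(triple_quoted))  # the repr shows the embedded \n
print(repr(concatenated))
```

When a Param's doc string is rendered through its repr, the embedded newline shows up as a literal \n, which is presumably what surfaces in the generated API docs.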
python/pyspark/ml/fpm.py

    def getNumPartitions(self):
        """
        Gets the value of numPartitions or its default value.
Use :py:attr:`numPartitions` here.
added.
Test build #77334 has finished for PR 18058 at commit
@facaiy Could you resolve the merge conflicts? Then I can get this in. Thanks.
Resolved. By the way, I personally prefer merging while the PR is still in progress, since it preserves the commit history for reviewers.
Test build #77369 has finished for PR 18058 at commit
…park FPGrowth.
## What changes were proposed in this pull request?
Expose numPartitions (expert) param of PySpark FPGrowth.
## How was this patch tested?
+ [x] Pass all unit tests.
Author: Yan Facai (颜发才) <[email protected]>
Closes #18058 from facaiy/ENH/pyspark_fpg_add_num_partition.
(cherry picked from commit 139da11)
Signed-off-by: Yanbo Liang <[email protected]>
Merged into master and branch-2.2. Thanks, all.
What changes were proposed in this pull request?
Expose numPartitions (expert) param of PySpark FPGrowth.
How was this patch tested?