Skip to content

[SPARK-20768][PYSPARK][ML] Expose numPartitions (expert) param of PySpark FPGrowth. #18058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

facaiy
Copy link
Contributor

@facaiy facaiy commented May 22, 2017

What changes were proposed in this pull request?

Expose numPartitions (expert) param of PySpark FPGrowth.

How was this patch tested?

  • Pass all unit tests.

@facaiy
Copy link
Contributor Author

facaiy commented May 23, 2017

Thanks, @yanboliang.
Do you have any suggestion about testing the parameter?

@yanboliang
Copy link
Contributor

Jenkins, test this please.

@yanboliang
Copy link
Contributor

@srowen @MLnick Could you help to add @facaiy to whitelist? It seems we can't trigger this job currently. Thanks.

@facaiy
Copy link
Contributor Author

facaiy commented May 23, 2017

There seems something wrong with CI. I saw the same non-response/delay of CI once again since last month. Thanks, @yanboliang .

@felixcheung
Copy link
Member

Jenkins, ok to test

@SparkQA
Copy link

SparkQA commented May 23, 2017

Test build #77228 has finished for PR 18058 at commit 85882ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class HasNumPartitions(Params):

Copy link
Contributor

@yanboliang yanboliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment, otherwise, LGTM. Thanks.

@@ -49,6 +49,32 @@ def getMinSupport(self):
return self.getOrDefault(self.minSupport)


class HasNumPartitions(Params):
"""
Mixin for param support.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixin for param numPartitions: Number of partitions (at least 1) used by parallel FP-growth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified.

numPartitions = Param(
Params._dummy(),
"numPartitions",
"""Number of partitions (at least 1) used by parallel FP-growth.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using """ to wrap doc here will get \n in generated Python API docs. You can update to use " referring to discussion at here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be scrubbed ? I think we have """ everywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced.


def getNumPartitions(self):
"""
Gets the value of numPartitions or its default value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:py:attr:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

@SparkQA
Copy link

SparkQA commented May 25, 2017

Test build #77334 has finished for PR 18058 at commit 44267cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor

@facaiy Could you resolve merge conflicts? Then I can get this in. Thanks.

@facaiy
Copy link
Contributor Author

facaiy commented May 25, 2017

Resolved.

By the way,
Which one is preferable, rebase or merge?

@MLnick
Copy link
Contributor

MLnick commented May 25, 2017

I personally prefer merging when the PR is still in progress - it preserves the commit history for reviewers.

@SparkQA
Copy link

SparkQA commented May 25, 2017

Test build #77369 has finished for PR 18058 at commit fcaee9e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DecisionTreeClassifierWrapperWriter(instance: DecisionTreeClassifierWrapper)
  • class DecisionTreeClassifierWrapperReader extends MLReader[DecisionTreeClassifierWrapper]
  • class DecisionTreeRegressorWrapperWriter(instance: DecisionTreeRegressorWrapper)
  • class DecisionTreeRegressorWrapperReader extends MLReader[DecisionTreeRegressorWrapper]
  • class HasMinSupport(Params):
  • class HasMinConfidence(Params):
  • case class UnresolvedHint(name: String, parameters: Seq[String], child: LogicalPlan)
  • case class ResolvedHint(child: LogicalPlan, hints: HintInfo = HintInfo())
  • case class HintInfo(

asfgit pushed a commit that referenced this pull request May 25, 2017
…park FPGrowth.

## What changes were proposed in this pull request?

Expose numPartitions (expert) param of PySpark FPGrowth.

## How was this patch tested?

+ [x] Pass all unit tests.

Author: Yan Facai (颜发才) <[email protected]>

Closes #18058 from facaiy/ENH/pyspark_fpg_add_num_partition.

(cherry picked from commit 139da11)
Signed-off-by: Yanbo Liang <[email protected]>
@yanboliang
Copy link
Contributor

Merged into master and branch-2.2. Thanks for all.

@asfgit asfgit closed this in 139da11 May 25, 2017
@facaiy facaiy deleted the ENH/pyspark_fpg_add_num_partition branch May 25, 2017 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants