Skip to content

[SPARK-4047] - Generate runtime warnings for example implementation of PageRank #2894

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

varadharajan
Copy link
Contributor

Based on SPARK-2434, this PR generates runtime warnings for example implementations (Python, Scala) of PageRank.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@srowen
Copy link
Member

srowen commented Oct 24, 2014

Are there other examples that should have the same warning? I think there are many more than this.

@varadharajan
Copy link
Contributor Author

Here are list of scala examples that i think is similar / naive implementation of algorithms from MLlib or graphx.

  1. LocalALS
  2. LocalFileLR
  3. LocalKMeans
  4. LocalLR
  5. SparkALS
  6. SparkHdfsLR
  7. SparkKMeans
  8. SparkLR
  9. SparkPageRank (*)
  10. SparkTachyonHdfsLR (*)

Python examples:

  1. ALS
  2. kmeans
  3. logistic_regression
  4. pagerank (*)

Java examples:

  1. JavaHdfsLR (*)
  2. JavaPageRank (*)

(*) - Examples with missing warnings. I've updated JIRA with these details and also added warning for them

I've also corrected class names of existing LR examples. They were pointing to org.apache.spark.mllib.classification.LogisticRegression instead of org.apache.spark.mllib.classification.LogisticRegressionModel

I've excluded examples that compute transitive closures on graphs because i'm was not able to find corresponding implementations in graphx. Please let me know if i'm missing something

1. JavaHdfsLR
2. JavaPageRank
3. SparkTachyonHdfsLR

b. Renamed references of org.apache.spark.mllib.classification.LogisticRegression to org.apache.spark.mllib.classification.LogisticRegressionModel
@davies
Copy link
Contributor

davies commented Oct 30, 2014

Jenkins, ok to test.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22498 has started for PR 2894 at commit 252f595.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22498 has finished for PR 2894 at commit 252f595.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22498/
Test PASSed.

@davies
Copy link
Contributor

davies commented Nov 3, 2014

LGTM, thanks!

@varadharajan
Copy link
Contributor Author

Thanks :)

@JoshRosen
Copy link
Contributor

Since this is MLlib related, @mengxr or @jkbradley, could one of you do the final sign-off + commit on this? Thanks!

@jkbradley
Copy link
Member

@varadharajan Thanks for adding the warnings! My main comment is that LogisticRegressionModel is a model, rather than an algorithm. Users would really want the algorithm which they can run to produce the model. Could you instead direct users to the algorithms: LogisticRegressionWithSGD and LogisticRegressionWithLBFGS? (It is awkward that there are 2 algorithms to direct users towards, but it is hard to get around that.)

@varadharajan
Copy link
Contributor Author

@jkbradley Makes sense. I've updated the warnings, please let me know if wordings can be improved. Also i just noticed that pyspark classification model does not have LR-LBFGS implementation. I'll probably create a new issue and work on it.

…egressionWithLBFGS instead of LogisticRegressionModel
@SparkQA
Copy link

SparkQA commented Nov 8, 2014

Test build #23102 has started for PR 2894 at commit 5f9406b.

  • This patch merges cleanly.

@varadharajan
Copy link
Contributor Author

Also i think it would help users if we can document in the LR section of the MLlib guide, which algorithm should be preferred in which scenarios.

@SparkQA
Copy link

SparkQA commented Nov 8, 2014

Test build #23102 has finished for PR 2894 at commit 5f9406b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23102/
Test PASSed.

@jkbradley
Copy link
Member

@varadharajan Good suggestion about documenting algs for LR; I'll make a note to do that for the upcoming release. Thank you for the PR!

LGTM

@varadharajan
Copy link
Contributor Author

@jkbradley Thanks :)

@mengxr
Copy link
Contributor

mengxr commented Nov 10, 2014

Merged into master and branch-1.2. Thanks! (We should find some time and clean really old examples.)

@asfgit asfgit closed this in 974d334 Nov 10, 2014
asfgit pushed a commit that referenced this pull request Nov 10, 2014
…f PageRank

Based on SPARK-2434, this PR generates runtime warnings for example implementations (Python, Scala) of PageRank.

Author: Varadharajan Mukundan <[email protected]>

Closes #2894 from varadharajan/SPARK-4047 and squashes the following commits:

5f9406b [Varadharajan Mukundan] [SPARK-4047] - Point users to LogisticRegressionWithSGD and LogisticRegressionWithLBFGS instead of LogisticRegressionModel
252f595 [Varadharajan Mukundan] a. Generate runtime warnings for
05a018b [Varadharajan Mukundan] Fix PageRank implementation's package reference
5c2bf54 [Varadharajan Mukundan] [SPARK-4047] - Generate runtime warnings for example implementation of PageRank

(cherry picked from commit 974d334)
Signed-off-by: Xiangrui Meng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants