[SPARK-4047] - Generate runtime warnings for example implementation of PageRank #2894
Can one of the admins verify this patch?
Are there other examples that should have the same warning? I think there are many more than this.
Here is a list of Scala examples that I think are similar to, or naive implementations of, algorithms from MLlib or GraphX.
Python examples:
Java examples:
(*) - Examples with missing warnings. I've updated the JIRA with these details and added warnings for them. I've also corrected the class names in the existing LR examples: they were pointing to org.apache.spark.mllib.classification.LogisticRegression instead of org.apache.spark.mllib.classification.LogisticRegressionModel. I've excluded the examples that compute transitive closures on graphs because I was not able to find corresponding implementations in GraphX. Please let me know if I'm missing something.
1. JavaHdfsLR
2. JavaPageRank
3. SparkTachyonHdfsLR
b. Renamed references of org.apache.spark.mllib.classification.LogisticRegression to org.apache.spark.mllib.classification.LogisticRegressionModel
Jenkins, ok to test.
Test build #22498 has started for PR 2894 at commit
Test build #22498 has finished for PR 2894 at commit
Test PASSed.
LGTM, thanks!
Thanks :)
Since this is MLlib related, @mengxr or @jkbradley, could one of you do the final sign-off + commit on this? Thanks!
@varadharajan Thanks for adding the warnings! My main comment is that LogisticRegressionModel is a model, rather than an algorithm. Users would really want the algorithm which they can run to produce the model. Could you instead direct users to the algorithms: LogisticRegressionWithSGD and LogisticRegressionWithLBFGS? (It is awkward that there are 2 algorithms to direct users towards, but it is hard to get around that.)
@jkbradley Makes sense. I've updated the warnings; please let me know if the wording can be improved. Also, I just noticed that PySpark's classification module does not have an LR-LBFGS implementation. I'll probably create a new issue and work on it.
…egressionWithLBFGS instead of LogisticRegressionModel
Test build #23102 has started for PR 2894 at commit
Also, I think it would help users if we documented, in the LR section of the MLlib guide, which algorithm should be preferred in which scenarios.
Test build #23102 has finished for PR 2894 at commit
Test PASSed.
@varadharajan Good suggestion about documenting algs for LR; I'll make a note to do that for the upcoming release. Thank you for the PR! LGTM
@jkbradley Thanks :)
Merged into master and branch-1.2. Thanks! (We should find some time and clean really old examples.) |
…f PageRank
Based on SPARK-2434, this PR generates runtime warnings for example implementations (Python, Scala) of PageRank.
Author: Varadharajan Mukundan <[email protected]>
Closes #2894 from varadharajan/SPARK-4047 and squashes the following commits:
5f9406b [Varadharajan Mukundan] [SPARK-4047] - Point users to LogisticRegressionWithSGD and LogisticRegressionWithLBFGS instead of LogisticRegressionModel
252f595 [Varadharajan Mukundan] a. Generate runtime warnings for
05a018b [Varadharajan Mukundan] Fix PageRank implementation's package reference
5c2bf54 [Varadharajan Mukundan] [SPARK-4047] - Generate runtime warnings for example implementation of PageRank
(cherry picked from commit 974d334)
Signed-off-by: Xiangrui Meng <[email protected]>
Based on SPARK-2434, this PR generates runtime warnings for example implementations (Python, Scala) of PageRank.
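For readers who want a picture of what such a runtime warning might look like in a Python example, here is a minimal sketch. It is not the code merged in this PR: the helper name `show_warning`, its parameters, and the exact wording of the message are assumptions made for illustration. The logistic regression class names come from the discussion above; the GraphX PageRank path (`org.apache.spark.graphx.lib.PageRank`) is not stated in the thread and is filled in from general knowledge of Spark.

```python
import sys


def show_warning(example_name, replacements, stream=sys.stderr):
    """Write a warning that an example is a naive implementation.

    Hypothetical helper: the name, signature, and message wording are
    illustrative and are not taken from the merged patch.
    """
    # One indented line per recommended library implementation.
    listed = "\n".join("  " + name for name in replacements)
    message = (
        "WARN: This is a naive implementation of %s and is given as an example!\n"
        "Please use one of the following for more conventional use:\n%s\n"
        % (example_name, listed)
    )
    # Examples print results to stdout, so the warning goes to stderr by default.
    stream.write(message)
    return message


if __name__ == "__main__":
    show_warning("PageRank", ["org.apache.spark.graphx.lib.PageRank"])
    # Point at the algorithms that produce a model, not at the model class.
    show_warning(
        "Logistic Regression",
        [
            "org.apache.spark.mllib.classification.LogisticRegressionWithSGD",
            "org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS",
        ],
    )
```

An example script would call the helper once at the start of its main routine, before any computation, so the warning appears ahead of the results.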