[SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer #3569

jkbradley · 2014-12-03T03:54:08Z

I have heard requests for the docs to include advice about choosing an optimization method. The programming guide could include a brief statement about this (so the user does not have to read the whole optimization section).

CC: @mengxr

…osing an optimization method

SparkQA · 2014-12-03T04:00:32Z

Test build #24073 has started for PR 3569 at commit 94f6dec.

This patch merges cleanly.

SparkQA · 2014-12-03T05:20:24Z

Test build #24073 has finished for PR 3569 at commit 94f6dec.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-03T05:20:27Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24073/
Test PASSed.

mengxr · 2014-12-03T11:09:22Z

docs/mllib-optimization.md

+However, different optimization methods can have different convergence guarantees depending on the properties of the objective function, and we cannot cover the literature here.
+
+* L-BFGS is recommended since it generally converges faster (in fewer iterations) than SGD.
+* SGD can be faster for datasets with a very large number of instances (rows), especially when using a small `miniBatchFraction`.


This part might not be true. We implemented mini-batch SGD, but obtaining a mini-batch from an RDD is expensive because sampling requires one pass over the data, while computing the gradient is comparatively cheap. Maybe we can also mention this trade-off.
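The trade-off described above can be illustrated with a small sketch in plain Python, not Spark. It assumes the mini-batch is drawn by a per-row Bernoulli filter, so every row is visited once per iteration even when few are kept. The names `sample_minibatch` and `iteration_cost` are hypothetical and are not part of MLlib.

```python
import random


def sample_minibatch(rows, fraction, rng):
    """Bernoulli-sample rows: each row is visited once to decide keep or drop.

    Assumption: this stands in for sampling from a distributed collection,
    where the keep/drop decision is made per row during a full pass.
    """
    scanned = 0
    batch = []
    for row in rows:
        scanned += 1
        if rng.random() < fraction:
            batch.append(row)
    return batch, scanned


def iteration_cost(n_rows, fraction, seed=0):
    """Count rows scanned and gradient evaluations for one iteration."""
    rng = random.Random(seed)
    batch, scanned = sample_minibatch(range(n_rows), fraction, rng)
    # One gradient evaluation per sampled row.
    return {"rows_scanned": scanned, "gradient_evals": len(batch)}


full = iteration_cost(10_000, 1.0)   # full batch
mini = iteration_cost(10_000, 0.01)  # small mini-batch fraction
print("full batch:", full)
print("mini batch:", mini)
```

In this toy model a small fraction reduces the number of gradient evaluations but not the number of rows scanned, so the per-iteration saving is limited whenever the pass over the data dominates the cost.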

jkbradley · 2014-12-03T21:56:26Z

@mengxr Thanks for taking a look! Updated based on your comment.

jkbradley · 2014-12-03T21:56:57Z

docs/mllib-optimization.md

@@ -359,13 +362,15 @@ public class LBFGSExample {
 {% endhighlight %}
 </div>
 </div>
-#### Developer's note
+


I think this caused a .md generation problem in the old docs.

SparkQA · 2014-12-03T21:57:31Z

Test build #24108 has started for PR 3569 at commit 5035ad0.

This patch merges cleanly.

SparkQA · 2014-12-03T22:02:57Z

Test build #24109 has started for PR 3569 at commit 654aeb5.

This patch merges cleanly.

SparkQA · 2014-12-03T23:16:30Z

Test build #24108 has finished for PR 3569 at commit 5035ad0.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- trait ConnectionFactory extends Serializable
- class MatrixFactorizationModel(

AmplabJenkins · 2014-12-03T23:16:33Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24108/
Test PASSed.

SparkQA · 2014-12-03T23:20:17Z

Test build #24109 has finished for PR 3569 at commit 654aeb5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-03T23:20:20Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24109/
Test PASSed.

…mizer

I have heard requests for the docs to include advice about choosing an optimization method. The programming guide could include a brief statement about this (so the user does not have to read the whole optimization section).

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes #3569 from jkbradley/lr-doc and squashes the following commits:

654aeb5 [Joseph K. Bradley] updated section header for mllib-optimization
5035ad0 [Joseph K. Bradley] updated based on review
94f6dec [Joseph K. Bradley] Updated linear methods and optimization docs with quick advice on choosing an optimization method

(cherry picked from commit 27ab0b8)
Signed-off-by: Xiangrui Meng <[email protected]>

mengxr · 2014-12-04T00:59:50Z

LGTM. Merged into master and branch-1.2. Thanks!

Updated linear methods and optimization docs with quick advice on cho…

94f6dec

…osing an optimization method

mengxr reviewed Dec 3, 2014

updated based on review

5035ad0

jkbradley reviewed Dec 3, 2014

updated section header for mllib-optimization

654aeb5

jkbradley changed the title ~~[SPARK-4711] [mllib] Programming guide advice on choosing optimizer~~ [SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer Dec 4, 2014

asfgit closed this in 27ab0b8 Dec 4, 2014

jkbradley deleted the lr-doc branch December 4, 2014 20:27
