[SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer #3569

Closed
wants to merge 3 commits into from

jkbradley
Member

I have heard requests for the docs to include advice about choosing an optimization method. The programming guide could include a brief statement about this (so the user does not have to read the whole optimization section).

CC: @mengxr

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24073 has started for PR 3569 at commit 94f6dec.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24073 has finished for PR 3569 at commit 94f6dec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24073/

However, different optimization methods can have different convergence guarantees depending on the properties of the objective function, and we cannot cover the literature here.

* L-BFGS is recommended since it generally converges faster (in fewer iterations) than SGD.
* SGD can be faster for datasets with a very large number of instances (rows), especially when using a small `miniBatchFraction`.
Contributor

This part might not be true: we implemented mini-batch SGD, but obtaining a mini-batch from an RDD is expensive because it requires a full pass over the data, whereas computing the gradient is comparatively cheap. Maybe we can also mention this trade-off.
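The trade-off raised in this review can be illustrated with a small single-machine Java sketch. It is not Spark's optimizer code: the class `MiniBatchCost`, the synthetic data (true weight 3.0), the 1-D least-squares objective, and the step size of 1.0 are all hypothetical choices made for illustration. It draws each mini-batch by Bernoulli-sampling every row with probability `miniBatchFraction`, so it counts how many rows are visited versus how many actually contribute to the gradient.

```java
import java.util.Random;

// Hypothetical sketch, not Spark's implementation: mini-batch SGD where each batch is
// drawn by Bernoulli-sampling every row, so sampling alone costs one full pass.
public class MiniBatchCost {

    /** Final weight plus counters for rows visited vs. rows used in gradients. */
    public static final class Result {
        public final double weight;
        public final long rowsScanned;
        public final long rowsUsed;

        Result(double weight, long rowsScanned, long rowsUsed) {
            this.weight = weight;
            this.rowsScanned = rowsScanned;
            this.rowsUsed = rowsUsed;
        }
    }

    /** Fits y = w * x by least squares on synthetic data whose true weight is 3.0. */
    public static Result run(int n, int iterations, double miniBatchFraction, long seed) {
        Random data = new Random(seed);
        double[] xs = new double[n];
        double[] ys = new double[n];
        for (int i = 0; i < n; i++) {
            xs[i] = data.nextDouble();
            ys[i] = 3.0 * xs[i];
        }

        Random sampler = new Random(seed + 1);
        double w = 0.0;
        double stepSize = 1.0; // assumed step size for this toy objective
        long scanned = 0;
        long used = 0;
        for (int t = 0; t < iterations; t++) {
            double gradSum = 0.0;
            int batchSize = 0;
            // Deciding batch membership touches every row: this is the full pass.
            for (int i = 0; i < n; i++) {
                scanned++;
                if (sampler.nextDouble() < miniBatchFraction) {
                    // Gradient of 0.5 * (w * x - y)^2 with respect to w.
                    gradSum += (w * xs[i] - ys[i]) * xs[i];
                    batchSize++;
                }
            }
            used += batchSize;
            if (batchSize > 0) {
                w -= stepSize * gradSum / batchSize;
            }
        }
        return new Result(w, scanned, used);
    }

    public static void main(String[] args) {
        Result r = run(100_000, 50, 0.01, 42L);
        System.out.printf("w = %.4f, rows scanned = %d, rows used = %d%n",
                r.weight, r.rowsScanned, r.rowsUsed);
    }
}
```

With `miniBatchFraction = 0.01`, each of the 50 iterations still scans all 100,000 rows (5,000,000 row visits in total), while only about 1% of those visits feed a gradient. Shrinking the fraction therefore reduces gradient work but not the per-iteration pass, which is the cost the review points out.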

@jkbradley
Member Author

@mengxr Thanks for taking a look! Updated based on your comment.

@@ -359,13 +362,15 @@ public class LBFGSExample {
{% endhighlight %}
</div>
</div>
#### Developer's note

Member Author

I think this caused a .md generation problem in the old docs.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24108 has started for PR 3569 at commit 5035ad0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24109 has started for PR 3569 at commit 654aeb5.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24108 has finished for PR 3569 at commit 5035ad0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ConnectionFactory extends Serializable
    • class MatrixFactorizationModel(

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24108/

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24109 has finished for PR 3569 at commit 654aeb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24109/

@jkbradley jkbradley changed the title [SPARK-4711] [mllib] Programming guide advice on choosing optimizer [SPARK-4711] [mllib] [docs] Programming guide advice on choosing optimizer Dec 4, 2014
asfgit pushed a commit that referenced this pull request Dec 4, 2014
…mizer

I have heard requests for the docs to include advice about choosing an optimization method. The programming guide could include a brief statement about this (so the user does not have to read the whole optimization section).

CC: mengxr

Author: Joseph K. Bradley <[email protected]>

Closes #3569 from jkbradley/lr-doc and squashes the following commits:

654aeb5 [Joseph K. Bradley] updated section header for mllib-optimization
5035ad0 [Joseph K. Bradley] updated based on review
94f6dec [Joseph K. Bradley] Updated linear methods and optimization docs with quick advice on choosing an optimization method

(cherry picked from commit 27ab0b8)
Signed-off-by: Xiangrui Meng <[email protected]>
@asfgit asfgit closed this in 27ab0b8 Dec 4, 2014
@mengxr
Contributor

mengxr commented Dec 4, 2014

LGTM. Merged into master and branch-1.2. Thanks!

@jkbradley jkbradley deleted the lr-doc branch December 4, 2014 20:27
4 participants