
[SPARK-7555][docs] Add doc for elastic net in ml-guide and mllib-guide #6504


Closed
wants to merge 25 commits into from

Conversation

coderxiang
Contributor

@jkbradley I put the elastic net under the Algorithm guide section. I also added the elastic net formula to `mllib-linear-methods#regularizers`.

@dbtsai I left the code tab for you to add example code. Do you think it is the right place?

@SparkQA

SparkQA commented May 29, 2015

Test build #33760 has finished for PR 6504 at commit 8ce37c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -157,6 +174,49 @@ There are now several algorithms in the Pipelines API which are not in the lower
* [Feature Extraction, Transformation, and Selection](ml-features.html)
* [Ensembles](ml-ensembles.html)

## Linear Methods with Elastic Net Regularization

In MLlib, we implement popular linear methods such as logistic regression and linear least squares with L1 or L2 regularization. Refer to [the linear methods section](mllib-linear-methods.html) for details. In `spark.ml`, we add the [Elastic net](http://users.stat.umn.edu/~zouxx019/Papers/elasticnet.pdf), which is a hybrid of L1 and L2 regularization. Mathematically it is defined as a linear combination of the L1-norm and the L2-norm:
Member

In spark.ml, we add the Elastic net, which is a hybrid of L1 and L2 regularization.

This makes it sound like it's only in spark.ml. Can this please instead say that we provide a Pipelines API? The main thing needed here is a code example, which should demonstrate how to do L1, L2, and a mix. (But I like the note about how it uses a different optimizer.)

@jkbradley
Member

@coderxiang Thanks for the doc!

In addition to the link to the Elastic Net paper, could you please add a link to the Wikipedia page [http://en.wikipedia.org/wiki/Elastic_net_regularization]?

I'll try generating the doc once the code examples are there.

@jkbradley
Member

I'm OK with using the Algorithm guide section for now; we can create a separate page later on.

@coderxiang
Contributor Author

@jkbradley thanks for the comments. I highlighted the Pipelines API in the revision and included the wiki page.

@SparkQA

SparkQA commented May 31, 2015

Test build #33859 has finished for PR 6504 at commit df5bd14.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Member

dbtsai commented Jun 2, 2015

Agreed. We should create a separate page for this. For example code, in https://github.com/apache/spark/pull/6576/files , I have full running apps in Scala, but these may not be suitable here, since the documentation probably only needs a code snippet to demonstrate the APIs. @coderxiang can you add a Scala example based on my PR? Thanks.

@coderxiang
Contributor Author

@dbtsai sounds good. Let me start with your Scala code.

@jkbradley
Member

@coderxiang One more comment: Could you please mention the phrases "Lasso" and "Ridge Regression" in the spark.ml doc? E.g., say how to set elasticNetParam to get Lasso and Ridge Regression. That might help users who are not familiar with Elastic Net.

@coderxiang
Contributor Author

@jkbradley @dbtsai I removed the pipeline and o.a.s.example parts from @dbtsai's code so that the code example can be pasted and run directly in spark-shell. Please let me know your thoughts on this. I can add them back if needed.

I also updated the definition of elastic net to use only one lambda.
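
For readers following the thread, here is a minimal sketch of what a single-lambda definition implies. It assumes the penalty has the form λ(α‖w‖₁ + (1−α)/2 ‖w‖₂²), with λ set through `regParam` and α through `elasticNetParam`. The helper name `split_regularization` is hypothetical and is not part of Spark; it only illustrates how one overall strength divides into an L1 part and an L2 part.

```python
def split_regularization(reg_param, elastic_net_param):
    """Split one overall strength (lambda) into its L1 and L2 parts.

    Hypothetical helper for illustration only; assumes the penalty is
    lambda * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    """
    if reg_param < 0.0:
        raise ValueError("reg_param must be non-negative")
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    l1_strength = reg_param * elastic_net_param          # weight on ||w||_1
    l2_strength = reg_param * (1.0 - elastic_net_param)  # weight on (1/2)||w||_2^2
    return l1_strength, l2_strength


# alpha = 1 puts all of lambda on L1, alpha = 0 puts all of it on L2,
# and anything in between is a mix of the two.
for alpha in (1.0, 0.0, 0.5):
    print(alpha, split_regularization(0.4, alpha))
```

With this parameterization, users tune one strength and one mixing weight instead of two independent lambdas.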

@SparkQA

SparkQA commented Jun 2, 2015

Test build #34029 has finished for PR 6504 at commit d8616fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@coderxiang
Contributor Author

jenkins test this please


{% highlight scala %}

import scala.collection.mutable
Member

Do you need these 2 imports? I think you can remove these 2 plus others below. Can you please see how many can be removed while still being able to run the example in the Spark shell?

@jkbradley
Member

Nice, it's much simpler now. The current PR looks good except for the Java/Python examples. Who is going to add them? If you need me to, then I can send an update to this PR.

@coderxiang
Contributor Author

@jkbradley I was planning to reuse the code from the code example that @dbtsai is working on. But feel free to update this PR, especially for the Python code.

@coderxiang
Contributor Author

@jkbradley: I had an offline discussion with @dbtsai; I can work on the Java and Python code examples. What do you think?

@dbtsai
Member

dbtsai commented Jun 11, 2015

What example do you want to work on? Example in the documentation or in the example code base?

@coderxiang
Contributor Author

@dbtsai the example code in this user guide. Are you working on the example code base? I'm sure they share a lot of code.

@SparkQA

SparkQA commented Jun 12, 2015

Test build #34791 has finished for PR 6504 at commit 9bc2b4c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class LogisticRegressionWithElasticNetExample

@coderxiang
Contributor Author

jenkins test this please

@SparkQA

SparkQA commented Jun 12, 2015

Test build #34790 has finished for PR 6504 at commit db32a60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class LogisticRegressionWithElasticNetExample

@SparkQA

SparkQA commented Jun 12, 2015

Test build #34796 has finished for PR 6504 at commit 9bc2b4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class LogisticRegressionWithElasticNetExample

@coderxiang
Contributor Author

@dbtsai @jkbradley finished the Java and Python sample code.

@SparkQA

SparkQA commented Jun 16, 2015

Test build #34965 has finished for PR 6504 at commit 706d3f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class LogisticRegressionWithElasticNetExample

`\[
\alpha \|\wv\|_1 + (1-\alpha) \frac{1}{2}\|\wv\|_2^2, \alpha \in [0, 1].
\]`
By setting $\alpha$ properly, the elastic net contains both L1 and L2 regularization as special cases. For example, if a [linear regression](/api/scala/index.html#org.apache.spark.ml.regression.LinearRegression) model is trained with the elastic net parameter $\alpha$ set to $1$, it is equivalent to a [Lasso](http://en.wikipedia.org/wiki/Least_squares#Lasso_method) model. On the other hand, if $\alpha$ is set to $0$, the trained model reduces to a [ridge regression](http://en.wikipedia.org/wiki/Tikhonov_regularization) model. We implement the Pipelines API for both linear regression and logistic regression with elastic net regularization.
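
As a rough illustration of the special cases described in the paragraph above, the penalty term can be evaluated directly for a small weight vector. This is a plain-Python sketch rather than Spark code: the function name `elastic_net_penalty` and the sample weights are assumptions made for the example, and the overall regularization strength is left out so that only the α mixing is shown.

```python
def elastic_net_penalty(weights, alpha):
    """Evaluate alpha * ||w||_1 + (1 - alpha) * (1/2) * ||w||_2^2.

    Hypothetical helper for illustration; alpha must lie in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    l1_norm = sum(abs(w) for w in weights)        # ||w||_1
    l2_norm_squared = sum(w * w for w in weights)  # ||w||_2^2
    return alpha * l1_norm + (1.0 - alpha) * 0.5 * l2_norm_squared


w = [3.0, -4.0]  # sample weights, chosen so the norms are easy to check by hand
print(elastic_net_penalty(w, 1.0))  # pure L1 (Lasso-style) penalty: 7.0
print(elastic_net_penalty(w, 0.0))  # pure L2 (ridge-style) penalty: 12.5
print(elastic_net_penalty(w, 0.5))  # equal mix of the two: 9.75
```

For `w = [3, -4]`, the L1 norm is 7 and the squared L2 norm is 25, so the three printed values are 7, 25/2, and 7/2 + 25/4.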
Member

It's odd to link to the Scala API for linear regression but Wikipedia for other models. How about all of these links go to Wikipedia (following what the spark.mllib guide does)?

@jkbradley
Member

@coderxiang My apologies for the long delay! It looks good, save for the minor comments above. Since there are merge issues, would you mind going ahead and moving this doc to a new Markdown file, linked from the algorithm section? That new file can be for Linear Methods and be called ml-linear-methods.md

Thank you!

@coderxiang
Contributor Author

@jkbradley thanks for the comments! Just uploaded the change.

@SparkQA

SparkQA commented Jul 12, 2015

Test build #37093 has finished for PR 6504 at commit f6061ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@coderxiang
Contributor Author

@jkbradley is this one ready to go?

@jkbradley
Member

LGTM merging with master
Thanks for the PR!

asfgit pushed a commit that referenced this pull request Jul 15, 2015
jkbradley I put the elastic net under the **Algorithm guide** section. Also add the formula of elastic net in mllib-linear `mllib-linear-methods#regularizers`.

dbtsai I left the code tab for you to add example code. Do you think it is the right place?

Author: Shuo Xiang <[email protected]>

Closes #6504 from coderxiang/elasticnet and squashes the following commits:

f6061ee [Shuo Xiang] typo
90a7c88 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
0610a36 [Shuo Xiang] move out the elastic net to ml-linear-methods
8747190 [Shuo Xiang] merge master
706d3f7 [Shuo Xiang] add python code
9bc2b4c [Shuo Xiang] typo
db32a60 [Shuo Xiang] java code sample
aab3b3a [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
a0dae07 [Shuo Xiang] simplify code
d8616fd [Shuo Xiang] Update the definition of elastic net. Add scala code; Mention Lasso and Ridge
df5bd14 [Shuo Xiang] use wikipeida page in ml-linear-methods.md
78d9366 [Shuo Xiang] address comments
8ce37c2 [Shuo Xiang] Merge branch 'elasticnet' of github.com:coderxiang/spark into elasticnet
8f24848 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
998d766 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
89f10e4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
9262a72 [Shuo Xiang] update
7e07d12 [Shuo Xiang] update
b32f21a [Shuo Xiang] add doc for elastic net in sparkml
937eef1 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
98804c9 [Shuo Xiang] fix bug in topBykey and update test

(cherry picked from commit 303c120)
Signed-off-by: Joseph K. Bradley <[email protected]>
@asfgit asfgit closed this in 303c120 Jul 15, 2015