[SPARK-7555] [DOCS] Add doc for elastic net in ml-guide and mllib-guide
jkbradley I put the elastic net under the **Algorithm Guides** section. I also added the elastic net formula to `mllib-linear-methods#regularizers`.
dbtsai I left the code tab for you to add example code. Do you think this is the right place?
Author: Shuo Xiang <[email protected]>
Closes #6504 from coderxiang/elasticnet and squashes the following commits:
f6061ee [Shuo Xiang] typo
90a7c88 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
0610a36 [Shuo Xiang] move out the elastic net to ml-linear-methods
8747190 [Shuo Xiang] merge master
706d3f7 [Shuo Xiang] add python code
9bc2b4c [Shuo Xiang] typo
db32a60 [Shuo Xiang] java code sample
aab3b3a [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elasticnet
a0dae07 [Shuo Xiang] simplify code
d8616fd [Shuo Xiang] Update the definition of elastic net. Add scala code; Mention Lasso and Ridge
df5bd14 [Shuo Xiang] use wikipedia page in ml-linear-methods.md
78d9366 [Shuo Xiang] address comments
8ce37c2 [Shuo Xiang] Merge branch 'elasticnet' of github.com:coderxiang/spark into elasticnet
8f24848 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
998d766 [Shuo Xiang] Merge branch 'elastic-net-doc' of github.com:coderxiang/spark into elastic-net-doc
89f10e4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
9262a72 [Shuo Xiang] update
7e07d12 [Shuo Xiang] update
b32f21a [Shuo Xiang] add doc for elastic net in sparkml
937eef1 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into elastic-net-doc
180b496 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
aa0717d [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
5f109b4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
c5c5bfe [Shuo Xiang] Merge remote-tracking branch 'upstream/master'
98804c9 [Shuo Xiang] fix bug in topBykey and update test
docs/ml-guide.md — 31 additions, 0 deletions
```diff
@@ -3,6 +3,24 @@ layout: global
 title: Spark ML Programming Guide
 ---
 
+`\[
+\newcommand{\R}{\mathbb{R}}
+\newcommand{\E}{\mathbb{E}}
+\newcommand{\x}{\mathbf{x}}
+\newcommand{\y}{\mathbf{y}}
+\newcommand{\wv}{\mathbf{w}}
+\newcommand{\av}{\mathbf{\alpha}}
+\newcommand{\bv}{\mathbf{b}}
+\newcommand{\N}{\mathbb{N}}
+\newcommand{\id}{\mathbf{I}}
+\newcommand{\ind}{\mathbf{1}}
+\newcommand{\0}{\mathbf{0}}
+\newcommand{\unit}{\mathbf{e}}
+\newcommand{\one}{\mathbf{1}}
+\newcommand{\zero}{\mathbf{0}}
+\]`
+
+
 Spark 1.2 introduced a new package called `spark.ml`, which aims to provide a uniform set of
 high-level APIs that help users create and tune practical machine learning pipelines.
 
@@ -154,6 +172,19 @@ Parameters belong to specific instances of `Estimator`s and `Transformer`s.
 For example, if we have two `LogisticRegression` instances `lr1` and `lr2`, then we can build a `ParamMap` with both `maxIter` parameters specified: `ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20)`.
 This is useful if there are two algorithms with the `maxIter` parameter in a `Pipeline`.
 
+# Algorithm Guides
+
+There are now several algorithms in the Pipelines API which are not in the lower-level MLlib API, so we link to documentation for them here. These algorithms are mostly feature transformers, which fit naturally into the `Transformer` abstraction in Pipelines, and ensembles, which fit naturally into the `Estimator` abstraction in Pipelines.
+
+**Pipelines API Algorithm Guides**
+
+* [Feature Extraction, Transformation, and Selection](ml-features.html)
+* [Ensembles](ml-ensembles.html)
+
+**Algorithms in `spark.ml`**
+
+* [Linear methods with elastic net regularization](ml-linear-methods.html)
+
 # Code Examples
 
 This section gives code examples illustrating the functionality discussed above.
```
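The `ParamMap` behavior referenced in the hunk above carries over to Python, where a param map is a plain dict keyed by each instance's `Param` objects. A minimal sketch (illustrative, not part of the patch; `training` is assumed to be a DataFrame of labels and features, e.g. loaded as in the Python example further down):

```python
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=10, regParam=0.3)

# At fit() time a dict acts as the ParamMap; values here override the
# estimator's own settings. Params are scoped to an instance, which is why
# a single map can also carry entries for two different estimators.
model = lr.fit(training, {lr.maxIter: 20, lr.regParam: 0.1})
```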
docs/ml-linear-methods.md — new file (excerpt)

displayTitle: <a href="ml-guide.html">ML</a> - Linear Methods
---

`\[
\newcommand{\R}{\mathbb{R}}
\newcommand{\E}{\mathbb{E}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\wv}{\mathbf{w}}
\newcommand{\av}{\mathbf{\alpha}}
\newcommand{\bv}{\mathbf{b}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\id}{\mathbf{I}}
\newcommand{\ind}{\mathbf{1}}
\newcommand{\0}{\mathbf{0}}
\newcommand{\unit}{\mathbf{e}}
\newcommand{\one}{\mathbf{1}}
\newcommand{\zero}{\mathbf{0}}
\]`
In MLlib, we implement popular linear methods such as logistic regression and linear least squares with L1 or L2 regularization. Refer to [the linear methods in mllib](mllib-linear-methods.html) for details. In `spark.ml`, we also include the Pipelines API for [elastic net](http://en.wikipedia.org/wiki/Elastic_net_regularization), a hybrid of L1 and L2 regularization proposed in [this paper](http://users.stat.umn.edu/~zouxx019/Papers/elasticnet.pdf). Mathematically, it is defined as a linear combination of the L1-norm and the L2-norm:
`\[
\alpha \left( \lambda \|\wv\|_1 \right) + (1-\alpha) \left( \frac{\lambda}{2}\|\wv\|_2^2 \right), \quad \alpha \in [0, 1], \lambda \geq 0.
\]`
By setting $\alpha$ properly, elastic net contains both L1 and L2 regularization as special cases. For example, if a [linear regression](https://en.wikipedia.org/wiki/Linear_regression) model is trained with the elastic net parameter $\alpha$ set to $1$, it is equivalent to a [Lasso](http://en.wikipedia.org/wiki/Least_squares#Lasso_method) model. On the other hand, if $\alpha$ is set to $0$, the trained model reduces to a [ridge regression](http://en.wikipedia.org/wiki/Tikhonov_regularization) model. We implement the Pipelines API for both linear regression and logistic regression with elastic net regularization.
<div class="codetabs">
<div data-lang="python" markdown="1">
{% highlight python %}
from pyspark.ml.classification import LogisticRegression
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.util import MLUtils

# Load training data
training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF()

lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)

# Fit the model
lrModel = lr.fit(training)

# Print the weights and intercept for logistic regression
print("Weights: " + str(lrModel.weights))
print("Intercept: " + str(lrModel.intercept))
{% endhighlight %}
</div>

</div>
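As an illustrative aside (not in the patch itself), the Lasso/ridge special cases described above can be exercised directly through `elasticNetParam`. A minimal sketch, reusing the `training` DataFrame from the example; `LinearRegression` here is the `spark.ml` Python API, and the fitted weights will of course depend on the data:

```python
from pyspark.ml.regression import LinearRegression

# elasticNetParam is the alpha in the formula above: 1.0 yields a pure-L1
# (Lasso-like) model, 0.0 yields a pure-L2 (ridge-like) model.
lasso = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=1.0)
ridge = LinearRegression(maxIter=10, regParam=0.3, elasticNetParam=0.0)

print("Lasso weights: " + str(lasso.fit(training).weights))
print("Ridge weights: " + str(ridge.fit(training).weights))
```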
### Optimization

The optimization algorithm underlying the implementation is called [Orthant-Wise Limited-memory Quasi-Newton](http://research-srv.microsoft.com/en-us/um/people/jfgao/paper/icml07scalable.pdf)
(OWL-QN). It is an extension of L-BFGS that can effectively handle L1 regularization and elastic net.
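For intuition, here is a small NumPy sketch of OWL-QN's central device: the pseudo-gradient that stands in for the non-differentiable L1 term. This is illustrative only; Spark's implementation delegates to Breeze's `OWLQN`, and the full algorithm additionally constrains each L-BFGS step to the current orthant, which this sketch omits:

```python
import numpy as np

def pseudo_gradient(w, grad_f, l1_reg):
    """Pseudo-gradient of f(w) + l1_reg * ||w||_1, given grad_f = gradient of f at w."""
    pg = np.empty_like(w)
    nz = w != 0
    # Away from zero, |w_i| is differentiable: ordinary gradient plus l1_reg * sign(w_i).
    pg[nz] = grad_f[nz] + l1_reg * np.sign(w[nz])
    # At w_i == 0, use the one-sided derivative, but only if it points downhill;
    # otherwise zero is locally optimal along that coordinate and we keep w_i = 0.
    z = ~nz
    right = grad_f[z] + l1_reg
    left = grad_f[z] - l1_reg
    pg_z = np.zeros(int(z.sum()))
    pg_z[right < 0] = right[right < 0]
    pg_z[left > 0] = left[left > 0]
    pg[z] = pg_z
    return pg

# Toy example: descend on 0.5 * ||w - c||^2 + l1_reg * ||w||_1.
c = np.array([3.0, -0.5, 0.1])
w = np.zeros(3)
for _ in range(1000):
    w -= 0.1 * pseudo_gradient(w, w - c, l1_reg=1.0)
print(w)  # coordinates with |c_i| <= l1_reg stay exactly at zero
```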
docs/mllib-linear-methods.md

```diff
@@ -108,4 +111,4 @@
 L2-regularized problems are generally easier to solve than L1-regularized due to smoothness.
 However, L1 regularization can help promote sparsity in weights leading to smaller and more interpretable models, the latter of which can be useful for feature selection.
-It is not recommended to train models without any regularization,
+[Elastic net](http://en.wikipedia.org/wiki/Elastic_net_regularization) is a combination of L1 and L2 regularization. It is not recommended to train models without any regularization,
 especially when the number of training examples is small.
```
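A quick, illustrative way to see the sparsity effect this hunk describes (assuming a pyspark shell with `training` loaded as in the earlier example; not part of the patch):

```python
from pyspark.ml.classification import LogisticRegression

# With pure L1 (elasticNetParam=1.0), increasing regParam should drive more
# weights exactly to zero, shrinking the effective feature set.
for reg in [0.01, 0.1, 1.0]:
    model = LogisticRegression(maxIter=10, regParam=reg, elasticNetParam=1.0).fit(training)
    zeros = sum(1 for v in model.weights.toArray() if v == 0)
    print("regParam=%g -> %d zero weights" % (reg, zeros))
```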