
Commit 94f6dec

Updated linear methods and optimization docs with quick advice on choosing an optimization method
1 parent: 2b233f5

2 files changed (+16, -3 lines)

docs/mllib-linear-methods.md

Lines changed: 7 additions & 3 deletions

@@ -110,20 +110,24 @@ However, L1 regularization can help promote sparsity in weights leading to small
 It is not recommended to train models without any regularization,
 especially when the number of training examples is small.
 
+### Optimization
+
+Under the hood, linear methods use convex optimization methods to optimize the objective functions. MLlib uses two methods, SGD and L-BFGS, described in the [optimization section](mllib-optimization.html). Currently, most algorithm APIs support Stochastic Gradient Descent (SGD), and a few support L-BFGS. Refer to [this optimization section](mllib-optimization.html#Choosing-an-Optimization-Method) for guidelines on choosing between optimization methods.
+
 ## Binary classification
 
 [Binary classification](http://en.wikipedia.org/wiki/Binary_classification)
 aims to divide items into two categories: positive and negative. MLlib
-supports two linear methods for binary classification: linear support vector
-machines (SVMs) and logistic regression. For both methods, MLlib supports
+supports two linear methods for binary classification: linear Support Vector
+Machines (SVMs) and logistic regression. For both methods, MLlib supports
 L1 and L2 regularized variants. The training data set is represented by an RDD
 of [LabeledPoint](mllib-data-types.html) in MLlib. Note that, in the
 mathematical formulation in this guide, a training label $y$ is denoted as
 either $+1$ (positive) or $-1$ (negative), which is convenient for the
 formulation. *However*, the negative label is represented by $0$ in MLlib
 instead of $-1$, to be consistent with multiclass labeling.
 
-### Linear support vector machines (SVMs)
+### Linear Support Vector Machines (SVMs)
 
 The [linear SVM](http://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
 is a standard method for large-scale classification tasks. It is a linear method as described above in equation `$\eqref{eq:regPrimal}$`, with the loss function in the formulation given by the hinge loss:
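
To make the SGD-versus-L-BFGS choice described in the new section concrete, here is a minimal Scala sketch against MLlib's RDD-based API. The `training` RDD, the helper function, and the iteration count are assumptions for illustration; `LogisticRegressionWithLBFGS` and `LogisticRegressionWithSGD` are the two existing MLlib entry points being contrasted.

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// `training` is assumed to be an RDD[LabeledPoint] with labels 0.0 or 1.0
// (recall that MLlib encodes the negative class as 0, not -1).
def trainBoth(training: RDD[LabeledPoint]) = {
  // L-BFGS variant: generally converges in fewer iterations.
  val lbfgsModel = new LogisticRegressionWithLBFGS().run(training)

  // SGD variant: 100 iterations is illustrative, not a recommendation.
  val sgdModel = LogisticRegressionWithSGD.train(training, 100)

  (lbfgsModel, sgdModel)
}
```

Both variants optimize the same logistic loss objective; they differ only in the optimizer used under the hood.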

docs/mllib-optimization.md

Lines changed: 9 additions & 0 deletions

@@ -138,6 +138,15 @@ vertical scalability issue (the number of training features) when computing the
 explicitly in Newton's method. As a result, L-BFGS often achieves faster convergence than
 other first-order optimization methods.
 
+### Choosing an Optimization Method
+
+[Linear methods](mllib-linear-methods.html) use optimization internally, and some linear methods in MLlib support both SGD and L-BFGS.
+We give a few guidelines below for choosing between the two.
+Note, however, that different optimization methods can have different convergence guarantees depending on the properties of the objective function; a full treatment of that literature is beyond the scope of this guide.
+
+* L-BFGS is recommended since it generally converges faster (in fewer iterations) than SGD.
+* SGD can be faster for datasets with a very large number of instances (rows), especially when using a small `miniBatchFraction`.
+
 ## Implementation in MLlib
 
 ### Gradient descent and stochastic gradient descent
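
As a rough illustration of the second bullet in the new section, the Scala sketch below configures mini-batch SGD through the public `optimizer` handle that MLlib's SGD-based algorithms expose. The `training` RDD, the helper function, and the specific parameter values are assumptions for illustration, not tuned recommendations.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// `training` is assumed to be an RDD[LabeledPoint] with labels 0.0 or 1.0.
def trainMiniBatchSVM(training: RDD[LabeledPoint]) = {
  val svmAlg = new SVMWithSGD()
  svmAlg.optimizer
    .setNumIterations(200)      // illustrative; more steps than the default
    .setMiniBatchFraction(0.1)  // sample ~10% of the data per step; small
                                // fractions are where SGD can beat L-BFGS
                                // on datasets with very many rows
  svmAlg.run(training)
}
```

With `miniBatchFraction` set to 1.0, each iteration takes a full (sub)gradient step over the whole dataset; shrinking the fraction trades per-step cost for stochastic noise in the gradient estimate.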
