docs/mllib-linear-methods.md (+7 / -3 lines)
@@ -110,20 +110,24 @@ However, L1 regularization can help promote sparsity in weights leading to small
It is not recommended to train models without any regularization,
especially when the number of training examples is small.

+### Optimization
+
+Under the hood, linear methods rely on convex optimization to minimize their objective functions. MLlib uses two such methods, SGD and L-BFGS, described in the [optimization section](mllib-optimization.html). Currently, most algorithm APIs support Stochastic Gradient Descent (SGD), and a few support L-BFGS. Refer to [this optimization section](mllib-optimization.html#Choosing-an-Optimization-Method) for guidelines on choosing between optimization methods.
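As a rough illustration of this choice in the RDD-based `spark.mllib` Scala API, the sketch below trains the same logistic regression model with each optimizer; `training` is an assumed, pre-existing `RDD[LabeledPoint]`, and the iteration count is arbitrary.

```scala
import org.apache.spark.mllib.classification.{LogisticRegressionWithLBFGS, LogisticRegressionWithSGD}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Both trainers fit a logistic-loss linear model; they differ only in the
// optimizer running underneath (stochastic gradient descent vs. L-BFGS).
def trainBothWays(training: RDD[LabeledPoint]) = {
  val sgdModel   = LogisticRegressionWithSGD.train(training, 100) // 100 SGD iterations
  val lbfgsModel = new LogisticRegressionWithLBFGS().setNumClasses(2).run(training)
  (sgdModel, lbfgsModel)
}
```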
aims to divide items into two categories: positive and negative. MLlib
-supports two linear methods for binary classification: linear support vector
-machines (SVMs) and logistic regression. For both methods, MLlib supports
+supports two linear methods for binary classification: linear Support Vector
+Machines (SVMs) and logistic regression. For both methods, MLlib supports
L1 and L2 regularized variants. The training data set is represented by an RDD
of [LabeledPoint](mllib-data-types.html) in MLlib. Note that, in the
mathematical formulation in this guide, a training label $y$ is denoted as
either $+1$ (positive) or $-1$ (negative), which is convenient for the
formulation. *However*, the negative label is represented by $0$ in MLlib
instead of $-1$, to be consistent with multiclass labeling.
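A minimal sketch of this label convention, assuming an existing `SparkContext` named `sc` (as in the Spark shell); the feature values are made up.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Binary labels in MLlib are 0.0 (negative) and 1.0 (positive), even though
// the formulas in this guide write them as -1 and +1.
val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.2, 0.4)),  // positive example
  LabeledPoint(0.0, Vectors.dense(-0.5, 1.1))  // negative example
))
```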

-### Linear support vector machines (SVMs)
+### Linear Support Vector Machines (SVMs)
The [linear SVM](http://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
is a standard method for large-scale classification tasks. It is a linear method as described above in equation `$\eqref{eq:regPrimal}$`, with the loss function in the formulation given by the hinge loss:
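For reference, the hinge loss named here is conventionally written as below; the generic $\mathbf{w}$ (weights) and $\mathbf{x}$ (features) stand in for the guide's own notation macros.

```latex
L(\mathbf{w}; \mathbf{x}, y) := \max\{0,\; 1 - y\, \mathbf{w}^{T}\mathbf{x}\}, \qquad y \in \{+1, -1\}
```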
docs/mllib-optimization.md (+9 / -0 lines)
@@ -138,6 +138,15 @@ vertical scalability issue (the number of training features) when computing the
explicitly in Newton's method. As a result, L-BFGS often achieves faster convergence compared with
other first-order optimization methods.

+### Choosing an Optimization Method
+
+[Linear methods](mllib-linear-methods.html) use optimization internally, and some linear methods in MLlib support both SGD and L-BFGS.
+We give a few guidelines for choosing between the two methods.
+Note, however, that different optimization methods can have different convergence guarantees depending on the properties of the objective function; a full treatment is beyond the scope of this guide.
+
+* L-BFGS is recommended since it generally converges faster (in fewer iterations) than SGD.
+* SGD can be faster for datasets with a very large number of instances (rows), especially when using a small `miniBatchFraction`; see the sketch below.
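A minimal sketch of the second bullet, again in the RDD-based Scala API: `training` is an assumed `RDD[LabeledPoint]`, and the iteration count, step size, and `miniBatchFraction` values are purely illustrative.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// On a dataset with very many rows, sampling a small fraction of the data per
// SGD step makes each iteration much cheaper, at some cost in gradient accuracy.
def trainOnLargeData(training: RDD[LabeledPoint]) = {
  val svm = new SVMWithSGD()
  svm.optimizer
    .setNumIterations(200)
    .setStepSize(1.0)
    .setMiniBatchFraction(0.1) // use roughly 10% of the rows per iteration
  svm.run(training)
}
```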
## Implementation in MLlib
### Gradient descent and stochastic gradient descent