
Commit ca43be2

mengxr authored and pdeyhim committed
[WIP][SPARK-1871][MLLIB] Improve MLlib guide for v1.0
Some improvements to MLlib guide:

1. [SPARK-1872] Update API links for unidoc.
2. [SPARK-1783] Added `page.displayTitle` to the global layout. If it is defined, use it instead of `page.title` for title display.
3. Add more Java/Python examples.

Author: Xiangrui Meng <[email protected]>

Closes apache#816 from mengxr/mllib-doc and squashes the following commits:

ec2e407 [Xiangrui Meng] format scala example for ALS
cd9f40b [Xiangrui Meng] add a paragraph to summarize distributed matrix types
4617f04 [Xiangrui Meng] add python example to loadLibSVMFile and fix Java example
d6509c2 [Xiangrui Meng] [SPARK-1783] update mllib titles
561fdc0 [Xiangrui Meng] add a displayTitle option to global layout
195d06f [Xiangrui Meng] add Java example for summary stats and minor fix
9f1ff89 [Xiangrui Meng] update java api links in mllib-basics
7dad18e [Xiangrui Meng] update java api links in NB
3a0f4a6 [Xiangrui Meng] api/pyspark -> api/python
35bdeb9 [Xiangrui Meng] api/mllib -> api/scala
e4afaa8 [Xiangrui Meng] explicity state what might change
1 parent 515d888 commit ca43be2

10 files changed (+153, −90 lines)

docs/_layouts/global.html

Lines changed: 5 additions & 1 deletion
@@ -114,7 +114,11 @@
         </div>
 
         <div class="container" id="content">
-          <h1 class="title">{{ page.title }}</h1>
+          {% if page.displayTitle %}
+            <h1 class="title">{{ page.displayTitle }}</h1>
+          {% else %}
+            <h1 class="title">{{ page.title }}</h1>
+          {% endif %}
 
           {{ content }}
 

docs/mllib-basics.md

Lines changed: 86 additions & 39 deletions
@@ -1,6 +1,7 @@
 ---
 layout: global
-title: <a href="mllib-guide.html">MLlib</a> - Basics
+title: Basics - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Basics
 ---
 
 * Table of contents
@@ -26,11 +27,11 @@ of the vector.
 <div data-lang="scala" markdown="1">
 
 The base class of local vectors is
-[`Vector`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector), and we provide two
-implementations: [`DenseVector`](api/mllib/index.html#org.apache.spark.mllib.linalg.DenseVector) and
-[`SparseVector`](api/mllib/index.html#org.apache.spark.mllib.linalg.SparseVector). We recommend
+[`Vector`](api/scala/index.html#org.apache.spark.mllib.linalg.Vector), and we provide two
+implementations: [`DenseVector`](api/scala/index.html#org.apache.spark.mllib.linalg.DenseVector) and
+[`SparseVector`](api/scala/index.html#org.apache.spark.mllib.linalg.SparseVector). We recommend
 using the factory methods implemented in
-[`Vectors`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector) to create local vectors.
+[`Vectors`](api/scala/index.html#org.apache.spark.mllib.linalg.Vector) to create local vectors.
 
 {% highlight scala %}
 import org.apache.spark.mllib.linalg.{Vector, Vectors}
@@ -53,11 +54,11 @@ Scala imports `scala.collection.immutable.Vector` by default, so you have to imp
 <div data-lang="java" markdown="1">
 
 The base class of local vectors is
-[`Vector`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector), and we provide two
-implementations: [`DenseVector`](api/mllib/index.html#org.apache.spark.mllib.linalg.DenseVector) and
-[`SparseVector`](api/mllib/index.html#org.apache.spark.mllib.linalg.SparseVector). We recommend
+[`Vector`](api/java/org/apache/spark/mllib/linalg/Vector.html), and we provide two
+implementations: [`DenseVector`](api/java/org/apache/spark/mllib/linalg/DenseVector.html) and
+[`SparseVector`](api/java/org/apache/spark/mllib/linalg/SparseVector.html). We recommend
 using the factory methods implemented in
-[`Vectors`](api/mllib/index.html#org.apache.spark.mllib.linalg.Vector) to create local vectors.
+[`Vectors`](api/java/org/apache/spark/mllib/linalg/Vector.html) to create local vectors.
 
 {% highlight java %}
 import org.apache.spark.mllib.linalg.Vector;
@@ -78,13 +79,13 @@ MLlib recognizes the following types as dense vectors:
 
 and the following as sparse vectors:
 
-* MLlib's [`SparseVector`](api/pyspark/pyspark.mllib.linalg.SparseVector-class.html).
+* MLlib's [`SparseVector`](api/python/pyspark.mllib.linalg.SparseVector-class.html).
 * SciPy's
   [`csc_matrix`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html#scipy.sparse.csc_matrix)
   with a single column
 
 We recommend using NumPy arrays over lists for efficiency, and using the factory methods implemented
-in [`Vectors`](api/pyspark/pyspark.mllib.linalg.Vectors-class.html) to create sparse vectors.
+in [`Vectors`](api/python/pyspark.mllib.linalg.Vectors-class.html) to create sparse vectors.
 
 {% highlight python %}
 import numpy as np
@@ -117,7 +118,7 @@ For multiclass classification, labels should be class indices staring from zero:
 <div data-lang="scala" markdown="1">
 
 A labeled point is represented by the case class
-[`LabeledPoint`](api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint).
+[`LabeledPoint`](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint).
 
 {% highlight scala %}
 import org.apache.spark.mllib.linalg.Vectors
@@ -134,7 +135,7 @@ val neg = LabeledPoint(0.0, Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0)))
 <div data-lang="java" markdown="1">
 
 A labeled point is represented by
-[`LabeledPoint`](api/mllib/index.html#org.apache.spark.mllib.regression.LabeledPoint).
+[`LabeledPoint`](api/java/org/apache/spark/mllib/regression/LabeledPoint.html).
 
 {% highlight java %}
 import org.apache.spark.mllib.linalg.Vectors;
@@ -151,7 +152,7 @@ LabeledPoint neg = new LabeledPoint(1.0, Vectors.sparse(3, new int[] {0, 2}, new
 <div data-lang="python" markdown="1">
 
 A labeled point is represented by
-[`LabeledPoint`](api/pyspark/pyspark.mllib.regression.LabeledPoint-class.html).
+[`LabeledPoint`](api/python/pyspark.mllib.regression.LabeledPoint-class.html).
 
 {% highlight python %}
 from pyspark.mllib.linalg import SparseVector
@@ -184,28 +185,40 @@ After loading, the feature indices are converted to zero-based.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
-[`MLUtils.loadLibSVMFile`](api/mllib/index.html#org.apache.spark.mllib.util.MLUtils$) reads training
+[`MLUtils.loadLibSVMFile`](api/scala/index.html#org.apache.spark.mllib.util.MLUtils$) reads training
 examples stored in LIBSVM format.
 
 {% highlight scala %}
 import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.util.MLUtils
 import org.apache.spark.rdd.RDD
 
-val training: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
+val examples: RDD[LabeledPoint] = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
 {% endhighlight %}
 </div>
 
 <div data-lang="java" markdown="1">
-[`MLUtils.loadLibSVMFile`](api/mllib/index.html#org.apache.spark.mllib.util.MLUtils$) reads training
+[`MLUtils.loadLibSVMFile`](api/java/org/apache/spark/mllib/util/MLUtils.html) reads training
 examples stored in LIBSVM format.
 
 {% highlight java %}
 import org.apache.spark.mllib.regression.LabeledPoint;
 import org.apache.spark.mllib.util.MLUtils;
-import org.apache.spark.rdd.RDDimport;
+import org.apache.spark.api.java.JavaRDD;
+
+JavaRDD<LabeledPoint> examples =
+  MLUtils.loadLibSVMFile(jsc.sc(), "mllib/data/sample_libsvm_data.txt").toJavaRDD();
+{% endhighlight %}
+</div>
+
+<div data-lang="python" markdown="1">
+[`MLUtils.loadLibSVMFile`](api/python/pyspark.mllib.util.MLUtils-class.html) reads training
+examples stored in LIBSVM format.
 
-RDD<LabeledPoint> training = MLUtils.loadLibSVMFile(jsc, "mllib/data/sample_libsvm_data.txt");
+{% highlight python %}
+from pyspark.mllib.util import MLUtils
+
+examples = MLUtils.loadLibSVMFile(sc, "mllib/data/sample_libsvm_data.txt")
 {% endhighlight %}
 </div>
 </div>
@@ -227,10 +240,10 @@ We are going to add sparse matrix in the next release.
 <div data-lang="scala" markdown="1">
 
 The base class of local matrices is
-[`Matrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.Matrix), and we provide one
-implementation: [`DenseMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.DenseMatrix).
+[`Matrix`](api/scala/index.html#org.apache.spark.mllib.linalg.Matrix), and we provide one
+implementation: [`DenseMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.DenseMatrix).
 Sparse matrix will be added in the next release. We recommend using the factory methods implemented
-in [`Matrices`](api/mllib/index.html#org.apache.spark.mllib.linalg.Matrices) to create local
+in [`Matrices`](api/scala/index.html#org.apache.spark.mllib.linalg.Matrices) to create local
 matrices.
 
 {% highlight scala %}
@@ -244,10 +257,10 @@ val dm: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
 <div data-lang="java" markdown="1">
 
 The base class of local matrices is
-[`Matrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.Matrix), and we provide one
-implementation: [`DenseMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.DenseMatrix).
+[`Matrix`](api/java/org/apache/spark/mllib/linalg/Matrix.html), and we provide one
+implementation: [`DenseMatrix`](api/java/org/apache/spark/mllib/linalg/DenseMatrix.html).
 Sparse matrix will be added in the next release. We recommend using the factory methods implemented
-in [`Matrices`](api/mllib/index.html#org.apache.spark.mllib.linalg.Matrices) to create local
+in [`Matrices`](api/java/org/apache/spark/mllib/linalg/Matrices.html) to create local
 matrices.
 
 {% highlight java %}
@@ -269,6 +282,15 @@ and distributed matrices. Converting a distributed matrix to a different format
 global shuffle, which is quite expensive. We implemented three types of distributed matrices in
 this release and will add more types in the future.
 
+The basic type is called `RowMatrix`. A `RowMatrix` is a row-oriented distributed
+matrix without meaningful row indices, e.g., a collection of feature vectors.
+It is backed by an RDD of its rows, where each row is a local vector.
+We assume that the number of columns is not huge for a `RowMatrix`.
+An `IndexedRowMatrix` is similar to a `RowMatrix` but with row indices,
+which can be used for identifying rows and joins.
+A `CoordinateMatrix` is a distributed matrix stored in [coordinate list (COO)](https://en.wikipedia.org/wiki/Sparse_matrix) format,
+backed by an RDD of its entries.
+
 ***Note***
 
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size.
@@ -284,7 +306,7 @@ limited by the integer range but it should be much smaller in practice.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
-A [`RowMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix) can be
+A [`RowMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix) can be
 created from an `RDD[Vector]` instance. Then we can compute its column summary statistics.
 
 {% highlight scala %}
@@ -303,7 +325,7 @@ val n = mat.numCols()
 
 <div data-lang="java" markdown="1">
 
-A [`RowMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix) can be
+A [`RowMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html) can be
 created from a `JavaRDD<Vector>` instance. Then we can compute its column summary statistics.
 
 {% highlight java %}
@@ -333,8 +355,8 @@ which could be faster if the rows are sparse.
 <div class="codetabs">
 <div data-lang="scala" markdown="1">
 
-`RowMatrix#computeColumnSummaryStatistics` returns an instance of
-[`MultivariateStatisticalSummary`](api/mllib/index.html#org.apache.spark.mllib.stat.MultivariateStatisticalSummary),
+[`RowMatrix#computeColumnSummaryStatistics`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix) returns an instance of
+[`MultivariateStatisticalSummary`](api/scala/index.html#org.apache.spark.mllib.stat.MultivariateStatisticalSummary),
 which contains the column-wise max, min, mean, variance, and number of nonzeros, as well as the
 total count.
 
@@ -355,6 +377,31 @@ println(summary.numNonzeros) // number of nonzeros in each column
 val cov: Matrix = mat.computeCovariance()
 {% endhighlight %}
 </div>
+
+<div data-lang="java" markdown="1">
+
+[`RowMatrix#computeColumnSummaryStatistics`](api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html#computeColumnSummaryStatistics()) returns an instance of
+[`MultivariateStatisticalSummary`](api/java/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.html),
+which contains the column-wise max, min, mean, variance, and number of nonzeros, as well as the
+total count.
+
+{% highlight java %}
+import org.apache.spark.mllib.linalg.Matrix;
+import org.apache.spark.mllib.linalg.distributed.RowMatrix;
+import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
+
+RowMatrix mat = ... // a RowMatrix
+
+// Compute column summary statistics.
+MultivariateStatisticalSummary summary = mat.computeColumnSummaryStatistics();
+System.out.println(summary.mean()); // a dense vector containing the mean value for each column
+System.out.println(summary.variance()); // column-wise variance
+System.out.println(summary.numNonzeros()); // number of nonzeros in each column
+
+// Compute the covariance matrix.
+Matrix cov = mat.computeCovariance();
+{% endhighlight %}
+</div>
 </div>
 
 ### IndexedRowMatrix
@@ -366,9 +413,9 @@ an RDD of indexed rows, which each row is represented by its index (long-typed)
 <div data-lang="scala" markdown="1">
 
 An
-[`IndexedRowMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix)
+[`IndexedRowMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix)
 can be created from an `RDD[IndexedRow]` instance, where
-[`IndexedRow`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRow) is a
+[`IndexedRow`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRow) is a
 wrapper over `(Long, Vector)`. An `IndexedRowMatrix` can be converted to a `RowMatrix` by dropping
 its row indices.
 
@@ -391,9 +438,9 @@ val rowMat: RowMatrix = mat.toRowMatrix()
 <div data-lang="java" markdown="1">
 
 An
-[`IndexedRowMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix)
+[`IndexedRowMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.html)
 can be created from an `JavaRDD<IndexedRow>` instance, where
-[`IndexedRow`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRow) is a
+[`IndexedRow`](api/java/org/apache/spark/mllib/linalg/distributed/IndexedRow.html) is a
 wrapper over `(long, Vector)`. An `IndexedRowMatrix` can be converted to a `RowMatrix` by dropping
 its row indices.
 
@@ -427,9 +474,9 @@ dimensions of the matrix are huge and the matrix is very sparse.
 <div data-lang="scala" markdown="1">
 
 A
-[`CoordinateMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.CoordinateMatrix)
+[`CoordinateMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.CoordinateMatrix)
 can be created from an `RDD[MatrixEntry]` instance, where
-[`MatrixEntry`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.MatrixEntry) is a
+[`MatrixEntry`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.MatrixEntry) is a
 wrapper over `(Long, Long, Double)`. A `CoordinateMatrix` can be converted to a `IndexedRowMatrix`
 with sparse rows by calling `toIndexedRowMatrix`. In this release, we do not provide other
 computation for `CoordinateMatrix`.
@@ -453,21 +500,21 @@ val indexedRowMatrix = mat.toIndexedRowMatrix()
 <div data-lang="java" markdown="1">
 
 A
-[`CoordinateMatrix`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.CoordinateMatrix)
+[`CoordinateMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/CoordinateMatrix.html)
 can be created from a `JavaRDD<MatrixEntry>` instance, where
-[`MatrixEntry`](api/mllib/index.html#org.apache.spark.mllib.linalg.distributed.MatrixEntry) is a
+[`MatrixEntry`](api/java/org/apache/spark/mllib/linalg/distributed/MatrixEntry.html) is a
 wrapper over `(long, long, double)`. A `CoordinateMatrix` can be converted to a `IndexedRowMatrix`
 with sparse rows by calling `toIndexedRowMatrix`.
 
-{% highlight scala %}
+{% highlight java %}
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
 import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
 import org.apache.spark.mllib.linalg.distributed.MatrixEntry;
 
 JavaRDD<MatrixEntry> entries = ... // a JavaRDD of matrix entries
 // Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix mat = new CoordinateMatrix(entries);
+CoordinateMatrix mat = new CoordinateMatrix(entries.rdd());
 
 // Get its size.
 long m = mat.numRows();
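
For context alongside the summary paragraph this diff adds, here is a minimal Scala sketch constructing the three distributed matrix types it describes; it assumes a running `SparkContext` named `sc`, and the data values are purely illustrative:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{RowMatrix, IndexedRowMatrix, IndexedRow, CoordinateMatrix, MatrixEntry}

// RowMatrix: rows without meaningful indices, each row a local vector.
val rows = sc.parallelize(Seq(Vectors.dense(1.0, 2.0), Vectors.dense(3.0, 4.0)))
val rowMat = new RowMatrix(rows)

// IndexedRowMatrix: rows keyed by a long index, useful for joins.
val indexedRows = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(1.0, 2.0)),
  IndexedRow(1L, Vectors.dense(3.0, 4.0))))
val indexedMat = new IndexedRowMatrix(indexedRows)

// CoordinateMatrix: (i, j, value) entries in COO format.
val entries = sc.parallelize(Seq(MatrixEntry(0L, 0L, 1.0), MatrixEntry(1L, 1L, 4.0)))
val cooMat = new CoordinateMatrix(entries)

// A CoordinateMatrix converts to an IndexedRowMatrix with sparse rows.
val backToIndexed = cooMat.toIndexedRowMatrix()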

docs/mllib-clustering.md

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
 ---
 layout: global
-title: <a href="mllib-guide.html">MLlib</a> - Clustering
+title: Clustering - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Clustering
 ---
 
 * Table of contents
@@ -40,7 +41,7 @@ a given dataset, the algorithm returns the best clustering result).
 Following code snippets can be executed in `spark-shell`.
 
 In the following example after loading and parsing data, we use the
-[`KMeans`](api/mllib/index.html#org.apache.spark.mllib.clustering.KMeans) object to cluster the data
+[`KMeans`](api/scala/index.html#org.apache.spark.mllib.clustering.KMeans) object to cluster the data
 into two clusters. The number of desired clusters is passed to the algorithm. We then compute Within
 Set Sum of Squared Error (WSSSE). You can reduce this error measure by increasing *k*. In fact the
 optimal *k* is usually one where there is an "elbow" in the WSSSE graph.
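
The `KMeans` example this hunk links to follows the pattern below; as a minimal Scala sketch (assuming a `SparkContext` named `sc` and a hypothetical whitespace-delimited numeric input file `data/kmeans_data.txt`):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse the data: one space-separated vector per line.
val data = sc.textFile("data/kmeans_data.txt")
val parsedData = data.map(line => Vectors.dense(line.split(' ').map(_.toDouble)))

// Cluster the data into two classes using KMeans.
val numClusters = 2
val numIterations = 20
val clusters = KMeans.train(parsedData, numClusters, numIterations)

// Evaluate clustering by computing Within Set Sum of Squared Errors (WSSSE).
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)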

docs/mllib-collaborative-filtering.md

Lines changed: 17 additions & 12 deletions
@@ -1,6 +1,7 @@
 ---
 layout: global
-title: <a href="mllib-guide.html">MLlib</a> - Collaborative Filtering
+title: Collaborative Filtering - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Collaborative Filtering
 ---
 
 * Table of contents
@@ -48,7 +49,7 @@ user for an item.
 
 <div data-lang="scala" markdown="1">
 In the following example we load rating data. Each row consists of a user, a product and a rating.
-We use the default [ALS.train()](api/mllib/index.html#org.apache.spark.mllib.recommendation.ALS$)
+We use the default [ALS.train()](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS$)
 method which assumes ratings are explicit. We evaluate the
 recommendation model by measuring the Mean Squared Error of rating prediction.
 
@@ -58,25 +59,29 @@ import org.apache.spark.mllib.recommendation.Rating
 
 // Load and parse the data
 val data = sc.textFile("mllib/data/als/test.data")
-val ratings = data.map(_.split(',') match {
-  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
-})
+val ratings = data.map(_.split(',') match { case Array(user, item, rate) =>
+  Rating(user.toInt, item.toInt, rate.toDouble)
+})
 
 // Build the recommendation model using ALS
 val rank = 10
 val numIterations = 20
 val model = ALS.train(ratings, rank, numIterations, 0.01)
 
 // Evaluate the model on rating data
-val usersProducts = ratings.map{ case Rating(user, product, rate) => (user, product)}
-val predictions = model.predict(usersProducts).map{
-  case Rating(user, product, rate) => ((user, product), rate)
+val usersProducts = ratings.map { case Rating(user, product, rate) =>
+  (user, product)
 }
-val ratesAndPreds = ratings.map{
-  case Rating(user, product, rate) => ((user, product), rate)
+val predictions =
+  model.predict(usersProducts).map { case Rating(user, product, rate) =>
+    ((user, product), rate)
+  }
+val ratesAndPreds = ratings.map { case Rating(user, product, rate) =>
+  ((user, product), rate)
 }.join(predictions)
-val MSE = ratesAndPreds.map{
-  case ((user, product), (r1, r2)) => math.pow((r1- r2), 2)
+val MSE = ratesAndPreds.map { case ((user, product), (r1, r2)) =>
+  val err = (r1 - r2)
+  err * err
 }.mean()
 println("Mean Squared Error = " + MSE)
 {% endhighlight %}
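
Beyond the batch evaluation shown above, the trained `MatrixFactorizationModel` can also score a single user-product pair; a one-line Scala sketch, with illustrative ids:

// Predict one rating for a single (user, product) pair; the ids are illustrative.
val predictedRating: Double = model.predict(1, 2)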

docs/mllib-decision-tree.md

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 ---
 layout: global
-title: <a href="mllib-guide.html">MLlib</a> - Decision Tree
+title: Decision Tree - MLlib
+displayTitle: <a href="mllib-guide.html">MLlib</a> - Decision Tree
 ---
 
 * Table of contents
