Commit 6313bab

The documentation for BlockMatrix should come after RowMatrix, IndexedRowMatrix, and CoordinateMatrix, because BlockMatrix references the latter three types and RowMatrix is considered the "basic" distributed matrix. This ordering makes the "Distributed matrix" section easier to follow, especially for new readers.
1 parent 4de74d2 commit 6313bab

1 file changed: +64 -64 lines changed

docs/mllib-data-types.md

Lines changed: 64 additions & 64 deletions
@@ -296,70 +296,6 @@ backed by an RDD of its entries.
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size.
 In general the use of non-deterministic RDDs can lead to errors.
 
-### BlockMatrix
-
-A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
-a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the block, and `Matrix` is
-the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
-`BlockMatrix` supports methods such as `add` and `multiply` with another `BlockMatrix`.
-`BlockMatrix` also has a helper function `validate` which can be used to check whether the
-`BlockMatrix` is set up properly.
-
-<div class="codetabs">
-<div data-lang="scala" markdown="1">
-
-A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight scala %}
-import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
-
-val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
-// Create a CoordinateMatrix from an RDD[MatrixEntry].
-val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
-// Transform the CoordinateMatrix to a BlockMatrix
-val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
-// Nothing happens if it is valid.
-matA.validate()
-
-// Calculate A^T A.
-val ata = matA.transpose.multiply(matA)
-{% endhighlight %}
-</div>
-
-<div data-lang="java" markdown="1">
-
-A [`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) can be
-most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
-`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
-Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
-
-{% highlight java %}
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
-import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
-import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
-
-JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
-// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
-CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
-// Transform the CoordinateMatrix to a BlockMatrix
-BlockMatrix matA = coordMat.toBlockMatrix().cache();
-
-// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
-// Nothing happens if it is valid.
-matA.validate();
-
-// Calculate A^T A.
-BlockMatrix ata = matA.transpose().multiply(matA);
-{% endhighlight %}
-</div>
-</div>
-
 ### RowMatrix
 
 A `RowMatrix` is a row-oriented distributed matrix without meaningful row indices, backed by an RDD
@@ -530,3 +466,67 @@ IndexedRowMatrix indexedRowMatrix = mat.toIndexedRowMatrix();
 {% endhighlight %}
 </div>
 </div>
+
+### BlockMatrix
+
+A `BlockMatrix` is a distributed matrix backed by an RDD of `MatrixBlock`s, where a `MatrixBlock` is
+a tuple of `((Int, Int), Matrix)`, where the `(Int, Int)` is the index of the block, and `Matrix` is
+the sub-matrix at the given index with size `rowsPerBlock` x `colsPerBlock`.
+`BlockMatrix` supports methods such as `add` and `multiply` with another `BlockMatrix`.
+`BlockMatrix` also has a helper function `validate` which can be used to check whether the
+`BlockMatrix` is set up properly.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+A [`BlockMatrix`](api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix) can be
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
+`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
+Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+{% highlight scala %}
+import org.apache.spark.mllib.linalg.distributed.{BlockMatrix, CoordinateMatrix, MatrixEntry}
+
+val entries: RDD[MatrixEntry] = ... // an RDD of (i, j, v) matrix entries
+// Create a CoordinateMatrix from an RDD[MatrixEntry].
+val coordMat: CoordinateMatrix = new CoordinateMatrix(entries)
+// Transform the CoordinateMatrix to a BlockMatrix
+val matA: BlockMatrix = coordMat.toBlockMatrix().cache()
+
+// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Nothing happens if it is valid.
+matA.validate()
+
+// Calculate A^T A.
+val ata = matA.transpose.multiply(matA)
+{% endhighlight %}
+</div>
+
+<div data-lang="java" markdown="1">
+
+A [`BlockMatrix`](api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html) can be
+most easily created from an `IndexedRowMatrix` or `CoordinateMatrix` by calling `toBlockMatrix`.
+`toBlockMatrix` creates blocks of size 1024 x 1024 by default.
+Users may change the block size by supplying the values through `toBlockMatrix(rowsPerBlock, colsPerBlock)`.
+
+{% highlight java %}
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.mllib.linalg.distributed.BlockMatrix;
+import org.apache.spark.mllib.linalg.distributed.CoordinateMatrix;
+import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix;
+
+JavaRDD<MatrixEntry> entries = ... // a JavaRDD of (i, j, v) Matrix Entries
+// Create a CoordinateMatrix from a JavaRDD<MatrixEntry>.
+CoordinateMatrix coordMat = new CoordinateMatrix(entries.rdd());
+// Transform the CoordinateMatrix to a BlockMatrix
+BlockMatrix matA = coordMat.toBlockMatrix().cache();
+
+// Validate whether the BlockMatrix is set up properly. Throws an Exception when it is not valid.
+// Nothing happens if it is valid.
+matA.validate();
+
+// Calculate A^T A.
+BlockMatrix ata = matA.transpose().multiply(matA);
+{% endhighlight %}
+</div>
+</div>
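
The moved section describes a `MatrixBlock` as a tuple of `((Int, Int), Matrix)` keyed by block index, with blocks of size `rowsPerBlock` x `colsPerBlock`. The sketch below is a plain-Java illustration of how a global entry `(i, j, v)` could be assigned to such a block using integer division. It does not use Spark, and the class `BlockIndexSketch` and all of its methods are hypothetical names chosen for this example, not part of the MLlib API; the actual `toBlockMatrix` implementation may differ in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; NOT part of Spark MLlib.
public class BlockIndexSketch {

    // Index (blockRow, blockCol) of the block that holds global entry (i, j).
    public static long[] blockIndex(long i, long j, int rowsPerBlock, int colsPerBlock) {
        return new long[] { i / rowsPerBlock, j / colsPerBlock };
    }

    // Position of global entry (i, j) inside its block.
    public static int[] localOffset(long i, long j, int rowsPerBlock, int colsPerBlock) {
        return new int[] { (int) (i % rowsPerBlock), (int) (j % colsPerBlock) };
    }

    // Number of blocks needed to cover `size` rows (or columns): ceiling division.
    public static long numBlocks(long size, int perBlock) {
        return (size + perBlock - 1) / perBlock;
    }

    // Group {i, j, value} triples by block; the key is the string "blockRow,blockCol".
    // Each stored triple is {localRow, localCol, value}.
    public static Map<String, List<double[]>> groupByBlock(
            double[][] entries, int rowsPerBlock, int colsPerBlock) {
        Map<String, List<double[]>> blocks = new TreeMap<>();
        for (double[] e : entries) {
            long i = (long) e[0];
            long j = (long) e[1];
            long[] idx = blockIndex(i, j, rowsPerBlock, colsPerBlock);
            int[] off = localOffset(i, j, rowsPerBlock, colsPerBlock);
            blocks.computeIfAbsent(idx[0] + "," + idx[1], k -> new ArrayList<>())
                  .add(new double[] { off[0], off[1], e[2] });
        }
        return blocks;
    }
}
```

Under this scheme, with the default 1024 x 1024 block size mentioned in the documentation, the entry at global position (2500, 1023) would land in block (2, 0) at local offset (452, 1023).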
