Commit cd9f40b

add a paragraph to summarize distributed matrix types
1 parent 4617f04 commit cd9f40b

1 file changed

+9
-0
lines changed

docs/mllib-basics.md

Lines changed: 9 additions & 0 deletions
@@ -282,6 +282,15 @@ and distributed matrices. Converting a distributed matrix to a different format
 global shuffle, which is quite expensive. We implemented three types of distributed matrices in
 this release and will add more types in the future.
 
+The basic type is called `RowMatrix`. A `RowMatrix` is a row-oriented distributed
+matrix without meaningful row indices, e.g., a collection of feature vectors.
+It is backed by an RDD of its rows, where each row is a local vector.
+We assume that the number of columns is not huge for a `RowMatrix`.
+An `IndexedRowMatrix` is similar to a `RowMatrix` but with row indices,
+which can be used for identifying rows and joins.
+A `CoordinateMatrix` is a distributed matrix stored in [coordinate list (COO)](https://en.wikipedia.org/wiki/Sparse_matrix) format,
+backed by an RDD of its entries.
+
 ***Note***
 
 The underlying RDDs of a distributed matrix must be deterministic, because we cache the matrix size.
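
As a reading aid for the added paragraph, the three storage layouts can be sketched in plain Python. This is a rough illustration only, not the MLlib API: ordinary lists stand in for RDDs, and the names `IndexedRow`, `MatrixEntry`, `to_indexed_rows`, `to_coordinate_entries`, and `join_labels` are hypothetical helpers introduced here. Using row positions as indices is also an assumption, since the rows of a `RowMatrix` carry no meaningful indices.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration; these are NOT the MLlib classes.
IndexedRow = namedtuple("IndexedRow", ["index", "vector"])
MatrixEntry = namedtuple("MatrixEntry", ["i", "j", "value"])


def to_indexed_rows(rows):
    """Attach an index to every row (here, simply its position in the list)."""
    return [IndexedRow(i, list(v)) for i, v in enumerate(rows)]


def to_coordinate_entries(indexed_rows):
    """Flatten indexed rows into COO entries (i, j, value), skipping zeros."""
    return [
        MatrixEntry(row.index, j, x)
        for row in indexed_rows
        for j, x in enumerate(row.vector)
        if x != 0.0
    ]


def join_labels(indexed_rows, labels):
    """Join rows with a dict keyed by row index; unmatched rows are dropped."""
    return [
        (row.index, row.vector, labels[row.index])
        for row in indexed_rows
        if row.index in labels
    ]


# Row-oriented layout: a collection of feature vectors, no row indices.
row_matrix = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]

# Indexed-row layout: the same rows, each identified by an index.
indexed = to_indexed_rows(row_matrix)

# Coordinate (COO) layout: one entry per non-zero value.
entries = to_coordinate_entries(indexed)

# Row indices make joins against other keyed data possible.
labeled = join_labels(indexed, {1: "b"})
```

The sketch shows why the formats differ in cost: the row layouts keep whole vectors together, while the coordinate layout stores one record per non-zero entry, so moving between them means regrouping the data.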
