This repository was archived by the owner on May 9, 2024. It is now read-only.
File: docs/ml-features.md
Lines changed: 89 additions & 0 deletions
@@ -106,6 +106,95 @@ for features_label in featurized.select("features", "label").take(3):
</div>
</div>

## Word2Vec

`Word2Vec` is an `Estimator` which takes sequences of words representing documents and trains a `Word2VecModel`. The model is essentially a `Map(String, Vector)`, which maps each word to a unique fixed-size vector. The `Word2VecModel` transforms each document into a vector by averaging the vectors of all words in the document; this document vector can then be used in further computations, such as document similarity calculations. Please refer to the [MLlib user guide on Word2Vec](mllib-feature-extraction.html#Word2Vec) for more details on Word2Vec.
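
The averaging step described above can be illustrated without Spark. The following is only a minimal sketch in plain Scala: the object `AverageWordVectors`, the helper `documentVector`, and the tiny word table are all hypothetical names and made-up values, not part of the Spark API. Skipping unknown words while still dividing by the full document length is an assumption of this sketch, not a statement about how `Word2VecModel` behaves.

```scala
// Plain-Scala illustration of turning a document into the average of its word vectors.
// Everything here is hypothetical: the table stands in for a trained model.
object AverageWordVectors {
  // Made-up word vectors of size 3, used for illustration only.
  val wordVectors: Map[String, Array[Double]] = Map(
    "spark" -> Array(1.0, 0.0, 2.0),
    "is"    -> Array(0.0, 1.0, 0.0),
    "fast"  -> Array(2.0, 2.0, 1.0)
  )

  /** Sums the vectors of the words in `doc` and divides by the number of words.
    * Assumption of this sketch: words missing from `table` contribute nothing
    * to the sum, and an empty document maps to the zero vector. */
  def documentVector(
      doc: Seq[String],
      table: Map[String, Array[Double]],
      size: Int): Array[Double] = {
    val sum = Array.fill(size)(0.0)
    for (word <- doc; vec <- table.get(word); i <- 0 until size) {
      sum(i) += vec(i)
    }
    if (doc.isEmpty) sum else sum.map(_ / doc.size)
  }
}
```

For example, `AverageWordVectors.documentVector(Seq("spark", "fast"), AverageWordVectors.wordVectors, 3)` averages `(1.0, 0.0, 2.0)` and `(2.0, 2.0, 1.0)` component-wise, giving `(1.5, 1.0, 1.5)`.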

Word2Vec is implemented in [Word2Vec](api/scala/index.html#org.apache.spark.ml.feature.Word2Vec). In the following code segment, we start with a set of documents, each of which is represented as a sequence of words. We transform each document into a feature vector, which can then be passed to a learning algorithm.

<div class="codetabs">
<div data-lang="scala" markdown="1">
{% highlight scala %}
import org.apache.spark.ml.feature.Word2Vec

// Input data: Each row is a bag of words from a sentence or document.