Commit 3c456a8

update NB user guide
1 parent 17bba53 commit 3c456a8

1 file changed: +5 -4 lines changed

docs/mllib-naive-bayes.md

Lines changed: 5 additions & 4 deletions
@@ -21,7 +21,7 @@ Within that context, each observation is a document and each
 feature represents a term whose value is the frequency of the term (in multinomial naive Bayes) or
 a zero or one indicating whether the term was found in the document (in Bernoulli naive Bayes).
 Feature values must be nonnegative. The model type is selected with an optional parameter
-"Multinomial" or "Bernoulli" with "Multinomial" as the default.
+"multinomial" or "bernoulli" with "multinomial" as the default.
 [Additive smoothing](http://en.wikipedia.org/wiki/Lidstone_smoothing) can be used by
 setting the parameter $\lambda$ (default to $1.0$). For document classification, the input feature
 vectors are usually sparse, and sparse vectors should be supplied as input to take advantage of
@@ -35,7 +35,7 @@ sparsity. Since the training data is only used once, it is not necessary to cach
 [NaiveBayes](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayes$) implements
 multinomial naive Bayes. It takes an RDD of
 [LabeledPoint](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint) and an optional
-smoothing parameter `lambda` as input, an optional model type parameter (default is Multinomial), and outputs a
+smoothing parameter `lambda` as input, an optional model type parameter (default is "multinomial"), and outputs a
 [NaiveBayesModel](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayesModel), which
 can be used for evaluation and prediction.
 
@@ -54,7 +54,7 @@ val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
 val training = splits(0)
 val test = splits(1)
 
-val model = NaiveBayes.train(training, lambda = 1.0, model = "Multinomial")
+val model = NaiveBayes.train(training, lambda = 1.0, model = "multinomial")
 
 val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
 val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
@@ -75,14 +75,15 @@ optionally smoothing parameter `lambda` as input, and output a
 can be used for evaluation and prediction.
 
 {% highlight java %}
+import scala.Tuple2;
+
 import org.apache.spark.api.java.JavaPairRDD;
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.function.Function;
 import org.apache.spark.api.java.function.PairFunction;
 import org.apache.spark.mllib.classification.NaiveBayes;
 import org.apache.spark.mllib.classification.NaiveBayesModel;
 import org.apache.spark.mllib.regression.LabeledPoint;
-import scala.Tuple2;
 
 JavaRDD<LabeledPoint> training = ... // training set
 JavaRDD<LabeledPoint> test = ... // test set
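
Side note for readers of this diff: the guide text in the first hunk distinguishes term-frequency features (multinomial) from zero/one indicator features (Bernoulli) and mentions additive smoothing controlled by $\lambda$. The plain-Java sketch below illustrates both ideas. It is not Spark MLlib code: the class `NaiveBayesSketch` and its method names are hypothetical, and the smoothing uses the standard Lidstone form $(\text{count}_j + \lambda) / (\text{total} + \lambda \cdot n)$ for $n$ features, which may differ in detail from what the library does internally.

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and this is not the MLlib implementation.
final class NaiveBayesSketch {

    // Bernoulli-style features: 1.0 if the term occurs in the document, else 0.0.
    static double[] toIndicators(double[] termCounts) {
        double[] out = new double[termCounts.length];
        for (int i = 0; i < termCounts.length; i++) {
            out[i] = termCounts[i] > 0.0 ? 1.0 : 0.0;
        }
        return out;
    }

    // Multinomial log-likelihoods for one class with additive (Lidstone) smoothing:
    // log((count_j + lambda) / (total + lambda * numFeatures)).
    static double[] smoothedLogTheta(double[] classTermCounts, double lambda) {
        double total = Arrays.stream(classTermCounts).sum();
        int n = classTermCounts.length;
        double logDenom = Math.log(total + lambda * n);
        double[] logTheta = new double[n];
        for (int j = 0; j < n; j++) {
            logTheta[j] = Math.log(classTermCounts[j] + lambda) - logDenom;
        }
        return logTheta;
    }
}
```

With per-class term counts `{3.0, 0.0, 1.0}` and `lambda = 1.0`, the smoothed probabilities are 4/7, 1/7 and 2/7, so a term never seen in the class still receives nonzero probability.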
