@@ -21,7 +21,7 @@ Within that context, each observation is a document and each
feature represents a term whose value is the frequency of the term (in multinomial naive Bayes) or
a zero or one indicating whether the term was found in the document (in Bernoulli naive Bayes).
Feature values must be nonnegative. The model type is selected with an optional parameter
-"Multinomial" or "Bernoulli" with "Multinomial" as the default.
+"multinomial" or "bernoulli" with "multinomial" as the default.
[Additive smoothing](http://en.wikipedia.org/wiki/Lidstone_smoothing) can be used by
setting the parameter $\lambda$ (default to $1.0$). For document classification, the input feature
vectors are usually sparse, and sparse vectors should be supplied as input to take advantage of
@@ -35,7 +35,7 @@ sparsity. Since the training data is only used once, it is not necessary to cach
[NaiveBayes](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayes$) implements
multinomial naive Bayes. It takes an RDD of
[LabeledPoint](api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint) and an optional
-smoothing parameter `lambda` as input, an optional model type parameter (default is Multinomial), and outputs a
+smoothing parameter `lambda` as input, an optional model type parameter (default is "multinomial"), and outputs a
[NaiveBayesModel](api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayesModel), which
can be used for evaluation and prediction.
@@ -54,7 +54,7 @@ val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0)
val test = splits(1)

-val model = NaiveBayes.train(training, lambda = 1.0, model = "Multinomial")
+val model = NaiveBayes.train(training, lambda = 1.0, model = "multinomial")

val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
@@ -75,14 +75,15 @@ optionally smoothing parameter `lambda` as input, and output a
can be used for evaluation and prediction.

{% highlight java %}
+import scala.Tuple2;
+
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.mllib.classification.NaiveBayes;
import org.apache.spark.mllib.classification.NaiveBayesModel;
import org.apache.spark.mllib.regression.LabeledPoint;
-import scala.Tuple2;

JavaRDD<LabeledPoint> training = ... // training set
JavaRDD<LabeledPoint> test = ... // test set
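As a supplement to the doc text this diff touches: the multinomial/Bernoulli distinction and the additive smoothing parameter `$\lambda$` can be sketched in plain Java, outside Spark. This is a hypothetical toy illustration of the math only (the class and method names are invented for this sketch, not MLlib API), assuming a fixed vocabulary and per-class term counts:

```java
import java.util.Arrays;

// Toy sketch (not MLlib code) of the two model types the doc describes:
// multinomial features are term frequencies, Bernoulli features are 0/1
// presence flags, and additive (Lidstone) smoothing adds lambda per term.
public class NaiveBayesSketch {

    // Multinomial: value = frequency of each vocabulary term in the document.
    static double[] multinomialFeatures(String[] doc, String[] vocab) {
        double[] f = new double[vocab.length];
        for (int i = 0; i < vocab.length; i++)
            for (String w : doc)
                if (w.equals(vocab[i])) f[i] += 1.0;
        return f;
    }

    // Bernoulli: value = 1 if the term occurs at least once, else 0.
    static double[] bernoulliFeatures(String[] doc, String[] vocab) {
        double[] f = multinomialFeatures(doc, vocab);
        for (int i = 0; i < f.length; i++) f[i] = f[i] > 0 ? 1.0 : 0.0;
        return f;
    }

    // Additive smoothing of per-term probabilities within one class:
    // P(term | class) = (count + lambda) / (totalCount + lambda * |vocab|),
    // which keeps every probability strictly positive even for unseen terms.
    static double[] smoothed(double[] counts, double lambda) {
        double total = Arrays.stream(counts).sum() + lambda * counts.length;
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) p[i] = (counts[i] + lambda) / total;
        return p;
    }

    public static void main(String[] args) {
        String[] vocab = {"spark", "bayes", "java"};
        String[] doc = {"spark", "spark", "bayes"};
        System.out.println(Arrays.toString(multinomialFeatures(doc, vocab))); // [2.0, 1.0, 0.0]
        System.out.println(Arrays.toString(bernoulliFeatures(doc, vocab)));   // [1.0, 1.0, 0.0]
        System.out.println(Arrays.toString(smoothed(multinomialFeatures(doc, vocab), 1.0)));
    }
}
```

Note how smoothing with the default `lambda = 1.0` turns the zero count for "java" into a small nonzero probability, which is why the doc recommends it for sparse document vectors.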