NaiveBayes issue when probability is 0 #288
Description
When a probability of 0 is passed through GeneralDiscreteDistribution!LogProbabilityMassFunction, Math.Log returns -Inf. Once -Inf is returned, Decide always returns 0 and the probabilities are (0, 0) (at least in cases where there are two outputs).
--==Example==--
Training data:
v1,v2,v3,v4,v5,v6,v7,v8,Result
TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE
FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
Test:
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE returns FALSE with NaiveBayes.Probabilities returning (0, 0).
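For reference, the (0, 0) result can be traced to the counts in the training data above. The following is a sketch of the standard naive Bayes log-score, not necessarily the exact expression the library evaluates; here c is the class (the Result column) and x_i is the test value of feature v_i.

```latex
\log P(c \mid x) = \log P(c) + \sum_{i=1}^{8} \log P(x_i \mid c) - \log P(x)

% Class FALSE: v6 (and v7) is FALSE in all 11 training rows, so
P(v_6 = \mathrm{TRUE} \mid c = \mathrm{FALSE}) = \tfrac{0}{11} = 0
\;\Rightarrow\; \log 0 = -\infty

% Class TRUE: v5 is FALSE in all 11 training rows, so
P(v_5 = \mathrm{TRUE} \mid c = \mathrm{TRUE}) = \tfrac{0}{11} = 0
\;\Rightarrow\; \log 0 = -\infty
```

Since the test vector has v5 = v6 = v7 = TRUE, each class picks up at least one zero factor, so both sums are -Inf and exponentiating them gives 0 for both classes.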
A fix that seems to work (although I don't know if it is the mathematically correct thing to do) is to check whether the probability is zero and, if it is, use a very small nonzero probability instead.
if (probabilities[value] == 0.0) {
    return Math.Log(0.001);
}
In the above example, this fix causes TRUE to be returned with probabilities of (0.833, 0.167).
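A fixed floor such as 0.001 is one way to avoid Math.Log(0). A more common alternative is additive (Laplace) smoothing, which adds a pseudo-count to every symbol before normalizing, so unseen values get a small probability that scales with the amount of training data. The sketch below is standalone and does not use the library's API; the names SmoothedLog, LogProbability, counts, and alpha are hypothetical and chosen only for illustration.

```csharp
using System;

public static class SmoothedLog
{
    // counts[k] = number of training rows (within one class) where the
    // feature took symbol k. alpha is the pseudo-count added to every
    // symbol; alpha = 1 corresponds to Laplace (add-one) smoothing.
    public static double LogProbability(int[] counts, int value, double alpha = 1.0)
    {
        if (counts == null) throw new ArgumentNullException(nameof(counts));
        if (alpha <= 0) throw new ArgumentOutOfRangeException(nameof(alpha));

        long total = 0;
        foreach (int c in counts) total += c;

        // The smoothed estimate is strictly positive, so the log is finite.
        double p = (counts[value] + alpha) / (total + alpha * counts.Length);
        return Math.Log(p);
    }
}
```

For v6 in the FALSE class above, the counts are { 11, 0 } (FALSE, TRUE), so SmoothedLog.LogProbability(new[] { 11, 0 }, 1) evaluates Math.Log(1.0 / 13) instead of Math.Log(0).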