This repository was archived by the owner on Nov 19, 2020. It is now read-only.

NaiveBayes issue when probability is 0 #288

@spoved1

When a probability of 0 is passed through GeneralDiscreteDistribution.LogProbabilityMassFunction, the value returned by Math.Log is -Inf. Once -Inf is returned, Decide always returns 0 and the reported probabilities are (0, 0) (at least in cases where there are two outputs).
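The failure mode can be reproduced without the library. The following is a minimal sketch, not the Accord.NET code: the class and method names are hypothetical, and it only assumes that per-feature log-probabilities are summed per class and later exponentiated.

```csharp
using System;

// Hypothetical stand-alone illustration; not the Accord.NET implementation.
public static class LogZeroDemo
{
    // Sums the logs of the per-feature conditional probabilities for one class.
    public static double LogLikelihood(double[] featureProbabilities)
    {
        double sum = 0.0;
        foreach (double p in featureProbabilities)
        {
            // Math.Log(0.0) is double.NegativeInfinity, and adding finite
            // values to it leaves the sum at negative infinity.
            sum += Math.Log(p);
        }
        return sum;
    }

    // Converts per-class log-likelihoods back to plain likelihoods.
    public static double[] ToLikelihoods(double[] logLikelihoods)
    {
        var result = new double[logLikelihoods.Length];
        for (int i = 0; i < result.Length; i++)
        {
            // Math.Exp(double.NegativeInfinity) is 0.
            result[i] = Math.Exp(logLikelihoods[i]);
        }
        return result;
    }
}
```

If every class has at least one zero-probability feature value, every likelihood comes back as 0, so there is nothing left to tell the classes apart.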

--==Example==--
Training data:
v1,v2,v3,v4,v5,v6,v7,v8,Result
TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE
FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,FALSE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE

Test:
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE returns FALSE with NaiveBayes.Probabilities returning (0, 0).

A fix that seems to work (although I don't know if it is the mathematically correct thing to do) is to check whether the probability is zero and, if it is, use a very small positive probability instead.

// Substitute a small positive probability so Math.Log does not return -Inf.
if (probabilities[value] == 0.0) {
    return Math.Log(0.001);
}

In the above example, this fix causes TRUE to be returned with probabilities of (0.833, 0.167).
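For context on why the example hits this: in the training data above, v5 is never TRUE in the 11 rows where Result is TRUE, and v6 and v7 are never TRUE in the 11 rows where Result is FALSE, so the test row picks up a zero in both classes. A commonly used alternative to a hard-coded 0.001 is additive (Laplace) smoothing, which adds a pseudo-count to every symbol before estimating the probability. The sketch below is an assumption about how that could look, not something the library is known to do here; `SmoothedEstimate`, its parameters, and the default `alpha` of 1 are all hypothetical.

```csharp
using System;

// Hypothetical helper; the names are illustrative and not part of Accord.NET.
public static class SmoothedEstimate
{
    // Returns log((count + alpha) / (total + alpha * symbols)), where
    //   count   = times the feature value was seen in the class,
    //   total   = number of training rows in the class,
    //   symbols = number of distinct values the feature can take,
    //   alpha   = pseudo-count added to every symbol (1.0 is Laplace smoothing).
    public static double LogProbability(int count, int total, int symbols, double alpha = 1.0)
    {
        if (symbols <= 0) throw new ArgumentOutOfRangeException(nameof(symbols));
        if (alpha <= 0.0) throw new ArgumentOutOfRangeException(nameof(alpha));

        // The numerator is strictly positive, so the logarithm is always finite.
        return Math.Log((count + alpha) / (total + alpha * symbols));
    }
}
```

For v5 = TRUE in the TRUE class (0 of 11 rows, 2 possible values), `SmoothedEstimate.LogProbability(0, 11, 2)` estimates the probability as 1/13 rather than 0, so the result depends on the amount of training data instead of a fixed constant.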
