A Naive Bayes classifier is a supervised machine-learning algorithm based on Bayes' theorem. The classifier relies on the assumption that the input variables are independent of each other given the class. Despite this simplifying assumption, it often performs well in practice.
The Naive Bayes (NB) algorithm is called naive because it assumes that features are independent of each other, an assumption that almost never holds in real data.
A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence or absence of any other feature, given the class variable.
Naive Bayes classifiers depend on Bayes' theorem, which is based on conditional probability: the likelihood that an event (A) will occur given that another event (B) has already occurred. The theorem allows a hypothesis to be updated every time new evidence is introduced. The equation below states Bayes' theorem:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Let’s explain each of these terms:
“P” is the symbol to denote probability.
P(A | B) = The probability of event A (hypothesis) occurring given that B (evidence) has occurred.
P(B | A) = The probability of the event B (evidence) occurring given that A (hypothesis) has occurred.
P(A) = The probability of event A (hypothesis) occurring.
P(B) = The probability of event B (evidence) occurring.
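A small numeric illustration may help make these terms concrete. The sketch below uses invented numbers: A stands for "an email is spam" and B for "the email contains the word 'offer'". The function name `bayes_posterior` and all the probabilities are assumptions chosen for illustration only.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, not taken from any real dataset.
p_spam = 0.2               # P(A): prior probability that an email is spam
p_offer_given_spam = 0.5   # P(B | A): 'offer' appears in a spam email
p_offer_given_ham = 0.05   # P(B | not A): 'offer' appears in a non-spam email

# P(B) from the law of total probability over A and not A.
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

# P(A | B): probability of spam once the word 'offer' has been observed.
p_spam_given_offer = bayes_posterior(p_offer_given_spam, p_spam, p_offer)
print(round(p_spam_given_offer, 4))  # prints 0.7143
```

Under these assumed numbers, seeing the evidence raises the probability of the hypothesis from 0.2 to roughly 0.71.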
A learned Naive Bayes model needs to store the following probabilities:
Class Probabilities: The probabilities of each class in the training dataset.
Conditional Probabilities: The conditional probabilities of each input value for every given class value.
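As a rough sketch of how these two sets of probabilities could be estimated, the example below counts relative frequencies in a tiny, made-up categorical dataset. The data, the function name `fit_naive_bayes`, and the dictionary layout are all assumptions for illustration. No smoothing is applied, so a value never seen with a class simply has no entry.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (outlook, windy, play). The last item is the class.
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "no", "yes"),
    ("rainy", "yes", "no"),
    ("sunny", "no", "yes"),
    ("rainy", "no", "no"),
]


def fit_naive_bayes(rows):
    """Estimate class and conditional probabilities by relative frequency."""
    class_counts = Counter(row[-1] for row in rows)
    total = len(rows)
    # Class probabilities: share of training rows belonging to each class.
    class_probs = {label: n / total for label, n in class_counts.items()}

    # value_counts[(feature_index, class_label)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for *features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1

    # Conditional probabilities: P(feature_i = value | class).
    cond_probs = {
        key: {value: n / class_counts[key[1]] for value, n in counts.items()}
        for key, counts in value_counts.items()
    }
    return class_probs, cond_probs


class_probs, cond_probs = fit_naive_bayes(rows)
print(class_probs)
print(cond_probs[(0, "yes")])  # distribution of 'outlook' among 'yes' rows
```

The two returned dictionaries are exactly what would need to be saved to a file in order to reuse the model later.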
To perform Naive Bayes classification, the dataset is divided into two parts: the feature matrix and the response vector.
The feature matrix contains all the vectors (rows) of the dataset, where each vector holds the values of the input features.
The response vector contains the value of the class variable (the prediction or output) for each row of the feature matrix.
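The split described above can be sketched in a few lines. The dataset here is invented, and the assumption that the class label sits in the last column is only a convention chosen for this example.

```python
# Hypothetical dataset: each row is [feature_1, feature_2, class_label].
dataset = [
    [5.1, 3.5, "a"],
    [4.9, 3.0, "a"],
    [6.2, 2.9, "b"],
    [5.9, 3.2, "b"],
]

# Feature matrix X: every column except the last, one row per observation.
X = [row[:-1] for row in dataset]

# Response vector y: the last column, one class label per row of X.
y = [row[-1] for row in dataset]

print(X)
print(y)
```

Row i of `X` and element i of `y` always describe the same observation, so the two structures must keep the same length and order.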
A prior probability is the probability that an observation will fall into a category before you collect the data. More generally, the prior is a probability distribution, usually denoted π(θ), that represents your uncertainty about a parameter θ before you have sampled any data.
A posterior probability is the probability of assigning observations to categories or groups given the data. The posterior is a probability distribution representing your uncertainty about θ after you have sampled data, and is denoted π(θ|X). It is a conditional distribution because it is conditioned on the observed data X.
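From Bayes’ theorem, the two are related by
$$\pi(\theta \mid X) = \frac{f(X \mid \theta)\,\pi(\theta)}{\int f(X \mid \theta')\,\pi(\theta')\,d\theta'}$$
where f(X|θ) denotes the likelihood of the observed data X given θ.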
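To illustrate how a prior turns into a posterior, the sketch below restricts θ to three candidate values so that the normalizing term becomes a simple sum. The coin-flip setting, the candidate values, and the function name `posterior` are assumptions made for this example.

```python
def posterior(prior, likelihood):
    """Combine a discrete prior with likelihood values using Bayes' theorem."""
    unnormalized = {theta: prior[theta] * likelihood[theta] for theta in prior}
    evidence = sum(unnormalized.values())  # normalizing constant
    return {theta: weight / evidence for theta, weight in unnormalized.items()}


# Hypothetical setting: theta is a coin's probability of heads,
# limited to three candidate values with a uniform prior.
prior = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}

# Observed data X: 3 heads and 1 tail.
heads, tails = 3, 1
likelihood = {theta: theta**heads * (1 - theta) ** tails for theta in prior}

post = posterior(prior, likelihood)
for theta, prob in post.items():
    print(theta, round(prob, 3))
```

After seeing mostly heads, the posterior places most of its weight on θ = 0.8, even though the prior treated all three candidates equally.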
Types of Naive Bayes Classifiers
There are three common types of Naive Bayes classifiers:
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Bernoulli Naive Bayes
Gaussian Naive Bayes
Gaussian Naive Bayes is useful when working with continuous values whose probabilities can be modeled using a Gaussian (normal) distribution, with a separate mean and variance estimated for each class.
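A minimal sketch of the Gaussian variant, assuming a single continuous feature and two classes, might look like the following. The height data, the equal priors, and the helper names `gaussian_pdf`, `fit_gaussian`, and `predict` are all invented for illustration. With several features, the per-feature densities would be multiplied together.

```python
import math


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gaussian(values):
    """Return the mean and (population) variance of one feature within one class."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


# Hypothetical one-feature training data, grouped by class label.
heights = {"short": [150.0, 155.0, 160.0], "tall": [180.0, 185.0, 190.0]}
params = {label: fit_gaussian(values) for label, values in heights.items()}
priors = {label: 0.5 for label in heights}  # both classes have the same size


def predict(x: float) -> str:
    """Pick the class with the largest prior times Gaussian likelihood."""
    scores = {
        label: priors[label] * gaussian_pdf(x, *params[label]) for label in params
    }
    return max(scores, key=scores.get)


print(predict(158.0))  # prints short
print(predict(182.0))  # prints tall
```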
Multinomial Naive Bayes
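A multinomial distribution is useful for modeling feature vectors in which each value represents, for example, the number of occurrences of a term or its relative frequency. If a trial with k possible outcomes, occurring with probabilities p_1, …, p_k, is repeated n times and x_i counts how often outcome i occurs (so that x_1 + … + x_k = n), then
$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} p_i^{x_i}$$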
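The multinomial probability of a vector of counts is the multinomial coefficient times the product of each outcome probability raised to its count. A short sketch of that calculation follows; the function name `multinomial_pmf` and the example counts and probabilities are assumptions for illustration.

```python
import math


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` when outcome i has probability probs[i]."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * x_2! * ... * x_k!), kept as an integer.
    coefficient = math.factorial(n)
    for x in counts:
        coefficient //= math.factorial(x)
    # Product of p_i ** x_i over all outcomes.
    likelihood = 1.0
    for x, p in zip(counts, probs):
        likelihood *= p**x
    return coefficient * likelihood


# Hypothetical three-word vocabulary with per-class word probabilities.
word_probs = [0.5, 0.3, 0.2]
word_counts = [2, 1, 0]  # a document using word 0 twice and word 1 once
print(multinomial_pmf(word_counts, word_probs))
```

When comparing classes for the same document, the coefficient is identical for every class, so a classifier can skip it and compare only the product term.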
Bernoulli Naive Bayes
If X is a Bernoulli-distributed random variable, it takes only two values, with probabilities given as follows:
$$P(X = 1) = p, \qquad P(X = 0) = q = 1 - p, \qquad 0 < p < 1$$
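For binary features, a Bernoulli model assigns probability p to the value 1 and 1 - p to the value 0, and the naive independence assumption multiplies these terms across features. The sketch below shows this under invented numbers; `bernoulli_pmf`, `bernoulli_likelihood`, and the word probabilities are assumptions for illustration.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Return P(X = x) for a Bernoulli variable with P(X = 1) = p."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable takes only the values 0 and 1")
    return p if x == 1 else 1.0 - p


def bernoulli_likelihood(features, probs):
    """P(features | class), treating each binary feature as independent."""
    result = 1.0
    for x, p in zip(features, probs):
        result *= bernoulli_pmf(x, p)
    return result


# Hypothetical per-class probabilities that each of three words appears.
word_probs = {"spam": [0.8, 0.1, 0.6], "ham": [0.1, 0.7, 0.3]}

# A document containing the first and third words but not the second.
doc = [1, 0, 1]

for label, probs in word_probs.items():
    print(label, bernoulli_likelihood(doc, probs))
```

Unlike the multinomial model, an absent feature (value 0) contributes a factor of 1 - p here, so the absence of a word counts as evidence.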