Naïve Bayes

A Naive Bayes classifier is a supervised machine learning algorithm based on Bayes’ theorem. The classifier (not the theorem itself) relies on the assumption that the input variables are independent of each other given the class. Even though this assumption rarely holds exactly, the classifier often performs well in practice.

The Naive Bayes (NB) algorithm is called naive because it assumes that the features are independent of each other, which is almost never true in real data.

A Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence or absence of any other feature, given the class variable.

Naive Bayes classifiers depend on Bayes’ theorem, which is based on conditional probability: the likelihood that an event (A) will occur given that another event (B) has already occurred. In essence, the theorem allows a hypothesis to be updated every time new evidence is introduced. The equation below states Bayes’ theorem:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Let’s explain each of these terms,

“P” is the symbol to denote probability.

P(A | B) = The probability of event A (hypothesis) occurring given that B (evidence) has occurred.

P(B | A) = The probability of the event B (evidence) occurring given that A (hypothesis) has occurred.

P(A) = The probability of event A (hypothesis) occurring.

P(B) = The probability of event B (evidence) occurring.
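As a minimal sketch of how the four terms fit together, the snippet below plugs numbers into Bayes’ theorem. The spam scenario and all of the probabilities are hypothetical values chosen only for illustration, and `bayes_posterior` is a helper name introduced here, not something from the original text.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, for illustration only:
# A = "the email is spam", B = "the email contains the word 'offer'".
p_spam = 0.2                # P(A)
p_offer_given_spam = 0.6    # P(B | A)
p_offer_given_ham = 0.05    # P(B | not A)

# P(B) from the law of total probability.
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

p_spam_given_offer = bayes_posterior(p_offer_given_spam, p_spam, p_offer)
print(round(p_spam_given_offer, 3))  # prints 0.75
```

With these made-up numbers, seeing the evidence raises the probability of the hypothesis from 0.2 to 0.75.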

A learned naive Bayes model is described by the following probabilities, which are what needs to be stored to file:

Class Probabilities: The probabilities of each class in the training dataset.

Conditional Probabilities: The conditional probabilities of each input value for every given class value.
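The two tables can be estimated by simple counting. The sketch below assumes categorical features and a tiny hypothetical dataset; the column meanings and the function name `fit_tables` are illustrative assumptions. It uses plain relative frequencies with no smoothing, so a value never seen with a class gets no entry at all.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (outlook, windy, play), where "play" is the class.
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "yes", "no"),
    ("rainy", "no", "yes"),
    ("sunny", "no", "yes"),
]


def fit_tables(rows):
    """Return (class_probs, cond_probs) estimated by relative frequency."""
    class_counts = Counter(label for *_, label in rows)
    n = len(rows)
    class_probs = {c: k / n for c, k in class_counts.items()}

    # value_counts[(feature_index, class)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for *features, label in rows:
        for j, value in enumerate(features):
            value_counts[(j, label)][value] += 1

    # P(feature_j = value | class) = count / number of rows in that class
    cond_probs = {
        key: {v: k / class_counts[key[1]] for v, k in counts.items()}
        for key, counts in value_counts.items()
    }
    return class_probs, cond_probs


class_probs, cond_probs = fit_tables(rows)
print(class_probs)
print(cond_probs[(0, "yes")])  # distribution of feature 0 within class "yes"
```

These two dictionaries are everything such a model needs in order to score a new row.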

To perform Naïve Bayes classification, the dataset is divided into two parts: a feature matrix and a response vector.

The feature matrix contains all the vectors (rows) of the dataset, where each vector consists of the values of the input features.

The response vector contains the value of the class variable (the prediction or output) for each row of the feature matrix.
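The split can be sketched in a few lines. The dataset below is hypothetical, and it assumes the class variable is stored in the last column of each row.

```python
# Hypothetical dataset: each row is [age, income, bought], and the last
# column is the class variable.
dataset = [
    [25, 40000, "no"],
    [47, 82000, "yes"],
    [35, 60000, "yes"],
]

# Feature matrix X: every column except the last.
X = [row[:-1] for row in dataset]

# Response vector y: the last column, one class label per row of X.
y = [row[-1] for row in dataset]

print(X)
print(y)
```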

Prior Probability

A prior probability is the probability that an observation will fall into a category before you collect the data. More generally, for an unknown parameter θ, the prior is a probability distribution that represents your uncertainty over θ before you have sampled any data and tried to estimate it. It is usually denoted by π(θ).

Posterior Probability

A posterior probability is the probability of assigning observations to categories or groups given the data. The posterior is a probability distribution representing your uncertainty over θ after you have sampled data, and is denoted by π(θ|X). It is a conditional distribution because it is conditioned on the observed data.

From Bayes’ theorem we can relate the two as

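$$\pi(\theta \mid X) = \frac{p(X \mid \theta)\, \pi(\theta)}{p(X)} \propto p(X \mid \theta)\, \pi(\theta)$$

Here p(X | θ) is the likelihood of the observed data and p(X) is the marginal probability of the data, which acts as a normalizing constant.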
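The update from prior to posterior can be illustrated numerically. The sketch below is an assumed example, not from the original text: θ is the unknown probability of heads for a coin, restricted to a coarse grid of candidate values, and the data are 7 heads in 10 flips. The posterior is computed as likelihood times prior, then normalized.

```python
from math import comb


def posterior_on_grid(thetas, prior, heads, flips):
    """Discrete Bayes update: posterior is likelihood * prior, normalized."""
    likelihood = [
        comb(flips, heads) * t**heads * (1 - t) ** (flips - heads)
        for t in thetas
    ]
    unnormalized = [lk * p for lk, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)  # plays the role of the normalizing constant
    return [u / evidence for u in unnormalized]


# Candidate values of theta and a uniform prior over them (both assumptions).
thetas = [i / 10 for i in range(11)]
prior = [1 / len(thetas)] * len(thetas)

posterior = posterior_on_grid(thetas, prior, heads=7, flips=10)
best = thetas[posterior.index(max(posterior))]
print(best)  # prints 0.7
```

Because the prior is uniform here, the most probable grid value coincides with the observed frequency of heads; a non-uniform prior would pull it elsewhere.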

Types of Naïve Bayes Classifiers

There are three common types of Naïve Bayes classifiers:

  • Gaussian Naïve Bayes
  • Multinomial Naïve Bayes
  • Bernoulli Naïve Bayes

Gaussian Naive Bayes

Gaussian Naive Bayes is useful when the features are continuous values and the distribution of each feature within a class can be modeled by a Gaussian (normal) distribution.

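$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where $\mu_y$ and $\sigma_y^2$ are the mean and variance of feature $x_i$ estimated from the training rows that belong to class $y$.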
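A from-scratch sketch of the Gaussian variant is shown below, using only the standard library. The function names, the toy data, and the small constant added to each variance are assumptions made for this illustration. Scores are computed in log space to avoid multiplying many small probabilities.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate, per class, the prior and each feature's mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # 1e-9 is an assumed floor that keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the Gaussian density with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(model, row):
    """Return the class with the highest log prior plus log likelihood."""
    scores = {
        label: math.log(prior)
        + sum(log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
        for label, (prior, means, variances) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical two-feature data with two well-separated classes.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))  # prints a
print(predict(model, [3.1, 4.0]))  # prints b
```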

Multinomial Naive Bayes

A multinomial distribution is useful for modeling feature vectors in which each value represents, for example, the number of occurrences of a term or its relative frequency. If the feature vectors have n elements and each element can assume one of k different values, the i-th value with probability p_i, then

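$$P(X_1 = x_1 \cap X_2 = x_2 \cap \dots \cap X_k = x_k) = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} p_i^{x_i}$$

where $x_i$ is the number of times the i-th value occurs and $x_1 + x_2 + \dots + x_k = n$.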
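The sketch below evaluates the multinomial probability directly and then applies the same idea to a tiny word-count classifier. The documents, labels, function names, and the Laplace smoothing parameter `alpha` are all assumptions for illustration. The classifier drops the multinomial coefficient because it is the same for every class and does not affect which class wins.

```python
import math
from collections import Counter


def multinomial_pmf(counts, probs):
    """n! / (x_1! ... x_k!) * prod(p_i ** x_i), with n = sum of the counts."""
    coeff = math.factorial(sum(counts))
    for x in counts:
        coeff //= math.factorial(x)
    return coeff * math.prod(p**x for p, x in zip(probs, counts))


def fit_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word probabilities per class."""
    vocab = sorted({w for d in docs for w in d.split()})
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for d, c in zip(docs, labels):
        word_counts[c].update(d.split())

    log_prior = {c: math.log(k / len(docs)) for c, k in class_docs.items()}
    log_like = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like


def predict(doc, log_prior, log_like):
    """Pick the class with the highest log score; unknown words are skipped."""
    scores = {
        c: log_prior[c]
        + sum(log_like[c][w] for w in doc.split() if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))  # prints 0.1875

# Hypothetical training documents.
docs = ["cheap offer now", "cheap cheap deal", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
log_prior, log_like = fit_multinomial_nb(docs, labels)
print(predict("cheap offer", log_prior, log_like))    # prints spam
print(predict("meeting notes", log_prior, log_like))  # prints ham
```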

Bernoulli Naive Bayes

If X is a Bernoulli-distributed random variable, it can take only two values (conventionally 1 and 0), and their probabilities are given as follows:

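$$P(X) = \begin{cases} p & \text{if } X = 1 \\ q & \text{if } X = 0 \end{cases} \qquad \text{where } q = 1 - p \text{ and } 0 < p < 1$$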
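For binary features, a class score is the sum of the log Bernoulli probabilities of each feature, using the independence assumption. The sketch below assumes equal class priors, and the per-class probabilities are hypothetical numbers; the function names are introduced here for illustration only.

```python
import math


def bernoulli_pmf(x, p):
    """P(X = x) for x in {0, 1}: p if x == 1, otherwise q = 1 - p."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1 - p


def bernoulli_log_likelihood(features, probs):
    """Sum of log P(x_i | class), treating features as independent given the class."""
    return sum(math.log(bernoulli_pmf(x, p)) for x, p in zip(features, probs))


# Hypothetical P(feature_i = 1 | class) for three binary features.
p_given_spam = [0.8, 0.1, 0.6]
p_given_ham = [0.1, 0.7, 0.2]

features = [1, 0, 1]  # which features are present in a new observation
spam_ll = bernoulli_log_likelihood(features, p_given_spam)
ham_ll = bernoulli_log_likelihood(features, p_given_ham)

# With equal priors (an assumption), the larger log likelihood wins.
print("spam" if spam_ll > ham_ll else "ham")  # prints spam
```

Note that an absent feature (value 0) still contributes to the score through q = 1 - p, which is what distinguishes this variant from simply counting the features that are present.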