
Bayes Optimal Classifier and Naive Bayes Classifier

 

The Bayes optimal classifier is a probabilistic model that makes the most probable prediction for a new instance. In this blog, we'll take a look at the Bayes optimal classifier and the naive Bayes classifier.

 

Bayes' theorem is a method for calculating the probability of a hypothesis based on its prior probability, the probability of observing particular data given the hypothesis, and the observed data itself.
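In symbols, for a hypothesis h and observed data D, Bayes' theorem states

P(h|D) = P(D|h) P(h) / P(D)

where P(h) is the prior probability of h, P(D|h) is the probability of observing the data given h, and P(h|D) is the posterior probability of h.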

 

BAYES OPTIMAL CLASSIFIER

The Bayes optimal classifier is described using Bayes' theorem, which provides a systematic way of computing a conditional probability. It is also closely related to Maximum a Posteriori (MAP) estimation, a probabilistic framework for finding the most probable hypothesis given a training dataset.
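Formally, the MAP hypothesis is the one with the largest posterior probability:

hMAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h) P(h)

(the denominator P(D) is dropped because it does not depend on h).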

 

Take a hypothesis space that has 3 hypotheses h1, h2, and h3.

 

The posterior probabilities of the hypotheses are as follows:

h1 -> 0.4

h2 -> 0.3

h3 -> 0.3

 

Hence, h1 is the MAP hypothesis (MAP = maximum a posteriori, i.e., the hypothesis with the largest posterior probability).

 

Suppose a new instance x is encountered, which is classified negative by h2 and h3 but positive by h1. 

 

Taking all hypotheses into account, the probability that x is positive is 0.4 (from h1 alone), and the probability that it is negative is therefore 0.6 (0.3 + 0.3, from h2 and h3).

 

In this case, the classification generated by the MAP hypothesis (positive) differs from the most probable classification (negative).

 

The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities. 

 

If the classification of the new example can take on any value vj from some set V, then the probability P(vj|D) that the correct classification for the new instance is vj is simply:
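In its standard form, this is the posterior-weighted sum of the predictions of the individual hypotheses:

P(vj|D) = Σ_{hi ∈ H} P(vj|hi) P(hi|D)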

 

When the posteriors P(hi|D) are computed via Bayes' theorem, the denominator P(D) can be omitted, since it is the same for every vj and we only need the values of P(vj|D) for comparison.

 

The value vj for which P(vj|D) is maximum is the best classification for the new instance.
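That is, the Bayes optimal classification of the new instance is

argmax_{vj ∈ V} Σ_{hi ∈ H} P(vj|hi) P(hi|D)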

 

A Bayes optimal classifier is a system that classifies new instances according to this equation. This strategy maximizes the probability that the new instance will be classified correctly, given the available data, hypothesis space, and priors.

 

Consider an example of Bayes optimal classification:

 

Let there be five hypotheses, h1 through h5, and suppose a robot must decide whether to move forward (F), turn left (L), or turn right (R). The posterior of each hypothesis and each hypothesis' prediction are shown below.

      Hypothesis    P(hi|D)    P(F|hi)    P(L|hi)    P(R|hi)
      h1            0.4        1          0          0
      h2            0.2        0          1          0
      h3            0.1        0          0          1
      h4            0.1        0          1          0
      h5            0.2        0          1          0

 

The MAP hypothesis h1 (with the largest posterior, 0.4) therefore suggests that the robot should move forward (F). Let's see what the Bayes optimal procedure suggests.
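Using the table above, the posterior-weighted probability of each action is:

P(F|D) = Σ_i P(F|hi) P(hi|D) = (1)(0.4) = 0.4
P(L|D) = Σ_i P(L|hi) P(hi|D) = (1)(0.2) + (1)(0.1) + (1)(0.2) = 0.5
P(R|D) = Σ_i P(R|hi) P(hi|D) = (1)(0.1) = 0.1

Turning left (L) has the highest combined probability (0.5), even though no single high-posterior hypothesis predicts it.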

 

Thus, the Bayes optimal procedure recommends the robot turn left.
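As a rough, minimal Python sketch (not part of the original post), the same posterior-weighted voting can be computed as follows; the two dictionaries simply encode the table above:

from collections import defaultdict

# Posterior probability P(hi|D) of each hypothesis, taken from the table above.
posteriors = {"h1": 0.4, "h2": 0.2, "h3": 0.1, "h4": 0.1, "h5": 0.2}

# Each hypothesis' prediction probabilities P(v|hi) over the actions F, L, R.
predictions = {
    "h1": {"F": 1, "L": 0, "R": 0},
    "h2": {"F": 0, "L": 1, "R": 0},
    "h3": {"F": 0, "L": 0, "R": 1},
    "h4": {"F": 0, "L": 1, "R": 0},
    "h5": {"F": 0, "L": 1, "R": 0},
}

def bayes_optimal(posteriors, predictions):
    # Combine the predictions of all hypotheses, weighted by their posteriors.
    votes = defaultdict(float)
    for h, p_h in posteriors.items():
        for value, p_v in predictions[h].items():
            votes[value] += p_v * p_h
    # The value with the largest posterior-weighted vote is the Bayes optimal choice.
    return max(votes, key=votes.get)

print(bayes_optimal(posteriors, predictions))  # prints 'L' (turn left)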

 

Naive Bayes Classifier

Naive Bayes classifiers are a family of classification algorithms built on Bayes' theorem. They all share the same underlying principle: every pair of features being classified is assumed to be independent of the others, given the class.

 

The naive Bayes classifier is useful for learning tasks in which each instance x is represented by a set of attribute values and the target function f(x) can take any value from a finite set V.

 

A set of training examples of the target function is provided, along with a new instance described by the tuple of attribute values (a1, a2, ..., an).

 

The learner is asked to predict the target value for this new instance. The Bayesian approach is to assign the most probable target value, vMAP, given the attribute values that describe the instance.
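In symbols:

vMAP = argmax_{vj ∈ V} P(vj | a1, a2, ..., an)
     = argmax_{vj ∈ V} P(a1, a2, ..., an | vj) P(vj)

where Bayes' theorem has been applied and the denominator P(a1, ..., an) dropped, since it does not depend on vj.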

 

Simply count the number of times each target value vj appears in the training data to estimate each P(vj).
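Estimating the joint probability P(a1, ..., an | vj) directly is not feasible unless the training set is very large, so naive Bayes assumes that the attribute values are conditionally independent given the target value. Under that assumption, the classifier outputs

vNB = argmax_{vj ∈ V} P(vj) Π_i P(ai | vj)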

 

Here vNB denotes the target value output by the naive Bayes classifier.

 

The naive Bayes approach involves a learning step in which the various P(vj) and P(ai|vj) terms are estimated from their frequencies in the training data.

 

The set of these estimates constitutes the learned hypothesis.
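As a minimal sketch (not a reference implementation), the learning step and the classification step might look like this in Python; the function names are purely illustrative, frequencies are used as probability estimates, and no smoothing is applied:

from collections import Counter, defaultdict

def train_naive_bayes(examples, labels):
    # Estimate P(v) and P(attribute = value | v) by counting frequencies.
    label_counts = Counter(labels)
    priors = {v: c / len(labels) for v, c in label_counts.items()}
    cond_counts = defaultdict(Counter)  # keyed by (label, attribute) -> Counter of values
    for x, v in zip(examples, labels):
        for attribute, value in x.items():
            cond_counts[(v, attribute)][value] += 1

    def cond_prob(attribute, value, v):
        # P(attribute = value | v); zero if the combination never occurred (no smoothing).
        return cond_counts[(v, attribute)][value] / label_counts[v]

    return priors, cond_prob

def classify(x, priors, cond_prob):
    # vNB = argmax over v of P(v) * product over attributes of P(ai | v).
    scores = {}
    for v, p_v in priors.items():
        score = p_v
        for attribute, value in x.items():
            score *= cond_prob(attribute, value, v)
        scores[v] = score
    return max(scores, key=scores.get)

# Hypothetical usage with tiny made-up data:
# examples = [{"Outlook": "sunny", "Wind": "weak"}, {"Outlook": "rain", "Wind": "strong"}]
# labels = ["yes", "no"]
# priors, cond_prob = train_naive_bayes(examples, labels)
# print(classify({"Outlook": "sunny", "Wind": "strong"}, priors, cond_prob))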

 

The fundamental naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome.

 

Let's understand the concept of the naive Bayes classifier better with the help of an example.

 

Let's use the naive Bayes classifier to solve a problem we discussed during our decision tree learning discussion: predicting whether or not someone will play tennis on a given day.

 

Table 3.2 above shows 14 training examples of the target concept PlayTennis, where each day is described by the attributes Outlook, Temperature, Humidity, and Wind. We use the naive Bayes classifier and the training data from this table to classify the following new instance:
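(In the standard version of this example, the new day to be classified is Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong.)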

 

First, the probabilities of the different target values can easily be estimated from their frequencies over the 14 training examples.
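Assuming the usual 14-day PlayTennis data, with 9 days labeled yes and 5 labeled no, these are

P(PlayTennis = yes) = 9/14 ≈ 0.64
P(PlayTennis = no) = 5/14 ≈ 0.36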

 

The conditional probabilities can be estimated in the same way. For example, those for Wind = strong are:
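Assuming the standard counts (Wind = strong on 3 of the 9 yes days and on 3 of the 5 no days):

P(Wind = strong | PlayTennis = yes) = 3/9 ≈ 0.33
P(Wind = strong | PlayTennis = no) = 3/5 = 0.60

Multiplying each prior by the four corresponding conditional probabilities (for Outlook = sunny, Temperature = cool, Humidity = high, and Wind = strong) gives approximately

P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes) ≈ 0.0053
P(no) P(sunny|no) P(cool|no) P(high|no) P(strong|no) ≈ 0.0206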

 

Based on the probability estimates learned from the training data, the naive Bayes classifier assigns the target value PlayTennis = no to this new instance. Furthermore, by normalizing the above quantities to sum to one, we can compute the conditional probability that the target value is no, given the observed attribute values.

 

For the current example, this probability is:
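0.0206 / (0.0206 + 0.0053) ≈ 0.795

so, under these estimates, the classifier is roughly 79.5% confident that PlayTennis = no for this day.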

 
