
Bayes Theorem and Concept Learning | Example of Bayes Theorem

 

Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probability of observing specific data given the hypothesis, and the observed data itself.

 

In this blog, we’ll walk through an example of the Bayes theorem and then look at the relationship between the Bayes theorem and concept learning. 

 

Using Bayes theorem, we can calculate the posterior probability of each candidate hypothesis and output the most probable one. 

 

Consider a medical diagnosis problem in which there are two alternative hypotheses: 

(1) the patient has a certain type of cancer

(2) the patient does not have cancer. 

 

The available data comes from a laboratory test with two possible outcomes: + (positive) and – (negative).

 

We also have prior knowledge that only 0.008 of the entire population (0.8 percent) has this form of cancer. Furthermore, the laboratory test is only an imperfect indicator of the disease.

 

The test returns a correct positive result in only 98 percent of the cases in which the disease is actually present, and a correct negative result in only 97 percent of the cases in which the disease is not present. In the remaining cases, the test returns the opposite result.

 

That is,

P(cancer) = 0.008, P(¬cancer) = 0.992
P(+/cancer) = 0.98, P(–/cancer) = 0.02
P(+/¬cancer) = 0.03, P(–/¬cancer) = 0.97

 

Let’s say we come across a new patient whose lab test result is positive. Should we diagnose the patient with cancer or not? The maximum a posteriori hypothesis can be found using Equation 6.2, hMAP = argmax h∈H P(D/h) P(h).
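To make the arithmetic concrete, here is a minimal Python sketch of this calculation, using the probability values listed above (the variable names are illustrative):

```python
# Prior probabilities of the two hypotheses, from the example above.
p_cancer = 0.008
p_not_cancer = 1 - p_cancer  # 0.992

# Test accuracy figures from the example above.
p_pos_given_cancer = 0.98      # P(+/cancer)
p_pos_given_not_cancer = 0.03  # P(+/not cancer)

# Unnormalized posteriors P(+/h) * P(h) for each hypothesis h.
score_cancer = p_pos_given_cancer * p_cancer              # ~0.0078
score_not_cancer = p_pos_given_not_cancer * p_not_cancer  # ~0.0298

# The MAP hypothesis is the one with the larger unnormalized posterior.
h_map = "cancer" if score_cancer > score_not_cancer else "not cancer"
print(h_map)  # -> "not cancer"

# Normalizing gives the actual posterior probability of cancer given a + result.
p_cancer_given_pos = score_cancer / (score_cancer + score_not_cancer)
print(round(p_cancer_given_pos, 2))  # ~0.21
```

Even though the test is quite accurate, the MAP hypothesis is that the patient does not have cancer, because the disease is so rare that the prior dominates.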

 

Let’s now have a look at the problem of probability density estimation.

 

We are given a sample of observations X = (x1, x2, x3, …, xn) from a domain, where each observation is drawn independently from the same probability distribution (so-called independent and identically distributed, i.i.d., or close to it).

 

Density estimation entails choosing a probability distribution function, and the parameters of that distribution, that best explain the joint probability distribution of the observed data X.

 

There are several approaches to tackling this problem, but two of the most common are:

 

Maximum a Posteriori (MAP) estimation, a Bayesian approach.

 

Maximum Likelihood Estimation (MLE), a frequentist approach; the two are contrasted briefly below.
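For reference (standard notation, not taken from the original text), the two estimates differ only in whether a prior p(θ) over the parameters is included:

```latex
\begin{align*}
\theta_{\mathrm{MLE}} &= \arg\max_{\theta} \; p(X \mid \theta) \\
\theta_{\mathrm{MAP}} &= \arg\max_{\theta} \; p(X \mid \theta)\, p(\theta)
\end{align*}
```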

 

In Bayesian learning, the maximum likelihood hypothesis is the one that maximizes P(D/h); for the density estimation problem above, this means choosing parameters θ that maximize p(X; θ) = Πi p(xi; θ), or equivalently the log-likelihood Σi log p(xi; θ). 

 

The small p indicates the probability density function. Maximum likelihood estimation (MLE) is a statistical technique for estimating the parameters of a probability distribution based on observed data. 

 

This is accomplished by maximizing the likelihood function so that, under the assumed statistical model, the observed data is most probable. The maximum likelihood estimate is the point in the parameter space that maximizes the likelihood function.
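As an illustration, here is a minimal sketch of maximum likelihood estimation assuming the data are modeled with a Gaussian density; the sample values and the choice of model are illustrative, not from the original text:

```python
import numpy as np

# An illustrative i.i.d. sample X = {x1, ..., xn}.
x = np.array([2.1, 1.9, 2.4, 2.0, 1.8, 2.3])

def log_likelihood(mu, sigma, data):
    """Log-likelihood of the data under a Gaussian density p(x; mu, sigma)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (data - mu) ** 2 / (2 * sigma**2))

# For a Gaussian, the maximizing parameters have a closed form:
# the sample mean and the (biased, ddof=0) sample standard deviation.
mu_mle = x.mean()
sigma_mle = x.std()

# Sanity check: moving mu away from the MLE (with sigma fixed) lowers the log-likelihood.
assert log_likelihood(mu_mle, sigma_mle, x) >= log_likelihood(mu_mle + 0.1, sigma_mle, x)
print(mu_mle, sigma_mle)
```

In general a closed form is not available, and the log-likelihood is maximized numerically instead, for example with gradient-based optimization.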

 

Bayes Theorem and Concept Learning: 

The brute-force MAP concept learning algorithm has two steps (a sketch in code follows the list): 

 

1. For each hypothesis h in H, use Bayes theorem to calculate the posterior probability P(h/D) given the training data D. This gives the probability of each candidate hypothesis, from which the most probable one can be identified.

2. Output the hypothesis hMAP with the highest posterior probability. 
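Here is a minimal sketch of that two-step procedure, assuming a toy hypothesis space of four Boolean-valued functions and a small noise-free training set (both are illustrative, not from the original text):

```python
# Brute-force MAP learning over a tiny, illustrative hypothesis space.
# Each hypothesis maps an instance x to a Boolean classification.
hypotheses = {
    "always_true":  lambda x: True,
    "always_false": lambda x: False,
    "x_positive":   lambda x: x > 0,
    "x_even":       lambda x: x % 2 == 0,
}

# Noise-free training data D: pairs (x_i, c(x_i)).
data = [(2, True), (4, True), (-3, False)]

prior = 1.0 / len(hypotheses)  # uniform prior P(h) = 1/|H|

def likelihood(h, data):
    # P(D/h) = 1 if h agrees with every training example, 0 otherwise (noise-free case).
    return 1.0 if all(h(x) == label for x, label in data) else 0.0

# Step 1: posterior (up to the normalizer P(D)) for every hypothesis.
unnormalized = {name: likelihood(h, data) * prior for name, h in hypotheses.items()}
p_data = sum(unnormalized.values())  # here P(D) works out to |VS_H,D| / |H|
posterior = {name: v / p_data for name, v in unnormalized.items()}

# Step 2: output the hypothesis with the highest posterior probability.
h_map = max(posterior, key=posterior.get)
print(h_map, posterior)  # -> "x_positive" (ties with "x_even", both posterior 0.5)
```

Note that the two hypotheses consistent with D end up with the same posterior, 1/2, which is exactly 1 / | VSH,D | for this toy example; this anticipates the result derived in the cases below.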

 

To calculate the posterior probabilities, we need to specify the values of P(h) and P(D/h). We choose these to be consistent with the following assumptions: 

  1. The training data D is noise-free (i.e., di = c(xi)). 
  2. The hypothesis space H contains the target concept c. 
  3. We have no prior reason to believe that any one hypothesis is more probable than another.

 

  • Since we assume the training data is noise-free, the probability of observing classification di given h is 1 if di = h(xi) and 0 if di ≠ h(xi). Therefore, P(D/h) = 1 if di = h(xi) for every di in D, and P(D/h) = 0 otherwise.

 

To put it another way, the probability of data D given hypothesis h is 1 if D agrees with h and 0 otherwise.

 

  • Given no prior knowledge about which hypothesis is more likely, it is reasonable to assign every hypothesis h in H the same prior probability. Because we assume the target concept is contained in H, these prior probabilities must sum to 1 over H, so P(h) = 1 / |H| for all h in H.

 

Now, let’s consider two cases:

 

Case 1: h is inconsistent with the training data D. 

 

Here, since P(D/h) = 0 when h is inconsistent with D, we have

P(h/D) = P(D/h) P(h) / P(D) = 0 · P(h) / P(D) = 0

 

That is, the posterior probability of a hypothesis inconsistent with D is zero. 

 

Case 2: Consider the case where h is consistent with D. Since P(D/h) = 1 for any h consistent with D, we have

P(h/D) = (1 · (1/|H|)) / P(D) = (1 · (1/|H|)) / ( | VSH,D | / |H| ) = 1 / | VSH,D |

where VSH,D is the version space: the subset of hypotheses from H that are consistent with D. It is easy to verify that P(D) = | VSH,D | / | H | above, because the sum of P(h/D) over all hypotheses must equal one and the number of hypotheses from H consistent with D is, by definition, | VSH,D |. 

 

Alternatively, we can derive P(D) from the theorem of total probability and the fact that the hypotheses are mutually exclusive. 
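Sketching that derivation (using the uniform prior P(h) = 1/|H| and the noise-free likelihood assumed above):

```latex
\begin{align*}
P(D) &= \sum_{h_i \in H} P(D \mid h_i)\, P(h_i) \\
     &= \sum_{h_i \in VS_{H,D}} 1 \cdot \frac{1}{|H|}
        \;+\; \sum_{h_i \notin VS_{H,D}} 0 \cdot \frac{1}{|H|} \\
     &= \frac{|VS_{H,D}|}{|H|}
\end{align*}
```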

 

To summarize, Bayes theorem implies that under the assumed P(h) and P(D/h), the posterior probability P(h/D) is

P(h/D) = 1 / | VSH,D |  if h is consistent with D, and P(h/D) = 0 otherwise,

where | VSH,D | is the number of hypotheses from H that are consistent with D.