
Bayesian Learning: Introduction

 

Bayesian machine learning is a subset of probabilistic machine learning approaches (for other probabilistic models, see Supervised Learning). In this blog, we give a brief introduction to Bayesian learning.

 

In Bayesian learning, model parameters are treated as random variables, and parameter estimation entails constructing posterior distributions for these random variables based on observed data.
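In symbols (with θ standing for the model parameters and D for the observed data; this notation is used here purely for illustration), the posterior distribution over the parameters is obtained from a prior distribution and a likelihood via Bayes' theorem:

p(θ/D) = p(D/θ) p(θ) / p(D)

The remainder of this post develops the same idea in terms of hypotheses h rather than parameters θ.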

 

Why Bayesian Learning Algorithms?

Bayesian learning approaches are relevant to machine learning for two reasons.

  • First, Bayesian learning algorithms compute explicit probabilities for hypotheses.
  • Second, they provide a useful perspective for understanding other learning methods that do not explicitly manipulate probabilities.

 

Features of Bayesian learning methods include:

Each observed training example can incrementally increase or decrease the estimated probability that a hypothesis is correct.

 

This is more flexible than methods that eliminate a hypothesis entirely if it is found to be inconsistent with any single example. Prior knowledge can be combined with the observed data to determine the final probability of a hypothesis.

 

Hypotheses that make probabilistic predictions can be accommodated by Bayesian approaches (e.g., hypotheses such as “this pneumonia patient has a 93 percent chance of complete recovery”).

 

Bayesian estimation computes the validity (probability) of a proposition.

 

The proposition’s validity is determined by two factors:

i). A prior (preliminary) estimate of its probability.

ii). New evidence that is relevant to it.

 

Practical Issues: 

  • One practical difficulty with Bayesian methods is that they typically require initial knowledge of many probabilities. When these probabilities are not known in advance, they are often estimated from background knowledge, previously available data, and assumptions about the form of the underlying distributions.

 

  • A second practical difficulty is the significant computational cost required to determine the Bayes optimal hypothesis in the general case.

 

BAYES THEOREM:

Consider a typical machine learning task: you have a set of training data, with inputs and outputs, and you would like to learn a mapping from one to the other.

 

So you put together a model, and soon you have a deterministic way of making predictions for a target variable y given a new input x.

 

There’s only one problem: you have no way of explaining what’s going on inside your model! You just know it was trained to minimize some loss function on your training data, but that’s not much information. In an ideal world, you’d have an objective summary of your model’s parameters, complete with confidence intervals and other statistical morsels, and you’d be able to reason about them in probabilistic terms.

 

This is where Bayesian Machine Learning enters the picture.

 

Bayes’ theorem is a method for calculating a hypothesis’s probability from its prior probability, the probability of observing particular data given the hypothesis, and the observed data itself.

 

To state Bayes’ theorem, we first define the quantities involved:

  • We use P(h) to denote the initial probability that hypothesis h holds, before we have seen the training data.

 

  • P(h) is also known as the prior probability of h, and it may reflect any background knowledge we have about how likely h is to be correct.

 

  • If we have no such prior knowledge, we may simply assign the same prior probability to each candidate hypothesis.

 

  • Similarly, P(D) denotes the prior probability that training data D will be observed (i.e., the probability of D given no knowledge about which hypothesis holds).

 

  • P(D/h) denotes the probability of observing data D in a world where hypothesis h holds.

 

  • In machine learning problems we are interested in P(h/D), the probability that h holds given the observed training data D.

 

  • P(h/D) is called the posterior probability of h, because it reflects our confidence that h holds after we have seen the training data D.

 

  • In contrast to the prior probability P(h), which is independent of D, the posterior probability P(h/D) indicates the influence of the training data D.

 

  • Bayes’ theorem lets us compute the posterior probability P(h/D) from the prior probability P(h), together with P(D) and P(D/h), as follows:
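
P(h/D) = P(D/h) P(h) / P(D)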

  • According to this formula, P(h/D) increases with P(h) and with P(D/h). It also decreases as P(D) increases, because the more probable it is that D will be observed independently of h, the less evidence D provides in support of h.

 

  • In many learning scenarios, the learner considers a set of candidate hypotheses H and is interested in finding the most probable hypothesis h ∈ H given the observed data D (or at least one of the maximally probable hypotheses, if there are several).

 

  • Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.

 

  • We can identify the MAP hypotheses by using Bayes’ theorem to calculate the posterior probability of each candidate hypothesis, as in the short derivation below, where P(D) is dropped in the final step because it is a constant that does not depend on h:
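
hMAP = argmax (h ∈ H) P(h/D)
     = argmax (h ∈ H) P(D/h) P(h) / P(D)
     = argmax (h ∈ H) P(D/h) P(h)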

 

  • In some cases, we will assume that every hypothesis in H is equally probable a priori (P(hi) = P(hj) for all hi and hj in H).

 

  • In this case, we can simplify further: to find the most probable hypothesis we need only consider the quantity P(D/h).

 

  • P(D/h) is often called the likelihood of the data D given h, and any hypothesis that maximizes P(D/h) is called a maximum likelihood (ML) hypothesis, written below.
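
hML = argmax (h ∈ H) P(D/h)

To make the MAP and ML definitions concrete, here is a minimal sketch in Python on a toy example: choosing a coin’s bias from three candidate hypotheses given a handful of observed flips. The candidate biases, the prior probabilities, and the observed flips are all made-up, illustrative assumptions.

# A minimal, illustrative sketch of ML vs. MAP hypothesis selection.
# The hypothesis space (three candidate coin biases), the priors, and the
# observed flips are assumptions made up for this example.

candidate_biases = [0.3, 0.5, 0.7]        # each hypothesis h says P(heads) = bias
priors = {0.3: 0.1, 0.5: 0.8, 0.7: 0.1}   # assumed prior P(h) over the hypotheses

data = [1, 1, 0, 1, 1]                    # observed flips: 1 = heads, 0 = tails

def likelihood(bias, flips):
    """P(D/h): probability of the observed flips if the coin's bias is `bias`."""
    p = 1.0
    for flip in flips:
        p *= bias if flip == 1 else (1.0 - bias)
    return p

# Maximum likelihood (ML) hypothesis: maximizes P(D/h).
h_ml = max(candidate_biases, key=lambda h: likelihood(h, data))

# Maximum a posteriori (MAP) hypothesis: maximizes P(D/h) * P(h);
# P(D) is the same for every hypothesis, so it can be ignored.
h_map = max(candidate_biases, key=lambda h: likelihood(h, data) * priors[h])

print("ML hypothesis: ", h_ml)   # 0.7 for this data
print("MAP hypothesis:", h_map)  # 0.5, because the prior strongly favors 0.5

With these assumed numbers, the ML hypothesis is the bias 0.7, while the strong prior on 0.5 makes 0.5 the MAP hypothesis, which shows how the prior P(h) can change which hypothesis is selected.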

We’ve already seen one use of Bayes’ theorem in the analysis of Knowledge Cascades: there we found that, based on conditional probabilities computed with Bayes’ theorem, it can be reasonable to make decisions that set aside one’s own private information.

 

Applications of Bayes’ Theorem:

The theorem has a wide range of applications, in finance and well beyond.

 

Bayes’ theorem can be used, for example, to assess the reliability of medical test results by taking into account both how likely any given person is to have the condition and the test’s overall accuracy, as in the sketch below.
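
As a concrete, purely hypothetical illustration: suppose a condition affects 0.8% of people, a test detects it 98% of the time when it is present (sensitivity), and the test correctly comes back negative 97% of the time when the condition is absent (specificity). The short Python sketch below applies Bayes’ theorem to compute the probability that a person who tests positive actually has the condition; the numbers are illustrative assumptions, not real clinical figures.

# Probability of having a condition given a positive test, via Bayes' theorem.
# The prevalence, sensitivity, and specificity are hypothetical example values.

def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(condition / positive test)."""
    true_positive = sensitivity * prevalence                    # P(+ / condition) * P(condition)
    false_positive = (1.0 - specificity) * (1.0 - prevalence)   # P(+ / no condition) * P(no condition)
    evidence = true_positive + false_positive                   # P(+), by the law of total probability
    return true_positive / evidence

print(posterior_given_positive(prevalence=0.008, sensitivity=0.98, specificity=0.97))
# Roughly 0.21: even with an accurate test, a positive result is far from
# conclusive when the condition itself is rare.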

 
