/    /  Machine Learning- Finding a Maximally Specific Hypothesis: The List-Then-Eliminate Algorithm | Version Space

Finding a Maximally Specific Hypothesis: The List-Then-Eliminate Algorithm | Version Space.

 

The LIST-THEN-ELIMINATE technique can be used if the hypothesis space H is finite in theory. It provides a number of advantages, including the assurance that all hypotheses will be compatible with the training data.

 

In this blog, we’ll have a look at the working and need of the list-then-eliminate algorithm and also understand the concept of version space.

 

To understand it from scratch let’s have a look at all the terminologies involved, 

 

Hypothesis: 

It is usually represented with an ‘h’. In supervised machine learning, a hypothesis is a function that best characterizes the target. 

 

For example, Consider a coordinate plane showing the output as positive or negative for a given task. 

  

The Hypothesis Space is made up of all of the legal ways in which we might partition the coordinate plane to anticipate the outcome of the test data.

 

Each conceivable path, represented with a gray line is referred to as a hypothesis.

 

Specific Hypothesis: 

If a hypothesis, h, covers none of the negative cases and there is no other hypothesis, h′, that covers none of the negative examples, then h is strictly more general than h′, then h is said to be the most specific hypothesis. 

 

The specific hypothesis fills in important details about the variables given in the hypothesis.

 

Before understanding version space, let’s first have a look at the formal definition for ‘Consistent’ => 

 

If and only if h(x) = c(x) for each example (x, c(x)) in D, a hypothesis h is consistent with a collection of training examples D. 

 

Let’s take our EnjoySport example yet again, to understand what Consistent Hypothesis means better, 

 

SkyAir TempHumidityWindWaterForecastEnjoySport
SunnyWarmNormalStrongWarmSameYes
SunnyWarmHighStrongWarmSameYes
RainyColdHighStrongWarmChangeNo
SunnyWarmHighStrongCoolChangeYes

 

Here,consider a hypothesis,  h1 = < ?, ?, ?, Strong, ?, ?>. 

 

But, h(x)!= c(x) in the case of training example (3). As a result, hypothesis h1 is not consistent with the training data set. 

 

Whereas consider a hypothesis, h2 = < ?, Warm, ?, Strong, ?, ?>.

h(x) = c(x) is true in all of the training instances. As a result, hypothesis h2 is consistent with the training data set.

 

Version Space: 

With regard to hypothesis space H and training examples D, the version space, denoted as VSH,D, is the subset of hypotheses from H that are consistent with the training instances in D.

 

In the above example, We have two hypotheses from H in the case above, both of which are consistent with the training dataset.

 

h1=< Sunny, Warm, ?, Strong, ?, ?> and

h2=< ?, Warm, ?, Strong, ?, ?>

 

As a result, the collection of hypotheses h1, h2 is referred to as a Version Space.

 

List – Then – Eliminate: 

The LIST-THEN-ELIMINATE method first populates the version space with all hypotheses in H, then discards any hypothesis that contradicts any training example.

 

As more instances are observed, the version space of candidate hypotheses reduces, until ideally just one hypothesis exists that is compatible with all of the observed cases.

 

If there isn’t enough evidence to reduce the version space down to a single hypothesis, the algorithm can generate the whole set of hypotheses that are compatible with the data.

 

=> Algorithm:

 

1. Initialize all the hypotheses in H to the Version Space. 

2. Keep removing the inconsistent hypothesis from the Version Space.

For each training instance, <x1, c(x) > remove any hypothesis that is h(x) != c(x).

3. Output the list of version space after checking all the examples. 

 

 

Let’s take another brief example to understand the working of list-then-eliminate. 

 

F1F2TargetClass
AXYes
AYYes

 

The version space for the above example is, 

(A, X), (A, Y), (A, ?), (B, X), (B, Y), (B, ?), (?, X), (?, Y) (?, ?), (ø, ø). 

 

Keep removing the hypothesis that is not consistent with the training examples. 

 

It was result in a Version Space => (A, ?), (?, ?). 

 

Some of the Key Disadvantages of using the List-Then-Eliminate Algorithm are, 

  • There must be a limit to the hypothesis space.
  • Enumeration of all hypotheses is a time-consuming process.

 

Reference

Finding a Maximally Specific Hypothesis: The List-Then-Eliminate Algorithm | Version Space.