
Finding a Maximally Specific Hypothesis: Find-S

 

Find-S is a concept-learning algorithm in machine learning. It identifies the most specific hypothesis that fits all of the positive examples. 

 

In this blog, we’ll discuss the algorithm and some examples of Find-S: an algorithm to find a maximally specific hypothesis. 

 

To understand it from scratch, let's first have a look at the terminology involved. 

 

Hypothesis: 

In supervised machine learning, a hypothesis is a function that best characterizes the target concept. It is usually denoted by h. 

 

For example, consider a coordinate plane on which each point is labeled positive or negative for a given task. 

The hypothesis space is made up of all of the legal ways in which we might partition the coordinate plane to predict the outcome for the test data.

 

Each such partition, represented by a gray line in the figure, is referred to as a hypothesis.

 

Specific Hypothesis: 

A hypothesis h is said to be the most specific hypothesis if h covers none of the negative examples and there is no other hypothesis h′ that also covers none of the negative examples while being strictly more specific than h. 

 

In other words, a specific hypothesis pins down concrete values for as many of its attributes as the data allows.

 

Find-S: 

As noted above, Find-S is a concept-learning algorithm that identifies the most specific hypothesis fitting all of the positive examples. The Find-S algorithm considers only positive examples. 

 

Find-S starts with the most specific hypothesis and generalizes it whenever it fails to cover an observed positive training example. 

 

Representations:

  • The most specific hypothesis is represented using ϕ.
  • The most general hypothesis is represented using ?.

 

The symbol ? means that any value is accepted for the attribute. 

 

Whereas ϕ means that no value is accepted for the attribute. 
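As a small illustration, here is one possible way to encode these two symbols in code. This is only a sketch; the names PHI and ANY, and the use of plain strings, are my own choices rather than a fixed convention.

```python
# One possible encoding (an assumption, not a standard):
PHI = "ϕ"  # most specific value: accepts no value
ANY = "?"  # most general value: accepts any value

def attribute_matches(hypothesis_value, example_value):
    """Return True if the hypothesis value accepts the example value."""
    if hypothesis_value == PHI:
        return False            # ϕ accepts nothing
    if hypothesis_value == ANY:
        return True             # ? accepts everything
    return hypothesis_value == example_value  # otherwise require an exact match

print(attribute_matches("?", "Sunny"))   # True
print(attribute_matches("ϕ", "Sunny"))   # False
```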

 

Let’s have a look at the algorithm of Find-S: 

1. Initialize the value of the hypothesis for all attributes with the most specific one. That is, 

                            h0 = < ϕ, ϕ, ϕ, …, ϕ > 

2. Take the next example. If it is a negative example, skip it and move on to the next example without changing the hypothesis. 

3. If the example is a positive example, then: 

 

For each attribute, check if the value of the attribute is equal to that of the value we took in our hypothesis. 

 

If the values are equal, keep the same value for the attribute in our hypothesis and move to the next attribute. If the value of the attribute differs from the value in our specific hypothesis, replace that attribute's value in the hypothesis with the most general value (?). 

 

After we've processed all of the training examples, we'll have a final hypothesis that we can use to classify new examples.
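The steps above can be sketched in Python as follows. This is a minimal sketch, assuming each training example is a tuple of attribute values paired with a "Yes"/"No" target label; the function name find_s and the encoding of ϕ and ? as strings are assumptions for illustration.

```python
PHI = "ϕ"  # most specific value: accepts nothing
ANY = "?"  # most general value: accepts anything

def find_s(examples):
    """Return the maximally specific hypothesis consistent with the positive examples."""
    n_attributes = len(examples[0][0])
    h = [PHI] * n_attributes           # step 1: start with the most specific hypothesis
    for attributes, label in examples:
        if label != "Yes":             # step 2: ignore negative examples
            continue
        for i, value in enumerate(attributes):
            if h[i] == PHI:            # first positive example: copy its values
                h[i] = value
            elif h[i] != value:        # step 3: mismatch, so generalize to ?
                h[i] = ANY
    return h

# The EnjoySport data set from the example below:
training_data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), "Yes"),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), "Yes"),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), "No"),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), "Yes"),
]
print(find_s(training_data))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```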

 

Let’s have a look at an example to see how Find-S works. 

 

Consider the following data set, which contains information about the best day for a person to enjoy their preferred sport.

 

Sky    Air Temp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm      Normal    Strong  Warm   Same      Yes
Sunny  Warm      High      Strong  Warm   Same      Yes
Rainy  Cold      High      Strong  Warm   Change    No
Sunny  Warm      High      Strong  Cool   Change    Yes

 

Now we initialize the hypothesis for all attributes with the most specific value:

h0 = < ϕ, ϕ, ϕ, ϕ, ϕ, ϕ> 

 

Consider example 1. The attribute values are < Sunny, Warm, Normal, Strong, Warm, Same>. Since its target class (EnjoySport) value is Yes, it is a positive example. 

 

Our initial hypothesis is too specific to cover this positive example, so we must generalize it. As a result, the hypothesis becomes:

h1 = < Sunny, Warm, Normal, Strong, Warm, Same>

 

The second training example (also positive in this case) compels the algorithm to generalize h further, this time by replacing with a "?" any attribute value in h that is not satisfied by the new example.

 

The attribute values are < Sunny, Warm, High, Strong, Warm, Same> 

 

The refined hypothesis now is, 

h2 = < Sunny, Warm, ?, Strong, Warm, Same > 

 

Consider example 3. The attribute values are < Rainy, Cold, High, Strong, Warm, Change>. Since the target class value is No, it is a negative example. 

h3 = < Sunny, Warm, ?, Strong, Warm, Same > (same as h2)

 

Every negative example is simply ignored by the Find-S algorithm. As a result, no change to h is ever needed in response to a negative example.

 

The fourth (positive) case leads to a further generalization of h in our Find-S trace.

 

Consider example 4. Its attribute values are <Sunny, Warm, High, Strong, Cool, Change>, which again is a positive example. 

 

Each attribute is compared against the current hypothesis, and wherever there is a mismatch, the attribute is replaced with the general value (?). After processing this example, the following hypothesis emerges:

h4 = < Sunny, Warm, ?, Strong, ?, ? >

 

Therefore the final hypothesis is h = < Sunny, Warm, ?, Strong, ?, ? >.
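As a hedged sketch, this final hypothesis could be used to classify a new day as follows, assuming ? matches any value and every other entry must match exactly:

```python
# Final hypothesis from the trace above:
h = ["Sunny", "Warm", "?", "Strong", "?", "?"]

def predict(hypothesis, example):
    """Predict EnjoySport = Yes (True) if every attribute is accepted by the hypothesis."""
    return all(hv == "?" or hv == ev for hv, ev in zip(hypothesis, example))

print(predict(h, ("Sunny", "Warm", "High", "Strong", "Cool", "Same")))  # True
print(predict(h, ("Rainy", "Warm", "High", "Strong", "Cool", "Same")))  # False
```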

 

At each step, the hypothesis is generalized only as far as is necessary to cover the new positive example. As a result, the hypothesis at each step is the most specific hypothesis consistent with the training examples seen so far (hence the name Find-S).

 

Find-S will always return the most specific hypothesis within the hypothesis space H that is consistent with the positive training examples.

 

 
