
Instance-based Learning: k-Nearest Neighbor Algorithm – 1

 

The k-NEAREST NEIGHBOR algorithm is the most basic instance-based learning method. It assumes that all instances correspond to points in n-dimensional space.

 

K-NN assumes that a new case is similar to the previously stored and classified cases, and it therefore assigns the new data point to the category it is most similar to.

 

The K-NN method stores all of the available data and classifies a new data point based on its similarity to the existing data. This means that new data can be quickly assigned to a well-defined category using the K-NN method.

 

Although the K-NN method can be used for both regression and classification, it is most typically employed for classification.
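As a quick illustration of both uses, here is a minimal sketch using scikit-learn; the toy data and the choice of k = 3 are assumptions made only for this example.

```python
# A minimal sketch of K-NN used for classification and for regression,
# assuming scikit-learn is available. The toy data and k=3 are arbitrary.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1.0, 2.0], [2.0, 1.5], [8.0, 8.5], [9.0, 8.0]])  # training points
y_class = np.array([0, 0, 1, 1])          # discrete labels for classification
y_reg = np.array([1.2, 1.4, 8.1, 8.6])    # real values for regression

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)

query = np.array([[8.5, 8.2]])
print(clf.predict(query))  # most common label among the 3 nearest neighbors
print(reg.predict(query))  # mean target value of the 3 nearest neighbors
```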

 

The K-NN algorithm is a non-parametric algorithm, meaning it doesn’t make any assumptions about the underlying data distribution.

 

K-NN is known as a lazy learner algorithm because it does not learn from the training set right away; instead, it stores the dataset and performs classification only when a query arrives.

 

During the training phase, the KNN algorithm simply stores the data, and when it receives a new data point, it assigns that point to the category it is most similar to.

 

Example: Suppose we have an image of a creature that looks like either a cat or a dog, but we do not know which it is. Because the KNN method is based on a similarity measure, we can use it for this identification.

 

Our KNN model will compare the new image to the stored photographs of cats and dogs and label it as a cat or a dog based on its most similar features.

 

The standard Euclidean distance is used to define an instance’s closest neighbors.

 

Let an arbitrary instance x be described by the feature vector

  <a_1(x), a_2(x), …, a_n(x)>

where a_r(x) denotes the value of the rth attribute of instance x. The distance between two instances x_i and x_j is then defined as

  d(x_i, x_j) = sqrt( sum_{r=1}^{n} ( a_r(x_i) - a_r(x_j) )^2 )
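A minimal sketch of this distance computation in Python; the function name and the example vectors are chosen only for illustration.

```python
import math

def euclidean_distance(xi, xj):
    """d(xi, xj) = sqrt of the sum over r of (a_r(xi) - a_r(xj))^2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Example: distance between two 3-dimensional instances
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```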

The target function in nearest-neighbor learning might be discrete or real-valued.

 

Consider learning a discrete-valued target function of the form f : R^n -> V, where V is the finite set {v_1, …, v_s}.

 

This algorithm’s estimate of f(x_q) is simply the most common value of f among the k training examples nearest to x_q. When k is set to 1, the 1-NEAREST NEIGHBOR method assigns the value f(x_i) to f̂(x_q), where x_i is the training instance closest to x_q. For larger values of k, the approach assigns the most common value among the k nearest training examples.

 

Training Algorithm:

  • For each training example ⟨x, f(x)⟩, add the example to the list training_examples.

Classification Algorithm:

  • Given a query instance x_q to be classified,
    • Let x_1, …, x_k denote the k instances from training_examples that are nearest to x_q.
    • Return

      f̂(x_q) <- argmax_{v in V} sum_{i=1}^{k} delta(v, f(x_i))

      where delta(a, b) = 1 if a = b, and delta(a, b) = 0 otherwise.
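A minimal sketch of this classification procedure in Python, assuming each training example is stored as a (feature vector, label) pair; the helper name knn_classify and the toy data are illustrative.

```python
import math
from collections import Counter

def knn_classify(training_examples, x_q, k):
    """Return the most common label f(x_i) among the k stored examples nearest to x_q."""
    # Sort the stored (feature_vector, label) pairs by Euclidean distance to the query
    neighbors = sorted(training_examples, key=lambda ex: math.dist(ex[0], x_q))[:k]
    # argmax over v of the number of nearest neighbors labelled v
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D example
training_examples = [([1.0, 1.0], '+'), ([1.5, 2.0], '+'),
                     ([8.0, 8.0], '-'), ([8.5, 9.0], '-')]
print(knn_classify(training_examples, [1.2, 1.1], k=3))  # '+'
```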

 

KNN and Voronoi diagram: 

 

The examples are points in a two-dimensional space, and the target function is Boolean-valued. “+” and “-” represent positive and negative training examples, respectively.

 

In this figure, the 1-NEAREST NEIGHBOR algorithm classifies the query instance x_q as a positive example, while the 5-NEAREST NEIGHBOR algorithm classifies it as a negative example.

 

The k-NEAREST NEIGHBOR learning technique never forms an explicit general hypothesis for the target function f. It simply computes the classification of each new query instance as it is needed.

 

The diagram on the right side of the figure depicts the shape of the decision surface induced by 1-NEAREST NEIGHBOR over the entire instance space.

 

The decision surface is a combination of convex polyhedra surrounding each of the training examples. For each training example, its polyhedron is the set of query points whose classification is completely determined by that example.

 

Query points outside the polyhedron are closer to some other training example. This kind of diagram is often called the Voronoi diagram of the set of training examples.
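For intuition, the short sketch below draws such a Voronoi diagram for a handful of 2-D training points; it assumes SciPy and Matplotlib are available, and the points are chosen arbitrarily.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

# A few 2-D training examples; each cell of the diagram is the region of
# query points classified entirely by that training point under 1-NN.
points = np.array([[1, 1], [2, 4], [4, 2], [5, 5], [7, 3], [3, 6]])
vor = Voronoi(points)
voronoi_plot_2d(vor)
plt.title("Voronoi cells induced by 1-Nearest Neighbor")
plt.show()
```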

 

The KNN algorithm is easily adapted to approximating continuous-valued target functions. To do this, the algorithm calculates the mean value of the k nearest training examples rather than their most common value. More precisely, to approximate a real-valued target function f : R^n -> R, we replace the final line of the algorithm above with

  f̂(x_q) <- ( sum_{i=1}^{k} f(x_i) ) / k
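A minimal sketch of this real-valued variant in Python, again assuming (feature vector, value) pairs and Euclidean distance; the function name and the toy data are illustrative.

```python
import math

def knn_regress(training_examples, x_q, k):
    """Estimate f(x_q) as the mean of f(x_i) over the k nearest training examples."""
    neighbors = sorted(training_examples, key=lambda ex: math.dist(ex[0], x_q))[:k]
    return sum(value for _, value in neighbors) / k

# Toy example: target values roughly follow the sum of the two features
training_examples = [([1.0, 1.0], 2.1), ([1.5, 2.0], 3.4),
                     ([8.0, 8.0], 16.2), ([8.5, 9.0], 17.3)]
print(knn_regress(training_examples, [8.2, 8.4], k=2))  # mean of the two nearest values
```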

KNN Algorithm Benefits: 

  • It is straightforward to implement.
  • It is robust to noisy training data.
  • It can be more effective when the training data is large.

 

KNN Algorithm Disadvantages:

  • The value of K always needs to be determined, which can sometimes be challenging.
  • Because the distance between the query point and every training sample must be calculated, the computational cost is high.

 

The computational cost of KNN grows with the size of the training dataset. For very large training sets, KNN can be made stochastic by taking a sample from the training dataset and determining the K most similar instances from that sample, as in the sketch below.
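A minimal sketch of that sampling idea; the function name and the sample size are assumptions for illustration, and the K-NN query would then be run against the returned sample instead of the full dataset.

```python
import random

def sample_training_set(training_examples, sample_size=1000, seed=None):
    """Pick a random subset of the stored examples to keep distance computations cheap."""
    rng = random.Random(seed)
    if len(training_examples) <= sample_size:
        return list(training_examples)
    return rng.sample(training_examples, sample_size)

# The k-NN query is then answered from the sample rather than the full training set,
# trading some accuracy for a lower computational cost.
```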

 

KNN has been studied extensively and has been around for a long time. As a result, different texts use various terms to describe it, such as:

 

Instance-Based Learning: Predictions are made using only the raw training instances. As a result, KNN is frequently referred to as case-based learning or instance-based learning (where each training instance is a case from the problem domain).

 

Lazy Learning: No model needs to be learned ahead of time; all of the work happens when a prediction is requested. As a result, KNN is known as a lazy learning algorithm.

 

KNN is non-parametric, which means it makes no assumptions about the problem’s functional form.

 
