
Hierarchical clustering

 

Hierarchical clustering, also called hierarchical cluster analysis, is an algorithm that groups similar data points into clusters. The end result is a set of clusters where each cluster is distinct from the others, and the objects within each cluster are broadly similar to each other.

 

One motivation for hierarchical clustering is a drawback of K-Means clustering: with K-Means, we have to decide the number of clusters before starting the algorithm, and choosing that number is often difficult. Hierarchical clustering avoids this problem.

 

Hierarchical clustering is divided into two types:

  • Agglomerative
  • Divisive

 

Agglomerative Hierarchical clustering

In this method, each data point is initially considered an individual cluster. At each iteration, similar clusters merge with other clusters until one cluster (or K clusters) remains. The agglomerative algorithm is straightforward; the procedure is explained in the following steps.

 

  • Step 1

In the beginning, we compute the proximity of individual points and consider all data points as individual clusters.

 

  • Step 2

In the second step, the most similar clusters are merged together to form a single cluster.

 

  • Step 3

We again compute the proximity of new clusters and group the similar clusters to form new clusters.

 

  • Step 4

Compute the proximity of the new clusters and merge the similar ones to form a new cluster. At every iteration, the number of clusters is thereby reduced.

 

  • Step 5

Finally, all the clusters are merged together to form a single cluster.

[Image: Hierarchical clustering 1 (i2tutorials)]
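As a minimal sketch of the procedure above, the snippet below builds the merge tree bottom-up with SciPy and then cuts it into a flat clustering; the toy data points are made up purely for illustration.

# Agglomerative clustering with SciPy: every point starts as its
# own cluster, and similar clusters are merged step by step.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0],
              [5.2, 4.8], [9.0, 1.0], [9.1, 1.2]])

# Build the full merge hierarchy; 'single' is the MIN linkage
# discussed below (alternatives: 'complete', 'average', 'ward').
Z = linkage(X, method='single')

# Cut the tree to obtain K = 3 flat clusters.
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)   # e.g. [1 1 2 2 3 3]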

 

Divisive Hierarchical clustering

The divisive hierarchical clustering technique is the exact opposite of the agglomerative technique. Here we start by considering all the data points as a single cluster. In each iteration, we separate the data points that are not similar from the cluster, and each separated data point is treated as an individual cluster. In the end, we are left with n clusters.

[Image: Hierarchical clustering 2 (i2tutorials)]
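Neither SciPy nor scikit-learn ships a divisive hierarchical routine, so the sketch below improvises one: starting from a single cluster, it repeatedly bisects the largest remaining cluster with 2-means. The function name and the random data are assumptions for illustration only.

# A rough divisive (top-down) sketch: keep splitting the largest
# cluster in two until the requested number of clusters is reached.
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters):
    clusters = [np.arange(len(X))]   # one cluster holding every point
    while len(clusters) < n_clusters:
        # Pick the largest remaining cluster and split it with 2-means.
        j = max(range(len(clusters)), key=lambda k: len(clusters[k]))
        idx = clusters.pop(j)
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[idx])
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters

X = np.random.rand(20, 2)
for i, members in enumerate(divisive_clustering(X, 4)):
    print(i, members)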

 

Calculating the similarity between two clusters

Computing the similarity between two clusters is essential for deciding which clusters to merge (agglomerative method) or divide (divisive method). Several methods are used to calculate the similarity between two clusters:

  • MIN
  • MAX
  • Group Average
  • Distance Between Centroids
  • Ward’s Method

 

MIN: 

The MIN method, also known as the single-linkage algorithm, defines the similarity of two clusters C1 and C2 as the minimum of the similarities between points Pi and Pj, where Pi belongs to C1 and Pj belongs to C2.

 

Mathematically this can be written as,

Sim (C1, C2) = Min Sim (Pi, Pj) such that Pi ∈ C1 & Pj ∈ C2

[Image: Hierarchical clustering 3 (i2tutorials)]
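To make the formula concrete, the tiny snippet below evaluates single linkage on two made-up clusters, working with Euclidean distances rather than abstract similarities (so MIN linkage takes the smallest pairwise distance).

# Single linkage (MIN): the smallest pairwise distance between
# the two point sets.
import numpy as np
from scipy.spatial.distance import cdist

C1 = np.array([[0.0, 0.0], [1.0, 0.0]])
C2 = np.array([[4.0, 0.0], [6.0, 0.0]])

print(cdist(C1, C2).min())   # 3.0, from (1,0) to (4,0)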

 

Pros of MIN:

Using this method, we can separate non-elliptical shapes as long as the gap between the two clusters is not small.

 

Cons of MIN:

The MIN approach cannot separate clusters properly if there is noise between the clusters.

Original data

[Image: Hierarchical clustering 4 (i2tutorials)]

Clustered data using MIN method

[Image: Hierarchical clustering 5 (i2tutorials)]

 

MAX: 

This method, also known as the complete-linkage algorithm, is exactly the opposite of the MIN approach. It defines the similarity of two clusters C1 and C2 as the maximum of the similarities between points Pi and Pj, where Pi belongs to C1 and Pj belongs to C2.

 

Mathematically this can be written as,

Sim (C1, C2) = Max Sim (Pi, Pj) such that Pi ∈ C1 & Pj ∈ C2

[Image: Hierarchical clustering 6 (i2tutorials)]
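The same made-up clusters as in the MIN example illustrate complete linkage, again phrased in terms of Euclidean distances:

# Complete linkage (MAX): the largest pairwise distance between
# the two point sets.
import numpy as np
from scipy.spatial.distance import cdist

C1 = np.array([[0.0, 0.0], [1.0, 0.0]])
C2 = np.array([[4.0, 0.0], [6.0, 0.0]])

print(cdist(C1, C2).max())   # 6.0, from (0,0) to (6,0)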

 

Pros of MAX:

The MAX method does well at separating clusters even if there is noise between the clusters.

 

Cons of MAX:

The MAX approach is biased towards circular clusters, and it tends to break large clusters.

 

Original data

Clustered data using MAX method

[Image: Hierarchical clustering 8 (i2tutorials)]

 

Group Average: 

Take all the pairs of points, one from each cluster, calculate their similarities, and compute the average of these similarities.

 

Mathematically this can be written as,

sim(C1, C2) = Σ sim(Pi, Pj) / (|C1| * |C2|)

where the sum runs over all pairs with Pi ∈ C1 & Pj ∈ C2

[Image: Hierarchical clustering 9 (i2tutorials)]
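Continuing with the same made-up clusters, this snippet checks that averaging all |C1| * |C2| pairwise distances matches the formula above:

# Group average: the mean of all pairwise distances between
# the two point sets.
import numpy as np
from scipy.spatial.distance import cdist

C1 = np.array([[0.0, 0.0], [1.0, 0.0]])
C2 = np.array([[4.0, 0.0], [6.0, 0.0]])

d = cdist(C1, C2)                     # 2 x 2 matrix of pairwise distances
print(d.sum() / (len(C1) * len(C2)))  # 4.5, same as d.mean()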

 

Pros of Group Average:

The group average method does well at separating clusters even if there is noise between the clusters.

 

Cons of Group Average:

The group average approach is biased towards circular clusters.

 

Distance Between Centroids: 

Compute the centroids of the two clusters C1 and C2, and take the similarity between the two centroids as the similarity between the two clusters.

[Image: Hierarchical clustering 10 (i2tutorials)]
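A short sketch of the centroid criterion on made-up points: compute each cluster's mean and measure the distance between the two means.

# Centroid linkage: distance between the two cluster centroids.
import numpy as np

C1 = np.array([[0.0, 0.0], [1.0, 0.0]])
C2 = np.array([[4.0, 3.0], [6.0, 3.0]])

c1, c2 = C1.mean(axis=0), C2.mean(axis=0)   # (0.5, 0) and (5.0, 3.0)
print(np.linalg.norm(c1 - c2))              # about 5.41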

 

Ward’s Method: 

This method computes the similarity between two clusters in the same way as the Group Average method, except that Ward’s method uses the sum of the squares of the distances between Pi and Pj.

 

Mathematically this can be written as,

sim(C1, C2) = Σ (dist(Pi, Pj))² / (|C1| * |C2|), where Pi ∈ C1 & Pj ∈ C2
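The snippet below simply evaluates the formula as written, on made-up points. Note that library implementations of Ward linkage (for example SciPy's method='ward') are defined through the increase in within-cluster variance, so their numbers differ from this simplified form.

# Average of the squared pairwise distances between the two
# point sets, as in the formula above.
import numpy as np
from scipy.spatial.distance import cdist

C1 = np.array([[0.0, 0.0], [1.0, 0.0]])
C2 = np.array([[4.0, 0.0], [6.0, 0.0]])

d = cdist(C1, C2)
print((d ** 2).sum() / (len(C1) * len(C2)))  # (16+36+9+25)/4 = 21.5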

 

Pros of Ward’s method:

Ward’s method does well at separating clusters even if there is noise between the clusters.

 

Cons of Ward’s method:

Ward’s method is also biased towards circular clusters.
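All five criteria above map onto the method argument of SciPy's linkage function; a minimal sketch with made-up data:

# Building one hierarchy per linkage criterion discussed above.
import numpy as np
from scipy.cluster.hierarchy import linkage

X = np.random.rand(30, 2)

for method in ['single',      # MIN
               'complete',    # MAX
               'average',     # Group Average
               'centroid',    # Distance Between Centroids
               'ward']:       # Ward's Method
    Z = linkage(X, method=method)
    # Each row of Z records one merge: [left, right, distance, size].
    print(method, Z.shape)    # (29, 4): n - 1 merges for n = 30 points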

 

Disadvantages of the Hierarchical clustering Technique:

  1. There is no global objective function that hierarchical clustering directly optimizes.
  2. All of the methods for calculating the similarity between clusters have their own disadvantages.
  3. The algorithm cannot be used on very large datasets because of its high space and time complexity (the standard agglomerative algorithm needs O(n²) memory for the proximity matrix and up to O(n³) time).