Machine Learning - Dimensionality Reduction

Dimensionality Reduction

Dimensionality reduction is the process of reducing the number of features (dimensions) in a dataset. In machine learning, a final classification is often based on many factors or variables; these variables are called features. The more features there are, the harder it is to visualize the training set. Many features also tend to be correlated with one another, which makes some of them redundant. This is where dimensionality reduction plays a key role: it reduces the number of random variables under consideration. Dimensionality reduction can be divided into feature selection and feature extraction.

Curse of Dimensionality

The curse of dimensionality refers to the problems that arise when working with high-dimensional data and that do not occur in low dimensions. As the number of features increases, the number of samples needed to represent every combination of feature values grows rapidly: exponentially, not just proportionally. Because real datasets rarely grow that fast, the data becomes sparse in the feature space, and certain algorithms struggle to train effective models. This is known as the curse of dimensionality.
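A quick way to see how fast the required number of samples grows is to count grid cells. The sketch below assumes that each feature is split into a fixed number of bins (10 here, an arbitrary choice) and that we want at least one sample in every combination of bins; the function name `samples_to_cover` is hypothetical.

```python
def samples_to_cover(bins_per_feature: int, n_features: int) -> int:
    """Number of cells in a grid with `bins_per_feature` bins along each of
    `n_features` axes, i.e. the minimum number of samples needed to place
    one sample in every combination of (binned) feature values."""
    return bins_per_feature ** n_features


if __name__ == "__main__":
    # Assumption: 10 bins per feature; each extra feature multiplies the count by 10.
    for d in (1, 2, 3, 5, 10):
        print(f"{d:>2} features -> {samples_to_cover(10, d):,} samples")
```

With 10 bins per feature, going from 3 features to 10 raises the count from 1,000 to 10,000,000,000, which is why a dataset that covers a low-dimensional space well becomes sparse in a high-dimensional one.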

As the number of features increases, the model also becomes more complex and more prone to overfitting, which results in poor performance on new, unseen data. Reducing the number of dimensions is therefore one way to lower the risk of overfitting.

Components of Dimensionality Reduction

Dimensionality reduction has two main components:

Feature Selection
Feature Extraction

Let us study them in more detail.

Feature Selection

Feature selection means finding a subset of the original features that is still sufficient to model the problem. Its purpose is to filter irrelevant or redundant features out of the dataset. There are three main approaches:

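Filter: scores each feature with a statistic computed independently of any model, such as its variance or its correlation with the target, and keeps the best-scoring features
Wrapper: trains a model on different candidate subsets of features and keeps the subset that performs best
Embedded: performs the selection as part of model training itself, for example through L1 (lasso) regularization, which can drive the weights of unhelpful features to zero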

Feature Extraction

Feature extraction transforms data from a high-dimensional space into a lower-dimensional space. It creates a new, smaller set of features that captures most of the useful information in the original ones.
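Principal component analysis (PCA) is one widely used feature-extraction technique; the text does not name a specific method, so PCA is an illustrative choice here. The sketch below, with the hypothetical function name `pca_2d_to_1d`, reduces 2-D points to a single new feature: it centres the data, builds the 2x2 covariance matrix, finds the eigenvector with the largest eigenvalue in closed form, and projects each point onto it. In practice a linear-algebra library would normally be used instead of this hand-rolled version.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns (direction, scores): `direction` is the unit vector along which
    the centred data varies most, and `scores` are the 1-D coordinates of
    each point along that direction (the new, extracted feature).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x, _ in centred) / (n - 1)
    c = sum(y * y for _, y in centred) / (n - 1)
    b = sum(x * y for x, y in centred) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector; the matrix is already diagonal when b == 0
    if b != 0:
        vx, vy = lam - c, b
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # New feature: position of each centred point along the principal direction
    scores = [x * direction[0] + y * direction[1] for x, y in centred]
    return direction, scores
```

For points that lie on the line y = x, such as (1, 1), (2, 2) and (3, 3), the direction is about (0.707, 0.707) and the scores are about -1.414, 0 and 1.414: one new feature captures all of the variation that previously needed two.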

The main difference between the two is that feature selection keeps a subset of the original features, whereas feature extraction creates new ones.