Dimensionality reduction is the process of reducing the number of features in a dataset. In machine learning, the final classification often depends on many factors or variables, which are called features. The more features there are, the harder it is to visualize the training set. Many of those features are also likely to be correlated with one another and therefore redundant. This is where dimensionality reduction plays a key role, by reducing the number of random variables under consideration. It can be divided into feature selection and feature extraction.
Curse of Dimensionality
The curse of dimensionality refers to problems that arise when working with high-dimensional data and that do not occur in low dimensions. As the number of features increases, the number of samples needed to cover the feature space grows exponentially rather than proportionally, because the samples have to represent all the different combinations of feature values. With a fixed amount of data, the samples become sparse, and certain algorithms struggle to train effective models. This is known as the curse of dimensionality.
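To make the growth in feature-value combinations concrete, here is a minimal sketch. It assumes every feature is split into the same number of bins; the function name cells_needed and the choice of 10 bins are illustrative only.

```python
def cells_needed(bins_per_feature: int, n_features: int) -> int:
    """Number of distinct cells when each feature is split into equal bins."""
    return bins_per_feature ** n_features


# With 10 bins per feature, the number of cells to cover grows very quickly.
for n_features in (1, 2, 3, 10):
    print(n_features, cells_needed(10, n_features))
```

Even one sample per cell quickly becomes impractical, which is why the data turns sparse as features are added.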
As the number of features increases, the model also becomes more complex and more prone to overfitting. This results in poor performance on new, unseen data. Dimensionality reduction is therefore used to reduce the risk of overfitting.
Components of Dimensionality Reduction
Dimensionality reduction has two main components:
- Feature Selection
- Feature Extraction
Let us look at each of them in more detail.
Feature selection means finding a smaller subset of the original features to use for modeling the problem. It filters out irrelevant or redundant features from the dataset. Selection methods are commonly grouped into three approaches: filter, wrapper, and embedded methods.
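As an illustration of feature selection, the sketch below drops features whose variance is zero, since a constant column carries no information. This is only one simple filtering rule, chosen here as an assumption; the function name select_by_variance and the toy data are hypothetical.

```python
import numpy as np


def select_by_variance(X: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Return the indices of columns whose variance exceeds the threshold."""
    variances = X.var(axis=0)
    return np.flatnonzero(variances > threshold)


# Toy data: the middle feature is constant, so it cannot help a model.
X = np.array([
    [1.0, 0.0, 5.0],
    [2.0, 0.0, 3.0],
    [3.0, 0.0, 8.0],
    [4.0, 0.0, 1.0],
])

kept = select_by_variance(X)
X_selected = X[:, kept]  # the kept columns are original features, unchanged
print(kept, X_selected.shape)
```

Note that the output columns are a subset of the input columns; no new features are created.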
Feature extraction maps data from a high-dimensional space to a lower-dimensional one. It creates a new, smaller set of features that captures most of the useful information in the original ones.
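Principal Component Analysis (PCA) is a common example of feature extraction. The following is a minimal sketch using NumPy's singular value decomposition, not a full implementation; the function name pca and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int):
    """Project X onto its top principal components.

    Returns the projected data and the fraction of total variance
    explained by each kept component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return X_centered @ components.T, ratio


# Synthetic data: the third feature is almost a combination of the first two.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=200)])

Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio)
```

Each column of Z is a new feature built as a linear combination of the original ones, which is what distinguishes extraction from selection.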
The main difference between the two is that feature extraction creates new features, whereas feature selection keeps a subset of the original ones.
Methods of Dimensionality Reduction
Several methods are available for dimensionality reduction, including:
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Generalized Discriminant Analysis (GDA)
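Unlike PCA, LDA uses class labels to choose its projection. The sketch below covers only the two-class case (Fisher's linear discriminant) with NumPy and assumes the within-class scatter matrix is invertible; the function name fisher_lda_direction and the synthetic data are hypothetical.

```python
import numpy as np


def fisher_lda_direction(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Unit vector that best separates two classes (labels 0 and 1)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: scatter of each class around its own mean, summed.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Fisher's solution is proportional to Sw^{-1} (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


# Two synthetic 2-D classes with different means.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

w = fisher_lda_direction(X, y)
z = X @ w  # each sample reduced from two features to one
print(z.shape)
```

Here two features are reduced to a single one along which the classes remain well separated.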
Advantages of Dimensionality Reduction
- It helps compress the data and reduces the storage space required.
- It can remove noise, which can improve model performance.
- Computation takes less time because there are fewer dimensions.