Principal Component Analysis
Principal Component Analysis is usually abbreviated as PCA. It is an unsupervised learning technique: it does not use target labels and looks only at the variation in the data in order to reduce its dimensions. In practice, datasets often have so many dimensions that they need to be reduced to avoid overfitting and other problems when the model makes predictions. As the number of dimensions increases, the data becomes harder to visualize and calculations on it become more expensive. Hence, we reduce the dimensions of the data using dimensionality reduction techniques.
Principal component analysis is one of the most widely used dimensionality reduction techniques. The main idea behind PCA is to reduce the dimensionality of a dataset whose variables may be correlated with each other, either heavily or lightly, while retaining as much of the variation present in the dataset as possible.
This is done by transforming the variables into a new set of variables, termed principal components. The principal components are orthogonal and ordered so that the amount of variation retained from the original variables decreases as we move down the order.
In this manner, the first principal component holds the maximum variation that was present in the original variables. The principal components are defined by the eigenvectors of the covariance matrix; because the covariance matrix is symmetric, these eigenvectors, and hence the principal components, are orthogonal.
PCA discovers a new set of dimensions that are all orthogonal to each other, which means they are linearly independent, and that are ranked by the variance of the data along them. The axis along which the data has the most variance, or spread, comes first.
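The relationship between the covariance matrix, its eigenvectors and the variance along each new axis can be written compactly. The notation here is an assumption made for illustration: A is the mean-centered data matrix with n rows (samples) and p columns (variables), X is its covariance matrix, v_i are the eigenvectors (chosen with unit length), lambda_i the eigenvalues, and z_i the data expressed along the i-th principal component.

```latex
X = \frac{1}{n-1} A^{\top} A, \qquad
X v_i = \lambda_i v_i, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

v_i^{\top} v_j =
\begin{cases}
1 & \text{if } i = j \\
0 & \text{if } i \ne j
\end{cases}
\qquad
z_i = A v_i, \qquad
\operatorname{Var}(z_i) = \lambda_i
```

Read this way, ranking the dimensions by variance is the same as sorting the eigenvectors by their eigenvalues.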
Working of Principal Component Analysis
Compute the covariance matrix X of the mean-centered data points.
Compute the eigenvectors of X and their corresponding eigenvalues.
Sort the eigenvectors by their eigenvalues in decreasing order.
Select the first k eigenvectors; these become the new k dimensions.
Transform the original n-dimensional data points into k-dimensional data by projecting them onto these k eigenvectors.
These are the steps followed in principal component analysis for dimensionality reduction.
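The steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not a reference implementation: the function name `pca`, the toy data and the choice of `np.linalg.eigh` are assumptions made for this example, and the data is mean-centered before the covariance is computed.

```python
import numpy as np

def pca(data, k):
    """Reduce `data` (n_samples x n_features) to k dimensions."""
    # Center the data so that variation is measured about the mean.
    mean = data.mean(axis=0)
    centered = data - mean

    # Step 1: covariance matrix of the features (n_features x n_features).
    cov = np.cov(centered, rowvar=False)

    # Step 2: eigenvalues and eigenvectors; eigh is intended for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Step 3: sort eigenvectors by eigenvalue in decreasing order.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Step 4: keep the first k eigenvectors as the new axes.
    components = eigenvectors[:, :k]

    # Step 5: project the centered data onto the k new axes.
    projected = centered @ components
    return projected, components, eigenvalues

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 3 features, two of them strongly correlated.
x = rng.normal(size=200)
toy = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

projected, components, eigenvalues = pca(toy, k=2)
print(projected.shape)  # (200, 2)
```

Here the 3-dimensional toy data is reduced to 2 dimensions. Because two of the three features are almost copies of each other, most of the variance should end up in the first two eigenvalues.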
Theoretically, a principal component can be defined as a linear combination of optimally weighted observed variables. The output of principal component analysis is a set of principal components, and the number of principal components is less than or equal to the number of original variables.
Properties of Principal Components
- The principal components must be linear combinations of the original variables; the weight vector in each combination is an eigenvector, which satisfies the principle of least squares (it minimizes the squared distance between the data points and their projections).
- Principal components are orthogonal, which means they are linearly independent and uncorrelated with one another.
- The variation or spread of the data in the principal components decreases as we move from the first principal component to the last.
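These properties can be checked numerically. The following sketch assumes NumPy and uses made-up random data; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 4 correlated features built by mixing independent noise.
raw = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
centered = raw - raw.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each column of `scores` is one principal component: a linear combination
# of the original variables, weighted by one eigenvector.
scores = centered @ eigenvectors

# Orthogonality: the eigenvectors form an orthonormal set.
is_orthonormal = np.allclose(eigenvectors.T @ eigenvectors, np.eye(4))

# Uncorrelated components: the covariance matrix of the scores is diagonal,
# with the eigenvalues on the diagonal.
score_cov = np.cov(scores, rowvar=False)
is_uncorrelated = np.allclose(score_cov, np.diag(eigenvalues))

# Variance does not increase from the first component to the last
# (a small tolerance absorbs floating-point noise).
variances = scores.var(axis=0, ddof=1)
is_decreasing = bool(np.all(np.diff(variances) <= 1e-9))

print(is_orthonormal, is_uncorrelated, is_decreasing)
```

All three checks are expected to hold for any dataset, because they follow from the covariance matrix being symmetric.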
Advantages of Principal Component Analysis
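- It removes correlation between features, since the principal components are uncorrelated.
- It can improve the performance of a learning algorithm, for example by speeding up training on fewer dimensions.
- It can also reduce overfitting.
- Visualization of the data becomes easier, since high-dimensional data can be plotted in two or three dimensions.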
Disadvantages of Principal Component Analysis
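- The independent variables become less interpretable, because each principal component is a mixture of the original variables.
- The data must be standardized before applying PCA; otherwise, variables with larger scales dominate the principal components.
- Some information is lost, since the components that are dropped still carry part of the variance.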
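The last two points can be made concrete with a small experiment. This is a sketch under stated assumptions: NumPy is used, the height and weight data is invented for illustration, and `explained_variance_ratio` is a hypothetical helper defined here, not a library function.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: two correlated features on very different scales.
height_m = rng.normal(1.7, 0.1, size=300)
weight_g = 40000 * height_m + rng.normal(0, 5000, size=300)
data = np.column_stack([height_m, weight_g])

def explained_variance_ratio(matrix):
    """Share of total variance carried by each principal component."""
    centered = matrix - matrix.mean(axis=0)
    # eigvalsh returns eigenvalues in ascending order, so reverse them.
    eigenvalues = np.linalg.eigvalsh(np.cov(centered, rowvar=False))[::-1]
    return eigenvalues / eigenvalues.sum()

# Without standardization, the large-scale feature dominates the first component.
raw_ratio = explained_variance_ratio(data)

# Standardize: zero mean and unit variance for every feature.
standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
std_ratio = explained_variance_ratio(standardized)

# Information loss when only the first component is kept: the share of
# variance carried by the discarded component.
loss = 1.0 - std_ratio[0]
print(raw_ratio, std_ratio, loss)
```

On the raw data, nearly all of the variance is attributed to the first component simply because weight is measured in grams. After standardization the split reflects the actual correlation between the two features, and `loss` shows how much variance is given up by keeping a single component.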