Feature scaling is a technique for transforming the values of the features in a dataset onto a common, standard scale. It is performed during data pre-processing, before a model is trained, to handle features whose magnitudes or units vary widely.
Many machine learning algorithms treat a numerically larger value as more significant and a numerically smaller value as less significant, without taking units into consideration.
To take a simple example, suppose the feature distance contains the values 5 kilometers and 500 meters. We know that 5 kilometers is greater than 500 meters, but an algorithm that sees only the numbers 5 and 500 treats 500 as the greater value. This creates a problem and hurts the performance of the model. To avoid it, we perform feature scaling, which transforms all the values onto a standard scale.
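The effect of units can be seen in a small sketch. The samples below are hypothetical (a distance and a number of stops), and the helper names `euclidean` and `nearest` are made up for illustration. The same data is expressed once in meters and once in kilometers, and the nearest neighbor of the query changes even though nothing physical has changed.

```python
import math

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def nearest(query, candidates):
    """Return the name of the candidate closest to the query."""
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))

# Hypothetical samples: (distance, number_of_stops), distance in meters.
query_m = (1000.0, 2.0)
candidates_m = {"p1": (1300.0, 2.0), "p2": (1010.0, 9.0)}

# The same samples with the distance expressed in kilometers.
query_km = (1.0, 2.0)
candidates_km = {"p1": (1.3, 2.0), "p2": (1.01, 9.0)}

nearest_in_m = nearest(query_m, candidates_m)     # distance column dominates
nearest_in_km = nearest(query_km, candidates_km)  # stops column dominates

print(nearest_in_m, nearest_in_km)  # prints: p2 p1
```

The answer depends only on which column happens to hold the larger numbers, which is the problem feature scaling is meant to remove.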
Methods of Feature Scaling
Feature scaling can be done with several methods:
- Standard Scaler
- Min-Max Scaler
- Robust Scaler
- Normalizer
Let’s study each of them in more detail.
Strictly speaking, to scale means to change the range of the values without changing the shape of the distribution. The range is often set from 0 to 1.
Standard Scaler standardizes a feature by subtracting the mean from each value and then dividing by the standard deviation, which scales the feature to unit variance.
Standard Scaler produces a distribution with a standard deviation equal to 1. The variance is also equal to 1, because variance is the square of the standard deviation.
After Standard Scaler is applied, the mean of the distribution is 0. If the feature is approximately normally distributed, about 68% of the values will lie between -1 and 1.
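As a minimal sketch of this calculation, the function below standardizes a single feature using only the Python standard library. The name `standard_scale` and the sample heights are made up for illustration, and the use of the population standard deviation (`statistics.pstdev`) is an assumption of this sketch.

```python
import statistics

def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population std dev, non-zero
    return [(v - mean) / std for v in values]

# Hypothetical feature: heights in centimeters.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
standardized = standard_scale(heights_cm)

print(standardized)
print(statistics.mean(standardized))    # approximately 0
print(statistics.pstdev(standardized))  # approximately 1
```

A constant feature has a standard deviation of 0, so this sketch would raise a division error on it; real code needs to handle that case.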
In Min-Max Scaler, the minimum value of the feature is subtracted from every value, and the result is divided by the range of the feature. The range of the feature is the difference between the original maximum and the original minimum.
Min-Max Scaler preserves the shape of the original distribution. It also does not reduce the influence of outliers.
The default range of values generated by Min-Max Scaler is 0 to 1.
The relative distance between the values in the feature is maintained.
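The min-max calculation can be sketched in a few lines. The function name `min_max_scale`, its `new_min` and `new_max` parameters, and the sample ages are hypothetical and chosen only for illustration.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the minimum maps to new_min and the maximum to new_max."""
    lo, hi = min(values), max(values)
    span = hi - lo  # range of the feature; assumed to be non-zero
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

# Hypothetical feature: ages in years.
ages = [20.0, 30.0, 40.0, 60.0]
ages_scaled = min_max_scale(ages)

print(ages_scaled)  # prints: [0.0, 0.25, 0.5, 1.0]
```

The gaps between the scaled values (0.25, 0.25, 0.5) keep the same proportions as the original gaps (10, 10, 20), which is what is meant by the relative distances being maintained.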
In Robust Scaler, the median of the feature is subtracted from each value, and the result is divided by the interquartile range. The interquartile range is the 75th percentile value minus the 25th percentile value.
Robust Scaler does not scale the data into a predetermined interval the way Min-Max Scaler does (0 to 1). It therefore does not meet the strict definition of scaling mentioned earlier.
The range of each feature after Robust Scaler is typically larger than after Min-Max Scaler. Robust Scaler is used when we want to reduce the effect of outliers, because the median and the interquartile range are far less sensitive to extreme values than the mean, the minimum, or the maximum.
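A rough sketch of the robust calculation follows, using `statistics.quantiles` from the Python standard library (Python 3.8 or later). The function name `robust_scale`, the sample data, and the choice of the "inclusive" quantile method are assumptions of this sketch; other percentile conventions can give slightly different quartiles.

```python
import statistics

def robust_scale(values):
    """Subtract the median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # assumed to be non-zero
    return [(v - median) / iqr for v in values]

# Hypothetical feature with one extreme outlier (100.0).
readings = [1.0, 2.0, 3.0, 4.0, 100.0]
readings_robust = robust_scale(readings)

print(readings_robust)  # prints: [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier does not change the median (3.0) or the interquartile range (2.0), so the four ordinary values keep a sensible spread. The outlier itself is still far away at 48.5, which also illustrates that the output is not confined to a fixed interval.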
In Normalizer, L2 normalization is applied to each observation. It works on the rows, not the columns, so that each row has a unit norm. Unit norm with L2 normalization means that if each element in the row is squared and the squares are summed, the total equals 1. L1 normalization can be used instead of L2 normalization, in which case the absolute values in each row sum to 1. Every value after Normalizer lies between -1 and 1.
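The row-wise behaviour can be sketched as follows. The function names `l2_normalize` and `l1_normalize` and the sample rows are hypothetical; each function receives one observation (one row) at a time and assumes the row is not all zeros.

```python
import math

def l2_normalize(row):
    """Divide a row by its Euclidean length so its squared values sum to 1."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]

def l1_normalize(row):
    """Divide a row by the sum of its absolute values so they sum to 1."""
    norm = sum(abs(x) for x in row)
    return [x / norm for x in row]

# Hypothetical observations: each inner list is one sample (row).
rows = [[3.0, 4.0], [1.0, -1.0]]
l2_rows = [l2_normalize(r) for r in rows]
l1_rows = [l1_normalize(r) for r in rows]

print(l2_rows[0])  # prints: [0.6, 0.8]
print(l1_rows[1])  # prints: [0.5, -0.5]
```

Because each row is rescaled on its own, two samples that point in the same direction end up identical after normalization regardless of their original magnitude.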
Comparison of All Feature Scaling methods
- Generally, Min-Max Scaler is used as the default if you are transforming a feature.
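- Robust Scaler is used if your data has outliers and you want to reduce their influence on the scaled feature.
- Standard Scaler is used if you need each feature to have a mean of 0 and unit variance. It works best when the feature is already roughly normally distributed, since it does not change the shape of the distribution.
- Use Normalizer carefully: it normalizes sample rows, not feature columns. It can use either L2 or L1 normalization.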