LASSO stands for Least Absolute Shrinkage and Selection Operator. Lasso regression is almost identical to ridge regression; the only difference is that the penalty uses the absolute values of the weights instead of their squares.
Lasso regression performs L1 regularization. It is a regression analysis method that performs both variable selection and regularization in order to improve prediction accuracy. It helps prevent overfitting of the data, which is a common problem in data analysis.
Thus, lasso regression optimizes the following:
Lasso objective = RSS + α * (sum of absolute values of coefficients)
Here, α works similarly to the ridge penalty and provides a trade-off between the RSS and the magnitude of the coefficients.
α = 0: same coefficients as simple linear regression.
α = ∞: all coefficients are zero.
0 < α < ∞: coefficients fall between zero and those of simple linear regression.
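The effect of α can be sketched with a few lines of code. This is a minimal illustration (assuming scikit-learn and NumPy are available; the synthetic data and the specific α values are arbitrary choices for demonstration): as α grows, the lasso coefficients shrink toward zero, matching the three cases above.

```python
# Sketch: how the lasso coefficients shrink as alpha grows.
# Data and alpha values are illustrative, not from the article.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coefs = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ true_coefs + rng.normal(scale=0.5, size=100)

# alpha = 0 corresponds to plain least squares (simple linear regression).
ols = LinearRegression().fit(X, y)
print("alpha=0 (OLS):", np.round(ols.coef_, 2))

# Increasing alpha shrinks the coefficients; a large enough alpha
# drives them all to exactly zero.
for alpha in [0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}:", np.round(model.coef_, 2))
```

Running the sketch shows the sum of the absolute coefficient values decreasing monotonically toward zero as α increases.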
Why Lasso Regression?
When a model is too flexible for the available data, it suffers from overfitting. Overfitting reduces the accuracy of our machine learning model on new data: the model fits the training data too closely, including its noise, and fails to generalize.
Have you ever tried to fit into over-sized clothes?
A person wearing an extra-large dress is like an over-flexible model: it has far more capacity than it needs. The same problem occurs in a dataset if you keep increasing the number of features just to decrease the cost function.
Overfitting happens in linear models when there are many features relative to the amount of data. If we cannot get rid of this problem, it affects model performance. Here, lasso regression comes into the picture: it reduces overfitting by penalizing the magnitude of the coefficients.
L1 regularization adds a penalty equal to the sum of the absolute values of the coefficients:
Minimization objective = LS Obj + α * (sum of absolute value of coefficients)
Here ‘LS Obj’ refers to ‘least squares objective’, i.e. the linear regression objective without regularization.
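The objective above can be written out directly. This is a minimal sketch (the function name, data, and coefficient values are illustrative, not from the article) computing LS Obj + α * (sum of absolute values of coefficients) with NumPy:

```python
# Sketch of the lasso minimization objective from the text:
# least squares objective plus the L1 penalty on the coefficients.
import numpy as np

def lasso_objective(X, y, beta, alpha):
    """RSS plus alpha times the L1 norm of the coefficients."""
    residuals = y - X @ beta
    ls_obj = np.sum(residuals ** 2)            # 'LS Obj': residual sum of squares
    l1_penalty = alpha * np.sum(np.abs(beta))  # L1 penalty term
    return ls_obj + l1_penalty

# Tiny worked example (illustrative values):
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, -0.25])
print(lasso_objective(X, y, beta, alpha=1.0))  # RSS = 7.25, penalty = 0.75 → 8.0
```

Note that with alpha=0 the function reduces to the plain least squares objective, matching the α = 0 case above.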
Lasso regression can also be written in constraint form:
In this form, lasso requires the sum of the absolute values (moduli) of the coefficients to be less than or equal to a constant s. Such an s exists for every value of the shrinkage factor λ. These equations are also referred to as constraint functions.
For lasso with two coefficients, the constraint becomes |β1| + |β2| ≤ s. The lasso estimates are the coefficients with the smallest RSS (loss function) among all points that lie within the diamond given by |β1| + |β2| ≤ s. Because this diamond has corners on the axes, the solution often lands exactly on an axis, setting some coefficients to exactly zero; this is how lasso performs variable selection.
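The corner effect of the diamond can be made concrete. In the special case of an orthonormal design, the lasso solution is known to be the soft-thresholded least squares estimate; the sketch below (function name and coefficient values are illustrative) shows how coefficients whose magnitude falls below the threshold λ become exactly zero:

```python
# Soft-thresholding: for an orthonormal design, the lasso solution is
# beta_j = sign(b_j) * max(|b_j| - lam, 0), where b_j is the OLS estimate.
import numpy as np

def soft_threshold(b, lam):
    """Shrink each coefficient toward zero; values in [-lam, lam] become exactly 0."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# Illustrative OLS coefficients: the two small ones are zeroed out,
# the two large ones are shrunk by lam — this is variable selection.
ols_coefs = np.array([2.5, -0.3, 0.8, -1.7])
print(soft_threshold(ols_coefs, lam=1.0))
```

This is why lasso produces sparse models while ridge (whose constraint region is a circle with no corners) only shrinks coefficients without zeroing them.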