
RIDGE REGRESSION

 

Ridge regression is a specialized technique used to analyze multiple regression data that suffer from multicollinearity. The term multicollinearity refers to collinearity among the predictor variables in a multiple regression model; it occurs when two or more predictor variables are highly correlated with each other.

 

Ridge regression is also called L2 regularization. It prevents overfitting of the data, which is a major problem in building regression models.
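To make this concrete, here is a minimal sketch, assuming scikit-learn and NumPy are available (the synthetic data and variable names below are invented for illustration). It builds two nearly identical predictors and compares ordinary least squares with ridge: with collinear inputs the OLS coefficients tend to blow up and become unstable, while the L2 penalty keeps the ridge coefficients small and stable.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Two almost perfectly collinear predictors (synthetic data).
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)    # x2 is x1 plus tiny noise
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)  # the true signal uses only x1

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)          # alpha is the L2 penalty strength

print("OLS coefficients:  ", ols.coef_)     # typically large and opposite-signed
print("Ridge coefficients:", ridge.coef_)   # shrunk toward a stable, even split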

Why Ridge Regression?

Ridge regression is an extension of linear regression. The basic idea of a linear regression model revolves around minimizing the value of its cost function: the lower the cost function value, the better the regression model. The cost function (also called the loss function) measures how far the model's predictions fall from the observed data, and fitting the model means finding the coefficients that minimize it.

 

By increasing the number of features, we can decrease the cost function. But if we keep adding features, the model starts fitting the training data set too closely, which leads to overfitting and hurts the model's performance on unseen data. To overcome this problem, we use ridge regression.

  • It performs L2 regularization, i.e. it adds a penalty equivalent to the square of the magnitude of the coefficients
  • Minimization objective = LS Obj + α * (sum of squares of the coefficients), as sketched in the code below
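As a minimal sketch of this objective, assuming NumPy (the function name ridge_objective and its arguments are invented for the example), the penalized cost can be written directly:

import numpy as np

def ridge_objective(X, y, beta, alpha):
    # Least-squares term: the residual sum of squares (the "LS Obj" above).
    residuals = y - X @ beta
    ls_obj = np.sum(residuals ** 2)
    # L2 penalty: alpha times the sum of squared coefficients.
    penalty = alpha * np.sum(beta ** 2)
    return ls_obj + penalty

Increasing alpha makes large coefficients more expensive, so the minimizer trades a slightly worse fit for smaller, more stable coefficients.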

 

Have you ever tried to fit into under-sized clothes?

A normal person trying to fit into an extra-small dress is a picture of the overfitting problem. The same thing happens to a model when you keep increasing the number of features just to decrease the cost function.

[Figure: Ridge Regression 2 (i2tutorials)]

Overfitting happens in linear models when dealing with many features. If we cannot get rid of this problem, some features become more destructive than helpful: information repeated by other features adds noise to the dataset. Here, ridge regression comes into the picture. It reduces overfitting by adding a penalty on the coefficients to the model's cost function.

 

[Figure: Ridge Regression 3 (i2tutorials)]

 

To fix the problem of overfitting, we need to balance two things:
1. How well the function/model fits the data.
2. The magnitude of the coefficients.

 

The metric minimized in ridge regression is

Σi=1..n (yi − β0 − Σj=1..p βj xij)² + λ Σj=1..p βj²

 

where the residual sum of squares is modified by adding the shrinkage penalty. The coefficients are now estimated by minimizing this function. Here, λ is the tuning parameter that decides how heavily the flexibility of the model is penalized. An increase in the model's flexibility shows up as an increase in its coefficients, so to minimize the function above, those coefficients need to stay small. This is how the ridge regression technique keeps the coefficients from growing too large. The intercept β0, a measure of the mean value of the response when xi1 = xi2 = … = xip = 0, is not shrunk.

 

When λ = 0, the penalty term becomes zero and has no effect, so the estimates produced by ridge regression equal the least squares estimates. However, as λ → ∞, the impact of the shrinkage penalty grows, and the ridge regression coefficient estimates approach zero. The selection of the value of λ is therefore critical. The penalty used by this method is based on the L2 norm of the coefficients, which is why the technique is also known as L2 regularization.
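The following sketch, assuming scikit-learn and NumPy (the data and the grid of penalty values are invented for illustration; note that scikit-learn names the tuning parameter alpha rather than λ), shows the coefficients shrinking toward zero as the penalty grows:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=50)

# At alpha = 0 the fit matches ordinary least squares;
# as alpha grows, every coefficient is pulled toward zero.
for alpha in [0.0, 1.0, 10.0, 1000.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    print("alpha =", alpha, "->", np.round(coef, 3))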

 

For performing ridge regression in a single step, the coefficients can be computed with the standard closed-form solution obtained by minimizing the penalized objective above:

β̂ = (XᵀX + λI)⁻¹ Xᵀy

where X is the matrix of predictor values, y is the vector of responses, and I is the identity matrix.
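As a cross-check, here is a hedged sketch of that closed form, assuming NumPy and scikit-learn (the data is synthetic; fit_intercept=False makes scikit-learn solve the same no-intercept problem as the formula):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.2, size=40)

lam = 1.0  # the tuning parameter lambda

# Closed form: beta_hat = (X^T X + lambda * I)^(-1) X^T y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# scikit-learn minimizes the same penalized least-squares objective.
sk = Ridge(alpha=lam, fit_intercept=False).fit(X, y)

print("closed form: ", np.round(beta_hat, 4))
print("scikit-learn:", np.round(sk.coef_, 4))  # should agree closely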