
Simple Linear Regression

 

Simple linear regression is a well-known statistical method for obtaining a formula that predicts the values of one variable from another variable when there is a linear relationship between the two.

 

Linear regression models are used to predict the value of one variable from another, or to describe the relationship between the two.

 

Simple linear regression takes the form of the equation for a straight line, which is most commonly written as

y = mx + c (or) y = a + bx

 

In statistics and machine learning, the same equation is usually written using betas:

y = b0 + b1x

  • y is the dependent variable, or response.
  • x is the independent variable, or predictor.
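As a minimal sketch, the prediction form of this equation is a one-line function (the function name `predict` is an illustrative choice, not from the original):

```python
def predict(x, b0, b1):
    """Predict y from x using intercept b0 and slope b1: y = b0 + b1*x."""
    return b0 + b1 * x

print(predict(3.0, 2.0, 0.5))  # 2.0 + 0.5 * 3.0 = 3.5
```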

Simple linear regression 1 (i2tutorials)

Simple linear regression 2 (i2tutorials)

Independent and Dependent variables:

In the concept of Statistical learning, there are two types of variables:

  • Independent variables: Data which can be controlled directly. It does not depend on other variables.
  • Dependent variables: Data which cannot be controlled directly. It depends on another variable. These variables need to be predicted or estimated.

 

The simple linear regression model is represented by the following equation:

                                                  y = β0 + β1x + ε

Simple linear regression 3 (i2tutorials)

The linear regression model includes an error term, represented by ε. This term accounts for the variability in y that cannot be explained by the linear relationship between x and y. If the error term ε were absent, knowing x would be enough to determine the value of y exactly.

 

The simple linear regression equation is represented as a straight line, where:

  1. β0 is the y-intercept of the regression line.
  2. β1 is the slope of the regression line.
  3. E(y) is the expected value of y for a given value of x.

 

Precisely, B0 is called the intercept because it determines where the line intercepts the y-axis. In machine learning it is also called the bias, because it is added as an offset to every prediction we make. The term B1 is called the slope because it defines the slope of the regression line. We have to find the best estimates of these coefficients to minimize the error in predicting y from x.

 

Simple linear regression is useful because, instead of searching for coefficient values by trial and error or computing them analytically with advanced linear algebra, we can estimate them directly from our data and then use them to predict future values.

 

We can estimate the value of B1 with the following formula:

B1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)

 

Here mean() is the average value of the variable in the dataset, and xi and yi refer to the i'th values of x and y; the calculation is repeated across all values in the dataset.

 

We can then compute B0 from B1 and the sample means of our dataset, using the following formula:

B0 = mean(y) – B1 * mean(x)
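The two formulas above can be sketched directly in code. This is a minimal illustration (the function names are my own, not from the original):

```python
def mean(values):
    """Average value of a list of numbers."""
    return sum(values) / len(values)

def estimate_coefficients(x, y):
    """Estimate slope B1 and intercept B0 using the formulas above:
    B1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)
    B0 = mean(y) - B1 * mean(x)
    """
    x_mean, y_mean = mean(x), mean(y)
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) \
        / sum((xi - x_mean) ** 2 for xi in x)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Toy data lying exactly on the line y = 2x, so we expect B0 = 0, B1 = 2.
b0, b1 = estimate_coefficients([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(b0, b1)  # 0.0 2.0
```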

Simple linear regression 4 (i2tutorials)

Estimating Error

We can compute an error for the predictions. This error is called the Root Mean Squared Error, or RMSE.

RMSE = sqrt(sum((pi - yi)^2) / n)

 

Here sqrt() is the square root function, pi is the predicted value, yi is the actual value, i is the index of a specific instance, and n is the number of predictions; these are needed because we calculate the error across all predicted values.

 

A regression line can exhibit three types of relationships: a positive linear relationship, a negative linear relationship, or no relationship.

  1. No relationship: The graphed line in a simple linear regression is flat, that is, its slope is equal to 0. This indicates that there is no relationship between the two variables.
  2. Positive relationship: The regression line slopes upward, with the lower end of the line at the y-intercept and the line rising into the graph field, away from the x-axis. This indicates a positive linear relationship between the two variables: as the value of one variable increases, the value of the other also increases.
  3. Negative relationship: The regression line slopes downward, with the upper end of the line at the y-intercept and the line falling toward the x-axis. This indicates a negative linear relationship between the two variables: as the value of one variable increases, the value of the other decreases.
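The sign of the fitted slope B1 is what distinguishes these three cases. As a small sketch (the function name and tolerance are illustrative assumptions):

```python
def relationship(b1, tol=1e-9):
    """Classify the linear relationship from the slope of the fitted line."""
    if abs(b1) < tol:
        return "no relationship"
    return "positive" if b1 > 0 else "negative"

print(relationship(2.0))   # positive
print(relationship(-0.5))  # negative
print(relationship(0.0))   # no relationship
```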

 

Simple linear regression 5 (i2tutorials)

Limitations of Simple Linear Regression

A relationship between two variables does not mean that one variable causes the other variable to happen.

 

Even when a simple linear regression line fits the data points perfectly, it does not guarantee a cause-and-effect relationship, as explained above.

 

This regression helps us to know whether a relationship exists between two variables, but it cannot by itself explain the true nature of the relationship across the entire dataset.