
Logistic Regression

 

Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. It transforms its output with the logistic sigmoid function to return a probability value, which is then mapped onto two or more discrete classes.

 

Logistic regression is used to find the probability of an event being a success or a failure. It is employed when the dependent variable is binary in nature, and the predicted value of Y always lies between 0 and 1.

Logistic regression does not require a linear relationship between the dependent and independent variables. It can handle various kinds of relationships because it applies a non-linear log transformation to the predicted odds ratio.
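To make this log transformation concrete, the model can be written in terms of the log odds (this is the standard textbook form, not anything specific to this article):

    p = 1 / (1 + e^-(b0 + b1·x1 + … + bn·xn))
    log( p / (1 - p) ) = b0 + b1·x1 + … + bn·xn

The second line (the logit, or log odds) is linear in the coefficients, even though the probability p itself is not a linear function of the inputs.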

 

Logistic regression is used when the dependent variable (target) is categorical.

For example,

  • True or False
  • Yes or No
  • Spam or Not Spam
  • Pass or Fail

 

 

 

The logistic function, also termed the sigmoid function, was developed by statisticians. It is an S-shaped curve that can take any real-valued number and transform it into a value between 0 and 1, but never exactly at those limits.

 

It takes a real value as input and gives an output that lies between 0 and 1. It is non-linear, continuously differentiable and has a fixed output range. Unlike the step function, its output is continuous and its gradient is smooth. At the ends of the sigmoid curve, the Y value changes very little with changes in X; this gives rise to the vanishing gradient problem, which reduces the network's ability to learn further or makes it too slow to make correct predictions. The sigmoid function is also not zero-centered.
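A minimal sketch of these properties in Python, assuming NumPy is available (the helper name sigmoid is our own choice, not taken from the article):

    import numpy as np

    def sigmoid(x):
        # Logistic (sigmoid) function: maps any real number into the open interval (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
    print(sigmoid(xs))   # outputs approach 0 and 1 at the extremes but never reach them

    # The gradient of the sigmoid is sigmoid(x) * (1 - sigmoid(x)).
    # It is largest at x = 0 and shrinks toward zero at the extremes,
    # which is where the vanishing-gradient behaviour described above comes from.
    print(sigmoid(xs) * (1.0 - sigmoid(xs)))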


 

Types of Logistic Regression

 

1. Binary Logistic Regression

In this type, the categorical response has only two possible outcomes. Example: Spam or Not Spam, Pass or Fail.

 


 

2. Multinomial Logistic Regression

The response may have three or more categories without any particular order or sequence. Example: predicting which holiday spot is preferred (Paris, Maldives, New York).

 


 

3. Ordinal Logistic Regression

The response has three or more categories with a specific order or sequence. Example: a movie rating from 1 to 5.

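A rough scikit-learn sketch of the binary and multinomial cases; the toy data and category encodings below are invented purely for illustration, and scikit-learn is an assumed choice of library rather than one named by the article:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Binary case: two possible outcomes (0 = not spam, 1 = spam).
    X_bin = np.array([[0.1], [0.3], [0.4], [0.7], [0.8], [0.9]])
    y_bin = np.array([0, 0, 0, 1, 1, 1])
    binary_model = LogisticRegression().fit(X_bin, y_bin)
    print(binary_model.predict_proba([[0.6]]))   # [P(not spam), P(spam)]

    # Multinomial case: three unordered categories (0 = Paris, 1 = Maldives, 2 = New York).
    X_multi = np.array([[1, 0], [2, 1], [0, 2], [1, 3], [4, 1], [5, 0]])
    y_multi = np.array([0, 0, 1, 1, 2, 2])
    multi_model = LogisticRegression().fit(X_multi, y_multi)
    print(multi_model.predict([[2, 2]]))         # predicted category label

Ordinal logistic regression is not shown here; scikit-learn does not provide it out of the box, and it is usually handled with a proportional-odds model from a separate package.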

 

Decision Boundary

To predict which class an observation belongs to, a threshold can be set on the estimated probability. Based on this threshold, the observation is assigned to one of the classes.

 

For example, if the predicted probability ≥ 0.5, classify the email as spam; otherwise classify it as not spam.

 

The decision boundary can be linear or non-linear. To obtain a more complex decision boundary, the polynomial order of the features can be increased.
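A small sketch of applying the 0.5 threshold and of using polynomial features to obtain a non-linear boundary; the circular toy data, the degree-2 choice and the scikit-learn pipeline are illustrative assumptions rather than anything prescribed above:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Toy 2-D data whose classes are separated by a roughly circular (non-linear) boundary.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

    # Raising the polynomial degree lets the model express a curved decision boundary.
    model = make_pipeline(PolynomialFeatures(degree=2), LogisticRegression())
    model.fit(X, y)

    # Applying the 0.5 threshold turns estimated probabilities into class labels.
    probs = model.predict_proba(X[:5])[:, 1]
    labels = (probs >= 0.5).astype(int)
    print(probs, labels)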


Advantages of Logistic Regression

1. Logistic regression performs well when the dataset is linearly separable.

2. Logistic regression is less susceptible to over-fitting, but it can overfit on high-dimensional datasets. Regularization techniques (L1 and L2) should be considered to avoid over-fitting in these scenarios (see the sketch after this list).

3. Logistic regression not only measures the relevance of a predictor (coefficient size), but also its direction of association (positive or negative).

4. Logistic regression is simple to implement and interpret, and very efficient to train.
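As a sketch of point 2, assuming scikit-learn: the L1 and L2 penalties are selected through the penalty parameter, and C is the inverse of the regularization strength (the data and values below are arbitrary):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # High-dimensional toy data: many features, only a few of them informative.
    X, y = make_classification(n_samples=100, n_features=50, n_informative=5, random_state=0)

    # L2 penalty (the default): shrinks all coefficients toward zero.
    l2_model = LogisticRegression(penalty="l2", C=1.0).fit(X, y)

    # L1 penalty: drives many coefficients exactly to zero; needs a compatible solver.
    l1_model = LogisticRegression(penalty="l1", C=1.0, solver="liblinear").fit(X, y)

    # Count the non-zero coefficients kept under each penalty.
    print((l2_model.coef_ != 0).sum(), (l1_model.coef_ != 0).sum())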

Disadvantages of Logistic Regression

1. The main limitation of logistic regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, data is rarely linearly separable; most of the time it is messy.

2. If the number of observations is smaller than the number of features, logistic regression should not be used, as it is likely to overfit.

3. Logistic regression can only be used to predict discrete outcomes, so its dependent variable is restricted to a discrete set of values. This restriction is problematic when the goal is to predict continuous data.