In machine learning, model accuracy measures how well a model has learned the relationships and patterns between variables in a dataset from its training data. A model that generalizes well to unseen data produces better predictions and insights, which is the goal for any dataset.
Here are some common metrics used to measure the accuracy of a machine learning model.
Classification Accuracy
When we use the term accuracy on its own, we generally mean Classification Accuracy. It is defined as the ratio of the number of correct predictions to the total number of input samples.
This metric works well only when there are roughly equal numbers of samples in each class. If one class greatly outnumbers the others, a model can achieve high accuracy simply by predicting the majority class for every sample. When the same model is then tested on a set where the classes are balanced, its classification accuracy can drop sharply.
The main problem arises when the cost of misclassifying the class with fewer samples is very high. When dealing with a rare disease, the cost of failing to diagnose a sick person is far higher than the cost of sending a healthy person for further tests.
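A minimal sketch of this pitfall, using made-up labels where 1 marks the rare "sick" class:

```python
# Sketch: classification accuracy can be misleading on imbalanced data.
# Labels are hypothetical: 1 = sick (rare), 0 = healthy.
y_true = [0] * 95 + [1] * 5   # 95 healthy, 5 sick
y_pred = [0] * 100            # a trivial model that always predicts "healthy"

# Accuracy = correct predictions / total samples
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95, despite the model missing every sick patient
```

The trivial model scores 95% accuracy while providing no diagnostic value at all, which is exactly the failure mode described above.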
Logarithmic Loss
Logarithmic loss, also called Log Loss, works by penalizing false classifications. It works well for multi-class classification.
When working with Log Loss, the classifier must assign a probability to each class for every sample. If there are N samples belonging to M classes, Log Loss is calculated as:

Log Loss = -(1/N) * Σi=1..N Σj=1..M yij * log(pij)

where
yij indicates whether sample i belongs to class j or not
pij indicates the probability of sample i belonging to class j
Log Loss has no upper bound; it lies in the range [0, ∞). A Log Loss near 0 indicates higher accuracy, while a Log Loss far from 0 indicates lower accuracy.
In general, minimizing Log Loss gives greater accuracy for the classifier.
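The formula above can be sketched directly in code. This is an illustrative implementation with hypothetical one-hot labels and predicted probabilities, not a reference one:

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Multi-class log loss: -(1/N) * sum over i, j of yij * log(pij).

    y_true: list of one-hot rows (yij); y_prob: list of probability rows (pij).
    """
    n = len(y_true)
    total = 0.0
    for y_row, p_row in zip(y_true, y_prob):
        for y, p in zip(y_row, p_row):
            # Clip probabilities so log(0) never occurs.
            p = min(max(p, eps), 1 - eps)
            total += y * math.log(p)
    return -total / n

# Two samples, three classes; confident, mostly-correct predictions
# give a loss near 0.
y_true = [[1, 0, 0], [0, 1, 0]]
y_prob = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
print(log_loss(y_true, y_prob))  # ≈ 0.164
```

Note the clipping of probabilities away from 0 and 1: without it, a single confidently wrong prediction with p = 0 would make the loss infinite.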
Confusion Matrix
As the name suggests, a confusion matrix is a matrix that describes the complete performance of the model. It involves four important terms:
True Positives: The cases in which predicted output is YES and the actual output was YES.
True Negatives: The cases in which predicted output is NO and the actual output was NO.
False Positives: The cases in which predicted output is YES and the actual output was NO.
False Negatives: The cases in which predicted output is NO and the actual output was YES.
Model accuracy can be computed from the confusion matrix as the sum of the main-diagonal values (True Positives + True Negatives) divided by the total number of cases.
The confusion matrix also forms the basis for other metrics used in measuring model accuracy.
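The four counts and the accuracy derived from them can be sketched as follows, using made-up binary labels:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 = YES and 0 = NO."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

# Hypothetical predictions for six samples.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)

# Accuracy = main-diagonal sum (TP + TN) over all cases.
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, tn, fp, fn, accuracy)  # 2 2 1 1 ≈ 0.667
```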
Area Under Curve
Area Under Curve (AUC) is one of the most widely used metrics for model evaluation. It is mostly applied to binary classification problems. The AUC of a classifier is equal to the probability that the classifier will rank a randomly selected positive example higher than a randomly selected negative example. To understand AUC in more detail, we need two basic terms:
- True Positive Rate (Sensitivity)
- False Positive Rate (1 − Specificity)
True Positive Rate (Sensitivity):
True Positive Rate covers the cases whose predicted outcome is YES and whose actual output is also YES. It is the ratio of positive data points that are correctly predicted as positive to all positive data points: TPR = TP / (TP + FN).
False Positive Rate (1 − Specificity):
False Positive Rate covers the cases whose predicted outcome is positive but whose actual output is negative. It is the proportion of negative data points that are mistakenly classified as positive, out of all negative data points: FPR = FP / (FP + TN).
Both False Positive Rate and True Positive Rate take values in the range [0, 1]. They are computed at a series of threshold values (e.g., 0.00, 0.02, 0.04, …, 1.00) and plotted against each other. AUC is the area under this curve of True Positive Rate versus False Positive Rate over the range [0, 1].
The greater the AUC, the better the performance of the model.
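The probabilistic definition above (a random positive ranks above a random negative) can be computed directly, without drawing the curve. A sketch with hypothetical scores:

```python
def auc(y_true, scores):
    """AUC via its probabilistic definition: the chance that a randomly chosen
    positive example gets a higher score than a randomly chosen negative one
    (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc(y_true, scores))  # 0.75: 3 of the 4 positive/negative pairs are ranked correctly
```

This pairwise count agrees with the area under the ROC curve built from thresholds, which is why the two views of AUC are interchangeable.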
F1 Score
F1 Score is a metric used to measure a test's accuracy. It is the harmonic mean of precision and recall, and it lies in the range [0, 1]. It tells us both how precise our classifier is (how many of its positive predictions are correct) and how robust it is (how few positive instances it misses).
High precision with low recall gives an extremely accurate classifier that nonetheless misses a large number of instances that are difficult to classify. The greater the F1 Score, the better the performance of our model. It is calculated as:

F1 = 2 * (Precision * Recall) / (Precision + Recall)
F1 Score tries to determine the balance between precision and recall.
Precision: It is the ratio of the number of correct positive results to the number of positive results predicted by the classifier.
Recall: It is the ratio of the number of correct positive results to the number of all relevant samples, i.e., all samples that should have been identified as positive.
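The three definitions combine into a few lines of code. The counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN),
    F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(p, r, f1)  # precision 0.8, recall ≈ 0.667, F1 ≈ 0.727
```

Because F1 is a harmonic mean, it sits closer to the smaller of the two values, so a classifier cannot score well by excelling at precision alone or recall alone.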
Mean Absolute Error
Mean Absolute Error (MAE) is the average of the absolute differences between the original values and the predicted values. It measures how far the predictions are from the actual output, but it gives no idea of the direction of the error. It is calculated as:

MAE = (1/N) * Σ |original value − predicted value|
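A one-function sketch of MAE on made-up regression values:

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average of |original value - predicted value|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and predictions.
print(mae([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```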
Mean Squared Error
Mean Squared Error (MSE) is quite similar to Mean Absolute Error; the only difference is that MSE takes the average of the square of the difference between the original values and the predicted values: MSE = (1/N) * Σ (original value − predicted value)². An advantage of MSE is that its gradient is easy to compute, whereas the absolute value in MAE is not differentiable at zero, which complicates gradient-based optimization. Because we square the error, larger errors are highlighted more than smaller ones, so the model can focus more on the larger errors.
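The contrast between the two metrics shows up clearly on a small hypothetical example where one prediction is badly off:

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute difference."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up values; the single large error (4.0 predicted as 7.0) dominates
# MSE far more than MAE, because squaring highlights larger errors.
y_true = [3.0, 5.0, 2.0, 4.0]
y_pred = [2.5, 5.0, 2.0, 7.0]
print(mae(y_true, y_pred))  # (0.5 + 0 + 0 + 3) / 4 = 0.875
print(mse(y_true, y_pred))  # (0.25 + 0 + 0 + 9) / 4 = 2.3125
```

The outlier contributes about 86% of the MAE but about 97% of the MSE, which is why a model trained on MSE concentrates on reducing its largest errors first.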