Adaptive Boosting, or AdaBoost for short, is an ensemble technique from the family of boosting methods. An AdaBoost classifier combines multiple weak classifiers into a single strong classifier by assigning a weight to each of them. A single weak classifier may not classify the objects correctly. But if we combine multiple weak classifiers, selecting the training set at every iteration and assigning each classifier the right amount of weight in the final vote, we can attain a good accuracy score for the overall classifier.
Put simply, AdaBoost retrains the algorithm iteratively, choosing the training set based on the accuracy of the previous round of training.
The weight of each trained classifier at any iteration depends on the accuracy it achieved.
The final equation for classification can be represented as
F(x) = sign(sum(θm * fm(x))), summed over m = 1 to M
Where fm is the m-th weak classifier
θm is its corresponding weight
F(x) is the weighted combination of the M weak classifiers.
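The weighted vote can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the three threshold classifiers and their weights are made up, the name weighted_vote is hypothetical, and treating a sum of exactly zero as +1 is an assumption.

```python
def weighted_vote(stumps, thetas, x):
    """Return +1 or -1 from the sign of the theta-weighted sum of weak predictions."""
    total = sum(theta * stump(x) for stump, theta in zip(stumps, thetas))
    # Assumption: a tie (total == 0) is resolved in favour of +1.
    return 1 if total >= 0 else -1


# Three hypothetical weak classifiers on a single number, each returning +1 or -1.
stumps = [
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x > 5 else -1,
    lambda x: 1 if x > 8 else -1,
]
thetas = [0.4, 1.1, 0.3]  # made-up weights, for illustration only

prediction = weighted_vote(stumps, thetas, 6)
```

For x = 6 the first two classifiers vote +1 and the third votes -1, so the weighted sum is positive and the combined prediction is +1. For x = 3 the second classifier, which carries the largest weight, pulls the result to -1.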
The best-suited and most common weak learners used with AdaBoost are decision trees with one level. Because these trees are so short and contain only one decision for classification, they are often called decision stumps.
Each occurrence in the training dataset is weighted. The initial weight is set to:
weight(xi) = 1/n
Where xi is the i-th training instance and n is the number of training instances.
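As a quick sketch of the initialisation, assuming the weights are kept in a plain Python list (the helper name initial_weights is hypothetical):

```python
def initial_weights(n):
    """Give each of the n training instances the same starting weight 1/n."""
    return [1.0 / n] * n


weights = initial_weights(5)  # five instances, each weighted 1/5
```

The weights start out uniform and sum to 1, so every instance matters equally to the first weak classifier.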
How to Train One Model
A weak classifier (a decision stump) is trained on the training data using the weighted samples. Only binary classification problems are supported, so each decision stump makes one decision on one input variable and outputs a value of +1 or -1 for the first or second class.
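One possible way to train such a stump is sketched below. The text does not say how the split is chosen, so the exhaustive search over features, thresholds and polarities is an assumption, and the names fit_stump and stump_predict are hypothetical. Labels are assumed to be +1 or -1.

```python
def fit_stump(X, y, w):
    """Search every (feature, threshold, polarity) and keep the lowest weighted error.

    Returns a tuple (error, feature_index, threshold, polarity).
    """
    best = None
    total = sum(w)
    for j in range(len(X[0])):
        for threshold in sorted(set(row[j] for row in X)):
            for polarity in (1, -1):
                # The stump predicts `polarity` above the threshold, `-polarity` otherwise.
                preds = [polarity if row[j] > threshold else -polarity for row in X]
                err = sum(wi for wi, yi, pi in zip(w, y, preds) if yi != pi) / total
                if best is None or err < best[0]:
                    best = (err, j, threshold, polarity)
    return best


def stump_predict(stump, row):
    """Apply a fitted stump to one instance and return +1 or -1."""
    _, j, threshold, polarity = stump
    return polarity if row[j] > threshold else -polarity


# Tiny made-up dataset: one input variable, two classes.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, -1, 1, 1]
w = [0.25, 0.25, 0.25, 0.25]
stump = fit_stump(X, y, w)
```

On this toy data the stump splits on the only feature at a threshold of 2.0 and classifies every instance correctly. Because the search minimises the weighted error, instances with larger weights have more influence on which split is chosen.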
The misclassification or error rate is calculated for the trained model. Generally, error is calculated as:
error = (N - correct) / N
Where error is the misclassification rate, correct is the number of training instances correctly predicted by the model and N is the total number of training instances. For example, if the model correctly predicts 78 of 100 training instances, the error is (100 - 78) / 100 = 0.22.
This is modified to use the weighting of the training instances:
error = sum(w(i) * terror(i)) / sum(w)
Where w(i) is the weight for training instance i and terror(i) is the prediction error for training instance i: 1 if the instance is misclassified and 0 if it is correctly classified.
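The weighted error translates almost directly into Python. The sketch below assumes plain lists for the weights, true labels and predictions; the function name weighted_error and the sample values are made up for illustration.

```python
def weighted_error(weights, y_true, y_pred):
    """Weighted misclassification rate: sum(w(i) * terror(i)) / sum(w)."""
    # terror(i) is 1 for a misclassified instance and 0 for a correct one.
    terror = [0 if y == p else 1 for y, p in zip(y_true, y_pred)]
    return sum(w * t for w, t in zip(weights, terror)) / sum(weights)


weights = [0.1, 0.2, 0.3, 0.4]
y_true = [1, -1, 1, -1]
y_pred = [1, 1, 1, -1]  # only the second instance is misclassified
error = weighted_error(weights, y_true, y_pred)
```

Only the second instance is wrong, so the error is roughly its share of the total weight, about 0.2, rather than the unweighted rate of 1/4.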
A stage value is computed for the trained model, which provides a weighting for any predictions that the model makes. It is calculated as:
stage = ln((1-error) / error)
Where stage is the stage value used to weight predictions from the model, ln() is the natural logarithm and error is the misclassification error for the model. The effect of the stage weight is that more accurate models have more weight, or contribution, in the final prediction.
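A small sketch shows how the stage value behaves. The function name stage_value is hypothetical, and clamping the error away from exactly 0 and 1 (the eps parameter) is an added assumption to avoid division by zero and log(0); it is not part of the formula above.

```python
import math


def stage_value(error, eps=1e-10):
    """stage = ln((1 - error) / error) for a weak classifier's weighted error."""
    # Assumption: clamp the error so a perfect or always-wrong model
    # does not cause a division by zero or a log of zero.
    error = min(max(error, eps), 1 - eps)
    return math.log((1 - error) / error)


accurate = stage_value(0.1)  # low error, large stage value
mediocre = stage_value(0.4)  # higher error, smaller stage value
chance = stage_value(0.5)    # no better than guessing
```

A model with an error of 0.1 gets a stage value of ln(9), one with an error of 0.4 gets the smaller ln(1.5), and a model at 0.5 gets 0, so it contributes nothing to the final vote.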
The training weights are then updated, giving more weight to incorrectly predicted instances and relatively less weight to correctly predicted instances.
For example, the weight of one training instance (w) is updated using
w = w * exp (stage * terror)
Where w is the weight for a specific training instance, exp() is the exponential function (Euler's number e raised to the given power), stage is the stage value of the weak classifier and terror is the error that the weak classifier made when predicting the output variable for the training instance, calculated as
terror = 0 if (y == p), otherwise 1
Where y is the output variable for the training instance and p is the prediction from the weak learner.
This has the effect of not changing the weight if the training instance was classified correctly and making the weight slightly larger if the weak learner incorrectly classified the instance.
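The update rule can be sketched as follows, again assuming plain Python lists. The function name update_weights and the sample values are hypothetical. The weights are not renormalised here, which is an assumption consistent with the weighted error formula dividing by sum(w).

```python
import math


def update_weights(weights, y_true, y_pred, stage):
    """Apply w = w * exp(stage * terror) to every training instance."""
    updated = []
    for w, y, p in zip(weights, y_true, y_pred):
        terror = 0 if y == p else 1  # 1 only when the weak learner was wrong
        updated.append(w * math.exp(stage * terror))
    return updated


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]           # the second instance is misclassified
stage = math.log((1 - 0.25) / 0.25)  # stage value for a weighted error of 0.25
new_weights = update_weights(weights, y_true, y_pred, stage)
```

The three correctly classified instances keep their weight of 0.25, while the misclassified one grows to about 0.75, so the next weak classifier is pushed to get that instance right.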