XG Boost

XG Boost 1 (i2tutorials)

XG Boost is abbreviated as extreme gradient boosting. It is an implementation of gradient boosted trees which are designed for improving speed and performance. Therefore, XG Boost is a decision tree-based ensemble machine learning algorithm which uses gradient boosting framework.

When we have to deal with small to medium structured data or tabular data, we can opt for decision tree-based algorithms as they are considered best for this type of data.

XG Boost 2 (i2tutorials)

The above chart shows us the evolution of XG Boost. Decision trees are the basic tree-based algorithms. Bagging is an ensemble technique applied to decision trees. Random forest is also ensemble technique which is done by averaging the results and selecting the best among them. Boosting is also somewhat same as Random Forest which adds weights for most prioritized elements and choosing the best alternatives among them as result. Gradient Boosting applies Gradient Descent algorithm to minimize errors in the sequential models. Finally, XG Boost is an optimized Gradient Boosting algorithm. It processes the data, handles the missing values and regularizes to avoid overfitting or bias in the data, tree pruning is also done as a part of optimization.

XG Boost and Gradient Boosting Machines in short GBM are both ensemble tree methods which apply the principle of boosting weak learners using the gradient descent architecture. However, XG Boost improves on the basis of GBM framework through systems optimization and algorithmic enhancements or developments.

XG Boost 3 (i2tutorials)

System Optimization:

Parallelization:

XG Boost considers the process of sequential tree building using parallelized application. This is possible due to the exchangeable or swappable nature of loops used for building base learners. The outer loop that counts or enumerates the leaf nodes of a tree, and the second inner loop that computes the features. This nesting of loops restricts parallelization because without completing the inner loop, the outer loop cannot be started. Hence, to improve run time, the order of loops is interchanged using initialization through a global scan of all instances and sorted using parallel threads. This improves algorithmic performance by offsetting any parallelization overheads in computation.

Tree Pruning:

The blocking criterion for tree splitting within Gradient Boosting Machine (GBM) framework is greedy in nature and depends on the negative loss criterion at the point of split. XG Boost uses ‘max_depth’ parameter as specified and starts pruning trees in backward direction. This ‘depth-first’ approach improves computational performance considerably.

Hardware Optimization:

This algorithm has been created to make efficient use of hardware resources. This is achieved by cache awareness by allocating internal buffers in each thread to store gradient statistics. Further improvements such as ‘out-of-core’ calculating optimize available disk space while handling big data-frames that do not fit into memory.

XG Boost 4 (i2tutorials)

Key parameters used in XG Boost are given below

XG Boost 5 (i2tutorials)

Metrics used in XG Boost algorithm are discussed as follows

XG Boost 6 (i2tutorials)