Issues in Decision Tree Learning and How to Solve Them

 

A decision tree as we’ve already discussed is a method for approximating discrete-valued target attributes, under the category of supervised learning. They can be used to address problems involving regression and classification.

 

In this blog, we’ll have a look at the issues in decision tree learning and how we can solve them.

 

Practical issues in learning decision trees:

Practical issues in learning decision trees include determining how deep to grow the tree, handling continuous attributes, choosing an appropriate attribute selection measure, handling training data with missing attribute values, handling attributes with different costs, and improving computational efficiency.

 

Let’s have a look at each of them briefly.

 

Overfitting the Data:

A machine learning model is regarded as good if it generalizes appropriately to new input data from the problem domain.

 

The basic algorithm grows each branch of the tree just deep enough to perfectly classify the training examples.

 

In practice, this can cause problems when there is noise in the data, or when the number of training examples is too small to provide a representative sample of the true target function.

 

In either case, this simple strategy can produce trees that overfit the training examples.

 

The formal definition of overfitting is: “Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h′ ∈ H such that h has smaller error than h′ over the training examples, but h′ has smaller error than h over the entire distribution of instances.”

 

Consider a typical plot of accuracy against tree size. The horizontal axis shows the total number of nodes in the decision tree as it is built, and the vertical axis shows the accuracy of the tree’s predictions.

 

One curve traces the tree’s accuracy over the training instances, while a second curve traces its accuracy over a separate set of test cases not included in the training set.

 

The tree’s accuracy over the training instances rises monotonically as the tree grows. The accuracy measured over the independent test cases, on the other hand, increases at first and then falls.

 

Beyond a certain size (about 25 nodes in the classic illustration of this curve), further elaboration reduces the tree’s accuracy on the test cases while continuing to boost it on the training examples.
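To make this concrete, here is a minimal sketch, assuming scikit-learn and a synthetic, deliberately noisy dataset (all names, sizes, and seeds are illustrative), that grows trees of increasing depth and compares accuracy on the training set against accuracy on a held-out test set:

```python
# Sketch: observe the overfitting curve with trees of increasing depth.
# Training accuracy keeps climbing with tree size, while test accuracy
# peaks and then falls once the tree starts fitting the noise.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y injects label noise to make overfitting visible.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.15,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 2, 4, 8, 16, None]:   # None lets the tree grow fully
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth}: nodes={tree.tree_.node_count}, "
          f"train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
```

With the fully grown tree, training accuracy reaches 1.0 while test accuracy typically drops below its earlier peak, matching the shape of the curve described above.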

 

What is Underfitting?

A machine learning model is said to underfit when it fails to capture the underlying trend of the data. Underfitting ruins our model’s accuracy.

 

Its occurrence simply indicates that our model or method does not fit the data adequately. Underfitting can often be remedied by collecting additional data or by increasing the model’s capacity (for a decision tree, allowing it to grow deeper or adding more informative features).
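As a contrasting sketch (again assuming scikit-learn; the dataset and sizes are illustrative), a tree capped at a depth of 1 is usually too simple to fit even the data it was trained on:

```python
# Sketch: a depth-1 "decision stump" usually underfits; it scores
# poorly on the training data itself, not just on unseen data.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
print("training accuracy:", stump.score(X, y))  # typically well below 1.0
```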

 

Both kinds of error become more likely when the training examples contain mistakes or noise.

 

What is Noise? 

Real-world data contains noise: meaningless or corrupted values that can dramatically impair data-analysis tasks such as classification, clustering, and association analysis.

 

Overfitting can occur even when the training data is noise-free, especially when small numbers of examples are associated with leaf nodes.

 

In this situation, coincidental regularities are possible, in which some attribute happens to partition the examples quite well despite being unrelated to the true target function.

 

There is a risk of overfitting whenever such accidental regularities emerge.
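One way to see such a coincidental regularity is to train an unpruned tree on a very small sample that includes purely random features; with so few examples, the tree will often split on noise simply because, by chance, it separates the training cases well. A minimal sketch (sizes and seeds are illustrative):

```python
# Sketch: with very few training examples, an unpruned tree may split on
# features that are pure noise, simply because they happen to partition
# the training cases well by chance.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 20                                    # deliberately tiny sample
signal = rng.normal(size=(n, 1))          # one weakly informative feature
y = (signal[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)
X = np.hstack([signal, rng.normal(size=(n, 3))])  # columns 1-3: pure noise

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# Nonzero importance on columns 1-3 indicates splits on noise features.
print("feature importances:", tree.feature_importances_)
```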

 

What can we do to avoid overfitting? In decision tree learning, the numerous methods for preventing it fall into two categories: (1) approaches that stop growing the tree early, before it reaches the point where it perfectly classifies the training data, and (2) approaches that allow the tree to overfit the data and then prune it back afterwards (post-pruning).

 

Whichever category is used, the key question is the criterion used to determine the correct final tree size. Common criteria include: using a separate validation set, distinct from the training examples, to evaluate the utility of pruning nodes from the tree; using all the available data for training but applying a statistical test (such as a chi-square test) to estimate whether expanding or pruning a particular node is likely to improve performance beyond the training set; and using an explicit measure of the complexity of encoding the training examples and the tree, halting growth when that encoding size is minimized (the Minimum Description Length principle).
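As one concrete instance, scikit-learn implements post-pruning via cost-complexity pruning. The sketch below (again with illustrative synthetic data) grows a full tree, enumerates its candidate pruning strengths, and uses a held-out validation set, the first criterion above, to pick the final tree size:

```python
# Sketch: post-pruning with scikit-learn's cost-complexity pruning.
# A held-out validation set selects the pruning strength (ccp_alpha),
# one concrete instance of the validation-set criterion described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.15,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Enumerate the effective pruning strengths for this training set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

# Refit at each strength and keep the tree that validates best.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_val, y_val))
print("pruned tree leaves:", best.get_n_leaves())
print("validation accuracy:", best.score(X_val, y_val))
```

Choosing the pruning strength on data the tree never trained on guards against pruning decisions that merely echo the training set’s noise.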

 
