Issues in Decision Tree Learning and How to Solve Them

 

A decision tree as we’ve already discussed is a method for approximating discrete-valued target attributes, under the category of supervised learning. They can be used to address problems involving regression and classification.

 

In this blog, we’ll have a look at the issues in decision tree learning and how we can solve them.

 

Practical issues in learning decision trees:

Practical issues in learning decision trees include determining how deep to grow the tree, handling continuous attributes, choosing an appropriate attribute selection measure, handling training data with missing attribute values, handling attributes with different costs, and improving computational efficiency.

 

Let’s have a look at each of them briefly.

 

Overfitting the Data:

A machine learning model is regarded as good if it generalizes appropriately to new input data from the problem domain.

 

The basic algorithm grows each branch of the tree just deep enough to perfectly classify the training examples.

 

In practice, this can cause problems when there is noise in the data, or when the number of training examples is too small to provide a representative sample of the true target function.

 

In either case, this simple strategy can produce trees that overfit the training examples.

 

The formal definition of overfitting is: “Given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training data if there exists some alternative hypothesis h′ ∈ H such that h has smaller error than h′ over the training examples, but h′ has smaller error than h over the entire distribution of instances.”

 

Consider a typical plot of accuracy against tree size. The horizontal axis shows the total number of nodes in the decision tree as it is built, and the vertical axis shows the accuracy of the tree’s predictions.

 

One curve traces the tree’s accuracy over the training instances, while a second curve traces its accuracy over a separate set of test cases not included in the training set.

 

The tree’s accuracy over the training instances rises monotonically as the tree grows. The accuracy measured over the independent test cases, on the other hand, increases at first and then falls.

 

Beyond a certain size (about 25 nodes in the classic illustration of this curve), further elaboration reduces the tree’s accuracy on the test cases while continuing to boost it on the training examples.
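To make this concrete, here is a minimal sketch, assuming scikit-learn and a synthetic, deliberately noisy dataset (all names, sizes, and seeds are illustrative), that grows trees of increasing depth and compares accuracy on the training set against accuracy on a held-out test set:

```python
# Sketch: observe the overfitting curve with trees of increasing depth.
# Training accuracy keeps climbing with tree size, while test accuracy
# peaks and then falls once the tree starts fitting the noise.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y injects label noise to make overfitting visible.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.15,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 2, 4, 8, 16, None]:   # None lets the tree grow fully
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"depth={depth}: nodes={tree.tree_.node_count}, "
          f"train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
```

With the fully grown tree, training accuracy reaches 1.0 while test accuracy typically drops below its earlier peak, matching the shape of the curve described above.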

 

What is Underfitting?

A machine learning model is said to underfit when it fails to capture the underlying trend of the data. Underfitting ruins our model’s accuracy.

 

Its occurrence simply indicates that our model or method does not fit the data adequately. Underfitting can often be remedied by collecting additional data or by increasing the model’s capacity (for a decision tree, allowing it to grow deeper or adding more informative features).
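As a contrasting sketch (again assuming scikit-learn; the dataset and sizes are illustrative), a tree capped at a depth of 1 is usually too simple to fit even the data it was trained on:

```python
# Sketch: a depth-1 "decision stump" usually underfits; it scores
# poorly on the training data itself, not just on unseen data.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
print("training accuracy:", stump.score(X, y))  # typically well below 1.0
```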

 

Both kinds of error become more likely when the training examples contain mistakes or noise.

 

What is Noise? 

Real-world data contains noise: meaningless or corrupted values that can dramatically impair data-analysis tasks such as classification, clustering, and association analysis.

 

Overfitting can occur even when the training data is noise-free, especially when small numbers of examples are associated with leaf nodes.

 

In this situation, coincidental regularities are possible, in which some attribute happens to partition the examples quite well despite being unrelated to the true target function.

 

There is a risk of overfitting whenever such accidental regularities emerge.
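One way to see such a coincidental regularity is to train an unpruned tree on a very small sample that includes purely random features; with so few examples, the tree will often split on noise simply because, by chance, it separates the training cases well. A minimal sketch (sizes and seeds are illustrative):

```python
# Sketch: with very few training examples, an unpruned tree may split on
# features that are pure noise, simply because they happen to partition
# the training cases well by chance.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 20                                    # deliberately tiny sample
signal = rng.normal(size=(n, 1))          # one weakly informative feature
y = (signal[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)
X = np.hstack([signal, rng.normal(size=(n, 3))])  # columns 1-3: pure noise

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
# Nonzero importance on columns 1-3 indicates splits on noise features.
print("feature importances:", tree.feature_importances_)
```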

 

What can we do to avoid overfitting? In decision tree learning, the numerous methods for preventing it fall into two categories: (1) approaches that stop growing the tree early, before it reaches the point where it perfectly classifies the training data, and (2) approaches that allow the tree to overfit the data and then prune it back afterwards (post-pruning).

 

Whichever category is used, the key question is the criterion used to determine the correct final tree size. Common criteria include: using a separate validation set, distinct from the training examples, to evaluate the utility of pruning nodes from the tree; using all the available data for training but applying a statistical test (such as a chi-square test) to estimate whether expanding or pruning a particular node is likely to improve performance beyond the training set; and using an explicit measure of the complexity of encoding the training examples and the tree, halting growth when that encoding size is minimized (the Minimum Description Length principle).
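As one concrete instance, scikit-learn implements post-pruning via cost-complexity pruning. The sketch below (again with illustrative synthetic data) grows a full tree, enumerates its candidate pruning strengths, and uses a held-out validation set, the first criterion above, to pick the final tree size:

```python
# Sketch: post-pruning with scikit-learn's cost-complexity pruning.
# A held-out validation set selects the pruning strength (ccp_alpha),
# one concrete instance of the validation-set criterion described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.15,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Enumerate the effective pruning strengths for this training set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

# Refit at each strength and keep the tree that validates best.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_val, y_val))
print("pruned tree leaves:", best.get_n_leaves())
print("validation accuracy:", best.score(X_val, y_val))
```

Choosing the pruning strength on data the tree never trained on guards against pruning decisions that merely echo the training set’s noise.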

 
