
Issues in Decision Tree Learning and How-To solve them – Part 2

 

As we have already discussed, a decision tree is a method for approximating discrete-valued target functions and falls under the category of supervised learning. Decision trees can be used to address both classification and regression problems.

 

In this blog, we’ll look at the issues in decision tree learning and how we can solve them. This is part two of the topic; you can find the first part of the article here (Issues in DT and How-To-Solve them Part-1).

 

The given data is divided into two sets of examples: a training set used to form the learned hypothesis, and a separate validation set used to evaluate the accuracy of this hypothesis over subsequent data and, in particular, to evaluate the impact of pruning this hypothesis.

 

One frequent approach is to hold out one-third of the available examples for validation and train on the remaining two-thirds.
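
As a rough sketch of such a split (assuming the examples are already loaded into a Python list; the names below are only illustrative):

```python
import random

def split_examples(examples, validation_fraction=1 / 3, seed=0):
    """Shuffle the examples and hold out a fraction of them as the validation set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(examples)          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]   # (training set, validation set)
```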

 

Reduced-Error Pruning

Reduced-error pruning is a method that considers each of the decision nodes in the tree as a candidate for pruning.

 

Pruning a decision node entails deleting the subtree rooted at that node, converting it into a leaf node, and assigning it the most common classification of the training examples associated with that node. A node is removed only if the resulting pruned tree performs no worse than the original over the validation set.

 

Nodes are pruned iteratively, always choosing the node whose removal most increases the decision tree’s accuracy over the validation set. Pruning continues until further pruning is harmful, i.e., until it decreases accuracy over the validation set.
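
A minimal, runnable sketch of this greedy loop is shown below. It assumes a simple representation in which a leaf is just a class label and an internal node is a dictionary holding the attribute to test, its branches, and the training examples that reached it; all names are illustrative rather than taken from any particular library.

```python
import copy
from collections import Counter

# A tree is either a class label (leaf) or a dict:
#   {"attr": attribute_name, "branches": {value: subtree}, "examples": [training examples]}
# Each example is a dict of attribute values plus a "label" key.

def classify(tree, example):
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attr"]]]
    return tree

def accuracy(tree, examples):
    return sum(classify(tree, x) == x["label"] for x in examples) / len(examples)

def decision_nodes(tree, path=()):
    """Yield the branch-value path to every internal (decision) node."""
    if isinstance(tree, dict):
        yield path
        for value, subtree in tree["branches"].items():
            yield from decision_nodes(subtree, path + (value,))

def prune_at(tree, path):
    """Return a copy of the tree with the node at `path` replaced by a majority-class leaf."""
    pruned = copy.deepcopy(tree)
    parent, node = None, pruned
    for value in path:
        parent, node = node, node["branches"][value]
    leaf = Counter(x["label"] for x in node["examples"]).most_common(1)[0][0]
    if parent is None:                  # pruning the root collapses the tree to a single leaf
        return leaf
    parent["branches"][path[-1]] = leaf
    return pruned

def reduced_error_prune(tree, validation_set):
    """Greedily prune whichever node most improves validation accuracy; stop when nothing helps."""
    best_acc = accuracy(tree, validation_set)
    while True:
        best = None
        for path in decision_nodes(tree):
            candidate = prune_at(tree, path)
            acc = accuracy(candidate, validation_set)
            if acc >= best_acc:         # >= prefers a smaller tree when accuracy is unchanged
                best_acc, best = acc, candidate
        if best is None:
            return tree
        tree = best
```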

The impact of pruning is usually visualized by plotting accuracy against tree size: as nodes are pruned, the accuracy of the tree is assessed over both the training set and the test set.

 

In many practical situations where data is limited, an alternative approach to pruning is presented in the next section. Many other methods have also been proposed, such as splitting the available data several times in different ways and then averaging the results.

 

Rule Post-Pruning

In rule post-pruning, one rule is generated for each leaf node in the tree. Each attribute test along the path from the root to that leaf becomes a rule antecedent (precondition), and the classification at the leaf node becomes the rule consequent (postcondition).

 

  • Infer the decision tree from the training data, growing the tree until the training data are fitted as well as possible, even if this means overfitting.
  • Convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.
  • Prune (generalize) each rule by removing any preconditions whose removal improves its estimated accuracy.
  • Sort the pruned rules by their estimated accuracy, and consider them in this order when classifying subsequent instances (a minimal sketch of these steps follows this list).
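
A minimal sketch of the conversion and pruning steps is given below. It reuses the dictionary-based tree representation from the reduced-error-pruning sketch above, and it estimates rule accuracy on a held-out validation set (C4.5’s own rule post-pruning instead uses a pessimistic estimate computed from the training data); the function names are illustrative.

```python
def tree_to_rules(tree, preconditions=()):
    """One rule per root-to-leaf path: (tuple of (attribute, value) tests, class label)."""
    if not isinstance(tree, dict):                    # a leaf ends the path
        return [(preconditions, tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        rules += tree_to_rules(subtree, preconditions + ((tree["attr"], value),))
    return rules

def rule_accuracy(rule, examples):
    """Estimated accuracy of a rule over the examples its preconditions cover."""
    preconditions, label = rule
    covered = [x for x in examples if all(x[a] == v for a, v in preconditions)]
    if not covered:
        return 0.0
    return sum(x["label"] == label for x in covered) / len(covered)

def prune_rule(rule, validation_set):
    """Greedily drop any precondition whose removal does not hurt estimated accuracy."""
    preconditions, label = rule
    best_acc = rule_accuracy(rule, validation_set)
    improved = True
    while improved and preconditions:
        improved = False
        for i in range(len(preconditions)):
            candidate = (preconditions[:i] + preconditions[i + 1:], label)
            acc = rule_accuracy(candidate, validation_set)
            if acc >= best_acc:
                preconditions, best_acc, improved = candidate[0], acc, True
                break
    return preconditions, label
```

The pruned rules would then be sorted by their estimated accuracy before being used to classify new instances, as described in the final step above.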

 

There are advantages to converting the decision tree into rules before pruning. A few of them are:

  • Converting to rules makes it possible to distinguish among the different contexts in which a decision node is used. Because each distinct path through the decision node produces a different rule, the pruning decision for that attribute test can be made differently for each path.

 

  • Converting to rules removes the distinction between attribute tests that occur near the tree’s root and those that occur near the leaves. As a result, we avoid messy bookkeeping issues such as how to reorganize the tree if the root node is pruned while part of the subtree below this test is retained.

 

  • Converting to rules improves readability. Rules are often easier for people to understand.

 

Incorporating Continuous-Valued Attributes

Our original definition of ID3 is restricted to attributes that take on a discrete set of values, and this restriction shows up in two ways.

 

First, the target attribute whose value is predicted by the learned tree must be discrete-valued.

 

Second, the attributes tested in the tree’s decision nodes must also be discrete-valued. This second restriction can easily be removed, so that continuous-valued decision attributes can be incorporated into the learned tree.
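
The usual way of relaxing it, described in the ID3/C4.5 literature, is to dynamically define a Boolean attribute that tests whether the continuous attribute falls below some threshold c, with candidate thresholds taken midway between adjacent sorted values whose classifications differ and the final threshold chosen to maximize information gain. A minimal sketch of that idea, using the same example-as-dictionary convention as the earlier sketches (names are illustrative):

```python
import math
from collections import Counter

def entropy(examples):
    """Entropy of the class labels in a set of examples."""
    counts = Counter(x["label"] for x in examples)
    n = len(examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def candidate_thresholds(examples, attribute):
    """Midpoints between adjacent sorted attribute values whose class labels differ."""
    pairs = sorted((x[attribute], x["label"]) for x in examples)
    return [(v1 + v2) / 2
            for (v1, l1), (v2, l2) in zip(pairs, pairs[1:])
            if l1 != l2 and v1 != v2]

def best_threshold(examples, attribute):
    """Pick the threshold c for the Boolean test `attribute < c` with the highest information gain."""
    base, n = entropy(examples), len(examples)
    best_c, best_gain = None, -1.0
    for c in candidate_thresholds(examples, attribute):
        below = [x for x in examples if x[attribute] < c]
        above = [x for x in examples if x[attribute] >= c]
        gain = base - len(below) / n * entropy(below) - len(above) / n * entropy(above)
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain
```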

 

Alternative Attribute Selection Measures

The information gain measure has an inherent bias that favors attributes with many values over those with few.

 

Consider the attribute Date, which has a very large number of possible values (e.g., March 4, 1979). If it were added to the data, this attribute would have the highest information gain of all the attributes.

 

This is because, over the training data, Date alone perfectly predicts the target attribute. As a result, it would be selected as the decision attribute for the tree’s root node, producing a (very broad) tree of depth one that perfectly classifies the training data.

 

Of course, even though it perfectly separates the training data, this decision tree would perform poorly on subsequent examples because it is not a useful predictor.
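
A small, self-contained illustration of this bias is sketched below; the four training examples and the Outlook attribute are invented purely for the demonstration. Because Date takes a unique value on every example, it splits the data into singleton subsets with zero entropy, so its gain equals the full entropy of the sample, which no other attribute can exceed.

```python
import math
from collections import Counter

def entropy(examples):
    counts = Counter(x["label"] for x in examples)
    n = len(examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    n, by_value = len(examples), {}
    for x in examples:
        by_value.setdefault(x[attribute], []).append(x)
    return entropy(examples) - sum(len(s) / n * entropy(s) for s in by_value.values())

# Invented toy data: Date is unique per example and therefore "memorizes" the labels.
data = [
    {"Date": "1979-03-04", "Outlook": "Sunny", "label": "No"},
    {"Date": "1979-03-05", "Outlook": "Sunny", "label": "Yes"},
    {"Date": "1979-03-06", "Outlook": "Rain",  "label": "Yes"},
    {"Date": "1979-03-07", "Outlook": "Rain",  "label": "Yes"},
]
print(information_gain(data, "Date"))     # ~0.811, the full entropy of the sample
print(information_gain(data, "Outlook"))  # ~0.311, even though Outlook generalizes far better
```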

 

Handling Cost-Differentiated Attributes

The instance attributes may have costs associated with them in some learning tasks. For example, when learning to classify medical conditions, we might describe patients in terms of attributes such as Temperature, BiopsyResult, Pulse, BloodTestResults, and so on.

 

These attributes vary widely in their costs, both in monetary terms and in terms of patient comfort.

 

In such tasks, we would prefer decision trees that use low-cost attributes wherever possible, relying on high-cost attributes only when they are needed to produce reliable classifications.
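
One family of approaches reported in the decision tree literature replaces information gain with a cost-weighted selection criterion, for example dividing the gain (or its square) by the attribute’s measurement cost so that expensive attributes must earn their place. The sketch below is only one illustrative variant; it reuses information_gain() from the previous sketch, and the cost table is hypothetical.

```python
def cost_weighted_gain(examples, attribute, costs, exponent=2):
    """Illustrative cost-weighted criterion: Gain(S, A) ** exponent / Cost(A)."""
    return information_gain(examples, attribute) ** exponent / costs[attribute]

# Hypothetical measurement costs: cheap vital signs versus an expensive, invasive test.
costs = {"Temperature": 1.0, "Pulse": 1.0, "BloodTestResults": 5.0, "BiopsyResult": 50.0}
# At each decision node the attribute with the highest cost-weighted gain would be chosen,
# so BiopsyResult is selected only when its information gain is large enough to justify its cost.
```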

 
