
Decision Tree Learning

 

Decision tree learning is a supervised learning method for approximating discrete-valued target functions. Decision trees can be used to address both classification and regression problems.

 

In this blog, we’ll take a look at an introduction to decision tree learning and its representation.

 

A decision tree can express any boolean function of discrete attributes. To increase human readability, learned trees can also be re-represented as sets of if-then rules.
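
To make this concrete, here is a minimal illustrative sketch (not from the original post): the boolean function XOR written as a decision tree of nested attribute tests.

```python
# Illustrative sketch: the boolean function XOR(a, b) expressed as a
# decision tree of nested attribute tests.
def xor_tree(a, b):
    if a == 0:                      # root node tests attribute a
        return 1 if b == 1 else 0   # leaves along the a = 0 branch
    else:
        return 1 if b == 0 else 0   # leaves along the a = 1 branch
```

Read back as if-then rules, this tree is simply: IF a = 0 AND b = 1 THEN 1; IF a = 1 AND b = 0 THEN 1; otherwise 0.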

 

Representation:

Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.

 

Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute.

 

An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the tree branch corresponding to the value of that attribute in the given example.

 

This process is then repeated for the subtree rooted at the new node, until a leaf node is reached.
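
This procedure can be sketched in a few lines of Python. The representation below is an illustrative assumption, not code from the post: internal nodes are dictionaries mapping an attribute name to its branches, and leaves are plain class labels.

```python
def classify(tree, instance):
    """Sort an instance down the tree from the root to a leaf node."""
    if not isinstance(tree, dict):        # a leaf holds the classification
        return tree
    attribute = next(iter(tree))          # the attribute tested at this node
    branch = tree[attribute][instance[attribute]]
    return classify(branch, instance)     # repeat for the chosen subtree
```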

 

Let’s have a look at an example where the target attribute is EnjoySport, which can take the values yes or no on different Saturday mornings and is predicted here based on other attributes of the morning.

 

Here is a decision tree for the concept EnjoySport. An instance is classified by sorting it through the tree to the appropriate leaf node, then returning the classification associated with that leaf node (yes or no).

 

This decision tree classifies Saturday mornings according to whether or not they are suitable for enjoying the sport.

 

For example, the instance

<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>

would be sorted down the leftmost branch of this decision tree and would therefore be classified as a negative instance (i.e., the tree predicts that EnjoySport = no).
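
The tree figure is not reproduced here, so the structure below is reconstructed from the classic version of this example (Outlook at the root, with Humidity tested on the Sunny branch and Wind on the Rain branch); treat the exact structure as an assumption. Using the classify function sketched above:

```python
# Reconstructed EnjoySport tree (an assumption, since the figure is missing).
enjoy_sport_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}

# Outlook = Sunny -> Humidity = High -> leaf "No": a negative instance.
print(classify(enjoy_sport_tree, instance))  # prints: No
```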

 

Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself corresponds to a disjunction of these conjunctions.

 

In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
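
Under the same assumed tree as above, this reading can be written out directly, with one conjunction per path that ends in a yes leaf:

```python
def enjoy_sport(outlook, humidity, wind):
    # One conjunction per path to a "Yes" leaf, joined by disjunction.
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))
```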

 

A key difficulty in decision tree learning is determining which attribute to test at the root node of each subtree. This procedure is called attribute selection, and there are two widely used attribute selection measures: information gain and the Gini index.
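
As a brief sketch of both measures, using their standard definitions (this code is illustrative, not from the original post):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index (impurity) of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, branches):
    """Reduction in entropy from splitting `labels` into `branches`."""
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(labels) - remainder
```

For example, entropy(["yes"] * 9 + ["no"] * 5) evaluates to roughly 0.940, the entropy of a collection with nine positive and five negative examples.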

 

Appropriate Problems for Decision Tree Learning:

Decision tree learning is generally best suited to problems with the following characteristics:

- Instances are represented by attribute-value pairs (e.g., Temperature = Hot).
- The target function has discrete output values (e.g., yes or no).
- Disjunctive descriptions of the concept may be required.
- The training data may contain errors.
- The training data may contain examples with missing attribute values.

Applications:

Decision tree learning has been applied to practical problems such as classifying medical patients by their disease, equipment malfunctions by their cause, and loan applicants by their likelihood of defaulting on payments.

Issues: 

In the real world, data sets aren’t always as neat as we’d want them to be, and inconsistent data is one example. 

 

We have inconsistent data when two identical observations have different class labels, as when two fruits have the same height and breadth measurements but different labels owing to one being a pear and the other an apple.

 

Clearly, the resulting leaf node cannot be pure in this scenario, since there is no way to distinguish between the two fruits. When this happens, a common solution is to set the impure node’s class prediction to the dominant class in that node.

 

To put it another way, if this node held five apples and just one pear, the decision tree would classify new observations falling into this node as apples.

 

However, if two or more classes are tied for the majority, a label may have to be chosen at random.
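
A small sketch of this majority rule, with the random tie-breaking just described (illustrative code, not from the post):

```python
import random
from collections import Counter

def leaf_prediction(labels):
    """Predict the majority class of an impure leaf; break ties at random."""
    counts = Counter(labels)
    best = max(counts.values())
    winners = [label for label, c in counts.items() if c == best]
    return random.choice(winners)

print(leaf_prediction(["apple"] * 5 + ["pear"]))  # -> "apple"
```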

 

Up until now, we’ve only explored splitting based on a single feature, but it’s worth noting that this technique can be extended to splits on linear combinations of features, known as oblique splits, as in the sketch below.
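
For instance, in a hypothetical two-feature fruit example (the coefficients and thresholds below are made up for illustration):

```python
def axis_aligned_split(height, width):
    return height > 7.0                         # tests a single feature

def oblique_split(height, width):
    # Tests a linear combination of features: an oblique split.
    return 0.8 * height + 1.5 * width > 10.0
```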

 

Oblique splits have various disadvantages, however, such as diminished interpretability and greater sensitivity to missing values and outliers.

 

 

