
Decision Tree Learning

 

Decision tree learning is a supervised learning method for approximating discrete-valued target functions. Decision trees can be used to address both classification and regression problems.

 

In this blog, we’ll take a look at an introduction to decision tree learning and its representation.

 

A decision tree can express any boolean function of discrete attributes. To increase human readability, learned trees can also be re-represented as sets of if-then rules.
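
To make this concrete, here is a minimal illustrative sketch (not from the original post): the boolean function XOR written as a decision tree of nested attribute tests.

```python
# Illustrative sketch: the boolean function XOR(a, b) expressed as a
# decision tree of nested attribute tests.
def xor_tree(a, b):
    if a == 0:                      # root node tests attribute a
        return 1 if b == 1 else 0   # leaves along the a = 0 branch
    else:
        return 1 if b == 0 else 0   # leaves along the a = 1 branch
```

Read back as if-then rules, this tree is simply: IF a = 0 AND b = 1 THEN 1; IF a = 1 AND b = 0 THEN 1; otherwise 0.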

 

Representation:

Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.

 

Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of this attribute.

 

An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the tree branch corresponding to the value of that attribute in the given example.

 

This process is then repeated for the subtree rooted at the new node, until a leaf node is reached.
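
This procedure can be sketched in a few lines of Python. The representation below is an illustrative assumption, not code from the post: internal nodes are dictionaries mapping an attribute name to its branches, and leaves are plain class labels.

```python
def classify(tree, instance):
    """Sort an instance down the tree from the root to a leaf node."""
    if not isinstance(tree, dict):        # a leaf holds the classification
        return tree
    attribute = next(iter(tree))          # the attribute tested at this node
    branch = tree[attribute][instance[attribute]]
    return classify(branch, instance)     # repeat for the chosen subtree
```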

 

Let’s have a look at an example where the target attribute is EnjoySport, which can take the values yes or no on different Saturday mornings and is predicted here based on other attributes of the morning.

 

Here is a decision tree for the concept EnjoySport. An instance is classified by sorting it through the tree to the appropriate leaf node, then returning the classification associated with that leaf node (yes or no).

 

This decision tree classifies Saturday mornings according to whether or not they are suitable for enjoying the sport.

 

For example, the instance

<Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong>

would be sorted down the leftmost branch of this decision tree and would therefore be classified as a negative instance (i.e., the tree predicts that EnjoySport = no).
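
The tree figure is not reproduced here, so the structure below is reconstructed from the classic version of this example (Outlook at the root, with Humidity tested on the Sunny branch and Wind on the Rain branch); treat the exact structure as an assumption. Using the classify function sketched above:

```python
# Reconstructed EnjoySport tree (an assumption, since the figure is missing).
enjoy_sport_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

instance = {"Outlook": "Sunny", "Temperature": "Hot",
            "Humidity": "High", "Wind": "Strong"}

# Outlook = Sunny -> Humidity = High -> leaf "No": a negative instance.
print(classify(enjoy_sport_tree, instance))  # prints: No
```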

 

Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and the tree itself corresponds to a disjunction of these conjunctions.

 

In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances.
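
Under the same assumed tree as above, this reading can be written out directly, with one conjunction per path that ends in a yes leaf:

```python
def enjoy_sport(outlook, humidity, wind):
    # One conjunction per path to a "Yes" leaf, joined by disjunction.
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))
```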

 

A key difficulty in decision tree learning is determining which attribute to test at the root node of each subtree. This procedure is called attribute selection, and there are two widely used attribute selection measures: information gain and the Gini index.
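
As a brief sketch of both measures, using their standard definitions (this code is illustrative, not from the original post):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index (impurity) of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(labels, branches):
    """Reduction in entropy from splitting `labels` into `branches`."""
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(labels) - remainder
```

For example, entropy(["yes"] * 9 + ["no"] * 5) evaluates to roughly 0.940, the entropy of a collection with nine positive and five negative examples.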

 

Appropriate Problems for Decision Tree Learning:

Decision tree learning is generally best suited to problems with the following characteristics:

- Instances are represented by attribute-value pairs (e.g., Temperature = Hot).
- The target function has discrete output values (e.g., yes or no).
- Disjunctive descriptions of the concept may be required.
- The training data may contain errors.
- The training data may contain examples with missing attribute values.

Applications:

Decision tree learning has been applied to practical problems such as classifying medical patients by their disease, equipment malfunctions by their cause, and loan applicants by their likelihood of defaulting on payments.

Issues: 

In the real world, data sets aren’t always as neat as we’d want them to be, and inconsistent data is one example. 

 

We have inconsistent data when two identical observations have different class labels, as when two fruits have the same height and breadth measurements but different labels owing to one being a pear and the other an apple.

 

Clearly, the resulting leaf node cannot be pure in this scenario, since there is no way to distinguish between the two fruits. When this happens, a common solution is to set the impure node’s class prediction to the dominant class in that node.

 

To put it another way, if this node held five apples and just one pear, the decision tree would classify new observations falling into this node as apples.

 

However, if two or more classes are tied for the majority, a label may have to be chosen at random.
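
A small sketch of this majority rule, with the random tie-breaking just described (illustrative code, not from the post):

```python
import random
from collections import Counter

def leaf_prediction(labels):
    """Predict the majority class of an impure leaf; break ties at random."""
    counts = Counter(labels)
    best = max(counts.values())
    winners = [label for label, c in counts.items() if c == best]
    return random.choice(winners)

print(leaf_prediction(["apple"] * 5 + ["pear"]))  # -> "apple"
```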

 

Up until now, we’ve only explored splitting based on a single feature, but it’s worth noting that this technique can be extended to splits on linear combinations of features, known as oblique splits, as in the sketch below.
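
For instance, in a hypothetical two-feature fruit example (the coefficients and thresholds below are made up for illustration):

```python
def axis_aligned_split(height, width):
    return height > 7.0                         # tests a single feature

def oblique_split(height, width):
    # Tests a linear combination of features: an oblique split.
    return 0.8 * height + 1.5 * width > 10.0
```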

 

Oblique splits have various disadvantages, however, such as diminished interpretability and greater sensitivity to missing values and outliers.

 

 

