
The Basic Decision Tree Algorithm 

 

Learning and prediction are the two steps of a classification process in Machine Learning. In the learning step, the model is built from the training data.

 

In the prediction step, the model is used to forecast the response for new data. The Decision Tree is one of the most straightforward and most widely used classification techniques.

 

In this article, we’ll look at how decision trees are constructed and why they are useful.

 

A Decision Tree contains two kinds of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and branch further, whereas leaf nodes represent the outcomes of those decisions and do not have any more branches.
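To make the two node types concrete, here is a minimal sketch in Python; the Node class and its field names are hypothetical and exist only to illustrate the decision-node/leaf-node distinction, not to match any particular library.

from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Node:
    # A decision node tests a feature and holds one child per branch value;
    # a leaf node holds only the predicted label.
    feature: Optional[str] = None                               # attribute tested at a decision node
    children: Dict[Any, "Node"] = field(default_factory=dict)   # branch value -> child node
    prediction: Optional[Any] = None                            # class label when this is a leaf

    def is_leaf(self) -> bool:
        return not self.children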

 

Types of Decision Trees: 

Decision Trees are classified into two types based on the type of target variable.

 

Categorical Variable Decision Tree: A Categorical Variable Decision Tree is a decision tree with a categorical target variable.

 

Continuous Variable Decision Tree: A Continuous Variable Decision Tree is a decision tree with a continuous target variable.

 

Consider an example: let’s imagine we’re trying to estimate whether a customer will pay their insurance company’s renewal premium (yes/no).

 

The customer’s income is a crucial variable here, yet the insurance company does not have income information for all of its customers.

 

Since we know income is an important variable, we can build a decision tree to predict customer income based on occupation, product, and various other characteristics. In this scenario we are predicting values of a continuous variable.
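As a hedged sketch of both cases, the snippet below uses scikit-learn (assumed to be installed); the feature columns and target values are made-up toy numbers, not real insurance data.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Categorical target (renew: yes/no) -> classification tree
X_cls = [[25, 1], [40, 0], [35, 1], [50, 0]]     # e.g. [age, owns_product]
y_cls = ["yes", "no", "yes", "no"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[30, 1]]))                    # predicted renewal label

# Continuous target (income) -> regression tree
X_reg = [[2, 1], [5, 0], [8, 1], [10, 0]]        # e.g. [years_in_job, product_tier]
y_reg = [30000, 45000, 60000, 72000]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[6, 1]]))                     # predicted income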

 

Terminology needed to understand the Decision Tree Algorithm:

  • Root Node: The root node is the starting point of the decision tree. It represents the entire dataset, which is then split into two or more homogeneous sets.

 

  • Leaf Node: Leaf nodes are the tree’s final output nodes; they cannot be split any further.

 

  • Branch/Sub-Tree: A subtree formed by splitting the tree at a node.

 

  • Parent/Child Node: A node that is split into sub-nodes is known as the parent node, and its sub-nodes are known as the child nodes.

 

  • Pruning: Pruning (or post-pruning) takes a tree that has already overfit and alters it to reduce or remove the observed overfitting. A good pruning rule identifies the splits that don’t generalize well and removes them from the tree, generally with the help of an independent test set (see the sketch after this list).

 

  • Overfitting: Good generalization is the desired property in our decision trees (and, indeed, in all classification problems). This means we want a model fit on the labeled training data to make predictions on new, unseen observations that are nearly as accurate as its predictions on the training data; a tree that fails to do so is said to overfit.
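The sketch below illustrates overfitting and post-pruning using scikit-learn’s cost-complexity pruning parameter ccp_alpha on a synthetic dataset; the chosen ccp_alpha value is arbitrary and only meant to show the effect, not a recommended setting.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree usually fits the training set almost perfectly
# but scores noticeably lower on the held-out test set (overfitting).
unpruned = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Post-pruning: a larger ccp_alpha removes more of the weakest splits.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, model in [("unpruned", unpruned), ("pruned", pruned)]:
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))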

 

How Does the Decision Tree Work?

Step 1: Start with the root node, which holds the complete dataset; call it S.

Step 2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).

Step 3: Divide S into subsets corresponding to the possible values of the best attribute.

Step 4: Create the decision tree node that contains the best attribute.

Step 5: Recursively build new decision trees using the subsets of the dataset obtained in Step 3. Continue this process until the nodes can no longer be split, at which point the final nodes are called leaf nodes.
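The following simplified, ID3-style sketch of these steps assumes categorical features and uses information gain as the ASM; the helper names (entropy, best_attribute, build_tree) and the toy job-offer data are illustrative only, not taken from any library.

from collections import Counter
import math

def entropy(labels):
    # Impurity of a set of class labels.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Step 2: pick the attribute with the highest information gain (the ASM here).
    def gain(attr):
        subsets = {}
        for row, label in zip(rows, labels):
            subsets.setdefault(row[attr], []).append(label)
        weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
        return entropy(labels) - weighted
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attributes remain: return a leaf label.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    tree = {attr: {}}
    for value in set(row[attr] for row in rows):              # Step 3: one branch per value
        pairs = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        remaining = [a for a in attributes if a != attr]
        tree[attr][value] = build_tree([r for r, _ in pairs],  # Step 5: recurse
                                       [l for _, l in pairs], remaining)
    return tree

rows = [{"salary": "high", "distance": "near"},
        {"salary": "high", "distance": "far"},
        {"salary": "low", "distance": "near"}]
labels = ["accept", "accept", "decline"]
print(build_tree(rows, labels, ["salary", "distance"]))
# {'salary': {'high': 'accept', 'low': 'decline'}}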

 

For example, consider the following scenario: an applicant receives a job offer and must decide whether or not to take it.

 

To answer this problem, the decision tree starts at the root node (the Salary attribute, selected by the ASM). The root node splits into the next decision node (distance from the office) and one leaf node, based on the corresponding labels.

 

That decision node in turn splits into one further decision node and one leaf node.

 

Determining which attribute to place at the root node and at each level is a key challenge in building a Decision Tree. This procedure is called attribute selection. Two widely used attribute selection measures are Information Gain and the Gini Index.

 

Information Gain: 

Information gain measures the change in entropy after a dataset is split on an attribute.

 

It establishes how much information a feature provides about a class.

 

We split a node and build the decision tree based on the value of information gain.

 

A decision tree algorithm always tries to maximize information gain, so the node/attribute with the highest information gain is split first. The formula for Information Gain is:

 

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

 

Entropy is a metric that measures the degree of impurity in a given attribute. It quantifies the randomness in the data. Entropy can be computed with the following formula:

 

Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)

 

Where,

S is the total set of samples.

P(yes) is the probability of the class “yes”.

P(no) is the probability of the class “no”.
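Here is a small worked example of these two formulas on a made-up sample set of 9 “yes” and 5 “no” labels, split on a single hypothetical feature:

import math

def entropy(labels):
    total = len(labels)
    probs = [labels.count(v) / total for v in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# S: 9 "yes" and 5 "no" samples (a toy example)
S = ["yes"] * 9 + ["no"] * 5
print(round(entropy(S), 3))             # about 0.940

# Splitting S on a feature gives two subsets; the weighted average of
# their entropies is subtracted from Entropy(S) to get the information gain.
subset_a = ["yes"] * 6 + ["no"] * 2     # samples with feature value A
subset_b = ["yes"] * 3 + ["no"] * 3     # samples with feature value B
weighted = (len(subset_a) / len(S)) * entropy(subset_a) \
         + (len(subset_b) / len(S)) * entropy(subset_b)
print(round(entropy(S) - weighted, 3))  # information gain, about 0.048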

 

Gini Index:

The Gini Index may be thought of as a cost function for evaluating splits in a dataset. It is computed by subtracting the sum of each class’s squared probability from one.

 

It favors larger partitions that are simple to construct, whereas information gain favors smaller partitions with distinct values.

 

Gini Index = 1 − ∑ⱼ Pⱼ²

 

The Gini Index works with a categorical target variable, such as “Success” or “Failure”. It performs only binary splits.

 

A higher Gini index value indicates greater inequality and heterogeneity.
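A minimal sketch of the Gini Index formula on made-up labels, showing that a pure node scores 0 and a maximally mixed binary node scores 0.5:

def gini(labels):
    total = len(labels)
    return 1 - sum((labels.count(v) / total) ** 2 for v in set(labels))

pure = ["Success"] * 10                      # perfectly pure node
mixed = ["Success"] * 5 + ["Failure"] * 5    # maximally mixed binary node
print(gini(pure))    # 0.0 -> no impurity
print(gini(mixed))   # 0.5 -> highest impurity for two classes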

 
