**Introduction:**

In this blog post we will discuss a machine learning algorithm called the decision tree. The goal of the post is to introduce beginners to the fundamental concepts of a decision tree and help them quickly build their first tree model.

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences. It is one way to display an algorithm that only contains conditional control statements.

A decision tree is a flowchart-like structure in which each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after evaluating all attributes). The paths from root to leaf represent classification rules.

A decision tree is a type of supervised learning algorithm (one with a pre-defined target variable) that is mostly used for classification problems.

It works for both categorical and continuous input and output variables.

## Types of Decision Trees

The type of a decision tree depends on the type of target variable we have. There are two types:

**Categorical Variable Decision Tree:**

A decision tree that has a categorical target variable is called a categorical variable decision tree.

**Continuous Variable Decision Tree:**

A decision tree that has a continuous target variable is called a continuous variable decision tree.

*Example:-*

Let’s say we want to predict whether a bike is good or not. This can be done with a decision tree classifier.

However, mileage is an important factor in classifying a bike as good or bad. Mileage is a continuous value, so it can be predicted with a decision tree regressor.
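As a rough sketch of the difference, the snippet below fits scikit-learn's `DecisionTreeClassifier` and `DecisionTreeRegressor` on the same inputs. The bike features (`engine_cc`, `weight_kg`), labels, and mileage figures are invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: [engine_cc, weight_kg] for six bikes.
X = [[100, 110], [110, 115], [125, 120], [150, 140], [200, 160], [350, 190]]
is_good = ["good", "good", "good", "bad", "bad", "bad"]  # categorical target
mileage = [70.0, 68.0, 60.0, 50.0, 40.0, 30.0]           # continuous target

# Same inputs, different target type, different kind of tree.
clf = DecisionTreeClassifier(random_state=0).fit(X, is_good)
reg = DecisionTreeRegressor(random_state=0).fit(X, mileage)

new_bike = [[105, 112]]
label = clf.predict(new_bike)[0]          # a class label
km_per_litre = reg.predict(new_bike)[0]   # a number
print(label, km_per_litre)
```

The classifier returns one of the class labels seen in training, while the regressor returns a numeric estimate.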

**Important Terminology related to Decision Trees**

Let’s look at the basic terminology used with Decision trees:

**Root Node:**

It represents the entire population or sample, which is then divided into two or more homogeneous sets.

**Splitting:**

It is the process of dividing a node into two or more sub-nodes.

**Decision Node:**

When a sub-node splits into further sub-nodes, it is called a decision node.

**Leaf/ Terminal Node:**

A node that does not split is called a leaf or terminal node.

**Pruning:**

When we remove the sub-nodes of a decision node, the process is called pruning. It can be seen as the opposite of splitting.

**Branch / Sub-Tree:**

A sub-section of the entire tree is called a branch or sub-tree.

**Parent and Child Node:**

A node that is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
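To make the terminology concrete, here is a minimal sketch of a tree as a data structure. The `Node` class and the helper functions are hypothetical names used only for illustration; they are not part of any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf (terminal) node has no sub-nodes.
        return not self.children


def count_leaves(node: Node) -> int:
    """Count the leaf nodes in the sub-tree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def prune(node: Node) -> None:
    """Remove all sub-nodes of a decision node, turning it into a leaf."""
    node.children = []


# The root is split into A and B; A is a decision node split into A1 and A2.
a = Node("A", [Node("A1"), Node("A2")])
root = Node("root", [a, Node("B")])

leaves_before = count_leaves(root)  # leaves are A1, A2 and B
prune(a)                            # A loses its children and becomes a leaf
leaves_after = count_leaves(root)   # leaves are A and B
print(leaves_before, leaves_after)
```

Here `root` is the parent of `A` and `B`, and the part of the tree rooted at `A` is a branch (sub-tree).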

**How to split nodes**

The choice of splits heavily affects a tree’s accuracy. The decision criteria are different for classification and regression trees. Decision trees can use several algorithms to decide how to split a node into two or more sub-nodes. Creating sub-nodes increases the homogeneity of the resulting sub-nodes. In other words, the purity of the nodes increases with respect to the target variable. A decision tree evaluates splits on all available variables and then selects the split that results in the most homogeneous sub-nodes.

There are a few algorithms for finding the optimum split. Let’s look at the following to understand the mathematics behind them.
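The idea of trying every variable and keeping the most homogeneous split can be sketched as follows. This is a simplified illustration, not how any particular library implements it. It assumes numeric features and binary splits at midpoints, and it scores homogeneity as the sum of squared class proportions (the Gini score covered in the Gini Index section). The data and function names are made up.

```python
from collections import Counter


def homogeneity(labels):
    """Sum of squared class proportions; equals 1.0 for a pure node."""
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every (feature, threshold) pair and return the best one found."""
    best = None  # (weighted score, feature index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * homogeneity(left)
                     + len(right) * homogeneity(right)) / n
            if best is None or score > best[0]:
                best = (score, f, t)
    return best


# Hypothetical data: [age, height_cm]; the target is a yes/no label.
rows = [[10, 140], [11, 150], [12, 145], [15, 160], [16, 150], [17, 165]]
labels = ["yes", "yes", "yes", "no", "no", "no"]

score, feature, threshold = best_split(rows, labels)
print(score, feature, threshold)
```

In this toy data, splitting on the first feature (age) separates the two classes perfectly, so it is chosen over any split on height.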

**Gini Index**

The Gini index says that if we select two items from a population at random, they must be of the same class; the probability of this is 1 if the population is pure.

It works with a categorical target variable such as “Success” or “Failure”. It performs only binary splits. **The higher the value of Gini, the higher the homogeneity.** CART (Classification and Regression Tree) uses the Gini method to create binary splits.

**Steps to Calculate Gini for a split**

- Calculate Gini for each sub-node, using the formula: the sum of the squares of the probabilities of success and failure (p^2+q^2).
- Calculate Gini for the split using the weighted Gini score of each node of that split.
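The two steps above can be sketched in a few lines of Python. The numbers are a made-up example: 30 students split by gender into a node of 10 (2 successes) and a node of 20 (13 successes). The function names are illustrative only.

```python
def gini_node(successes, total):
    """Step 1: Gini for one sub-node as p^2 + q^2."""
    p = successes / total
    q = 1 - p
    return p ** 2 + q ** 2


def gini_split(nodes):
    """Step 2: Gini for the split, weighted by sub-node size.

    `nodes` is a list of (successes, total) pairs, one per sub-node.
    """
    n = sum(total for _, total in nodes)
    return sum(total / n * gini_node(s, total) for s, total in nodes)


female = (2, 10)   # hypothetical: 2 successes out of 10
male = (13, 20)    # hypothetical: 13 successes out of 20

g_female = gini_node(*female)          # 0.2^2 + 0.8^2
g_male = gini_node(*male)              # 0.65^2 + 0.35^2
g_split = gini_split([female, male])   # (10/30)*g_female + (20/30)*g_male
print(round(g_female, 3), round(g_male, 3), round(g_split, 3))
```

To choose between candidate splits, compute `g_split` for each one and keep the split with the highest value under this formulation.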

**Information gain**

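We can derive information gain from entropy as **1 - Entropy**. This holds when the parent node has the maximum entropy of 1; in general, information gain is the parent's entropy minus the weighted entropy of the sub-nodes. Entropy is a way of measuring the amount of impurity in a given set of data. For a node with probability of success p and probability of failure q, it is given by the formula: Entropy = -p log2(p) - q log2(q).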
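As a small sketch, entropy and information gain for a binary target can be computed as below. It uses the general definition of information gain (parent entropy minus the size-weighted entropy of the sub-nodes), which reduces to 1 minus entropy when the parent is an even 50/50 mix. The counts are invented for illustration.

```python
import math


def entropy(p):
    """Binary entropy in bits for a node with success probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has no impurity
    q = 1 - p
    return -p * math.log2(p) - q * math.log2(q)


def information_gain(parent, children):
    """`parent` and each child are (successes, total) pairs."""
    n = parent[1]
    weighted = sum(total / n * entropy(s / total) for s, total in children)
    return entropy(parent[0] / parent[1]) - weighted


parent = (15, 30)                 # hypothetical: 50/50 mix, entropy is 1
children = [(2, 10), (13, 20)]    # hypothetical sub-nodes after a split

gain = information_gain(parent, children)
print(round(entropy(0.5), 3), round(gain, 3))
```

A split with a larger information gain leaves the sub-nodes purer than the parent, so it is preferred.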

Enough theory; now let’s dive into the implementation of a decision tree.

We will use the implementation provided by scikit-learn, a Python machine learning library.
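A minimal sketch of a first tree model with scikit-learn might look like this. The Iris dataset, the 70/30 split, and `max_depth=3` are illustrative choices, not requirements.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a small built-in dataset and hold out 30% of it for testing.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Fit a shallow tree; a small max_depth keeps it readable and limits overfitting.
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)

# Print the learned rules as text, one line per node.
print(export_text(model, feature_names=iris.feature_names))
```

Note that scikit-learn's `criterion="gini"` measures Gini impurity, where lower values mean purer nodes, which is the reverse of the p^2+q^2 score described above.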