Machine Learning

Decision Tree Classifier Python Code Example

In this post, you will learn about how to train a decision tree classifier machine learning model using Python. The following points will be covered in this post:

  • What is decision tree?
  • Decision tree python code sample

What is Decision Tree?

Simply speaking, the decision tree algorithm breaks the data points into decision nodes resulting in a tree structure. The decision nodes represent the question based on which the data is split further into two or more child nodes. The tree is created until the data points at a specific child node is pure (all data belongs to one class). The criteria for creating the most optimal decision questions is the information gain. The diagram below represents a sample decision tree.

Fig 1. Sample Decision tree

Training a machine learning model using a decision tree classification algorithm is about finding the decision tree boundaries.

Decision treesĀ build complex decision boundaries by dividing the feature space into rectangles. Here is a sample of how decision boundaries look like after model trained using a decision tree algorithm classifies the Sklearn IRIS data points. The feature space consists of two features namely petal length and petal width. The code sample is given later below.

Fig 2. Decision boundaries created by a decision tree classifier

Decision Tree Python Code Sample

Here is the code sample which can be used to train a decision tree classifier.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X = iris.data[:, 2:]
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

clf_tree = DecisionTreeClassifier(criterion='gini', max_depth=4, random_state=1)
clf_tree.fit(X_train, y_train)

Visualizing Decision Tree Model Decision Boundaries

Here is the code which can be used to create the decision tree boundaries shown in fig 2. Note that the package mlxtend is used for creating decision tree boundaries.

from mlxtend.plotting import plot_decision_regions

X_combined = np.vstack((X_train, X_test))
y_combined = np.hstack((y_train, y_test))

fig, ax = plt.subplots(figsize=(7, 7))
plot_decision_regions(X_combined, y_combined, clf=clf_tree)
plt.xlabel('petal length [cm]')
plt.ylabel('petal width [cm]')
plt.legend(loc='upper left')
plt.tight_layout()
plt.show()

Visualizing Decision Tree in the Tree Structure

Here is the code which can be used visualize the tree structure created as part of training the model. plot_tree function from sklearn tree class is used to create the tree structure. Here is the code:

from sklearn import tree

fig, ax = plt.subplots(figsize=(10, 10))
tree.plot_tree(clf_tree, fontsize=10)
plt.show()

Here is how the tree would look after the tree is drawn using the above command. Note the usage of plt.subplots(figsize=(10, 10)) for creating a larger diagram of the tree. Otherwise, the tree created is very small.

Fig 3. Decision tree visualisation

In the follow-up article, you will learn about how to draw nicer visualizations of decision tree using graphviz package. Also, you will learn some key concepts in relation to decision tree classifier such as information gain (entropy, gini etc).

Latest posts by Ajitesh Kumar (see all)
Ajitesh Kumar

I have been recently working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia, etc, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. I would love to connect with you on Linkedin. Check out my latest book titled as First Principles Thinking: Building winning products using first principles thinking.

Recent Posts

What are AI Agents? How do they work?

Artificial Intelligence (AI) agents have started becoming an integral part of our lives. Imagine asking…

2 weeks ago

Agentic AI Design Patterns Examples

In the ever-evolving landscape of agentic AI workflows and applications, understanding and leveraging design patterns…

2 weeks ago

List of Agentic AI Resources, Papers, Courses

In this blog, I aim to provide a comprehensive list of valuable resources for learning…

2 weeks ago

Understanding FAR, FRR, and EER in Auth Systems

Have you ever wondered how systems determine whether to grant or deny access, and how…

3 weeks ago

Top 10 Gartner Technology Trends for 2025

What revolutionary technologies and industries will define the future of business in 2025? As we…

3 weeks ago

OpenAI GPT Models in 2024: What’s in it for Data Scientists

For data scientists and machine learning researchers, 2024 has been a landmark year in AI…

3 weeks ago