In this post, you will learn about the confusion matrix, with examples, and how it can be used to derive performance metrics for classification models in machine learning.

Let’s take the example of a classification model that predicts whether a person will default on a bank loan. Suppose a historical data set of 10,000 records is chosen for building the model. Each record represents a person and is labeled “Yes” or “No” based on whether that person defaulted (Yes) or did not default (No).

Out of the 10,000 labeled records, 7,550 are labeled “No” (not a defaulter). These cases are called “Negative”. The remaining 2,450 records are labeled “Yes” (a defaulter). These cases are called “Positive”.

The model is trained and then makes a prediction for each of the 10,000 cases. Before looking at the confusion matrix, let’s understand what counts as a true positive, false positive, true negative, and false negative in relation to the predictions made by the model.

• True Positive (TP): The number of records that were predicted as positive and were originally labeled as positive. In the current example, these are the records predicted to be defaulters that were also labeled as defaulters. Let’s say the number of true positives came out to be 1800.
• False Negative (FN): The number of records that were predicted as negative but were originally labeled as positive. In the current example, these are the records predicted to be non-defaulters that were labeled as defaulters. The number of false negatives is therefore 2450 - 1800 = 650.
• True Negative (TN): The number of records that were predicted as negative and were originally labeled as negative. In the current example, these are the records predicted to be non-defaulters that were also labeled as non-defaulters. Let’s say the number of true negatives came out to be 5800.
• False Positive (FP): The number of records that were predicted as positive but were originally labeled as negative. In the current example, these are the records predicted to be defaulters that were labeled as non-defaulters. The number of false positives is therefore 7550 - 5800 = 1750.

Laid out in matrix form, the counts look like this:

| Labeled / Predicted | Predicted as Yes (Positive) | Predicted as No (Negative) |
| --- | --- | --- |
| Labeled as Yes (Positive) | 1800 (true positive) | 650 (false negative) |
| Labeled as No (Negative) | 1750 (false positive) | 5800 (true negative) |
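The four cells can also be tallied directly from pairs of actual and predicted labels. The following is a minimal sketch in plain Python. The `tally_binary` helper and the six toy records are hypothetical and only illustrate the counting; they are not part of the loan data set described above.

```python
def tally_binary(actual, predicted, positive="Yes"):
    """Count TP, FN, FP and TN for a binary classification problem."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1   # labeled positive, predicted positive
        elif a == positive:
            counts["FN"] += 1   # labeled positive, predicted negative
        elif p == positive:
            counts["FP"] += 1   # labeled negative, predicted positive
        else:
            counts["TN"] += 1   # labeled negative, predicted negative
    return counts


# Hypothetical toy records (not the 10,000 loan records)
actual = ["Yes", "Yes", "No", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "No", "Yes"]

counts = tally_binary(actual, predicted)
print(counts)
```

Every record falls into exactly one of the four cells, so the counts always add up to the number of records.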

The table above is the confusion matrix for the predictions made by the classification model. Several popular performance metrics can be derived from it:

• Accuracy = (TP + TN) / (TP + FN + TN + FP)
• Precision = TP / (TP + FP)
• Recall or sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
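As a quick check, the formulas above can be applied to the loan-default counts. This is a small sketch in plain Python; the variable names are chosen here for illustration. The false negatives and false positives are derived from the label totals (2,450 positive and 7,550 negative), which gives 650 and 1,750 respectively.

```python
# Counts from the loan-default example
labeled_positive = 2450   # records labeled "Yes"
labeled_negative = 7550   # records labeled "No"
tp = 1800                 # true positives
tn = 5800                 # true negatives
fn = labeled_positive - tp   # false negatives: 650
fp = labeled_negative - tn   # false positives: 1750

accuracy = (tp + tn) / (tp + fn + tn + fp)
precision = tp / (tp + fp)
recall = tp / (tp + fn)        # also called sensitivity
specificity = tn / (tn + fp)

print(f"Accuracy:    {accuracy:.3f}")     # 0.760
print(f"Precision:   {precision:.3f}")    # 0.507
print(f"Recall:      {recall:.3f}")       # 0.735
print(f"Specificity: {specificity:.3f}")  # 0.768
```

Note that an accuracy of 0.76 hides the fact that only about half of the records flagged as defaulters actually defaulted, which is why precision and recall are usually reported alongside accuracy.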

## Python Code Example for Confusion Matrix

In this section, you will see a Python code example that computes a confusion matrix with scikit-learn (sklearn). The model is trained using the support vector classifier (SVC) algorithm, which is imported from the sklearn.svm module. The code example illustrates the following:

• The Sklearn IRIS dataset is used for training the model.
• The train_test_split method is used for creating the training and test data split.
• A StandardScaler instance is used for data standardization.
• The SVC algorithm is used for fitting the model.
• The confusion_matrix function from sklearn.metrics is used for calculating the confusion matrix.

Here is what the code looks like:


```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
#
# Load the IRIS data set
#
iris = datasets.load_iris()
X = iris.data
y = iris.target
#
# Create the training and test split
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1, stratify=y)
#
# Standardize the data set (fit the scaler on the training data only)
#
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
#
# Fit the SVC model on the standardized training data
#
svc = SVC(kernel='linear', C=10.0, random_state=1)
svc.fit(X_train_std, y_train)
#
# Get the predictions for the standardized test data
#
y_pred = svc.predict(X_test_std)
#
# Calculate the confusion matrix
#
conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred)
#
# Print the confusion matrix
#
print(conf_matrix)
#
# Plot the confusion matrix using Matplotlib
#
fig, ax = plt.subplots(figsize=(5, 5))
ax.matshow(conf_matrix, cmap=plt.cm.Oranges, alpha=0.3)
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        ax.text(x=j, y=i, s=conf_matrix[i, j], va='center', ha='center')

plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix')
plt.show()
```



Here is what the confusion matrix looks like:

Fig 1. Confusion Matrix for IRIS Data Set Prediction
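With more than two classes, as in the IRIS example, the same metrics can be read off per class: each row of the matrix holds the samples of one true label and each column holds the samples of one predicted label. The sketch below, in plain Python, shows one way to compute per-class precision and recall from such a matrix. The `per_class_metrics` helper and the 3x3 `toy_matrix` values are hypothetical and are not the output of the code above.

```python
def per_class_metrics(matrix):
    """Return a list of (precision, recall) tuples, one per class.

    matrix[i][j] is the number of samples with true class i
    that were predicted as class j.
    """
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]                                 # correctly predicted as class k
        row_total = sum(matrix[k])                        # all samples whose true class is k
        col_total = sum(matrix[i][k] for i in range(n))   # all samples predicted as class k
        recall = tp / row_total if row_total else 0.0
        precision = tp / col_total if col_total else 0.0
        results.append((precision, recall))
    return results


# Hypothetical 3-class confusion matrix (illustrative values only)
toy_matrix = [
    [5, 0, 0],
    [0, 4, 1],
    [0, 2, 3],
]

metrics = per_class_metrics(toy_matrix)
for k, (precision, recall) in enumerate(metrics):
    print(f"class {k}: precision={precision:.2f}, recall={recall:.2f}")
```

For a matrix returned by confusion_matrix, `conf_matrix.tolist()` would give the nested-list form this helper expects.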

## Summary

In this post, you learned about the confusion matrix and how it can be used to derive performance metrics for a classification model. Hope you liked the post. Please feel free to share your suggestions.