Accuracy, Precision, Recall & F1-Score – Python Examples


In this post, you will learn how to calculate machine learning model performance metrics such as accuracy, precision, recall, and F1-score while assessing the performance of a classification model. The concepts are illustrated using a Python Sklearn example.

As a data scientist, you must develop a good understanding of these concepts in order to measure classification model performance correctly.

Let's work with the Sklearn breast cancer dataset. You can load it using the following code:

import pandas as pd
import numpy as np
from sklearn import datasets
#
# Load the breast cancer data set
#
bc = datasets.load_breast_cancer()
X = bc.data
y = bc.target

The target labels in the breast cancer dataset are benign (1) and malignant (0). There are 212 records labeled malignant and 357 records labeled benign. Let's create a training and test split where 30% of the dataset is set aside for testing purposes.

from sklearn.model_selection import train_test_split
#
# Create training and test split
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1, stratify=y)
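To sanity-check the stratified split, you can count the labels in each array. This quick check is not part of the original workflow, but np.bincount makes it a one-liner per array:

#
# Verify the class distribution (0 = malignant, 1 = benign)
#
print('Full set counts:', np.bincount(y))        # expected [212 357]
print('Test set counts:', np.bincount(y_test))   # roughly  [ 64 107]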

Splitting the breast cancer dataset into a training and test set results in a test set consisting of 107 records labeled benign and 64 records labeled malignant. Since Sklearn's metric functions treat label 1 (benign) as the positive class by default, the actual positives number 107 records and the actual negatives 64 records. Let's train the model and get the confusion matrix. Here is the code for training the model and printing the confusion matrix.

from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
import matplotlib.pyplot as plt
#
# Standardize the data set
#
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
#
# Fit the SVC model
#
svc = SVC(kernel='linear', C=10.0, random_state=1)
svc.fit(X_train_std, y_train)
#
# Get the predictions
#
y_pred = svc.predict(X_test_std)
#
# Calculate the confusion matrix
#
conf_matrix = confusion_matrix(y_true=y_test, y_pred=y_pred)
#
# Print the confusion matrix using Matplotlib
#
fig, ax = plt.subplots(figsize=(5, 5))
ax.matshow(conf_matrix, cmap=plt.cm.Oranges, alpha=0.3)
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        ax.text(x=j, y=i,s=conf_matrix[i, j], va='center', ha='center', size='xx-large')

plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()
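Optionally, if you are using scikit-learn 1.0 or later, the ConfusionMatrixDisplay helper can draw a similar plot with less code; the following is just an alternative sketch:

from sklearn.metrics import ConfusionMatrixDisplay
#
# Alternative: let scikit-learn draw the confusion matrix directly
#
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, cmap=plt.cm.Oranges)
plt.title('Confusion Matrix')
plt.show()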

The following confusion matrix is printed:

Fig 1. Confusion matrix representing predictions vs. actuals on the test data

The predictions in the above diagram can be read in the following manner, keeping in mind that label 1 (benign) is treated as the positive class and label 0 (malignant) as the negative class:

  • True Positive (TP): True positive represents the number of correct positive predictions out of the actual positive cases. Out of 107 actual positives, 104 are correctly predicted positive. Thus, the value of True Positive is 104.
  • False Positive (FP): False positive represents the number of incorrect positive predictions, i.e., the number of negatives (out of 64) that are falsely predicted as positive. Out of 64 actual negatives, 3 are falsely predicted as positive. Thus, the value of False Positive is 3.
  • True Negative (TN): True negative represents the number of correct negative predictions out of the actual negative cases. Out of 64 actual negatives, 61 are correctly predicted negative. Thus, the value of True Negative is 61.
  • False Negative (FN): False negative represents the number of incorrect negative predictions, i.e., the number of positives (out of 107) that are falsely predicted as negative. Out of 107 actual positives, 3 are falsely predicted as negative. Thus, the value of False Negative is 3.
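These four counts can also be read directly from the confusion matrix computed above. For sklearn's layout (actual classes as rows, predicted classes as columns, class 0 first), ravel() returns them in the order TN, FP, FN, TP:

#
# Unpack TN, FP, FN, TP from the 2x2 confusion matrix
#
tn, fp, fn, tp = conf_matrix.ravel()
print('TP: %d, FP: %d, TN: %d, FN: %d' % (tp, fp, tn, fn))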

Given the above definitions, let's try to understand the concepts of accuracy, precision, recall, and F1-score.

What is Precision Score?

Precision: Model precision score represents the model's ability to correctly predict the positives out of all the positive predictions it made. The precision score is a useful measure of prediction success when the classes are very imbalanced. Mathematically, it represents the ratio of true positives to the sum of true positives and false positives.

Precision Score = TP / (FP + TP)

The precision score from the above confusion matrix comes out to be the following:

Precision score = 104 / (3 + 104) = 104/107 = 0.972

The same score can be obtained by using the precision_score method from sklearn.metrics:

print('Precision: %.3f' % precision_score(y_test, y_pred))
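As a quick cross-check, the same value can be computed by hand from the tp and fp counts unpacked in the ravel() sketch above:

#
# Precision computed directly from the confusion matrix counts
#
print('Precision (manual): %.3f' % (tp / (tp + fp)))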

What is Recall Score?

Recall: Model recall score represents the model's ability to correctly predict the positives out of the actual positives. This is unlike precision, which measures how many of all the positive predictions made by the model are actually positive. The recall score is a useful measure of prediction success when the classes are very imbalanced. Mathematically, it represents the ratio of true positives to the sum of true positives and false negatives.

Recall Score = TP / (FN + TP)

The recall score from the above confusion matrix comes out to be the following:

Recall score = 104 / (3 + 104) = 104/107 = 0.972

The same score can be obtained by using the recall_score method from sklearn.metrics:

print('Recall: %.3f' % recall_score(y_test, y_pred))
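Again, the same value can be computed by hand from the tp and fn counts unpacked earlier:

#
# Recall computed directly from the confusion matrix counts
#
print('Recall (manual): %.3f' % (tp / (tp + fn)))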

What is Accuracy Score?

Model accuracy score represents the model's ability to correctly predict both positives and negatives out of all predictions. Mathematically, it represents the ratio of the sum of true positives and true negatives to the total number of predictions.

Accuracy Score = (TP + TN)/ (TP + FN + TN + FP)

The accuracy score from the above confusion matrix comes out to be the following:

Accuracy score = (104 + 61) / (104 + 3 + 61 + 3) = 165/171 = 0.965

The same score can be obtained by using the accuracy_score method from sklearn.metrics:

print('Accuracy: %.3f' % accuracy_score(y_test, y_pred))
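The same value can also be computed by hand from the four counts unpacked earlier:

#
# Accuracy computed directly from the confusion matrix counts
#
print('Accuracy (manual): %.3f' % ((tp + tn) / (tp + tn + fp + fn)))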

What is F1-Score?

Model F1 score represents the model score as a function of the precision and recall scores. It is a useful measure in scenarios where optimizing for either precision or recall alone would cause model performance to suffer. In a medical setting such as this one, where the malignant class would typically be treated as the positive class, the trade-off looks as follows:

  • Optimizing for recall helps minimize the chance of failing to detect a malignant cancer. However, this comes at the cost of predicting malignant cancer in patients who are actually healthy (a high number of FP).
  • Optimizing for precision helps ensure correctness when the model predicts a malignant cancer. However, this comes at the cost of missing malignant cancers more frequently (a high number of FN).

Mathematically, the F1 score can be represented as the harmonic mean of the precision and recall scores.

F1 Score = 2 * Precision Score * Recall Score / (Precision Score + Recall Score)

The F1 score from the above confusion matrix comes out to be the following:

F1 score = (2 * 0.972 * 0.972) / (0.972 + 0.972) = 1.89 / 1.944 = 0.972

The same score can be obtained by using the f1_score method from sklearn.metrics:

print('F1 Score: %.3f' % f1_score(y_test, y_pred))
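The same value follows from the precision and recall scores computed earlier. If you want all per-class metrics in one place, scikit-learn's classification_report prints precision, recall, and F1 for both classes in a single summary:

#
# F1 computed from precision and recall, plus a per-class report
#
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
print('F1 (manual): %.3f' % (2 * prec * rec / (prec + rec)))

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, target_names=['malignant', 'benign']))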

Conclusions

Here is the summary of what you learned in relation to precision, recall, accuracy, and F1-score.

  • Precision score measures the proportion of the model's positive predictions that are actually positive.
  • Recall score measures the proportion of actual positive values that the model correctly predicts as positive.
  • Precision and recall scores are useful measures of prediction success when the classes are very imbalanced.
  • Accuracy score measures the ratio of the sum of true positives and true negatives to all predictions made.
  • F1-score is the harmonic mean of the precision and recall scores and is used as a metric in scenarios where optimizing for precision alone would yield a high number of false negatives, or optimizing for recall alone a high number of false positives.

Ajitesh Kumar