Last updated: 26th August, 2024
In this blog post, we will discuss the concepts of logistic regression machine learning algorithm with the help of python example. Logistic regression is a parametric algorithm which is used to estimate the probability of an event occurring. For example, it can be used in the medical field to predict the probability of a patient developing a certain disease based on various health indicators, such as age, weight, and blood pressure. It is often used in machine learning applications.
Logistic regression is a type of supervised learning classification algorithm that is adept not only in binary classification but also in multinomial classification problems. It can predict whether an instance belongs to one of multiple classes. For example, a logistic regression model can be employed to determine the probability that a given sample belongs to one of two or more possible classes. It might determine, for example, that there’s a 10% chance the values in a sample correspond to class 0 and a 90% chance they correspond to class 1. In this case, logistic regression will predict that the sample corresponds to class 1.
Logistic regression model effectively learns the relationship between input features and their corresponding class labels. Unlike linear regression which maps inputs to continuous outputs, the logistic regression algorithm maps inputs to probabilities. Check more details on this blog – Linear Regression vs Logistic Regression.
In binary classification, logistic regression maps input data to a probability between 0 and 1, indicating the likelihood of one of two possible outcomes. In the context of multinomial classification, the logistic regression algorithm extends this approach to accommodate multiple classes, assigning probabilities to each class for a given input.
There are primarily three types of logistic regression, each tailored for different kinds of classification problems:
Binary Logistic Regression: This is the most common form of logistic regression, used when the target variable has two possible outcomes. It predicts the probability that an instance belongs to each of the two categories. For example, determining whether an email is spam or not spam is a binary classification problem.
Multinomial Logistic Regression: This type is used when the target variable can fall into more than two categories that are not ordered. It is suitable for scenarios where the outcome needs to be categorized into three or more classes in no specific order. For instance, predicting the type of cuisine a restaurant serves (like Italian, Chinese, Indian, etc.) based on various features would be a multinomial logistic regression problem.
Ordinal Logistic Regression: This variant is used when the target variable has three or more ordinal categories. In this case, the categories have a natural order, but the intervals between the categories are not necessarily consistent. For example, rating satisfaction levels as ‘satisfied’, ‘neutral’, or ‘dissatisfied’ would be an appropriate application for ordinal logistic regression.
Each type of logistic regression applies the logistic or sigmoid function to model the probability of class membership based on one or more predictor variables. The choice of which type to use depends on the nature of the target variable in the dataset.
In logistic regression, the core operation is transforming input data into a probability value, which is achieved using the sigmoid function. The “why” behind this process is rooted in the goal of logistic regression: to predict the probability of a binary outcome (yes/no, true/false, 1/0). Unlike linear regression which outputs continuous values, logistic regression is used when the output is categorical. The output in logistic regression needs to represent a probability that a sample belongs to each of the classes.
The sigmoid function, also known as the logistic function, is a special mathematical equation that transforms any real-valued number into a value between 0 and 1. It’s given by:
$$\Large \sigma(z) = \frac{1}{1 + e^{-z}} $$ …(eq. 1)
The value of z in sigmoid function represents the weighted sum of input values and can be written as the following:
$$\Large z = \theta^{T}x $$ …(eq. 2)
Where θ represents the parameters.
The sigmoid function, plotted (below) against its input z, illustrates how smoothly transitions from 0 to 1. As increases, it approaches 1, and as decreases towards negative infinity, gets closer to 0. Notably, at , is exactly 0.5.
The picture below represents different aspects of a logistic regression model:
Based on the above picture, the following represents some of the key concepts related to logistic regression model:
The output of the logistic regression model (sigmoid function output) is always between 0 and 1. For a binary classification model, if the output is close to 0, it means that the event is less likely to occur. If the output is close to 1, it means that the event is more likely to happen. For example, if the value of logistic regression model (represented using sigmoid function) is 0.8, it represents that the probability that the event will occur is 0.8 given a particular set of parameters learned using cost function optimization. Based on the threshold function, the class label can said to be 1. For any new value X, the output of the above function will be used for making the prediction.
The parameters in logistic regression is learned using the maximum likelihood estimation. The cost function for logistic regression is defined as:
In above cost function, h represents the output of sigmoid function shown earlier, y represents the class/label of the training data, x represents the training data. Note that for binary classification problems, the first term will be zero for class labeled as as 0 and the second term will be zero for class labeled as 1. The equation below represents this aspect:
When the loss function is plotted against hypothesis function (sigmoid), the following plot occurs for y = 0 and y = 1.
In order to fit the parameters, the objective function J(θ) would need to be minimized. Gradient descent algorithm (stochastic gradient descent – SGD) can be used for optimizing the objective or cost function. This is how the equation looks like for updating the parameters when executing gradient descent algorithm. Ensuring that gradient descent is running correctly, the value of J(θ) is calculated for θ and checked that it is decreasing on every iteration.
Besides stochastic gradient descent algorithm, it is recommended to use advanced algorithms such as some of the following: Conjugate gradient, BFGS, L-BFGS etc. When using scikit-learn for training logistic regression models, these algorithms can be used by mentioning solver parameter such as newton-cg, lbfgs, liblinear, saga, sag, etc.
Check out my video on cost function of the logistic regression model to learn the details.
The following represents few examples of problems that can be solved using binary classification model trained using logistic regression algorithm:
The Python code used in this blog represents fitting a machine learning model using Logistic Regression (Sklearn Logistic Regression). First and foremost, we will load the appropriate packages, sklearn modules and classes.
# Importing basic packages
#
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Importing Sklearn module and classes
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from sklearn import datasets
from sklearn.model_selection import train_test_split
The IRIS data set is used for training the logistic regression model. The Iris data set is a classification dataset that contains three classes of 50 instances each, where each class refers to a type of iris plant. The three classes in the Iris dataset are:
As a next step, we will load the dataset and do the data preparation. The scikit-learn library will be used to load the Iris dataset.
iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
Y = iris.target
Next step is to create a train and test split. Note the stratification parameter. This is used to ensure that class distribution in training / test split remains consistent / balanced.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1, stratify=Y)
Next step is to perform feature scaling in order to make sure features are in fixed range irrespective of their values / units etc.
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
Next step is to train a logistic regression model. The following needs to be noted while using LogisticRegression algorithm sklearn.linear_model implementation:
# Create an instance of LogisticRegression classifier
lr = LogisticRegression(C=100.0, random_state=1, solver='lbfgs', multi_class='ovr')
# Fit the model
#
lr.fit(X_train_std, Y_train)
Next step is to measure the model performance of the model trained using LogisticRegression as shown above. In the code below, predict method is used to predict the class. You can also use predict_proba to compute the probabilities for each class.
# Create the predictions
#
Y_predict = lr.predict(X_test_std)
# Use metrics.accuracy_score to measure the score
print("LogisticRegression Accuracy %.3f" %metrics.accuracy_score(Y_test, Y_predict))
The score method is used to quantify the accuracy of a classification model. The score method returns the model accuracy in terms of sum of the true positives and the true negatives divided by the total number of samples.
In case you are working with smaller datasets, and, want to train logistic regression based on cross-validation, you could use LogisticRegressionCV class. Code such as following can be used to instantiate logistic regression class with built-in CV.
model = logisticRegressionCV(cv=5)
model.fit(x, y)
In recent years, artificial intelligence (AI) has evolved to include more sophisticated and capable agents,…
Adaptive learning helps in tailoring learning experiences to fit the unique needs of each student.…
With the increasing demand for more powerful machine learning (ML) systems that can handle diverse…
Anxiety is a common mental health condition that affects millions of people around the world.…
In machine learning, confounder features or variables can significantly affect the accuracy and validity of…
Last updated: 26 Sept, 2024 Credit card fraud detection is a major concern for credit…