Data Science

AIC in Logistic Regression: Formula, Example

Have you, as a data scientist, ever been challenged by choosing the best logistic regression model for your data? The difference between a good model and the best model can be subtle yet impactful. Whether it’s predicting the likelihood of an event occurring or classifying data into distinct categories, logistic regression provides a robust framework for analysts and researchers. However, the true power of logistic regression is harnessed not just by building models, but by selecting the right one. This is where the Akaike Information Criterion (AIC) comes into play.

In this blog, we’ll delve into different aspects of AIC, decode its formula, work through real-world examples in both Python and R, and touch on best practices and common pitfalls.

What is Akaike Information Criterion (AIC)?

The Akaike Information Criterion (AIC), developed by its namesake Hirotugu Akaike in the early 1970s, is one of the most popular tools for comparing statistical models. Unlike traditional methods that focus solely on goodness of fit, AIC introduces a balance, considering both the complexity of the model and how well it aligns with the observed data.

AIC is based on the concept of entropy, a measure of uncertainty or randomness. In simple terms, AIC estimates how much “information” a model loses when it approximates reality. The less information lost, the better the model. AIC embodies the idea that among models with a comparable fit, the simpler one is preferable. This principle is crucial in avoiding overfitting, where a model performs well on the training data but poorly on new, unseen data.

When using AIC, it’s important to remember that it’s a relative measure. The absolute value of AIC is not as informative as the difference in AIC between models. There is no absolute “good” value of AIC in isolation. A smaller AIC value indicates a better model, but the “best” model is the one with the lowest AIC among the set of models being compared.

If we have two logistic regression models with AIC values $AIC_1$ and $AIC_2$, and $AIC_1 < AIC_2$, then the model with $AIC_1$ is selected. The same rule applies to models trained with any likelihood-based algorithm, not just logistic regression.

When comparing models, it is the difference in AIC values that matters. A general guideline is that a difference of less than 2 provides little evidence that one model is better, a difference of roughly 2 to 6 suggests substantial support for the lower-AIC model, and a difference of more than 10 indicates strong support for the lower-AIC model.
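
As a minimal sketch of how this guideline translates into a decision, the Python snippet below compares two hypothetical AIC values (the numbers are made up purely for illustration):

# Hypothetical AIC values for two candidate logistic regression models
aic_1, aic_2 = 182.4, 176.9  # illustrative numbers only

delta = abs(aic_1 - aic_2)
better = "Model 1" if aic_1 < aic_2 else "Model 2"
print(f"{better} is preferred (AIC difference = {delta:.2f})")

# Rough interpretation of the difference, following the guideline above
if delta < 2:
    print("Difference < 2: little evidence that one model is better.")
elif delta > 10:
    print("Difference > 10: strong support for the lower-AIC model.")
else:
    print("Difference between 2 and 10: substantial support for the lower-AIC model.")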

AIC Formula

At its core, AIC is calculated using the following formula:

$AIC = -2 \times \text{log-likelihood} + 2 \times K$

Here, the log-likelihood represents the probability of the observed data under the fitted model, essentially measuring how well the model fits the data. The second term, $2 \times K$ (where $K$ is the number of estimated parameters), penalizes model complexity. The formula ensures that adding more parameters to improve the fit is only justified if it meaningfully increases the likelihood.
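
To make the formula concrete, here is a minimal sketch in Python of computing AIC by hand; the log-likelihood value and parameter count are made up purely for illustration.

def aic(log_likelihood, k):
    """Return AIC = -2 * log-likelihood + 2 * k, where k is the number of estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical fitted model: log-likelihood of -120.5 with 6 estimated parameters
# (5 coefficients + 1 intercept)
print(aic(log_likelihood=-120.5, k=6))  # 241.0 + 12.0 = 253.0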

AIC in Logistic Regression

In logistic regression, where models can become complex rapidly, AIC helps with model selection. It aids in comparing different logistic models applied to the same dataset, helping to make informed decisions about which model to use. AIC is particularly well-suited for logistic regression for several reasons, some of which are unique to the nature of logistic regression as a statistical modeling tool:

  • Likelihood-Based Model: The likelihood, particularly the log-likelihood, is a central concept in logistic regression: it represents the probability of observing the given data under the specified model. Since AIC is computed directly from the log-likelihood, it uses a key output of logistic regression estimation, making it a natural fit for evaluating these models (see the sketch after this list).
  • Model Comparison on Log-Likelihood Basis: Logistic regression models often involve comparing various combinations of predictors to find the most effective model. AIC facilitates this comparison by quantifying model quality in terms of log-likelihood, adjusted for the number of parameters. This allows for a direct comparison of different logistic regression models based on their likelihood estimates, considering both the fit and the complexity of the models.
  • Fit vs. Complexity Balance: AIC helps to balance the fit of the model (how well the model explains the observed data) against its complexity (number of parameters). Logistic regression models, which are built around maximizing the likelihood, benefit from this balance. The AIC ensures that adding more predictors to the logistic model (thus increasing complexity) is only beneficial if it significantly improves the fit.
  • Sensitivity to Overfitting: Logistic regression models can be prone to overfitting, especially with many predictors or complex interactions. Overfitting occurs when a model is too closely tailored to the training data and may not perform well on new data. AIC’s penalty for additional parameters naturally guards against overfitting by discouraging unnecessarily complex models.
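
As a concrete illustration of these points, the following sketch fits a logistic regression by maximum likelihood with statsmodels on a small synthetic dataset (generated here purely for illustration) and reads the log-likelihood and AIC directly off the fitted result:

import numpy as np
import statsmodels.api as sm

# Synthetic data for illustration: 200 observations, 3 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
logits = 0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]  # the true model ignores X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# Fit a logistic regression by maximum likelihood
X_const = sm.add_constant(X)                   # add an intercept column
result = sm.Logit(y, X_const).fit(disp=0)

# Log-likelihood and AIC are direct outputs of the fitted model
print("Log-likelihood:", result.llf)
print("Number of estimated parameters:", len(result.params))  # 3 coefficients + intercept
print("AIC reported by statsmodels:", result.aic)
print("AIC recomputed manually:    ", -2 * result.llf + 2 * len(result.params))

The manual recomputation in the last line simply applies the AIC formula to the reported log-likelihood and parameter count.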

Evaluating Logistic Regression Models using AIC

In this section, we demonstrate how AIC can be used to evaluate and compare two logistic regression models, using both Python and R code.

Python Code for Comparing Logistic Regression Models using AIC

I will work with the breast cancer dataset from sklearn. I will first create two logistic regression models, then calculate the log-likelihood for each model and use it to calculate and compare their AIC values. In the Python code below, the two models use the same ten predictors but different solvers, liblinear and newton-cg; because scikit-learn applies L2 regularization by default and the two solvers handle it (and the intercept) slightly differently, they can converge to slightly different coefficients. Note that, for simplicity, the log-likelihood here is computed on the held-out test set; AIC is conventionally computed from the log-likelihood on the data the model was fitted to.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Load the dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create two logistic regression models with different solvers
model1 = LogisticRegression(solver='liblinear')
model1.fit(X_train[:, :10], y_train)  

model2 = LogisticRegression(solver='newton-cg')
model2.fit(X_train[:, :10], y_train) 

# Predict log probabilities for each model
log_prob1 = model1.predict_log_proba(X_test[:, :10])
log_prob2 = model2.predict_log_proba(X_test[:, :10])

# Calculate log-likelihood for each model
log_likelihood1 = log_prob1[np.arange(len(y_test)), y_test].sum()
log_likelihood2 = log_prob2[np.arange(len(y_test)), y_test].sum()

# Compare the models
print(f"Log-Likelihood for Model 1: {log_likelihood1}")
print(f"Log-Likelihood for Model 2: {log_likelihood2}")

# Calculate AIC for each model
k1 = 10 + 1  # Number of parameters in model1: 10 coefficients + 1 intercept
k2 = 10 + 1  # Number of parameters in model2: 10 coefficients + 1 intercept

aic1 = 2 * k1 - 2 * log_likelihood1
aic2 = 2 * k2 - 2 * log_likelihood2

# Compare the models
print(f"AIC for Model 1: {aic1}")
print(f"AIC for Model 2: {aic2}")

The following output is printed:

Log-Likelihood for Model 1: -28.103983686589725
Log-Likelihood for Model 2: -26.20160164908428
AIC for Model 1: 78.20796737317946
AIC for Model 2: 74.40320329816856

Based on the above output, here’s how to interpret the results for model selection:

  1. Log-Likelihood Values:
    • Model 1: -28.10
    • Model 2: -26.20
    The log-likelihood for Model 2 is higher (less negative) than that for Model 1, indicating that Model 2 fits the data better than Model 1.
  2. AIC Values:
    • Model 1: 78.21
    • Model 2: 74.40
    The AIC for Model 2 is lower than that for Model 1. Since a lower AIC value indicates a model that better balances goodness of fit with complexity, Model 2 is preferable based on this criterion.

Model Selection: Given these results, Model 2 is the better choice between the two. It not only has a better fit (higher log-likelihood) but also maintains a balance between fitting the data well and not being overly complex (lower AIC).
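
As a quick sanity check against the formula, plugging the numbers back in gives $AIC_1 = 2 \times 11 - 2 \times (-28.10) \approx 78.21$ and $AIC_2 = 2 \times 11 - 2 \times (-26.20) \approx 74.40$, matching the printed output.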

R Code for AIC in Logistic Regression

The following is the R code for evaluating logistic regression models using AIC. In the code below, both models use the same predictors, but model2 employs a different link function (probit instead of the default logit). The AIC values for both models are then calculated and compared; the model with the lower AIC is generally preferred.

# Load the necessary libraries
library(MASS)

# Load the biopsy dataset (similar to the breast cancer dataset)
data(biopsy)

# Clean the dataset (remove NA values)
biopsy_clean <- na.omit(biopsy)

# Define the response variable
response <- as.factor(biopsy_clean$class)

# Define the same set of predictors for both models
predictors <- biopsy_clean[, c("V1", "V2", "V3", "V4", "V5")]

# Model 1: Standard logistic regression model
model1 <- glm(response ~ ., data = predictors, family = binomial())

# Model 2: Logistic regression with a different link function (e.g., probit)
model2 <- glm(response ~ ., data = predictors, family = binomial(link = "probit"))

# Calculate AIC for each model
aic1 <- AIC(model1)
aic2 <- AIC(model2)

# Output the AIC values
print(paste("AIC for Model 1:", aic1))
print(paste("AIC for Model 2:", aic2))

The above code when executed prints the following output:

  • AIC for Model 1: 160.66
  • AIC for Model 2: 159.51

We can select Model 2, since it has the lower AIC. Note, however, that the difference is only about 1.15, which is below the rule-of-thumb threshold of 2 discussed earlier, so the evidence favoring Model 2 over Model 1 is weak.
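
For completeness, a roughly equivalent comparison can be sketched in Python with statsmodels’ GLM, which also reports AIC directly. This is a minimal sketch rather than a translation of the R example: it assumes the sklearn breast cancer dataset and, arbitrarily, its first five (standardized) features as predictors.

import statsmodels.api as sm
from sklearn.datasets import load_breast_cancer

# Load data and use the first five features (standardized) as predictors
data = load_breast_cancer()
raw = data.data[:, :5]
X = sm.add_constant((raw - raw.mean(axis=0)) / raw.std(axis=0))
y = data.target

# Model 1: binomial GLM with the default logit link (logistic regression)
logit_fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

# Model 2: same predictors, probit link
probit_fit = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Probit())).fit()

print("AIC (logit): ", logit_fit.aic)
print("AIC (probit):", probit_fit.aic)

As with the R example, the model with the lower AIC would be preferred, keeping in mind that a small difference in AIC is weak evidence of a real difference between the models.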
