Last updated: 1st May, 2024
The bias-variance trade-off is a fundamental concept in machine learning that presents a challenging dilemma for data scientists. It relates to the problem of simultaneously minimizing two sources of residual error that prevent supervised learning algorithms from generalizing beyond their training data. These two sources of error are related to Bias and Variance. Bias-related errors refer to the error due to overly simplistic machine learning models. Variance-related errors refer to the error due to too much complexity in the models.
In this post, you will learn about the concepts of bias & variance in the machine learning (ML) models. You will learn about the tradeoff between bias and variance which is a fundamental problem in machine learning, and why it is often necessary to experiment with different model types to find the balance between model bias and variance that works best for a given dataset.
The bias-variance tradeoff is a central problem in supervised machine learning involving two types of error that models can exhibit: bias and variance. Both types of error affect the model’s performance.
It is often impossible to reduce both types of error simultaneously. The challenge is that decreasing one typically increases the other. A highly complex model (low bias) might overfit the training data, capturing noise, and leading to high variance. Conversely, a too-simple model (low variance) might underfit the data, failing to capture underlying patterns, resulting in high bias.
Consequently, careful consideration must be given to the bias-variance tradeoff when designing machine learning models. If the model is too simple, it will suffer from high bias and low variance. If the model is too complex, it will suffer from low bias and high variance. Finding the right balance is essential for creating an effective machine-learning model. The ideal model would have low bias and low variance, but in practice, it is often necessary to trade off one for the other.
Model bias can be lowered using good machine learning algorithms, and feature engineering while making domain-specific adjustments and unbiased training data.
Model variance can be lowered using good machine learning algorithms, a very large set of training data (more than required), and more and less noisy features if possible that are correlated to the target variable.
The intuition behind bias and variance can be understood based on the following diagram. One can get a detailed explanation by going through the free online course – Introduction to Machine Learning by Dr. Sebastian Raschka.
The above diagram represents a person throwing a stone with the goal of hitting the point representing the target value. However, his stones hit the ground at different points much away from the actual / target point. Let’s try and understand bias and variance concepts using the above diagram.
In the above diagram, the bias is represented as the distance between the target value (true value) and the average of points where the stone hit the ground. The dashed line represents the average of predictions made by different models (different points where the stone hit the ground) trained using different training data sets derived from the same population. The lower the distance (and, hence lower bias) between the average (dashed line) and the target value (red circle), the lower the bias, the better is the model. The variance can be represented as the spread of individual points around the average (red dashed line). The lower the spread (and, hence lower variance), the better is the model.
In predictive modeling, bias-variance tradeoff can be understood as a trade-off between simplicity and complexity. On one end, you have the simpler models (high bias), on the other, you have the more complex models (high variance).
The total error (or loss) in a predictive model can be broken down into three parts:
$\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$
The ideal scenario for model bias and variance in machine learning is achieving a balance where both bias and variance are minimized. This balance is crucial for developing a model that generalizes well to unseen data. Here’s what this ideal scenario entails:
In machine learning, if the ideal balance of low bias and low variance is unattainable, the second best scenario typically involves prioritizing one over the other based on the specific requirements and constraints of the problem at hand. This decision is often influenced by the nature of the data, the application domain, and the cost of errors. The two common second-best scenarios are:
The bias in machine learning is a concept that describes how far a model’s predictions are from the actual values. It arises when a model is too simple and makes assumptions that don’t capture the complexity of the underlying data, resulting in underfitting. Underfitting occurs when a model is not able to model the data well enough, meaning it doesn’t perform well on the training data, and as a result, also performs poorly on new, unseen data.
High model bias may mean that the model is making wrong predictions because it might have missed certain important features. In other words, the model has been under-fitted. Low model bias would mean that the model may have considered important features apart from other less important ones while being trained. The following picture demonstrates the model bias for linear and polynomial regression model.
Here’s the plot demonstrating high bias and low bias scenarios in machine learning models:
Mathematically, the bias of the model can be represented using the following equation:
$\Large Bias = E[\hat{\theta}] – \theta$
In the above equation, the $E[\hat{\theta}]$ represents the expected value of the prediction which is an average of predictions made by different estimators, and, $\theta$ represents the true value.
The concept of variance in machine learning models relates to how much the predictions of a model vary for a given data point when the model is trained on different subsets of the same dataset. Since the training data are used to fit the machine learning models using a set of hyperparameters, different training data sets with different sets of hyperparameters will result in different models.
High variance indicates that the model’s predictions are highly sensitive to the specific data it was trained on, leading to overfitting. The ideal model will tend to have low variance. In other words, the ideal model will be the one whose predictions would not vary too much between training sets. However, if a model has high variance then small changes in the training data can result in large changes in the model prediction performance.
Mathematically, variance can be represented as the following:
$\Large Var = E[{\hat{\theta}}^2] – {E[\hat{\theta}]}^2$
Above equation can also be expressed as the following:
$\Large Var = E[(E[\hat\theta] – \hat\theta)^2]$
Artificial Intelligence (AI) agents have started becoming an integral part of our lives. Imagine asking…
In the ever-evolving landscape of agentic AI workflows and applications, understanding and leveraging design patterns…
In this blog, I aim to provide a comprehensive list of valuable resources for learning…
Have you ever wondered how systems determine whether to grant or deny access, and how…
What revolutionary technologies and industries will define the future of business in 2025? As we…
For data scientists and machine learning researchers, 2024 has been a landmark year in AI…