Bias and variance are two important properties of machine learning models. In this post, you will learn about the concepts of **bias and variance** in relation to **machine learning (ML) models**. Bias refers to the error a model makes because it is too simple to represent the underlying relationship, whereas variance refers to how sensitive the model’s predictions are to changes in the training data used to fit it. The tradeoff between bias and variance is a fundamental problem in machine learning, and it is often necessary to experiment with different model types in order to find the balance that works best for a given dataset. In addition to learning the concepts related to the bias-variance trade-off, you will also get a chance to take a quiz that will help you prepare for data scientist / ML engineer interviews. As a data scientist / ML engineer, you must get a good understanding of bias and variance concepts in order to build models that generalize better, i.e., have lower generalization error.


## Bias & Variance of Machine Learning Models

The bias of the model, intuitively speaking, can be defined as the tendency of the model to make predictions or estimates based on only certain features of the dataset. High model bias may mean that the model has missed certain important features and mostly makes wrong predictions; in other words, the model is **under-fitted**. Low model bias means that the model has considered the important features, apart from other less important ones, while being trained. Mathematically, the bias of the model can be defined as the difference between the average of predictions made by different estimators trained using different training datasets/hyperparameters, and the true value. **The higher the difference, the higher the bias; the lower the difference, the lower the bias.** In other words, if the model accuracy is very high, the bias can be said to be low. **Ideally, the desired model has a low bias**. Mathematically, the bias of the model can be represented using the following equation:

\(\Large Bias = E[\hat{\theta}] - \theta\).

In the above equation, \(E[\hat{\theta}]\) represents the expected value of the prediction, i.e., the average of predictions made by different estimators, and \(\theta\) represents the true value.
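The definition above can be illustrated with a small simulation. The minimal NumPy sketch below trains many overly simple estimators on different training sets and compares the average of their predictions with the true value; the quadratic true function, the straight-line estimator, and all constants are illustrative assumptions, not anything prescribed by the theory:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return x ** 2  # hypothetical true function

x_query = 1.5
theta = true_f(x_query)  # true value at the query point

# Train many estimators on different training sets drawn from the
# same population. Each estimator is a straight-line (degree-1) fit,
# which is too simple to represent x^2.
preds = []
for _ in range(2000):
    x_train = rng.uniform(-2, 2, size=30)
    y_train = true_f(x_train) + rng.normal(0, 0.1, size=30)
    slope, intercept = np.polyfit(x_train, y_train, deg=1)
    preds.append(slope * x_query + intercept)

avg_pred = np.mean(preds)   # E[theta_hat]: average of the predictions
bias = avg_pred - theta     # bias = E[theta_hat] - theta
print(f"average prediction: {avg_pred:.3f}, true value: {theta:.3f}, bias: {bias:.3f}")
```

Because the straight line systematically misses the curvature of the true function, the average prediction falls well below the true value, i.e., the bias is large; this is exactly the under-fitting described above.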

The variance of the model is the expected value (average) of how far the prediction of each estimator is from the average of the predictions of all estimators. The higher this average squared distance, the higher the variance. In simpler words, variance refers to the amount by which the model predictions would change if we estimated the model using a different training data set. Since the training data are used to fit the machine learning models using a set of hyperparameters, different training data sets with different sets of hyperparameters will result in different models. The ideal model is one whose predictions do not vary too much between training sets; if a model has high variance, then small changes in the training data can result in large changes in the model’s predictions. Mathematically, variance can be represented as the following:

\(\Large Var = E[{\hat{\theta}}^2] - {E[\hat{\theta}]}^2\).

The above equation can also be expressed as the following:

\(\Large Var = E[(E[\hat\theta] - \hat\theta)^2]\).
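The equivalence of the two forms can be checked numerically. In the minimal NumPy sketch below, the predictions \(\hat{\theta}\) made by different estimators are simulated as random draws (an illustrative assumption) and the variance is computed both ways:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated predictions theta_hat made by many estimators
# trained on different training sets.
theta_hat = rng.normal(loc=2.0, scale=0.5, size=100_000)

# Form 1: Var = E[theta_hat^2] - E[theta_hat]^2
var_form1 = np.mean(theta_hat ** 2) - np.mean(theta_hat) ** 2

# Form 2: Var = E[(E[theta_hat] - theta_hat)^2]
var_form2 = np.mean((np.mean(theta_hat) - theta_hat) ** 2)

print(var_form1, var_form2)  # both close to 0.5^2 = 0.25
```

Both expressions give the same number, which is (up to sampling noise) the squared spread used to generate the draws.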

### Model Bias & Variance Intuition

The intuition behind bias and variance can be understood based on the following diagram. One can get a detailed explanation by going through the free online course – Introduction to Machine Learning by Dr. Sebastian Raschka.

The above diagram represents a person throwing a stone with the goal of hitting the point representing the target value. However, the stones hit the ground at different points, often far away from the actual / target point. Let’s try to understand the bias and variance concepts using the above diagram.

In the above diagram, the bias is represented as the distance between the target value (true value) and the average of the points where the stone hit the ground. The dashed line represents the average of predictions made by different models (the different points where the stone hit the ground) trained using different training data sets derived from the same population. The smaller the distance between the average (dashed line) and the target value (red circle), the lower the bias and the better the model.

The variance can be represented as the spread of the individual points around the average (red dashed line). The lower the spread (and, hence, the lower the variance), **the better the model**.

### Model Loss as a function of Bias & Variance

If you pay closer attention to the diagram in Fig 1, you may realize that for a particular target or true value, the loss of the model can be represented as the function of bias and variance. Based on different loss functions, the bias and the variance can have different contributions to make to the overall loss. One thing that becomes clear is this:

If the model bias and the variance, both are low, the model will have higher accuracy. In other words, the generalization error of the model will be very low.
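This statement can be made precise for squared loss: the expected squared error of an estimator decomposes exactly into squared bias plus variance (plus irreducible noise when the observations themselves are noisy). The minimal NumPy sketch below verifies the decomposition with predictions simulated as random draws around a deliberately biased mean; all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

theta = 2.0  # true value
# Simulated predictions: centered at 2.5 (bias +0.5), spread 0.4 (variance 0.16)
theta_hat = rng.normal(loc=2.5, scale=0.4, size=200_000)

mse = np.mean((theta_hat - theta) ** 2)               # expected squared loss
bias = np.mean(theta_hat) - theta                     # E[theta_hat] - theta
var = np.mean((theta_hat - np.mean(theta_hat)) ** 2)  # variance of predictions

# The squared loss decomposes exactly: MSE = bias^2 + variance
print(mse, bias ** 2 + var)
```

Since both bias and variance contribute additively to the loss, only a model in which both are low can achieve a low generalization error.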

### Yet Another Explanation of Model Bias & Variance

Let’s understand the bias and variance of a model vis-a-vis model accuracy or generalization error using another diagram, such as the following. The diagram is very popular and cited in several articles/posts. I will take another shot at explaining bias and variance using the below diagram.

The above diagram represents four dart boards with points put on each dartboard. The center of the dartboard, in red, represents the target (or target value when speaking in ML terms). Different blue dots represent the throw made to hit the target value. The throws represent the predictions made by different estimators.

If the throws (blue dots – predictions by different estimators) are consistently near each other, the throws are said to be very precise (**high precision**). In ML terms, throws near each other represent **low variance**; the models/estimators are said to have low variance. If the throws are far from each other, the throws are not precise (low precision). In ML terms, this can be thought of as high variance; the models/estimators are said to have high variance.

If the throws (blue dots – predictions by different estimators) are consistently near to the target (target value), this would represent **low bias** or no bias. If the throws are away from the target, the models will turn out to have a **high bias**.

So, what is the **ideal scenario** for the best machine learning model?

- The ideal scenario is to have **low bias and low variance**. In the above diagram, this means that the throws (blue dots) hit the red circle. This is very difficult to achieve.
- The second-best scenario is low bias and somewhat high variance. The loss would still be comparatively lower than in the other settings, such as high bias / low variance and high bias / high variance.

## Model Bias & Variance Trade-off vs Overfitting & Underfitting

A model which is suffering from underfitting (both training and test error are very high) will have high bias and low variance, resulting in a very high loss (hence the high error). These models have comparatively very low capacity or, in other words, low complexity. An example is the decision stump (one node & two leaves), which will underfit and have high bias.

An algorithm having access to a larger hypothesis space (as a result of different training data sets, features, and hyperparameters) tends to produce models that overfit. In other words, these models will have high variance. These models are termed complex models owing to the fact that many different models can be trained using a combination of different training datasets, a large number of features, and different values of hyperparameters.

As the model complexity increases, the model tends to move from the state of underfitting to overfitting. Speaking in terms of bias and variance, the models tend to move from a state of **high bias** to one of **high variance**.

**The goal is to find a sweet spot with an optimum balance of bias and variance: a sweet spot where the model neither under-fits nor overfits.** Remember that as the model complexity increases, the bias decreases and the variance increases. Thus, the goal is to find an optimum level of model complexity (capacity) at which this sweet spot can be found.
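The move from under-fitting to overfitting can be seen by sweeping model complexity. The minimal NumPy sketch below fits polynomials of increasing degree to a small noisy sample and compares training and test error; the sine true function, the chosen degrees, and all constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def true_f(x):
    return np.sin(2 * x)  # hypothetical true function

# Small training set, larger held-out test set
x_train = rng.uniform(-2, 2, 20)
y_train = true_f(x_train) + rng.normal(0, 0.3, 20)
x_test = rng.uniform(-2, 2, 200)
y_test = true_f(x_test) + rng.normal(0, 0.3, 200)

results = {}
for degree in [1, 5, 15]:  # low, moderate, and high model complexity
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[degree] = (train_err, test_err)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Degree 1 under-fits (high bias: both errors are high), degree 15 overfits (high variance: the training error keeps falling while the test error typically rises again), and the sweet spot lies at an intermediate complexity.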

## What is the Bias-variance tradeoff?

The tradeoff between bias and variance is a fundamental challenge in machine learning, and it is often impossible to reduce both types of error simultaneously. Consequently, careful consideration must be given to the bias-variance tradeoff when designing machine learning models. If the model is too simple, it will suffer from high bias and low variance. If the model is too complex, it will suffer from low bias and high variance. Finding the right balance is essential for creating an effective machine learning model. The ideal model would have low bias and low variance, but in practice, it is often necessary to trade off one for the other.

Model bias can be lowered by using more flexible machine learning algorithms, adding informative features, and training on representative, unbiased data. Model variance can be lowered by using a very large set of training data (more than required), regularization, ensemble techniques such as bagging, and removing noisy or irrelevant features where possible. It is not always necessary to reduce model bias to zero, but reducing it as much as possible definitely helps.
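The effect of training-set size on variance is easy to demonstrate. The minimal NumPy sketch below (the linear data-generating process and all constants are illustrative assumptions) measures how much a fitted model’s prediction varies across many different training sets of two different sizes:

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_and_predict(n, x_query=1.0):
    # One estimator: a straight-line fit on n noisy samples of y = 2x + noise
    x = rng.uniform(-2, 2, n)
    y = 2 * x + rng.normal(0, 1.0, n)
    slope, intercept = np.polyfit(x, y, 1)
    return slope * x_query + intercept

# Variance of the prediction across 2000 different training sets
var_small = np.var([fit_and_predict(20) for _ in range(2000)])
var_large = np.var([fit_and_predict(500) for _ in range(2000)])

print(var_small, var_large)  # variance shrinks as the training set grows
```

With 25 times more training data, the prediction varies far less from one training set to the next, which is exactly the variance reduction described above.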

## Bias & Variance Examples

**An example of a model having high bias** is the linear regression model applied to a complex problem. The linear regression model has a high bias because it is not flexible enough to estimate the true function; its functional form is fixed in advance. Models having high bias cannot represent complex relationships between different variables, making them less powerful than models with low bias, which can fit almost any data well. In general, real-life problems modeled using linear regression may turn out to have a high bias.

**An example of a model having low bias** is a neural network classifier. A neural network is a more flexible machine learning model because it can represent complex relationships between different variables, which keeps its bias low (typically at the cost of higher variance).

**An example of a model having high variance** is the decision tree classifier. Decision trees are machine learning models that split the training dataset into different leaves, representing the outcome of an event, by splitting on attributes. The complexity/flexibility of these machine learning algorithms is determined by depth (the number of times they split) and width (the number of different attributes they can split on). As decision tree models grow in complexity (depth and width), the variance also increases, making them more prone to overfitting than simpler machine learning algorithms. In general, more flexible machine learning algorithms (having lower interpretability) have higher variance.

**An example of a model having low variance** is the linear regression model discussed above. Because its functional form is fixed in advance, its predictions change very little when it is trained on different training data sets, although this low variance comes together with high bias on complex problems. A very flexible model such as a neural network classifier, by contrast, can fit almost any dataset and therefore has high variance, meaning that its predictions vary greatly with changes in the training data.

## Bias & Variance Interview Questions

Here is a quiz consisting of questions related to **bias & variance**. These questions are intended to test your understanding of the concepts around bias & variance.

