In this post, you will learn about the concepts of **bias & variance** in relation to **machine learning (ML) models**. In addition to learning the concepts, you will also get a chance to take a quiz that will help you prepare for data scientist / ML engineer interviews. As a data scientist / ML engineer, you must have a good understanding of bias and variance in order to build models that generalize better, i.e., models with lower generalization error.

## Bias & Variance of Machine Learning Models

Bias of a model, intuitively speaking, can be defined as the tendency of the model to make predictions or estimates based on only certain features of the dataset. High model bias may mean that the model has missed certain important features and is mostly making wrong predictions; in other words, the model has underfit. Low model bias means that the model has likely taken the important features into account, along with less important ones, while being trained. More formally, the bias of a model can be defined as the difference between the average of the predictions made by different estimators (trained using different training datasets / hyperparameters) and the true value. The higher the difference, the higher the bias; the lower the difference, the lower the bias. In other words, if the predictions are, on average, close to the true value, the bias can be said to be low. Ideally, the desired model has low bias. Mathematically, the bias of a model can be represented using the following equation:

\(\Large Bias = E[\hat{\theta}] - \theta\).

In the above equation, \(E[\hat{\theta}]\) represents the expected value of the prediction, which is the average of the predictions made by different estimators, and \(\theta\) represents the true value.
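To make the bias equation concrete, here is a minimal Python/NumPy sketch. It assumes a hypothetical setup that is not from the original post: the true value is \(\theta = 5\), and each "estimator" is a deliberately shrunk sample mean (0.8 times the mean of its own training sample of 20 points). Averaging the estimators' predictions approximates \(E[\hat{\theta}]\), and subtracting \(\theta\) gives the bias, which should come out close to \(0.8 \times 5 - 5 = -1\).

```python
import numpy as np

rng = np.random.default_rng(seed=0)

theta = 5.0             # assumed true value
n_estimators = 10_000   # each estimator sees its own training sample
sample_size = 20

# One row per estimator: a training sample drawn from the same population.
samples = rng.normal(loc=theta, scale=2.0, size=(n_estimators, sample_size))

# Hypothetical shrunk estimator: 0.8 * sample mean (pulls predictions toward 0).
theta_hat = 0.8 * samples.mean(axis=1)

expected_theta_hat = theta_hat.mean()   # approximates E[theta_hat]
bias = expected_theta_hat - theta       # Bias = E[theta_hat] - theta

# For comparison, the plain sample mean is an unbiased estimator of theta.
unbiased_bias = samples.mean(axis=1).mean() - theta

print(f"shrunk estimator bias ~ {bias:.3f}, plain mean bias ~ {unbiased_bias:.3f}")
```

The shrunk estimator is systematically off in the same direction, so its bias stays near -1 no matter how many estimators are averaged, while the plain sample mean's bias hovers around 0.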

Variance of the model is the expected value (average) of the squared distance between each estimator's prediction and the average of the predictions of all estimators. The higher this average squared distance, the higher the variance. The ideal model tends to have low variance. Mathematically, variance can be represented as follows:

\(\Large Var = E[{\hat{\theta}}^2] - {E[\hat{\theta}]}^2\).

The above equation can also be expressed as follows:

\(\Large Var = E[(E[\hat\theta] - \hat\theta)^2]\).
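The two variance formulas are algebraically equivalent. A quick numerical check in Python/NumPy, using simulated predictions from 1,000 hypothetical estimators (the values are made up for illustration, not taken from the post), shows that both forms agree with each other and with NumPy's population variance `np.var` (default `ddof=0`).

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical predictions theta_hat from 1,000 estimators for the same input.
theta_hat = rng.normal(loc=3.0, scale=0.5, size=1_000)

mean_pred = theta_hat.mean()                            # E[theta_hat]
var_form1 = np.mean(theta_hat ** 2) - mean_pred ** 2    # E[theta_hat^2] - E[theta_hat]^2
var_form2 = np.mean((mean_pred - theta_hat) ** 2)       # E[(E[theta_hat] - theta_hat)^2]

print(var_form1, var_form2, np.var(theta_hat))
```

The first form is handy for derivations, but the second is numerically safer in floating point, because the first subtracts two large, nearly equal numbers when the predictions are far from zero.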

### Model Bias & Variance Intuition

The intuition behind bias and variance can be understood using the following diagram (Fig 1). A detailed explanation can be found in the free online course Introduction to Machine Learning by Dr. Sebastian Raschka.

The above diagram represents a person throwing stones with the goal of hitting the point representing the target value. However, the stones hit the ground at different points, some distance away from the actual / target point. Let’s use the above diagram to understand the concepts of bias and variance.

In the above diagram, bias is represented as the distance between the target value (true value) and the average of the points where the stones hit the ground. The dashed line represents the average of the predictions made by different models (the different points where the stones hit the ground), trained using different training datasets drawn from the same population. The lower the distance between the average (dashed line) and the target value (red circle), the lower the bias and the better the model.

Variance can be represented as the spread of the individual points around the average (red dashed line). The lower the spread (and hence the lower the variance), the better the model.

### Model Loss as a function of Bias & Variance

If you look closely at the diagram in Fig 1, you may realise that, for a particular target or true value, the loss of the model can be expressed as a function of bias and variance. Depending on the loss function, bias and variance contribute differently to the overall loss. One thing is clear, though:

If both the bias and the variance of the model are low, the model will have high accuracy. In other words, the generalization error of the model will be low.
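For the squared-error loss in particular (other loss functions decompose differently), the expected loss against a fixed true value \(\theta\) splits exactly into bias squared plus variance: \(E[(\hat\theta - \theta)^2] = (E[\hat\theta] - \theta)^2 + E[(E[\hat\theta] - \hat\theta)^2]\). The sketch below computes the three terms. The helper `squared_loss_decomposition` and the sample predictions are hypothetical, introduced only for illustration, and the true value is assumed fixed and noise-free, so no irreducible-noise term appears.

```python
import numpy as np

def squared_loss_decomposition(theta_hat, theta):
    """Return (bias^2, variance, mse) for predictions theta_hat of a fixed true value theta.

    Uses population (ddof=0) averages so that mse == bias^2 + variance,
    up to floating-point rounding.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    mean_pred = theta_hat.mean()                       # E[theta_hat]
    bias_sq = (mean_pred - theta) ** 2                 # (E[theta_hat] - theta)^2
    variance = np.mean((mean_pred - theta_hat) ** 2)   # E[(E[theta_hat] - theta_hat)^2]
    mse = np.mean((theta_hat - theta) ** 2)            # E[(theta_hat - theta)^2]
    return float(bias_sq), float(variance), float(mse)

# Usage: predictions from five hypothetical estimators for a true value of 10.
bias_sq, variance, mse = squared_loss_decomposition([11.0, 13.0, 14.0, 10.0, 12.0], 10.0)
print(bias_sq, variance, mse)  # prints 4.0 2.0 6.0
```

Here the average prediction is 12 against a true value of 10, so the bias squared is 4, the variance is 2, and the squared-error loss is their sum, 6. Lowering either term lowers the loss.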

### Yet Another Explanation of Model Bias & Variance

Let’s look at the bias and variance of a model vis-a-vis model accuracy, or generalization error, using another diagram, shown below. The diagram is very popular and cited in several articles / posts; I will take another shot at explaining bias and variance with it.

The above diagram represents four dart boards with points on each board. The center of each dart board, in red, represents the target (the target value, in ML terms). The blue dots represent throws made to hit the target; each throw represents the prediction made by a different estimator.

If the throws (blue dots, i.e., predictions by different estimators) are consistently close to each other, the throws are said to be very precise (**high precision**). In ML terms, throws close to each other represent **low variance**: the models / estimators are said to have low variance. If the throws are far from each other, they are not precise (**low precision**). In ML terms, this can be thought of as **high variance**: the models / estimators are said to have high variance.

If the throws (blue dots, i.e., predictions by different estimators) are consistently close to the target (target value), this represents **low bias** or no bias. If the throws are consistently away from the target, the models have **high bias**.

So, what is the **ideal scenario** for the best machine learning model?

- The ideal scenario is to have low bias and low variance. In the above diagram, this means that the throws (blue dots) hit the red circle. This is very difficult to achieve.
- The second-best scenario could be low bias and somewhat high variance. This would still mean that the loss is comparatively lower than in other settings such as high bias / low variance and high bias / high variance.
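The four dart boards can be simulated in Python/NumPy. This is a sketch under assumed numbers that do not come from the post: the bull's-eye is at the origin, "high bias" means every throw is offset by 2 units, and "high variance" means a spread (standard deviation) of 1 instead of 0.1 per axis. Bias is the distance from the average throw to the bull's-eye, variance is the average squared distance of the throws from their own average, and the loss is the average squared distance from the bull's-eye.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
target = np.array([0.0, 0.0])  # bull's-eye (true value)

# Hypothetical (offset, spread) pairs for the four dart boards.
boards = {
    "low bias, low variance": (0.0, 0.1),
    "low bias, high variance": (0.0, 1.0),
    "high bias, low variance": (2.0, 0.1),
    "high bias, high variance": (2.0, 1.0),
}

results = {}
for name, (offset, spread) in boards.items():
    # 500 throws; each throw is one estimator's prediction as a 2-D point.
    throws = target + np.array([offset, 0.0]) + rng.normal(scale=spread, size=(500, 2))
    center = throws.mean(axis=0)                                  # average throw
    bias = np.linalg.norm(center - target)                        # distance to bull's-eye
    variance = np.mean(np.sum((throws - center) ** 2, axis=1))    # spread around the average
    mse = np.mean(np.sum((throws - target) ** 2, axis=1))         # equals bias**2 + variance
    results[name] = (bias, variance, mse)
    print(f"{name:26s} bias={bias:.2f} variance={variance:.2f} loss={mse:.2f}")
```

With these particular magnitudes, the low-bias / high-variance board also ends up with a smaller loss than the high-bias / low-variance board, in line with the "second-best" ordering above; with other offsets and spreads, that ordering can change.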

### Model Bias & Variance vs Overfitting & Underfitting

A model that suffers from underfitting (both training and test error are very high) will have high bias and low variance, resulting in very high loss (hence high error). Such models have comparatively low capacity or, in other words, low complexity. An example is a decision stump (one node & two leaves), which tends to underfit and have high bias.

An algorithm with access to a larger hypothesis space (as a result of different training datasets, features and hyperparameters) tends to produce models that overfit more. In other words, these models will have high variance. Such models are termed complex models, because many different models can be trained using combinations of different training datasets, a large number of features, and different hyperparameter values.

As model complexity increases, the model tends to move from underfitting to overfitting. In terms of bias and variance, the models tend to move from having **high bias** to having **high variance**.

**The goal is to find a sweet spot with an optimum balance of bias and variance, a sweet spot where the model neither underfits nor overfits.** Remember that as model complexity increases, bias decreases and variance increases. Thus, the goal is to find the level of model complexity (capacity) at which this sweet spot is reached.
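To see the trade-off numerically, here is a minimal Python/NumPy sketch. Everything in it is an assumption chosen for illustration rather than taken from the post: the true function is \(\sin(2\pi x)\), each training set has 30 fixed inputs with freshly drawn Gaussian noise (standard deviation 0.3), and model complexity is the degree of a least-squares polynomial fit via `np.polynomial.Polynomial.fit`. For each degree, 200 models are trained on different training sets, and bias squared and variance are averaged over a grid of test points.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

def true_fn(x):
    # Assumed ground-truth function for this illustration.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 30)   # fixed inputs; each training set redraws the noise
x_test = np.linspace(0.0, 1.0, 101)
noise_sd = 0.3
n_datasets = 200

bias_sq, variance = {}, {}
for degree in (1, 3, 5, 9):            # model complexity = polynomial degree
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        y_train = true_fn(x_train) + rng.normal(scale=noise_sd, size=x_train.size)
        model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
        preds[i] = model(x_test)
    mean_pred = preds.mean(axis=0)                              # E[theta_hat] at each test x
    bias_sq[degree] = np.mean((mean_pred - true_fn(x_test)) ** 2)
    variance[degree] = np.mean(preds.var(axis=0))              # averaged over test points
    print(f"degree={degree}: bias^2={bias_sq[degree]:.4f}  variance={variance[degree]:.4f}")
```

With this setup, the degree-1 fit underfits (large bias squared, small variance) and the degree-9 fit shows the opposite pattern; the exact values depend on the seed and the assumed setup. For squared loss, adding the two terms plus the irreducible noise variance (0.09 here) for each degree is one way to look for the sweet spot.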

## Bias & Variance Interview Questions

Here is a quiz consisting of questions related to **bias & variance.** These questions are intended to test your understanding of the concepts around bias & variance.
