*Last updated: 18th August, 2024*

As data scientists, we navigate a sea of metrics to evaluate the performance of our regression models. Understanding these metrics (**Mean Squared Error (MSE)**, **Root Mean Squared Error (RMSE)**, **Mean Absolute Error (MAE)**, **Mean Absolute Percentage Error (MAPE)**, and **R-Squared**) is crucial for robust model evaluation and selection. In this post, we walk through each of these metrics with clear definitions, formulas, and guidance on when to use which one.

## Different Types of Regression Models Evaluation Metrics

This section covers the most common regression model evaluation metrics: MSE, RMSE, MAE, MAPE, R-squared, and Adjusted R-squared. Each metric provides a different lens to evaluate the performance of a regression model, and choosing the right one depends on the specific context and objectives of the analysis. Understanding these metrics intuitively helps in selecting the most appropriate model and communicating its performance effectively.

### Mean Squared Error (MSE)

MSE is a cost function that calculates the average of the squares of the errors, i.e., the average squared difference between the estimated values and the actual values. Suppose we have a regression model that predicts house prices. MSE would measure the average squared difference between the actual prices and the model’s predicted prices. For example, if the model predicts a house to be $300,000 and it is actually worth $320,000, the error is $20,000 and the squared error is 20,000² = 400,000,000 (in squared dollars). MSE does this for all predictions and averages them. It **emphasizes larger errors**, which could be crucial in scenarios like financial forecasting where large errors are more detrimental.

**Formula:** $MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
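To make the formula concrete, here is a minimal sketch in plain Python. The function name and the three house prices are illustrative assumptions; only the first pair ($320,000 actual, $300,000 predicted) comes from the example above.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical house prices in dollars (illustrative values only).
actual = [320_000, 250_000, 410_000]
predicted = [300_000, 255_000, 400_000]

# Errors are 20,000, -5,000 and 10,000; the largest one dominates the average.
mse = mean_squared_error(actual, predicted)
print(mse)  # 175000000.0
```

Note that the $20,000 miss contributes 400,000,000 to the sum while the $5,000 miss contributes only 25,000,000, which is the "emphasis on larger errors" in action.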

If your data contains outliers, consider a metric such as MAE instead, because squaring magnifies the large errors that outliers produce.

### Root Mean Squared Error (RMSE)

RMSE is a cost function that can be represented as the square root of the mean square error, bringing the scale of the errors to be the same as the scale of targets. In the context of the house pricing example, **RMSE brings the error metric back to the price scale**. This makes it easier to understand the average error in terms of the actual values. If RMSE is $20,000, it means the typical prediction error is about $20,000.

**Choosing Root Mean Squared Error (RMSE) over Mean Squared Error (MSE) can be advantageous for several reasons**, particularly in the context of practical application and interpretability.

- RMSE is in the same units as the target variable being predicted, while MSE is in squared units. This makes RMSE more interpretable. Note that RMSE is still scale-dependent: its value depends on the units and magnitude of the target, so it is best used to compare models on the same dataset rather than across datasets with different scales.
- Both MSE and RMSE are sensitive to large errors because the residuals are squared before averaging. Since the square root is a monotonic transformation, the two metrics always rank models in the same order. The difference is that MSE values grow quadratically with the size of the errors, while RMSE reports them back on the original scale.

**Formula:** $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}$
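As a rough illustration of the formula, the sketch below computes RMSE in plain Python. The function name and the house prices are hypothetical values chosen for this example.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the same units as the target."""
    mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical house prices in dollars (illustrative values only).
actual = [320_000, 250_000, 410_000]
predicted = [300_000, 255_000, 400_000]

rmse = root_mean_squared_error(actual, predicted)
print(round(rmse))  # 13229
```

The MSE for these values is 175,000,000 squared dollars, which is hard to reason about; the RMSE of roughly $13,229 reads directly as a typical error on the price scale.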

Here are some of the **Kaggle competitions** which used **RMSE** as the evaluation metric:

- Google Analytics Customer Revenue Prediction
- Elo Merchant Category Recommendation
- Avito Demand Prediction Challenge

As with MSE, RMSE may be a poor choice when the data has outliers, because the large errors they produce are penalized heavily. In that case, you might use MAE as the model performance metric.

### Mean Absolute Error (MAE)

MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average absolute difference between the predicted and actual values. Unlike MSE, it doesn’t square the errors, which means **it doesn’t punish larger errors as harshly**. In our house pricing example, if you’re off by $20,000 or $40,000, MAE treats these errors linearly. This metric is particularly useful when you want to avoid giving extra penalty to large errors.

The Mean Absolute Error (MAE) offers distinct advantages over Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) in certain situations.

- **Robustness to Outliers:** MAE is less sensitive to outliers compared to MSE and RMSE. In MSE and RMSE, errors are squared before they are averaged, which gives a disproportionately large weight to large errors (outliers). This can skew the overall error metric if your data has many outliers or is highly variable. MAE, by taking the absolute value of errors, weights each deviation in direct proportion to its size, providing a more robust error metric in such cases.
- **Interpretability:** MAE is intuitively easier to understand since it’s simply the average error in the same units as the data. For instance, if you’re predicting the price of a product and your MAE is 5, it means that on average, your predictions are off by 5 units of currency.

**Formula:** $MAE = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|$
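The following sketch illustrates the robustness argument with made-up numbers: the same predictions are scored with MAE and RMSE, first without and then with a single large miss. All function names and values here are assumptions for illustration.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical data: every prediction is off by 10, except one outlier off by 100.
actual = [100, 100, 100, 100, 100]
clean = [110, 90, 110, 90, 110]
with_outlier = [110, 90, 110, 90, 200]

mae_clean = mean_absolute_error(actual, clean)
rmse_clean = root_mean_squared_error(actual, clean)
mae_outlier = mean_absolute_error(actual, with_outlier)
rmse_outlier = root_mean_squared_error(actual, with_outlier)

print(mae_clean, rmse_clean)                 # 10.0 10.0
print(mae_outlier, round(rmse_outlier, 1))   # 28.0 45.6
```

With one outlier, MAE rises from 10 to 28 while RMSE rises from 10 to about 45.6, so the single large error moves RMSE considerably more.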

Here are a couple of examples of **Kaggle competitions** which used **MAE** as the evaluation metric:

### Mean Absolute Percentage Error (MAPE)

MAPE expresses the error as a percentage of the actual values, providing an easy-to-understand metric. For instance, if a house is worth $200,000 and you predict $180,000, the error is 10%. This percentage-based approach makes MAPE very interpretable, especially when explaining model performance to stakeholders who might not be technical.

Mean Absolute Percentage Error (MAPE) offers unique advantages over Mean Absolute Error (MAE) and Mean Squared Error (MSE) / Root Mean Squared Error (RMSE) in certain scenarios. Its distinctive features make it a preferred choice in specific contexts:

- **Relative Error Measurement:** MAPE expresses the error as a percentage, providing a relative measure of error. This is particularly useful in scenarios where it’s important to understand the size of the error in proportion to the actual value. For instance, a $10 error on a $100 item (10% error) is more significant than a $10 error on a $1,000 item (1% error).
- **Scale Independence:** Unlike MAE or MSE/RMSE, which are scale-dependent and influenced by the magnitude of the data, MAPE offers a scale-independent view of the error. This makes it especially valuable for comparing the performance of models across datasets with different scales or units.
- **Interpretability for Stakeholders:** The percentage-based error metric is often easier for non-technical stakeholders to understand. Telling a business team that the average error is 5% is more intuitive than saying the average error is 50 units.

**Formula:** $MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left|\frac{Y_i - \hat{Y}_i}{Y_i}\right|$
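A small sketch of the formula in plain Python follows. The function name is an assumption, and raising an error on zero actual values is one possible design choice (the division is undefined there); skipping such rows, as the Python example later in this post does, is another.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent. Raises if any actual value is zero, where MAPE is undefined."""
    if any(a == 0 for a in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


# The house from the text: worth $200,000, predicted at $180,000.
single = mean_absolute_percentage_error([200_000], [180_000])
print(round(single, 2))  # 10.0

# A $10 error on a $100 item (10%) and on a $1,000 item (1%) average to 5.5%.
mixed = mean_absolute_percentage_error([100, 1_000], [90, 990])
print(round(mixed, 2))  # 5.5
```

The second call shows the relative nature of the metric: both errors are $10 in absolute terms, yet they contribute very differently to the percentage.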

### R-Squared

R-Squared indicates the proportion of the variance in the dependent variable that is predictable from the independent variables. R-Squared shows how well your predictions approximate the real data points. It’s like grading a test out of 100%. A high R-Squared (close to 1) means your model can very closely predict the actual values. For instance, in predicting house prices, a high R-Squared would indicate that your model captures most of the variability in house prices.

Using R-Squared over other metrics like MAE, MSE, RMSE, or MAPE has distinct advantages in specific contexts:

- **Proportion of Variance Explained:** R-Squared quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables. This makes it a measure of the model’s ability to capture the variability in the dataset, offering a sense of how well the model fits the data.
- **Scale Independence:** Unlike MAE, MSE, or RMSE, R-Squared is not affected by the scale of the data. This allows for easier comparison between models on different scales and makes it a useful tool in model selection.
- **Ease of Interpretation:** R-Squared values typically range from 0 to 1, with 0 indicating that the model explains none of the variability of the response data around its mean and 1 indicating that it explains all the variability. This range is intuitive and easily understood, even by non-technical stakeholders. Be aware that R-Squared can turn negative when a model predicts worse than simply using the mean, which can happen on held-out data.
- **Compatibility with Linear Models:** R-Squared is particularly suitable for linear regression models. It provides a clear indication of how much better a model is than a simple average. This is especially useful in cases where the goal is to assess the improvement of a model over a baseline.
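The idea of "improvement over a simple average" can be sketched with the standard definition $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the actual values from their mean. The function name and data below are illustrative assumptions.

```python
def r_squared(y_true, y_pred):
    """R-Squared as 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_true) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R-Squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical values chosen so the arithmetic is easy to follow.
actual = [3, 5, 7, 9]

good = r_squared(actual, [2.5, 5.5, 7.5, 8.5])   # close predictions
baseline = r_squared(actual, [6, 6, 6, 6])       # always predict the mean
bad = r_squared(actual, [9, 7, 5, 3])            # worse than the mean

print(round(good, 2), baseline, bad)  # 0.95 0.0 -3.0
```

Predicting the mean gives exactly 0, a close fit approaches 1, and a model that does worse than the mean produces a negative value.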

When using R-squared, we also come across a related metric called **Adjusted R-Squared**. It is an essential statistical measure, especially in the context of multiple regression models. While R-Squared indicates the proportion of variance in the dependent variable that can be explained by the independent variables, **R-Squared has a significant limitation: it tends to increase as more predictors are added to the model, regardless of whether those predictors actually improve the model.** **This is where Adjusted R-Squared becomes invaluable.** It modifies the R-Squared formula to account for the number of predictors in the model. Unlike R-Squared, Adjusted R-Squared increases only if a new predictor improves the model more than would be expected by chance, and it can decrease if the predictor doesn’t improve the model sufficiently. This makes Adjusted R-Squared a more reliable metric, particularly when comparing models with different numbers of predictors. It penalizes the model for adding predictors that do not contribute to its predictive power, thus providing a more accurate reflection of the model’s ability to explain the variance in the dependent variable.
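As a minimal sketch of the penalty, the commonly used formula is $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors. The function name and the numbers below are hypothetical and only meant to show the direction of the adjustment.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R-Squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Hypothetical scenario: two models reach the same R-Squared of 0.90 on 50 rows,
# one with 5 predictors and one with 20.
adj_few = adjusted_r_squared(0.90, n_samples=50, n_predictors=5)
adj_many = adjusted_r_squared(0.90, n_samples=50, n_predictors=20)

print(round(adj_few, 3))   # 0.889
print(round(adj_many, 3))  # 0.831
```

For the same R-Squared, the model that needed 20 predictors receives a visibly lower adjusted score than the one that needed 5.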

## Differences: What, Why and When to use these metrics?

Based on the discussion in the previous section, the following is a list of key differences between these evaluation metrics:

Metric | What? | Why? | When to Use? |
---|---|---|---|
MSE | Measures average squared difference between estimated and actual values. | Emphasizes larger errors. | When large errors are more critical. |
RMSE | Square root of MSE, in same units as response variable. | Easier interpretation of errors. | When error scale should match target scale. |
MAE | Average absolute difference between estimated and actual values. | Less sensitive to outliers. | With many outliers or non-normal residuals. |
MAPE | Percentage error between estimated and actual values. | Easy interpretation as a percentage. | For forecasting and percentage-based error analysis. |
R-Squared | Proportion of variance explained by the model. | Indicates model’s explanatory power. | To evaluate linear regression models’ fit. |
Adjusted R-squared | Statistical measure that modifies the R-Squared value to account for the number of predictors. | Unlike R-squared, it penalizes the model for including irrelevant predictors. | Useful in multiple regression scenarios where you have several independent variables. |

## Python, R Code for Determining Evaluation Metrics

The following Python and R code calculates MSE, RMSE, MAE, MAPE, and R-Squared for evaluating regression models.

### Python Code Example

The following is the Python code example:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Illustrative values; replace with your own true and predicted values
y_true = [3.0, 5.0, 7.5, 10.0]
y_pred = [2.5, 5.0, 8.0, 9.0]

# MSE
mse = mean_squared_error(y_true, y_pred)

# RMSE (square root of MSE; the squared=False argument is deprecated
# in recent scikit-learn releases, so np.sqrt is used instead)
rmse = np.sqrt(mse)

# MAE
mae = mean_absolute_error(y_true, y_pred)

# R-Squared
r_squared = r2_score(y_true, y_pred)

# Custom method for calculating MAPE (rows where the true value is 0 are skipped)
def calculate_mape(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    non_zero_mask = y_true != 0
    return np.mean(np.abs((y_true[non_zero_mask] - y_pred[non_zero_mask]) / y_true[non_zero_mask])) * 100

# Example usage
mape = calculate_mape(y_true, y_pred)
```

### R Code Example

The following is the R code example:

```r
# Assuming y_true and y_pred are the true and predicted values

# MSE
mse <- mean((y_pred - y_true)^2)

# RMSE
rmse <- sqrt(mse)

# MAE
mae <- mean(abs(y_pred - y_true))

# R-Squared, computed as 1 - SS_res / SS_tot so that it matches the Python result
# (summary(lm(y_true ~ y_pred))$r.squared would instead give the squared correlation)
r_squared <- 1 - sum((y_true - y_pred)^2) / sum((y_true - mean(y_true))^2)

# Custom method for calculating MAPE (rows where the true value is 0 are skipped)
calculate_mape <- function(y_true, y_pred) {
  non_zero_indices <- which(y_true != 0)
  if (length(non_zero_indices) > 0) {
    mean(abs((y_true[non_zero_indices] - y_pred[non_zero_indices]) / y_true[non_zero_indices])) * 100
  } else {
    NA
  }
}

# Example usage
# y_true <- c(actual values)
# y_pred <- c(predicted values)
# mape <- calculate_mape(y_true, y_pred)
```

By understanding these metrics, you can choose the most appropriate one for your specific context. Remember, no single metric is the “best” in all situations; it depends on the specific objectives and nature of your data. This insight into model evaluation will empower your data science journey, leading to more accurate and reliable predictive regression models.
