*Last updated: 29th Dec, 2023*

As data scientists, we navigate a sea of metrics to evaluate the performance of our regression models. Understanding these metrics – **Mean Squared Error (MSE)**, **Root Mean Squared Error (RMSE)**, **Mean Absolute Error (MAE)**, **Mean Absolute Percentage Error (MAPE)**, and **R-Squared** – is crucial for robust model evaluation and selection. In this blog, we delve into each of these metrics with clear definitions, formulas, and guidance on when to use which one.

## Different Types of Regression Models Evaluation Metrics

The following are the main evaluation metrics for regression models: MSE, RMSE, MAE, MAPE, R-squared, and Adjusted R-squared. They are used in different scenarios when training regression models to solve the problem at hand. Each metric provides a different lens through which to evaluate the performance of a regression model, and choosing the right one depends on the specific context and objectives of your analysis. Understanding these metrics intuitively helps in selecting the most appropriate one for your model and communicating its performance effectively.

### Mean Squared Error (MSE)

MSE calculates the average of the squares of the errors—i.e., the average squared difference between the estimated values and the actual value. Imagine you’re predicting house prices. MSE would measure the average squared difference between the actual and predicted prices. For example, if you predict a house to be $300,000 and it’s actually $320,000, the squared error is square of (300,000−320,000). MSE does this for all predictions and averages them. It **emphasizes larger errors**, which could be crucial in scenarios like financial forecasting where large errors are more detrimental.

**Formula:** $MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
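Continuing the house pricing example, here is a minimal NumPy sketch of the formula above (the prices are made-up illustrative values):

```python
import numpy as np

# Hypothetical actual and predicted house prices in dollars (illustrative only)
y_true = np.array([320_000, 250_000, 410_000])
y_pred = np.array([300_000, 260_000, 400_000])

# MSE: mean of the squared differences between actual and predicted values
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 200000000.0, i.e. squared-dollar units
```

Note that the result is in *squared* dollars, which is hard to interpret directly; this is exactly the motivation for RMSE below.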

### Root Mean Squared Error (RMSE)

RMSE is the square root of the mean square error, bringing the scale of the errors to be the same as the scale of targets. In the context of the house pricing example, **RMSE brings the error metric back to the price scale**. This makes it easier to understand the average error in terms of the actual values. If RMSE is $20,000, it means the typical prediction error is about $20,000.

**Choosing Root Mean Squared Error (RMSE) over Mean Squared Error (MSE) can be advantageous for several reasons**, particularly in the context of practical application and interpretability.

- RMSE is in the same units as the target variable being predicted, while MSE is in squared units. This makes RMSE more interpretable. Similarly, RMSE is scale-dependent, meaning it is related to the scale of the data. When comparing model performance across datasets with different scales, RMSE can provide a more intuitive sense of the error magnitude relative to the scale of the data.
- Both MSE and RMSE are sensitive to large errors due to the squaring of the residuals. Since RMSE is simply the square root of MSE, the two always rank models identically; RMSE just expresses that penalty for large errors on the original scale of the data rather than in squared units.

**Formula:** $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}$
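Reusing the same made-up house prices as before, RMSE is just the square root of MSE, which brings the error back to the dollar scale:

```python
import numpy as np

# Same hypothetical house prices as in the MSE example (illustrative only)
y_true = np.array([320_000, 250_000, 400_000])
y_pred = np.array([300_000, 260_000, 410_000])

mse = np.mean((y_true - y_pred) ** 2)
rmse = np.sqrt(mse)  # back on the original dollar scale
print(rmse)  # roughly $14,142: the typical prediction error in dollars
```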

Here are some of the **Kaggle competitions** which used **RMSE** as the evaluation metric:

- Google Analytics Customer Revenue Prediction
- Elo Merchant Category Recommendation
- Avito Demand Prediction Challenge

### Mean Absolute Error (MAE)

MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average absolute difference between the predicted and actual values. Unlike MSE, it doesn’t square the errors, which means **it doesn’t punish larger errors as harshly**. In our house pricing example, if you’re off by $20,000 or $40,000, MAE treats these errors linearly. This metric is particularly useful when you want to avoid giving extra penalty to large errors.

The Mean Absolute Error (MAE) offers distinct advantages over Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) in certain situations.

- **Robustness to Outliers:** MAE is less sensitive to outliers compared to MSE and RMSE. In MSE and RMSE, errors are squared before they are averaged, which gives a disproportionately large weight to large errors (outliers). This can skew the overall error metric if your data has many outliers or is highly variable. MAE, by taking the absolute value of errors, treats all deviations from the true values equally, providing a more robust error metric in such cases.
- **Interpretability:** MAE is intuitively easier to understand since it's simply the average error in the same units as the data. For instance, if you're predicting the price of a product and your MAE is 5, it means that on average, your predictions are off by 5 units of currency.

**Formula:** $MAE = \frac{1}{n} \sum_{i=1}^{n} |Y_i - \hat{Y}_i|$


### Mean Absolute Percentage Error (MAPE)

MAPE expresses the error as a percentage of the actual values, providing an easy-to-understand metric. For instance, if a house is worth $200,000 and you predict $180,000, the error is 10%. This percentage-based approach makes MAPE very interpretable, especially when explaining model performance to stakeholders who might not be technical.

Mean Absolute Percentage Error (MAPE) offers unique advantages over Mean Absolute Error (MAE) and Mean Squared Error (MSE) / Root Mean Squared Error (RMSE) in certain scenarios. Its distinctive features make it a preferred choice in specific contexts:

- **Relative Error Measurement:** MAPE expresses the error as a percentage, providing a relative measure of error. This is particularly useful in scenarios where it's important to understand the size of the error in proportion to the actual value. For instance, a $10 error on a $100 item (10% error) is more significant than a $10 error on a $1,000 item (1% error).
- **Scale Independence:** Unlike MAE or MSE/RMSE, which are scale-dependent and influenced by the magnitude of the data, MAPE offers a scale-independent view of the error. This makes it especially valuable for comparing the performance of models across datasets with different scales or units.
- **Interpretability for Stakeholders:** The percentage-based error metric is often easier for non-technical stakeholders to understand. Telling a business team that the average error is 5% is more intuitive than saying the average error is 50 units.

**Formula:** $MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left|\frac{Y_i - \hat{Y}_i}{Y_i}\right|$
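A quick sketch of the formula, using the house pricing example from above (made-up values; note that MAPE is undefined whenever an actual value is zero, which the full code section later handles with a non-zero mask):

```python
import numpy as np

# Hypothetical house prices (illustrative only); no zeros in y_true
y_true = np.array([200_000, 100_000])
y_pred = np.array([180_000, 105_000])

# Average of per-prediction percentage errors: 10% and 5% here
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(mape)  # 7.5 (percent)
```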

### R-Squared

R-Squared indicates the proportion of the variance in the dependent variable that is predictable from the independent variables. R-Squared shows how well your predictions approximate the real data points. It’s like grading a test out of 100%. A high R-Squared (close to 1) means your model can very closely predict the actual values. For instance, in predicting house prices, a high R-Squared would indicate that your model captures most of the variability in house prices.

Using R-Squared over other metrics like MAE, MSE, RMSE, or MAPE has distinct advantages in specific contexts:

- **Proportion of Variance Explained:** R-Squared quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables. This makes it a measure of the model's ability to capture the variability in the dataset, offering a sense of how well the model fits the data.
- **Scale Independence:** Unlike MAE, MSE, or RMSE, R-Squared is not affected by the scale of the data. This allows for easier comparison between models on different scales and makes it a useful tool in model selection.
- **Ease of Interpretation:** R-Squared values range from 0 to 1, with 0 indicating that the model explains none of the variability of the response data around its mean and 1 indicating that it explains all the variability. This range is intuitive and easily understood, even by non-technical stakeholders.
- **Compatibility with Linear Models:** R-Squared is particularly suitable for linear regression models. It provides a clear indication of how much better a model is than a simple average. This is especially useful in cases where the goal is to assess the improvement of a model over a baseline.

When using R-squared, we also come across a related metric called **Adjusted R-Squared**. It is an essential statistical measure, especially in the context of multiple regression models. While R-Squared indicates the proportion of variance in the dependent variable that can be explained by the independent variables, it **has a significant limitation: it tends to increase as more predictors are added to the model, regardless of whether those predictors actually improve the model.** **This is where Adjusted R-Squared becomes invaluable.** It modifies the R-Squared formula to account for the number of predictors in the model. Unlike R-Squared, Adjusted R-Squared increases only if a new predictor improves the model more than would be expected by chance, and can decrease if the predictor doesn't improve the model sufficiently. This makes Adjusted R-Squared a more reliable metric, particularly when comparing models with different numbers of predictors. It penalizes the model for adding predictors that do not contribute to its predictive power, thus providing a more accurate reflection of the model's ability to explain the variance in the dependent variable.
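The standard adjustment uses the formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ the number of predictors. A minimal sketch (the R-squared and sample-size values below are made-up for illustration):

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R-Squared: penalizes R-Squared for the number of predictors.

    n_samples is the number of observations, n_predictors the number of
    independent variables in the model.
    """
    return 1 - (1 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Same raw R-Squared of 0.90, but more predictors lowers the adjusted value
print(adjusted_r_squared(0.90, n_samples=50, n_predictors=3))   # about 0.893
print(adjusted_r_squared(0.90, n_samples=50, n_predictors=10))  # about 0.874
```

This illustrates the point above: with the raw R-squared held fixed, the model with more predictors receives the lower adjusted score.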

## Differences: What, Why and When to use these metrics?

Based on the discussion in the previous section, the following is a list of key differences between these evaluation metrics:

| Metric | What? | Why? | When to Use? |
|---|---|---|---|
| MSE | Measures average squared difference between estimated and actual values. | Emphasizes larger errors. | When large errors are more critical. |
| RMSE | Square root of MSE, in same units as response variable. | Easier interpretation of errors. | When error scale should match target scale. |
| MAE | Average absolute difference between estimated and actual values. | Less sensitive to outliers. | With many outliers or non-normal residuals. |
| MAPE | Percentage error between estimated and actual values. | Easy interpretation as a percentage. | For forecasting and percentage-based error analysis. |
| R-Squared | Proportion of variance explained by the model. | Indicates model's explanatory power. | To evaluate linear regression models' fit. |
| Adjusted R-Squared | Modifies R-Squared to account for the number of predictors. | Penalizes the model for including irrelevant predictors. | In multiple regression with several independent variables. |

## Python, R Code for Determining Evaluation Metrics

The following is the Python and R code for calculating the metrics discussed above: MSE, RMSE, MAE, MAPE, and R-Squared.

### Python Code Example

The following is the Python code example:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Assuming y_true and y_pred are the true and predicted values

# MSE
mse = mean_squared_error(y_true, y_pred)

# RMSE
rmse = mean_squared_error(y_true, y_pred, squared=False)

# MAE
mae = mean_absolute_error(y_true, y_pred)

# R-Squared
r_squared = r2_score(y_true, y_pred)

# Custom method for calculating MAPE, skipping zero actual values
def calculate_mape(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    non_zero_mask = y_true != 0
    return np.mean(
        np.abs((y_true[non_zero_mask] - y_pred[non_zero_mask]) / y_true[non_zero_mask])
    ) * 100

# Example usage
# y_true = [actual values]
# y_pred = [predicted values]
# mape = calculate_mape(y_true, y_pred)
```

### R Code Example

The following is the R code example:

```r
# Assuming y_true and y_pred are the true and predicted values

# MSE
mse <- mean((y_pred - y_true)^2)

# RMSE
rmse <- sqrt(mse)

# MAE
mae <- mean(abs(y_pred - y_true))

# R-Squared
r_squared <- summary(lm(y_true ~ y_pred))$r.squared

# Custom method for calculating MAPE, skipping zero actual values
calculate_mape <- function(y_true, y_pred) {
  non_zero_indices <- which(y_true != 0)
  if (length(non_zero_indices) > 0) {
    mean(abs((y_true[non_zero_indices] - y_pred[non_zero_indices]) / y_true[non_zero_indices])) * 100
  } else {
    NA
  }
}

# Example usage
# y_true <- c(actual values)
# y_pred <- c(predicted values)
# mape <- calculate_mape(y_true, y_pred)
```

By understanding these metrics, you as data scientists can choose the most appropriate one for specific context. Remember, no single metric is the “best” in all situations; it depends on the specific objectives and nature of your data. This insight into model evaluation will empower your data science journey, leading to more accurate and reliable predictive regression models.
