Mean Squared Error or R-Squared – Which one to use?


In this post, you will learn about the concepts of mean squared error (MSE) and R-squared, the difference between them, and which one to use when working with regression models such as linear regression. You will also work through Python examples to understand the concepts better. The following topics are covered:

  • Introduction to Mean Squared Error (MSE) and R-Squared
  • Difference between MSE and R-Squared
  • MSE or R-Squared – Which one to use?
  • MSE and R-Squared Python code example

Introduction to Mean Squared Error (MSE) and R-Squared

In this section, you will learn about the concepts of mean squared error and R-squared. Both are used for evaluating the performance of regression models such as linear regression.

What is Mean Squared Error (MSE)?

Mean squared error (MSE) is the average of the squared differences between the actual values and the predicted or estimated values. It is also termed mean squared deviation (MSD). This is how it is represented mathematically:

MSE = (1/n) × Σ (Y_i – Y'_i)², summed over the n observations i = 1, …, n

Fig 1. Mean Squared Error

The value of MSE is always non-negative. A value close to zero represents better quality of the estimator / predictor (regression model), and an MSE of zero (0) means that the predictor is perfect. Taking the square root of the MSE gives the root mean squared error (RMSE). In the above equation, Y represents the actual value and Y' is the predicted value. Here is the diagrammatic representation of MSE:

Fig 2. Mean Squared Error Representation
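
The definition of MSE and RMSE can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a library routine; the function name mse_by_hand and the sample values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def mse_by_hand(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical values, used only for illustration
y_actual = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.0, 5.0, 8.0, 11.0]

mse = mse_by_hand(y_actual, y_predicted)  # squared errors are 1, 0, 1, 4
rmse = math.sqrt(mse)                     # back in the units of the response

print("MSE:", mse)    # 1.5
print("RMSE:", rmse)
```

The squared errors sum to 6 over 4 observations, which gives an MSE of 1.5; the RMSE is its square root.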

What is R-Squared?

R-Squared is the ratio of the Sum of Squares Regression (SSR) to the Sum of Squares Total (SST). SSR is the amount of variance explained by the regression line, and SST is the total variance of the response around its mean. The R-squared value is used to measure goodness of fit: the greater the value of R-Squared, the better the regression model fits. However, some caution is needed, because on the training data R-Squared never decreases when more predictors are added to a least-squares model. This is where the concept of adjusted R-squared comes into the picture; it will be discussed in one of the later posts. R-Squared is also termed the coefficient of determination. For the training dataset of a least-squares linear regression with an intercept, R-Squared is bounded between 0 and 1, but it can become negative for the test dataset if the Sum of Squares Error (SSE) is greater than SST. If the value of R-Squared is 1, the model fits the data perfectly, with a corresponding MSE = 0.
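
To make the range of R-Squared concrete, here is a small sketch that computes it as 1 – (SSE/SST) for three sets of predictions on the same held-out values. The function name r_squared and all numbers are hypothetical; the point is only that predictions worse than simply predicting the mean make SSE exceed SST, which pushes R-Squared below zero.

```python
def r_squared(actual, predicted):
    """R-Squared computed as 1 - SSE/SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    sst = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - sse / sst


# Hypothetical held-out values and three candidate sets of predictions
y_test = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]    # SSE = 0
mean_only = [2.5, 2.5, 2.5, 2.5]  # SSE = SST
poor = [4.0, 3.0, 2.0, 1.0]       # SSE > SST

print(r_squared(y_test, perfect))    # 1.0
print(r_squared(y_test, mean_only))  # 0.0
print(r_squared(y_test, poor))       # -3.0
```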

Here is a visual representation to understand the concepts of R-Squared in a better manner.

Fig 4. Diagrammatic representation for understanding R-Squared

R-Squared = SSR / SST

Pay attention to the diagram and note that the greater the value of SSR, the more of the total variance (SST) is covered by the regression / best fit line. R-Squared can also be represented using the following formula:

R-Squared = 1 – (SSE/SST)

Note also that the smaller the value of SSE, the smaller the value of (SSE/SST), and hence the greater the value of R-Squared.
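
The two expressions for R-Squared, SSR/SST and 1 – (SSE/SST), agree when the line is a least-squares fit with an intercept and the sums are computed on the data used for fitting, because in that case SST = SSR + SSE. The sketch below checks this on a tiny made-up dataset using the closed-form simple linear regression; the helper fit_line and the data points are assumptions introduced for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]
mean_y = sum(y) / len(y)

sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)           # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left in the residuals

print("SSR/SST     :", ssr / sst)
print("1 - SSE/SST :", 1 - sse / sst)
```

Both printed values are approximately 0.6 for this data. On a test dataset, or for a model that was not fitted by least squares with an intercept, the decomposition need not hold and the two expressions can differ.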

R-Squared can also be expressed as a function of mean squared error (MSE). Dividing both SSE and SST by the number of observations n gives the MSE and the variance of the response, Var(y), respectively:

R-Squared = 1 – (MSE / Var(y))
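
The relationship between R-Squared and MSE can be checked numerically. The sketch below computes R-Squared once from the sums of squares and once from the MSE divided by the population variance of the actual values (statistics.pvariance from the Python standard library, which divides by n). The sample values are hypothetical.

```python
import statistics

# Hypothetical actual and predicted values
y_actual = [3.0, 5.0, 7.0, 9.0]
y_predicted = [2.0, 5.0, 8.0, 11.0]

n = len(y_actual)
mean_y = sum(y_actual) / n

sse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_actual, y_predicted))
sst = sum((y - mean_y) ** 2 for y in y_actual)

mse = sse / n                             # SSE divided by n
var_y = statistics.pvariance(y_actual)    # SST divided by n

r2_from_sums = 1 - sse / sst
r2_from_mse = 1 - mse / var_y

print(r2_from_sums, r2_from_mse)
```

Both values come out at approximately 0.7, since the factor 1/n cancels in the ratio.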

Difference between Mean Squared Error & R-Squared

The similarity between mean squared error and R-Squared is that both are metrics used for evaluating the performance of regression models, especially statistical models such as linear regression. The difference is that MSE depends on the scale of the response variable. For example, if the response variable is housing price expressed in multiples of 10K, the MSE will be different (lower) than when the housing price is not scaled (actual values). This is where R-Squared comes to the rescue. R-Squared can be viewed as a standardized version of MSE: it represents the fraction of the variance of the response variable captured by the regression model, whereas MSE captures the residual error in squared units of the response.
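
The effect of scaling can be seen with a short sketch. The same hypothetical housing prices and predictions are expressed once in actual values and once in multiples of 10K; the MSE changes by the square of the scaling factor, while R-Squared stays the same. The helper names and the numbers are made up for illustration.

```python
def mse(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """R-Squared computed as 1 - SSE/SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - sse / sst


# Hypothetical housing prices (actual values) and model predictions
price = [200000.0, 250000.0, 300000.0, 350000.0]
price_pred = [210000.0, 240000.0, 310000.0, 330000.0]

# The same quantities expressed in multiples of 10K
price_10k = [p / 10000 for p in price]
price_pred_10k = [p / 10000 for p in price_pred]

print("MSE, actual values  :", mse(price, price_pred))
print("MSE, multiples of 10K:", mse(price_10k, price_pred_10k))
print("R^2, actual values  :", r_squared(price, price_pred))
print("R^2, multiples of 10K:", r_squared(price_10k, price_pred_10k))
```

Here the MSE drops from 175,000,000 to 1.75 purely because of the change of units, while R-Squared is approximately 0.944 in both cases.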

MSE or R-Squared – Which one to Use?

It is recommended to use R-Squared, or rather adjusted R-Squared, for evaluating the performance of regression models. This is primarily because R-Squared captures the fraction of response variance explained by the regression and tends to give a better picture of the quality of the model. MSE values, in contrast, differ based on whether the values of the response variable are scaled or not. Root mean squared error (RMSE) is easier to interpret than MSE because it is expressed in the same units as the response variable, but it still changes when the response variable is rescaled.

Either MSE or R-Squared can be used, depending on what is appropriate for the problem at hand.
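
Since adjusted R-Squared is mentioned above, here is a minimal sketch of the commonly used formula, adjusted R-Squared = 1 – (1 – R-Squared) × (n – 1) / (n – p – 1), where n is the number of observations and p the number of predictors. The post itself does not give this formula, so treat the function name, its parameters and the sample numbers as assumptions made for illustration.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R-Squared: penalizes R-Squared for the number of predictors."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Hypothetical model: R-Squared of 0.75 from 101 observations and 5 predictors
adj_r2 = adjusted_r_squared(0.75, n_samples=101, n_features=5)
print("Adjusted R^2:", adj_r2)
```

The adjusted value is slightly lower than the plain R-Squared, and the gap widens as predictors are added without a matching improvement in fit.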

MSE or R-Squared Python Code Example

Here is the Python code showing how to calculate the mean squared error and R-Squared values while working with regression models. Pay attention to the following in the code given below:

  • Sklearn.metrics mean_squared_error and r2_score are used for measuring the MSE and R-Squared values. The inputs to these functions are the actual values and the predicted values.
  • The Sklearn Boston housing dataset is used for training a multiple linear regression model using Sklearn.linear_model LinearRegression.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score
from sklearn import datasets

# Load the Sklearn Boston Dataset
# Note: load_boston was removed in scikit-learn 1.2, so this needs an earlier version
boston_ds = datasets.load_boston()
X = boston_ds.data      # feature matrix
y = boston_ds.target    # response variable (housing price)

# Create a training and test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a pipeline (feature standardization followed by linear regression)
# using the training dataset and related labels
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

# Calculate the predicted value for training and test dataset
y_train_pred = pipeline.predict(X_train)
y_test_pred = pipeline.predict(X_test)

# Mean Squared Error
print('MSE train: %.3f, test: %.3f' % (mean_squared_error(y_train, y_train_pred),
                mean_squared_error(y_test, y_test_pred)))

# R-Squared
print('R^2 train: %.3f, test: %.3f' % (r2_score(y_train, y_train_pred),
                r2_score(y_test, y_test_pred)))


Here is the summary of what you learned in this post regarding mean squared error (MSE) and R-Squared, and which one to use:

  • MSE represents the residual error, which is the average of the squared differences between the actual values and the predicted / estimated values.
  • R-Squared represents the fraction of response variance captured by the regression model.
  • The disadvantage of using MSE is that its value varies based on whether the values of the response variable are scaled or not. If the response is scaled down (for example, to multiples of 10K), the MSE will be lower than for the unscaled values.
Ajitesh Kumar
