# Ridge Regression Concepts & Python example Ridge regression is a type of linear regression that penalizes ridge coefficients. This technique can be used to reduce the effects of multicollinearity in ridge regression, which may result from high correlations among predictors or between predictors and independent variables. In this tutorial, we will explain ridge regression with a Python example.

## What is Ridge Regression?

Ridge regression is a type of linear regression technique that is used in machine learning to reduce the overfitting of linear models. Recall that Linear regression is a method of modeling data that represents relationships between a response variable and one or more predictor variables. Ridge regression is used when there are multiple variables that are highly correlated. It helps to prevent overfitting by penalizing the coefficients of the variables. Ridge regression reduces the overfitting by adding a penalty term to the error function that shrinks the size of the coefficients. Ridge regression is similar to ordinary least squares regression, but the penalty term ensures that the coefficients do not become too large. This can be beneficial when there is a lot of noise in the data, as it prevents the model from being too sensitive to individual data points. Ridge regression is often used in conjunction with other machine learning methods, such as cross-validation, to further reduce overfitting. Ridge regression is also less sensitive to outliers than linear regression. The downside of ridge regression is that it can be computationally intensive and can require more data to achieve accurate results.

## How does Ridge regression work?

Ridge regression works by adding a penalty term to the cost function, the penalty term being proportional to the sum of the squares of the coefficients. The penalty term is called the L2 norm. The result is that the optimization problem becomes easier to solve and the coefficients become smaller. This penalty term encourages the model to find a balance between fitting the training data well and having low complexity. As a result, ridge regression can help to improve the generalizability of a machine learning model.

The cost function for ridge regression looks like this. You may note that the cost function comprises two functions. The first one is the cost function same as the one used for the linear regression model. This term ensures that the training data fits well. The second term is called the L2 penalty or regularization term. The goal of this term is to keep the parameters small.

where y is the predicted value, x is the input value, β is the coefficient, and λ is the penalty term. As you can see, the penalty term is added to the error term. In ridge regression, we want to minimize both the error and the size of the coefficients. By adding the penalty term, we are encouraged to find a balance between these two objectives.

Ridge regression penalizes the sum of the squared coefficients, or beta values, in order to reduce the variance of the estimates. It shrinks the coefficients and thus reduces the standard errors. The penalty term serves to reduce the magnitude of the coefficients, and it also helps to prevent overfitting. As a result, Ridge regression can provide improved predictive accuracy.  This ultimately results in more stable and accurate predictions. Ridge regression also has the ability to handle nonlinear relationships between predictor and outcome variables better than linear regression.

Ridge regression has a number of advantages over least squares or linear regression. First, it is more robust to collinearity than least-squares/linear regression. Second, it can be used even when there are outliers in the data. Third, ridge regression does not require the data to be perfectly normalized. Finally, ridge regression can be applied even when the number of variables is greater than the number of observations.

However, ridge regression also has some disadvantages. First, it can be computationally expensive if the data set is large. Second, it can be difficult to interpret the results of ridge regression because the Ridge term or L2 norm modifies the coefficients. This is because the cost function contains a quadratic term, which makes it more difficult to optimize. In addition, ridge regression does not provide an exact solution and instead only provides a closed-form approximation. This can make it difficult to interpret the results of the model. Finally, ridge regression is sensitive to outliers and can produce unstable results if the data contains outliers.

## Ridge Regression Python Example

Python provides a number of Ridge regression implementations, including Ridge from the scikit-learn package and RidgeCV from the statsmodels package. The code below uses Ridge class from Sklearn.linear_model to perform ridge regression. Unlike standard linear regression, which minimizes the sum of squared errors, ridge regression also includes a penalty term that minimizes the sum of squared coefficients. This penalty term is known as the alpha value. The sklearn library in Python implements ridge regression with the Ridge class. The Ridge class takes an alpha parameter, which specifies the amount of regularization to apply.

The example below shows how to use ridge regression to predict the prices of houses in Boston using the dataset from the scikit-learn package. The code first splits the data into training and test sets and then fits a ridge regression model on the training set. An instance of Ridge is created with a value of alpha as 0.1. The alpha value determines how much weight is given to the penalty term. A higher alpha value means that more weight is given to the penalty term, and a lower alpha value means that less weight is given to the penalty term. In this example, the alpha value is set to 0.1. This means that the penalty term will be given a weight of 0.1.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, r2_score
from sklearn import datasets
#
# Load the Sklearn Boston Dataset
#
X = boston_ds.data
y = boston_ds.target
#
# Create a training and test split
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
#
# Fit a pipeline using Training dataset and related labels
# Use Ridge algorithm for training the model
#
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipeline.fit(X_train, y_train)
#
# Calculate the predicted value for training and test dataset
#
y_train_pred = pipeline.predict(X_train)
y_test_pred = pipeline.predict(X_test)
#
# Mean Squared Error
#
print('MSE train: %.3f, test: %.3f' % (mean_squared_error(y_train, y_train_pred),
mean_squared_error(y_test, y_test_pred)))
#
# R-Squared
#
print('R^2 train: %.3f, test: %.3f' % (r2_score(y_train, y_train_pred), r2_score(y_test, y_test_pred)))


## References ## Ajitesh Kumar

I have been recently working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia, etc, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. For latest updates and blogs, follow us on Twitter. I would love to connect with you on Linkedin. Check out my latest book titled as First Principles Thinking: Building winning products using first principles thinking
Posted in Data Science, Machine Learning, Python. Tagged with , , .