In this post, you will learn the concepts of Lasso regression along with Python Sklearn examples. The Lasso regression algorithm introduces a penalty against model complexity (a large number of parameters) using a regularization parameter. The other two similar forms of regularized linear regression are Ridge regression and Elastic Net regression, which will be discussed in future posts. In this post, the following topics are discussed: what Lasso regression is, a Lasso regression Python example, and a Lasso regression cross-validation Python example.
What’s Lasso Regression?
Lasso regression is a machine learning algorithm that performs linear regression while also reducing the number of features used in the model. Lasso stands for least absolute shrinkage and selection operator. Pay attention to the words “least absolute shrinkage” and “selection”; we will come back to them shortly. Lasso regression is used in machine learning to prevent overfitting. It is also used to select features, because it can set some coefficients exactly to zero. Lasso regression is also referred to as L1-norm regularization, because its penalty is the L1 norm of the weights.
Lasso regression extends linear regression by adding a penalty term to its loss function (ordinary least squares): a regularization parameter multiplied by the sum of the absolute values of the weights. It is therefore a form of regularized linear regression. The idea is to penalize complexity: as the value of the regularization parameter increases, the weights must shrink so that the overall loss (sum of squares plus penalty) stays minimized. The hypothesis, or mathematical model (equation), for Lasso regression is the same as for linear regression: the prediction is a weighted sum of the input features plus an intercept. What differs is the loss function.
The loss function of Lasso regression is the loss function of linear regression (least squares) plus one extra term: the regularization parameter multiplied by the sum of the absolute values of the weights.
The extra term (fig 4) is the product of the regularization parameter \(\lambda\) and the sum of the absolute values of the weights; “m” is a constant. As the regularization parameter increases, the regularization strength increases, so the absolute values of the weights need to decrease (shrink) to keep the overall value of the loss function minimized. Optimizing the Lasso loss function results in some of the weights becoming exactly zero, so it can be seen as a method of feature selection. Pay attention to the words shrinkage, selection, and absolute: this is why LASSO is short for least absolute shrinkage and selection operator.
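As a sketch of the quantities described above, one common way to write them is shown below. The notation is an assumption rather than a copy of the post's figures: \(m\) is taken to be the number of training examples, \(n\) the number of features, \(w_j\) the weights, and \(b\) the intercept. The scaling constants (such as \(\frac{1}{2m}\)) vary between texts; the form used here follows the objective documented for scikit-learn's Lasso class, with \(\lambda\) in place of alpha.

```latex
% Hypothesis: the same as in linear regression
\[ \hat{y}^{(i)} = b + \sum_{j=1}^{n} w_j x_j^{(i)} \]

% Loss function of linear regression (ordinary least squares)
\[ J_{\text{OLS}}(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2 \]

% Loss function of Lasso regression: least squares plus the L1 penalty
\[ J_{\text{Lasso}}(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \lvert w_j \rvert \]
```

Setting \(\lambda = 0\) in the last expression recovers the least-squares loss, which is one way to see that the only difference between the two is the penalty term.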
Optimizing the LASSO loss function does result in some of the weights becoming zero. Thus, some of the features will be removed as a result. This is why LASSO regression is considered to be useful as a supervised feature selection technique.
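The loss that this optimization minimizes can also be evaluated by hand. The following is a minimal sketch with NumPy, not a definitive implementation: the function name `lasso_loss`, the toy data, and the \(\frac{1}{2m}\) scaling of the squared-error term are assumptions for illustration, `lam` plays the role of \(\lambda\), and the intercept is left out to keep the example short.

```python
import numpy as np


def lasso_loss(X, y, w, lam):
    """Least-squares loss plus an L1 penalty on the weights (illustrative sketch)."""
    m = X.shape[0]                        # number of training examples
    residuals = y - X @ w                 # prediction errors
    squared_error = (residuals @ residuals) / (2 * m)
    l1_penalty = lam * np.sum(np.abs(w))  # lambda times the sum of |w_j|
    return squared_error + l1_penalty


# Hypothetical toy data: 4 examples, 2 features
X = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.0])                  # fits y exactly, since y equals the first column

print(lasso_loss(X, y, w, lam=0.0))       # 0.0: a perfect fit and no penalty
print(lasso_loss(X, y, w, lam=0.5))       # 0.5: same fit, plus 0.5 * (|1.0| + |0.0|)
```

Even with a perfect fit, the second call reports a nonzero loss. That is the pressure that pushes weights toward zero as the regularization parameter grows.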
Lasso Regression Python Example
In Python, Lasso regression can be performed using the Lasso class from the sklearn.linear_model module. The Lasso class takes a parameter called alpha, which represents the strength of the regularization term. A higher alpha value results in a stronger penalty and, typically, fewer features being used in the model. For example, an alpha of 1.0 will generally remove more features from the model than an alpha of 0.1. The Lasso class also has a fit() method for fitting the model to training data and a predict() method for making predictions on new data.
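The effect of alpha on the number of features kept can be checked with a small experiment. The sketch below uses synthetic data from make_regression instead of a real dataset; the data, the chosen alpha values, and the variable names are assumptions for illustration. The exact counts depend on the data, so no particular output is claimed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): 20 features, 5 of them informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

nonzero_counts = {}
for alpha in [0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha)
    model.fit(X, y)
    # Count the features whose coefficient was not shrunk to exactly zero
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha}: {nonzero_counts[alpha]} of {X.shape[1]} features kept")
```

On data like this, the count of kept features is expected to fall as alpha grows, which is the behaviour described above.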
Here is the Python code which can be used for fitting a model using LASSO regression. Pay attention to some of the following in the code given below:
- The Sklearn Boston Housing dataset is used for training the Lasso regression model.
- The Lasso class from sklearn.linear_model is used as the Lasso regression implementation. The value of the regularization parameter (alpha) is passed as 1.0.
    from sklearn import datasets
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import train_test_split

    # Load the Boston housing data set.
    # Note: load_boston was removed in scikit-learn 1.2, so this call
    # requires an older scikit-learn release.
    bh = datasets.load_boston()
    X = bh.data
    y = bh.target

    # Create the training and test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Create an instance of the Lasso regression implementation
    lasso = Lasso(alpha=1.0)

    # Fit the Lasso model
    lasso.fit(X_train, y_train)

    # Report the model score (R^2) on the test and training data
    print(lasso.score(X_test, y_test), lasso.score(X_train, y_train))
Once the model is fit, one can look at the coefficients by printing the lasso.coef_ attribute. It is interesting to find that some of the coefficient values are exactly zero.
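As a rough illustration of inspecting the coefficients, the sketch below fits a Lasso model and lists which features were dropped and which were kept. It uses synthetic data as a stand-in for a real dataset, and the variable names `dropped` and `kept` are assumptions introduced here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in for a real data set (assumption): 10 features, 3 informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# One coefficient per feature
print(lasso.coef_)

# Indices of the features that were dropped (coefficient exactly zero) and kept
dropped = np.flatnonzero(lasso.coef_ == 0)
kept = np.flatnonzero(lasso.coef_ != 0)
print("dropped feature indices:", dropped)
print("kept feature indices:", kept)
```

The kept indices are the features the model actually uses, which is how Lasso doubles as a feature selection step.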
Lasso Regression Cross-validation Python Example
In this section, you will see how you could use the cross-validation technique with Lasso regression. Pay attention to some of the following:
- LassoCV from sklearn.linear_model is used as the Lasso regression cross-validation implementation. It selects the value of alpha by cross-validation.
- LassoCV takes a parameter “cv”, which represents the number of folds to use for cross-validation. In the example below, the value of cv is set to 5.
- The entire dataset is passed to fit() and to score(). This is unlike the 2-way or 3-way holdout method, where the model is trained and tested on different data splits.
- The score of the LassoCV model is found to be higher than that of the Lasso model above. Note that this score is computed on the same data the model was fit on, whereas the earlier test score was computed on held-out data, so the two numbers are not directly comparable.
    from sklearn import datasets
    from sklearn.linear_model import LassoCV

    # Load the Boston housing data set.
    # Note: load_boston was removed in scikit-learn 1.2, so this call
    # requires an older scikit-learn release.
    bh = datasets.load_boston()
    X = bh.data
    y = bh.target

    # Create an instance of the LassoCV implementation with 5-fold cross-validation
    lasso_cv = LassoCV(cv=5)

    # Fit the model; alpha is chosen by cross-validation
    lasso_cv.fit(X, y)

    # Report the model score (R^2) on the data used for fitting
    print(lasso_cv.score(X, y))
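For a score that can be compared with a held-out evaluation, one option is to combine LassoCV with a train/test split. This is a minimal sketch of that variation, not the approach used above. The synthetic data is an assumption for illustration, while alpha_ is the fitted LassoCV attribute that holds the alpha chosen by cross-validation.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 20 features, 5 of them informative
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Hold out 30% of the data; cross-validation runs only on the training part
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

lasso_cv = LassoCV(cv=5)
lasso_cv.fit(X_train, y_train)

print("alpha chosen by cross-validation:", lasso_cv.alpha_)
print("R^2 on training data:", lasso_cv.score(X_train, y_train))
print("R^2 on held-out test data:", lasso_cv.score(X_test, y_test))
```

Here the cross-validation folds are used only to pick alpha, and the test split gives an estimate of performance on data the model has not seen.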
Here is the summary of what you learned in relation to LASSO regression:
- Lasso regression extends linear regression by adding a regularization term to the least-squares loss function, in order to penalize complexity (a large number of features) by shrinking the weights.
- Increasing regularization parameter value (strength) results in weights getting reduced. This may result in some of the weights becoming zero. This is why Lasso regression is also considered for supervised feature selection.
- Use LassoCV implementation for applying cross-validation to Lasso regression.
- A higher alpha value results in a stronger penalty, and therefore fewer features being used in the model. In other words, a higher alpha value such as 1.0 results in more features being removed from the model than a value such as 0.1.