In this post, you will learn the concepts of Lasso regression along with Python Sklearn examples. The Lasso regression algorithm introduces a penalty against model complexity (a large number of parameters) using a regularization parameter. Two other similar forms of regularized linear regression are Ridge regression and Elasticnet regression, which will be discussed in future posts. In this post, the following topics are discussed:

• What’s Lasso regression?
• Lasso regression python example
• Lasso regression cross validation python example

## What’s Lasso Regression?

LASSO stands for least absolute shrinkage and selection operator. Pay attention to the words “least absolute shrinkage” and “selection”. We will return to them shortly. Lasso regression is also known as linear regression with L1-norm regularization.

Lasso regression is an extension of linear regression in which a regularization parameter multiplied by the sum of the absolute values of the weights is added to the loss function (ordinary least squares) of linear regression. Lasso regression is therefore a form of regularized linear regression. The idea is to penalize complexity by adding a regularization term such that, as the value of the regularization parameter increases, the weights are reduced. The hypothesis, or mathematical model (equation), for Lasso regression is the same as for linear regression. What differs is the loss function.
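
As a sketch, the shared hypothesis can be written as follows. The notation is an assumption made here for illustration: $$n$$ features $$x_1, \dots, x_n$$, weights $$w_1, \dots, w_n$$ and an intercept $$w_0$$.

$$
h_w(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
$$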

Here is the loss function of LASSO regression. Compare it with the loss function of linear regression.
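
One common way to write the two loss functions is sketched below. The notation is an assumption: $$m$$ is the number of training examples, $$n$$ the number of features, $$h_w(x^{(i)})$$ the model's prediction for the $$i$$-th example, and $$\lambda$$ the regularization parameter. Scaling conventions vary between texts. This one matches the objective documented for scikit-learn's Lasso class, where $$\lambda$$ is called alpha.

$$
J_{\text{linear}}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2
$$

$$
J_{\text{lasso}}(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} |w_j|
$$

The only difference is the second term, $$\lambda \sum_{j=1}^{n} |w_j|$$, which is the L1 penalty on the weights. Other texts multiply this penalty by a constant, which only rescales $$\lambda$$.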

Note that Lasso regression’s loss function contains an extra term, the regularization (penalty) term.

The equation in fig 4 represents the regularization parameter $$\lambda$$ and the summation of the absolute values of the weights. “m” represents a constant. With an increasing value of the regularization parameter, which means increasing regularization strength, the absolute values of the weights need to decrease (shrink) to keep the overall value of the loss function minimized. Optimizing the Lasso loss function results in some of the weights becoming zero, and hence it can be seen as a method of feature selection. Pay attention to the words shrinkage, selection and absolute. This is why LASSO is termed the least absolute shrinkage and selection operator.
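
To make shrinkage and selection concrete, here is a minimal sketch for the simplest possible case: a single feature and no intercept, with the loss written as (1/(2m)) * sum((y_i - w*x_i)^2) + lambda * |w|. In that special case the minimizing weight has a closed form known as soft-thresholding. The function names and the toy data are hypothetical and chosen only for illustration. This is not how scikit-learn is implemented internally.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return 0.0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_weight_1d(x, y, lam):
    """Weight w minimizing (1/(2m)) * sum((y_i - w*x_i)^2) + lam * |w|."""
    m = len(x)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / m  # average of x_i * y_i
    z = sum(xi * xi for xi in x) / m                # average of x_i^2
    return soft_threshold(rho, lam) / z


# Toy data (assumption): y = 2x exactly, so the unregularized weight is 2.0
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

for lam in (0.0, 7.5, 15.0, 20.0):
    print(lam, lasso_weight_1d(x, y, lam))
```

With this toy data the weight goes from 2.0 (no penalty) to 1.0 and then to exactly 0.0 as lambda grows. The first step is the shrinkage and the last is the selection.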

Optimizing the LASSO loss function drives some of the weights to exactly zero, so the corresponding features are effectively removed from the model. This is why LASSO regression is considered useful as a supervised feature selection technique.
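
The effect can be observed with a small experiment. The sketch below uses synthetic data from make_regression (an assumption made for illustration, not the dataset used later in this post) and counts how many coefficients remain non-zero as alpha, the regularization parameter in Sklearn, increases.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (assumption): 10 features, of which only 3 influence y
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

alphas = [0.01, 1.0, 10.0, 1000.0]
nonzero_counts = []
for alpha in alphas:
    # Fit one Lasso model per regularization strength
    model = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))

for alpha, count in zip(alphas, nonzero_counts):
    print(f"alpha={alpha}: {count} non-zero coefficients")
```

With a sufficiently large alpha, every coefficient is driven to zero and the model predicts only the intercept.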

## Lasso Regression Python Example

Here is Python code that can be used for fitting a model using LASSO regression. Pay attention to the following in the code given below:

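• The Sklearn Boston Housing dataset is used for training the Lasso regression model. Note that datasets.load_boston was removed in scikit-learn 1.2, so the code requires an older version.
• The Lasso class from sklearn.linear_model is used as the Lasso regression implementation. The regularization parameter (alpha) is set to 1.0.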
from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
#
# Load the Boston Data Set
# (load_boston is only available in scikit-learn versions before 1.2)
#
bh = datasets.load_boston()
X = bh.data
y = bh.target
#
# Create training and test split
#
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
#
# Create an instance of Lasso Regression implementation
#
lasso = Lasso(alpha=1.0)
#
# Fit the Lasso model
#
lasso.fit(X_train, y_train)
#
# Compute the R^2 score on the test data and on the training data
#
lasso.score(X_test, y_test), lasso.score(X_train, y_train)


Once the model is fit, one can look at the coefficients by printing lasso.coef_. You will find that some of the coefficients are exactly zero.
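
Here is a minimal sketch of how the coefficients could be inspected. Because the Boston dataset is no longer shipped with recent scikit-learn versions, this sketch swaps in the bundled diabetes dataset (load_diabetes). That substitution is an assumption made so the example runs on current versions, and the coefficients will differ from those of the Boston model above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

# Substitute dataset (assumption): diabetes data instead of Boston housing
data = load_diabetes()
X, y = data.data, data.target
feature_names = list(data.feature_names)

lasso = Lasso(alpha=1.0).fit(X, y)

# Split the features by whether their coefficient is exactly zero
dropped = [name for name, w in zip(feature_names, lasso.coef_) if w == 0.0]
kept = [name for name, w in zip(feature_names, lasso.coef_) if w != 0.0]

print("coefficients:", np.round(lasso.coef_, 2))
print("kept features:", kept)
print("dropped features:", dropped)
```

The features listed as dropped are the ones Lasso has effectively removed from the model.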

## Lasso Regression Cross-Validation Python Example

In this section, you will see how to use the cross-validation technique with Lasso regression. Pay attention to the following:

• LassoCV from sklearn.linear_model is used as the cross-validated Lasso regression implementation.
• LassoCV takes a parameter, “cv”, which sets the number of folds used for cross-validation. In the example below, cv is set to 5.
• Also, the entire dataset is used for both fitting and scoring. This is unlike the 2-way or 3-way holdout method, where the model is trained and tested on different data splits.
• The score of the LassoCV model is found to be greater than that of the Lasso model above. The comparison is not like-for-like, however: the LassoCV score is computed on the same data the model was fit on, and LassoCV selects the regularization parameter by cross-validation instead of fixing it at 1.0.
from sklearn import datasets
from sklearn.linear_model import LassoCV
#
# Load the Boston Data Set
# (load_boston is only available in scikit-learn versions before 1.2)
#
bh = datasets.load_boston()
X = bh.data
y = bh.target
#
# Create an instance of the cross-validated Lasso Regression implementation
#
lasso_cv = LassoCV(cv=5)
#
# Fit the Lasso model; alpha is selected using 5-fold cross-validation
#
lasso_cv.fit(X, y)
#
# Compute the R^2 score on the same data used for fitting
#
lasso_cv.score(X, y)
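
For a score that can be compared more fairly with a held-out test score, one option is to run LassoCV on a training split only and evaluate on a separate test split. The sketch below is one way to do that, not the approach used above. It again swaps in the diabetes dataset (an assumption, so that it runs on current scikit-learn versions), and it prints the regularization parameter that cross-validation selected, which LassoCV exposes as alpha_.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Substitute dataset (assumption): diabetes data instead of Boston housing
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Cross-validation for choosing alpha happens inside the training split only
lasso_cv = LassoCV(cv=5).fit(X_train, y_train)

print("selected alpha:", lasso_cv.alpha_)
print("train R^2:", lasso_cv.score(X_train, y_train))
print("test R^2:", lasso_cv.score(X_test, y_test))
```

Keeping the test split out of the cross-validation means the test score reflects data the model has never seen, including during the selection of alpha.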


## Conclusions

Here is the summary of what you learned in relation to LASSO regression:

• Lasso regression extends linear regression by adding a regularization term to the least squares loss function in order to penalize complexity (a large number of features) by shrinking the weights.
• Increasing the regularization parameter value (strength) shrinks the weights, and some of them may become exactly zero. This is why Lasso regression is also used for supervised feature selection.
• Use the LassoCV implementation to apply cross-validation to Lasso regression.