Lasso Regression Explained with Python Example

Lasso regression, also known as L1 regularization, is a linear regression method that uses regularization to prevent overfitting and improve model performance. It works by adding a penalty term to the cost function that encourages the model to select only the most important features and set the coefficients of less important features to zero. This makes Lasso regression a popular method for feature selection and high-dimensional data analysis.

In this post, you will learn concepts, advantages and limitations of Lasso regression along with Python Sklearn examples. The other two similar forms of regularized linear regression are Ridge regression and Elasticnet regression which will be discussed in future posts. 

What’s Lasso Regression?

Lasso regression is a machine learning algorithm that can be used to perform linear regression while also reducing the number of features used in the model. Lasso stands for least absolute shrinkage and selection operator. Pay attention to the words, “least absolute shrinkage” and “selection”. We will refer to it shortly. Lasso regression is used in machine learning to prevent overfitting. It is also used to select features by setting coefficients to zero. Lasso regression is also called L1-norm regularization. In L1 regularization, a penalty term is added to the cost function that is proportional to the sum of the absolute values of the coefficients. This encourages the model to select only the most important features and set the coefficients of less important features to zero.

Compared to other regularization methods, such as Ridge regression that uses L2 regularization, Lasso regression has the advantage of producing sparse solutions, where only a subset of the features are used in the model. This makes Lasso regression a popular method for feature selection and high-dimensional data analysis.

One limitation of Lasso regression is that it tends to work better when the number of features is smaller than the number of samples. This is because Lasso regression can completely eliminate some features from the model by setting their coefficients to zero, which can be problematic when the number of features is large.

Lasso regression is an extension of linear regression in the manner that a regularization parameter multiplied by the summation of the absolute value of weights gets added to the loss function (ordinary least squares) of linear regression. Lasso regression is also called regularized linear regression. The idea is to induce the penalty against complexity by adding the regularization term such that with increasing value of the regularization parameter, the weights get reduced (and, hence penalty induced) to keep the overall goal of the minimized sum of squares. The hypothesis or the mathematical model (equation) for Lasso regression is the same as linear regression and can be expressed as the following. However, what is different is loss function.

Lasso Regression Hypothesis Function
Fig 1. Lasso Regression Hypothesis Function

Here is the loss function of LASSO regression. Compare it with the loss function of linear regression.

Lasso Regression Loss Function
Fig 2. Lasso Regression Loss Function

Compare it with the linear regression loss function.

Linear Regression Loss Function
Fig 3. Linear Regression Loss Function

You may note that in Lasso regression’s loss function, there is an extra element such as the following:

Regularization parameter with absolute summation of weights values
Fig 4. Regularization parameter with absolute summation of weights values

The equation in fig 4 represents the regularization parameter \(\lambda\) and summation of absolute values of weights. “m” represents the constant. The increasing value of the regularization parameter means increasing regularization strength, the absolute values of weights would need to decrease (shrink) to keep the overall value of the loss function minimized. The optimization of the Lasso loss function results in some of the weights becoming zero and hence can be seen as a method of selection of the features. Pay attention to the usage of words, shrinkage, selection, and absolute. This is why LASSO is termed as Least absolute shrinkage and selection operator.

Optimizing the LASSO loss function does result in some of the weights becoming zero. Thus, some of the features will be removed as a result. This is why LASSO regression is considered to be useful as a supervised feature selection technique.

Lasso Regression Python Example

In Python, Lasso regression can be performed using the Lasso class from the sklearn.linear_model library. The Lasso class takes in a parameter called alpha which represents the strength of the regularization term. A higher alpha value results in a stronger penalty, and therefore fewer features being used in the model. In other words, a higher alpha value such as 1.0 results in more features being removed from the model than a value such as 0.1. The Lasso class also has a fit() method that can be used to fit the model to training data, and a predict() method that can be used to make predictions on new data.

Here is the Python code which can be used for fitting a model using LASSO regression. Pay attention to some of the following in the code given below:

  • Sklearn Breast Cancer dataset is used for training Lasso regression model
  • Sklearn.linear_model Lasso class is used as Lasso regression implementation. The value of the regularization parameter is passed as 0.1
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Load the diabetes dataset
diabetes = datasets.load_diabetes()

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.3, random_state=42)

# Scale the data using StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit Lasso regression model
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

# Evaluate model performance on test set
y_pred = lasso.predict(X_test)

# Model Score
print("Model Score: ", lasso.score(X_test, y_test))

# Lasso Coefficient
lasso.coef_

Once the model is fit, one can look into the coefficients by printing lasso.coef_ command. It will be interesting to find that some of the coefficients value is found to be zero. Here is the screenshot:

Lasso Regression Cross-validation Python Example

In this section, you will see how you could use the cross-validation technique with Lasso regression. Pay attention to some of the following:

  • Sklearn.linear_model LassoCV is used as Lasso regression cross validation implementation.
  • LassoCV takes one of the parameter inputs as “cv” which represents a number of folds to be considered while applying cross-validation. In the example below, the value of cv is set to 5.
  • Also, the entire dataset is used for training and testing purposes. This is unlike the 2-way or 3-way holdout method where the model is trained and tested on different data split.
  • The model performance of the LassoCV model is found to be greater than the Lasso regression algorithm.
from sklearn import datasets
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
#
# Load the Boston Data Set
#
bh = datasets.load_boston()
X = bh.data
y = bh.target
#
# Create an instance of Lasso Regression implementation
#
lasso_cv = LassoCV(cv=5)
#
# Fit the Lasso model
#
lasso_cv.fit(X, y)
#
# Create the model score
#
lasso_cv.score(X, y)

The following code can be used for fine tuning Lasso regularization parameter using Grid search. The code given below has used the GridSearchCV function from scikit-learn, which allows us to search over a range of parameter values and select the best combination based on cross-validation performance.

from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10]}

# Perform grid search with cross-validation
lasso_cv = GridSearchCV(Lasso(), param_grid, cv=5)
lasso_cv.fit(X_train, y_train)

# Print best parameter values and score
print("Best Parameters:", lasso_cv.best_params_)
print("Best Score:", lasso_cv.best_score_)

Conclusions

Here is the summary of what you learned in relation to LASSO regression:

  • Lasso regression extends Linear regression in the way that a regularization element is added to the least-squares loss function of linear regression in order to induce the penalty (decrease weights) against complexity (large number of features)
  • Increasing regularization parameter value (strength) results in weights getting reduced. This may result in some of the weights becoming zero. This is why Lasso regression is also considered for supervised feature selection.
  • Use LassoCV implementation for applying cross-validation to Lasso regression.
  • A higher alpha value results in a stronger penalty, and therefore fewer features being used in the model. In other words, a higher alpha value such as 1.0 results in more features being removed from the model than a value such as 0.1.
Ajitesh Kumar
Follow me

Ajitesh Kumar

I have been recently working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia, etc, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. For latest updates and blogs, follow us on Twitter. I would love to connect with you on Linkedin. Check out my latest book titled as First Principles Thinking: Building winning products using first principles thinking
Posted in Data Science, Machine Learning, Python. Tagged with , , , .

2 Responses

Leave a Reply

Your email address will not be published. Required fields are marked *

Time limit is exhausted. Please reload the CAPTCHA.