Last updated: 2nd May, 2024
The success of machine learning models often depends on the quality of the features used to train them. This is where feature extraction and feature selection come in. In this blog post, we’ll explore the difference between feature selection and feature extraction, two key feature engineering techniques used to optimize feature sets for better model performance. Both are used for dimensionality reduction, which is key to reducing model complexity, given that higher model complexity often results in overfitting. We’ll provide examples of how they can be applied in real-world scenarios. If you want to improve your machine learning models, understanding the basics of feature selection and feature extraction is essential. So, let’s dive in and explore these concepts in more detail.
Feature selection is a process in machine learning that involves selecting the most relevant subset of features out of the original features in the dataset, to be used as inputs for training the model. The goal of feature selection is to improve model performance on unseen datasets by reducing the number of irrelevant or redundant features that may introduce high variance into the model, thereby resulting in the model overfitting.
Benefits of feature selection: By selecting only the most important features, the model can focus on the features that contribute most to its performance and ignore irrelevant or redundant features that may lead to overfitting. This can result in faster training times and improved accuracy on unseen data through reduced generalization error.
Disadvantages of feature selection being ignored: If we don’t adopt feature selection when training a machine learning model, we may encounter several problems.
The following represents some of the important feature selection techniques:
L1 norm regularization, also known as Lasso regularization, is a common regularization technique used in feature selection. It works by adding a penalty term that encourages the model to select only the most important features, while reducing the weights of irrelevant or redundant features to zero. L1 norm regularization introduces sparsity into the feature weights, meaning that only a subset of the features have non-zero weights. The other features are effectively ignored by the model, resulting in a form of automatic feature selection. L1 norm regularization can be particularly useful in cases where the dataset contains many features, and some of them are irrelevant or redundant.
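As a minimal sketch of this sparsity effect, the example below fits scikit-learn's Lasso on a synthetic regression dataset and reports which features keep a non-zero weight. The synthetic data, the alpha value, and the variable names are illustrative assumptions, not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 10 features, only 3 of which drive the target
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=42
)

# Scale features so the L1 penalty treats them comparably
X_scaled = StandardScaler().fit_transform(X)

# alpha controls the strength of the L1 penalty (illustrative value)
lasso = Lasso(alpha=1.0)
lasso.fit(X_scaled, y)

# Features whose weight was not shrunk to zero are the "selected" ones
selected_mask = lasso.coef_ != 0
selected_indices = np.flatnonzero(selected_mask)

print("Coefficients:", np.round(lasso.coef_, 3))
print("Selected feature indices:", selected_indices)
```

A larger alpha pushes more weights to zero, while a smaller alpha keeps more features, so alpha is usually tuned with cross-validation.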
Feature importance techniques fit an estimator such as a Random Forest and select features based on an attribute such as feature_importances_. This attribute provides a score for each feature in the dataset, indicating how important that feature is for making predictions. The scores are calculated from the reduction in impurity (e.g., Gini impurity or entropy) achieved by splitting the data on that feature. The feature with the highest score is considered the most important, while features with low scores can be considered less important or even irrelevant. The code below trains a Random Forest classifier on the Iris dataset and prints the importance of each feature.
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Load the IRIS dataset
iris = load_iris()
# Split data into features (X) and target variable (y)
X = iris.data
y = iris.target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the Random Forest classifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
# Get feature importances
importances = rf.feature_importances_
# Print feature importances
for feature, importance in zip(iris.feature_names, importances):
    print(f'{feature}: {importance}')
Running the code prints one importance score for each of the four Iris features. The scores are normalized, so they sum to 1.
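To turn importance scores into an actual feature subset, one option is scikit-learn's SelectFromModel, which keeps the features whose importance meets a threshold. The sketch below is an illustrative addition: the "median" threshold and the variable names are assumptions, and other thresholds may suit your data better.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Fit a Random Forest inside the selector and keep the features whose
# importance is at least the median importance (threshold is an assumption)
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42),
    threshold="median",
)
selector.fit(X_train, y_train)

# Boolean mask over the original columns, then the names of the kept features
mask = selector.get_support()
selected_names = [name for name, keep in zip(iris.feature_names, mask) if keep]

# Reduce both splits to the selected columns
X_train_selected = selector.transform(X_train)
X_test_selected = selector.transform(X_test)

print("Selected features:", selected_names)
print("Reduced training shape:", X_train_selected.shape)
```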
Greedy search algorithms, such as the following, are useful for algorithms (such as K-nearest neighbours, K-NN) that do not support regularization:
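One well-known greedy approach is sequential forward selection, which starts with no features and repeatedly adds the one that most improves a cross-validated score. The sketch below uses scikit-learn's SequentialFeatureSelector with a K-NN classifier; the dataset, the number of neighbours, and the target of two features are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# K-NN has no built-in regularization, so we wrap it in a greedy search
knn = KNeighborsClassifier(n_neighbors=5)

# Forward selection: add one feature at a time, scored by 5-fold cross-validation
sfs = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5
)
sfs.fit(X, y)

# Names of the features the greedy search kept
support = sfs.get_support()
chosen = [name for name, keep in zip(iris.feature_names, support) if keep]

X_reduced = sfs.transform(X)
print("Chosen features:", chosen)
print("Reduced shape:", X_reduced.shape)
```

Setting direction="backward" instead starts from all features and removes one at a time.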
According to the utilized training data (labeled, unlabeled, or partially labeled), feature selection methods can be divided into supervised, unsupervised, and semi-supervised models. According to their relationship with learning methods, feature selection methods can be classified into the following:
According to the evaluation criterion, feature selection methods can be derived from correlation, Euclidean distance, consistency, dependence and information measures. According to the type of output, feature selection methods can be divided into feature rank (weighting) and subset selection models.
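The difference between a feature ranking and a subset selection can be sketched with an information-based criterion. The example below scores each Iris feature with mutual information, prints a ranking, and then uses SelectKBest to keep a fixed-size subset. The choice of mutual information, k=2, and the fixed random_state are assumptions made for illustration.

```python
from functools import partial

import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

iris = load_iris()
X, y = iris.data, iris.target

# Fix the random state so the mutual information estimates are reproducible
score_func = partial(mutual_info_classif, random_state=0)

# Feature rank (weighting): one score per feature, sorted from best to worst
scores = score_func(X, y)
ranking = np.argsort(scores)[::-1]
for idx in ranking:
    print(f"{iris.feature_names[idx]}: {scores[idx]:.3f}")

# Subset selection: keep only the k highest-scoring features
selector = SelectKBest(score_func=score_func, k=2)
X_subset = selector.fit_transform(X, y)
subset_indices = np.flatnonzero(selector.get_support())
print("Subset indices:", subset_indices)
```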
Feature extraction is about deriving information from the original feature set to create a new feature subspace. The primary idea behind feature extraction is to compress the data while maintaining most of the relevant information. As with feature selection, these techniques are used to reduce the number of features in the original feature set in order to lower model complexity, limit overfitting, improve computational efficiency, and reduce generalization error. The following are different types of feature extraction techniques:
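Principal component analysis (PCA) is one widely used feature extraction technique, and it makes the idea of compressing data while retaining most of the information concrete. The sketch below projects the four Iris features onto two principal components; the dataset and the choice of two components are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data

# PCA is sensitive to feature scale, so standardize first
X_scaled = StandardScaler().fit_transform(X)

# Compress 4 original features into 2 new features (principal components)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Fraction of the total variance retained by the two components
retained = pca.explained_variance_ratio_.sum()

print("Original shape:", X.shape)
print("Transformed shape:", X_pca.shape)
print(f"Variance retained: {retained:.3f}")
```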
The key difference between feature selection and feature extraction techniques used for dimensionality reduction is that while the original features are maintained in the case of feature selection algorithms, the feature extraction algorithms transform the data onto a new feature space.
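This difference can be checked directly in code. In the sketch below, the columns returned by a feature selector are exact copies of original columns, whereas each PCA component mixes several original features. SelectKBest with the ANOVA F-score and PCA are used here only as convenient stand-ins for the two families of techniques.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

# Feature selection: keep 2 of the 4 original columns
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support()

# The selected data is exactly a subset of the original columns
same_columns = np.array_equal(X_selected, X[:, kept])
print("Selected columns are original columns:", same_columns)

# Feature extraction: build 2 new features from all 4 original ones
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)

# Each row of components_ holds the weights of the original features
# in one new feature; several non-zero weights mean the features are mixed
print("PCA loadings:")
print(np.round(pca.components_, 3))
```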
Feature selection techniques can be used if the requirement is to maintain the original features, unlike the feature extraction techniques which derive useful information from data to construct a new feature subspace. Feature selection techniques are used when model explainability is a key requirement.
Feature extraction techniques can be used to improve the predictive performance of models, especially for algorithms that don’t support regularization.
Unlike feature selection, feature extraction usually needs to transform the original data to features with strong pattern recognition ability, where the original data can be regarded as features with weak recognition ability.