In this post, you will learn about the difference between feature extraction and feature selection, covering both concepts and techniques. Both are used for dimensionality reduction, which is key to reducing model complexity and overfitting, and is one of the most important aspects of training machine learning models. As a data scientist, you should have a good understanding of dimensionality reduction techniques such as feature extraction and feature selection. In this post, the following topics will be covered:
- Feature selection concepts and techniques
- Feature extraction concepts and techniques
- When to use feature selection and feature extraction
Feature Selection Concepts & Techniques
Simply speaking, feature selection is about selecting a subset of the original features in order to reduce model complexity, improve the computational efficiency of models, and reduce the generalization error introduced by noisy, irrelevant features. The following are some important feature selection techniques:
- Regularization techniques, such as L1 (lasso) regularization, which drive many feature weights to exactly zero
- Feature importance techniques, such as fitting an estimator like the Random Forest algorithm and selecting features based on an attribute such as feature_importances_
- Greedy search algorithms, such as sequential feature selection, which are useful for algorithms (such as K-nearest neighbours, K-NN) that do not support regularization
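The first two techniques above can be sketched with scikit-learn. This is a minimal illustration, assuming a sample dataset (breast cancer) and scikit-learn's SelectFromModel wrapper; the regularization strength and estimator settings are illustrative, not prescriptive.

```python
# Sketch of two common feature selection approaches with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# 1) L1 regularization: the penalty drives many coefficients to exactly zero;
#    SelectFromModel keeps only the features with non-zero weights.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
X_l1 = SelectFromModel(l1_model).fit(X, y).transform(X)

# 2) Feature importance: fit a Random Forest and keep the features whose
#    feature_importances_ exceed the threshold (mean importance by default).
rf = RandomForestClassifier(n_estimators=100, random_state=0)
X_rf = SelectFromModel(rf).fit(X, y).transform(X)

print(X.shape[1], X_l1.shape[1], X_rf.shape[1])  # fewer columns after selection
```

Note that in both cases the retained columns are a subset of the original features, which is the defining property of feature selection.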
Feature Extraction Concepts & Techniques
Feature extraction is about deriving information from the original feature set to create a new feature subspace. The primary idea behind feature extraction is to compress the data while retaining most of the relevant information. Like feature selection techniques, these techniques reduce the number of features in order to reduce model complexity and overfitting, improve computational efficiency, and reduce generalization error. The following are different types of feature extraction techniques:
- Principal component analysis (PCA) for unsupervised data compression. Here is a detailed post on feature extraction using PCA with a Python example. PCA finds the directions of maximum variance in high-dimensional data and projects the data onto a new subspace with equal or fewer dimensions than the original one. This is explained with the example of identifying the Taj Mahal (one of the Seven Wonders of the World) from a top view or side view, based on the dimensions in which there is maximum variance. The diagram below shows the dimensions of maximum variance (PCA1 and PCA2) as a result of PCA.
- Linear discriminant analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability
- Nonlinear dimensionality reduction via kernel principal component analysis (KPCA)
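The three techniques above can be sketched side by side with scikit-learn. This is a minimal example, assuming the iris dataset; the component counts and the RBF kernel choice are illustrative.

```python
# Sketch of PCA, LDA, and kernel PCA for feature extraction.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# Unsupervised: project onto the 2 directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: project onto directions that maximize class separability
# (at most n_classes - 1 = 2 components for the 3 iris classes).
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# Nonlinear: PCA in an implicit feature space induced by an RBF kernel.
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)

print(X_pca.shape, X_lda.shape, X_kpca.shape)  # each (150, 2)
```

In each case the resulting columns are new derived features, not a subset of the original four, which is what distinguishes extraction from selection.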
When to use Feature Selection & Feature Extraction
The key difference between feature selection and feature extraction as dimensionality reduction techniques is that feature selection keeps a subset of the original features, while feature extraction transforms the data onto a new feature space.
Feature selection techniques can be used when the requirement is to maintain the original features, unlike feature extraction techniques, which derive useful information from the data to construct a new feature subspace. Feature selection is the better choice when model explainability is a key requirement.
Feature extraction techniques can be used to improve the predictive performance of models, especially for algorithms that don't support regularization.
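As an illustration of that last point, here is a hedged sketch comparing K-NN (which has no built-in regularization) on the raw features versus on a PCA-reduced feature space. The dataset, the choice of 5 components, and the scaling pipeline are all assumptions for the example; whether PCA actually helps depends on the data.

```python
# K-NN with and without PCA-based feature extraction.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Baseline: scale, then fit K-NN on all 30 original features.
knn_raw = make_pipeline(StandardScaler(), KNeighborsClassifier())

# With extraction: scale, compress to 5 principal components, then fit K-NN.
knn_pca = make_pipeline(StandardScaler(), PCA(n_components=5),
                        KNeighborsClassifier())

score_raw = cross_val_score(knn_raw, X, y, cv=5).mean()
score_pca = cross_val_score(knn_pca, X, y, cv=5).mean()
print(f"raw: {score_raw:.3f}  pca: {score_pca:.3f}")
```

Comparing cross-validated scores like this is a simple way to check whether dimensionality reduction pays off for a given model and dataset.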
Quiz – Test your knowledge
Here is a quick quiz you can use to check your knowledge on feature selection vs feature extraction.