Machine Learning – Feature Selection vs Feature Extraction


In this post, you will learn about the difference between feature extraction and feature selection concepts and techniques. Both are used for dimensionality reduction, which is key to reducing model complexity and overfitting. Dimensionality reduction is one of the most important aspects of training machine learning models, so as a data scientist you must have a good understanding of techniques such as feature extraction and feature selection. In this post, the following topics are covered:

  • Feature selection concepts and techniques
  • Feature extraction concepts and techniques
  • When to use feature selection and feature extraction

Feature Selection Concepts & Techniques

Simply speaking, feature selection is about selecting a subset of the original features in order to reduce model complexity, enhance the computational efficiency of the models, and reduce the generalization error introduced by noise from irrelevant features. The following are some of the important feature selection techniques:

  • Regularization techniques, such as L1-norm regularization, which drive the weights of most irrelevant features to zero
  • Feature importance techniques, such as fitting an estimator like the Random Forest algorithm and selecting features based on an attribute such as feature_importances_
  • Greedy search algorithms, which are useful for algorithms (such as K-nearest neighbours, KNN) that do not support regularization
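As a quick illustration, here is a minimal sketch of the first two techniques using scikit-learn (assumed available; the synthetic dataset and hyperparameters are purely illustrative): an L1-regularized model whose zero weights drop features, and a Random Forest whose feature_importances_ drive the selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Illustrative dataset: 10 features, only 3 of them informative.
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

# L1-norm regularization drives the weights of irrelevant features
# to zero; SelectFromModel keeps only the features with non-zero weights.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_selector = SelectFromModel(l1_model).fit(X, y)
X_l1 = l1_selector.transform(X)

# Feature importance: fit a Random Forest and keep the features whose
# feature_importances_ exceed the mean importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_selector = SelectFromModel(rf, threshold="mean", prefit=True)
X_rf = rf_selector.transform(X)

print(X.shape, X_l1.shape, X_rf.shape)
```

In both cases the result is a column subset of the original feature matrix, which is exactly what distinguishes feature selection from feature extraction.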

Feature Extraction Concepts & Techniques

Feature extraction is about deriving a new feature subspace from the original feature set. The primary idea behind feature extraction is to compress the data while retaining most of the relevant information. Like feature selection techniques, these techniques are used to reduce the number of features from the original feature set in order to reduce model complexity and overfitting, enhance the computational efficiency of the models, and reduce generalization error. The following are different types of feature extraction techniques:

  • Principal component analysis (PCA) for unsupervised data compression. Here is a detailed post on feature extraction using PCA with a Python example. You will get a good understanding of how PCA finds the directions of maximum variance in high-dimensional data and projects the data onto a new subspace with equal or fewer dimensions than the original one. This is explained with the example of identifying the Taj Mahal (one of the seven wonders of the world) from a top view or a side view based on the dimensions in which there is maximum variance. The diagram below shows the dimensions of maximum variance (PCA1 and PCA2) as a result of PCA.

    [Figure: principal component analysis explained]
  • Linear discriminant analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability
  • Nonlinear dimensionality reduction via kernel principal component analysis (KPCA)
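The three techniques above can be sketched with scikit-learn (assumed available; the Iris dataset and the kernel parameter are illustrative choices). Note that PCA is sensitive to feature scales, so the data is standardized first.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

# PCA: compress the 4 original features into 2 principal components
# along the directions of maximum variance (unsupervised).
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
print(X_pca.shape)  # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured per component

# LDA: supervised projection that maximizes class separability,
# so it uses the labels y as well as the data.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)

# Kernel PCA: nonlinear dimensionality reduction via the kernel trick.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
X_kpca = kpca.fit_transform(X_std)
```

Each transformer produces new derived features (components) rather than a subset of the original columns.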

When to use Feature Selection & Feature Extraction

The key difference between feature selection and feature extraction techniques used for dimensionality reduction is that while the original features are maintained in the case of feature selection algorithms, the feature extraction algorithms transform the data onto a new feature space.

Feature selection techniques can be used if the requirement is to maintain the original features, unlike feature extraction techniques, which derive useful information from the data to construct a new feature subspace. Feature selection techniques are preferred when model explainability is a key requirement.

Feature extraction techniques can be used to improve the predictive performance of the models, especially, in the case of algorithms that don’t support regularization.
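The key difference can be verified directly in code. The following sketch (scikit-learn assumed; SelectKBest and PCA stand in for the two families of techniques) shows that a selected feature is an exact copy of an original column, while an extracted component is not.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: every retained column is one of the original columns.
X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)
is_original = all(
    any(np.array_equal(X_sel[:, j], X[:, i]) for i in range(X.shape[1]))
    for j in range(X_sel.shape[1])
)
print(is_original)  # True: the original features are maintained

# Feature extraction: components are linear combinations of the
# original features, not copies of any single column.
X_pca = PCA(n_components=2).fit_transform(X)
is_copy = any(np.array_equal(X_pca[:, 0], X[:, i]) for i in range(X.shape[1]))
print(is_copy)  # False: the data lives in a new feature subspace
```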

Quiz – Test your knowledge

Here is a quick quiz you can use to check your knowledge on feature selection vs feature extraction.

  • Are feature selection and feature extraction methods one and the same?
  • Which techniques can be used for feature extraction?
  • Which techniques can be used for feature selection?
  • In which of the two families of techniques is the original feature set maintained?
  • Which technique is recommended when the original feature set must be maintained?
  • Which technique is recommended when model interpretability is a key requirement?

Ajitesh Kumar