Machine Learning – Feature Selection vs Feature Extraction


In this post you will learn about the difference between feature extraction and feature selection concepts and techniques. Both are used for dimensionality reduction, which is key to reducing model complexity and overfitting. Dimensionality reduction is one of the most important aspects of training machine learning models, so as a data scientist you need a good understanding of techniques such as feature extraction and feature selection. In this post, the following topics will be covered:

  • Feature selection concepts and techniques
  • Feature extraction concepts and techniques
  • When to use feature selection and feature extraction

Feature Selection Concepts & Techniques

Simply speaking, feature selection is about selecting a subset of the original features in order to reduce model complexity, improve the computational efficiency of the models and reduce the generalization error introduced by noise from irrelevant features. The following are some of the important feature selection techniques:

  • Regularization techniques such as L1 norm regularization, which, with sufficiently strong regularization, drives the weights of many features to exactly zero
  • Feature importance techniques, such as fitting an estimator like a Random Forest and selecting features based on the value of an attribute such as feature_importances_
  • Greedy search algorithms, which are useful for algorithms (such as K-nearest neighbours, K-NN) that do not support regularization
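
The three techniques above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the wine dataset, the choice of LinearSVC as the L1-regularized model, the value of C, the "median" importance threshold, and sequential backward selection as the greedy search are all choices made for this example, not prescriptions from the post.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumed example data: 178 wine samples, 13 features, 3 classes.
X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# 1. L1 regularization: a small C means a strong penalty, which pushes
#    some feature weights to exactly zero (C=0.01 is an arbitrary choice).
l1_model = LinearSVC(penalty="l1", dual=False, C=0.01)
l1_model.fit(X_std, y)
# Keep a feature if at least one class assigns it a non-zero weight.
l1_kept = (l1_model.coef_ != 0).any(axis=0)
print("Features kept by L1:", int(l1_kept.sum()), "of", X.shape[1])

# 2. Feature importance: fit a random forest, then keep the features whose
#    importance is at or above the median importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_
rf_selector = SelectFromModel(forest, threshold="median", prefit=True)
X_rf = rf_selector.transform(X)
print("Features kept by random forest:", X_rf.shape[1])

# 3. Greedy search: sequential backward selection wrapped around K-NN,
#    which has no regularization term of its own. Starting from all
#    features, it repeatedly drops the one whose removal hurts the
#    cross-validated score the least, until 5 features remain.
knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(
    knn, n_features_to_select=5, direction="backward", cv=5
)
X_sfs = sfs.fit_transform(X_std, y)
print("Features kept by sequential selection:", X_sfs.shape[1])
```

In all three cases the output columns are a subset of the original columns; the methods differ only in how they decide which ones to keep.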

Feature Extraction Concepts & Techniques

Feature extraction is about extracting / deriving information from the original feature set to create a new feature subspace. The primary idea behind feature extraction is to compress the data with the goal of retaining most of the relevant information. Like feature selection techniques, these techniques are also used to reduce the number of features from the original feature set in order to reduce model complexity and overfitting, improve computational efficiency and reduce generalization error. The following are different types of feature extraction techniques:

  • Principal component analysis (PCA) for unsupervised data compression
  • Linear discriminant analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability
  • Nonlinear dimensionality reduction via kernel principal component analysis (KPCA)
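
The techniques listed above are all available in scikit-learn, and a minimal sketch of each might look like the following. The datasets (wine for PCA and LDA, the synthetic two-moons data for KPCA), the number of components, and the RBF kernel with gamma=15 are assumptions made for illustration.

```python
from sklearn.datasets import load_wine, make_moons
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Assumed example data: 13 standardized wine features, 3 classes.
X, y = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# PCA: unsupervised, ignores the labels and projects onto the directions
# of maximum variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)
explained = pca.explained_variance_ratio_.sum()
print("Variance retained by 2 principal components:", round(explained, 3))

# LDA: supervised, uses the labels to find directions that separate the
# classes. It yields at most (number of classes - 1) components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X_std, y)

# Kernel PCA: nonlinear, useful when the structure is not linearly
# separable, as in the two interleaved half-moons generated here.
X_moons, y_moons = make_moons(n_samples=100, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X_moons)

print(X_pca.shape, X_lda.shape, X_kpca.shape)
```

Note that only LDA receives the labels y in fit_transform; PCA and kernel PCA see the features alone.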

When to use Feature Selection & Feature Extraction

The key difference between feature selection and feature extraction techniques used for dimensionality reduction is that feature selection algorithms keep a subset of the original features unchanged, whereas feature extraction algorithms transform the data onto a new feature space.

Feature selection techniques can be used if the requirement is to maintain the original features, unlike feature extraction techniques, which derive useful information from the data to construct a new feature subspace. Feature selection techniques are used when model explainability is a key requirement.
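
The difference, and why it matters for explainability, can be seen in a short sketch. SelectKBest with the ANOVA F-test is used here only as a convenient stand-in for a feature selection method and PCA as a stand-in for feature extraction; the wine dataset and the choice of three features are likewise assumptions for the example.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_std = StandardScaler().fit_transform(data.data)
y = data.target
names = np.array(data.feature_names)

# Feature selection: the output columns are original columns, so each one
# still has a name that can be reported to stakeholders.
selector = SelectKBest(score_func=f_classif, k=3)
X_sel = selector.fit_transform(X_std, y)
mask = selector.get_support()
selected_names = names[mask]
same_columns = np.array_equal(X_sel, X_std[:, mask])
print("Selected features:", list(selected_names))
print("Identical to original columns:", same_columns)

# Feature extraction: each new column is a weighted combination of the
# original features, so it has no single original name.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_std)
features_per_component = (np.abs(pca.components_) > 1e-6).sum(axis=1)
print("Original features mixed into each component:", features_per_component)
```

Both results have three columns, but only the selected ones can be traced back to a single named input.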

Feature extraction techniques can be used to improve the predictive performance of the models, especially in the case of algorithms that don’t support regularization.

Ajitesh Kumar
