
Sklearn SelectFromModel for Feature Importance

In this post, you will learn how to use the Sklearn SelectFromModel class to reduce the training / test dataset to a new dataset that contains only the features whose feature importance is greater than or equal to a specified threshold. This is especially useful when you build an Sklearn pipeline with several stages and use a Sklearn RandomForest implementation (such as RandomForestClassifier) for feature selection in one of those stages. You may refer to this post to check out how RandomForestClassifier can be used for feature importance. SelectFromModel usage is illustrated with a Python code example.
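To illustrate the pipeline use case, here is a minimal sketch. It assumes scikit-learn 0.23 or later (for `as_frame=True`) and the wine data loaded with `load_wine`. The 70/30 split, `n_estimators=500`, the `StandardScaler` + `LogisticRegression` downstream stages, and the step names `select`, `scale` and `clf` are illustrative choices, not taken from this post. Inside a pipeline, `prefit` stays at its default of `False`, so `SelectFromModel` clones and fits its own copy of the forest whenever the pipeline is fitted.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the wine data as a pandas DataFrame so column names are kept
X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

pipe = Pipeline([
    # Stage 1 (illustrative): fit a random forest, keep features with importance >= 0.1
    ("select", SelectFromModel(
        RandomForestClassifier(n_estimators=500, random_state=1),
        threshold=0.1,
    )),
    # Stage 2 (illustrative): standardize the surviving features
    ("scale", StandardScaler()),
    # Stage 3 (illustrative): train a classifier on the reduced feature set
    ("clf", LogisticRegression(max_iter=1000)),
])

pipe.fit(X_train, y_train)

# Boolean mask over the original 13 columns: True where the feature was kept
n_kept = pipe.named_steps["select"].get_support().sum()
print("features kept:", n_kept)
print("test accuracy:", round(pipe.score(X_test, y_test), 3))
```

Because selection lives inside the pipeline, the forest is refitted only on training data during cross-validation or grid search, which avoids leaking information from the validation folds into the feature selection step.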

SelectFromModel Python Code Example

Here are the steps and related Python code for using SelectFromModel:

  • Determine feature importances using an estimator such as RandomForestClassifier or RandomForestRegressor, following the technique shown in this post. The data used in this post is the Sklearn wine dataset, which can be loaded as shown in this post.
  • Create an instance of the SelectFromModel class, passing parameters such as estimator (the fitted RandomForestClassifier instance) and threshold.
  • Transform the training data into a dataset containing only the features whose importance is greater than or equal to the threshold value.
  • Create a visualization plot of the feature importances.

Here is the Python code for the above steps:

import numpy as np
from sklearn.feature_selection import SelectFromModel
#
# forest is a RandomForestClassifier already fitted on X_train, y_train;
# X_train is a pandas DataFrame, so its column names are available
#
sfm = SelectFromModel(forest, threshold=0.1, prefit=True)
#
# Transform the training data set; only features with importance >= 0.1 are kept
#
X_training_selected = sfm.transform(X_train)
#
# Count of features whose importance value meets the threshold
#
importantFeaturesCount = X_training_selected.shape[1]
#
# Here are the important features, sorted by decreasing importance
#
sorted_indices = np.argsort(forest.feature_importances_)[::-1]
X_train.columns[sorted_indices][:importantFeaturesCount]

The code above may produce output such as the following, listing the features whose importance is greater than or equal to the threshold value:

Index(['proline', 'flavanoids', 'color_intensity', 'od_dilutedwines', 'alcohal'], dtype='object')
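For readers who want to run the steps end to end, here is a self-contained sketch. It assumes scikit-learn 0.23 or later and loads the wine data with `load_wine(as_frame=True)`; the built-in column names differ slightly from the output shown above (for example `alcohol` and `od280/od315_of_diluted_wines`), so the printed names will not match exactly. The 70/30 split, `random_state=1` and `n_estimators=500` are assumptions. Besides the sorted-index approach, the sketch reads the selected column names directly from `sfm.get_support()`, and it also tries the string thresholds `"mean"` and `"median"`, which `SelectFromModel` accepts in place of a fixed number.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

# Step 1: load the data and fit a random forest to obtain feature importances
X, y = load_wine(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)
forest = RandomForestClassifier(n_estimators=500, random_state=1)
forest.fit(X_train, y_train)
importances = forest.feature_importances_

# Step 2: wrap the already-fitted forest (prefit=True means it is not refitted)
sfm = SelectFromModel(forest, threshold=0.1, prefit=True)

# Step 3: keep only the columns whose importance is >= 0.1
X_train_selected = sfm.transform(X_train)
X_test_selected = sfm.transform(X_test)
print(f"kept {X_train_selected.shape[1]} of {X_train.shape[1]} features")

# Names of the kept features, read directly from the boolean support mask
selected_names = X_train.columns[sfm.get_support()]

# The same names ordered by decreasing importance (the sorted-index approach)
sorted_indices = np.argsort(importances)[::-1]
top_names = X_train.columns[sorted_indices][: X_train_selected.shape[1]]
print(list(top_names))

# String thresholds are computed from the importances themselves
for threshold in ["mean", "median"]:
    mask = SelectFromModel(forest, threshold=threshold, prefit=True).get_support()
    print(threshold, "->", list(X_train.columns[mask]))
```

`get_support()` is usually the safer way to recover names, since it returns exactly the columns that `transform` keeps and does not depend on a separate sort.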

Here is the visualization plot for important features:

Fig 1. Important features greater than threshold using sklearn SelectFromModel
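A chart like Fig 1 can be drawn with matplotlib. Below is a minimal sketch, assuming matplotlib is installed; the colors, the dashed threshold line and the figure size are illustrative choices, not necessarily how the figure above was produced. Bars at or above the 0.1 threshold are highlighted, and the non-interactive `Agg` backend is selected so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True, as_frame=True)
forest = RandomForestClassifier(n_estimators=500, random_state=1).fit(X, y)

threshold = 0.1
importances = forest.feature_importances_

# Sort features by decreasing importance for a readable bar chart
order = np.argsort(importances)[::-1]
names = X.columns[order]
values = importances[order]

# Highlight features that SelectFromModel would keep (importance >= threshold)
colors = ["tab:blue" if v >= threshold else "lightgray" for v in values]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(range(len(values)), values, color=colors)
ax.axhline(threshold, color="red", linestyle="--", label=f"threshold = {threshold}")
ax.set_xticks(range(len(values)))
ax.set_xticklabels(names, rotation=90)
ax.set_ylabel("Feature importance")
ax.set_title("Random forest feature importances (wine data)")
ax.legend()
fig.tight_layout()
# fig.savefig("feature_importances.png")  # hypothetical file name
```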
Ajitesh Kumar

