Category Archives: Python

Sequential Forward Selection – Python Example

Sequential forward selection algorithm

In this post, you will learn about one of the feature selection techniques, namely sequential forward selection, with a Python code example. Refer to my earlier post on the sequential backward selection technique for feature selection. The sequential forward selection algorithm is part of the family of sequential feature selection algorithms. The following topics will be covered in this post: an introduction to sequential feature selection algorithms, the sequential forward selection algorithm, and a Python example using sequential forward selection.

Introduction to Sequential Feature Selection

Sequential feature selection algorithms, including the sequential forward selection algorithm, belong to the family of greedy search algorithms, which are used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace where k < d. …
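
The greedy procedure described above can be sketched as follows. This is a minimal illustration, not the post's own code: the helper name `sequential_forward_selection`, the k-nearest-neighbors estimator, the Iris data and the use of mean 5-fold cross-validated accuracy as the selection criterion are all assumptions. Starting from an empty subset, each round adds the one remaining feature whose inclusion gives the best score, until k features have been selected.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def sequential_forward_selection(estimator, X, y, k, cv=5):
    """Hypothetical helper: greedily grow a feature subset until it has k features."""
    selected = []                        # indices of features chosen so far
    remaining = list(range(X.shape[1]))  # candidate feature indices
    while len(selected) < k:
        scores = []
        for feature in remaining:
            candidate = selected + [feature]
            # Mean cross-validated accuracy using only the candidate columns
            score = cross_val_score(estimator, X[:, candidate], y, cv=cv).mean()
            scores.append((score, feature))
        # Keep the feature whose addition gave the highest score
        best_score, best_feature = max(scores)
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected


X, y = load_iris(return_X_y=True)
subset = sequential_forward_selection(KNeighborsClassifier(n_neighbors=5), X, y, k=2)
print("Selected feature indices:", subset)
```

For comparison, scikit-learn 0.24 and later ships `sklearn.feature_selection.SequentialFeatureSelector`, which implements the same greedy idea with `direction="forward"`.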

Posted in Data Science, Machine Learning, Python.

Sequential Backward Feature Selection – Python Example

Sequential Backward Search for Feature Selection

In this post, you will learn about a feature selection technique called Sequential Backward Selection, using a Python code example. Feature selection is one of the key steps in training an optimal model: it improves computational efficiency during training and can also reduce the generalization error of the model by removing irrelevant features or noise. Important feature selection techniques include L1-norm regularization and greedy search algorithms such as sequential forward or backward feature selection; the latter are especially useful for algorithms that don't support regularization. It is important for data scientists to learn these techniques in order to build optimal models. Sequential backward …
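
A minimal sketch of sequential backward selection, assuming the Wine dataset, a scaled k-nearest-neighbors pipeline and mean 5-fold cross-validated accuracy as the criterion (none of these come from the post, and the helper name `sequential_backward_selection` is hypothetical). It starts from all features and, in each round, drops the feature whose removal hurts the score the least, until k features remain.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sequential_backward_selection(estimator, X, y, k, cv=5):
    """Hypothetical helper: greedily shrink the full feature set down to k features."""
    selected = list(range(X.shape[1]))  # start with every feature
    while len(selected) > k:
        scores = []
        for feature in selected:
            candidate = [f for f in selected if f != feature]
            # Score the model with this one feature left out
            score = cross_val_score(estimator, X[:, candidate], y, cv=cv).mean()
            scores.append((score, feature))
        # Drop the feature whose removal left the highest score
        best_score, feature_to_drop = max(scores)
        selected.remove(feature_to_drop)
    return selected


X, y = load_wine(return_X_y=True)
# KNN is distance based, so features are standardized inside the pipeline
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
subset = sequential_backward_selection(knn, X, y, k=5)
print("Remaining feature indices:", subset)
```

scikit-learn's `SequentialFeatureSelector` (0.24+) supports the same strategy with `direction="backward"`.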

Posted in Data Science, Machine Learning, Python.

MinMaxScaler vs StandardScaler – Python Examples

MinMaxScaler vs StandardScaler

In this post, you will learn about the concepts behind, and differences between, MinMaxScaler and StandardScaler with the help of Python code examples. Note that these are classes provided by the sklearn.preprocessing module and are used for feature scaling. As a data scientist, you will need to learn these concepts in order to train machine learning models using algorithms that require features to be on the same scale. For scale-invariant algorithms such as random forests and decision trees, you do not need to use these feature scaling techniques. The following topics are covered in this post: why feature scaling is needed, normalization vs. standardization, MinMaxScaler for normalization, and StandardScaler for standardization …
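
To make the difference concrete, here is a small sketch on made-up data (the array values are purely illustrative). MinMaxScaler maps each column to [0, 1] via (x - min) / (max - min), while StandardScaler maps each column to zero mean and unit variance via (x - mean) / std.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Illustrative data: two features on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 1000.0]])

# Normalization: each column rescaled to the range [0, 1]
X_norm = MinMaxScaler().fit_transform(X)

# Standardization: each column centred at 0 with (population) std of 1
X_std = StandardScaler().fit_transform(X)

print(X_norm)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

In a real workflow, the scaler should be fit on the training split only and then used to `transform` the test split, so that no test-set statistics leak into training.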

Posted in Data Science, Machine Learning, Python.

One-hot Encoding Concepts & Python Code Examples

One-hot encoding concepts and python examples

In this post, you will learn about one-hot encoding concepts, with code examples in the Python programming language. One-hot encoding is also called dummy encoding. The code examples use the OneHotEncoder class of sklearn.preprocessing. As a data scientist or machine learning engineer, you should learn one-hot encoding techniques, as they come in very handy when training machine learning models. The following topics will be covered in this post: one-hot encoding concepts, using OneHotEncoder for a single categorical feature, using OneHotEncoder and ColumnTransformer to encode multiple categorical features, and using the Pandas get_dummies API for one-hot encoding.

One-Hot Encoding Concepts

Simply speaking, one-hot encoding is a technique …
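
A short sketch of both approaches on a hypothetical `color` column. OneHotEncoder learns the sorted category list and returns one 0/1 column per category (as a sparse matrix by default, hence `.toarray()`); `pd.get_dummies` produces the equivalent columns directly as a DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical feature
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# sklearn: fit on a 2-D input (a one-column DataFrame), densify the sparse output
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(df[["color"]]).toarray()
print(encoder.categories_)  # categories are sorted: blue, green, red
print(encoded)

# pandas: same encoding, one column per category with a prefix
dummies = pd.get_dummies(df["color"], prefix="color")
print(dummies)
```

`handle_unknown="ignore"` makes the encoder output all zeros for a category it did not see during fitting, instead of raising an error.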

Posted in Data Science, Machine Learning, Python.

Pandas – Append Columns to Dataframe

Append columns to the data frame

In this post, you will learn different techniques to append or add one or more columns to a Pandas DataFrame (Python). This comes in handy in several scenarios. For example, when two or more data frames are created from different data sources and you want to combine a specific set of their columns into one single data frame, the methods given below can be used to append those columns. It is good to know these methods, as they help in the data preprocessing stage of building machine learning models. In this post, …
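
A minimal sketch of three common ways to add columns, using two hypothetical data frames of student marks. The specific methods the full post covers are not visible in this excerpt, so treat these as illustrative choices.

```python
import pandas as pd

# Two hypothetical frames from different sources, sharing the same row index
df_a = pd.DataFrame({"id": [1, 2, 3], "math": [90, 75, 60]})
df_b = pd.DataFrame({"science": [80, 85, 70], "english": [70, 65, 95]})

# 1. pd.concat along axis=1 appends selected columns, aligning rows on the index
combined = pd.concat([df_a, df_b[["science"]]], axis=1)

# 2. DataFrame.assign returns a new frame with an extra column
combined = combined.assign(english=df_b["english"])

# 3. Bracket assignment adds a derived column in place
combined["total"] = combined[["math", "science", "english"]].sum(axis=1)
print(combined)
```

Because both `concat` and `assign` align on the index, rows only line up correctly when the frames share index labels; call `reset_index(drop=True)` first if they do not.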

Posted in Data Science, Machine Learning, Python.

LabelEncoder Example – Single & Multiple Columns

LabelEncoder for converting labels to integers

In this post, you will learn about LabelEncoder code examples for encoding the labels of categorical features, in single and multiple columns of a Python Pandas DataFrame. The following points will be covered: background; what labels are and why they are encoded; how to use LabelEncoder to encode single and multiple columns (all at once); and when not to use LabelEncoder.

Background

When working with a dataset that has categorical features, you come across two different types of features, described below. Many machine learning algorithms require categorical data (labels) to be converted, or encoded, into numerical form. Ordinal features – Features which have …
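
A sketch of single- and multi-column label encoding on a hypothetical DataFrame. For multiple columns it keeps one fitted encoder per column in a dict, so each column can later be decoded with `inverse_transform`; this per-column dict is one possible design, not necessarily the post's.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical data
df = pd.DataFrame({
    "size": ["M", "L", "S", "M"],
    "color": ["red", "blue", "red", "green"],
})

# Single column: classes are sorted alphabetically and mapped to 0..n-1
le = LabelEncoder()
df["size_encoded"] = le.fit_transform(df["size"])
print(le.classes_)  # L, M, S

# Multiple columns: one fitted encoder per column
cols = ["size", "color"]
encoders = {col: LabelEncoder().fit(df[col]) for col in cols}
encoded = pd.DataFrame(
    {col: encoders[col].transform(df[col]) for col in cols}, index=df.index
)
print(encoded)

# Decode back to the original labels
print(encoders["color"].inverse_transform(encoded["color"]))
```

Note that the alphabetical mapping L=0, M=1, S=2 does not match the natural order S < M < L, which is one reason LabelEncoder is a poor fit for ordinal features. The scikit-learn documentation intends it for target labels, with `OrdinalEncoder` available for input features.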

Posted in Data Science, Machine Learning, Python.

Pandas – Fillna method for replacing missing values

Fillna method for replacing missing values

In this post, you will learn how to use the fillna method to replace, or impute, missing values in one or more feature columns with central tendency measures in a Pandas DataFrame (Python). The central tendency measures used to replace missing values are the mean, median and mode. Here is a detailed post on the how, what and when of replacing missing values with the mean, median or mode. This will be helpful in the data preprocessing stage of building machine learning models. Other techniques used for filling missing values are backfill (bfill) and forward fill (ffill). Before going further and learning about the fillna method, here is the Pandas sample dataframe we will work with. It represents marks in …
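
A minimal sketch of each option on a small made-up marks table (not the post's sample dataframe): `fillna` with the column mean, median or mode, plus forward and backward fill.

```python
import numpy as np
import pandas as pd

# Hypothetical marks data with missing entries
df = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "math": [80.0, np.nan, 60.0, 90.0, np.nan],
    "science": [70.0, 85.0, np.nan, 85.0, 60.0],
    "grade": ["B", "A", None, "A", "C"],
})

filled = df.copy()
filled["math"] = filled["math"].fillna(filled["math"].mean())             # mean
filled["science"] = filled["science"].fillna(filled["science"].median())  # median
filled["grade"] = filled["grade"].fillna(filled["grade"].mode().iloc[0])  # mode
print(filled)

# Forward fill copies the previous valid value; backward fill copies the next one
ffilled = df["math"].ffill()
bfilled = df["math"].bfill()  # a trailing NaN stays NaN: there is no next value
print(ffilled.tolist(), bfilled.tolist())
```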

Posted in Data Science, Machine Learning, Python.

Python – Replace Missing Values with Mean, Median & Mode

Boxplot for deciding whether to use mean, mode or median for imputation

In this post, you will learn how to impute, or replace, missing values with the mean, median and mode in one or more numeric feature columns of a Pandas DataFrame while building machine learning (ML) models with Python. You will also learn how to decide which central tendency measure of a feature column (mean, median or mode) to use for imputing missing values. It is important for data scientists to understand this technique, as handling missing values is one of the key aspects of data preprocessing when training ML models. The dataset used for illustration is related to campus recruitment and is taken from the Kaggle page on Campus Recruitment. As a first step, the …
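
The post decides between mean and median by inspecting a boxplot. As an automated stand-in, the sketch below uses a skewness rule: median for strongly skewed columns (where outliers pull the mean), mean otherwise. The threshold of 1.0, the helper `choose_strategy` and the data are hypothetical assumptions, and scikit-learn's `SimpleImputer` is used for the replacement.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical numeric columns; "salary" is right-skewed by one large value
df = pd.DataFrame({
    "salary": [200.0, 250.0, 240.0, np.nan, 260.0, 940.0],
    "score": [55.0, 60.0, np.nan, 65.0, 70.0, 62.0],
})


def choose_strategy(series, skew_threshold=1.0):
    """Hypothetical rule of thumb: median if |skewness| exceeds the threshold."""
    return "median" if abs(series.skew()) > skew_threshold else "mean"


for column in df.columns:
    strategy = choose_strategy(df[column])  # skew() ignores NaN by default
    imputer = SimpleImputer(strategy=strategy)
    # SimpleImputer expects 2-D input and returns an (n, 1) array here
    df[column] = imputer.fit_transform(df[[column]]).ravel()
    print(column, strategy)

print(df)
```

For categorical columns, `SimpleImputer(strategy="most_frequent")` performs mode imputation.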

Posted in Data Science, Machine Learning, Python.

How to Create a Pandas Sample Dataframe

Create Pandas Dataframe using Sample Data

In this post, you will learn how to create a Pandas dataframe with some sample data. The following are two different techniques that can be used to create a Pandas dataframe: creating a dataframe without using a Numpy array, and creating a dataframe using a Numpy array.

Create Dataframe without using Numpy Array

Here is the Python code for creating a Pandas dataframe without using a Numpy array:

Create Dataframe using Numpy Array

Here is the code for creating a dataframe using a Numpy array. Note the use of np.array for creating an instance of a Numpy ndarray. The following will be printed:
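
A minimal sketch of the two techniques, with made-up sample values (the post's own sample data is not shown here): a dict of lists, where each key becomes a column, and a 2-D NumPy array together with explicit column labels.

```python
import numpy as np
import pandas as pd

# Without NumPy: a dict mapping column names to equal-length lists
df1 = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [25, 32, 28],
})

# With NumPy: each row of the 2-D ndarray becomes a row of the dataframe
data = np.array([[1, 2, 3],
                 [4, 5, 6]])
df2 = pd.DataFrame(data, columns=["a", "b", "c"])

print(df1)
print(df2)
```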

Posted in Data Science, Python.

Random Forest Classifier Python Code Example

Random forest classifier using python sklearn library

In this post, you will learn how to train a Random Forest Classifier using the Python Sklearn library. This code will be helpful if you are a beginner data scientist or just want a quick code sample to get started with training a machine learning model using the Random Forest algorithm. The following topics will be covered: a brief introduction to Random Forest, and a Python code example for training a random forest classifier.

Brief Introduction to Random Forest Classifier

A random forest can be considered an ensemble of several decision trees. The idea is to aggregate the prediction outcomes of multiple decision trees and create a final outcome based on an averaging mechanism …
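
A minimal training sketch, assuming the Iris dataset, a stratified 70/30 split and 100 trees (these choices are illustrative, not taken from the post).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# An ensemble of 100 decision trees, each grown on a bootstrap sample
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)

y_pred = forest.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
```

Regarding the averaging mechanism: scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities, rather than by a hard majority vote.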

Posted in AI, Data Science, Machine Learning, Python.

Decision Tree Classifier Python Code Example

Decision tree decision boundaries

In this post, you will learn how to train a decision tree classifier machine learning model using Python. The following points will be covered in this post: what a decision tree is, and a decision tree Python code sample.

What is a Decision Tree?

Simply speaking, the decision tree algorithm breaks the data points into decision nodes, resulting in a tree structure. The decision nodes represent the questions based on which the data is split further into two or more child nodes. The tree is grown until the data points at a specific child node are pure (all data belongs to one class). The criterion for creating the optimal decision questions is …
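
The excerpt stops before naming the split criterion. Gini impurity, which is scikit-learn's default (entropy is another option), is used here as an assumed example. The sketch computes it by hand with a hypothetical `gini` helper, where 0 means a pure node, and then fits a depth-limited tree on Iris.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini([0, 0, 0, 0]))  # pure node
print(gini([0, 0, 1, 1]))  # evenly mixed two-class node

X, y = load_iris(return_X_y=True)
# The tree picks, at each node, the split that most reduces Gini impurity
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1)
tree.fit(X, y)
print("Depth:", tree.get_depth(), "Training accuracy:", tree.score(X, y))
```

Capping `max_depth` stops growth before every leaf is pure, which is a common way to limit overfitting.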

Posted in AI, Data Science, Machine Learning, Python.

How to Convert Sklearn Dataset to Dataframe

In this post, you will learn how to convert datasets from sklearn.datasets to a Pandas DataFrame. This technique (code example) will be useful if you are comfortable working with Pandas DataFrames, since you will be able to perform several operations faster with a dataframe. The sklearn.datasets module provides several different datasets, including the following: Iris, Breast cancer, Diabetes, Boston (removed in scikit-learn 1.2), Linnerud and Images. The code sample is demonstrated with the Iris dataset. Before looking into the code sample, recall that the Iris dataset, when loaded, holds the feature data in “data” and the labels in “target”. Executing the code prints the resulting dataframe. In case you don’t want to explicitly assign …
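
A minimal sketch of the conversion: build the DataFrame from `data` with `feature_names` as column labels, then add `target` as a column. The second form relies on the `as_frame=True` option, available in scikit-learn 0.23 and later.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Explicit construction: features as columns, labels appended as "target"
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target
print(df.head())

# Alternative: let scikit-learn build the combined frame directly
frame = load_iris(as_frame=True).frame
print(frame.columns.tolist())
```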

Posted in Data Science, Machine Learning, Python.

Sklearn SVM Classifier using LibSVM – Code Example

In this post, you will learn about the Sklearn LIBSVM implementation used for training an SVM classifier, with a code example. Here is a great guide for learning SVM classification, especially for beginners in the field of data science/machine learning. LIBSVM is a library for Support Vector Machines (SVM) which provides implementations of the following: C-SVC (Support Vector Classification), nu-SVC, epsilon-SVR (Support Vector Regression), nu-SVR, and distribution estimation (one-class SVM). In this post, you will see code examples for the C-SVC and nu-SVC LIBSVM implementations. I will follow up with code examples for SVR and distribution estimation in future posts. Here are the links to their SKLearn pages for C-SVC and nu-SVC …
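
A side-by-side sketch of the two LIBSVM-backed classifiers in scikit-learn, `SVC` (C-SVC) and `NuSVC` (nu-SVC). The Iris data, the RBF kernel and the values C=1.0 and nu=0.5 are illustrative assumptions. C-SVC tunes the penalty for margin violations through C. nu-SVC instead uses nu in (0, 1], which is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, NuSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# C-SVC: larger C penalizes margin violations more heavily
c_svc = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
# nu-SVC: nu controls the fraction of margin errors / support vectors
nu_svc = make_pipeline(StandardScaler(), NuSVC(nu=0.5, kernel="rbf"))

scores = {}
for name, model in [("C-SVC", c_svc), ("nu-SVC", nu_svc)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

A nu value that is too large for the class balance can make nu-SVC infeasible, in which case scikit-learn raises an error at fit time.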

Posted in AI, Data Science, Machine Learning, Python.

SVM Classifier using Scikit Learn – Code Examples

In this post, you will learn how to train an SVM classifier using the Scikit Learn (SKLearn) implementation, with the help of code examples. Scikit Learn offers different implementations, such as the following, for training an SVM classifier. LIBSVM: LIBSVM is a C/C++ library specialised for SVMs. The SVC class is the LIBSVM implementation and can be used to train an SVM classifier (hard or soft margin classifier). Native Python implementation: Scikit Learn provides a Python implementation of the SVM classifier in the form of SGDClassifier, which is based on a stochastic gradient descent algorithm (with its default hinge loss, it trains a linear SVM).

LIBSVM SVC Code Example

In this section, the code makes use of the SVC class (from sklearn.svm import SVC) for fitting a model.

SVM Python Implementation …
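
A sketch comparing the two routes on a binary problem. The breast cancer dataset and the hyperparameters are assumptions chosen for illustration. `SVC(kernel="linear")` solves the SVM problem through LIBSVM, while `SGDClassifier(loss="hinge")` fits a linear SVM with stochastic gradient descent, which scales better to large datasets.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# LIBSVM-based linear SVM
svc = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
# Linear SVM trained by stochastic gradient descent on the hinge loss
sgd = make_pipeline(
    StandardScaler(), SGDClassifier(loss="hinge", max_iter=1000, random_state=1)
)

scores = {}
for name, model in [("SVC", svc), ("SGDClassifier", sgd)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

Both models are sensitive to feature scale, so each pipeline standardizes the features first.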

Posted in Data Science, Machine Learning, Python.

Classification Model with SVM Classifier – Python Example

In this post, you will get access to a Python code example for building a machine learning classification model using the SVM (Support Vector Machine) classifier algorithm. We will work with the Python Sklearn package to build the model. The following steps will be covered for training the model using SVM: load the data, create the training and test split, perform feature scaling, instantiate an SVC classifier, fit the model, and measure the model performance. First and foremost, we will load the appropriate Sklearn modules and classes. Let's get started with loading the data set and creating the training and test split from the data set. Pay attention to the stratification aspect used when creating the training and test split. The train_test_split function of sklearn.model_selection …
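
The six steps listed above map onto a short script. This is a sketch assuming the Iris dataset and a linear kernel, since the excerpt does not name the data set or kernel. Note that the scaler is fit on the training data only and then applied to both splits.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 1. Load the data
X, y = load_iris(return_X_y=True)

# 2. Create a stratified training/test split (class proportions preserved)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# 3. Perform feature scaling: learn mean/std from the training split only
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

# 4. Instantiate an SVC classifier
svc = SVC(kernel="linear", C=1.0)

# 5. Fit the model
svc.fit(X_train_std, y_train)

# 6. Measure the model performance
accuracy = accuracy_score(y_test, svc.predict(X_test_std))
print("Class counts in training split:", np.bincount(y_train))
print(f"Test accuracy: {accuracy:.3f}")
```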

Posted in AI, Data Science, Machine Learning, Python.

Python – Training a Model using Logistic Regression

In this post, you will learn how to train a model using a machine learning algorithm such as Logistic Regression. Here is the code we can use for fitting a model using Logistic Regression. We will use the IRIS data set for training the model.

Loading SkLearn Modules / Classes

First and foremost, we will load the appropriate packages, sklearn modules and classes.

Data Loading

As a next step, we will load the dataset and do the data preparation.

Create Training / Test Data

The next step is to create a train and test split. Note the stratification parameter. It is used to ensure that the class distribution in the training and test splits remains consistent …
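
A minimal sketch of these steps on the IRIS data set. The split ratio, the scaling step and `max_iter=1000` are assumptions, not the post's settings. Passing `stratify=y` keeps the 1:1:1 class ratio of Iris in both splits, and `predict_proba` shows the per-class probabilities that logistic regression produces.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data loading
X, y = load_iris(return_X_y=True)

# Stratified train/test split: class proportions match the full data set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)
print("All:", np.bincount(y), "Train:", np.bincount(y_train), "Test:", np.bincount(y_test))

# Standardize features using training statistics only
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

# Fit the logistic regression model
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train_std, y_train)
accuracy = lr.score(X_test_std, y_test)
print(f"Test accuracy: {accuracy:.3f}")

# Class-membership probabilities for the first three test samples
proba = lr.predict_proba(X_test_std[:3])
print(proba.round(3))
```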

Posted in Data Science, Machine Learning, Python.