# Tag Archives: python

## Python – 5 Sets of Useful Numpy Unary Functions

In this post, you will learn about five of the most popular or useful sets of unary universal functions (ufuncs) provided by the Python Numpy library. As a data scientist, it will be useful to learn these unary functions by heart, as they help in performing arithmetic operations on sequence-like objects. These functions can also be termed vectorized wrapper functions, which are used to perform element-wise operations. The following sets of popular functions are covered: basic arithmetic operations, summary statistics, sorting, minimum / maximum, and array equality. Basic Arithmetic Operations: The following are some of the unary functions which can be used to perform arithmetic operations: add, subtract, multiply, divide, …
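
As a rough illustration of the element-wise behaviour described above, here is a minimal sketch using made-up arrays `a` and `b` (they are assumptions, not data from the post). Note that NumPy itself classifies `add`, `subtract`, `multiply` and `divide` as ufuncs taking two inputs, while a function such as `sqrt` takes one.

```python
import numpy as np

# Hypothetical sample arrays, chosen only for illustration
a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 3.0])

# Element-wise arithmetic: each ufunc combines a[i] with b[i]
total = np.add(a, b)
diff = np.subtract(a, b)
prod = np.multiply(a, b)
quot = np.divide(a, b)

# A ufunc with a single input array
roots = np.sqrt(a)

# Summary statistic, sorting, minimum / maximum and array equality
a_sum = np.sum(a)
b_sorted = np.sort(b)
a_min, a_max = np.min(a), np.max(a)
same = np.array_equal(a, b)
```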

## What, When & How of Scatterplot Matrix in Python

In this post, you will learn about the following in relation to the scatterplot matrix. Note that a scatter plot matrix can also be termed a pairplot. Later in this post, you will find a Python code example that uses a scatterplot matrix / pairplot (seaborn package). What is a scatterplot matrix? When to use a scatterplot matrix / pairplot? How to use a scatterplot matrix in Python? What is Scatterplot Matrix? A scatter plot matrix is a matrix (or grid) of scatter plots where each scatter plot in the grid is created between a different combination of variables. In other words, a scatter plot matrix represents the bivariate or pairwise relationships between different combinations of variables …

## When to use LabelEncoder – Python Example

In this post, you will learn about when to use LabelEncoder. As a data scientist, you must have a clear understanding of when to use LabelEncoder and when to use other encoders such as the one-hot encoder. Using the appropriate type of encoder is a key part of data preprocessing in the machine learning model building lifecycle. Here are some of the scenarios in which you could use LabelEncoder without affecting the model. Use LabelEncoder when a categorical feature has only two possible values. For example, a feature having values such as yes or no. Or, maybe, a gender feature when there are only two possible values, male or female. Use LabelEncoder for …
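
The two-valued case mentioned above can be sketched as follows. The `answers` list is a hypothetical example, not data from the post.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical yes/no feature values
answers = ["yes", "no", "no", "yes"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(answers)

# classes_ holds the sorted unique labels; a label's position is its code
classes = list(encoder.classes_)
```

With only two classes the integer codes carry no misleading ordering, which is why this case is usually considered safe.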

## Feature Extraction using PCA – Python Example

In this post, you will learn how to use principal component analysis (PCA) for extracting important features (also termed a feature extraction technique) from a list of given features. As a machine learning engineer / data scientist, it is very important to learn the PCA technique for feature extraction, as it helps you visualize the data in light of the explained variance of the data set. The following topics are covered in this post: What is principal component analysis? PCA algorithm for feature extraction PCA Python implementation step-by-step PCA Python Sklearn example What is Principal Component Analysis? Principal component analysis (PCA) is an unsupervised linear transformation technique which is primarily used …
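
A minimal sketch of PCA-based feature extraction with Sklearn might look like the following. The choice of the iris data set, standardization beforehand, and two components are assumptions made for illustration and may differ from the post's own example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize first so that no feature dominates because of its scale
X_std = StandardScaler().fit_transform(X)

# Project the 4 original features onto 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

# Fraction of the total variance explained by each component
explained = pca.explained_variance_ratio_
```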

## Eigenvalues & Eigenvectors with Python Examples

In this post, you will learn how to calculate Eigenvalues and Eigenvectors using Python code examples. Before going ahead and learning the code examples, you may want to check out this post on when & why to use Eigenvalues and Eigenvectors. As a machine learning engineer / data scientist, you must get a good understanding of the Eigenvalues / Eigenvectors concepts, as they prove to be very useful in feature extraction techniques such as principal component analysis. The Python Numpy package is used for illustration purposes. The following topics are covered in this post: Creating Eigenvectors / Eigenvalues using Numpy Linalg module Re-creating original transformation matrix from eigenvalues & eigenvectors Creating Eigenvectors / Eigenvalues using Numpy In …
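
The two topics listed above can be sketched with `numpy.linalg`. The matrix `A` is a hypothetical example chosen for illustration.

```python
import numpy as np

# Hypothetical 2x2 transformation matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of `eigenvectors` are the eigenvectors of A
eigenvalues, eigenvectors = np.linalg.eig(A)

# Re-create A from its eigendecomposition: A = V * diag(lambda) * V^-1
A_rebuilt = eigenvectors @ np.diag(eigenvalues) @ np.linalg.inv(eigenvectors)
```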

## Sklearn SelectFromModel for Feature Importance

In this post, you will learn how to use the Sklearn SelectFromModel class to reduce the training / test data set to a new dataset consisting of the features whose feature importance value is greater than a specified threshold value. This method is very useful when one is using a Sklearn pipeline for creating different stages and a Sklearn RandomForest implementation (such as RandomForestClassifier) for feature selection. You may refer to this post to check out how RandomForestClassifier can be used for feature importance. The SelectFromModel usage is illustrated using a Python code example. SelectFromModel Python Code Example Here are the steps and related Python code for using SelectFromModel. Determine the feature importance using …
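
A minimal sketch of this workflow is shown below. The iris data set, the forest settings and the threshold of 0.1 are assumptions picked for illustration, not values taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Step 1: fit a random forest to obtain feature importances
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X, y)

# Step 2: keep only the features whose importance reaches the threshold
# (prefit=True tells SelectFromModel the estimator is already fitted)
selector = SelectFromModel(forest, threshold=0.1, prefit=True)
X_selected = selector.transform(X)
```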

## Sequential Forward Selection – Python Example

In this post, you will learn about one of the feature selection techniques, namely sequential forward selection, with a Python code example. Refer to my earlier post on the sequential backward selection technique for feature selection. The sequential forward selection algorithm is one of the sequential feature selection algorithms. Some of the following topics will be covered in this post: Introduction to sequential feature selection algorithms Sequential forward selection algorithm Python example using sequential forward selection Introduction to Sequential Feature Selection Sequential feature selection algorithms, including the sequential forward selection algorithm, belong to the family of greedy search algorithms which are used to reduce an initial d-dimensional feature space to a k-dimensional feature subspace where k < d. …
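
As a sketch of the idea of reducing d features to k < d, the following uses Sklearn's built-in `SequentialFeatureSelector`. This is an assumption made for illustration: the post may rely on a different implementation, and the k-nearest-neighbours estimator, the iris data and k = 2 are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # d = 4 features

knn = KNeighborsClassifier(n_neighbors=5)

# Greedily add one feature at a time until k = 2 features are selected
sfs = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5
)
sfs.fit(X, y)

selected_mask = sfs.get_support()  # boolean mask over the 4 features
X_reduced = sfs.transform(X)       # k-dimensional feature subspace
```

Passing `direction="backward"` to the same class starts from all features and removes them one at a time instead.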

## Sequential Backward Feature Selection – Python Example

In this post, you will learn about a feature selection technique called Sequential Backward Selection, with a Python code example. Feature selection is one of the key steps in training an optimal model: it improves computational efficiency while training the model and also reduces the generalization error of the model by removing irrelevant features or noise. Some of the important feature selection techniques include L-norm regularization and greedy search algorithms such as sequential forward or backward feature selection, especially for algorithms which don’t support regularization. It is of utmost importance for data scientists to learn these techniques in order to build optimal models. Sequential backward …

## Pandas – Append Columns to Dataframe

In this post, you will learn different techniques to append or add one column or multiple columns to a Pandas Dataframe (Python). There are different scenarios where this could come in very handy. For example, when two or more data frames are created from different data sources and you want to select a specific set of columns from them to create one single data frame, the methods given below can be used to append or add one or more columns. It is good to know these methods, as they help in the data preprocessing stage of building machine learning models. In this post, …
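
Two common ways of doing this are sketched below: column assignment for a single column and `pd.concat` for several. The data frames `df1` and `df2` and their column names are hypothetical, and the post may cover other methods as well.

```python
import pandas as pd

# Hypothetical data frames standing in for two different data sources
df1 = pd.DataFrame({"name": ["A", "B", "C"], "math": [80, 65, 90]})
df2 = pd.DataFrame({"science": [70, 85, 60], "art": [75, 95, 55]})

# Append a single column by assignment (rows are aligned on the index)
combined = df1.copy()
combined["science"] = df2["science"]

# Append multiple columns at once with concat along the column axis
combined_all = pd.concat([df1, df2[["science", "art"]]], axis=1)
```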

## LabelEncoder Example – Single & Multiple Columns

In this post, you will learn about LabelEncoder code examples for encoding the labels of categorical features in single and multiple columns of a Python Pandas Dataframe. The following are some of the points which will get covered: Background What are labels and why encode them? How to use LabelEncoder to encode single & multiple columns (all at once)? When not to use LabelEncoder? Background When working with a dataset having categorical features, you come across two different types of features such as the following. Many machine learning algorithms require the categorical data (labels) to be converted or encoded in numerical form. Ordinal features – Features which have …
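
One way to encode several columns at once is to apply a fresh `LabelEncoder` to each column, as sketched here. The data frame and its column names are hypothetical, and the post may use a different approach.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data frame with two categorical columns and one numeric column
df = pd.DataFrame({
    "size": ["S", "M", "L", "M"],
    "color": ["red", "blue", "red", "green"],
    "price": [10, 12, 15, 12],
})

cols = ["size", "color"]
encoded = df.copy()

# Fit a separate encoder per column; codes follow the sorted label order
encoded[cols] = df[cols].apply(lambda col: LabelEncoder().fit_transform(col))
```

Because the codes follow alphabetical order, "L" becomes 0 and "S" becomes 2 here, which does not match the natural size ordering.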

## Pandas – Fillna method for replacing missing values

In this post, you will learn how to use the fillna method to replace or impute missing values of one or more feature columns with central tendency measures in a Pandas Dataframe (Python). The central tendency measures used to replace missing values are mean, median and mode. Here is a detailed post on the how, what and when of replacing missing values with mean, median or mode. This will be helpful in the data preprocessing stage of building machine learning models. Other techniques used for filling missing values are backfill (bfill) and forward-fill (ffill). Before going further and learning about the fillna method, here is the Pandas sample dataframe we will work with. It represents marks in …
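
The imputation options described above can be sketched as follows. The `marks` data frame is a hypothetical stand-in, not the sample dataframe from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with missing values
marks = pd.DataFrame({
    "math": [80.0, np.nan, 90.0, 70.0],
    "science": [60.0, 75.0, np.nan, 75.0],
})

# Replace missing values with a central tendency measure of the column
mean_filled = marks["math"].fillna(marks["math"].mean())
median_filled = marks["science"].fillna(marks["science"].median())
mode_filled = marks["science"].fillna(marks["science"].mode()[0])

# Forward-fill copies the previous valid value; backfill copies the next one
ffilled = marks.ffill()
bfilled = marks.bfill()
```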

## Decision Tree Classifier Python Code Example

In this post, you will learn how to train a decision tree classifier machine learning model using Python. The following points will be covered in this post: What is a decision tree? Decision tree Python code sample What is Decision Tree? Simply speaking, the decision tree algorithm breaks the data points into decision nodes, resulting in a tree structure. The decision nodes represent the questions based on which the data is split further into two or more child nodes. The tree is grown until the data points at a specific child node are pure (all data belongs to one class). The criterion for creating the optimal decision questions is …
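
A minimal training sketch with Sklearn's `DecisionTreeClassifier` follows. The iris data set, the split, and the `criterion` and `max_depth` settings are assumptions made for illustration and are not taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# max_depth limits how far the tree keeps splitting nodes
tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=1)
tree.fit(X_train, y_train)

# Fraction of correctly classified test samples
accuracy = tree.score(X_test, y_test)
```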

## How to Convert Sklearn Dataset to Dataframe

In this post, you will learn how to convert Sklearn.datasets to a Pandas Dataframe. It will be useful to know this technique (code example) if you are comfortable working with Pandas Dataframes. You will be able to perform several operations faster with the dataframe. The sklearn.datasets module comprises several different types of datasets, including some of the following: Iris, Breast cancer, Diabetes, Boston, Linnerud, Images. The code sample below is demonstrated with the IRIS data set. Before looking into the code sample, recall that the IRIS dataset, when loaded, has the data in the form of “data” and the labels present as “target”. Executing the above code will print the following dataframe. In case you don’t want to explicitly assign …
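
The conversion can be sketched as follows, using the “data”, “target” and feature name attributes of the loaded IRIS dataset. The variable names are illustrative only.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix becomes the dataframe body, feature names the column labels
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add the labels as an extra column
df["target"] = iris.target
```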

## Sklearn SVM Classifier using LibSVM – Code Example

In this post, you will learn about the Sklearn LibSVM implementation used for training an SVM classifier, with a code example. Here is a great guide for learning SVM classification, especially for beginners in the field of data science / machine learning. LIBSVM is a library for Support Vector Machines (SVM) which provides an implementation of the following: C-SVC (Support Vector Classification), nu-SVC, epsilon-SVR (Support Vector Regression), nu-SVR, and distribution estimation (one-class SVM). In this post, you will see code examples for the C-SVC and nu-SVC LIBSVM implementations. I will follow up with code examples for SVR and distribution estimation in future posts. Here are the links to their SKLearn pages for C-SVC and nu-SVC …
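
In Sklearn, C-SVC and nu-SVC are exposed as the `SVC` and `NuSVC` classes. The sketch below trains both on the iris data; the data set and the hyperparameter values are assumptions for illustration rather than the post's own settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, NuSVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# C-SVC: C controls the penalty for misclassified training points
svc = SVC(kernel="rbf", C=1.0, gamma="scale")
svc.fit(X_train, y_train)
svc_accuracy = svc.score(X_test, y_test)

# nu-SVC: nu (between 0 and 1) bounds the fraction of margin errors
# and support vectors
nusvc = NuSVC(kernel="rbf", nu=0.5, gamma="scale")
nusvc.fit(X_train, y_train)
nusvc_accuracy = nusvc.score(X_test, y_test)
```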

## Python – How to Plot Learning Curves of Classifier

In this post, you will learn a technique you can use to plot the learning curve of a machine learning classification model. As a data scientist, you will find the Python code example very handy. In this post, the plot_learning_curves function of the mlxtend.plotting module from the mlxtend package is used. This package was created by Dr. Sebastian Raschka. Let's train a Perceptron model using the iris data from sklearn.datasets. The accuracy of the model comes out to be 0.956 or 95.6%. Next, we will want to see how the learning went. In order to do that, we will use the plot_learning_curves function of the mlxtend.plotting module. Here is a post on how to install mlxtend with Anaconda. The following …

## Feature Scaling & Stratification for Model Performance (Python)

In this post, you will learn how to improve machine learning model performance using techniques such as feature scaling and stratification. The following topics are covered in this post. The concepts are explained using Python code samples. What is feature scaling and why does one need to do it? What is stratification? Training Perceptron model without feature scaling and stratification Training Perceptron model with feature scaling Training Perceptron model with feature scaling and stratification What is Feature Scaling and Why is it needed? Feature scaling is a technique for standardizing the features present in the data to a fixed range. This is done when data consists of features of varying …
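
The last of the listed topics, a Perceptron trained with both feature scaling and stratification, can be sketched as follows. The iris data set, the split size and the use of `StandardScaler` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

model = Perceptron(random_state=1)
model.fit(X_train_std, y_train)
accuracy = model.score(X_test_std, y_test)

# Number of test samples per class after the stratified split
test_class_counts = np.bincount(y_test)
```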
