# Tag Archives: python

## ROC Curve & AUC Explained with Python Examples

In this post, you will learn about the ROC curve and AUC, along with related concepts such as the true positive rate and false positive rate, with the help of Python examples. It is important to learn ROC, AUC and related concepts because they help in selecting the most appropriate machine learning models based on model performance. What is ROC & AUC / AUROC? Receiver operating characteristic (ROC) graphs are used for selecting the most appropriate classification models based on their performance with respect to the false positive rate (FPR) and true positive rate (TPR). These metrics are computed by shifting the decision threshold of the classifier. ROC curve is used for probabilistic models …
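As a rough illustration of how FPR and TPR are obtained by shifting the decision threshold, here is a minimal sketch using Sklearn. The dataset, the logistic regression model and the split settings are assumptions chosen for the example; they are not taken from the original post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: the Sklearn breast cancer dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Assumed model: any classifier exposing predict_proba would do
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Predicted probability of the positive class (label 1)
y_score = model.predict_proba(X_test)[:, 1]

# FPR and TPR evaluated at each candidate decision threshold
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
print(f"AUC: {auc:.3f}")
```

Plotting `fpr` against `tpr` gives the ROC curve; the area under that curve is the AUC printed above.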

## Python – How to Draw Confusion Matrix using Matplotlib

In this post, you will learn how to draw / show a confusion matrix using the Matplotlib Python package. It is important to learn this technique as it will come in very handy when assessing the performance of classification models trained using different classification algorithms. Confusion Matrix using Matplotlib In order to demonstrate the confusion matrix using Matplotlib, let’s fit a pipeline estimator to the Sklearn breast cancer dataset using StandardScaler (for standardising the dataset) and Random Forest Classifier as the machine learning algorithm. Once the estimator is fit to the training data set, the next step is to print the confusion matrix. In order to do that, the following steps will need to be …
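The setup described above (a StandardScaler plus Random Forest pipeline on the breast cancer dataset) could be sketched as follows. The split ratio, `n_estimators`, colour map and figure size are assumptions made for this example, and the plotting steps are one possible approach rather than the post's exact code.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Pipeline: standardise the features, then fit a random forest
pipeline = make_pipeline(
    StandardScaler(), RandomForestClassifier(n_estimators=50, random_state=1)
)
pipeline.fit(X_train, y_train)

# Rows are true labels, columns are predicted labels
conf_matrix = confusion_matrix(y_test, pipeline.predict(X_test))

# Draw the matrix as a coloured grid and write each count into its cell
fig, ax = plt.subplots(figsize=(4, 4))
ax.matshow(conf_matrix, cmap=plt.cm.Blues, alpha=0.5)
for i in range(conf_matrix.shape[0]):
    for j in range(conf_matrix.shape[1]):
        ax.text(x=j, y=i, s=str(conf_matrix[i, j]), va="center", ha="center")
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
# Call plt.show() to display the figure in an interactive session
```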

## Accuracy, Precision, Recall & F1-Score – Python Examples

In this post, you will learn how to calculate machine learning model performance metrics such as the following scores while assessing the performance of a classification model: accuracy score, precision score, recall score and F1-score. The concepts are illustrated using a Python Sklearn example. As a data scientist, you must get a good understanding of the above concepts in relation to measuring classification model performance. Let’s work with the Sklearn breast cancer dataset. You can load the dataset using the following code: The target labels in the breast cancer dataset are Benign (1) and Malignant (0). There are 212 records with label as malignant and 357 records with …
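A minimal sketch of computing the four scores on the breast cancer dataset might look like this. The logistic regression model and the train/test split are assumptions for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

bc = load_breast_cancer()
X, y = bc.data, bc.target  # 0 = malignant, 1 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Assumed model for the example
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# By default, Sklearn treats label 1 (benign) as the positive class
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}, Precision: {precision:.3f}, "
      f"Recall: {recall:.3f}, F1: {f1:.3f}")
```

If malignant should be treated as the positive class instead, `pos_label=0` can be passed to the precision, recall and F1 functions.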

## Python – Nested Cross Validation for Algorithm Selection

In this post, you will learn about the nested cross validation technique and how you could use it for selecting the best algorithm out of two or more algorithms used to train a machine learning model. The usage of the nested cross validation technique is illustrated using a Python Sklearn example. When it comes to selecting models trained with a particular algorithm with the optimal combination of hyperparameters, you can adopt model tuning techniques such as the following: grid search, randomized search, validation curve. The following topics get covered in this post: Why nested cross-validation? Nested cross-validation with Python Sklearn example Why Nested Cross-Validation? Nested cross-validation technique is used for estimating …
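One common way to set up nested cross-validation in Sklearn is to wrap a grid search (inner loop) inside `cross_val_score` (outer loop). The sketch below assumes two candidate algorithms, an SVC and a decision tree, with small illustrative parameter grids; none of these choices come from the original post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: tune hyperparameters of each algorithm with 2-fold grid search
svc_search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1.0, 10.0]},
    cv=2,
)
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid={"max_depth": [2, 4, 6]},
    cv=2,
)

# Outer loop: 5-fold estimate of how well each tuned algorithm generalizes
results = {}
for name, search in [("SVC", svc_search), ("Decision tree", tree_search)]:
    scores = cross_val_score(search, X, y, cv=5)
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The algorithm with the better outer-loop score would then be the one to tune and train on the full training data.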

## Randomized Search Explained – Python Sklearn Example

In this post, you will learn about one of the machine learning model tuning techniques called Randomized Search, which is used to find the optimal combination of hyperparameters for coming up with the best model. The randomized search concept will be illustrated using a Python Sklearn code example. As a data scientist, you must learn some of these model tuning techniques to come up with optimal models. You may want to check some of the other posts on tuning model parameters such as the following: Sklearn validation_curve for tuning model hyper parameters; Sklearn GridSearchCV for tuning model hyper parameters. In this post, the following topics will be covered: What and why …
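A minimal randomized search sketch is shown below. It assumes a logistic regression pipeline and samples the regularization strength `C` from a log-uniform distribution; the model, the distribution bounds and `n_iter=10` are illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Sample 10 candidate values of C instead of trying every value on a grid
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=5,
    random_state=1,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```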

## Grid Search Explained – Python Sklearn Examples

In this post, you will learn about another machine learning model hyperparameter optimization technique called Grid Search with the help of Python Sklearn code examples. In one of the earlier posts, you learned about another hyperparameter optimization technique, namely the validation curve. As a data scientist, it will be useful to learn some of these model tuning techniques (tuning hyperparameters) as they would help you select the most appropriate models with the most appropriate parameters. The following are some of the topics covered in this post: What & Why of grid search? Grid search with Python Sklearn examples What & Why of Grid Search? Grid Search technique helps in performing exhaustive search over specified parameter (hyperparameter) values for …
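To make the idea of an exhaustive search concrete, here is a small sketch with `GridSearchCV`. The SVC estimator and the 3 x 2 parameter grid are assumptions picked for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipeline = make_pipeline(StandardScaler(), SVC())

# Every combination in the grid is evaluated: 3 values of C x 2 kernels = 6
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__kernel": ["linear", "rbf"],
}
search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```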

## Validation Curves Explained – Python Sklearn Example

In this post, you will learn about validation curves with a Python Sklearn example. You will learn how validation curves can help diagnose or assess your machine learning models in relation to underfitting and overfitting. On a similar topic, I recommend reading one of the previous posts on assessing overfitting and underfitting titled Learning curves explained with Python Sklearn example. The following gets covered in this post: Why validation curves? Python Sklearn example for validation curves Why Validation Curves? Like the learning curve, the validation curve also helps in diagnosing the model bias vs variance. The validation curve plot helps in selecting the most appropriate model parameters (hyperparameters). Unlike learning …
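A validation curve can be computed with Sklearn's `validation_curve` function, as in the sketch below. The logistic regression pipeline and the range of `C` values are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Vary a single hyperparameter and record train / validation accuracy
param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_scores, test_scores = validation_curve(
    pipeline,
    X,
    y,
    param_name="logisticregression__C",
    param_range=param_range,
    cv=5,
)

# One row per parameter value, one column per CV fold
for c, tr, te in zip(param_range, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"C={c}: train={tr:.3f}, validation={te:.3f}")
```

A large gap between the training and validation scores at a given parameter value hints at overfitting, while low scores on both hint at underfitting.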

## Python – 5 Sets of Useful Numpy Unary Functions

In this post, you will learn about five of the most popular or useful sets of unary universal functions (ufuncs) provided by the Python Numpy library. As data scientists, it will be useful to learn these functions by heart as they help in performing arithmetic operations on sequence-like objects. These functions can also be termed vectorized wrapper functions, which are used to perform element-wise operations. The following represents the different sets of popular functions: basic arithmetic operations, summary statistics, sorting, minimum / maximum, array equality. Basic Arithmetic Operations The following are some of the functions which can be used to perform arithmetic operations: add, subtract, multiply, divide, …
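Here is a short sketch touching each of the five sets listed above. The arithmetic functions are the ones named in the excerpt (note that they take two input arrays); the functions picked for the other four sets (`np.sum`, `np.mean`, `np.sort`, `np.min`, `np.max`, `np.array_equal`) are assumptions chosen for illustration.

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])
b = np.array([1.0, 2.0, 3.0])

# Basic arithmetic operations (element-wise)
added = np.add(a, b)             # [4., 3., 5.]
subtracted = np.subtract(a, b)   # [2., -1., -1.]
multiplied = np.multiply(a, b)   # [3., 2., 6.]
divided = np.divide(a, b)        # a[i] / b[i] for each element

# Summary statistics
total = np.sum(a)    # 6.0
average = np.mean(a)  # 2.0

# Sorting (returns a sorted copy; a itself is unchanged)
a_sorted = np.sort(a)  # [1., 2., 3.]

# Minimum / maximum
smallest, largest = np.min(a), np.max(a)  # 1.0, 3.0

# Array equality (same shape and same elements)
same = np.array_equal(a_sorted, b)  # True
```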

## What, When & How of Scatterplot Matrix in Python

In this post, you will learn about the following in relation to the scatterplot matrix. Note that a scatterplot matrix can also be termed a pairplot. Later in this post, you will find a Python code example that uses a scatterplot matrix / pairplot (seaborn package). What is scatterplot matrix? When to use scatterplot matrix / pairplot? How to use scatterplot matrix in Python? What is Scatterplot Matrix? A scatterplot matrix is a matrix (or grid) of scatter plots where each scatter plot in the grid is created between different combinations of variables. In other words, a scatterplot matrix represents the bivariate or pairwise relationship between different combinations of variables …

## Learning Curves Explained with Python Sklearn Example

In this post, you will learn how to use learning curves, with a Python (Sklearn) code example, to determine model bias-variance. Knowing how to use learning curves will help you assess/diagnose whether the model is suffering from high bias (underfitting) or high variance (overfitting) and whether increasing training data samples could help solve the bias or variance problem. Some of the following topics are covered in this post: Why learning curves? Python Sklearn example for the Learning curve You may want to check some of the following posts in order to get a better understanding of bias-variance and underfitting-overfitting. Bias-variance concepts and interview questions Overfitting/Underfitting concepts and interview …
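A learning curve can be computed with Sklearn's `learning_curve` function, as sketched below. The logistic regression pipeline, the five training-set sizes and the shuffling settings are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Train on 10%, 32.5%, ..., 100% of each training fold and score every time
train_sizes, train_scores, test_scores = learning_curve(
    pipeline,
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    shuffle=True,
    random_state=1,
)

# One row per training-set size, one column per CV fold
for n, tr, te in zip(train_sizes, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"{n} samples: train={tr:.3f}, validation={te:.3f}")
```

If the validation score keeps rising as samples are added, more training data may help; if both curves plateau at a low score, the model is more likely underfitting.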

## K-Fold Cross Validation – Python Example

In this post, you will learn about K-fold cross validation concepts with a Python code example. It is important to learn cross validation concepts in order to perform model tuning with the end goal of choosing a model which has high generalization performance. As a data scientist / machine learning engineer, you must have a good understanding of cross validation concepts in general. The following topics get covered in this post: What and why of K-fold cross validation When to select what values of K? K-fold cross validation with python (using cross-validation generators) K-fold cross validation with python (using cross_val_score) What and Why of K-fold Cross Validation K-fold cross validation …
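The two approaches named above, a cross-validation generator and `cross_val_score`, can be sketched side by side. The choice of `KFold` with K=5, the shuffling settings and the logistic regression pipeline are assumptions for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
kfold = KFold(n_splits=5, shuffle=True, random_state=1)

# Approach 1: iterate over the folds produced by the generator
manual_scores = []
for train_idx, test_idx in kfold.split(X, y):
    model = clone(pipeline)  # fresh, unfitted copy for every fold
    model.fit(X[train_idx], y[train_idx])
    manual_scores.append(model.score(X[test_idx], y[test_idx]))

# Approach 2: let cross_val_score run the same loop
cv_scores = cross_val_score(pipeline, X, y, cv=kfold)

print(f"Manual loop:     {np.mean(manual_scores):.3f}")
print(f"cross_val_score: {cv_scores.mean():.3f}")
```

For classification problems with imbalanced classes, `StratifiedKFold` is often preferred because it keeps the class proportions similar in every fold.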

## Sklearn Machine Learning Pipeline – Python Example

In this post, you will learn about the concepts of the machine learning (ML) pipeline and how to build an ML pipeline using the Python Sklearn Pipeline (sklearn.pipeline) package. Getting to know how to use sklearn.pipeline effectively for training/testing machine learning models will help automate various activities such as feature scaling, feature selection / extraction and training/testing the models. It is recommended for data scientists (Python) to get a good understanding of sklearn.pipeline. The following are some of the topics covered in this post: Introduction to ML Pipeline Sklearn ML Pipeline Python code example Introduction to ML Pipeline Machine Learning (ML) pipeline, theoretically, represents different steps including data transformation and prediction through which data …
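A minimal pipeline chaining feature scaling, feature extraction and a classifier might look like the sketch below. The particular steps (StandardScaler, PCA with two components, logistic regression) and the step names are assumptions chosen for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Each step is a (name, estimator) pair; all but the last must be transformers
pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),                # feature scaling
        ("pca", PCA(n_components=2)),                # feature extraction
        ("clf", LogisticRegression(max_iter=1000)),  # final estimator
    ]
)

# fit() runs fit_transform on each transformer, then fits the classifier
pipeline.fit(X_train, y_train)

# score() applies the already-fitted transformers to the test data first
test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because the scaler and PCA are fitted only on the training data inside `fit`, the pipeline helps avoid leaking test-set information into preprocessing.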

## Imputing Missing Data using Sklearn SimpleImputer

In this post, you will learn how to use Python’s Sklearn SimpleImputer for imputing / replacing numerical & categorical missing data using different strategies. In a related article posted some time back, the usage of the fillna method of Pandas DataFrame was discussed. Here is the link, Replace missing values with mean, median and mode. Handling missing values is a key part of data preprocessing and hence, it is of utmost importance for data scientists / machine learning engineers to learn different techniques for imputing / replacing numerical or categorical missing values with appropriate values based on appropriate strategies. The following topics will be covered in this post: SimpleImputer explained with Python …
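As a small illustration, the sketch below imputes a numerical array with the column mean and a categorical array with the most frequent value. The toy arrays are made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Numerical data with missing values (toy example)
X_num = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.nan]])

# Replace each missing value with the mean of its column
mean_imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_num_imputed = mean_imputer.fit_transform(X_num)
print(X_num_imputed)  # missing entries become 2.0 and 15.0

# Categorical data with a missing value (toy example)
X_cat = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)

# Replace each missing value with the most frequent value of its column
mode_imputer = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
X_cat_imputed = mode_imputer.fit_transform(X_cat)
print(X_cat_imputed)  # the missing entry becomes "red"
```

Other available strategies are `"median"` for numerical data and `"constant"` (together with `fill_value`) for either kind.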

## When to use LabelEncoder – Python Example

In this post, you will learn about when to use LabelEncoder. As a data scientist, you must have a clear understanding of when to use LabelEncoder and when to use other encoders such as One-hot Encoder. Using the appropriate type of encoder is a key part of data preprocessing in the machine learning model building lifecycle. Here are some of the scenarios in which you could use LabelEncoder without impacting the model. Use LabelEncoder when there are only two possible values of a categorical feature. For example, features having values such as yes or no. Or, maybe, a gender feature when there are only two possible values, male or female. Use LabelEncoder for …
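For the two-valued case described above, a minimal sketch could look like this. The yes/no values are made up for the example.

```python
from sklearn.preprocessing import LabelEncoder

# A categorical column with only two possible values (toy example)
answers = ["yes", "no", "no", "yes"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(answers)

# Classes are sorted alphabetically, so "no" -> 0 and "yes" -> 1
print(encoder.classes_)  # ['no' 'yes']
print(encoded)           # [1 0 0 1]

# The original labels can be recovered from the integer codes
decoded = encoder.inverse_transform(encoded)
```

With only two values, the integer codes do not introduce a misleading ordering, which is why this case is generally safe. Note that LabelEncoder works on one column at a time.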

## Feature Extraction using PCA – Python Example

In this post, you will learn how to use principal component analysis (PCA) for extracting important features (also termed a feature extraction technique) from a list of given features. As a machine learning engineer / data scientist, it is very important to learn the PCA technique for feature extraction as it helps you visualize the data in light of the explained variance of the data set. The following topics get covered in this post: What is principal component analysis? PCA algorithm for feature extraction PCA Python implementation step-by-step PCA Python Sklearn example What is Principal Component Analysis? Principal component analysis (PCA) is an unsupervised linear transformation technique which is primarily used …

## PCA Explained Variance Concepts with Python Example

In this post, you will learn about the concept of explained variance, which is one of the key concepts related to principal component analysis (PCA). The explained variance concept will be illustrated with Python code examples. Some of the following topics will be covered: What is explained variance? Python code examples of explained variance What is Explained Variance? Explained variance refers to the variance explained by each of the principal components (eigenvectors). It can be represented as the ratio of the related eigenvalue to the sum of the eigenvalues of all eigenvectors. Let’s say that there are N eigenvectors, then the explained variance for each eigenvector (principal component) can be expressed the …
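The eigenvalue ratio described above can be checked numerically. The sketch below computes it from the covariance matrix of standardized data and compares it with Sklearn's `explained_variance_ratio_`; the use of the breast cancer dataset is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Eigenvalues of the covariance matrix, sorted from largest to smallest
cov_matrix = np.cov(X_std.T)
eigenvalues = np.linalg.eigvalsh(cov_matrix)[::-1]

# Explained variance ratio: each eigenvalue divided by the sum of all eigenvalues
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# The same quantity as reported by Sklearn's PCA
pca = PCA().fit(X_std)
matches = np.allclose(explained_variance_ratio, pca.explained_variance_ratio_)

print("First three ratios:", np.round(explained_variance_ratio[:3], 3))
print("Matches Sklearn PCA:", matches)
```

The cumulative sum of these ratios (`np.cumsum`) is often used to decide how many principal components to keep.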