# Category Archives: Data Science

## Bayes Theorem Explained with Examples

In this post, you will learn about Bayes’ Theorem with the help of examples. It is of utmost importance to get a good understanding of Bayes Theorem in order to create probabilistic models. Bayes’ theorem is alternatively called as Bayes’ rule or Bayes’ law. One of the many applications of Bayes’s theorem is Bayesian inference which is one of the approaches of statistical inference (other being Frequentist inference), and fundamental to Bayesian statistics. In this post, you will learn about the following: Introduction to Bayes’ Theorem Bayes’ theorem real-world examples Introduction to Bayes’ Theorem In simple words, Bayes Theorem is used to determine the probability of a hypothesis in the presence of more evidence or information. In other …

## Joint & Conditional Probability Explained with Examples

In this post, you will learn about joint and conditional probability differences and examples. When starting with Bayesian analytics, it is very important to have a good understanding around probability concepts. And, the probability concepts such as joint and conditional probability is fundamental to probability and key to Bayesian modeling in machine learning. As a data scientist, you must get a good understanding of probability related concepts. Joint & Conditional Probability Concepts In this section, you will learn about basic concepts in relation to Joint and conditional probability. Probability of an event can be quantified as a function of uncertainty of whether that event will occur or not. Let’s say an event A is …

## K-means Clustering Elbow Method & SSE Plot – Python

In this plot, you will quickly learn about how to find elbow point using SSE or Inertia plot with Python code and You may want to check out my blog on K-means clustering explained with Python example. The following topics get covered in this post: What is Elbow Method? How to create SSE / Inertia plot? How to find Elbow point using SSE Plot What is Elbow Method? Elbow method is one of the most popular method used to select the optimal number of clusters by fitting the model with a range of values for K in K-means algorithm. Elbow method requires drawing a line plot between SSE (Sum of Squared errors) …

## K-Means Clustering Explained with Python Example

In this post, you will learn about K-Means clustering concepts with the help of fitting a K-Means model using Python Sklearn KMeans clustering implementation. Before getting into details, let’s briefly understand the concept of clustering. Clustering represents a set of unsupervised machine learning algorithms belonging to different categories such as prototype-based clustering, hierarchical clustering, density-based clustering etc. K-means is one of the most popular clustering algorithm belong to prototype-based clustering category. The idea is to create K clusters of data where data in each of the K clusters have greater similarity with other data in the same cluster. The different clustering algorithms sets out rules based on how the data …

## Adaboost Algorithm Explained with Python Example

In this post, you will learn about boosting technique and adaboost algorithm with the help of Python example. You will also learn about the concept of boosting in general. Boosting classifiers are a class of ensemble-based machine learning algorithms which helps in variance reduction. It is very important for you as data scientist to learn both bagging and boosting techniques for solving classification problems. Check my post on bagging – Bagging Classifier explained with Python example for learning more about bagging technique. The following represents some of the topics covered in this post: What is Boosting and Adaboost Algorithm? Adaboost algorithm Python example What is Boosting and Adaboost Algorithm? As …

## Bagging Classifier Python Code Example

In this post, you will learn about the concept of Bagging along with Bagging Classifier Python code example. Bagging is also called bootstrap aggregation. It is a data sampling technique where data is sampled with replacement. Bagging classifier helps combine prediction of different estimators and in turn helps reduce variance. In this post, you will learn about the following topics: Introduction to Bagging and Bagging Classifier Bagging Classifier python example Introduction to Bagging & Bagging Classifier / Regressor Bagging classifier can be called as an ensemble meta-estimator which is created by fitting multiple versions of base estimator, trained with modified training data set created using bagging sampling technique (data sampled using replacement) or otherwise. …

## Hard vs Soft Voting Classifier Python Example

In this post, you will learn about one of the popular and powerful ensemble classifier called as Voting Classifier using Python Sklearn example. Voting classifier comes with multiple voting options such as hard and soft voting options. Hard vs Soft Voting classifier is illustrated with code examples. The following topic has been covered in this post: Voting classifier – Hard vs Soft voting options Voting classifier Python example Voting Classifier – Hard vs Soft Voting Options Voting Classifier is an estimator that combines models representing different classification algorithms associated with individual weights for confidence. The Voting classifier estimator built by combining different classification models turns out to be stronger meta-classifier that balances out the individual …

## Keras Hello World Example

In this post, you will learn about how to set up Keras and get started with Keras, one of the most popular deep learning frameworks in current times which is built on top of TensorFlow 2.0 and can scale to large clusters of GPUs. You will also learn about getting started with hello world program with Keras code example. Here are some of the topics which will be covered in this post: Set up Keras with Anaconda Keras Hello World Program Set up Keras with Anaconda In this section, you will learn about how to set up Keras with Anaconda. Here are the steps: Go to Environments page in Anaconda App. …

## Handling Class Imbalance using Sklearn Resample

In this post, you will learn about how to tackle class imbalance issue when training machine learning classification models with imbalanced dataset. This is illustrated using Python SKlearn example. In the same context, you may check out my earlier post on handling class imbalance using class_weight. As a data scientist, it is of utmost importance to learn some of these techniques as you will often come across the class imbalance problem while working on different classification problems. Here is how the class imbalance in the dataset can be visualized: Before going ahead and looking at the Python code example related to how to use Sklearn.utils resample method, lets create an imbalanced data …

## Handle Class Imbalance using Class Weight – Python

In this post, you will learn about how to tackle with or handle class imbalance by adjusting class weight while solving a machine learning classification problem. This will be illustrated using Sklearn Python code example. What is Class Imbalance? Class imbalance is a one of the most common problem when solving classification problems related to healthcare domain, banking (fraud) domain etc. For example, if you want to build a model which classifies a transaction to be fraud or otherwise, the dataset will be highly imbalanced as there won’t be many instances where fraud-related transactions is found. The challenge related to building models having high performance is to address highly skewed data …

## Micro-average & Macro-average Scoring Metrics – Python

In this post, you will learn about how to use micro-averaging and macro-averaging methods for evaluating scoring metrics (precision, recall, f1-score) for multi-class classification machine learning problem. You will also learn about weighted precision, recall and f1-score metrics in relation to micro-average and macro-average scoring metrics for multi-class classification problem. The concepts will be explained with Python code examples. What & Why of Micro and Macro-averaging scoring metrics? With binary classification, it is very intuitive to score the model in terms of scoring metrics such as precision, recall and F1-score. However, in case of multi-class classification it becomes tricky. The questions to ask are some of the following: Which metrics to use to score …

## PyTorch – How to Load & Predict using Resnet Model

In this post, you will learn about how to load and predict using pre-trained Resnet model using PyTorch library. Here is arxiv paper on Resnet. Before getting into the aspect of loading and predicting using Resnet (Residual neural network) using PyTorch, you would want to learn about how to load different pretrained models such as AlexNet, ResNet, DenseNet, GoogLenet, VGG etc. The PyTorch Torchvision projects allows you to load the models. Note that the torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. Here is the command: The output of above will list down all the pre-trained models available for loading and prediction. You may …

## How to install PyTorch on Anaconda

This is a quick post on how to install PyTorch on Anaconda and get started with deep learning projects. As a machine learning enthusiasts, this is the first step in getting started with PyTorch. I followed this steps on Mac Air and got started with PyTorch in no time. Here are the steps: Go to Anaconda tool. Click on “Environments” in the left navigation. Click on arrow marks on “base (root)” as shown in the diagram below. It will open up a small modal window as down. Click open terminal. This will open up a terminal window. Execute the following command to set up PyTorch. Once done, go to Jupyter Notebook window and …

## ROC Curve & AUC Explained with Python Examples

In this post, you will learn about ROC Curve and AUC concepts along with related concepts such as True positive and false positive rate with the help of Python examples. It is very important to learn ROC, AUC and related concepts as it helps in selecting the most appropriate machine learning models based on the model performance. What is ROC & AUC / AUROC? Receiver operating characteristic (ROC) graphs are used for selecting the most appropriate classification models based on their performance with respect to the false positive rate (FPR) and true positive rate (TPR). These metrics are computed by shifting the decision threshold of the classifier. ROC curve is used for probabilistic models …

## Python – How to Draw Confusion Matrix using Matplotlib

In this post, you will learn about how to draw / show confusion matrix using Matplotlib Python package. It is important to learn this technique as it will come very handy in assessing the machine learning model performance of classification models trained using different classification algorithms. Confusion Matrix using Matplotlib In order to demonstrate the confusion matrix using Matplotlib, let’s fit a pipeline estimator to the Sklearn breast cancer dataset using StandardScaler (for standardising the dataset) and Random Forest Classifier as the machine learning algorithm. Once an estimator is fit to the training data set, nest step is to print the confusion matrix. In order to do that, the following steps will need to be …

## Python – Nested Cross Validation for Algorithm Selection

In this post, you will learn about nested cross validation technique and how you could use it for selecting the most optimal algorithm out of two or more algorithms used to train machine learning model. The usage of nested cross validation technique is illustrated using Python Sklearn example. When it is about selecting models trained with a particular algorithm with most optimal combination of hyper parameters, you can adopt the model tuning techniques such as some of the following: Grid search Randomized search Validation curve The following topics get covered in this post: Why nested cross-validation? Nested cross-validation with Python Sklearn example Why Nested Cross-Validation? Nested cross-validation technique is used for estimating …