Tag Archives: python

Python – Extract Text from PDF file using PDFMiner

In this post, you will get a quick code sample on how to use PDFMiner, a Python library, to extract text from PDF files and perform text analysis. I will be posting several other posts in relation to how to use other Python libraries for extracting text from PDF files.  In this post, the following topic will get covered: How to set up PDFMiner Python code for extracting text from PDF file using PDFMiner Setting up PDFMiner Here is how you would set up PDFMiner.six. You could execute the following command to get set up with PDFMiner while working in Jupyter notebook: Python Code for Extracting Text from PDF file …

Continue reading

Posted in AI, NLP, Python. Tagged with , , .

RANSAC Regression Explained with Python Examples

In this post, you will learn about the concepts of RANSAC regression algorithm along with Python Sklearn example for RANSAC regression implementation using RANSACRegressor. RANSAC regression algorithm is useful for handling the outliers dataset. Instead of taking care of outliers using statistical and other techniques, one can use RANSAC regression algorithm which takes care of the outlier data. In this post, the following topics are covered: Introduction to RANSAC regression RANSAC Regression Python code example Introduction to RANSAC Regression RANSAC (RANdom SAmple Consensus) algorithm takes linear regression algorithm to the next level by excluding the outliers in the training dataset. The presence of outliers in the training dataset does impact …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

KMeans Silhouette Score Explained with Python Example

In this post, you will learn about concepts of KMeans Silhouette Score in relation to assessing the quality of K-Means clusters fit on the data. As a data scientist, it is of utmost important to understand the concepts of Silhouette score as it would help in evaluating the quality of clustering done using K-Means algorithm. In this post, the following topics will be covered: Introduction to Silhouette Score concepts Silhouette score explained using Python example You may want to check some of the following posts in relation to clustering: K-Means clustering explained with Python examples K-Means clustering elbow method and SSE Plot K-Means interview questions and answers Introduction to Silhouette Score Concepts …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

K-means Clustering Elbow Method & SSE Plot – Python

In this plot, you will quickly learn about how to find elbow point using SSE or Inertia plot with Python code and  You may want to check out my blog on K-means clustering explained with Python example.  The following topics get covered in this post: What is Elbow Method? How to create SSE / Inertia plot? How to find Elbow point using SSE Plot What is Elbow Method? Elbow method is one of the most popular method used to select the optimal number of clusters by fitting the model with a range of values for K in K-means algorithm. Elbow method requires drawing a line plot between SSE (Sum of Squared errors) …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

K-Means Clustering Explained with Python Example

In this post, you will learn about K-Means clustering concepts with the help of fitting a K-Means model using Python Sklearn KMeans clustering implementation. Before getting into details, let’s briefly understand the concept of clustering. Clustering represents a set of unsupervised machine learning algorithms belonging to different categories such as prototype-based clustering, hierarchical clustering, density-based clustering etc. K-means is one of the most popular clustering algorithm belong to prototype-based clustering category. The idea is to create K clusters of data where data in each of the K clusters have greater similarity with other data in the same cluster. The different clustering algorithms sets out rules based on how the data …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Adaboost Algorithm Explained with Python Example

In this post, you will learn about boosting technique and adaboost algorithm with the help of Python example. You will also learn about the concept of boosting in general. Boosting classifiers are a class of ensemble-based machine learning algorithms which helps in variance reduction. It is very important for you as data scientist to learn both bagging and boosting techniques for solving classification problems. Check my post on bagging – Bagging Classifier explained with Python example for learning more about bagging technique. The following represents some of the topics covered in this post: What is Boosting and Adaboost Algorithm? Adaboost algorithm Python example What is Boosting and Adaboost Algorithm? As …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Hard vs Soft Voting Classifier Python Example

In this post, you will learn about one of the popular and powerful ensemble classifier called as Voting Classifier using Python Sklearn example. Voting classifier comes with multiple voting options such as hard and soft voting options. Hard vs Soft Voting classifier is illustrated with code examples. The following topic has been covered in this post: Voting classifier – Hard vs Soft voting options Voting classifier Python example Voting Classifier – Hard vs Soft Voting Options Voting Classifier is an estimator that combines models representing different classification algorithms associated with individual weights for confidence. The Voting classifier estimator built by combining different classification models turns out to be stronger meta-classifier that balances out the individual …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Handling Class Imbalance using Sklearn Resample

In this post, you will learn about how to tackle class imbalance issue when training machine learning classification models with imbalanced dataset. This is illustrated using Python SKlearn example. In the same context, you may check out my earlier post on handling class imbalance using class_weight. As a data scientist, it is of utmost importance to learn some of these techniques as you will often come across the class imbalance problem while working on different classification problems. Here is how the class imbalance in the dataset can be visualized: Before going ahead and looking at the Python code example related to how to use Sklearn.utils resample method, lets create an imbalanced data …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Micro-average & Macro-average Scoring Metrics – Python

In this post, you will learn about how to use micro-averaging and macro-averaging methods for evaluating scoring metrics (precision, recall, f1-score) for multi-class classification machine learning problem. You will also learn about weighted precision, recall and f1-score metrics in relation to micro-average and macro-average scoring metrics for multi-class classification problem. The concepts will be explained with Python code examples.  What & Why of Micro and Macro-averaging scoring metrics? With binary classification, it is very intuitive to score the model in terms of scoring metrics such as precision, recall and F1-score. However, in case of multi-class classification it becomes tricky. The questions to ask are some of the following: Which metrics to use to score …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

PyTorch – How to Load & Predict using Resnet Model

In this post, you will learn about how to load and predict using pre-trained Resnet model using PyTorch library. Here is arxiv paper on Resnet. Before getting into the aspect of loading and predicting using Resnet (Residual neural network) using PyTorch, you would want to learn about how to load different pretrained models such as AlexNet, ResNet, DenseNet, GoogLenet, VGG etc. The PyTorch Torchvision projects allows you to load the models. Note that the torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. Here is the command:  The output of above will list down all the pre-trained models available for loading and prediction. You may …

Continue reading

Posted in Data Science, Deep Learning, Machine Learning, Python. Tagged with , , , , .

ROC Curve & AUC Explained with Python Examples

In this post, you will learn about ROC Curve and AUC concepts along with related concepts such as True positive and false positive rate with the help of Python examples. It is very important to learn ROC, AUC and related concepts as it helps in selecting the most appropriate machine learning models based on the model performance.  What is ROC & AUC / AUROC? Receiver operating characteristic (ROC) graphs are used for selecting the most appropriate classification models based on their performance with respect to the false positive rate (FPR) and true positive rate (TPR). These metrics are computed by shifting the decision threshold of the classifier. ROC curve is used for probabilistic models …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Python – Nested Cross Validation for Algorithm Selection

In this post, you will learn about nested cross validation technique and how you could use it for selecting the most optimal algorithm out of two or more algorithms used to train machine learning model. The usage of nested cross validation technique is illustrated using Python Sklearn example. When it is about selecting models trained with a particular algorithm with most optimal combination of hyper parameters, you can adopt the model tuning techniques such as some of the following: Grid search  Randomized search Validation curve The following topics get covered in this post: Why nested cross-validation? Nested cross-validation with Python Sklearn example Why Nested Cross-Validation? Nested cross-validation technique is used for estimating …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Randomized Search Explained – Python Sklearn Example

randomized search python sklearn example

In this post, you will learn about one of the machine learning model tuning technique called Randomized Search which is used to find the most optimal combination of hyper parameters for coming up with the best model. The randomized search concept will be illustrated using Python Sklearn code example. As a data scientist, you must learn some of these model tuning techniques to come up with most optimal models. You may want to check some of the other posts on tuning model parameters such as the following: Sklearn validation_curve for tuning model hyper parameters  Sklearn GridSearchCV for tuning model hyper parameters In this post, the following topics will be covered: What and why …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Grid Search Explained – Python Sklearn Examples

GridSearchCV Python Sklearn Examples

In this post, you will learn about another machine learning model hyperparameter optimization technique called as Grid Search with the help of Python Sklearn code examples. In one of the earlier posts, you learned about another hyperparamater optimization technique namely validation curve. As a data scientist, it will be useful to learn some of these model tuning techniques (tuning hyperparameters) as it would help us select most appropriate models with most appropriate parameters.  The following are some of the topics covered in this post: What & Why of grid search? Grid search with Python Sklearn examples What & Why of Grid Search? Grid Search technique helps in performing exhaustive search over specified parameter (hyper parameters) values for …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Validation Curves Explained – Python Sklearn Example

In this post, you will learn about validation curves with Python Sklearn example. You will learn about how validation curves can help diagnose or assess your machine learning models in relation to underfitting and overfitting. On the similar topic, I recommend you reading one of the previous post on assessing overfitting and underfitting titled Learning curves explained with Python Sklearn example. The following gets covered in this post: Why validation curves? Python Sklearn example for validation curves Why Validation Curves? As like learning curve, the validation curve also helps in diagnozing the model bias vs variance. The validation curve plot helps in selecting most appropriate model parameters (hyper-parameters). Unlike learning …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Python – 5 Sets of Useful Numpy Unary Functions

In this post, you will learn about some of the 5 most popular or useful set of unary universal functions (ufuncs) provided by Python Numpy library. As data scientists, it will be useful to learn these unary functions by heart as it will help in performing arithmetic operations on sequential-like objects.  These functions can also be termed as vectorized wrapper functions which are used to perform element-wise operations. The following represents different set of popular functions: Basic arithmetic operations  Summary statistics Sorting  Minimum / maximum Array equality Basic Arithmetic Operations The following are some of the unary functions whichc an be used to perform arithmetic operations: add, subtract, multiply, divide, …

Continue reading

Posted in Data Science, Python. Tagged with , , .