## Top 10 Data Science Skills for Product Managers

In this post, you will learn about the top data science skills and concepts that product managers and business analysts may need in order to create useful machine learning based solutions. Understanding these concepts well helps product managers and business analysts tackle day-to-day challenges while working with data science / machine learning teams, and gives them enough skill to solve machine learning based problems. Here are some of the topics:

- Understanding the difference between AI, machine learning, data science, and deep learning
- Which problems are machine learning problems? …

## Python – Extract Text from PDF file using PDFMiner

In this post, you will get a quick code sample showing how to use PDFMiner, a Python library, to extract text from PDF files and perform text analysis. I will be publishing several other posts on how to use other Python libraries for extracting text from PDF files. In this post, the following topics are covered:

- How to set up PDFMiner
- Python code for extracting text from a PDF file using PDFMiner

### Setting up PDFMiner

Here is how you would set up PDFMiner.six. You could execute the following command to get set up with PDFMiner while working in a Jupyter notebook:

### Python Code for Extracting Text from PDF file …

## NLTK Hello World Python Example

In this post, you will learn about getting started with natural language processing (NLP) using NLTK (Natural Language Toolkit), a platform for working with human language data in Python. The post is titled hello world because it helps you get started with NLTK while also learning some important aspects of language processing. In this post, the following will be covered:

- Install / Set up NLTK
- Common NLTK commands for language processing operations

### Install / Set up NLTK

This is what you need to do to set up NLTK. Make sure you have a supported Python version installed, as NLTK requires Python 3.5, 3.6, 3.7, or 3.8. In a Jupyter notebook, you could execute …

## 8 Key AI Challenges for Telemedicine / Telehealth

In this post, you will learn about some of the key challenges of implementing Telemedicine / Telehealth. If you work in the field of data science / machine learning, you may want to go through these challenges, primarily AI related, which have arisen in the Telemedicine domain due to the upsurge in demand for reliable Telemedicine services. Here are the slides I recently presented at the Digital Data Science Conclave hosted by KIIT University. The primary focus is to make sure appropriate controls are in place for the responsible use of AI (Responsible AI). Here are the top 8 challenges which need to be addressed to take full advantage of AI, RPA …

## RANSAC Regression Explained with Python Examples

In this post, you will learn about the concepts of the RANSAC regression algorithm along with a Python Sklearn example of RANSAC regression using RANSACRegressor. The RANSAC regression algorithm is useful for handling datasets that contain outliers. Instead of dealing with outliers using statistical and other techniques, one can use the RANSAC regression algorithm, which takes care of the outlier data. In this post, the following topics are covered:

- Introduction to RANSAC regression
- RANSAC regression Python code example

### Introduction to RANSAC Regression

The RANSAC (RANdom SAmple Consensus) algorithm takes the linear regression algorithm to the next level by excluding the outliers in the training dataset. The presence of outliers in the training dataset does impact …
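To make the "fit on consensus points only" idea concrete, here is a minimal hand-rolled sketch of RANSAC for a straight line. It is not the post's code and not Sklearn's `RANSACRegressor`; the function names, the data, the residual threshold and the number of trials are all assumptions chosen for illustration.

```python
import random


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_trials=100, threshold=1.0, seed=0):
    """Simplified RANSAC: keep the candidate line with the most inliers."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_inliers = []
    for _ in range(n_trials):
        sample = rng.sample(points, 2)  # minimal sample for a line
        if sample[0][0] == sample[1][0]:
            continue  # same x twice: slope undefined, skip this trial
        slope, intercept = fit_line(sample)
        inliers = [
            (x, y) for x, y in points
            if abs(y - (slope * x + intercept)) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Final model: refit using only the largest consensus set
    return fit_line(best_inliers), best_inliers


# Hypothetical data: ten points on y = 2x + 1 plus two gross outliers
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
data += [(3.5, 40.0), (6.5, -25.0)]

(ransac_slope, ransac_intercept), inliers = ransac_line(data)
ols_slope, ols_intercept = fit_line(data)  # plain least squares on all points

print("RANSAC fit:", ransac_slope, ransac_intercept)
print("Plain least squares fit:", ols_slope, ols_intercept)
```

Comparing the two printed fits shows how the outliers pull the plain least squares line away from the underlying trend, while the consensus-based fit ignores them.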

## Mean Squared Error or R-Squared – Which one to use?

In this post, you will learn about the concepts of mean squared error (MSE) and R-squared, the differences between them, and which one to use when working with regression models such as linear regression. You will also work through Python examples to understand the concepts better. In this post, the following topics are covered:

- Introduction to Mean Squared Error (MSE) and R-Squared
- Difference between MSE and R-Squared
- MSE or R-Squared – Which one to use?
- MSE and R-Squared Python code example

### Introduction to Mean Squared Error (MSE) and R-Squared

In this section, you will learn about the concepts of mean squared error and R-squared. These are used for evaluating the …
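The two metrics can be sketched in a few lines of plain Python. This is an illustrative sketch using the standard definitions (MSE as the mean of squared residuals, R-squared as one minus the ratio of residual to total sum of squares); the function names and the sample values are assumptions, not taken from the post.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values, chosen only for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MSE = {mse:.3f}, R-squared = {r2:.3f}")
```

MSE is expressed in the squared units of the target, so its size depends on the scale of the data, whereas R-squared is unitless, which makes it easier to compare across datasets.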

## Linear Regression Explained with Python Examples

In this post, you will learn about the concepts of linear regression along with Python Sklearn examples for training linear regression models. Linear regression belongs to the class of parametric models and is used to train supervised models. The following topics are covered in this post:

- Introduction to linear regression
- Linear regression concepts / terminologies
- Linear regression Python code example

### Introduction to Linear Regression

Linear regression is a machine learning algorithm used to predict the value of a continuous response variable. The predictive analytics problems that are solved using linear regression models are called supervised learning problems, because they require the values of the response / target variable to be present and used for training the models. …
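As a small illustration of what "learning parameters from labelled data" means here, the following sketch fits a one-feature linear regression with the closed-form least squares formulas. It is a plain Python sketch rather than the Sklearn example the post refers to; the function name and the data are assumptions.

```python
def fit_simple_linear_regression(x, y):
    """Closed-form least squares estimates for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data generated from y = 2x + 1 (no noise)
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(x_train, y_train)

# Predict the continuous response for an unseen input
prediction = slope * 6.0 + intercept
print(slope, intercept, prediction)
```

The response values in `y_train` are what make this a supervised problem: without them there would be nothing to fit the slope and intercept against.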

## Correlation Concepts, Matrix & Heatmap using Seaborn

In this post, you will learn about the concepts of correlation and how to draw a correlation heatmap using the Python Seaborn library for different columns in a Pandas dataframe. The following are some of the topics covered in this post:

- Introduction to Correlation
- What is a correlation heatmap?
- Correlation heatmap Pandas / Seaborn Python example

### Introduction to Correlation

Correlation is a statistical measure of the linear relationship between two variables. It can also be defined as a measure of dependence between two different variables. If there are multiple variables and the goal is to find the correlation between all of these variables and store them using an appropriate data structure, the …
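The sketch below computes pairwise Pearson correlation coefficients in plain Python and stores them in a nested dictionary, which is the kind of table a heatmap would visualise. It does not use Pandas or Seaborn; the column names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical columns, standing in for the columns of a dataframe
columns = {
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "score": [2.0, 4.0, 6.0, 8.0, 10.0],   # rises with hours
    "errors": [10.0, 8.0, 6.0, 4.0, 2.0],  # falls as hours rise
}

# Pairwise correlations for every combination of columns
corr_matrix = {
    a: {b: pearson(columns[a], columns[b]) for b in columns}
    for a in columns
}

for name, row in corr_matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Values close to 1 indicate a strong positive linear relationship, values close to -1 a strong negative one, and the diagonal is always 1 because each column is perfectly correlated with itself.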

## Beta Distribution Explained with Python Examples

In this post, you will learn about the Beta probability distribution with the help of Python examples. As a data scientist, it is very important to understand the beta distribution, as it is very commonly used as a prior in Bayesian modeling. In this post, the following topics are covered:

- Beta distribution intuition and examples
- Introduction to beta distribution
- Beta distribution Python examples

### Beta Distribution Intuition & Examples

The beta distribution is widely used to model prior beliefs about a probability in real-world applications. Here is a great article on understanding the beta distribution with an example from baseball. You may want to pay attention to the fact that even if the baseball …
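To illustrate the "prior belief" role of the beta distribution, here is a small sketch of the standard conjugate update: a Beta(α, β) prior over a success probability, combined with observed successes and failures, gives a Beta(α + successes, β + failures) posterior. The prior parameters and the observed counts below are made-up numbers, not the ones from the linked article.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return (alpha * beta) / (total ** 2 * (total + 1))


def update_beta(alpha, beta, successes, failures):
    """Posterior parameters after observing binomial data."""
    return alpha + successes, beta + failures


# Hypothetical prior: we expect a success rate of about 0.2
prior_alpha, prior_beta = 2.0, 8.0

# Hypothetical observations: 3 successes and 7 failures
post_alpha, post_beta = update_beta(prior_alpha, prior_beta, 3, 7)

prior_mean = beta_mean(prior_alpha, prior_beta)
post_mean = beta_mean(post_alpha, post_beta)
print("prior mean:", prior_mean)
print("posterior mean:", post_mean)
print("posterior variance:", beta_variance(post_alpha, post_beta))
```

The posterior mean moves from the prior towards the observed success rate, and the more data is observed, the less the prior matters.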

## Bernoulli Distribution Explained with Python Examples

In this post, you will learn about the concepts of the Bernoulli distribution along with real-world examples and Python code samples. As a data scientist, it is very important to understand the statistical concepts behind different probability distributions in order to understand how data is distributed. In this post, the following topics are covered:

- Introduction to Bernoulli distribution
- Bernoulli distribution real-world examples
- Bernoulli distribution Python code examples

### Introduction to Bernoulli Distribution

The Bernoulli distribution is a discrete probability distribution of a random variable which can take only one of two possible values, such as 1 or 0, yes or no, true or false, etc. The probability of …
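A Bernoulli random variable can be sketched with the standard library alone. The snippet below assumes a success probability `p = 0.3` (an arbitrary value for illustration) and uses hypothetical helper names; it shows the probability mass function and a simple way to draw samples.

```python
import random


def bernoulli_pmf(x, p):
    """P(X = x) for a Bernoulli(p) variable; x must be 0 or 1."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p if x == 1 else 1 - p


def bernoulli_sample(p, n, seed=0):
    """Draw n independent Bernoulli(p) outcomes with a fixed seed."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


p = 0.3  # assumed probability of success (outcome 1)

samples = bernoulli_sample(p, 10_000)
sample_mean = sum(samples) / len(samples)

print("P(X=1) =", bernoulli_pmf(1, p))
print("P(X=0) =", bernoulli_pmf(0, p))
print("theoretical mean =", p, "variance =", p * (1 - p))
print("sample mean =", sample_mean)
```

With a large number of draws, the sample mean should land close to `p`, which is the theoretical mean of the distribution.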

## K-Nearest Neighbors Explained with Python Examples

In this post, you will learn about the K-nearest neighbors algorithm with Python Sklearn examples. The K-nearest neighbors algorithm is used for solving both classification and regression machine learning problems. The following topics are covered in this post:

- Introduction to K-nearest neighbors
- What is the most appropriate value of K?
- K-NN Python example

### Introduction to K-nearest neighbors

K-nearest neighbors is a supervised learning algorithm which can be used to solve both classification and regression problems. It belongs to the class of non-parametric models. Such models don’t learn parameters from the training data set to come up with a discriminative function for classifying the test or unseen data set. Rather, the model memorizes the training data …
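The "memorize, then look up neighbours" behaviour can be sketched without any library. The following is a minimal K-NN classifier using Euclidean distance and a majority vote; it is not the Sklearn example from the post, and the function names, training points, labels and choice of `k = 3` are assumptions.

```python
from collections import Counter
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    # No training step: all training data is kept and searched at query time
    order = sorted(
        range(len(train_points)),
        key=lambda i: euclidean(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training set with two well-separated groups
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (5.0, 5.0), (5.5, 6.0), (6.0, 5.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

label_near_a = knn_predict(train_points, train_labels, (1.5, 1.5), k=3)
label_near_b = knn_predict(train_points, train_labels, (5.5, 5.0), k=3)
print(label_near_a, label_near_b)
```

Because all the work happens at prediction time, the cost of a query grows with the size of the training set, which is one practical consequence of the model being non-parametric.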

## Local & Global Minima Explained with Examples

In this post, you will learn the concepts of local and global minima with illustrative pictures and examples. Optimization problems are one of the key types of data analytics problems. Prescriptive analytics problems are mostly optimisation problems. Other types of data analytics problems include descriptive analytics (what has happened?) and predictive analytics (what can happen?). Predictive analytics primarily makes use of machine learning (ML) algorithms, which are based on optimising (minimising) a cost or loss function. In order to become very good at finding solutions to optimisation problems (relating to minimising functions), including machine learning based problems, one must get a good understanding of the concepts of Local minima / global …
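A quick numerical sketch can make the distinction concrete. The function below, `f(x) = x^4 - 3x^2 + x`, is an arbitrary example chosen because it has two valleys of different depth; the grid search is a simple illustration, not an optimisation method recommended by the post.

```python
def f(x):
    """Example function with one local and one global minimum."""
    return x ** 4 - 3 * x ** 2 + x


# Evaluate f on a fine grid over [-2, 2]
step = 0.001
xs = [-2.0 + i * step for i in range(4001)]
ys = [f(x) for x in xs]

# A grid point is a local minimum if it is lower than both neighbours
local_minima = [
    (xs[i], ys[i])
    for i in range(1, len(xs) - 1)
    if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]
]

# The global minimum is the lowest of the local minima
global_minimum = min(local_minima, key=lambda point: point[1])

for x, y in local_minima:
    kind = "global" if (x, y) == global_minimum else "local"
    print(f"{kind} minimum near x = {x:.3f}, f(x) = {y:.3f}")
```

An algorithm that only ever moves downhill from its starting point can settle in the shallower valley, which is why the starting point matters when minimising a function with more than one minimum.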

## Gradient Descent Explained Simply with Examples

In this post, you will learn about the gradient descent algorithm with simple examples, explained in layman's terms. For a data scientist, it is of utmost importance to get a good grasp of the gradient descent algorithm, as it is widely used for optimising the objective function / loss function of various machine learning algorithms such as regression, neural networks, etc., in order to learn weights / parameters. The following related topics are covered in this post:

- Introduction to Gradient Descent algorithm
- Different types of gradient descent
- List of top 5 Youtube videos on Gradient descent algorithm

### Introduction to …
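Here is a minimal sketch of the update rule on a one-parameter problem. The loss `(w - 3)^2`, the learning rate, the starting point and the number of steps are all assumptions picked so that the behaviour is easy to follow.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    history = [w]
    for _ in range(n_steps):
        w = w - learning_rate * gradient(w)  # the core update rule
        history.append(w)
    return w, history


w_final, history = gradient_descent(start=0.0)
print("first few values of w:", [round(w, 3) for w in history[:5]])
print("final w:", w_final, "loss:", loss(w_final))
```

Each step moves `w` a fraction of the way towards the minimum; a learning rate that is too large can overshoot and diverge, while one that is too small makes progress very slow.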

## Deep Learning Explained Simply in Layman Terms

In this post, you will learn about deep learning through simple explanations (layman's terms) and examples. Deep learning is a subset of machine learning, not something different from it. Many of us, when starting to learn machine learning, look for the answer to the question “what is the difference between machine learning & deep learning?”. Well, both machine learning and deep learning are about learning from past experience (data) and making predictions on future data. Deep learning can be described as an approach to machine learning in which learning from past data is based on an artificial neural network (a mathematical model mimicking the human brain). …

## Tensor Broadcasting Explained with Examples

In this post, you will learn about the concepts of tensor broadcasting with the help of Python Numpy examples. Recall that a tensor is a container of data (primarily numerical) and the most fundamental data structure used in Keras and TensorFlow. You may want to check out a related article on tensors: Tensor explained with Python Numpy examples. Tensor broadcasting is borrowed from Numpy broadcasting. Broadcasting is a technique for performing arithmetic operations between Numpy arrays / tensors that have different shapes. In this technique, the smaller array is transformed to match the larger array (broadcast to the larger array's shape) so that the arithmetic operations can be performed on these arrays. Take a look …
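As a short illustration of the idea, the NumPy sketch below adds a 1-D array and a column vector to a 2-D array. The array values are arbitrary; `np.broadcast_to` is used only to show what the smaller array looks like once it is stretched to the larger shape.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
column = np.array([[100], [200]])   # shape (2, 1)

# The row is repeated for each of the 2 rows of the matrix
row_sum = matrix + row

# The column is repeated across each of the 3 columns of the matrix
column_sum = matrix + column

# What the row looks like once stretched to the matrix's shape
row_stretched = np.broadcast_to(row, matrix.shape)

print(row_sum)
print(column_sum)
print(row_stretched)
```

Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1; adding an array of shape `(2,)` to `matrix` would therefore raise an error.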

## Elbow Method vs Silhouette Score – Which is Better?

In this post, you will learn about two different methods for finding the optimal number of clusters in K-means clustering. These methods are commonly called the Elbow method and Silhouette analysis. Selecting the optimal number of clusters is key to applying a clustering algorithm to a dataset. As a data scientist, knowing these two techniques for finding the optimal number of clusters would prove to be very helpful. In this relation, you may want to check out detailed posts on the following:

- K-means clustering elbow method and SSE plot
- K-means Silhouette score explained with Python examples

In this post, we will use the Yellowbrick machine learning visualization library for creating the plot related …