# Tag Archives: Data Science

## Sklearn SimpleImputer Example – Impute Missing Data

In this post, you will learn about how to use Python’s Sklearn SimpleImputer for imputing / replacing numerical & categorical missing data using different strategies. In one of the related article posted sometime back, the usage of fillna method of Pandas DataFrame is discussed. Handling missing values is key part of data preprocessing and hence, it is of utmost importance for data scientists / machine learning Engineers to learn different techniques in relation imputing / replacing numerical or categorical missing values with appropriate value based on appropriate strategies. SimpleImputer Python Code Example SimpleImputer is a class in the sklearn.impute module that can be used to replace missing values in a dataset, using a …

## Pandas dropna: Drop Rows & Columns with Missing Values

In this blog post, we will be discussing Pandas’ dropna method. This method is used for dropping rows and columns that have missing values. Pandas is a powerful data analysis library for Python, and the dropna function is one of its most useful features. As data scientists, it is important to be able to handle missing data, and Pandas’ dropna function makes this easy. Pandas dropna Method Pandas’ dropna function allows us to drop rows or columns with missing values in our dataframe. Find the documentation of Pandas dropna method on this page: pandas.DataFrame.dropna. The dropna method looks like the following: DataFrame.dropna(axis=0, how=’any’, thresh=None, subset=None, inplace=False) Given the above method and parameters, the following …

## Accuracy, Precision, Recall & F1-Score – Python Examples

Classification models are used in classification problems to predict the target class of the data sample. The classification model predicts the probability that each instance belongs to one class or another. It is important to evaluate the performance of the classifications model in order to reliably use these models in production for solving real-world problems. Performance measures in machine learning classification models are used to assess how well machine learning classification models perform in a given context. These performance metrics include accuracy, precision, recall, and F1-score. Because it helps us understand the strengths and limitations of these models when making predictions in new situations, model performance is essential for machine …

## Support Vector Machine (SVM) Python Example

In this post, you will learn about the concepts of Support Vector Machine (SVM) with the help of Python code example for building a machine learning classification model. We will work with Python Sklearn package for building the model. As data scientists, it is important to get a good grasp on SVM algorithm and related aspects. What is Support Vector Machine (SVM)? Support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression tasks. At times, SVM for classification is termed as support vector classification (SVC) and SVM for regression is termed as support vector regression (SVR). In this post, we will learn about SVM …

## Overfitting & Underfitting in Machine Learning

The performance of the machine learning models depends upon two key concepts called underfitting and overfitting. In this post, you will learn about some of the key concepts of overfitting and underfitting in relation to machine learning models. In addition, you will also get a chance to test your understanding by attempting the quiz. The quiz will help you prepare well for interview questions in relation to underfitting & overfitting. As data scientists, you must get a good understanding of the overfitting and underfitting concepts. Introduction to Overfitting & Underfitting Assuming an independent and identically distributed (I.I.d) dataset, when the prediction error on both the training and validation dataset is …

## Spend Analytics Use Cases: AI & Data Science

In this post, you will learn about the high-level concepts of spend analytics in relation to procurement and how data science / machine learning & AI can be used to extract actionable insights as part of spend analytics. This will be useful for procurement professionals such as category managers, sourcing managers, and procurement analytics stakeholders looking to understand the concepts of spend analytics and how they can drive decisions based on spend analytics. What is Spend Analytics? Simply speaking, spend analytics is about performing systematic computational analysis to extract actionable insights from spend and savings data across different categories of spends in order to achieve desired business outcomes such as cost savings, …

## Logistic Regression Explained with Python Example

In this blog post, we will discuss the logistic regression machine learning algorithm with a python example. Logistic regression is a type of regression algorithm that is used to predict the probability of occurrence of an event. It is often used in machine learning applications. In this tutorial, we will use python to implement logistic regression for binary classification problems. What is Logistic Regression? Logistic regression is a machine learning algorithm used for classification problems. That is, it can be used to predict whether an instance belongs to one class or the other. For example, it could be used to predict whether a person is male or female, based on …

## Perceptron Explained using Python Example

In this post, you will learn about the concepts of Perceptron with the help of Python example. It is very important for data scientists to understand the concepts related to Perceptron as a good understanding lays the foundation of learning advanced concepts of neural networks including deep neural networks (deep learning). What is Perceptron? Perceptron is a machine learning algorithm which mimics how a neuron in the brain works. It is also called as single layer neural network consisting of a single neuron. The output of this neural network is decided based on the outcome of just one activation function associated with the single neuron. In perceptron, the forward propagation of information happens. Deep …

## Classification Problems Real-life Examples

In this post, you will learn about some popular and most common real-life examples of machine learning classification problems. For beginner data scientists, these examples will prove to be helpful to gain perspectives on real-world problems which can be termed as machine learning classification problems. This post will be updated from time-to-time to include interesting real-life examples which can be solved by training machine learning classification models. Before going ahead and looking into examples, let’s understand a little about what is machine learning (ML) classification problem. You may as well skip this section if you are familiar with the definition of machine learning classification problems & solutions. You may want …

## Linear vs Non-linear Data: How to Know

In this post, you will learn the techniques in relation to knowing whether the given data set is linear or non-linear. Based on the type of machine learning problems (such as classification or regression) you are trying to solve, you could apply different techniques to determine whether the given data set is linear or non-linear. For a data scientist, it is very important to know whether the data is linear or not as it helps to choose appropriate algorithms to train a high-performance model. You will learn techniques such as the following for determining whether the data is linear or non-linear: Use scatter plot when dealing with classification problems Use …

## Python – Creating Scatter Plot with IRIS Dataset

In this blog post, we will be learning how to create a Scatter Plot with the IRIS dataset using Python. The IRIS dataset is a collection of data that is used to demonstrate the properties of various statistical models. It contains information about 50 observations on four different variables: Petal Length, Petal Width, Sepal Length, and Sepal Width. As data scientists, it is important for us to be able to visualize the data that we are working with. Scatter plots are a great way to do this because they show the relationship between two variables. In this post, we have plotted and explored how how Petal Length and Sepal Length …

## Neural Network Explained with Perceptron Example

Neural networks are an important part of machine learning, so it is essential to understand how they work. A neural network is a computer system that has been modeled based on a biological neural network comprising neurons connected with each other. It can be built to solve machine learning tasks, like classification and regression problems. The perceptron algorithm is a representation of how neural networks work. The artificial neurons were first proposed by Frank Rosenblatt in 1957 as models for the human brain’s perception mechanism. This post will explain the basics of neural networks with a perceptron example. You will understand how a neural network is built using perceptrons. This …

## Chi-square test – Types, Concepts, Examples

The Chi-square (χ2) test is a statistical test used to determine whether the distribution of observed data is consistent with the distribution of data expected under a particular hypothesis. The Chi-square test can be used to compare two distributions, or to assess the goodness of fit of a given distribution to observed data. In this blog post, we will discuss the types of Chi-square tests, the concepts behind them, and how to perform them using Python / R. As data scientists, it is important to have a strong understanding of the Chi-square test so that we can use it to make informed decisions about our data. We will also provide …

## Hypothesis Testing Steps & Real Life Examples

Hypothesis testing is a technique that helps scientists, researchers, or for that matter, anyone test the validity of their claims or hypotheses about real-world or real-life events. Hypothesis testing techniques are often used in statistics and data science to analyze whether the claims about the occurrence of the events are true, whether the results returned by performance metrics of machine learning models are representative of the models or they happened by chance. This blog post will cover some of the key statistical concepts including steps and examples in relation to what is hypothesis testing, and, how to formulate them. The knowledge of hypothesis formulation and hypothesis testing holds the key …

## Insurance Machine Learning Use Cases

As insurance companies face increasing competition and ever-changing customer demands, they are turning to machine learning for help. Machine learning / AI can be used in a variety of ways to improve insurance operations, from developing new products and services to improving customer experience. It would be helpful for product manager and data science architects to get a good understanding around some of the use cases which can be addressed / automated using machine learning / AI based solutions. In this blog post, we will explore some of the most common insurance machine learning / AI use cases. Stay tuned for future posts that will dive into each of these …

## Invoice Processing Machine Learning Use Cases

Invoice processing is a critical part of any business. It’s the process of creating, managing, and paying invoices. Without invoice processing, businesses would have a difficult time keeping track of their finances. There are many different invoice processing use cases. For example, businesses can use invoice processing to keep track of customer payments, manage vendor contracts, and streamline their accounting processes. Invoice processing can also be used to detect fraud and prevent errors. Machine learning / AI can be used to improve invoice processing in a number of ways. As a product manager, it will be helpful to understand these use cases and how machine learning can be used to …