# Category Archives: Data Science

## Spend Analytics Use Cases: AI & Data Science

In this post, you will learn about the high-level concepts of spend analytics in relation to procurement and how data science / machine learning & AI can be used to extract actionable insights as part of spend analytics. This will be useful for procurement professionals such as category managers, and procurement analytics stakeholders looking to understand the concepts of spend analytics and how they can drive decisions based on spend analytics. What is Spend Analytics? Simply speaking, spend analytics is about performing systematic computational analysis to extract actionable insights from spend and savings data in order to achieve desired business outcomes such as cost savings, cost avoidance, spend forecasting, spend anomalies management, …

## What are Features in Machine Learning?

Machine learning is a field of machine intelligence concerned with the design and development of algorithms and models that allow computers to learn without being explicitly programmed. Its applications include regression, classification, clustering, natural language processing, audio and video analysis, computer vision, etc. Machine learning requires training one or more models using different algorithms. Check out this detailed post in relation to learning machine learning concepts – What is Machine Learning? Concepts & Examples. One of the most important aspects of building a machine learning model is identifying the features that will help create a great model, that is, a model that performs well on unseen data. …

## SVM Classifier using Sklearn: Code Examples

In this post, you will learn how to train an SVM classifier using the Scikit-learn (Sklearn) implementation with the help of code examples. A support vector machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression tasks. An SVM classifier works by finding the hyperplane that maximizes the margin between the two classes, which is why it is also known as a max-margin classifier. Support vector machine is a powerful tool for machine learning and has been widely used …
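To make the max-margin idea concrete, here is a minimal sketch in plain Python rather than Scikit-learn. It does not train an SVM; it only compares two hand-picked candidate hyperplanes on a toy dataset and reports which one leaves the larger minimum margin. The data points, the two candidates, and the helper name `min_margin` are all assumptions made for illustration.

```python
from math import sqrt

# Toy 2-D dataset (hypothetical): each entry is (point, label), label is +1 or -1.
points = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]


def min_margin(w, b, data):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A positive result means every point lies on the correct side.
    """
    norm = sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in data
    )


# Two hand-picked separating hyperplanes, written as (w, b).
candidate_a = ((1.0, 1.0), -2.5)
candidate_b = ((1.0, 0.0), -1.5)

margin_a = min_margin(*candidate_a, points)
margin_b = min_margin(*candidate_b, points)

# A max-margin classifier prefers the candidate with the larger minimum margin.
best = "A" if margin_a > margin_b else "B"
```

Both candidates separate the classes, but candidate A keeps every point farther from the boundary, which is the property an SVM optimizes for over all possible hyperplanes.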

## Two sample Z-test for Proportions: Formula & Examples

In statistics, a two-sample z-test for proportions is a method used to determine whether the proportions observed in two independent samples differ by more than chance would explain, that is, whether the two underlying population proportions are equal. The test is used when the population proportions are unknown and the samples are large enough for the normal approximation to hold. The test statistic is compared against the standard normal distribution. As data scientists, it is important to know how to conduct this test in order to determine whether two proportions are equal. In this blog post, we will discuss the formula and examples of the two-proportion Z-test. What is two proportion Z-test? A two-proportion Z-test is a statistical hypothesis test used to determine whether two …
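As a rough illustration of the calculation, the sketch below uses the common pooled-proportion form of the test statistic, z = (p1 - p2) / sqrt(p(1 - p)(1/n1 + 1/n2)), where p is the pooled proportion. The function name `two_proportion_z_test` and the sample counts are assumptions for illustration, not figures from the post.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis that both proportions are equal.
    p_pool = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 60 successes out of 100 versus 45 successes out of 100.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
```

With these made-up counts the p-value falls below 0.05, so at that significance level the null hypothesis of equal proportions would be rejected.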

## Hold-out Method for Training Machine Learning Models

The hold-out method for training machine learning models is a technique that involves splitting the data into different sets: one set for training, and other sets for validation and testing. It is used to check how well a machine learning model will perform on new data. In this post, you will learn about the hold-out method used during the process of training a machine learning model. Do check out my post on what is machine learning? concepts & examples for a detailed understanding of different aspects related to the basics of machine learning. Also, check out a related post on what is data science? When evaluating …
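The split described above can be sketched in a few lines of standard-library Python. The function name `hold_out_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration; libraries such as Scikit-learn offer their own helpers for this.

```python
import random


def hold_out_split(records, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle the records, then carve out test and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 record ids.
train, val, test = hold_out_split(range(100))
```

The model is fitted on `train`, tuned against `val`, and evaluated once on `test`, which it has never seen.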

## Different types of Time-series Forecasting Models

Forecasting is the process of predicting future events based on past and present data. Time-series forecasting is a type of forecasting that predicts future events based on time-stamped data points. There are many different types of time-series forecasting models, each with its own strengths and weaknesses. In this blog post, we will discuss the most common time-series forecasting machine learning models and provide examples of how they can be used to predict future events. The models covered include the autoregressive (AR) model, moving average (MA) model, autoregressive moving average (ARMA) model, autoregressive integrated moving average (ARIMA) model, seasonal autoregressive integrated moving average (SARIMA) model, vector autoregressive (VAR) model, vector error correction …

## Autoregressive (AR) models with Python examples

Autoregressive (AR) models are a subset of time series models, which can be used to predict future values based on previous observations. AR models use regression techniques and rely on autocorrelation in order to make accurate predictions. This blog post will provide Python code examples that demonstrate how you can implement an AR model for your own predictive analytics project. You will learn about the concepts of autoregressive (AR) models with the help of Python code examples. If you are starting on time-series forecasting, this would be a useful read. Note that time-series forecasting is one of the important areas of data science/machine learning. For beginners, time-series forecasting is the process of using a model …
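For a feel of how an AR model regresses a value on its own past, here is a minimal sketch of a first-order model, y_t = c + phi * y_(t-1), fitted by ordinary least squares in plain Python. The function name `fit_ar1` and the noise-free toy series are assumptions for illustration; a real project would more likely use a time-series library and higher lag orders.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_(t-1); returns (c, phi)."""
    lagged = series[:-1]   # y_(t-1)
    current = series[1:]   # y_t
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(current) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    variance = sum((x - mean_x) ** 2 for x in lagged)
    phi = covariance / variance
    c = mean_y - phi * mean_x
    return c, phi


# Hypothetical noise-free series generated with c = 2.0 and phi = 0.5.
series = [1.0]
for _ in range(19):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)

# One-step-ahead forecast from the last observed value.
next_value = c + phi * series[-1]
```

Because the toy series has no noise, the fit recovers the generating constants almost exactly; with real data the estimates would only approximate them.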

## Ridge Regression Concepts & Python example

Ridge regression is a type of linear regression that penalizes large coefficients. This technique can be used to reduce the effects of multicollinearity in linear regression, which results from high correlations among the predictor (independent) variables. In this tutorial, we will explain ridge regression with a Python example. What is Ridge Regression? Ridge regression is a type of linear regression technique that is used in machine learning to reduce the overfitting of linear models. Recall that linear regression is a method of modeling data that represents relationships between a response variable and one or more predictor variables. Ridge regression is used when there are multiple variables that …
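The shrinking effect of the ridge penalty is easiest to see in the simplest possible case. The sketch below assumes a single predictor and no intercept, so minimizing sum((y - w*x)^2) + alpha * w^2 has the closed-form solution w = sum(x*y) / (sum(x^2) + alpha). The function name `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(x, y, alpha):
    """Closed-form ridge coefficient for one predictor and no intercept."""
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    # The penalty alpha inflates the denominator and pulls the slope toward zero.
    return sum_xy / (sum_xx + alpha)


# Hypothetical data lying exactly on the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

ols_slope = ridge_slope(x, y, alpha=0.0)      # no penalty: ordinary least squares
shrunk_slope = ridge_slope(x, y, alpha=10.0)  # penalized: smaller coefficient
```

With `alpha=0.0` the slope is the ordinary least-squares value; raising `alpha` shrinks it toward zero without ever making it exactly zero.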

## Lasso Regression Explained with Python Example

In this post, you will learn concepts of Lasso regression along with Python Sklearn examples. The Lasso regression algorithm introduces a penalty against model complexity (a large number of parameters) using a regularization parameter. The other two similar forms of regularized linear regression are Ridge regression and Elastic Net regression, which will be discussed in future posts. In this post, the following topics are discussed: What’s Lasso Regression? Lasso regression is a machine learning algorithm that can be used to perform linear regression while also reducing the number of features used in the model. Lasso stands for least absolute shrinkage and selection operator. Pay attention to the words, “least absolute shrinkage” and “selection”. We will …
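Both "shrinkage" and "selection" show up even in a one-predictor toy problem. The sketch below assumes a single predictor, no intercept, and the objective 0.5 * sum((y - w*x)^2) + alpha * |w|, whose minimizer is given by soft-thresholding. The function names and data are hypothetical, and Scikit-learn scales its Lasso objective differently, so the `alpha` values here are not comparable to Sklearn's.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clamping at exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_slope(x, y, alpha):
    """Lasso coefficient for one predictor and no intercept."""
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    return soft_threshold(sum_xy, alpha) / sum_xx


# Hypothetical data lying exactly on the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

shrunk_slope = lasso_slope(x, y, alpha=15.0)     # shrinkage: smaller coefficient
dropped_slope = lasso_slope(x, y, alpha=100.0)   # selection: coefficient is zero
```

A moderate penalty shrinks the coefficient, while a large enough penalty sets it to exactly zero, which is how Lasso removes features from a model.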

## Difference between Data Science & Decision Science

Data science and decision science are two data-driven fields that have grown in prominence over the past few years. Data scientists use data to come up with conclusions or predictions about things like customer behavior, while decision scientists combine data with other information sources to make decisions. The difference between data science and decision science is important for business owners who want to make informed decisions. In this post, you will learn about the difference between data science and decision science. Those venturing out to learn data science must understand whether they want to learn data science or decision science or both. The following are some of the key questions …

## Bias-Variance Trade-off Concepts & Interview Questions

Bias and variance are two important properties of machine learning models. In this post, you will learn about the concepts of bias & variance in relation to machine learning (ML) models. Bias refers to the error introduced when a model is too simple to capture the true relationship in the data, whereas variance refers to how sensitive the model’s predictions are to changes in the training data. The tradeoff between bias and variance is a fundamental problem in machine learning, and it is often necessary to experiment with different model types in order to find the balance that works best for a given dataset. In addition to learning the concepts related to Bias vs variance trade-off, you would …

## Hypothesis Testing Explained with Examples

Hypothesis testing is a statistical technique that helps scientists and researchers test the validity of their claims about real-world/real-life events. Hypothesis testing techniques are often used in statistics and data science to analyze whether the claims about the occurrence of the events are true. This blog post will cover some of the key statistical concepts along with examples in relation to how to formulate hypotheses for hypothesis testing. The knowledge of hypothesis formulation and hypothesis testing holds the key to solving business problems using data science. You may want to check out this post on how hypothesis testing is at the heart of data science – What is data science? In …

## What is Data Science? Concepts & Examples

What is data science? This is a question that many people are asking, and for good reason. Data science is a relatively new field, and it covers a lot of ground. In this blog post, we will discuss what data science is, and we will give some examples of how it can be used to solve problems. Stay tuned, because by the end of this post you will have a clear understanding of what data science is and why it matters! What is Data Science? Before understanding what is data science, let’s understand what is science? Science can be defined as a systematic and logical approach to discovering how things …

## Machine Learning – Sensitivity vs Specificity Difference

In this post, we will try to understand the concepts behind machine learning model evaluation metrics such as sensitivity and specificity, which are used to determine the performance of machine learning models. The post also describes the differences between sensitivity and specificity. The concepts have been explained using the model for predicting whether a person is suffering from a disease or not. You may want to check out another related post titled ROC Curve & AUC Explained with Python examples. What is Sensitivity? Sensitivity is a measure of how well a machine learning model can detect positive instances. It is also known as the true positive rate (TPR) or recall. Sensitivity is …
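Both metrics fall straight out of the confusion-matrix counts: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). A minimal sketch follows; the function name `sensitivity_specificity` and the labels are made up for illustration, with 1 meaning "has the disease" and 0 meaning "does not".

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # sick, flagged sick
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # sick, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged sick
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth and predictions for ten patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

Here the model catches three of the four sick patients (sensitivity 0.75) and correctly clears four of the six healthy ones (specificity of about 0.67).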

## Stochastic Gradient Descent Python Example

In this post, you will learn the concepts of Stochastic Gradient Descent (SGD) using a Python example. Stochastic gradient descent is an optimization algorithm that is used to optimize the cost function while training machine learning models. Plain (batch) gradient descent, the most popular such algorithm, can take a long time to converge on large datasets because every update uses the whole training set. This is where a variant of gradient descent such as stochastic gradient descent comes into the picture. In order to demonstrate Stochastic gradient descent concepts, the Perceptron machine learning algorithm is used. Recall that Perceptron is also called a single-layer neural network. Before getting into details, let’s quickly understand the concepts of Perceptron and the underlying learning …
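The core idea, updating the weights after each individual training example instead of after a full pass over the data, can be sketched with a small Perceptron in plain Python. This is not the post's own implementation: the function names, the AND-gate dataset, the step activation, and the hyperparameters are all assumptions for illustration.

```python
import random


def predict(weights, bias, features):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron_sgd(samples, epochs=100, learning_rate=1.0, seed=0):
    """Train a Perceptron, updating the weights one shuffled sample at a time."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # the "stochastic" part: random sample order
        for features, target in data:
            error = target - predict(weights, bias, features)
            if error != 0:
                # Update immediately from this single sample.
                for i, value in enumerate(features):
                    weights[i] += learning_rate * error * value
                bias += learning_rate * error
    return weights, bias


# Hypothetical, linearly separable dataset: the logical AND gate.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron_sgd(and_gate)
predictions = [predict(weights, bias, x) for x, _ in and_gate]
```

Because the AND gate is linearly separable, the per-sample updates settle on weights that classify all four inputs correctly well within the 100 epochs.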

## Dummy Variables in Regression Models: Python, R

In linear regression, dummy variables are used to represent the categorical variables in the model. There are a few different ways that dummy variables can be created, and we will explore a few of them in this blog post. We will look at some examples to help illustrate how dummy variables work, and at the concepts related to the dummy variable trap. By the end of this post, you should have a better understanding of how to use dummy variables in linear regression models. As a data scientist, it is important to understand how to use linear regression and dummy variables. What are dummy variables in …
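As a small illustration of dummy coding and of avoiding the dummy variable trap, the sketch below encodes a categorical column in plain Python and drops one category as the baseline. The function name `make_dummies`, the `is_` column prefix, and the colour values are assumptions; in practice a library helper such as a pandas or R encoding function would typically be used.

```python
def make_dummies(values, drop_first=True):
    """Encode categories as 0/1 columns, optionally dropping a baseline."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one category avoids perfect collinearity with the
        # intercept, which is the dummy variable trap.
        categories = categories[1:]
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical categorical column with three levels.
colours = ["red", "green", "blue", "green"]
rows = make_dummies(colours)
```

With three colours only two dummy columns are produced; a row of all zeros stands for the dropped baseline category, `blue`.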