# Category Archives: Data Science

## Machine Learning – Training, Validation & Test Data Set

In this post, you will learn about the training, validation, and test data sets used for training machine learning models. The post is best suited to data science beginners and to anyone who wants a clearer understanding of these concepts. The following topics will be covered:

- Data split – training, validation, and test data set
- Different model performance based on different data splits

### Data Splits – Training, Validation & Test Data Sets

You can split data into the following sets, and each split configuration will yield machine learning models with different performance:

- Training data set: When you …
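
A three-way split can be sketched in plain Python as follows. This is a minimal illustration, not the post's own code: the function name `train_val_test_split`, the 60/20/20 proportions, and the seed value are all assumptions chosen for the example.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle rows and cut them into training, validation, and test lists.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle for repeatable splits
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                 # held back for the final evaluation
    val = shuffled[n_test:n_test + n_val]    # used for tuning and model selection
    train = shuffled[n_test + n_val:]        # everything else is used for fitting
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Changing the fractions changes how much data the model sees during fitting, which is one reason different split configurations can produce models with different performance.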

## Why use Random Seed in Machine Learning?

In this post, you will learn why and when we use random seed values while training machine learning models. This question is often asked by beginner data scientists and machine learning enthusiasts. We use a random seed value when creating the training and test data sets. The goal is to get the same training and validation data sets each time we try different hyperparameters or machine learning algorithms, so that the performance of different models is assessed on identical data. This is where the random seed value comes into the picture. Different Python libraries, such as scikit-learn, have different ways of assigning random seeds. While training machine learning models using scikit-learn, …
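
The effect of a fixed seed can be seen with Python's standard `random` module. This is a minimal sketch, where the helper name `shuffled_indices` and the seed value 7 are assumptions made for the example:

```python
import random

def shuffled_indices(n, seed):
    """Return the indices 0..n-1 in an order determined entirely by the seed."""
    rng = random.Random(seed)   # a private generator, isolated from global state
    indices = list(range(n))
    rng.shuffle(indices)
    return indices

run_a = shuffled_indices(10, seed=7)
run_b = shuffled_indices(10, seed=7)
print(run_a == run_b)  # True: same seed, same ordering, hence the same split
```

In scikit-learn, the analogous control is the `random_state` argument accepted by `train_test_split` and by many estimators.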

## Precision & Recall Explained using Covid-19 Example

In this post, you will learn about the concepts of precision, recall, and accuracy for machine learning classification models. Given that this is the Covid-19 age, the idea is to explain these concepts in terms of a machine learning classification model predicting whether a patient is Corona positive or not based on the symptoms and other details. The following model performance concepts will be described with the help of examples:

- What is the model precision?
- What is the model recall?
- What is the model accuracy?
- What is the model confusion matrix?
- Which metrics to use – Precision or Recall?

Before getting into learning the concepts, let’s look at the data (hypothetical) derived out …
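
The three metrics follow directly from the four cells of a confusion matrix. The sketch below uses made-up counts (they are assumptions for illustration, not the post's data), and the function name `classification_metrics` is hypothetical:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion-matrix counts."""
    predicted_positive = tp + fp
    actual_positive = tp + fn
    total = tp + fp + fn + tn
    # Guard against empty denominators by reporting 0.0
    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / actual_positive if actual_positive else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy

# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
precision, recall, accuracy = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(round(precision, 3), round(recall, 3), round(accuracy, 3))  # 0.8 0.667 0.7
```

With these counts, 80% of the patients flagged as positive really are positive (precision), but only about 67% of the truly positive patients are caught (recall).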

## Moving Average Method for Time-series forecasting

In this post, you will learn about the moving average method in relation to time-series forecasting. You will also see Python examples related to training a moving average machine learning model. The following are some of the topics covered in this post:

- What is the moving average method?
- Why use the moving average method?
- Python code example for the moving average methods

### What is Moving Average method?

The moving average is a statistical method used for forecasting long-term trends. The technique takes the average of a set of numbers in a given range while moving the range. For example, let’s say …
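
Averaging over a range while sliding that range forward can be sketched as follows. The function name `moving_average`, the sample values, and the window size of 3 are assumptions for illustration. Using the last average as the next forecast is one simple convention, not necessarily the one the post adopts.

```python
def moving_average(values, window):
    """Return the simple moving average of values for the given window size."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Slide the window one step at a time and average each slice
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

sales = [10, 20, 30, 40, 50]            # hypothetical observations
averages = moving_average(sales, window=3)
print(averages)                          # [20.0, 30.0, 40.0]

# Naive forecast (an assumption): reuse the most recent average
next_forecast = averages[-1]
print(next_forecast)                     # 40.0
```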

## Difference between Data Science & Decision Science

In this post, you will learn about the difference between data science and decision science. Those setting out to learn data science should decide whether they want to learn data science, decision science, or both. The following are some of the key questions for understanding the concepts related to data science and decision science:

- What is data science & decision science?
- When do we need data and decision science as part of the analytics strategy?
- Are there specialized courses for decision science?
- What are some good websites for decision sciences?

### What is Data Science & Decision Science?

While Data science is used to extract insights from the data …

## Spend Analytics using AI & Data Science

In this post, you will learn about the high-level concepts of spend analytics in relation to procurement, and how data science, machine learning, and AI can be used to extract actionable insights from spend data. This will be useful for data analytics or business analytics professionals looking to understand the concepts of spend analytics. The following topics will be covered in this post:

- What is spend analytics?
- Why spend analytics?
- Spend analytics – Descriptive & Predictive
- Some popular spend analytics products

### What is Spend Analytics?

Simply speaking, spend analytics is about performing systematic computational analysis to extract actionable insights from spend data. As part of spend analytics, the following are …

## Autoregressive (AR) models with Python examples

In this post, you will learn about the concepts of autoregressive (AR) models with the help of Python code examples. If you are starting out with time-series forecasting, this would be a useful read. Note that time-series forecasting is one of the important areas of data science / machine learning. Here are some of the topics that will be covered in the post:

- Autoregressive (AR) models concepts with examples
- Alternative methods to AR models
- Python code example for AR models
- Learning References

### Autoregressive (AR) Models concepts with Examples

Autoregressive (AR) modeling is one of the techniques used for time-series analysis. For beginners, time series analysis represents the class of problems where the dependent variable or response variable …
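
An autoregressive model regresses a series on its own earlier values. A minimal sketch of the simplest case, AR(1), fitted by ordinary least squares in plain Python, is shown below. The function name `fit_ar1` and the toy series are assumptions for illustration, and the post itself may use a library implementation instead.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares.

    Returns (c, phi). Needs at least three points and a non-constant series.
    """
    lagged = series[:-1]      # x[t-1]
    current = series[1:]      # x[t]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    var_lag = sum((x - mean_lag) ** 2 for x in lagged)
    cov = sum((x - mean_lag) * (y - mean_cur) for x, y in zip(lagged, current))
    phi = cov / var_lag               # slope: weight on the previous value
    c = mean_cur - phi * mean_lag     # intercept
    return c, phi

# Toy series generated exactly by x[t] = 1 + 0.5 * x[t-1], starting at 0
series = [0.0, 1.0, 1.5, 1.75, 1.875, 1.9375]
c, phi = fit_ar1(series)
forecast = c + phi * series[-1]       # one-step-ahead forecast
print(round(c, 6), round(phi, 6), round(forecast, 6))
```

Because the toy series contains no noise, the fit recovers the generating coefficients (c close to 1, phi close to 0.5). On real data the estimates would only approximate the underlying process.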

## Free Datasets for Machine Learning & Deep Learning

Here is a list of free, publicly available data sets for machine learning & deep learning:

### Machine learning problems datasets

- UC Irvine Machine Learning Repository: A repository of 560 datasets suitable for traditional machine learning problems such as classification and regression
- Publicly available datasets through public APIs: A list of 650+ datasets available via public API
- Penn machine learning dataset: The data sets cover a broad range of applications, and include binary/multi-class classification problems and regression problems, as well as combinations of categorical, ordinal, and continuous features. The good part is that the datasets are available in tabular form, which makes them very useful for training models with traditional …

## Actionable Insights Examples – Turning Data into Action

In this post, you will learn how to turn data into information and then into actionable insights with the help of a few examples. It will help data analysts, data scientists, and business analysts get a good understanding of what an actionable insight is. You will also understand aspects related to data-driven decision making. Before getting into the details, let’s understand the problem at hand. The school authority is trying to assess and improve the health of students. Here is the question it is dealing with: How could we improve the overall health of the students in the school? We will look into the approach of finding the …

## When to use Deep Learning vs Machine Learning Models?

In this post, you will learn when to train deep learning models, from the perspective of model performance and volume of data. Machine learning engineers and data scientists often wonder whether deep learning models can be used in place of traditional machine learning models trained using algorithms such as logistic regression, SVM, and tree-based algorithms. The objective of this post is to provide perspectives on when to go for traditional machine learning models vs deep learning models. The two key criteria for deciding between deep learning and traditional machine learning models are the following: …

## Most Common Types of Machine Learning Problems

In this post, you will learn about the most common types of machine learning (ML) problems along with a few examples. Without further ado, let’s look at these problem types and understand the details.

- Regression
- Classification
- Clustering
- Time-series forecasting
- Anomaly detection
- Ranking
- Recommendation
- Data generation
- Optimization

| Problem types | Details | Algorithms |
|---|---|---|
| Regression | When the need is to predict numerical values, such kinds of problems are called regression problems. For example, house price prediction | Linear regression, K-NN, random forest, neural networks |
| Classification | When there is a need to classify the data in different classes, it is called a classification problem. If there are two classes, it is called a binary classification problem. … | |

## Historical Dates & Timeline for Deep Learning

This post is a quick look at the timeline, including historical dates, of the evolution of deep learning. Without further ado, let’s get to the important dates and what happened on those dates in relation to deep learning:

| Year | Details/Paper Information | Who’s who |
|---|---|---|
| 1943 | An artificial neuron was proposed as a computational model of the “nerve net” in the brain. Paper: “A logical calculus of the ideas immanent in nervous activity,” Bulletin of Mathematical Biophysics, volume 5, 1943 | Warren McCulloch, Walter Pitts |
| Late 1950s | A neural network application for reducing noise in phone lines was developed. Paper: Andrew Goldstein, “Bernard Widrow oral history,” IEEE Global History Network, 1997 | Bernard … |

## Machine Learning Techniques for Stock Price Prediction

In this post, you will learn about some of the popular machine learning techniques for predicting stock price movement (the direction of the stock price) and classifying whether a stock is a buy, sell, or hold. Stock price prediction is a fairly complex problem, and different techniques can be used to achieve good prediction accuracy. Here are the three most common techniques used for building machine learning models for stock price movement (upward / downward) and for classifying whether a stock is a buy, sell, or hold:

- Fundamental analysis: In fundamental analysis (FA), the machine learning models can be trained using data related to companies’ …

## Machine Learning – Why use Confidence Intervals?

In this post, you will learn about confidence intervals in relation to machine learning models, with the help of an example and Python code. When you obtain a hypothesis function by training a machine learning classification model, you evaluate the hypothesis/model by calculating the classification error. The classification error is calculated on the sample of the data used for training the model. However, is this classification error for the sample (sample error) the same as the classification error of the hypothesis/model for the entire population (true error)? How can the true error be represented as a function of the sample error? This is …
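
One common way to relate the sample error to the true error is a normal-approximation interval around the sample error. The sketch below assumes the n examples are drawn independently and that n is reasonably large. The function name `error_confidence_interval`, the sample size, and the error rate are illustrative assumptions rather than values from the post.

```python
import math

def error_confidence_interval(sample_error, n, z=1.96):
    """Approximate confidence interval for the true classification error.

    Uses the normal approximation to the binomial; z=1.96 corresponds to
    roughly 95% confidence. Bounds are clipped to the valid range [0, 1].
    """
    standard_error = math.sqrt(sample_error * (1.0 - sample_error) / n)
    margin = z * standard_error
    lower = max(0.0, sample_error - margin)
    upper = min(1.0, sample_error + margin)
    return lower, upper

# Hypothetical evaluation: 10 misclassifications out of 100 examples
lower, upper = error_confidence_interval(sample_error=0.10, n=100)
print(round(lower, 4), round(upper, 4))  # 0.0412 0.1588
```

Under these assumptions, the true error would be expected to lie between about 4% and 16% with roughly 95% confidence. A larger sample narrows the interval.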

## Great Mind Maps for Learning Machine Learning

In this post, you will look at some great mind maps for learning different machine learning topics. I have gathered these mind maps from different web pages on the Internet. The idea is to reinforce our understanding of different machine learning topics using pictures. You may have heard the proverb – A picture is worth a thousand words. Keeping this in mind, I decided to pull together some of the great mind maps posted on different web pages. I will update this blog post from time to time. Whether you are a beginner data scientist or an experienced one, you may want to bookmark this page for refreshing your …

## Different Types of Distance Measures in Machine Learning

In this post, you will learn about the different types of distance measures used in machine learning algorithms such as K-nearest neighbours and K-means. Distance measures are used to measure the similarity between two or more vectors in multi-dimensional space. The following are the different forms of distance metrics / measures:

- Geometric distances
- Computational distances
- Statistical distances

### Geometric Distance Measures

Geometric distance metrics primarily measure the similarity between two or more vectors based solely on the distance between two points in multi-dimensional space. Examples of such geometric distance measures are Minkowski distance, Euclidean distance, and Manhattan distance. Another form of geometric distance is cosine similarity, which we will discuss …
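
The geometric measures named above can be sketched in plain Python. Euclidean and Manhattan distance are the Minkowski distance with p = 2 and p = 1 respectively. The function names and sample vectors below are assumptions chosen for illustration.

```python
import math

def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean_distance(a, b):
    return minkowski_distance(a, b, p=2)   # straight-line distance

def manhattan_distance(a, b):
    return minkowski_distance(a, b, p=1)   # sum of absolute differences

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

origin, point = (0, 0), (3, 4)
print(euclidean_distance(origin, point))   # 5.0
print(manhattan_distance(origin, point))   # 7.0
print(cosine_similarity((1, 0), (0, 1)))   # 0.0 (perpendicular vectors)
```

Note that cosine similarity compares direction rather than magnitude, so two vectors pointing the same way score close to 1 regardless of their lengths.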