# Author Archives: Ajitesh Kumar

## Difference between Online & Batch Learning

In this post, you will learn about the concepts of online and batch learning and the differences between them, in relation to how machine learning models in production learn incrementally from a stream of incoming data. This is one of the most important aspects of designing machine learning systems. Data science architects need a good understanding of when to go for online learning and when to go for batch or offline learning. What is Batch Learning? Batch learning represents the training of machine learning models in a batch manner. The data gets accumulated over a period of time. The models then get trained with the accumulated data from time to …
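The contrast between the two modes can be sketched with a toy example. This is only an illustration under stated assumptions: the `RunningMean` class, its method names, and the sample `stream` are hypothetical, and a mean estimator stands in for a real machine learning model.

```python
class RunningMean:
    """Toy 'model' that predicts the mean of the values it has learned from."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def fit_batch(self, values):
        # Batch learning: discard the old state and retrain on all accumulated data.
        self.count = len(values)
        self.mean = sum(values) / len(values) if values else 0.0

    def learn_one(self, value):
        # Online learning: update the existing state with one new observation.
        self.count += 1
        self.mean += (value - self.mean) / self.count


stream = [4.0, 8.0, 6.0, 2.0]  # hypothetical incoming data

batch_model = RunningMean()
batch_model.fit_batch(stream)  # trained once on the accumulated data

online_model = RunningMean()
for value in stream:  # trained incrementally as each value arrives
    online_model.learn_one(value)
```

Both models end up with the same estimate here; the difference is that the online model never needs the full history in memory, while the batch model must be retrained on everything accumulated so far.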

## Scikit-learn vs Tensorflow – When to use What?

In this post, you will learn about when to use Scikit-learn vs Tensorflow. For data scientists/machine learning enthusiasts, it is very important to understand the difference so that they can use these libraries appropriately while working on different business use cases. When to use Scikit-learn? Scikit-learn is a great entry point for beginner data scientists. It provides an efficient implementation of many machine learning algorithms. In addition, it is very simple and easy to use. You can get started with Scikit-learn very easily by using a Jupyter notebook. Scikit-learn can be used to solve different kinds of machine learning problems including some of the following: Classification (SVM, nearest neighbors, random …

## Data Science Architect Interview Questions

In this post, you will learn about interview questions that can be asked if you are going for a data science architect job. A data science architect needs to have knowledge of both data science/machine learning and cloud architecture. In addition, it also helps if the person is hands-on with programming languages such as Python & R. Without further ado, let’s get into some of the common questions right away. I will add further questions in the time to come. Q. How do you go about architecting a data science or machine learning solution for any business problem? Solving a business problem using a data science or machine learning-based solution can …

## Drivetrain Approach for Machine Learning

In this post, you will learn about a very popular approach or methodology called the Drivetrain approach, coined by Jeremy Howard. The approach provides you with a process to design data products that provide actionable outcomes while using one or more machine learning models. The approach is very useful for data scientists/machine learning enthusiasts at all levels. In particular, it would prove to be a great guide for data science architects whose key responsibility includes designing data products. Without further ado, let’s do a deep dive. Why drivetrain approach? Before getting into the drivetrain approach and understanding the basic concepts, let’s understand why drivetrain approach in the first …

## Machine Learning – Training, Validation & Test Data Set

In this post, you will learn about the concepts of training, validation, and test data sets used for training machine learning models. The post is most suitable for data science beginners or those who would like to get clarity and a good understanding of the concepts of training, validation, and test data sets. The following topics will be covered: data split into training, validation, and test data sets; different model performance based on different data splits. Data Splits – Training, Validation & Test Data Sets: You can split data into the following different sets, and each data split configuration will result in machine learning models with different performance: Training data set: When you …
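A three-way split can be sketched with the standard library alone. The function name `split_dataset`, the 60/20/20 fractions, and the integer stand-in data are assumptions made for illustration, not values taken from the post.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into training, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, reproducibly

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Hypothetical data set of 100 rows, represented here by their indices.
train, val, test = split_dataset(range(100))
```

Every row lands in exactly one of the three lists, so a model fitted on `train` and tuned on `val` is never evaluated on data it has already seen when `test` is used.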

## Why use Random Seed in Machine Learning?

In this post, you will learn about why and when we use random seed values while training machine learning models. This is a question most likely asked by beginner data scientists/machine learning enthusiasts. We use a random seed value while creating training and test data sets. The goal is to make sure we get the same training and validation data sets while we use different hyperparameters or machine learning algorithms in order to assess the performance of different models. This is where the random seed value comes into the picture. Different Python libraries, such as scikit-learn, have different ways of assigning random seeds. While training machine learning models using Scikit-learn, …
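The effect of a fixed seed can be illustrated with Python's built-in `random` module. The helper `shuffled_indices` and the seed value 42 are hypothetical choices for this sketch; in scikit-learn the same idea is commonly exposed through a `random_state` argument, for example on `train_test_split`.

```python
import random


def shuffled_indices(n, seed):
    """Return the indices 0..n-1 in an order determined only by the seed."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


# Two runs with the same seed produce the same ordering, so any
# train/validation split derived from it is identical across experiments.
first_run = shuffled_indices(10, seed=42)
second_run = shuffled_indices(10, seed=42)
```

Because the ordering is identical between runs, differences in model performance can be attributed to the hyperparameters or algorithm being compared rather than to a different split.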

## Deep Learning – Top 5 Online Jupyter Notebooks Servers

In this post, you will get information regarding the online Jupyter notebook platforms (GPU-based) which you can use to get started with both machine learning and deep learning. The list consists of both freely available and paid options of online Jupyter notebooks available with GPUs. When starting with GPUs, it is recommended to use rented options available online rather than buying your own GPU servers. There are online GPU Linux servers available (free and paid options) that can be used to train deep learning & machine learning models. I will be writing about it in my next post. Here is the list of Jupyter notebook platforms that could be used …

## Top Deep Learning Myths You should know

This post highlights the top deep learning myths you should know. Understanding them is important in order to leverage deep learning to solve complex AI problems. Many times, beginner- to intermediate-level machine learning enthusiasts don’t consider deep learning because of the myths discussed in this post. Without further ado, let’s look at the most common deep learning myths: Good understanding of complex mathematical concepts: Well, that is just a myth. At times, they say that one needs to have an advanced degree in mathematics & statistics. That is not true. With tools and programming languages along with libraries available today, basic mathematical concepts should be able …

## First Principles Understanding based on Physics

In this post, you will understand the concepts of first principles and first-principles thinking based on physics concepts. Let’s jump in right away. In the meantime, you could also access one of my other posts on first principles: First-principles thinking explained with examples. It will help you get started on what first principles are and what first-principles thinking is. One of the most fundamental physics concepts for understanding first principles is this: Every physical quantity can be represented as a derived quantity or a fundamental quantity. The fundamental quantities, also termed basic quantities, are the most basic and unique, and there are no overlaps between them. …

## Precision & Recall Explained using Covid-19 Example

In this post, you will learn about the concepts of precision, recall, and accuracy when dealing with a machine learning classification model. Given that this is the Covid-19 age, the idea is to explain these concepts in terms of a machine learning classification model predicting whether a patient is Corona positive or not based on the symptoms and other details. The following model performance concepts will be described with the help of examples: What is the model precision? What is the model recall? What is the model accuracy? What is the model confusion matrix? Which metrics to use – Precision or Recall? Before getting into learning the concepts, let’s look at the data (hypothetical) derived out …
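The three metrics follow directly from the four cells of a confusion matrix. The sketch below uses the standard definitions; the function name `classification_metrics` and the counts passed to it are made up for illustration and are not the data from the post. It also assumes the denominators are non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and accuracy from confusion matrix counts.

    tp: positive cases predicted positive   fp: negative cases predicted positive
    fn: positive cases predicted negative   tn: negative cases predicted negative
    """
    precision = tp / (tp + fp)  # of all predicted positives, how many were right
    recall = tp / (tp + fn)     # of all actual positives, how many were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that were right
    return precision, recall, accuracy


# Hypothetical counts for 1,000 screened patients.
precision, recall, accuracy = classification_metrics(tp=80, fp=20, fn=10, tn=890)
```

With these made-up counts, precision is 0.80, recall is roughly 0.89, and accuracy is 0.97, which shows how a high accuracy can coexist with a noticeably lower precision when positive cases are rare.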

## Moving Average Method for Time-series forecasting

In this post, you will learn about the concepts of the moving average method in relation to time-series forecasting. You will also see Python examples of training a moving average machine learning model. The following are some of the topics which will be covered in this post: What is the moving average method? Why use the moving average method? Python code example for the moving average method. What is Moving Average method? The moving average is a statistical method used for forecasting long-term trends. The technique involves taking an average of a set of numbers in a given range while moving the range. For example, let’s say …
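Taking an average over a range while moving the range can be sketched in a few lines. The function name `moving_average`, the window size of 3, and the `sales` figures are assumptions for illustration; using the last window's average as the next-step forecast is one simple convention, not necessarily the one used in the post.

```python
def moving_average(values, window):
    """Return the simple moving average of values over a sliding window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


sales = [10, 20, 30, 40, 50]  # hypothetical time series
smoothed = moving_average(sales, window=3)  # [20.0, 30.0, 40.0]
forecast = smoothed[-1]  # naive forecast: average of the most recent window
```

A larger window smooths out more short-term fluctuation but reacts more slowly to a change in trend.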

## Difference between Data Science & Decision Science

In this post, you will learn about the difference between data science and decision science. Those venturing out to learn data science must understand whether they want to learn data science, decision science, or both. The following are some of the key questions in relation to understanding the concepts related to data science and decision science: What is data science & decision science? When do we need data and decision science as part of the analytics strategy? Are there specialized courses for decision science? What are some good websites for decision sciences? What is Data Science & Decision Science? While data science is used to extract insights from the data …

## Spend Analytics using AI & Data Science

In this post, you will learn about the high-level concepts of spend analytics in relation to procurement and how data science / machine learning & AI can be used to extract actionable insights from spend analytics. This will be useful for data analytics or business analytics professionals looking to understand the concepts of spend analytics. The following topics will be covered in this post: What is spend analytics? Why spend analytics? Spend analytics – Descriptive & Predictive; Some popular spend analytics products. What is Spend Analytics? Simply speaking, spend analytics is about performing systematic computational analysis to extract actionable insights from spend data. As part of spend analytics, the following are …

## Autoregressive (AR) models with Python examples

In this post, you will learn about the concepts of autoregressive (AR) models with the help of Python code examples. If you are starting on time-series forecasting, this would be a useful read. Note that time-series forecasting is one of the important areas of data science / machine learning. Here are some of the topics that will be covered in the post: Autoregressive (AR) models concepts with examples; Alternative methods to AR models; Python code example for AR models; Learning References. Autoregressive (AR) Models concepts with Examples: Autoregressive (AR) modeling is one of the techniques used for time-series analysis. For beginners, time series analysis represents the class of problems where the dependent variable or response variable …
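The simplest autoregressive model, AR(1), predicts each value from the one before it. The sketch below is a minimal illustration, assuming a model with no intercept and no noise term, fitted by ordinary least squares; the function name `fit_ar1` and the sample `series` are hypothetical and not taken from the post.

```python
def fit_ar1(series):
    """Estimate phi in x[t] = phi * x[t-1] by least squares (no intercept)."""
    previous = series[:-1]  # x[t-1]
    current = series[1:]    # x[t]
    numerator = sum(p * c for p, c in zip(previous, current))
    denominator = sum(p * p for p in previous)
    return numerator / denominator


series = [8.0, 4.0, 2.0, 1.0, 0.5]  # hypothetical series that halves at every step
phi = fit_ar1(series)               # estimated coefficient, 0.5 for this series
next_value = phi * series[-1]       # one-step-ahead forecast
```

Real series are noisy and usually need an intercept and more than one lag, which is where a dedicated time-series library becomes the better choice.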

## Image Classification & Machine learning

In this post, you will learn about how image classification problems can be solved using machine learning techniques. The following are some of the topics which will be covered: How does the computer learn about an image? How could machine learning be used to classify the images? How does the computer learn about an image? Unlike for human beings, an image has to be converted into numbers for a computer to learn about it. So, the question is how can an image be converted into numbers? The most fundamental element or the smallest building block of an image is a pixel. An image can be represented as a set of …
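Converting an image into numbers can be sketched with a tiny grayscale example. The 2x3 `image` grid, the 0 to 255 intensity range, and the function name `to_feature_vector` are assumptions chosen for illustration; real images are much larger and often have several color channels.

```python
# Hypothetical 2x3 grayscale image: each number is one pixel's intensity (0-255).
image = [
    [0, 128, 255],
    [64, 192, 32],
]


def to_feature_vector(pixels):
    """Flatten a 2D pixel grid row by row and scale intensities to [0, 1]."""
    return [value / 255 for row in pixels for value in row]


features = to_feature_vector(image)  # six numbers a model can take as input
```

A flat list like this is the kind of numeric input that a traditional classifier expects, with one feature per pixel.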

## Free Datasets for Machine Learning & Deep Learning

Here is the list of free data sets for machine learning & deep learning publicly available: Machine learning problems datasets: UC Irvine Machine Learning Repository: A repository of 560 datasets suitable for traditional machine learning algorithm problems such as classification and regression; Publicly available datasets through public APIs: A list of 650+ datasets available via public API; Penn machine learning dataset: The data sets cover a broad range of applications, and include binary/multi-class classification problems and regression problems, as well as combinations of categorical, ordinal, and continuous features. The good part is that the datasets are available in tabular form, which makes them very useful for training models with traditional …