Category Archives: Machine Learning
Decision Tree Concepts, Examples, Interview Questions

A decision tree is one of the most commonly used machine learning algorithms and can be used to solve both classification and regression problems. It is simple to understand and use. In this post, you will learn about decision trees with a focus on C5.0, one of the popular algorithms used to build a decision tree for classification. In another post, we shall also look at the CART methodology for building a decision tree model for classification. The post also presents a …
Bias-Variance in Machine Learning: Trade-off, Examples

Bias and variance are two important properties of machine learning models. In this post, you will learn about the concepts of bias and variance in relation to machine learning (ML) models. Bias refers to the error a model makes because its assumptions are too simple to capture the true relationship in the data, whereas variance refers to how sensitive the model's predictions are to changes in the training data. The trade-off between bias and variance is a fundamental problem in machine learning, and it is often necessary to experiment with different model types in order to find the balance that works best for a given dataset. In addition to learning the concepts related to the bias-variance trade-off, you would …
Account Receivables Use Cases for Machine Learning / AI

Accounts receivable (AR) account for a significant portion of total assets and revenue. However, the accounts receivable process is typically handled manually by accountants or finance staff. This can lead to inefficiencies in identifying account issues and resolving them quickly. In addition, there are opportunities to leverage data-driven decision making in different areas related to accounts receivable. In this blog post, you will learn about accounts receivable analytics use cases and how AI / machine learning and deep learning techniques can be used to streamline accounts receivable processes. For product managers and data scientists, this post will be useful for understanding different machine learning use cases related …
True Error vs Sample Error: Difference

Understanding the differences between true error and sample error is an important aspect of data science. In this blog post, we will explore the difference between these two quantities, which come up often in statistical inference. We'll discuss what they are and how they differ from each other, and provide some examples of real-world scenarios where an understanding of both is important. By the end, you should have a better grasp of the differences between true error and sample error. If you are a data scientist, you will want to understand the concepts behind true error and sample error. Understanding these concepts is key to evaluating a …
Confidence Intervals Formula, Examples

In this post, you will learn about the statistical concept of confidence intervals in relation to machine learning models, with the help of an example and Python code. You will learn how to interpret confidence intervals and what the formulas for confidence intervals are. When you get a hypothesis function by training a machine learning classification model, you evaluate the hypothesis/model by calculating the classification error. The classification error is calculated on the sample of the data used for training the model. However, does this classification error for the sample (the sample error) also represent the classification error of the hypothesis/model for the entire …
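
To make the question above concrete, here is a minimal sketch of one common way to put an interval around a sample error: the normal-approximation formula, error ± z · sqrt(error · (1 − error) / n). This is an illustration under stated assumptions, not necessarily the formula used in the full post. The function name `error_confidence_interval` is hypothetical, and the approximation assumes that the n evaluation examples are drawn independently of the data used to fit the model and that n is reasonably large.

```python
import math


def error_confidence_interval(sample_error, n, z=1.96):
    """Approximate confidence interval for the true classification error.

    Uses the normal approximation to the binomial distribution.
    z=1.96 corresponds to roughly a 95% confidence level.
    Assumes the n examples were not used to train the model.
    """
    if not 0.0 <= sample_error <= 1.0:
        raise ValueError("sample_error must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive integer")

    # Standard error of a proportion, scaled by the z value.
    margin = z * math.sqrt(sample_error * (1.0 - sample_error) / n)

    # An error rate cannot fall outside [0, 1], so clip the bounds.
    lower = max(0.0, sample_error - margin)
    upper = min(1.0, sample_error + margin)
    return lower, upper


# Hypothetical numbers: 20 misclassified examples out of 100.
lower, upper = error_confidence_interval(0.2, 100)
print(f"Approximate 95% interval: [{lower:.4f}, {upper:.4f}]")
```

With these made-up numbers the interval runs from roughly 0.12 to 0.28, which shows how wide the uncertainty around a sample error can be when the sample is small.
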
Logistic Regression Concepts, Python Example

In this blog post, we will discuss the logistic regression machine learning algorithm with a Python example. Logistic regression is a type of regression algorithm that is used to predict the probability of occurrence of an event. It is often used in machine learning applications. In this tutorial, we will use Python to implement logistic regression for binary classification problems. What is Logistic Regression? Logistic regression is a machine learning algorithm used for classification problems. That is, it can be used to predict whether an instance belongs to one class or the other. For example, it could be used to predict whether a person is male or female, based on …
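
As a rough illustration of how a fitted logistic regression model turns features into a probability and then into a class, here is a minimal sketch in plain Python. It shows only the prediction step, not training, and the function names, weights, and bias are hypothetical values chosen for the example rather than anything taken from the full post.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_proba(features, weights, bias):
    """Probability that the instance belongs to the positive class."""
    # Linear combination of the features, passed through the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 label using a threshold."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical, hand-picked parameters (not learned from any data).
weights = [1.0, 1.0]
bias = -1.0

probability = predict_proba([2.0, 1.0], weights, bias)
label = predict_class([2.0, 1.0], weights, bias)
print(f"probability={probability:.3f}, predicted class={label}")
```

The threshold of 0.5 is a common default, but it is a design choice: a lower or higher value trades false positives against false negatives.
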
Overfitting & Underfitting in Machine Learning

The performance of machine learning models depends on two key concepts: underfitting and overfitting. In this post, you will learn about some of the key concepts of overfitting and underfitting in relation to machine learning models. In addition, you will get a chance to test your understanding by attempting the quiz. The quiz will help you prepare for interview questions on underfitting & overfitting. As a data scientist, you must have a good understanding of the overfitting and underfitting concepts. Introduction to Overfitting & Underfitting Assuming an independent and identically distributed (i.i.d.) dataset, when the prediction error on both the training and validation dataset is …
Types of Probability Distributions: Codes, Examples

In this post, you will learn the definitions of 25 different types of probability distributions. Probability distributions play an important role in statistics and in many other fields, such as economics, engineering, and finance. They are used to model all sorts of real-world phenomena, from the weather to stock market prices. Before we get into the different types of probability distributions, let’s understand some fundamentals. If you are a data scientist, you will want to go through these distributions. This page can also be used as a cheat sheet for probability distributions. What are Probability Distributions? Probability distributions are a way of describing how likely it is for a random …
Cross Entropy Loss Explained with Python Examples

In this post, you will learn the concepts related to the cross-entropy loss function along with Python code examples and which machine learning algorithms use the cross-entropy loss function as an objective function for training the models. Cross-entropy loss is used as a loss function for models which predict the probability value as output (probability distribution as output). Logistic regression is one such algorithm whose output is a probability distribution. You may want to check out the details on how cross-entropy loss is related to information theory and entropy concepts – Information theory & machine learning: Concepts What’s Cross-Entropy Loss? Cross-entropy loss, also known as negative log likelihood loss, is …
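
To illustrate the idea of a loss computed on predicted probabilities, here is a minimal sketch of binary cross-entropy in plain Python. It averages −[y · log(p) + (1 − y) · log(1 − p)] over the examples. The function name `binary_cross_entropy`, the `eps` clipping value, and the sample labels and probabilities are assumptions made for this example, not code from the full post.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy (negative log likelihood).

    y_true: labels, each 0 or 1.
    y_prob: predicted probabilities of the positive class.
    eps:    small clipping value so that log(0) is never evaluated.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Keep the probability strictly inside (0, 1).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical predictions: mostly right vs. mostly wrong.
labels = [1, 0, 1]
loss_confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8])
loss_wrong = binary_cross_entropy(labels, [0.1, 0.9, 0.2])
print(f"good predictions: {loss_confident:.4f}")
print(f"poor predictions: {loss_wrong:.4f}")
```

The loss is small when the predicted probabilities agree with the labels and grows quickly when the model is confidently wrong, which is what makes it useful as a training objective.
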
AI Product Manager Interview Questions

AI has become such an integral part of our lives that it is important to hire professionals who can help create AI / machine learning products that will be used by many people. These AI product manager interview questions cover the experience, skills, and industry knowledge expected of a product manager candidate, so that you can prepare better before appearing for your next interview as an AI product manager. Check out the detailed interview questions and answers with a greater focus on machine learning topics. Before getting into the list of interview questions, let's understand what the job description of an AI product manager can be. …
Instance-based vs Model-based Learning: Differences

Machine learning is a field of artificial intelligence that deals with giving machines the ability to learn without being explicitly programmed. In this context, instance-based learning and model-based learning are two different approaches used to create machine learning models. While both approaches can be effective, they also have distinct differences that must be taken into account when building a machine learning system. Let’s explore the differences between these two types of machine learning. What is instance-based learning & how does it work? Instance-based learning (also known as memory-based learning or lazy learning) involves memorizing training data in order to make predictions about future data points. This approach doesn’t require any …
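
As a small illustration of the "memorize the training data, then compare" idea behind instance-based learning, here is a minimal k-nearest-neighbors sketch in plain Python. k-NN is used here as one well-known example of the approach; the function name `knn_predict`, the toy data points, and the labels are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(training_data, query, k=3):
    """Predict a label by majority vote among the k closest stored points.

    training_data: list of (point, label) pairs, where point is a tuple
                   of numbers. Nothing is fitted; the data is the model.
    query:         the point to classify.
    """
    # Sort the memorized examples by Euclidean distance to the query.
    neighbors = sorted(
        training_data,
        key=lambda item: math.dist(item[0], query),
    )[:k]

    # Majority vote over the labels of the nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy dataset with two well-separated groups (made up for illustration).
training_data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

print(knn_predict(training_data, (1.1, 1.0)))
print(knn_predict(training_data, (5.0, 5.1)))
```

Note that all the work happens at prediction time, which is why this family of methods is also called lazy learning.
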
Data-Driven Decision Making: What, Why & How?

Data-driven decision making is an approach in which decisions are based on data in order to achieve a desired outcome. More precisely, data-driven decision making is an insights-driven approach to drive decisions and related actions. The data can come from both internal and external data sources, which helps avoid data biases. Data-driven decision-makers use data in their decision process to validate existing actions or take new actions (predictive or prescriptive analytics). They make decisions based on the actionable insights generated from the data. The goal is to make informed decisions while ensuring trust & transparency across the stakeholders & the organization as a whole. It can be noted that data-driven decision making provides great thrust to digital transformation initiatives. In …
Different types of Clustering in Machine Learning

Clustering is a type of unsupervised machine learning technique that is used to group data points into distinct categories or clusters. It is one of the most widely used techniques in machine learning and can be used for various tasks such as grouping customers by their buying habits, creating groups of similar documents, or finding groups of related genes. In this blog post, we will explore different types / categories of clustering methods and discuss why they are so important in the field of machine learning. Prototype-based Clustering Prototype-based clustering is one of the categories of clustering algorithms that are used to identify groups within a larger dataset. This …
Python Pickle Example: What, Why, How

Have you ever heard of the term “Python Pickle”? If not, don’t feel bad; it can be a confusing concept. However, it is a powerful tool that all data scientists, Python programmers, and web application developers should understand. In this article, we’ll break down what exactly pickling is, why it’s so important, and how to use it in your projects. What is Python Pickle? In its simplest form, pickling is the process of converting a Python object into a byte stream (a sequence of bytes). This byte stream can then be transmitted over a network or stored in a file for later use. It’s like putting the object into an envelope and …
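
The round trip described above, from object to byte stream and back, can be sketched with the standard library's `pickle` module. The dictionary `model_settings` and its contents are made up for illustration; `pickle.dumps` and `pickle.loads` are the in-memory counterparts of `pickle.dump` and `pickle.load`, which work with files.

```python
import pickle

# A hypothetical object we want to store or send somewhere.
model_settings = {
    "name": "demo",
    "weights": [0.1, 0.2, 0.3],
    "trained": True,
}

# Pickling: convert the object into a byte stream.
payload = pickle.dumps(model_settings)
print(type(payload).__name__, len(payload))

# Unpickling: rebuild an equivalent object from the bytes.
restored = pickle.loads(payload)
print(restored == model_settings)
```

The restored object is a copy with the same contents, not the original object itself. One caution worth keeping in mind: unpickling can execute arbitrary code, so only load pickle data that comes from a source you trust.
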
Feature Importance & Random Forest – Python

In this post, you will learn how to use the Random Forest Classifier (RandomForestClassifier) to determine feature importance, using a Sklearn Python code example. This is useful in feature selection, where you find the most important features when solving a classification machine learning problem. It is very important for data scientists to understand feature importance and feature selection techniques in order to use the most appropriate features for training machine learning models. Recall that other feature selection techniques include L-norm regularization techniques and greedy search algorithms such as sequential backward / sequential forward selection. What & Why of Feature Importance? Feature importance is a key concept in machine learning that refers to the relative importance of each feature …
Free Datasets for Machine Learning & Deep Learning

Are you looking for free / popular datasets to use for your machine learning or deep learning project? Look no further! In this blog post, we will provide an overview of some of the best free datasets available for machine learning and deep learning. These datasets can be used to train and evaluate your models, and many of them contain a wealth of valuable information that can be used to address a wide range of real-world problems. So, let’s dive in and take a look at some of the top free datasets for machine learning and deep learning! Here is the list of free data sets for machine learning & …