Category Archives: Machine Learning

MSE vs RMSE vs MAE vs MAPE vs R-Squared: When to Use?

Regression models evaluation metrics MSE RMSE MAE MAPE R-Squared

Last updated: 18th August, 2024 As data scientists, we navigate a sea of metrics to evaluate the performance of our regression models. Understanding these metrics – Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and R-Squared – is crucial for robust model evaluation and selection. In this blog, we delve into the intricacies of these different metrics while learning them based on clear definitions, formulas, and guidance on when to use which of these metrics. Different Types of Regression Models Evaluation Metrics The following are different types of regression model evaluation metrics including MSE, RMSE, MAE, MAPE, R-squared, and Adjusted …

Continue reading

Posted in Data Science, Machine Learning, statistics. Tagged with , , .

K-Fold Cross Validation in Machine Learning – Python Example

K-Fold Cross Validation Concepts with Python and Sklearn Code Example

Last updated: 16th Aug, 2024 In this post, you will learn about K-fold Cross-Validation concepts used while training machine learning models with the help of Python code examples. K-fold cross-validation is a data splitting technique that is primarily used for assessing the model accuracy given smaller datasets. This technique can be implemented with k > 1 folds where k is equal number of data splits. K-Fold Cross Validation is also known as k-cross, k-fold cross-validation, k-fold CV, and k-folds. The k-fold cross-validation technique can be implemented easily using Python with scikit learn (Sklearn) package which provides an easy way to implement training of k-fold cross-validation models.  It is important to learn the …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Gradient Boosting Machines (GBM): Concepts, Examples

gradient boosting algorithm error vs iterations

Last updated: 16th August, 2024 Gradient Boosting Machines (GBM) Algorithm is considered as one of the most powerful ensemble machine learning algorithms used for both regression and classification problems. This algorithm has been proven to increase the accuracy of predictions and is found to be extremely popular among data scientists. Let’s take a closer look at GBM and explore how it works with an example. What is a Gradient Boosting Machines Algorithm? Gradient boosting algorithm is an ensemble machine learning technique in which an ensemble of weak learners are created. In simpler words, the algorithm combines several smaller, simpler models in order to obtain a more accurate prediction than what …

Continue reading

Posted in Data Science, Machine Learning.

Random Forest Classifier – Sklearn Python Example

random forest classifier machine learning

Last updated: 14th Aug, 2024 A random forest classifier is an ensemble machine learning model which is used for classification problems, and operates by constructing a multitude of decision trees during training, and, predicting the class label (of the data). In general, Random Forest is popular due to its high accuracy, robustness to overfitting, ability to handle large datasets with numerous features, and its effectiveness for both classification and regression tasks. Random Forest and Decision Tree classification algorithms are different, although Random Forest is built upon the concept of Decision Trees. In this post, you will learn about the concepts of random forest classifiers and how to train a Random …

Continue reading

Posted in AI, Data Science, Machine Learning, Python. Tagged with , , .

Decision Tree Regression vs Linear Regression: Differences

Decision tree example

When it comes to building a regression model, one comes across the question such as whether to train the regression model using DecisionTree Regressor algorithm or linear regression algorithm? The following is the key differences you need to know in order to decide which algorithm is the most suitable one, and, why and when one can use one over the other? Linear vs Non-Linear Dataset: Which Algorithm to Use? Linear regression algorithm can be used when there exists linear relationship between the response and predictor variables in the given data set. For two or three dimensional datasets, it is as easy as draw scatter plot and find about the said …

Continue reading

Posted in Data Science, Machine Learning. Tagged with , .

Parametric vs Non-Parametric Models: Differences, Examples

Last updated: 11 Aug, 2024 When working with machine learning models, data scientists often come across a fundamental question: What sets parametric and non-parametric models apart?  What are the key differences between these two different classes of models? What needs to be done when working on these models? This is also one of the most frequent questions asked in the interviews. Machine learning models can be parametric or non-parametric. Parametric models are those that require the specification of some parameters before they can be used to make predictions, while non-parametric models do not rely on any specific parameter settings and therefore often produce more accurate results. These two distinct approaches …

Continue reading

Posted in Data Science, Machine Learning. Tagged with , .

How to know if Linear Regression Model is Appropriate?

How to determine whether linear regression model is appropriate or valid

If you want to build a model for predicting a numerical value and wondering whether the linear regression model is most appropriate or valid, then creating the regression scatter plot is the most efficient way. And, this works best if the data set is two or three-dimensional. If a dataset is two-dimensional, it’s simple enough to plot the data to determine its shape. You can plot three-dimensional data too. The objective is to find whether the data set is relatively linear. When the plot is created, the data points fall roughly along a straight line as shown below. Whether Linear Regression Appropriate for High-dimension Datasets? The challenge comes when we …

Continue reading

Posted in Data Science, Machine Learning. Tagged with , .

Lasso Regression in Machine Learning: Python Example

Last updated: 10th Aug, 2024 Lasso regression, sometimes referred to as L1 regularization, is a technique in linear regression that incorporates regularization to curb overfitting and enhance the performance of machine learning models. It works by adding a penalty term to the cost function that encourages the model to select only the most important features and set the coefficients of less important features to zero. This makes Lasso regression a popular method for feature selection and high-dimensional data analysis. In this post, you will learn concepts, formulas, advantages, and limitations of Lasso regression along with Python Sklearn examples. The other two similar forms of regularized linear regression are Ridge regression and …

Continue reading

Posted in Data Science, Machine Learning, Python. Tagged with , , , .

Completion Model vs Chat Model: Python Examples

Completion model vs chat model

In this blog, we will learn about the concepts of completion and chat large language models (LLMs) with the help of Python examples. What’s the Completion Model in LLM? A completion model is a type of LLM that takes a text input and generates a text output, which is called a completion. In other words, a completion model is a type of LLM that generates text that continues from a given prompt or partial input. When provided with an initial piece of text, the model uses its trained knowledge to predict and generate the most likely subsequent text. A completion model can generate summaries, translations, stories, code, lyrics, etc depending on …

Continue reading

Posted in Generative AI, Large Language Models, Machine Learning. Tagged with , , , .

Python Pickle Security Issues / Risk

Python Pickle Security Issue Risk Vulnerabilities

Suppose your machine learning model is serialized as a Python pickle file and later loaded for making predictions. In that case, you need to be aware of security risks/issues associated with loading the Python Pickle file. Security Issue related to Python Pickle The Python pickle module is a powerful tool for serializing and deserializing Python object structures. However, its very power is also what makes it a potential security risk. When data is “pickled,” it is converted into a byte stream that can be written to a file or transmitted over a network. “Unpickling” this data reconstructs the original object in memory. The danger lies in the fact that unpickling …

Continue reading

Posted in Application Security, Machine Learning, Python. Tagged with , .

Pricing Analytics in Banking: Strategies, Examples

pricing analytics in banking examples

Last updated: 15th May, 2024 Have you ever wondered how your bank decides what to charge you for its services? Or, perhaps how do banks arrive at the pricing (fees, rates, and charges) associated with various banking products? If you’re a product manager, data analyst, or data scientist in the banking industry, you might be aware that these pricing decisions are far from arbitrary. Rather, these pricing decisions are made based on one or more frameworks while leveraging data analytics. They result from intricate pricing strategies, driven by an extensive array of data and sophisticated analytics. In this blog, we will learn about some popular pricing strategies banks execute to …

Continue reading

Posted in Banking, Data Science, Finance, Machine Learning. Tagged with , .

Machine Learning Lifecycle: Data to Deployment Example

Machine Learning Lifecycle

Last updated: 12th May 2024 In this blog, we get an overview of the machine learning lifecycle, from initial data handling to the deployment and iterative improvement of ML models. You might want to check out this book for greater insights into machine learning (ML) concepts – Machine Learning Interviews. The following is the diagram representing the machine learning lifecycle while showcasing three key stages such as preparing data, ML development, and ML deployment. These three stages are explained later in this blog. Stage A: Preparing Data Preparing data for training machine learning models involves collecting data, constructing data pipelines for preprocessing, and refining the data to prepare it for …

Continue reading

Posted in Data, Data Science, Machine Learning, MLOps. Tagged with , , .

Autoencoder vs Variational Autoencoder (VAE): Differences, Example

autoencoder vs variational autoencoder - point vs distribution

Last updated: 12th May, 2024 In the world of generative AI models, autoencoders (AE) and variational autoencoders (VAEs) have emerged as powerful unsupervised learning techniques for data representation, compression, and generation. While they share some similarities, these algorithms have unique properties and applications that distinguish them. This blog post aims to help machine learning / deep learning enthusiasts understand these two methods, their key differences, and how they can be utilized in various data-driven tasks. We will learn about autoencoders and VAEs, understanding their core components, working mechanisms, and common use cases. We will also try and understand their differences in terms of architecture, objectives, and outcomes. What are Autoencoders? …

Continue reading

Posted in Deep Learning, Generative AI, Machine Learning. Tagged with , , .

Feature Engineering in Machine Learning: Python Examples

feature engineering in machine learning

Last updated: 3rd May, 2024 Have you ever wondered why some machine learning models perform exceptionally well while others don’t? Could the magic ingredient be something other than the algorithm itself? The answer is often “Yes,” and the magic ingredient is feature engineering. Good feature engineering can make or break a model. In this blog, we will demystify various techniques for feature engineering, including feature extraction, interaction features, encoding categorical variables, feature scaling, and feature selection. To demonstrate these methods, we’ll use a real-world dataset containing car sales data. This dataset includes a variety of features such as ‘Company Name’, ‘Model Name’, ‘Price’, ‘Model Year’, ‘Mileage’, and more. Through this …

Continue reading

Posted in Machine Learning, Python. Tagged with , .

Feature Selection vs Feature Extraction: Machine Learning

Feature extraction vs feature selection

Last updated: 2nd May, 2024 The success of machine learning models often depends on the quality of the features used to train them. This is where the concepts of feature extraction and feature selection come in. In this blog post, we’ll explore the difference between feature selection and feature extraction, two key techniques used as part of feature engineering in machine learning to optimize feature sets for better model performance. Both feature selection and feature extraction are used for dimensionality reduction which is key to reducing model complexity given that higher model complexity often results in overfitting. We’ll provide examples of how they can be applied in real-world scenarios. If …

Continue reading

Posted in Data Science, Machine Learning. Tagged with , .

Model Selection by Evaluating Bias & Variance: Example

Model selection - Bias vs Variance Example

When working on a machine learning project, one of the key challenges faced by data scientists/machine learning engineers is to select the most appropriate model that generalizes well to unseen datasets. To achieve the best generalization on unseen data, the model’s bias and variance need to be balanced. In this post, we’ll explore how to visualize and interpret the trade-off between bias and variance using a residual error vs. model complexity plot. We’ll use a specific plot to guide our discussion. The following is the residual error vs model complexity plot that would need to be drawn for evaluating the model bias vs variance for model selection. We will learn …

Continue reading

Posted in Machine Learning. Tagged with .