Author Archives: Ajitesh Kumar

Ajitesh Kumar

I have recently been working in the area of Data analytics including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies including programming languages such as Java/JEE, JavaScript, Python, R, Julia, etc., and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.

Decoding Bagging in Random Forest: Examples

Bagging in the Random Forest algorithm

This blog provides an overview of how bagging, or bootstrap aggregating, improves the effectiveness of Random Forest machine learning models. You will learn about the process of creating multiple data subsets through bootstrap sampling, building individual decision trees for each subset, and how this diversity among trees reduces overfitting, leading to more accurate and robust random forest models. The post also explains how the aggregation of predictions from these trees ensures a balanced and less biased overall model. You will also learn from a Python code example. What is Bagging? Before we delve into Random Forest, it’s crucial to understand the concept of bagging. Bagging is a general ensemble …
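The bootstrap-then-aggregate idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's own code: a one-feature threshold "stump" stands in for a full decision tree, the toy data and the helper names (`bootstrap_sample`, `fit_stump`, `bagged_predict`) are assumptions made for the example, and the random feature selection that Random Forest adds on top of bagging is left out.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Fit a one-feature threshold classifier: predict 1 when x > threshold."""
    best_threshold, best_errors = None, len(rows) + 1
    for threshold, _ in rows:
        errors = sum((x > threshold) != bool(y) for x, y in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_predict(thresholds, x):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy (feature, label) pairs, invented for illustration.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
        (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]

rng = random.Random(0)
# One stump per bootstrap sample; an odd count avoids tied votes.
stumps = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]

low_prediction = bagged_predict(stumps, 0.5)
high_prediction = bagged_predict(stumps, 9.0)
```

Because each stump sees a different resample, the learned thresholds differ from one stump to the next, and the vote smooths out those differences.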

Posted in Machine Learning, Python.

Logistic Regression Customer Churn Prediction: Example

Customer churn prediction using a logistic regression model

In today’s fast-paced and highly competitive business world, spanning across industries like telecommunications, finance, e-commerce, and more, the ability to predict and understand customer churn has emerged as a critical component of strategic business management. Whether it’s a telecom giant grappling with subscriber turnover, a fintech company aiming to retain its user base, or an e-commerce platform trying to reduce shopping cart abandonment, the implications of churn are vast and deeply impactful. This is where the role of logistic regression, a potent and versatile statistical method, comes into play. This blog delves into different aspects of training a logistic regression machine learning model for churn prediction, highlighting its universality and …

Posted in Data Science, Machine Learning, Python.

GLM vs Linear Regression: Difference, Examples

Differences between GLM and Linear Regression

Linear Regression and Generalized Linear Models (GLM) are both statistical methods used for understanding the relationship between variables. Understanding the difference between GLM and Linear Regression is essential for accurate model selection, tailored to data types and research questions. It’s crucial for predicting diverse outcomes, ensuring valid statistical inference, and is vital in interdisciplinary research. In this blog, we will learn about the differences between Linear Regression and GLM by delving into their distinct characteristics, suitable applications, and the importance of choosing the right model based on data type and research objective. What’s the difference between GLM & Linear Regression? Linear Regression and Generalized Linear Models (GLM) are two closely …
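One way to see the contrast is to fit both model types to the same count data. The sketch below is an illustration under assumptions, not the post's own example: the data is synthetic, and a Poisson GLM (scikit-learn's `PoissonRegressor`, which uses a log link) is chosen as one representative GLM.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, PoissonRegressor

# Synthetic count outcome whose mean grows exponentially with x (illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(200, 1))
y = rng.poisson(lam=np.exp(0.8 * X[:, 0]))

# Linear regression models the mean directly as a linear function of x.
linear = LinearRegression().fit(X, y)

# Poisson GLM models log(mean) as a linear function of x.
glm = PoissonRegressor(alpha=0.0).fit(X, y)

X_new = np.array([[-2.0], [0.0], [2.0]])
linear_pred = linear.predict(X_new)  # unconstrained, can fall below zero
glm_pred = glm.predict(X_new)        # always positive because of the log link
```

The log link keeps the GLM's predicted counts positive, which is one reason the data type of the outcome matters when choosing between the two.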

Posted in Data Science, Machine Learning, Python, statistics.

MinMaxScaler vs StandardScaler – Python Examples

MinMaxScaler vs StandardScaler

Last updated: 7th Dec, 2023 Feature scaling is an essential part of exploratory data analysis (EDA) when working with machine learning models. Feature scaling helps to standardize the range of features and ensure that each feature (continuous variable) contributes equally to the analysis. Two popular feature scaling techniques used in Python are MinMaxScaler and StandardScaler. In this blog, we will learn about the concepts and differences between these feature scaling techniques with the help of Python code examples, highlight their advantages and disadvantages, and provide guidance on when to use MinMaxScaler vs StandardScaler. Note that these are classes provided by the sklearn.preprocessing module. As a data scientist, you will need to …
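As a quick illustration of the difference, the sketch below applies both scalers to the same single-feature array. The sample values are made up for the example; with their default settings, MinMaxScaler maps the feature to the [0, 1] range, while StandardScaler centers it at zero mean with unit variance.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature with an outlier-like value (illustrative data).
X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])

# Min-max scaling: (x - min) / (max - min)
minmax_scaled = MinMaxScaler().fit_transform(X)

# Standardization: (x - mean) / standard deviation
standard_scaled = StandardScaler().fit_transform(X)

print(minmax_scaled.ravel())
print(standard_scaled.ravel())
```

In practice the scaler is fitted on the training data only and then reused to transform the test data.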

Posted in Data Science, Machine Learning, Python.

Using GridSearchCV with Logistic Regression Models: Examples

GridSearchCV in machine learning with Logistic Regression

GridSearchCV is one of the popular techniques for optimizing logistic regression models, automating the search for the best hyperparameters such as regularization strength and type. It enhances model performance by incorporating cross-validation, ensuring robustness and generalizability to new data. This method saves time and ensures objective model selection, making it an essential technique in various domains where logistic regression is applied. Its integration with the scikit-learn library (sklearn.model_selection.GridSearchCV) simplifies its use in existing data pipelines, making it a valuable asset for both novice and experienced machine learning practitioners. How is GridSearchCV used with Logistic Regression? GridSearchCV is a technique used in machine learning for hyperparameter tuning. It is a …
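A minimal sketch of such a search is shown below. The dataset is synthetic and the grid values are arbitrary choices for illustration; only the regularization strength `C` is tuned here, and the regularization type could be added to the grid as well, depending on what the chosen solver supports.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data (illustrative).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Candidate values for the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1, 10]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,                # 5-fold cross-validation for every candidate
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refitted on the full data with the best parameters found.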

Posted in Data Science, Machine Learning, Python, statistics.

Handling Class Imbalance in Machine Learning: Python Example

Techniques for Handling Class Imbalance

Class imbalance may not always impact performance, and using imbalance-specific methods can sometimes worsen results (Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou, Exploratory Undersampling for Class-Imbalance Learning). That said, there are different techniques, such as the following, for handling class imbalance when training machine learning models on datasets with imbalanced classes. Python packages such as imbalanced-learn can be used to apply techniques such as under-sampling majority classes, upsampling minority classes, and SMOTE. In this post, techniques based on class weights will be used for tackling class imbalance. How to create a Sample Dataset having Class Imbalance? In this section, you will learn about how to create an …
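The class-weight approach can be sketched as follows. This is an illustrative example rather than the post's own code: the 90/10 synthetic dataset and the choice of logistic regression are assumptions, and the "balanced" weights are computed explicitly only to make them visible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced data (illustrative): roughly 90% class 0, 10% class 1.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=42
)

# "balanced" weights are n_samples / (n_classes * count_of_each_class),
# so the minority class receives the larger weight.
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

# Misclassifying a minority sample now costs more during training.
model = LogisticRegression(class_weight=class_weight, max_iter=1000).fit(X, y)

print(class_weight)
```

Passing `class_weight="balanced"` directly to the estimator applies the same weighting without the explicit computation.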

Posted in Data Science, Machine Learning, Python.

Handling Class Imbalance using Sklearn Resample

Last updated: 5th Dec, 2023 The class imbalance problem in machine learning occurs when the classes in a dataset are not represented equally, leading to a significant difference in the number of instances for different classes. This imbalance can cause a classification model to be biased towards the majority class, resulting in poor performance on the minority class. Thus, class imbalance makes it harder for data scientists to develop accurate and fair models, as the skewed distribution can lead to misleading predictions / outcomes during training and reduced effectiveness in real-world applications where minority classes are critical. In this post, you will learn how to tackle the class imbalance issue …
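A minimal sketch of upsampling the minority class with `sklearn.utils.resample` is shown below. The tiny 8-versus-2 dataset is invented for illustration; the minority rows are drawn with replacement until both classes have the same count.

```python
import numpy as np
from sklearn.utils import resample

# Tiny imbalanced dataset (illustrative): 8 majority rows, 2 minority rows.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_majority, y_majority = X[y == 0], y[y == 0]
X_minority, y_minority = X[y == 1], y[y == 1]

# Sample the minority class with replacement up to the majority count.
X_upsampled, y_upsampled = resample(
    X_minority,
    y_minority,
    replace=True,
    n_samples=len(y_majority),
    random_state=42,
)

X_balanced = np.vstack([X_majority, X_upsampled])
y_balanced = np.concatenate([y_majority, y_upsampled])

print(np.bincount(y_balanced))
```

Resampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.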

Posted in Data Science, Machine Learning, Python.

Ordinary Least Squares Method: Concepts & Examples

Ordinary least squares method

Last updated: 5th Dec, 2023 Regression analysis is a fundamental statistical technique used in many fields, from finance and econometrics to the social sciences. It involves creating a regression model for modeling the relationship between a dependent variable and one or more independent variables. The Ordinary Least Squares (OLS) method helps estimate the parameters of this regression model. Ordinary least squares (OLS) is a technique used in a linear regression model to find the best-fitting line for a set of data points by minimizing the residuals (the differences between the observed and predicted values). It does so by estimating the coefficients of the linear regression model by minimizing the sum of the squared …
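The estimation step can be sketched with NumPy's least-squares solver. The five data points are made up for illustration; the code fits an intercept and a slope, then computes the residuals and their sum of squares.

```python
import numpy as np

# Illustrative data: roughly y = 2x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution of X @ coef ≈ y.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Residuals: observed minus predicted values.
residuals = y - X @ coef
sse = float(residuals @ residuals)

print(intercept, slope, sse)
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as `sse` on this data.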

Posted in Data Science, Machine Learning.

Linear Regression vs Correlation: Examples

Correlation Heatmap

Linear regression and correlation are fundamental concepts in statistics, often used in data analysis to understand the relationship between two variables. Linear regression and correlation, while related, are not the same. They serve different purposes and provide different types of information. In this blog, we will explore each concept with examples to clarify their differences and applications. Linear Regression vs Correlation: Definition Linear Regression is a statistical method used for modeling the relationship between a dependent variable and one or more independent variables. The core idea is to find a linear equation that best describes this relationship, enabling the prediction of the dependent variable based on the values of the …
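The two quantities can be computed side by side to see how they differ. The sketch below uses made-up data: the correlation coefficient is a unitless measure of linear association, while the regression fit yields a slope and intercept that can be used for prediction. For simple linear regression the two are linked by slope = r × (std of y / std of x).

```python
import numpy as np

# Illustrative paired observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson correlation: strength and direction of the linear association.
r = np.corrcoef(x, y)[0, 1]

# Simple linear regression: a line that can be used for prediction.
slope, intercept = np.polyfit(x, y, 1)
predicted_at_6 = slope * 6.0 + intercept

print(r, slope, intercept, predicted_at_6)
```

Swapping `x` and `y` leaves `r` unchanged but generally changes the fitted slope, which is one practical sign that the two concepts answer different questions.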

Posted in Data Science, statistics.

Different Types of CNN Architectures Explained: Examples

VGG16 CNN Architecture

Last updated: 4th Dec, 2023. In the fast-paced world of computer vision and image processing, the problem of image classification consistently stands out: the ability to effectively recognize and classify images. As we continue to digitize and automate our world, the demand for systems that can understand and interpret visual data is growing at an unprecedented rate. The challenge is not just about recognizing images – it’s about doing so accurately and efficiently. Traditional machine learning methods often fall short, struggling to handle the complexity and high dimensionality of image data. This is where Convolutional Neural Networks (CNNs) come to the rescue. And, there are different types of CNN architectures based …

Posted in Deep Learning, Machine Learning.

MongoDB – Commands to Check the Status of MongoDB Database

This article presents different commands which can be used to check the status of a MongoDB database on Linux/Ubuntu. Please feel free to comment/suggest if I missed one or more important points. Also, sorry for the typos. MongoDB Status Check Commands The following are some of the commands that can be used to check the status of a MongoDB database. Note that mongod represents the daemon process of the MongoDB database and is primarily used to manage database access. It is recommended to check the log file (/var/log/mongo/mongo.log) to get details. Following are some of the commands which can be used to get the status of MongoDB: service mongod status: Displays the status …

Posted in NoSQL.

Logit vs Probit Models: Differences, Examples

Logit vs probit models

Logit and Probit models are both types of regression models commonly used in statistical analysis, particularly in the field of binary classification. This means that the outcome of interest can only take on two possible values / classes. In most cases, these models are used to predict whether or not something will happen in the form of a binary outcome. For example, a bank might want to know whether a particular borrower might default on a loan or not. In this blog post, we will explain what logit and probit models are, and we will provide examples of how they can be used. As data scientists, it is important to understand the concepts …
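Both models turn a linear score into a probability between 0 and 1 and differ in the function used for that mapping: the logit model uses the logistic function, the probit model the standard normal CDF. The sketch below compares the two using only the standard library; the function names and the sample scores are illustrative.

```python
import math

def logistic_cdf(z):
    """Probability under the logit model for linear score z."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Probability under the probit model for linear score z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Compare the two mappings over a few illustrative scores.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z={z:+.1f}  logit={logistic_cdf(z):.4f}  probit={normal_cdf(z):.4f}")
```

Both curves pass through 0.5 at a score of zero; on the same score scale the logistic curve approaches 0 and 1 more slowly, which is why fitted logit and probit coefficients are not directly comparable in magnitude.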

Posted in Data Science, Machine Learning, statistics.

Linear Regression Cost Function: Python Example

Cost function in linear regression

Linear regression is a foundational algorithm in machine learning and statistics, used for predicting numerical values based on input data. Understanding the cost function in linear regression is crucial for grasping how these models are trained and optimized. In this blog, we will understand different aspects of the cost function used in linear regression, including how it helps in building a high-performing regression model. What is a Cost Function in Linear Regression? In linear regression, the cost function quantifies the error between predicted values and actual data points. It is a measure of how far off a linear model’s predictions are from the actual values. The most commonly …
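As a small illustration, the sketch below assumes mean squared error (MSE) as the cost function, which is one common choice, and evaluates it for two candidate lines on made-up data. The function name `mse_cost` and the parameter names `w` (slope) and `b` (intercept) are illustrative.

```python
import numpy as np

def mse_cost(w, b, x, y):
    """Mean squared error of the line y_hat = w * x + b on the data (x, y)."""
    predictions = w * x + b
    return float(np.mean((predictions - y) ** 2))

# Illustrative data lying exactly on y = 2x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

perfect_fit_cost = mse_cost(2.0, 0.0, x, y)  # the true line
poor_fit_cost = mse_cost(1.0, 0.0, x, y)     # slope too small

print(perfect_fit_cost, poor_fit_cost)
```

Training a linear regression model amounts to searching for the `w` and `b` that make this cost as small as possible.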

Posted in Data Science, Machine Learning, Python.

KNN vs Logistic Regression: Differences, Examples

Difference between K-Nearest Neighbors (KNN) and Logistic Regression algorithms

In this blog, we will learn about the differences between K-Nearest Neighbors (KNN) and Logistic Regression, two pivotal algorithms in machine learning, with the help of examples. The goal is to understand the intricacies of KNN’s instance-based learning and Logistic Regression’s probability modeling for binary and multinomial outcomes, offering clarity on their core principles. We will also navigate through the practical applications of KNN and logistic regression algorithms, showcasing real-world examples in various business domains like healthcare and finance. Accompanying this, we’ll provide concise Python code samples, guiding you through implementing these algorithms with datasets. This dual focus on theory and practicality aims to equip you with both the understanding …
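A minimal side-by-side sketch with scikit-learn is shown below. The synthetic dataset, the choice of 5 neighbors, and the 75/25 split are assumptions made for illustration and are not taken from the post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (illustrative).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# KNN: instance-based, stores the training data and votes among neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Logistic regression: learns coefficients and models class probabilities.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

knn_accuracy = knn.score(X_test, y_test)
logreg_accuracy = logreg.score(X_test, y_test)
logreg_proba = logreg.predict_proba(X_test)

print(knn_accuracy, logreg_accuracy)
print(logreg.coef_)  # one learned weight per feature; KNN has no equivalent
```

The fitted logistic regression exposes interpretable coefficients, while KNN keeps the training points themselves and defers all computation to prediction time.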

Posted in Data Science, Machine Learning, Python.

6 Types of Brainstorming Techniques for Ideas Generation

Mind mapping brainstorming ideas

Last updated: 1st Dec, 2023 Generating innovative and creative ideas is a key component of success in many fields, from business and marketing to science, technology, and the arts. However, the process of coming up with new and unique ideas can be challenging, especially when faced with deadlines, limited resources, or creative blocks. This is where brainstorming or mindstorming comes into the picture. Fortunately, there are several different types of brainstorming techniques that can help individuals and teams generate great ideas and innovate. While brainstorming is one of the most effective techniques out there, not all brainstorming sessions are created equal. The question that is frequently asked is how to brainstorm for effective …

Posted in News.

Python – How to Create Scatter Plot with IRIS Dataset

Scatter plot with IRIS dataset using Python

Last updated: 1st Dec, 2023 In this blog post, we will be learning how to create a Scatter Plot with the IRIS dataset using Python. The IRIS dataset is a collection of data that is used to demonstrate the properties of various statistical models. It contains 150 observations (50 for each of three iris species) on four different variables: Petal Length, Petal Width, Sepal Length, and Sepal Width. As data scientists, it is important for us to be able to visualize the data that we are working with. Scatter plots are a great way to do this because they show the relationship between two variables. In this post, we learn how to plot IRIS dataset …
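One way to draw such a plot is sketched below, assuming the copy of the dataset bundled with scikit-learn (`load_iris`) and Matplotlib for plotting. The choice of petal length against petal width, and the non-interactive backend, are illustrative choices rather than the post's own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]  # third column: petal length (cm)
petal_width = iris.data[:, 3]   # fourth column: petal width (cm)

fig, ax = plt.subplots()
# One scatter series per species so each gets its own color and legend entry.
for label, species in enumerate(iris.target_names):
    mask = iris.target == label
    ax.scatter(petal_length[mask], petal_width[mask], label=species)

ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])
ax.set_title("IRIS dataset: petal length vs petal width")
ax.legend()
```

To view the result, remove the backend line and call `plt.show()` in an interactive session, or save the figure with `fig.savefig(...)`.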

Posted in Data Science, Python.