# Author Archives: Ajitesh Kumar

## Ajitesh Kumar

I have recently been working in the area of data analytics, including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies, including programming languages such as Java/JEE, Javascript, Python, R, and Julia, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. For latest updates and blogs, follow us on Twitter. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.

## K-Nearest Neighbors (KNN) Python Examples

If you’re working on data analytics projects that involve building machine learning (ML) models, you’ve probably heard of the K-nearest neighbors (KNN) algorithm. But what is it, exactly? And more importantly, how can you use it in your own AI / ML projects? In this post, we’ll take a closer look at the KNN algorithm and walk through Python Sklearn examples. The K-nearest neighbors algorithm is used for solving both classification and regression machine learning problems.

### Introduction to K-Nearest Neighbors (K-NN) Algorithm

K-nearest neighbors is a supervised machine learning algorithm for classification and regression. In both cases, the input consists …
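To make the idea concrete, here is a minimal sketch of KNN classification in plain Python rather than Sklearn. The function name `knn_predict`, the toy points, and the labels are all assumptions made up for illustration; the sketch assumes Euclidean distance and a simple majority vote among the k closest training points.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair every training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_points, train_labels, (2.0, 2.0))
prediction_near_b = knn_predict(train_points, train_labels, (6.0, 6.5))
```

For regression, the same neighbor search could be reused with the vote replaced by an average of the neighbors' target values.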

Posted in Data Science, Machine Learning, Python.

## Wilcoxon Rank Sum Test: Concepts, Examples

The Wilcoxon rank sum test, also known as the Mann-Whitney U test, is a non-parametric statistical hypothesis test used to compare two samples. It is similar to the Student’s t-test, but does not require the assumption of normality. The test is appropriate for use with small sample sizes.

### What is Wilcoxon Rank Sum Test?

The Wilcoxon rank sum test is a statistical test used to compare two independent samples. The test is used to compare the medians (location of medians) in the two samples. The null hypothesis is that the location of medians in two …
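As a rough illustration of where the test statistic comes from, the sketch below pools two samples, ranks the pooled values (tied values share the average of their ranks), and sums the ranks of the first sample. The helper names and the sample values are hypothetical, and the sketch stops at the statistic: it does not compute a p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_statistics(sample_a, sample_b):
    """Return (W, U) for sample_a: its rank sum and the Mann-Whitney U statistic."""
    pooled = list(sample_a) + list(sample_b)
    ranks = average_ranks(pooled)
    n_a = len(sample_a)
    w_a = sum(ranks[:n_a])
    u_a = w_a - n_a * (n_a + 1) / 2
    return w_a, u_a


# Hypothetical samples: every value in the first sample is below the second
w, u = rank_sum_statistics([1, 2, 3], [4, 5, 6])
```

In practice a statistics library would normally be used to obtain the p-value as well; the point here is only that the test works on ranks, not on the raw values.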

Posted in Data Science, statistics.

## Different Types of Statistical Tests: Concepts

Statistical tests are an important part of data analysis. They help us understand the data and make inferences about the population. They are used to examine relationships between variables and to test hypotheses, for example whether there is a significant difference between two groups. In statistics, there are two main types of tests: parametric and non-parametric. Both types are used to make inferences about a population based on a sample. The difference between them lies in the assumptions that they make about the data. Parametric tests make certain assumptions about the data, while non-parametric tests do not make …

Posted in Data Science, statistics.

## Role of Data in Digital Transformation

To understand the role of data in digital transformation, it is important to first understand what digital transformation is. Digital transformation is the process of using digital technologies to create new or improved business processes, products, or services. This can be done through the use of big data, cloud computing, mobile technologies, and the Internet of Things (IoT). Data is a key enabler of digital transformation. It helps organizations to identify new opportunities, make better decisions, and improve operational efficiency. Big data, in particular, is playing an increasingly important role in digital transformation initiatives. Big data refers to large volumes of data that can be structured, unstructured, or …

Posted in Data, digital transformation.

## How to Identify Use Cases for AI / Machine Learning

As artificial intelligence (AI) and machine learning (ML) solutions and technologies continue to evolve, more and more businesses are looking for ways to incorporate them into their operations to realize a greater business impact. But with so many potential applications, it can be difficult to know where to start. In this blog post, we’ll outline some tips for identifying AI / ML use cases. We’ll also provide a few examples of how AI and machine learning can be used in business settings. So if you’re thinking about adding AI or machine learning to your toolkit, read on! This blog post will be appropriate for product managers, business analysts, data science …

Posted in AI, Data analytics, Machine Learning, Product Management.

## Predicting Customer Churn with Machine Learning

Customer churn, also known as customer attrition, is a major problem for businesses that rely on recurring revenue. Customer churn costs businesses billions of dollars every year, and it’s only getting worse as customers become more and more fickle. In fact, it’s been estimated that the average company loses 10-15% of its customers each year. That number may seem small, but it can have a huge impact on a company’s bottom line. Fortunately, there’s a way to combat churn: by using machine learning to predict which customers are likely to churn. In this blog post, we’ll discuss how customer churn prediction works and why it’s so important. We’ll also provide …

Posted in AI, Data Science, Machine Learning.

## Stacking Classifier Sklearn Python Example

In this blog post, we will go over a simple example of how to train a stacking classifier in Python using the Sklearn library, and cover the concepts behind stacking classifiers. A stacking classifier is an ensemble learning method that combines multiple classification models to create one “super” model. This can often lead to improved performance, since the combined model can learn from the strengths of each individual model.

### What are Stacking Classifiers?

Stacking is a machine learning ensemble technique that combines multiple models to form a single powerful model. The individual models are trained on different subsets of the data using some type of …

Posted in Data Science, Machine Learning, Python.

## Decision Tree Hyperparameter Tuning Grid Search Example

The output shows the grid search across different hyperparameter values, the model score with the best hyperparameters, and the best hyperparameter values. In the above code, the decision tree model is trained and evaluated for each combination of hyperparameter values, and the combination that results in the best performance is chosen. In this case, “best performance” could be defined as either accuracy or AUC (area under the curve). Once we’ve found the best performing combination of hyperparameters, we can then train our final model using those values and deploy it to production.

### Conclusion

In this blog post, we explored how to use grid search to tune the hyperparameters of a Decision …
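For readers who want something to run, here is a minimal sketch of a decision tree grid search with Sklearn's `GridSearchCV`. It is not the code from the original post: the Iris dataset, the parameter grid, the 5-fold cross-validation, and accuracy as the scoring metric are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed example dataset; any labeled classification data would do
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Assumed grid: every combination of these values is trained and evaluated
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 5],
    "min_samples_leaf": [1, 5],
}

grid = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid=param_grid,
    scoring="accuracy",  # "roc_auc_ovr" would be one AUC-based alternative
    cv=5,
)
grid.fit(X_train, y_train)

print("Best hyperparameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)

# By default the best combination is refit on the full training split
test_accuracy = grid.score(X_test, y_test)
print("Held-out accuracy:", test_accuracy)
```

The grid above has 2 x 3 x 2 = 12 combinations, so the cost grows quickly as more values are added to each hyperparameter.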

Posted in Data Science, Machine Learning, Python.

## Reinforcement Learning Real-world examples

In this blog post, we’ll learn about some real-world examples of reinforcement learning, one of the main approaches to machine learning alongside supervised and unsupervised learning. Reinforcement learning is a type of machine learning that enables a computer system to learn how to make choices by being rewarded for its successes. This can be an extremely powerful tool for optimization and decision-making. It’s one of the most popular machine learning methods used today. Before looking into the real-world examples, let’s quickly understand what reinforcement learning is.

### Introduction to Reinforcement Learning (RL)

Reinforcement learning is an approach to machine learning in which the agents …

Posted in Data Science, Machine Learning.

## Passive Aggressive Classifier: Concepts & Examples

The passive aggressive classifier is a machine learning algorithm that is used for classification tasks. This algorithm is a modification of the standard Perceptron algorithm. The passive aggressive classifier was first proposed in 2006 by Crammer et al. as a way to improve the performance of the Perceptron algorithm on linearly separable data sets. In this blog, we will learn about the basic concepts and principles behind the passive aggressive classifier, as well as some examples of its use in real-world applications.

### What is the passive aggressive classifier and how does it work?

The passive aggressive classifier algorithm falls under the category of online learning algorithms, can handle large datasets, …

Posted in Data Science.

## Generalized Linear Models Explained with Examples

Generalized linear models (GLMs) are a powerful tool for data scientists, providing a flexible way to model data. In this post, you will learn about the concepts of generalized linear models (GLM) with the help of Python examples. It is very important for data scientists to understand the concepts of generalized linear models and how they differ from general linear models such as regression or ANOVA models.

### What are Generalized Linear Models?

Generalized linear models (GLM) are a type of statistical model that can be used to model data that is not normally distributed. It is a flexible general framework that can be used to build many types of regression models, including …

Posted in Data Science, Machine Learning, Python.

## Pandas Dataframe: How to add Rows & Columns

Adding rows and columns to a Pandas DataFrame is an easy process. While working on a data project in Python, there are several scenarios in which you’ll need to add new rows and columns to your DataFrame. In this article, we will show you how to do it. As a data scientist or data analyst, you need a good understanding of how to add DataFrame rows and columns. In this post, we will work with the following Pandas data frame.

### How to add a row in Dataframe

There are multiple ways of adding rows to a DataFrame. You can use the DataFrame.loc or DataFrame.append method to add a row at the end …
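A minimal sketch of adding rows and columns is shown below. The column names and values are hypothetical, since the post's own data frame is not reproduced in this excerpt. One caveat worth knowing: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the sketch uses `DataFrame.loc` and `pd.concat` instead.

```python
import pandas as pd

# Hypothetical starting data frame
df = pd.DataFrame({"name": ["Asha", "Ben"], "score": [82, 91]})

# Add one row at the end: with the default integer index, len(df) is the next label
df.loc[len(df)] = ["Chen", 77]

# Add rows from another data frame; ignore_index renumbers the combined index
extra_rows = pd.DataFrame({"name": ["Dia"], "score": [88]})
df = pd.concat([df, extra_rows], ignore_index=True)

# Add a column by assigning to a new column label
df["passed"] = df["score"] >= 80

# Add a column with assign, which returns a new data frame
df = df.assign(score_fraction=df["score"] / 100)

print(df)
```

Assigning through `loc` modifies the data frame in place, while `pd.concat` and `assign` return new objects, which is why their results are bound back to `df`.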

Posted in Data Science, Python.

## Generate Random Numbers & Normal Distribution Plots

In this blog post, we’ll discuss how to generate random number samples from a normal distribution and create normal distribution plots in Python. We’ll go over the different techniques for generating random numbers from a normal distribution that are available in Python libraries such as SciPy, NumPy, and Matplotlib. We’ll also create normal distribution plots from the generated numbers.

### Generate random numbers using Numpy random.randn

NumPy is a Python library that contains built-in functions for generating random numbers. The numpy.random.randn function generates random numbers from the standard normal distribution (mean 0, standard deviation 1). This function takes the number of values to generate, N, as input and returns an array of N random …
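Here is a minimal sketch of drawing normal samples with NumPy. The seed, the sample count, and the target mean and standard deviation are arbitrary assumptions; plotting is left out so that the sketch only depends on NumPy.

```python
import numpy as np

np.random.seed(42)  # fixed seed so the run is repeatable
n = 1000

# randn draws from the standard normal distribution (mean 0, standard deviation 1)
standard_samples = np.random.randn(n)

# Shift and scale to get a normal distribution with another mean and spread
mean, std = 50.0, 5.0
scaled_samples = mean + std * standard_samples

# The newer Generator API can draw from N(mean, std) directly
rng = np.random.default_rng(42)
generator_samples = rng.normal(loc=mean, scale=std, size=n)

# Bin the samples; these counts are what a histogram plot would display
counts, bin_edges = np.histogram(scaled_samples, bins=20)
print(standard_samples.mean(), scaled_samples.std())
```

Passing `scaled_samples` to a plotting call such as Matplotlib's `hist` would then give the familiar bell-shaped plot.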

Posted in Data Science, Python, statistics.

## Pandas: Creating Multiindex Dataframe from Product or Tuples

MultiIndex is a powerful tool that enables us to work with higher dimensional data, but it can be tricky to create MultiIndex DataFrames using the from_tuples and from_product functions in Pandas. In this blog post, we will discuss how to create a MultiIndex DataFrame using the MultiIndex from_tuples and from_product functions.

### What is a MultiIndex?

MultiIndex is an advanced Pandas feature that allows users to create MultiIndexed DataFrames, i.e., DataFrames with multiple levels of indexing. MultiIndex can be useful when you have data that can be naturally grouped by more than one category. For example, you might have data on individual employees that can be grouped by …
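The two constructors can be sketched as follows. The department, employee, year, and quarter labels and all of the numbers are hypothetical values invented for illustration.

```python
import pandas as pd

# from_tuples: spell out every (level 0, level 1) combination explicitly
tuples = [("Engineering", "Chen"), ("Sales", "Asha"), ("Sales", "Ben")]
index_from_tuples = pd.MultiIndex.from_tuples(tuples, names=["department", "employee"])
df_tuples = pd.DataFrame({"salary": [70, 50, 55]}, index=index_from_tuples)

# from_product: build every combination (Cartesian product) of the given lists
index_from_product = pd.MultiIndex.from_product(
    [["2021", "2022"], ["Q1", "Q2"]], names=["year", "quarter"]
)
df_product = pd.DataFrame({"revenue": [10, 12, 11, 15]}, index=index_from_product)

# Selecting on the outer level returns the rows for that group
sales_rows = df_tuples.loc["Sales"]

# A full key tuple selects a single row
revenue_2022_q1 = df_product.loc[("2022", "Q1"), "revenue"]

print(df_tuples)
print(df_product)
```

`from_tuples` fits data where only some combinations exist, while `from_product` is convenient when every combination of the levels is present.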

Posted in Data Science, Python.

## Top Python Statistical Analysis Packages

As a data scientist, you know that one of the most important aspects of your job is statistical analysis. After all, without accurate data, it would be impossible to make sound decisions about your company’s direction. Thankfully, there are a number of excellent Python statistical analysis packages available that can make your job much easier. In this blog post, we’ll take a look at some of the most popular ones.

### SciPy

SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. SciPy contains modules for statistics, optimization, linear algebra, integration, interpolation, special functions, Fourier transforms (FFT), signal and image processing, and other tasks common in science and …