Author Archives: Ajitesh Kumar

Ajitesh Kumar

I have recently been working in the area of data analytics, including data science and machine learning / deep learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, Julia, etc., and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data, etc. For the latest updates and blogs, follow us on Twitter. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.

Pearson Correlation Coefficient & Statistical Significance

pearson correlation coefficient example

In this post, we will discuss what Pearson’s r represents, how it works mathematically, its interpretation, statistical significance, and importance for making decisions in real-world applications such as business forecasting or medical diagnosis. We will also explore some examples of using Pearson’s r with real data sets so you can see how this powerful statistic works in action. As a data scientist, it is very important to understand Pearson’s r and its implications for making decisions based on data. What is Pearson Correlation Coefficient? Pearson correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to +1 and is typically represented by the symbol ‘r’. Pearson correlation coefficient …
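As a rough illustration of the statistic described above, here is a minimal sketch that computes Pearson's r from its textbook definition, plus the usual t statistic for testing whether the population correlation is zero. It uses only the Python standard library. The function names and the hours/scores data are hypothetical and do not come from the original post.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom).

    Assumes |r| < 1; a perfect correlation would divide by zero.
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
t = t_statistic(r, len(hours))
print(f"r = {r:.4f}, t = {t:.4f}, df = {len(hours) - 2}")
```

The t statistic would then be compared against a t distribution with n - 2 degrees of freedom to judge statistical significance.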

Posted in Data Science, statistics.

Logistic Regression Concepts, Python Example

logistic regression model 3

In this blog post, we will discuss the logistic regression machine learning algorithm with a Python example. Logistic regression is a type of regression algorithm that is used to predict the probability of occurrence of an event. It is often used in machine learning applications. In this tutorial, we will use Python to implement logistic regression for binary classification problems. What is Logistic Regression? Logistic regression is a machine learning algorithm used for classification problems. That is, it can be used to predict whether an instance belongs to one class or the other. For example, it could be used to predict whether a person is male or female, based on …
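To make the "probability of occurrence" idea concrete, the sketch below shows only the prediction step of a logistic regression model: a weighted sum of features passed through the sigmoid function, then thresholded into a class. It does not fit a model. The weights, bias, and feature values are hand-picked, hypothetical numbers, not taken from the original post.

```python
import math

def sigmoid(z):
    """Map a real number to the open interval (0, 1).

    Numerically naive: very large negative z would overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for one instance."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(features, weights, bias, threshold=0.5):
    """Binary label obtained by thresholding the predicted probability."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical, hand-picked parameters (not fitted to any data)
weights = [0.8, -0.4]
bias = -1.0
instance = [2.0, 1.0]

p = predict_proba(instance, weights, bias)
label = predict_class(instance, weights, bias)
print(f"P(class = 1) = {p:.4f}, predicted class = {label}")
```

In practice the weights and bias would be estimated from labeled training data rather than chosen by hand.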

Posted in Data Science, Machine Learning, Python.

One-way ANOVA test: Concepts, Formula & Examples

One way ANOVA table example

The one-way analysis of variance (ANOVA) test is a statistical procedure commonly used to compare the mean values of a specific variable between three or more groups. In this blog post, we will discuss the concepts behind the one-way ANOVA test, as well as how to calculate and interpret the results. We will also provide some examples to help illustrate how this test works. What is ANOVA? An ANalysis Of VAriance (ANOVA) test, in its simplest form the one-way ANOVA test, is a hypothesis test used to determine whether there is a significant difference between the mean values of some variable in three or more groups. In other words, it can …
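The calculation behind a one-way ANOVA can be sketched as the ratio of between-group variability to within-group variability. The example below is a minimal, standard-library-only sketch of that F statistic. The function name and the three small groups are hypothetical and chosen so the arithmetic is easy to follow.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variability of the group means around the grand mean
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variability of the observations around their own group mean
    ss_within = sum(
        (value - m) ** 2 for g, m in zip(groups, group_means) for value in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with the given degrees of freedom suggests that at least one group mean differs from the others.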

Posted in Data Science, statistics.

Statistics Terminologies Cheat Sheet & Examples

probability distribution histogram

Have you ever felt overwhelmed by all the statistics terminology out there? From sampling distribution to central limit theorem to null hypothesis to p-values to standard deviation, it can be hard to keep up with all the statistical concepts and how they fit into your research. That’s why we created a Statistics Terminologies Cheat Sheet & Examples – a comprehensive guide to help you better understand the essential terms and their use in data analysis. Our cheat sheet covers topics like descriptive statistics, probability, hypothesis testing, and more. And each definition is accompanied by an example to help illuminate the concept even further. Understanding statistics terminology is critical for data …

Posted in Data Science, statistics.

Types & Uses of Moments in Statistics

fourth moment kurtosis

In statistics, moments are measures of the shape and variability of a data set. They are used to describe the location and dispersion of the data. There are several types of moments that can be calculated, each providing different information about the data set. Let’s take a look at some of these moments and how they can be used in statistical analysis. What are moments in Statistics and what are their types? In statistics, moments are an important tool used to measure the characteristics of a distribution. Moments can provide useful information about the spread, shape, and center of a distribution.  The following are different types of moments: First moment …
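The excerpt above is cut off before the individual moments are listed, so the following is only a sketch based on the standard definitions: the first moment is the mean, the second central moment is the variance, and the standardized third and fourth moments are skewness and kurtosis. Population (divide-by-n) formulas are assumed, and the data values are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment of the data (population form, divides by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def describe_moments(data):
    """Return mean, variance, skewness, and kurtosis of the data."""
    mean = sum(data) / len(data)          # first moment: location
    variance = central_moment(data, 2)    # second central moment: spread
    std = variance ** 0.5
    skewness = central_moment(data, 3) / std ** 3   # asymmetry
    kurtosis = central_moment(data, 4) / std ** 4   # tail weight (3 for a normal)
    return {
        "mean": mean,
        "variance": variance,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }

# Hypothetical data set
data = [2, 4, 4, 4, 5, 5, 7, 9]
moments = describe_moments(data)
for name, value in moments.items():
    print(f"{name}: {value:.5f}")
```

A positive skewness here indicates a longer right tail, and a kurtosis below 3 indicates lighter tails than a normal distribution.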

Posted in Data Science, statistics.

Two independent samples t-tests: Formula & Examples

independent samples t-test representation

In statistics, the independent samples t-test, also known as the unpaired two samples t-test, is a type of hypothesis test that can be used to determine whether the means of two independent groups are significantly different, given that the two samples are independent and have normal distributions. As data scientists, it is important to understand how to use the two sample t-test for independent samples so that you can correctly analyze your data. In this blog post, we will discuss the two samples t-test for independent samples in detail, including the formula and examples. What is independent-samples or unpaired two samples T-test? The independent samples T-test is defined as statistical hypothesis testing …
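One common form of this test pools the two sample variances, which assumes the two groups have equal population variances. The sketch below computes that pooled-variance t statistic with the standard library. Whether the original post uses the pooled form or the unequal-variance (Welch) form is not visible in the excerpt, so treat the choice as an assumption. The sample values are hypothetical.

```python
import math
import statistics

def pooled_t_statistic(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # statistics.variance is the sample variance (divides by n - 1)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)

    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical measurements from two independent groups
group_a = [5.0, 6.0, 7.0, 8.0, 9.0]
group_b = [1.0, 2.0, 3.0, 4.0, 5.0]

t_stat, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t value would be compared against a t distribution with n_a + n_b - 2 degrees of freedom.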

Posted in Data Science, statistics.

Chi-square test – Types, Concepts, Examples

Chi-square goodness of fit - Tossing coin

The Chi-square (χ²) test is a statistical test used to determine whether the distribution of observed data is consistent with the distribution of data expected under a particular hypothesis. The Chi-square test can be used to compare two distributions, or to assess the goodness of fit of a given distribution to observed data. In this blog post, we will discuss the types of Chi-square tests, the concepts behind them, and how to perform them using Python / R. As data scientists, it is important to have a strong understanding of the Chi-square test so that we can use it to make informed decisions about our data. We will also provide …
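A goodness-of-fit test of the kind described above can be sketched with a coin-toss example: compare observed counts of heads and tails against the counts expected from a fair coin. The counts below are hypothetical. The p-value helper is valid only for one degree of freedom, where the chi-square upper tail reduces to the complementary error function.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree
    of freedom (not valid for other degrees of freedom)."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical experiment: 100 tosses of a coin assumed to be fair
observed = [60, 40]   # heads, tails
expected = [50, 50]

chi2 = chi_square_statistic(observed, expected)
p = p_value_df1(chi2)   # two categories give 2 - 1 = 1 degree of freedom
print(f"chi-square = {chi2:.2f}, p-value = {p:.4f}")
```

At a 0.05 significance level, a p-value below 0.05 would lead to rejecting the hypothesis that the coin is fair.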

Posted in Data Science, Python, statistics.

Machine Learning Concepts & Examples

Machine Learning Modeling Workflow

Machine learning is a machine’s ability to learn from data. It has been around for decades, but machine learning is now being applied in nearly every industry and job function. In this blog post, we’ll cover a detailed introduction to what is machine learning (ML) including different definitions. We will also learn about different types of machine learning tasks, algorithms, etc along with real-world examples. What is machine learning & how does it work? Simply speaking, machine learning can be used to model our beliefs about real-world events. For example, let’s say a person came to a doctor with a certain blood report. A doctor based on his belief system …

Posted in Data Science, Deep Learning, Machine Learning.

Linear Regression Explained with Real Life Example

Multiple linear regression example

In this post, the linear regression concept in machine learning is explained with multiple real-life examples. Both types of regression models (simple/univariate and multiple/multivariate linear regression) are taken up for citing examples. In case you are a machine learning or data science beginner, you may find this post helpful. You may also want to check a detailed post on what is machine learning – What is Machine Learning? Concepts & Examples. What is Linear Regression? Linear regression is a machine learning concept that is used to build or train the models (mathematical models or equations) for solving supervised learning problems related to predicting a continuous numerical value. Supervised learning problems …

Posted in AI, Data Science, Machine Learning.

Levene Test & Statistics: Concepts & Examples

null and alternate hypothesis for Levene Test

The Levene test is used to test for equality of variance in a dataset. It is used in statistical analysis to determine if two or more samples have similar variances. If the results of the test indicate that the samples do not have similar variances, then at least one sample's variance differs from the others, and procedures that assume equal variances (such as the pooled-variance t-test or the standard ANOVA) may not be appropriate. In this blog post, we’ll take a look at what exactly the Levene test is, how it works, and provide some examples of how it can be applied. As data scientists, it will be important for us to understand the Levene test in order to …

Posted in Data Science, statistics.

Questions to Ask Before Starting Data Analysis

Questions to ask before starting the data analysis

Data analysis is a crucial part of any business or organization. It helps make decisions and assists in strategy development. But before you can dive into the data, there are several questions that need to be answered first. These questions will help you understand whether you have the right kind of data for analysis, in addition to defining your goals for data analysis. As data scientists or data analysts, it is your job to ask the right questions. Let’s take a look at some important questions to ask before starting data analysis. Who collected the data? When it comes to data analysis, it is essential to know who collected the …

Posted in Data, Data analytics, Data Science.

Overfitting & Underfitting in Machine Learning

Overfitting and underfitting represented using Model error vs complexity plot

The performance of machine learning models depends upon two key concepts called underfitting and overfitting. In this post, you will learn about some of the key concepts of overfitting and underfitting in relation to machine learning models. In addition, you will also get a chance to test your understanding by attempting the quiz. The quiz will help you prepare well for interview questions in relation to underfitting & overfitting. As data scientists, you must get a good understanding of the overfitting and underfitting concepts. Introduction to Overfitting & Underfitting Assuming an independent and identically distributed (i.i.d.) dataset, when the prediction error on both the training and validation dataset is …

Posted in Data Science, Interview questions, Machine Learning.

Python – Creating Scatter Plot with IRIS Dataset


In this blog post, we will be learning how to create a Scatter Plot with the IRIS dataset using Python. The IRIS dataset is a collection of data that is used to demonstrate the properties of various statistical models. It contains 150 observations (50 for each of three iris species) on four different variables: Petal Length, Petal Width, Sepal Length, and Sepal Width. As data scientists, it is important for us to be able to visualize the data that we are working with. Scatter plots are a great way to do this because they show the relationship between two variables. In this post, we have plotted and explored how Petal Length and Sepal Length …

Posted in Data Science, Python.

Supervised & Unsupervised Learning Difference

Supervised vs Unsupervised Machine Learning Problems

Supervised and unsupervised learning are two common types of machine learning tasks that are used to solve many different types of business problems. Supervised learning uses training data with labels to create supervised models, which can be used to predict outcomes for future datasets. Unsupervised learning is a type of machine learning task where the training data is not labeled or categorized in any way. For beginner data scientists, it is very important to get a good understanding of the difference between supervised and unsupervised learning. In this post, we will discuss how supervised and unsupervised algorithms work and what the difference between them is. You may want to check …

Posted in AI, Data Science, Machine Learning.

Logit vs Probit Models: Differences, Examples

Logit vs probit models

Logit and probit models are statistical models that are used to model binary or dichotomous dependent variables. This means that the outcome of interest can only take on two possible values. In most cases, these models are used to predict whether or not something will happen. For example, a business might want to know if a particular advertising campaign will lead to an increase in sales. In this blog post, we will explain what logit and probit models are, and we will provide examples of how they can be used. As data scientists, it is important to understand the concepts of logit and probit models and when they should be …

Posted in Data Science, Machine Learning, statistics.

Categorical Data Visualization: Concepts, Examples

bar chart data visualization for categorical data

Everyone knows that data visualization is one of the most important tools for any data scientist or statistician. It helps us to better understand the relationships between variables and identify patterns in our data. There are specific types of visualization used to represent categorical data. This type of data visualization can be incredibly helpful when it comes to analyzing our data and making predictions about future trends. In this blog, we will dive into what categorical data visualization is, why it’s useful, and some examples of how it can be used. Types of Data Visualizations for Categorical Dataset When it comes to visualizing categorical data sets, there are primarily four …

Posted in Data Science, statistics.