# Category Archives: statistics

## Normal Distributions Questions and Answers for Interviews

To be successful in interviews that cover the normal distribution, you need a solid understanding of it. This blog post will focus on normal distribution questions and answers that are commonly asked in data science and statistics interviews. Before jumping into the questions and answers, let’s quickly understand what the normal distribution is. What is normal distribution? A normal distribution is a symmetric, bell-shaped curve that describes the distribution of many types of data. The normal distribution has two parameters, mean and standard deviation. It is important to know these two parameters because they are used to calculate probabilities associated with the normal distribution. The normal curve describes how data …
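As a quick illustration of how the mean and standard deviation are used to calculate probabilities, here is a minimal sketch using Python's standard-library `statistics.NormalDist`. The parameter values (mean 100, standard deviation 15) are assumptions picked purely for illustration and do not come from the post.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
mean, std_dev = 100.0, 15.0
dist = NormalDist(mu=mean, sigma=std_dev)

# P(X <= mean + 1 sd): area under the curve to the left of 115.
p_below = dist.cdf(mean + std_dev)

# P(mean - 1 sd <= X <= mean + 1 sd): area within one standard deviation.
p_within_one_sd = dist.cdf(mean + std_dev) - dist.cdf(mean - std_dev)

print(round(p_below, 4))          # about 0.8413
print(round(p_within_one_sd, 4))  # about 0.6827
```

The second value is the familiar "68%" from the 68-95-99.7 rule, and it is the same for any choice of mean and standard deviation.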

## Level of Significance & Hypothesis Testing

In hypothesis testing, the level of significance is the probability threshold for rejecting the null hypothesis when it is actually true, which determines how confident you can be when you reject it. This blog post will explore what hypothesis testing is and why understanding significance levels is important for your data science projects. In addition, you will get to test your knowledge of the level of significance towards the end of the post with the help of a quiz. These questions can help you test your understanding and prepare for data science / statistics interviews. Before we look into what the level of significance is, let’s quickly understand what hypothesis testing is. What is Hypothesis testing and how is it related to significance …
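One way to build intuition for the level of significance is to simulate many experiments in which the null hypothesis is actually true and count how often it gets rejected anyway. The sketch below is an illustration under stated assumptions, not the post's own code: it uses a two-sided z-test with a known standard deviation, and the values of `alpha`, the sample size, and the number of experiments are all hypothetical.

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the sketch is repeatable

alpha = 0.05  # chosen level of significance (assumed for illustration)
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff, about 1.96


def rejects_null(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean equals mu0 (sigma known)."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - mu0) / (sigma / len(sample) ** 0.5)
    return abs(z) > z_critical


# Every sample is drawn from a population where H0 is true (mean 0, sd 1),
# so each rejection counted here is a false rejection.
n_experiments, n_obs = 5000, 30
false_rejections = sum(
    rejects_null([random.gauss(0.0, 1.0) for _ in range(n_obs)], mu0=0.0, sigma=1.0)
    for _ in range(n_experiments)
)
type_i_error_rate = false_rejections / n_experiments
print(type_i_error_rate)  # should land close to alpha
```

The observed rate of false rejections tracks `alpha`, which is why a smaller level of significance means fewer false rejections of a true null hypothesis.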

## P-Value & Hypothesis Testing: Examples

Many describe the p-value as the probability that the null hypothesis holds true. That is an incorrect definition. The concept of the p-value is understood differently by different people and is considered one of the most used and abused concepts in statistics, mostly in relation to hypothesis testing. In this blog post, you will learn the p-value concepts with multiple examples. It is extremely important to get a good understanding of the p-value if you are starting to learn data science/machine learning, as the concepts of the p-value are key to hypothesis testing. Before getting into the description of p-value, let’s quickly go through the hypothesis testing concepts to get a good …
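The p-value is the probability, computed assuming the null hypothesis is true, of observing a result at least as extreme as the one actually obtained. Here is a minimal sketch of that idea using an exact binomial calculation. The scenario (60 heads in 100 tosses of a coin assumed fair under the null hypothesis) is a hypothetical example, not one taken from the post.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical data: 60 heads in 100 tosses; H0 says the coin is fair.
n_tosses, observed_heads, p_null = 100, 60, 0.5

expected_heads = n_tosses * p_null
observed_gap = abs(observed_heads - expected_heads)

# Two-sided p-value: total probability, under H0, of every outcome that is
# at least as far from the expected count as the observed one.
p_value = sum(
    binomial_pmf(k, n_tosses, p_null)
    for k in range(n_tosses + 1)
    if abs(k - expected_heads) >= observed_gap
)
print(round(p_value, 4))
```

Note that this number says how surprising the data would be if the coin were fair. It is not the probability that the coin is fair.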

## Type I & Type II Errors in Hypothesis Testing: Examples

This article describes Type I and Type II errors made due to incorrect evaluation of the outcome of hypothesis testing, based on examples such as a person committing a crime, a house on fire, and Covid-19. Note that it is key to understand Type I and Type II errors, as these concepts show up when we evaluate hypotheses such as those related to machine learning algorithms (linear regression, logistic regression, etc.). For example, in the case of linear regression models, the significance level is compared with the p-value, and the null hypothesis that the parameter/coefficient is equal to zero is …

## Survival Analysis Modeling for Customer Churn

Customer churn is a prevalent problem for many businesses. It can happen in several different ways, such as when customers stop using the product, or when they leave because of an issue with customer service. This blog post will explore survival analysis modeling and what it can do to help you better understand customer churn problems. First, we will discuss survival analysis itself and why it is beneficial for analyzing customer behavior. Then we will show some examples of how survival analysis has been used to analyze customer churn problems. As data scientists, it is good to familiarize ourselves with survival analysis, as it is a popular modeling technique …

## Poisson Distribution Explained with Python Examples

The Poisson distribution is a probability distribution that can be used to model the number of events in a fixed interval. It is closely associated with what is often referred to as a “random Poisson process” or “Poisson process”. The Poisson distribution describes how many occurrences of an event happen within a given time frame, for example, how many customers visit your store or restaurant every hour. In this post, you will learn about the concepts of the Poisson probability distribution with Python examples. As a data scientist, you must get a good understanding of the concepts of probability distributions including normal, binomial, Poisson, etc. What is Poisson distribution? Poisson distribution is the discrete probability distribution which represents the …
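To make the store-visit example concrete, here is a small sketch that evaluates the Poisson probability mass function, P(X = k) = e^(-λ) λ^k / k!, using only the standard library. The average rate of 3 customers per hour is an assumed value for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count per interval is lam."""
    return exp(-lam) * lam**k / factorial(k)


lam = 3.0  # hypothetical average: 3 customers per hour

# Probability of seeing 0, 1, ..., 6 customers in a given hour.
for k in range(7):
    print(k, round(poisson_pmf(k, lam), 4))

# Probability of more than 6 customers in an hour (complement of the above).
p_more_than_six = 1.0 - sum(poisson_pmf(k, lam) for k in range(7))
print(round(p_more_than_six, 4))
```

The same function works for any fixed interval (a day, a square metre, a page of text) as long as `lam` is the average count for that interval.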

## Negative Binomial Distribution Python Examples

In this post, you will learn about the concepts of negative binomial distribution explained using real-world examples and Python code. We will go over the following topics to understand negative binomial distribution: What is negative binomial distribution? What is the difference between binomial and negative binomial distribution? Negative binomial distribution real-world examples Negative binomial distribution Python example What is Negative Binomial Distribution? Negative binomial distribution is a discrete probability distribution representing the probability of a random variable, X, which is the number of Bernoulli trials required to have r successes. This random variable is called a negative binomial random variable. And, the experiment representing X number of Bernoulli trials required to produce r successes is called …
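Under the definition above, where X counts the Bernoulli trials needed to reach r successes, the probability mass function is P(X = n) = C(n-1, r-1) p^r (1-p)^(n-r) for n ≥ r. Here is a minimal sketch of it; the success probability 0.3 and the target of 3 successes are assumptions for illustration. Be aware that some libraries parameterize this distribution by the number of failures rather than the number of trials.

```python
from math import comb


def neg_binomial_pmf(n_trials, r, p):
    """P(the r-th success occurs exactly on trial n_trials)."""
    if n_trials < r:
        return 0.0
    # The last trial must be a success; the other r-1 successes can fall
    # anywhere among the first n_trials-1 trials.
    return comb(n_trials - 1, r - 1) * p**r * (1 - p) ** (n_trials - r)


r, p = 3, 0.3  # hypothetical: wait for 3 successes, each trial succeeds with 0.3

# Probability that exactly 3, 4, ..., 10 trials are needed.
for n_trials in range(r, 11):
    print(n_trials, round(neg_binomial_pmf(n_trials, r, p), 4))
```

With r = 1 this reduces to the geometric distribution, which counts the trials needed to get the first success.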

## Geometric Distribution Explained with Python Examples

In this post, you will learn about the concepts of Geometric probability distribution with the help of real-world examples and Python code examples. It is of utmost importance for data scientists to understand and build an intuition for different kinds of probability distributions, including the geometric distribution. You may want to check out some of my following posts on other probability distributions. Normal distribution explained with Python examples Binomial distribution explained with 10+ examples Hypergeometric distribution explained with 10+ examples In this post, the following topics have been covered: Geometric probability distribution concepts Geometric distribution python examples Geometric distribution real-world examples Geometric Probability Distribution Concepts Geometric probability distribution is a discrete …

## Python – How to Add Trend Line to Line Chart / Graph

In this post, you will learn how to add a trend line to a line chart / line graph using Python Matplotlib. As a data scientist, it is helpful to learn the concepts and related Python code used to draw or add a trend line to line charts, as it helps you understand the trend and make decisions. In this post, we will consider an example of the IPL average batting scores of Virat Kohli, Chris Gayle, MS Dhoni and Rohit Sharma over the last 10 years, and assess the trend in their overall performance using trend lines. Let’s say that main reason why we want to …
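A trend line is commonly a straight line fitted to the data by least squares, and that is the assumption made in the sketch below; the excerpt does not show which method the post uses. The helper `fit_trend_line` is a hypothetical name, and the scores are made-up numbers for illustration, not actual IPL statistics.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up season averages for illustration only (not real batting data).
seasons = [1, 2, 3, 4, 5]
averages = [30, 34, 33, 38, 40]

slope, intercept = fit_trend_line(seasons, averages)
trend_values = [slope * x + intercept for x in seasons]

print(slope, intercept)  # a positive slope means an upward trend
print(trend_values)      # y-values to draw as the trend line
```

To draw it, plot `averages` against `seasons` as the line chart and `trend_values` against `seasons` as a second line on the same axes. With NumPy available, `numpy.polyfit(seasons, averages, 1)` yields the same slope and intercept.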

## Beta Distribution Explained with Python Examples

In this post, you will learn about Beta probability distribution with the help of Python examples. As a data scientist, it is very important to understand beta distribution as it is very commonly used as a prior in Bayesian modeling. In this post, the following topics get covered: Beta distribution intuition and examples Introduction to beta distribution Beta distribution python examples Beta Distribution Intuition & Examples Beta distribution is widely used to model prior beliefs or probability distributions in real-world applications. Here is a great article on understanding beta distribution with an example of a baseball game. You may want to pay attention to the fact that even if the baseball …

## Bernoulli Distribution Explained with Python Examples

In this post, you will learn about the concepts of Bernoulli Distribution along with real-world examples and Python code samples. As a data scientist, it is very important to understand the statistical concepts around various probability distributions in order to understand the data distribution better. In this post, the following topics will get covered: Introduction to Bernoulli distribution Bernoulli distribution real-world examples Bernoulli distribution python code examples Introduction to Bernoulli Distribution Bernoulli distribution is a discrete probability distribution representing the discrete probabilities of a random variable which can take only one of two possible values such as 1 or 0, yes or no, true or false, etc. The probability of …

## Bayes Theorem Explained with Examples

In this post, you will learn about Bayes’ Theorem with the help of examples. It is of utmost importance to get a good understanding of Bayes’ Theorem in order to create probabilistic models. Bayes’ theorem is alternatively called Bayes’ rule or Bayes’ law. One of the many applications of Bayes’ theorem is Bayesian inference, which is one of the approaches to statistical inference (the other being Frequentist inference) and is fundamental to Bayesian statistics. In this post, you will learn about the following: Introduction to Bayes’ Theorem Bayes’ theorem real-world examples Introduction to Bayes’ Theorem In simple words, Bayes’ Theorem is used to determine the probability of a hypothesis in the presence of more evidence or information. In other …
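Bayes’ theorem states that P(H | E) = P(E | H) · P(H) / P(E). The sketch below applies it to a diagnostic-test scenario to show how evidence updates the probability of a hypothesis. All of the numbers (1% prior, 95% true positive rate, 5% false positive rate) are assumptions for illustration, and `posterior_probability` is a hypothetical helper name.

```python
def posterior_probability(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) computed with Bayes' theorem."""
    # Total probability of the evidence, over both H and not-H.
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


# Hypothetical numbers: H = "person has the condition", E = "test is positive".
prior = 0.01                # P(H)
true_positive_rate = 0.95   # P(E | H)
false_positive_rate = 0.05  # P(E | not H)

posterior = posterior_probability(prior, true_positive_rate, false_positive_rate)
print(round(posterior, 4))  # about 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small. Feeding the posterior back in as the prior for a second positive test shows how the probability rises as more evidence arrives.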

## Joint & Conditional Probability Explained with Examples

In this post, you will learn about the differences between joint and conditional probability, with examples. When starting with Bayesian analytics, it is very important to have a good understanding of probability concepts. Concepts such as joint and conditional probability are fundamental to probability and key to Bayesian modeling in machine learning. As a data scientist, you must get a good understanding of probability-related concepts. Joint & Conditional Probability Concepts In this section, you will learn about basic concepts in relation to joint and conditional probability. Probability of an event can be quantified as a function of uncertainty of whether that event will occur or not. Let’s say an event A is …
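The difference between the two is easiest to see by counting outcomes. The joint probability P(A and B) is the chance that both events happen, while the conditional probability P(A | B) = P(A and B) / P(B) is the chance of A once B is known to have happened. Here is a minimal sketch with a single roll of a fair die; the choice of events is a hypothetical example.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # faces of a fair six-sided die

# Hypothetical events chosen for illustration.
event_a = {x for x in outcomes if x % 2 == 0}  # roll is even: {2, 4, 6}
event_b = {x for x in outcomes if x > 3}       # roll is greater than 3: {4, 5, 6}


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


p_joint = prob(event_a & event_b)           # P(A and B) = 1/3
p_a_given_b = p_joint / prob(event_b)       # P(A | B)   = 2/3

print(p_joint, p_a_given_b)
```

Knowing that the roll is greater than 3 raises the probability of an even roll from 1/2 to 2/3, which is exactly the kind of update that Bayesian modeling builds on.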

## What, When & How of Scatterplot Matrix in Python

In this post, you will learn about the following in relation to the scatterplot matrix. Note that a scatterplot matrix is also called a pairplot. Later in this post, you will find a Python code example that uses a scatterplot matrix / pairplot (seaborn package). What is scatterplot matrix? When to use scatterplot matrix / pairplot? How to use scatterplot matrix in Python? What is Scatterplot Matrix? A scatterplot matrix is a matrix (or grid) of scatter plots where each scatter plot in the grid is created between different combinations of variables. In other words, a scatterplot matrix represents the bi-variate or pairwise relationship between different combinations of variables …

## Hypergeometric Distribution Explained with 10+ Examples

In this post, we will learn Hypergeometric distribution with 10+ examples. The following topics will be covered in this post: What is Hypergeometric Distribution? 10+ Examples of Hypergeometric Distribution If you are an aspiring data scientist looking forward to learning/understanding the binomial distribution better, this post might be very helpful. The Binomial distribution can be considered a very good approximation of the hypergeometric distribution as long as the sample consists of 5% or less of the population. One would need a good understanding of binomial distribution in order to understand the hypergeometric distribution well. I would recommend you take a look at some of my related posts on …
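The 5% rule of thumb can be checked numerically by comparing the two probability mass functions directly. The sketch below does that for a sample that is 2% of the population; the population size, the number of "success" items, and the sample size are all assumed values for illustration.

```python
from math import comb


def hypergeom_pmf(k, pop_size, n_success_items, n_draws):
    """P(k successes) when drawing n_draws items without replacement."""
    return (
        comb(n_success_items, k)
        * comb(pop_size - n_success_items, n_draws - k)
        / comb(pop_size, n_draws)
    )


def binomial_pmf(k, n, p):
    """P(k successes) in n independent trials (drawing with replacement)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical setup: 1000 items, 300 of them "successes", sample of 20 (2%).
pop_size, n_success_items, n_draws = 1000, 300, 20
p = n_success_items / pop_size

# Largest gap between the two distributions over all possible counts.
max_gap = max(
    abs(hypergeom_pmf(k, pop_size, n_success_items, n_draws) - binomial_pmf(k, n_draws, p))
    for k in range(n_draws + 1)
)
print(max_gap)  # small when the sample is a small share of the population
```

Increasing `n_draws` towards a large share of `pop_size` makes the gap grow, because drawing without replacement then changes the success probability noticeably from one draw to the next.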

## Binomial Distribution with Python Code Examples

In this post, you will learn code examples, written with the Python Numpy package, related to the binomial distribution. You may want to check out the post, Binomial Distribution explained with 10+ examples, to get an understanding of Binomial distribution with the help of several examples. All of the examples can be tried with the code samples given in this post. Here are the instructions: Load the Numpy package: First and foremost, load the Numpy and Seaborn libraries Code Syntax – np.random.binomial(n, p, size=1): The code np.random.binomial(n, p, size=1) returns the number of successes that happen in one (size=1) experiment comprising n trials, with the probability/proportion of success being p. Tossing a …
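Here is a minimal sketch of the `np.random.binomial(n, p, size=1)` call described above, applied to coin tossing. The values of `n` and `p` and the fixed seed are assumptions for illustration, and Seaborn is left out because nothing is plotted here.

```python
import numpy as np

np.random.seed(42)  # fixed seed so reruns give the same draws

n, p = 10, 0.5  # hypothetical values: 10 tosses of a fair coin

# One experiment (size=1) of n trials; the result is a length-1 array
# holding the number of successes (heads) seen in that experiment.
successes = np.random.binomial(n, p, size=1)
print(successes)

# Repeating the experiment many times: the average number of successes
# should sit close to n * p.
many_experiments = np.random.binomial(n, p, size=10_000)
print(many_experiments.mean())
```

Changing `size` changes how many experiments are simulated, not how many trials each experiment contains; the number of trials is controlled by `n`.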
