In this post, you will learn the definitions of 25 different types of probability distributions. Before getting into the individual distributions, let's cover some fundamentals. If you are a data scientist, these distributions are worth knowing. This page can also be used as a cheat sheet for probability distributions.
What is a Probability Distribution?
A probability distribution is a mathematical function that can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment. Probability distributions are divided into two classes:
- Discrete Probability Distribution: A probability distribution defined on a discrete random variable, that is, one which can only take a discrete (countable) set of values.
- Continuous Probability Distribution: A probability distribution defined on a continuous random variable, that is, one which can take any value between two numbers.
Different Types of Probability Distributions
Here is the list of different types of probability distributions:
- Uniform: Also known as the rectangular distribution, the uniform distribution assigns the same probability (or, in the continuous case, the same probability density) to every outcome in its range. Simply speaking, it is a type of probability distribution in which all outcomes are equally likely. It comes in both continuous and discrete forms. Rolling a single die is one example of a discrete uniform distribution: a die roll has six possible outcomes (1, 2, 3, 4, 5, or 6), and each number has a 1/6 probability of being rolled. Here is a sample plot representing the uniform probability distribution:
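The die example can be written down directly. The following is a minimal sketch, not a reference implementation; the names `die_pmf` and `uniform_pdf` are made up for illustration, and the continuous case assumes an interval [a, b] with a < b.

```python
from fractions import Fraction

# Discrete uniform distribution for a fair six-sided die:
# every face gets the same probability, 1/6.
faces = [1, 2, 3, 4, 5, 6]
die_pmf = {face: Fraction(1, len(faces)) for face in faces}

def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b] (assumes a < b)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

print(die_pmf[3])                    # probability of rolling a 3
print(sum(die_pmf.values()))         # the probabilities add up to 1
print(uniform_pdf(0.25, 0.0, 2.0))   # constant density 1 / (b - a) inside the interval
```

In the discrete case each outcome carries a probability; in the continuous case the constant value is a density, so probabilities come from the width of an interval times that density.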
- Binomial: A discrete probability distribution used to model the number of successes in a sequence of n independent experiments (a fixed number of Bernoulli trials), each asking a yes-no question. Each experiment has a Boolean-valued outcome: success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p). The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. The following conditions need to be satisfied for an experiment to be termed a binomial experiment: A. There is a fixed number of trials, n. B. Each trial is independent. C. Only two outcomes are possible (success and failure). D. The probability of success (p) is the same for each trial. E. The random variable Y is the number of successes.
Here are some examples of binomial distribution:
- For a coin tossed n times, the binomial distribution can be used to model the probability of the number of successes (say, heads). For example, for a coin tossed 10 times, the binomial distribution could be used to model the probability of each possible number of heads (0 to 10).
- Here is the sample binomial distribution plot created with different values of n and p
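To make the coin-toss example concrete, here is a small sketch of the binomial probability mass function, P(Y = k) = C(n, k) p^k (1 − p)^(n − k). The helper name `binomial_pmf` is an assumption made for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# A fair coin tossed 10 times: probability of each possible number of heads (0 to 10).
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(k, prob)

print(sum(pmf))  # the probabilities over k = 0..n add up to 1 (up to rounding)
```

Setting n = 1 in the same function gives the Bernoulli distribution: `binomial_pmf(1, 1, p)` is p and `binomial_pmf(0, 1, p)` is 1 − p.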
- Multinomial: A generalization of the binomial distribution. For example, it models the probability of counts of each side for rolling a k-sided die n times.
- When k = 2 and n = 1, the multinomial distribution is the Bernoulli distribution.
- When k = 2 and n > 1, it is the binomial distribution.
- When k > 2 and n = 1, it is the categorical distribution.
- When k > 2 and n > 1, it is termed as multinomial distribution.
Here is a great read on multinomial distribution on this page, Visualizing Dirichlet distributions with Matplotlib.
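The k-sided die description can be sketched in a few lines. This is an illustrative helper (the name `multinomial_pmf` is hypothetical), assuming the category probabilities add up to 1.

```python
from math import comb, factorial

def multinomial_pmf(counts, probs):
    """Probability of seeing counts[i] outcomes of category i in sum(counts)
    independent trials, where category i has probability probs[i]."""
    n = sum(counts)
    # Multinomial coefficient n! / (c1! * c2! * ... * ck!), kept as an exact integer.
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    prob = float(coefficient)
    for c, p in zip(counts, probs):
        prob *= p**c
    return prob

# A fair six-sided die rolled 6 times: probability that every face shows exactly once.
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6))

# With k = 2 categories the result matches the binomial distribution.
print(multinomial_pmf([3, 7], [0.5, 0.5]), comb(10, 3) * 0.5**10)
```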
- Bernoulli: A discrete probability distribution for a single Bernoulli trial, that is, an experiment with exactly two outcomes: success (1) with probability p and failure (0) with probability 1 − p. The Bernoulli distribution is a special case of the binomial distribution where n = 1 (one experiment). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of such outcomes is called a Bernoulli process. In the plot given below, failure is labeled on the x-axis as 0 and success is labeled as 1. The probability of success (1) is 0.4, and the probability of failure (0) is 0.6. Here is a great read on Bernoulli distribution.
- Negative Binomial: A discrete probability distribution of the number of trials in a sequence of independent and identically distributed Bernoulli trials needed before a specified (fixed) number of successes occurs. Sometimes, the negative binomial distribution is instead defined as the distribution of the number of successes observed before a fixed number of failures occurs, or vice versa. The negative binomial experiment is almost the same as a binomial experiment with one difference: a binomial experiment has a fixed number of trials, whereas a negative binomial experiment continues until a fixed number of successes has occurred. The negative binomial distribution is also known as the Pascal distribution. Here are some examples/scenarios which can be modeled using the negative binomial distribution:
- In the case of tossing a coin, the negative binomial distribution can give the number of tosses required before a certain number of heads appears.
- The negative binomial distribution can be used to model the number of goal attempts an athlete makes before scoring r goals.
- The negative binomial distribution can be used to model the number of days a certain machine works before it breaks down.
- Take a standard deck of cards, shuffle them, and choose a card. Replace the card and repeat until you have drawn two aces. Y is the number of draws needed to draw two aces. As the number of trials isn’t fixed (i.e. you stop when you draw the second ace), this makes it a negative binomial distribution.
Here is a sample plot representing negative binomial distribution:
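The card example above can be worked through numerically. This sketch uses the "number of trials until the r-th success" convention, P(Y = y) = C(y − 1, r − 1) p^r (1 − p)^(y − r); the function name is an assumption for illustration.

```python
from math import comb

def negative_binomial_pmf(y, r, p):
    """P(the r-th success occurs exactly on trial y), for y = r, r + 1, ...

    Assumes independent trials with a constant success probability p.
    """
    if y < r:
        return 0.0
    return comb(y - 1, r - 1) * p**r * (1 - p)**(y - r)

# Card example: draw with replacement until two aces have been drawn.
p_ace = 4 / 52   # probability of an ace on any single draw
r = 2            # number of aces (successes) required

print(negative_binomial_pmf(2, r, p_ace))    # both aces on the first two draws
print(negative_binomial_pmf(26, r, p_ace))   # second ace exactly on draw 26

# The expected number of draws is r / p; the sum is truncated where the tail is negligible.
expected_draws = sum(y * negative_binomial_pmf(y, r, p_ace) for y in range(r, 2000))
print(expected_draws, r / p_ace)
```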
- Normal: A type of continuous probability distribution for a real-valued random variable. It is a symmetric distribution where most of the observations cluster around the central peak and the probabilities for values further away from the mean taper off equally in both directions. It is represented by a bell-shaped density curve described by its mean and standard deviation. It is also known as the Gaussian distribution. It has the following features:
- Symmetric bell shape
- Mean and median are equal; both located at the center of the distribution
- About 68% of the data falls within 1 standard deviation of the mean
- About 95% of the data falls within 2 standard deviations of the mean
- About 99.7% of the data falls within 3 standard deviations of the mean
Here is a sample normal distribution curve:
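The 68/95/99.7 figures listed above can be checked with Python's standard library. This is a quick sketch; `prob_within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k, dist=standard_normal):
    """P(mean - k*sigma <= X <= mean + k*sigma) for a normal distribution."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The same proportions hold for any mean and standard deviation, for example `prob_within(2, NormalDist(100, 15))`.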
- Poisson: A discrete probability distribution that gives the probability of a given number of events occurring within a fixed interval of time or space, if these events occur at a known constant average rate and independently of the time since the last event. Note that the Poisson distribution applies to intervals of both time and space. Another key point is that the events need to be independent of each other. When to use the Poisson distribution? It is used for finding the probability of a given number of events in a time period; the waiting time until the next event is modeled by the closely related exponential distribution. Here is a great read on Poisson distribution. Here are some examples:
- Customers calling a help center: On average, there are, say, 10 customers which call in an hour. Thus, Poisson distribution can be used to model the probability of a different number of customers calling within an hour (say, 5 or 6 or 7 or 8 or 9 or 11 customers, etc). The diagram below represents
- No. of visitors to a website: On average, there are 500 visitors to a website every day. The Poisson distribution can be used to model the probability of a given number of visitors on any one day.
- Radioactive decay in atoms
- Photons arriving at a space telescope
- Movements in a stock price
- Number of trees in a given acre of land
Here is a sample diagram representing the probability distribution for a given lambda (the average number of events per interval):
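For the help-center example with an average of 10 calls per hour, the Poisson probabilities can be computed from P(K = k) = λ^k e^(−λ) / k!. The sketch below is illustrative only; `poisson_pmf` is a name chosen for this example.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events in an interval) when events occur at an average rate lam per interval."""
    return lam**k * exp(-lam) / factorial(k)

calls_per_hour = 10  # average rate (lambda) from the help-center example

for k in (5, 10, 15):
    print(k, poisson_pmf(k, calls_per_hour))

# Sanity checks over a range wide enough that the remaining tail is negligible.
total = sum(poisson_pmf(k, calls_per_hour) for k in range(100))
mean = sum(k * poisson_pmf(k, calls_per_hour) for k in range(100))
print(total, mean)  # close to 1 and to lambda, respectively
```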
- Lognormal: A continuous probability distribution of a random variable whose logarithm is normally distributed. Skewed distributions with low mean values, large variance, and all-positive values often fit this type of distribution. Other names for the lognormal distribution include the Galton, Galton-McAlister, Gibrat, and Cobb-Douglas distributions. Here are some examples of the lognormal distributions:
- Size of silver particles in a photographic emulsion
- Survival time of bacteria in disinfectants
- The weight and blood pressure of humans
- The number of words written in sentences by George Bernard Shaw
- Milk production by cows.
- Lives of industrial units with failure modes that are characterized by fatigue-stress.
- Amounts of rainfall.
- Size distributions of rainfall droplets.
- The volume of gas in a petroleum reserve.
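Because a lognormal variable is one whose logarithm is normal, its cumulative distribution function can be computed from the normal one. The following is a minimal sketch; the parameters mu and sigma describe the underlying normal distribution of log(X), and the values used are arbitrary.

```python
from math import exp, log
from statistics import NormalDist

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) when log(X) is normal with mean mu and standard deviation sigma."""
    if x <= 0:
        return 0.0   # a lognormal variable only takes positive values
    return NormalDist(mu, sigma).cdf(log(x))

mu, sigma = 1.0, 0.5   # illustrative values only
print(lognormal_cdf(exp(mu), mu, sigma))   # the median of X is exp(mu)
print(lognormal_cdf(10.0, mu, sigma))
```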
- 3-parameter Lognormal: The 3-parameter lognormal distribution is a general skew distribution in which the logarithm of any linear function of a given variable is normally distributed. The three-parameter lognormal distribution is frequently used in the hydrologic analysis of extreme floods, seasonal flow volumes, duration curves for daily streamflow, rainfall intensity-duration, soil water retention.
- Exponential: The exponential distribution is a continuous probability distribution that describes the waiting time until the next event (success, failure, arrival, etc.) in a Poisson process. Taking the cue from the example cited for the Poisson distribution in this post, the distribution used to model the time between successive customer calls is the exponential distribution. Recall that the Poisson distribution is used to model the number of customer calls within an hour given a particular rate (lambda). Simply speaking, the exponential distribution can be used to model the waiting time before the next event occurs. This is one of the commonly used distributions in reliability engineering. Here are some examples of exponential distribution:
- How much time will go before a customer call happens?
- How much time will go before the next customer arrives in the shop?
- How much time will pass by before the next robbery happens in any part of the city?
- How much time will pass by before the next childbirth in the city?
- How much time will go until the customer finishes browsing and actually purchases something in your store (success)?
- How much time will go until the hardware on AWS EC2 fails (failure)?
- How much time will go until the bus arrives (arrival)?
Here is the sample plot representing the exponential distribution:
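Continuing with the rate of 10 customer calls per hour, the waiting time until the next call can be sketched as follows. The function names are assumptions for this example, and time is measured in hours.

```python
from math import exp

def exponential_cdf(t, rate):
    """P(waiting time <= t) when events occur at `rate` per unit time."""
    return 1.0 - exp(-rate * t) if t >= 0 else 0.0

def exponential_survival(t, rate):
    """P(waiting time > t), i.e. no event has occurred by time t."""
    return exp(-rate * t) if t >= 0 else 1.0

calls_per_hour = 10

print(exponential_cdf(0.1, calls_per_hour))   # next call arrives within 6 minutes
print(1 / calls_per_hour)                     # mean waiting time, in hours

# Memorylessness: having already waited s hours does not change the remaining wait.
s, t = 0.2, 0.1
conditional = exponential_survival(s + t, calls_per_hour) / exponential_survival(s, calls_per_hour)
print(conditional, exponential_survival(t, calls_per_hour))
```

Note that `exponential_survival(t, rate)` equals the Poisson probability of zero events in an interval of length t, which is the link between the two distributions.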
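- 2-parameter Exponential: In the 2-parameter exponential distribution, a location (threshold) parameter is added alongside the scale parameter, so that waiting times start from the threshold value rather than from zero.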
- Weibull: A type of continuous probability distribution, the Weibull distribution can assume the characteristics of many different types of distributions. It is flexible enough to model a variety of data sets, whether right-skewed, left-skewed, or symmetric. Generally speaking, the Weibull distribution is determined by two parameters, shape and scale, with the location parameter set to zero. Here are some examples where the Weibull distribution is used to model the related random variable:
- Reliability engineering, life data, and failure analysis
- In electrical engineering to represent overvoltage occurring in an electrical system
- In survival analysis
- In weather forecasting and the wind power industry to describe wind speed distributions, as the natural distribution often matches the Weibull shape
- In information retrieval to model dwell times on web pages
- In general insurance to model the size of reinsurance claims, and the cumulative development of asbestosis losses
- In hydrology, extreme events such as annual maximum one-day rainfalls and river discharges
Here is a sample plot representing the Weibull distribution for different values of the shape parameter:
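The role of the shape parameter is easy to see from the cumulative distribution function of the 2-parameter Weibull, F(x) = 1 − exp(−(x / scale)^shape). The sketch below is illustrative; the parameter values are arbitrary.

```python
from math import exp

def weibull_cdf(x, shape, scale):
    """P(X <= x) for a 2-parameter Weibull distribution (location fixed at zero)."""
    if x <= 0:
        return 0.0
    return 1.0 - exp(-((x / scale) ** shape))

scale = 4.0
for shape in (0.5, 1.0, 2.0, 5.0):
    print(shape, weibull_cdf(2.0, shape, scale))

# With shape = 1 the Weibull reduces to an exponential distribution with rate 1 / scale.
print(weibull_cdf(2.0, 1.0, scale), 1.0 - exp(-2.0 / scale))
```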
- 3-parameter Weibull: In 3-parameter Weibull distribution, apart from shape and scale, the location parameter also becomes important.
- Dirichlet: A type of continuous multivariate probability distribution. The Dirichlet distribution is a probability distribution over the space of multinomial distributions. It is a probability distribution over a probability simplex, that is, over sets of non-negative numbers that add up to 1. The following are examples of points on a probability simplex:
(0.2, 0.1, 0.7)
(0.07, 0.2, 0.13, 0.1, 0.2, 0.3)
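The above numbers represent probabilities over K distinct categories. In the above examples, K is 3 and 6 respectively. Each such set of probabilities is itself a categorical distribution, which is why the Dirichlet distribution is described as a distribution over categorical distributions.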
Dirichlet distributions are commonly used as the prior distributions in Bayesian statistics. It is a multivariate generalization of the beta distribution. Thus, it is also termed as multivariate beta distribution.
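One common way to draw a point from a Dirichlet distribution is to draw independent gamma variates and normalize them so that they add up to 1. The sketch below uses only the standard library; the function name and the concentration parameters are assumptions chosen for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one point on the probability simplex from Dirichlet(alphas).

    Uses the gamma-normalization construction: draw Gamma(alpha_i, 1) variates
    and divide each by their sum.
    """
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(42)       # seeded so the run is reproducible
alphas = [1.0, 2.0, 3.0]      # illustrative concentration parameters, K = 3

point = sample_dirichlet(alphas, rng)
print(point)        # three non-negative numbers
print(sum(point))   # they add up to 1 (up to rounding)
```

Each sampled point can be read as the probabilities of a categorical distribution over K = 3 categories.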
- Beta: A type of continuous probability distribution, the beta distribution is defined on the interval [0, 1] and parameterized by two positive shape parameters, α and β, that appear as exponents of the random variable and control the shape of the distribution. Simply speaking, the beta distribution is used to represent percentages, proportions, or probability outcomes. It can also be described as a distribution over probabilities. Here are some examples which could be modeled using the beta distribution:
- How likely it is that the preferred candidate for mayor will receive 70% of the vote. 70% of the vote (0.7) here is the proportion or probability that falls in the limit [0,1].
- How likely is it that President Trump will win the 2020 presidential election? Note that the value will fall within the interval [0, 1].
Check out this post citing an example of how the beta distribution could be used to model the probability of runs that could be scored in an upcoming cricket match.
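To see how α and β act as exponents, here is a small sketch of the beta density, f(x) = x^(α − 1) (1 − x)^(β − 1) Γ(α + β) / (Γ(α) Γ(β)). The parameter values below are hypothetical and not taken from any real election data.

```python
from math import gamma

def beta_pdf(x, alpha, beta):
    """Density of the Beta(alpha, beta) distribution, evaluated for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        return 0.0   # endpoints are left out to keep this sketch simple
    normalizer = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return normalizer * x ** (alpha - 1) * (1 - x) ** (beta - 1)

def beta_mean(alpha, beta):
    """Mean of the Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical belief about a candidate's vote share, centered on 70%.
alpha, beta = 7.0, 3.0
print(beta_mean(alpha, beta))
print(beta_pdf(0.7, alpha, beta), beta_pdf(0.4, alpha, beta))

# Beta(1, 1) is the uniform distribution on [0, 1].
print(beta_pdf(0.3, 1.0, 1.0))
```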
- Gamma: The gamma distribution is a continuous probability distribution used to model continuous variables that are always positive and have skewed distributions. It is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. It arises naturally in processes for which the waiting times between events are relevant: it can be thought of as the waiting time until a given number of Poisson-distributed events has occurred. Here are some examples of a gamma distribution:
- In life testing, the waiting time until death is a random variable that is frequently modeled with a gamma distribution
- The size of loan defaults or aggregate insurance claims
- The flow of items through manufacturing and distribution processes
- The load on web servers
- The many and varied forms of telecom exchange
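The waiting-time interpretation can be sketched for the special case of an integer shape parameter (the Erlang distribution): the waiting time until the k-th event has occurred by time t exactly when at least k events have happened by then. The function name and the numbers below are assumptions for illustration.

```python
from math import exp, factorial

def erlang_cdf(t, k, rate):
    """P(the k-th event of a Poisson process with the given rate occurs by time t).

    This is the CDF of a gamma distribution with integer shape k and scale 1 / rate.
    """
    if t <= 0:
        return 0.0
    mean_events = rate * t
    # Probability that fewer than k events have occurred by time t.
    fewer_than_k = sum(exp(-mean_events) * mean_events**n / factorial(n) for n in range(k))
    return 1.0 - fewer_than_k

rate = 10   # events per hour, as in the customer-call example

print(erlang_cdf(0.5, 3, rate))   # third call arrives within half an hour
print(erlang_cdf(0.1, 1, rate))   # k = 1 is the exponential distribution
```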
- 3-parameter Gamma: Also called the generalized gamma distribution, the 3-parameter gamma distribution is a type of continuous probability distribution with three parameters. Since many distributions commonly used for parametric models in survival analysis (such as the exponential distribution, the Weibull distribution, and the gamma distribution) are special cases of the generalized gamma, it is sometimes used to determine which parametric model is appropriate for a given set of data.
- Logistic: A continuous probability distribution whose cumulative distribution function is the logistic function, which appears in logistic regression and feedforward neural networks. It resembles the normal distribution in shape but has heavier tails (higher kurtosis). Here is the sample plot representing the logistic distribution for different location and scale parameters:
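Since the cumulative distribution function is just the logistic function, it takes only a few lines to write down. A minimal sketch, with default parameter values chosen for illustration:

```python
from math import exp

def logistic_cdf(x, location=0.0, scale=1.0):
    """P(X <= x) for a logistic distribution: the logistic (sigmoid) function."""
    return 1.0 / (1.0 + exp(-(x - location) / scale))

print(logistic_cdf(0.0))                       # 0.5 at the location parameter
print(logistic_cdf(2.0), logistic_cdf(-2.0))   # symmetric around the location
print(logistic_cdf(2.0, location=1.0, scale=0.5))
```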
- Tukey Lambda: A continuous, symmetric probability distribution defined in terms of its quantile function and typically used to identify an appropriate distribution for a data set. The Tukey lambda distribution is symmetric around zero, therefore the expected value of this distribution is equal to zero. The most common use of this distribution is to generate a Tukey lambda PPCC plot of a data set. Based on the PPCC plot, an appropriate model for the data is suggested.
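Because the Tukey lambda distribution is defined through its quantile function, Q(p) = (p^λ − (1 − p)^λ) / λ for λ ≠ 0 and Q(p) = log(p / (1 − p)) for λ = 0, a short sketch of that function shows how the shape parameter λ moves between familiar distributions. The function name is an assumption for this example.

```python
from math import log

def tukey_lambda_quantile(p, lam):
    """Quantile function Q(p) of the Tukey lambda distribution, for 0 < p < 1."""
    if lam == 0:
        return log(p / (1 - p))   # limiting case: the logistic distribution
    return (p**lam - (1 - p)**lam) / lam

# Symmetry around zero: the median is 0 and Q(p) = -Q(1 - p).
print(tukey_lambda_quantile(0.5, 0.14))
print(tukey_lambda_quantile(0.9, 0.14), tukey_lambda_quantile(0.1, 0.14))

# lam = 1 gives the uniform distribution on (-1, 1), where Q(p) = 2p - 1.
print(tukey_lambda_quantile(0.75, 1.0))
```

Sampling is then a matter of feeding uniform random numbers in (0, 1) through the quantile function.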
- Log-Logistic: The probability distribution of a random variable whose logarithm has a logistic distribution. It is similar in shape to the log-normal distribution but has heavier tails. Here are some examples of scenarios that are modeled using Log-logistic distribution:
- Used in survival analysis as a parametric model for events whose rate increases initially and decreases later, as, for example, the mortality rate from cancer following diagnosis or treatment.
- Used in hydrology to model streamflow and precipitation
- Used in economics as a simple model of the distribution of wealth or income
- Used in networking to model the transmission times of data considering both the network and the software.
Here is a sample plot representing Log-Logistic distribution:
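The defining property, that the logarithm of the variable follows a logistic distribution, can be checked numerically. The sketch below assumes the common scale/shape parameterization, F(x) = 1 / (1 + (x / scale)^(−shape)); the parameter values are arbitrary.

```python
from math import exp, log

def loglogistic_cdf(x, scale, shape):
    """P(X <= x) for a log-logistic distribution with the given scale and shape."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / scale) ** (-shape))

def logistic_cdf(z, location, scale):
    """P(Z <= z) for a logistic distribution."""
    return 1.0 / (1.0 + exp(-(z - location) / scale))

scale, shape = 2.0, 4.0
x = 3.0

# log(X) is logistic with location log(scale) and scale 1 / shape.
print(loglogistic_cdf(x, scale, shape))
print(logistic_cdf(log(x), log(scale), 1.0 / shape))

print(loglogistic_cdf(scale, scale, shape))   # the median equals the scale parameter
```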
- 3-parameter Log-Logistic: The 3-parameter log-logistic distribution is a generalization of the two-parameter log-logistic distribution. It has been applied to the frequency analysis of precipitation and streamflow data. Check out this paper on how this distribution is employed for flood frequency analysis of annual maximum series for part of Scotland.
- Smallest extreme value: This distribution is used to model time to failure for a system that fails when its weakest component fails. It is defined by its location and scale parameters. Skewed to the left, the smallest extreme value distribution describes extreme phenomena such as the minimum temperature and rainfall during a drought. Here is the sample plot representing the smallest extreme value distribution
- Largest extreme value: The largest extreme value distribution is used to model the maximum value from a distribution of random observations. Skewed to the right, it describes extreme phenomena such as extreme wind velocities and high insurance losses. For example, the distribution of the water levels in a river over time is frequently skewed to the right, with a few cases of extreme water levels in the right tail and a majority of water levels in the lower tail. Here is the sample plot representing the largest extreme value distribution:
- Chi-square: The chi-squared distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in the construction of confidence intervals. It is used in the common chi-square tests and related procedures for purposes such as the following:
- The goodness of fit of an observed distribution to a theoretical one
- The independence of two criteria of classification of qualitative data
- In confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation.
Here is a sample chi-square distribution plot:
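The definition as a sum of squared standard normal variables lends itself to a quick simulation. This is a sketch with an arbitrary seed and sample size; for k degrees of freedom, the sample mean should land near k and the sample variance near 2k.

```python
import random

def chi_square_sample(k, rng):
    """One draw: the sum of the squares of k independent standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)   # seeded so the run is reproducible
k = 3                    # degrees of freedom
samples = [chi_square_sample(k, rng) for _ in range(20000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)

print(sample_mean)   # should be close to k
print(sample_var)    # should be close to 2 * k
```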
- Geometric: The geometric distribution is the probability distribution of the number of trials needed to get the first success in repeated independent Bernoulli trials. Under an alternative convention, it is defined as the number of failures before the first success; the two definitions differ by exactly one. The geometric distribution is an appropriate model if the following assumptions are true:
- The phenomenon being modeled is a sequence of independent trials.
- There are only two possible outcomes for each trial, often designated success or failure.
- The probability of success, p, is the same for every trial.
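Under the three assumptions above, both conventions for the geometric distribution can be sketched side by side. The function names and the die example are assumptions made for illustration.

```python
def geometric_trials_pmf(k, p):
    """P(the first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p if k >= 1 else 0.0

def geometric_failures_pmf(k, p):
    """P(exactly k failures occur before the first success), for k = 0, 1, 2, ..."""
    return (1 - p) ** k * p if k >= 0 else 0.0

p = 1 / 6   # e.g. rolling a six with a fair die

print(geometric_trials_pmf(1, p))   # success on the very first roll
print(geometric_trials_pmf(4, p), geometric_failures_pmf(3, p))   # same event, two conventions

# Expected values, truncated where the remaining tail is negligible.
mean_trials = sum(k * geometric_trials_pmf(k, p) for k in range(1, 1000))
mean_failures = sum(k * geometric_failures_pmf(k, p) for k in range(0, 1000))
print(mean_trials, mean_failures)   # close to 1 / p and (1 - p) / p
```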
There are many more probability distributions, such as the following, which I will cover in upcoming posts:
- Inverse Gamma
- Inverse Chi-squared
References
- Why Weibull distribution is always welcome
- Characteristics of Weibull distribution
- Understanding beta distribution (using baseball statistics)
- What is the beta distributions?
- Visualizing Dirichlet distributions with Matplotlib
- Dirichlet distributions
- Dirichlet distribution: Simple definition, PDF, Mean
- Quora – Intuitive explanation of Dirichlet distribution
- Smallest and largest extreme value distribution
- Logistic distribution
- Log-logistic distribution
- Exponential distribution – Intuition, applications
- Intuition behind beta distribution