# Data Science: P-Value Explained with Examples

Many describe the p-value as the probability that the null hypothesis is true. That definition is incorrect. The p-value is understood differently by different people and is considered one of the most used and abused concepts in statistics. In this blog post, you will learn the p-value concepts through several examples. A good understanding of the p-value is essential if you are starting to learn data science or machine learning, because it is key to hypothesis testing. In the following use cases, the hypothesis made about the population is either rejected or not rejected based on the p-value:

• Whether a coin is fair
• Whether a dice is fair

Before getting into the p-value itself, let’s quickly go through the hypothesis testing concepts on which it rests.

### What is Hypothesis Testing?

Hypothesis testing can be defined as the statistical framework which can be used to answer “yes-or-no” questions about data. Take a look at the following questions:

• Is the coin fair?
• Is the dice fair?

Hypothesis testing requires determining the null hypothesis and the alternate hypothesis. The null hypothesis represents the default state of belief about the world. For example, the coin is fair. Or, the dice is fair. The alternate hypothesis represents something different and unexpected. The following represent key steps in hypothesis testing. Note the usage of p-value:

• Define the null and alternative hypotheses.
• Construct a test statistic that summarizes the strength of evidence against the null hypothesis.
• Compute a p-value that quantifies the probability of having obtained a comparable or more extreme value of the test statistic under the null hypothesis.
• Based on the p-value, decide whether to reject the null hypothesis.
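The four steps above can be sketched in a few lines of Python for the coin question. This is a minimal illustration rather than a definitive implementation: the observation (9 heads in 10 tosses), the helper name `two_sided_p_value`, and the choice to call a count "more extreme" when it lies at least as far from the expected count as the observed one are all assumptions made for the example.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(observed: int, n: int, p_null: float = 0.5) -> float:
    """Probability, under the null, of a count at least as far from the
    expected count (n * p_null) as the observed count."""
    expected = n * p_null
    observed_distance = abs(observed - expected)
    return sum(
        binomial_pmf(k, n, p_null)
        for k in range(n + 1)
        if abs(k - expected) >= observed_distance - 1e-12
    )


# Step 1: H0 = the coin is fair (P(heads) = 0.5); H1 = the coin is not fair.
n_tosses, heads = 10, 9  # hypothetical observation

# Step 2: the test statistic is simply the number of heads.
test_statistic = heads

# Step 3: p-value = probability of a result at least this extreme under H0.
p_value = two_sided_p_value(test_statistic, n_tosses)

# Step 4: compare against the chosen level of significance.
alpha = 0.05
reject_null = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_null}")
# prints: p-value = 0.0215, reject H0: True
```

Here the extreme outcomes are 0, 1, 9 and 10 heads, so the p-value is (1 + 10 + 10 + 1) / 1024.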

### What is P-VALUE?

In hypothesis testing, once the test statistic has been computed to evaluate the null hypothesis, the next step is to compute the probability of observing a test statistic equal to or more extreme than the observed one, under the assumption that the null hypothesis H0 is true. This probability is called the p-value. A small p-value provides evidence against the null hypothesis. While the null hypothesis could be evaluated from the value of the test statistic directly, the p-value transforms the test statistic, which is measured on some arbitrary and uninterpretable scale, into a number between 0 and 1 that can be more easily interpreted.
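Written as a formula, and using the symbols $T$ for the test statistic and $t_{\text{obs}}$ for its observed value (notation assumed here, not taken from the post), the definition for a test in which large values count as extreme is:

$$
p = P\left(T \ge t_{\text{obs}} \mid H_0\right)
$$

For a two-sided test on a statistic that is symmetric around zero under the null hypothesis, the same idea is usually written as:

$$
p = P\left(|T| \ge |t_{\text{obs}}| \mid H_0\right)
$$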

The p-value can be defined as the probability of obtaining a test statistic at least as extreme as the one observed if we repeated the experiment many times, provided the null hypothesis holds. In other words, the p-value represents the fraction of the time that one would expect to see such an extreme value of the test statistic if the experiment were repeated many times, provided the null hypothesis holds.
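The "fraction of the time" reading can be checked with a small simulation. The sketch below assumes a fair coin, a hypothetical observation of 9 heads in 10 tosses, and a fixed random seed so the run is repeatable. The function name `simulated_p_value` is made up for this example.

```python
import random


def simulated_p_value(observed_heads: int, n_tosses: int,
                      n_experiments: int = 100_000, seed: int = 0) -> float:
    """Repeat the experiment many times with a fair coin and return the
    fraction of runs at least as extreme as the observed result."""
    rng = random.Random(seed)
    expected = n_tosses / 2
    observed_distance = abs(observed_heads - expected)
    extreme_runs = 0
    for _ in range(n_experiments):
        # One experiment: toss a fair coin n_tosses times and count heads.
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if abs(heads - expected) >= observed_distance:
            extreme_runs += 1
    return extreme_runs / n_experiments


estimate = simulated_p_value(observed_heads=9, n_tosses=10)
exact = 22 / 1024  # outcomes 0, 1, 9 and 10 heads out of 2**10 sequences
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

With enough repetitions the simulated fraction settles close to the exact probability, which is what the definition above describes.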

The p-value is computed by first determining a test statistic such as Z, T, or chi-square and then looking up the corresponding probability in the related distribution (the z-distribution, t-distribution, or chi-square distribution, respectively). The distribution of the test statistic under the null hypothesis depends on the type of null hypothesis being tested and the type of test statistic used. In general, most commonly used test statistics follow a well-known statistical distribution under the null hypothesis, such as a normal distribution, a t-distribution, a χ²-distribution, or an F-distribution.
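As a small illustration of turning a test statistic into a p-value, the sketch below converts a Z statistic into a two-sided p-value using the standard normal distribution from Python's standard library. The function name `two_sided_p_from_z` and the sample Z values are assumptions chosen for the example.

```python
from statistics import NormalDist


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a Z statistic that follows the standard
    normal distribution under the null hypothesis."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area in both tails beyond |z|.
    return 2 * (1 - standard_normal.cdf(abs(z)))


for z in (0.5, 1.96, 3.0):  # hypothetical test statistics
    print(f"z = {z:>4}: p-value = {two_sided_p_from_z(z):.4f}")
```

A Z value of 1.96 lands almost exactly on the familiar 0.05 threshold, which is why that number appears so often in distribution tables.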

### P-VALUE explained with Examples

The examples below do not compute test statistics or p-values to decide whether the null hypothesis can be rejected. Instead, they are intended to explain the concept of the p-value and the notion of rejecting or failing to reject the null hypothesis.

Let’s take a quick example to understand the concept of the p-value. Given a school with both boys and girls, let’s test the hypothesis that the proportion of boys among all students is not equal to 0.5 (50%). As a first step, we formulate the null and alternate hypotheses. The null hypothesis is that the ratio of boys to total students is 0.5. The alternate hypothesis is that this ratio is not equal to 0.5. As part of the test, several random samples of 20 students are taken and the number of boys in each sample is counted. The following outcomes help illustrate the definition of the p-value.

| Sample (No. of students) | Outcome (No. of Boys) | Outcome (Ratio of Boys/Total Students) | Interpretation |
|---|---|---|---|
| 20 | 12 | 0.6 | Given that the null hypothesis holds, there is a high likelihood that the test outcome happened by chance; can’t reject the null hypothesis. |
| 20 | 16 | 0.8 | Given that the null hypothesis holds, the test outcome looks doubtful; it does not look like the outcome happened by chance. By eye the evidence is borderline, so a computed p-value is needed before deciding whether to reject the null hypothesis. |
| 20 | 18 | 0.9 | Given that the null hypothesis holds, with a very high confidence level, it could be stated that the test outcome does not look to have happened by chance. Given that the sample is chosen in a fair and random manner, the null hypothesis can be rejected. The alternate hypothesis is accepted, which implies that the boys are greater in number in the school. |
| 20 | 8 | 0.4 | Given that the null hypothesis holds, there is a high likelihood that the test outcome happened by chance; can’t reject the null hypothesis. |
| 20 | 2 | 0.1 | Given that the null hypothesis holds, the test outcome does not look to have happened by chance. Given that the sample is chosen in a fair and random manner, the null hypothesis can be rejected. The alternate hypothesis is accepted, which implies that the girls are greater in number in the school. |

In the above example, the tests with 18 and 2 boys in a random sample of 20 students are at an extreme level. These outcomes are significant enough to indicate that the results did not happen by chance and that it is incorrect to claim that the ratio of boys to total students is 0.5. In such cases, the p-value, when calculated, turns out to be less than 0.05. Given that the level of significance is set at 0.05, the null hypothesis can be rejected.
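For readers who want to put numbers on the rows of the school example, here is a rough sketch using an exact binomial calculation. It assumes each sampled student is independently a boy with probability 0.5 under the null hypothesis, and it treats a count as "at least as extreme" when it lies at least as far from the expected 10 boys as the observed count. The helper name `two_sided_binomial_p` is hypothetical.

```python
from math import comb


def two_sided_binomial_p(observed: int, n: int, p_null: float) -> float:
    """Sum the null probabilities of every count at least as far from the
    expected count (n * p_null) as the observed count."""
    expected = n * p_null
    observed_distance = abs(observed - expected)
    total = 0.0
    for k in range(n + 1):
        if abs(k - expected) >= observed_distance - 1e-9:
            total += comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
    return total


SAMPLE_SIZE = 20
ALPHA = 0.05  # level of significance used in the text

for boys in (12, 16, 18, 8, 2):
    p_value = two_sided_binomial_p(boys, SAMPLE_SIZE, 0.5)
    verdict = "reject H0" if p_value < ALPHA else "fail to reject H0"
    print(f"{boys:>2} boys of {SAMPLE_SIZE}: p-value = {p_value:.4f} -> {verdict}")
```

Running it is a useful check on the intuitive reading of each row, since judging a count by eye is only a rough guide to whether the p-value falls below the threshold.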

The P-VALUE is used to represent whether the outcome of a hypothesis test is statistically significant enough to be able to reject the null hypothesis. It lies between 0 and 1.

The threshold below which the p-value is considered statistically significant is usually set to 0.05. This threshold is called the level of significance and equals one minus the confidence level (a 95% confidence level corresponds to 0.05). One could choose a different threshold (such as 0.025 or 0.01) depending on the confidence level required before rejecting the null hypothesis.
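The effect of the threshold can be seen by holding one p-value fixed and varying the level of significance. In this sketch the p-value 22/1024 (the two-sided exact value for 9 heads in 10 tosses of a fair coin) and the function name `decide` are choices made for the example.

```python
def decide(p_value: float, alpha: float) -> str:
    """Return the test decision for a given level of significance."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p_value = 22 / 1024  # about 0.0215

for confidence_level in (0.95, 0.975, 0.99):
    alpha = 1 - confidence_level  # level of significance
    print(f"confidence {confidence_level}: alpha = {alpha:.3f} -> {decide(p_value, alpha)}")
```

The same evidence leads to rejection at 0.05 and 0.025 but not at 0.01, which is why the level of significance should be fixed before looking at the data.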

Figure 1. P-Value: the p-value of the test statistic shown as the area of the shaded (red) region.

Let’s try and understand the intuition behind P-VALUE.

### P-Value Explained using Null Hypothesis: The Coin is Fair

If a coin is fair, the probability of heads (or tails) coming up is expected to be 50%. To test this claim, multiple experiments are run, each a sample of 10 coin tosses. The null hypothesis is that the coin is fair. The alternate hypothesis is that the coin is unfair. The following represents the test outcomes and the interpretation of when the null hypothesis can be rejected.

| Sample (No. of tosses) | Outcome (No. of Heads) | Interpretation |
|---|---|---|
| 10 | 6 | Given that the null hypothesis holds, there is a high likelihood that the test outcome happened by chance; can’t reject the null hypothesis. |
| 10 | 7 | Given that the null hypothesis holds, the test outcome looks doubtful; it does not look like the outcome happened by chance. However, the evidence is not enough to reject the null hypothesis. |
| 10 | 9 | Given that the null hypothesis holds, with a very high confidence level, it could be stated that the test outcome does not look to have happened by chance. Given that the sample is chosen in a fair and random manner, the null hypothesis, that the coin is fair, can be rejected. The alternate hypothesis is accepted, which implies that the coin is not fair. |
| 10 | 4 | Given that the null hypothesis holds, there is a high likelihood that the test outcome happened by chance; can’t reject the null hypothesis. |
| 10 | 1 | Given that the null hypothesis holds, with a very high confidence level, it could be stated that the test outcome does not look to have happened by chance. Given that the sample is chosen in a fair and random manner, the null hypothesis, that the coin is fair, can be rejected. The alternate hypothesis is accepted, which implies that the coin is not fair. |

In the above example, the tests with 9 and 1 heads in a random sample of 10 tosses are at an extreme level. These outcomes are significant enough to indicate that the results did not happen by chance and that it is incorrect to claim that the coin is fair. In such cases, the p-value turns out to be less than 0.05. Given that the level of significance is set at 0.05, the null hypothesis can be rejected.

### P-Value Explained using Null Hypothesis: The Dice is Fair

If the dice is fair, the probability of getting a 6 on any roll is expected to be 1/6 (about 16.67%). To test this claim, multiple experiments are run, each a sample of 50 rolls of the dice. The null hypothesis is that the dice is fair. The alternate hypothesis is that the dice is unfair. The following represents the test outcomes and the interpretation of when the null hypothesis can be rejected.

| Sample (No. of rolls) | Outcome (No. of 6s) | Interpretation |
|---|---|---|
| 50 | 25 | Given that the null hypothesis holds, about 8 sixes are expected (50 × 1/6 ≈ 8.3). A count of 25 is roughly three times that, so the test outcome does not look to have happened by chance; the null hypothesis, that the dice is fair, can be rejected. |
| 50 | 15 | Given that the null hypothesis holds, the test outcome looks doubtful, since 15 is well above the expected count of about 8. A computed p-value is needed before deciding whether to reject the null hypothesis. |
| 50 | 3 | Given that the null hypothesis holds, the test outcome looks doubtful, since 3 is well below the expected count of about 8. The evidence is borderline by eye, so a computed p-value is needed before deciding whether to reject the null hypothesis. |
| 50 | 38 | Given that the null hypothesis holds, the test outcome does not look to have happened by chance, since 38 is far above the expected count of about 8. Given that the sample is chosen in a fair and random manner, the null hypothesis, that the dice is fair, can be rejected. The alternate hypothesis is accepted, which implies that the dice is not fair. |
| 50 | 47 | With a very high confidence level, it could be stated that the test outcome does not look to have happened by chance. Given that the sample is chosen in a fair and random manner, the null hypothesis, that the dice is fair, can be rejected. The alternate hypothesis is accepted, which implies that the dice is not fair. |

In the above example, a fair dice is expected to show about 8 sixes in 50 rolls (50 × 1/6 ≈ 8.3), so counts far from that value, such as 47, are at an extreme level. Such outcomes are significant enough to indicate that the results did not happen by chance and that it is incorrect to claim that the dice is fair. In such cases, the p-value turns out to be less than 0.05. Given that the level of significance is set at 0.05, the null hypothesis can be rejected.
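The dice rows can be checked in the same spirit with an exact binomial calculation, now with a null probability of 1/6 rather than 1/2. This is a sketch under stated assumptions: rolls are independent, only the count of sixes is tested, and a count is treated as "at least as extreme" when it lies at least as far from the expected count as the observed one. Other two-sided conventions exist and can give slightly different values near the threshold. The helper name `two_sided_binomial_p` is hypothetical.

```python
from math import comb


def two_sided_binomial_p(observed: int, n: int, p_null: float) -> float:
    """Sum the null probabilities of every count at least as far from the
    expected count (n * p_null) as the observed count."""
    expected = n * p_null
    observed_distance = abs(observed - expected)
    total = 0.0
    for k in range(n + 1):
        if abs(k - expected) >= observed_distance - 1e-9:
            total += comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
    return total


N_ROLLS = 50
P_SIX = 1 / 6   # probability of a six if the dice is fair
ALPHA = 0.05    # level of significance used in the text

print(f"expected sixes under H0: {N_ROLLS * P_SIX:.2f}")
for sixes in (25, 15, 3, 38, 47):
    p_value = two_sided_binomial_p(sixes, N_ROLLS, P_SIX)
    verdict = "reject H0" if p_value < ALPHA else "fail to reject H0"
    print(f"{sixes:>2} sixes of {N_ROLLS}: p-value = {p_value:.3g} -> {verdict}")
```

Because the null probability is not 0.5, the distribution of counts is lopsided around the expected 8.3, and counts that look moderate by eye can sit much closer to the threshold than intuition suggests.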

## Ajitesh Kumar

I have been recently working in the area of Data Science and Machine Learning / Deep Learning. In addition, I am also passionate about various different technologies including programming languages such as Java/JEE, Javascript, Python, R, Julia etc and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data etc. I would love to connect with you on Linkedin and Twitter.