Data Science – Hypothesis Testing & Type I and Type II Errors

This article describes the Type I and Type II errors that can be made during hypothesis testing, using two examples: a house on fire and a Swine Flu diagnosis. It is key to understand Type I and Type II errors because these concepts show up when evaluating the hypothesis function of machine learning algorithms such as linear regression and logistic regression. For example, in a linear regression model, the p-value of each coefficient is compared with the significance level (often set to 0.05, which represents the probability of making a Type I error), and the null hypothesis that the coefficient is equal to zero is either rejected or not rejected. You may want to read my earlier article on how to formulate hypotheses for hypothesis testing as a precursor to the concepts covered here. Please feel free to comment or suggest if I have missed one or more important points.

Following are the key points described later in this article:

  • What is Type I Error?
  • What is Type II Error?
What is Type I Error?
When doing hypothesis testing, one may incorrectly reject the Null Hypothesis when in reality it holds true. This is called a Type I error. The significance level (alpha) of a hypothesis test is defined as the probability of making a Type I error, that is, the probability of rejecting the Null Hypothesis when it actually holds true. Hypothesis tests are constructed by assigning the significance level a fairly small value. Common values for alpha are 0.10, 0.05 and 0.01, with 0.05 being the most widely used. Thus, if the significance level is set to 0.05, one accepts a 5% chance of incorrectly rejecting the Null Hypothesis when it is true.
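The link between alpha and the Type I error rate can be checked with a small simulation. The sketch below is only an illustration under assumed conditions: normally distributed data with known standard deviation 1, a two-sided z-test, and a sample size of 30. The function names are made up for this example. Because the data are generated with the Null Hypothesis being true, every rejection is a Type I error, and the fraction of rejections should come out close to alpha.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_i_error_rate(alpha=0.05, n=30, trials=20_000, seed=0):
    """Fraction of trials that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean 0, so H0 (mean == 0) really holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1  # a false alarm, i.e. a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

Re-running with a different alpha (for example `type_i_error_rate(alpha=0.01)`) should move the estimated rate accordingly.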

Hypothesis Test: Whether House is on Fire?
Let's take the example of smoke coming out of a house. There are two possibilities: either the smoke is due to some food being cooked, or the house is on fire. Let's state the Null Hypothesis, H0, as: the house is not on fire and the smoke is due to some food being cooked. The alternate hypothesis, Ha, is then that the house is on fire.
A person passing by the house thought that the house was on fire and called the fire fighters. However, after arriving at the spot, the fire fighters found that the smoke was actually due to food being cooked.
Let's analyse the above from a statistical perspective to understand the Type I error. The passerby rejected the Null Hypothesis that the house is not on fire and called the fire fighters, informing them that the house was burning. In reality, the smoke was due to food being cooked and the house was not on fire. Thus, the person incorrectly rejected the Null Hypothesis, or in other words, raised a false alarm. Cases like these are also termed false positives. The person made a Type I error. Note that in scenarios like this one, it is better to raise a false alarm than to ignore the smoke and end up paying a huge price by making a Type II error.

Hypothesis Test: Whether Swine Flu is diagnosed?
The symptoms of Swine Flu are similar to those of ordinary flu, so it is often diagnosed late, which can prove fatal. This is a classic example of how a Type II error could prove to be fatal.
Let's state the Null Hypothesis as: a person who only has a runny nose is not suffering from Swine Flu (in other words, he is healthy, and a diagnosis of Swine Flu would be negative). The alternate hypothesis is that the person is suffering from Swine Flu, so the diagnosis would come out positive. When a person with a runny nose decides to take the Swine Flu (popularly called H1N1) diagnostic test to make sure that he isn't suffering from Swine Flu, he rejects the Null Hypothesis that he is healthy. If the test result shows that the person does not have Swine Flu, the person raised a "false alarm". He made a Type I error by incorrectly rejecting the Null Hypothesis. Note that in cases like these, where health is concerned, it is better to raise a false alarm or commit a Type I error just to make sure.
What is Type II Error?
When doing hypothesis testing, one may fail to reject the Null Hypothesis when it should actually have been rejected. This error is termed a Type II error. The probability of making a Type II error is denoted by beta.
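Beta can be estimated by simulation in the same way as alpha. The sketch below is a rough illustration, not a general recipe: it assumes normally distributed data with known standard deviation 1, a two-sided z-test of the Null Hypothesis "mean equals 0", and a true mean of 0.5 chosen arbitrarily for the example. Because the Null Hypothesis is false by construction, every failure to reject it is a Type II error.

```python
import random
from statistics import NormalDist, mean


def rejects_h0(sample, mu0, sigma, alpha):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def simulated_beta(true_mean=0.5, alpha=0.05, n=30, trials=20_000, seed=1):
    """Fraction of trials that fail to reject H0 even though H0 is false."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Data are generated with a non-zero mean, so H0 (mean == 0) is false.
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if not rejects_h0(sample, mu0=0.0, sigma=1.0, alpha=alpha):
            misses += 1  # a missed detection, i.e. a Type II error
    return misses / trials


beta = simulated_beta()
print(f"Estimated Type II error rate (beta): {beta:.3f}")
```

Unlike alpha, beta is not chosen directly: in this setup it depends on how far the true mean is from the hypothesised one, on the sample size, and on alpha.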
House On Fire
In the above example, the Null Hypothesis is the statement that the house is not on fire and the smoke is due to food being cooked, and the alternate hypothesis is the statement that the smoke is there because the house is on fire. Now let's say the passerby ignored the smoke from the house, thinking that it was due to food being cooked. A few hours later, it was found that the house had actually burnt down. In the statistical sense, the passerby failed to reject the Null Hypothesis that the house is not on fire, when the alternate hypothesis that the house is on fire was actually true. Type II errors like this one are also termed false negatives.
Swine Flu

In the Swine Flu example, suppose the person having breathing problems fails to reject the Null Hypothesis that he is healthy and does not go for the H1N1 diagnostic test, when he should actually have rejected it. This may prove fatal if the person is actually suffering from Swine Flu.

A Type II error can turn out to be fatal and expensive.
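When a Type II error is the more costly mistake, one option is to accept more false alarms: for a fixed sample size, a larger alpha gives a smaller beta. The sketch below computes beta from the normal distribution for a two-sided z-test. The true effect of 0.5, the standard deviation of 1 and the sample size of 30 are assumptions picked purely for illustration, and `beta_for_alpha` is a hypothetical helper name.

```python
from statistics import NormalDist


def beta_for_alpha(alpha, effect=0.5, n=30):
    """Type II error probability of a two-sided z-test with sigma == 1.

    `effect` is the assumed true distance between the real mean and the
    mean stated in H0.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # rejection threshold for |z|
    shift = effect * n ** 0.5                   # mean of z when H0 is false
    # H0 is (wrongly) kept when the z statistic lands inside [-z_crit, z_crit].
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f} -> beta = {beta_for_alpha(alpha):.3f}")
```

The printed values decrease as alpha increases, which is the statistical version of preferring a false alarm over a missed fire.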

 

Type I Error & Type II Error Explained with Diagram
Figure: Type I and Type II Errors

Given the diagram above, one can observe the following scenarios:

  • Type I Error: When one rejects the Null Hypothesis (H0) given that H0 is true, one commits a Type I error. It can also be termed a false positive.
  • Type II Error: When one fails to reject the Null Hypothesis when it is actually false, one commits a Type II error. It can also be termed a false negative.
  • Correct decision: When one rejects the Null Hypothesis when it is false, or fails to reject the Null Hypothesis when it is true, one makes the correct decision.
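The scenarios listed above amount to a small decision table, which can be written as a function. This is a minimal sketch; the function name and the returned labels are chosen for this example only.

```python
def classify_decision(h0_is_true: bool, rejected_h0: bool) -> str:
    """Label the outcome of a hypothesis test given the (unknown) truth."""
    if h0_is_true and rejected_h0:
        return "Type I error (false positive)"
    if not h0_is_true and not rejected_h0:
        return "Type II error (false negative)"
    return "Correct decision"


# House example: H0 = "the house is not on fire".
# Passerby calls the fire fighters, but it was only cooking smoke.
print(classify_decision(h0_is_true=True, rejected_h0=True))
# Passerby ignores the smoke, but the house really was on fire.
print(classify_decision(h0_is_true=False, rejected_h0=False))
# Passerby calls the fire fighters and the house really was on fire.
print(classify_decision(h0_is_true=False, rejected_h0=True))
```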
Ajitesh Kumar

Ajitesh has been recently working in the area of AI and machine learning. Currently, his research area includes Safe & Quality AI. In addition, he is also passionate about various different technologies including programming languages such as Java/JEE, Javascript and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, big data etc.

He has also authored the book, Building Web Apps with Spring 5 and Angular.