
In statistics, a two-sample z-test for means is used to determine if the means of two populations are equal. This test is used when the population standard deviations are known. For data scientists, it is important to understand this test and to be able to conduct it accurately. This blog post provides a detailed explanation of the two-sample z-test for means, along with an example that illustrates how it is used.
What is a two-sample Z-test for means?
The two-sample Z-test for means is a statistical hypothesis testing technique used to determine whether the difference between two population means is statistically significant. It applies when we have a sample from each population and the standard deviations (σ), and therefore the variances, of the two populations are known.
Here are some of the real-world examples where a two-sample z-test for means can be used:
- Comparing the performance of students in two different classes
- Comparing the average salaries of men and women in a company
- Comparing the KPIs of two different teams
- Comparing the performance of employees in two different departments
- Comparing the average IQ scores of two groups of people
- Determining if there is a significant difference in the amount of rainfall between two cities
- Investigating whether the mean daily energy intake of men differs from that of women
Formula: Two-sample Z-test for means
The following is the formula for a two-sample z-test for means:
z = ((x̄1 – x̄2) – (μ1 – μ2)) / √(σ1²/n1 + σ2²/n2)
where:
x̄1 is the mean of the first sample
x̄2 is the mean of the second sample
μ1 is the mean of the first population
μ2 is the mean of the second population
(μ1 – μ2) is hypothesized difference between the population means
σ1 is the standard deviation of the first population
σ2 is the standard deviation of the second population
n1 is the number of the data points in the first sample
n2 is the number of the data points in the second sample
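The quantities defined above combine into a single statistic: the observed difference in sample means, minus the hypothesized difference, divided by the standard error √(σ1²/n1 + σ2²/n2). The following is a minimal Python sketch of that calculation. The function name `two_sample_z` and its parameter names are chosen here for illustration and do not come from any particular library.

```python
import math


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Return the z statistic for a two-sample z-test for means.

    mean1, mean2      -- sample means (x̄1, x̄2)
    sigma1, sigma2    -- known population standard deviations (σ1, σ2)
    n1, n2            -- sample sizes
    hypothesized_diff -- hypothesized value of (μ1 – μ2), 0 by default
    """
    # Standard error of the difference between the two sample means
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Illustrative values only: identical sample means give a z statistic of zero.
print(two_sample_z(10.0, 10.0, 2.0, 3.0, 50, 40))  # 0.0
```

A z statistic far from zero in either direction indicates that the observed difference is large relative to its standard error.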
Example: Two-sample Z-test for Means
In this hypothetical example, a company wants to compare the performance of its call center employees at two centers in different parts of the country, Hyderabad and Bengaluru, in terms of the number of tickets resolved in a day. The company randomly selected 30 employees from the call center in Hyderabad and 30 employees from the call center in Bengaluru, so n1 = n2 = 30. The following data was collected:
Hyderabad: x̄1 = 750, σ1 = 20
Bengaluru: x̄2 = 780, σ2 = 25
The company wants to determine if the performance of the employees in Hyderabad is different from the performance of the employees in the Bengaluru center. To do this, we will use a two-sample z-test for means.
First, we will formulate the null and alternate hypotheses and set the level of significance for the test.
H0: There is no difference between the mean performance of employees at the two call centers (μ1 = μ2).
H1: There is a difference in the mean performance of the employees (μ1 ≠ μ2).
The level of significance is set as 0.05.
Next, we note the sample mean for each call center, along with the population standard deviation, which is assumed to be known.
Hyderabad: x̄1 = 750, σ1 = 20
Bengaluru: x̄2 = 780, σ2 = 25
Next, we state the hypothesized difference between the two population means. In this case, the null hypothesis is that the mean performance in Hyderabad is the same as that in Bengaluru. So, (μ1 – μ2) = 0.
Finally, we will use the formula for two-sample z-test for means to calculate the test statistic.
z = (x̄1 – x̄2) / √(σ1²/n1 + σ2²/n2)
z = (–30) / √(20²/30 + 25²/30)
z = -5.13
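To check the arithmetic, the calculation can be reproduced step by step. This is a small sketch using only Python's standard library; the variable names are illustrative, and the numbers are the ones given in the example above.

```python
import math

# Values from the example (sample means, known population SDs, sample sizes)
x1_bar, sigma1, n1 = 750, 20, 30   # Hyderabad
x2_bar, sigma2, n2 = 780, 25, 30   # Bengaluru

# Term under the square root: 400/30 + 625/30
variance_term = sigma1**2 / n1 + sigma2**2 / n2
standard_error = math.sqrt(variance_term)

# The hypothesized difference (μ1 – μ2) is 0, so it drops out of the numerator
z = (x1_bar - x2_bar) / standard_error

print(round(variance_term, 2), round(standard_error, 2), round(z, 2))
# 34.17 5.85 -5.13
```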
For z = -5.13, the two-tailed p-value is less than 0.00001. You can verify this with any p-value-from-z-score calculator. Because the p-value is far below the significance level of 0.05, the result is statistically significant and we reject the null hypothesis. Hence, the mean performance of Hyderabad’s team is considered to be different from the mean performance of Bengaluru’s team.
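If an online calculator is not at hand, the two-tailed p-value can also be computed directly. The sketch below assumes a two-sided alternative hypothesis (as in H1 above) and uses the identity P(|Z| ≥ |z|) = erfc(|z| / √2) for a standard normal variable. The helper name `two_sided_p_value` is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-tailed p-value for a standard normal test statistic z."""
    # P(|Z| >= |z|) expressed through the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


alpha = 0.05          # level of significance from the example
z = -5.13             # test statistic from the example
p_value = two_sided_p_value(z)

# Reject the null hypothesis when the p-value is below the significance level
reject_null = p_value < alpha
print(p_value < 0.00001, reject_null)  # True True
```

Comparing the p-value with the significance level, rather than the z statistic with it, keeps the decision rule consistent with the conclusion drawn in the example.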
Summary
When a sample is taken from each of two populations whose standard deviations are known, the two-sample z-test for means is used to determine whether the difference between the two population means is statistically significant. The null hypothesis (H0) states that the two population means are equal, or differ by a hypothesized amount, and the alternate hypothesis (H1) states otherwise. The z statistic is computed from the sample means, the known standard deviations, and the sample sizes, and the null hypothesis is rejected when the resulting p-value is smaller than the chosen level of significance.
Hello sir,
In this case why is standard error not considered as pooled one i.e sqrt((n1*σ^2 + n2*σ^2)/(n1 + n2 – 2))