In statistics, the two-sample t-test for independent samples is a type of hypothesis test that can be used to determine whether the means of two populations are statistically different, given that the two samples are independent and drawn from normally distributed populations. As a data scientist, you need to understand how to use the two-sample t-test for independent samples so that you can analyze your data correctly. In this blog post, we will discuss the **two-sample t-test** for **independent samples** in detail, including the formulas and examples.


## What is a two-sample T-test?

A two-sample T-test is a statistical hypothesis testing technique in which **two independent samples** are compared to determine whether the means of two populations are statistically different. The two-sample T-test is used when the standard deviations of the populations being compared are unknown and the sample size is small. A sample size of 30 or less is generally considered small. That said, sample size is not a strict condition for using a T-test. The two-sample T-test assumes that the **two samples are independent** and come from **normally distributed** populations. To use the two-sample T-test described in this blog post, you need two **independent samples**. Independent samples means that the *two samples cannot come from the same group of people, and they cannot be related in any way*. A related test, the paired t-test, can be used for pairwise comparisons when the “two” samples represent the same items measured under different conditions. The paired t-test will be covered in a separate blog post.

Let’s say you want to know whether two different brands of batteries have the same average life. Testing every battery each brand produces is impractical, and testing just one battery from each brand would tell you very little, because individual batteries vary. Instead, you can take a random sample of batteries from each brand, use them until they die, and record their life spans. A two-sample T-test then tells you whether the difference between the two sample averages is large enough to conclude that the brands’ true average lives differ, or whether it could plausibly be due to chance.

The following are a few real-life examples where two-sample T-test for independent samples can be used:

- Comparing the average test scores of two classes from two different schools
- Comparing the average weights of two different or independent groups of people
- Determining whether a medication has the same efficacy in two different or independent groups of people
- Comparing the effect of a vaccination on two different groups

### T-statistics when population variances or standard deviations are unequal

The formula for the T-statistic depends on whether the populations’ standard deviations are equal or different. When __the standard deviations of the populations are not equal__, the T-statistic is calculated as follows (this version is often called Welch’s t-test):

$$
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$

Where **X̄1** is the mean of the first sample, **X̄2** is the mean of the second sample, **μ1** is the mean of the first population, **μ2** is the mean of the second population, **s1** is the standard deviation of the first sample, **s2** is the standard deviation of the second sample, **n1** is the size of the first sample, and **n2** is the size of the second sample. Under the null hypothesis, μ1 − μ2 is usually set to 0.

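Because the two variances are not pooled, the degrees of freedom are not simply n1 + n2 – 2. They are approximated with the Welch–Satterthwaite equation:

$$
df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}
$$

The result is usually not a whole number. A more conservative shortcut is to use the smaller of n1 – 1 and n2 – 1.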
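The unequal-variance (Welch) t-statistic and its Welch–Satterthwaite degrees of freedom can be computed from summary statistics alone. Below is a minimal sketch that uses only the Python standard library. The function name `welch_t_test` and its parameters are hypothetical and do not come from any library. If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` runs the same unequal-variance test and also returns a p-value.

```python
import math


def welch_t_test(mean1, sd1, n1, mean2, sd2, n2, mu_diff=0.0):
    """Hypothetical helper: Welch's t-statistic and Welch-Satterthwaite df.

    Inputs are summary statistics: sample means, sample standard deviations,
    and sample sizes. mu_diff is the hypothesized mu1 - mu2 (0 under H0).
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the first sample mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second sample mean
    se = math.sqrt(v1 + v2)  # standard error of the difference in means
    t = ((mean1 - mean2) - mu_diff) / se
    # Welch-Satterthwaite approximation; usually not an integer
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

The function returns both values so that the caller can compare `t` against a critical value for `df` taken from a t-table.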

A confidence interval for the difference between two means gives a range of values that is likely to contain the true difference between the two population means. It is calculated with the following formula:

$$
(\bar{X}_1 - \bar{X}_2) \pm t^{*} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
$$

Here t* is the critical value of the T-distribution for the chosen confidence level and degrees of freedom (for example, the two-tailed value at α = 0.05 for a 95% interval). The square-root term is the standard error of the difference in sample means.
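The interval can be sketched in a few lines of Python. This sketch assumes that you supply the critical value `t_crit` yourself, for example from a t-table. With SciPy, `scipy.stats.t.ppf(0.975, df)` gives the two-tailed 95% value. The function name `diff_confidence_interval` is hypothetical.

```python
import math


def diff_confidence_interval(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Hypothetical helper: confidence interval for mu1 - mu2.

    Uses the unpooled standard error (the square-root term). t_crit is the
    two-tailed critical t value for the chosen confidence level and df.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = mean1 - mean2
    margin = t_crit * se  # half-width of the interval
    return diff - margin, diff + margin
```

For the call center data in the example later in this post (difference of 13 seconds, standard error of about 5.59) and `t_crit = 2.03`, this gives roughly 1.65 to 24.35 seconds. The interval does not contain 0, which is consistent with rejecting the null hypothesis at α = 0.05.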

### T-statistics when population variances or standard deviations are equal

When __the two populations’ standard deviations are equal__, the **pooled t-statistic** is used. It replaces the two sample standard deviations with a single **pooled standard deviation**. The formula for the **pooled t-statistic** is:

$$
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
$$

In the above formula, **Sp** is the pooled standard deviation. It is the square root of the pooled variance, which is calculated as follows:

$$
S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
$$

The degrees of freedom are the sum of the two sample sizes minus two.

Degrees of freedom, **df** = **n1 + n2 – 2**
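The pooled version can be sketched the same way. This is a minimal standard-library illustration of the pooled-variance formula. The function name `pooled_t_test` is hypothetical. With SciPy, `scipy.stats.ttest_ind_from_stats(..., equal_var=True)` performs the equivalent test.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2, mu_diff=0.0):
    """Hypothetical helper: pooled (equal-variance) two-sample t-statistic."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    sp = math.sqrt(sp2)  # pooled standard deviation Sp
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    t = ((mean1 - mean2) - mu_diff) / se
    return t, df
```

When the two sample sizes are equal, the pooled and unpooled standard errors are identical. In that case, the pooled and Welch tests give the same t-statistic and differ only in their degrees of freedom.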

## When to use a two-sample T-test instead of a two-sample Z-test?

When the population standard deviations are known and the sample sizes are large, a two-sample Z-test is used to compare the two populations. A sample size greater than 30 is commonly considered large. Otherwise, when the population standard deviations are unknown and must be estimated from small samples, the two-sample T-test is used, and the test statistic is compared against a T-distribution with the appropriate degrees of freedom.
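For contrast, a two-sample Z-test with known population standard deviations can be sketched using Python’s `statistics.NormalDist`. The only change from the T-test is the reference distribution: the statistic is compared against the standard normal instead of a T-distribution. The function name and the input values in the usage line are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def two_sample_z_test(mean1, sigma1, n1, mean2, sigma2, n2):
    """Hypothetical helper: two-sample Z-test with known population SDs.

    Returns the z-statistic and the two-tailed p-value from the standard normal.
    """
    se = (sigma1 ** 2 / n1 + sigma2 ** 2 / n2) ** 0.5
    z = (mean1 - mean2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Illustrative inputs only: sigma1^2/n1 = 1 and sigma2^2/n2 = 1
z, p = two_sample_z_test(10, 3, 9, 8, 4, 16)
```

Here z is about 1.414 and the two-tailed p-value is about 0.157, so this hypothetical difference would not be significant at α = 0.05.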

## Two-sample T-test: Examples

Let’s say we need to compare the performance of two call centers in terms of average call length and find out whether the difference is statistically significant or merely due to chance. To start with, we need to formulate the null and alternate hypotheses.

**Null hypothesis, H0**: There is no difference between the average call lengths of the two call centers.

**Alternate hypothesis, Ha**: There is a difference between the average call lengths of the two call centers, and hence in their performance.

We randomly select 20 calls from each call center and calculate the average call lengths. The two call centers seem to have different average call lengths. Is this difference statistically significant?

First, we need to calculate the two sample means and standard deviations:

Call Center A: Sample mean, **X̄1** = 122 seconds, SD, **s1** = 15 seconds, **n1** = 20

Call Center B: Sample mean, **X̄2** = 135 seconds, SD, **s2** = 20 seconds, **n2** = 20

Next, we use a two-sample t-test to determine if the difference between two sample means is statistically significant. We will use a 95% confidence level and α = 0.05.

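The two-sample t-statistic is calculated below. It assumes that the population standard deviations are not equal and that, under the null hypothesis, the population means are equal (μ1 − μ2 = 0). Call Center A’s mean is subtracted from Call Center B’s so that t comes out positive; for a two-tailed test, only the magnitude of t matters.

$$
t = \frac{(135 - 122) - 0}{\sqrt{\frac{20^2}{20} + \frac{15^2}{20}}} = \frac{13}{\sqrt{20 + 11.25}} = \frac{13}{\sqrt{31.25}}
$$

**t ≈ 2.3255**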

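Because the population standard deviations are assumed to be unequal, the degrees of freedom are calculated with the Welch–Satterthwaite approximation rather than as n1 + n2 – 2:

$$
df = \frac{\left(\frac{15^2}{20} + \frac{20^2}{20}\right)^2}{\frac{(15^2/20)^2}{20 - 1} + \frac{(20^2/20)^2}{20 - 1}} = \frac{31.25^2}{\frac{11.25^2 + 20^2}{19}} \approx 35.24
$$

The pooled-variance count, df = n1 + n2 – 2 = 20 + 20 – 2 = 38, applies only when the population standard deviations are assumed to be equal.

The critical value of a two-tailed T-test at a 0.05 significance level is about **2.03** for roughly 35 degrees of freedom (it is **2.0244** for 38 degrees of freedom). The computed t-value of about 2.33 exceeds the critical value in either case, so we reject the null hypothesis that there is no difference in average call length between the two call centers. Based on the given evidence, the data support the alternate hypothesis.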
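The arithmetic in this example can be checked with a short standard-library script. It uses the Welch–Satterthwaite degrees of freedom, which match the unequal-variance assumption stated in the example. The critical value of 2.03 is hard-coded from a t-table, not computed. With SciPy installed, `scipy.stats.t.ppf(0.975, df)` would compute it, and `scipy.stats.ttest_ind_from_stats(135, 20, 20, 122, 15, 20, equal_var=False)` would run the whole test.

```python
import math

# Summary statistics from the call center example (seconds)
mean_a, sd_a, n_a = 122, 15, 20  # Call Center A
mean_b, sd_b, n_b = 135, 20, 20  # Call Center B

var_a = sd_a ** 2 / n_a  # 11.25
var_b = sd_b ** 2 / n_b  # 20.0
se = math.sqrt(var_a + var_b)  # standard error of the difference

# Welch t-statistic under H0 (mu_b - mu_a = 0)
t_stat = (mean_b - mean_a) / se

# Welch-Satterthwaite degrees of freedom
df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))

# Two-tailed critical value at alpha = 0.05, looked up in a t-table
# (assumed: about 2.03 for roughly 35 degrees of freedom)
t_crit = 2.03

reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.4f}, df = {df:.2f}, reject H0: {reject_h0}")
# prints: t = 2.3255, df = 35.24, reject H0: True
```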

## Summary

The two-sample t-test for independent samples is a statistical method for comparing the means of two different populations. It is used when the population standard deviations are not known and the samples are small (30 or fewer observations is a common rule of thumb). Different formulas are used depending on whether the variances of the two populations are assumed to be equal. For the pooled (equal-variance) test, the degrees of freedom are df = n1 + n2 – 2. If the absolute value of the T-statistic exceeds the critical T value at the chosen alpha level, you can reject the null hypothesis (H0) that there is no difference between the two population means. Otherwise, you cannot reject H0. In that case, the data do not provide enough evidence of a difference, and the observed difference could be due to chance alone. Failing to reject H0 does not prove that the two means are equal.

