In statistics, the t-test is often used in research when the researcher wants to know whether there is a significant difference between the mean of a sample and the population mean, or between the means of two different groups. Two commonly used types of t-tests are the one-sample t-test and the two-sample t-test. As data scientists, it is important for us to understand the concepts behind the t-test and how to use it in our data analysis. In this blog post, we will focus on the one-sample t-test and explain it with a **formula** and **examples**.

## What is a one-sample T-test?

A one-sample T-test is a statistical hypothesis testing technique in which the mean of a sample is tested against a hypothesized value, e.g., a population mean. It is used to determine whether the difference between the sample mean and the hypothesized value is statistically significant. The test is appropriate when the population standard deviation is unknown and the sample size is small; the test statistic then follows a T-distribution with n – 1 degrees of freedom. By convention, a sample of fewer than 30 observations is considered small. The test statistic is computed as:

**T = (X̄ – μ) / (S/√n)**

Where, **X̄** is the sample mean, **μ** is the hypothesized population mean, **S** is the standard deviation of the sample and **n** is the number of sample observations.
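The formula above can be sketched in a few lines of Python using only the standard library. The function name `one_sample_t` and the `days` data are made up for illustration; `statistics.stdev` is used because it computes the sample standard deviation (dividing by n – 1), which is the S in the formula.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (T, degrees of freedom) for a one-sample t-test.

    sample: list of observations
    mu0:    hypothesized population mean
    """
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)      # sample standard deviation (n - 1 in the denominator)
    standard_error = sample_sd / math.sqrt(n)  # S / sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1


# Hypothetical data, not taken from the article
days = [4, 6, 8, 10, 12]
t_stat, df = one_sample_t(days, mu0=5)
print(f"T = {t_stat:.4f}, degrees of freedom = {df}")
```

The T-value returned here still has to be compared against a critical value or converted to a p-value before any conclusion is drawn.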

When working with the T-test, the T-distribution is used in place of the normal distribution. The t-distribution is a family of curves that are symmetrical about the mean and have heavier tails than the normal distribution. The variability decreases as the degrees of freedom increase, and the curve approaches the standard normal distribution. The t-test statistic (T) follows a t-distribution with n – 1 degrees of freedom, where n is the number of observations in the sample.
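One way to see how the spread of the t-distribution depends on the degrees of freedom is through its variance, which for more than 2 degrees of freedom equals df / (df – 2). The small sketch below (the helper name `t_variance` is made up for illustration) prints that variance for a few values of df; it shrinks toward 1, the variance of the standard normal distribution, as df grows.

```python
def t_variance(df: int) -> float:
    """Variance of a t-distribution with df degrees of freedom (finite only for df > 2)."""
    if df <= 2:
        raise ValueError("the variance is finite only for df > 2")
    return df / (df - 2)


for df in (3, 5, 15, 30, 100):
    print(f"df = {df:>3}: variance = {t_variance(df):.3f}")
```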

## One-sample T-test: Example

Suppose a claim is made that the average number of days a person spends on vacation is more than 5 days (5 days being the hypothesized population mean). To test the claim, a sample of 16 people is taken, and their mean comes out to be 9 days. As a first step, we will formulate the null and alternate hypotheses.

**Null hypothesis, H0**: There is no real difference between the sample mean and the hypothesized population mean of 5 days; what has occurred with the sample is just an instance of chance.

**Alternate hypothesis, Ha**: There is a significant difference between the sample mean and the hypothesized population mean; specifically, the mean number of vacation days is greater than 5.

We will use a one-sample t-test to test this hypothesis. Since we are interested in whether the mean is greater than 5 days, a right-tailed test will be performed.

**T = (X̄ – μ) / (S/√n)**

Where, X̄ is the sample mean, μ is the hypothesized population mean, S is the standard deviation of the sample and n is the number of observations in the sample.

A sample of 16 persons is taken. The mean number of days spent on vacation by the persons in the sample is found to be 9 days, with a sample standard deviation of 3 days.

T = (X̄ – μ) / (S/√n)

= (9 – 5) / (3/√16)

= 5.33

At a level of significance of 0.05 and with 15 degrees of freedom (n – 1 = 16 – 1), the critical T-value for a right-tailed test comes out to be 1.75305. Since the calculated T-value of 5.33 is much larger than the critical value of 1.75305, the null hypothesis can be rejected. Thus, there is a statistically significant difference between the sample mean and the hypothesized population mean. A T-value calculator can be used to find the critical value of T for a given level of significance and degrees of freedom.

Another way to test is to calculate the p-value for the T-statistic of 5.33. A p-value calculator can be used to find the p-value for a given T-value, degrees of freedom and type of test (one-tailed or two-tailed). For a T-statistic of 5.33 with 15 degrees of freedom, the one-tailed p-value comes out to be 0.000042. This means that, if the null hypothesis holds, the probability of observing a T-statistic at least this large is only 0.000042. As this value is less than 0.05, one can reject the null hypothesis given the evidence of the current sample.
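If a calculator is not at hand, the same numbers can be approximated in Python. The sketch below is only one possible approach and uses nothing beyond the standard library: it integrates the t-distribution density numerically with Simpson's rule to get the right-tail p-value, and finds the critical value by bisection. The function names (`t_pdf`, `right_tail_p`, `right_tail_critical`) and the step counts are choices made for this illustration; a statistics library would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def right_tail_p(t_value: float, df: int, steps: int = 2000) -> float:
    """P(T >= t_value) for t_value >= 0, using Simpson's rule on [0, t_value].

    steps must be even. The distribution is symmetric, so the area to the
    right of 0 is 0.5 and the tail is 0.5 minus the area between 0 and t_value.
    """
    h = t_value / steps
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def right_tail_critical(alpha: float, df: int) -> float:
    """Value t such that P(T >= t) is approximately alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if right_tail_p(mid, df) > alpha:
            lo = mid   # tail still too large, move right
        else:
            hi = mid   # tail small enough, move left
    return (lo + hi) / 2


# Numbers from the vacation example: n = 16, sample mean 9, S = 3, hypothesized mean 5
df = 16 - 1
t_stat = (9 - 5) / (3 / math.sqrt(16))
p_value = right_tail_p(t_stat, df)
critical = right_tail_critical(0.05, df)

print(f"T = {t_stat:.2f}")
print(f"critical value (alpha = 0.05, right-tailed) = {critical:.3f}")
print(f"p-value = {p_value:.6f}")
print("reject H0" if t_stat > critical else "fail to reject H0")
```

The printed critical value and p-value should land close to the figures quoted above; small differences in the last digits can come from rounding the T-statistic to 5.33.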

## T-score / T-statistics for Estimating Population Mean

The population mean can be estimated as a function of the t-score using the following equation:

**Population mean** = **Sample mean** ± T × (**Standard error** of the mean)

Where T is the critical value of the T-distribution with n – 1 degrees of freedom for the chosen confidence level. The standard error of the mean (SE) is an estimate of the standard deviation of the sampling distribution of the sample mean. The T-statistic can be used to calculate confidence intervals for population means when the sample size is small and the population standard deviation is unknown. When the population standard deviation is known, we use the Z-statistic and the Z-distribution instead.

The value of the standard error of the mean can be calculated as:

**SE of the mean = S/√n**

Where, S is the standard deviation of the sample and n is the number of observations in the sample.
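As a rough illustration, the interval estimate can be computed for the vacation example (sample mean 9, S = 3, n = 16). The function name `mean_confidence_interval` is made up for this sketch, and the T value of 2.131 is an assumption taken from a standard t-table for a two-sided 95% confidence level with 15 degrees of freedom; a different confidence level would need a different value.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (lower, upper) bounds: sample mean +/- T * (S / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin


# Assumed table value: two-sided 95% confidence, 15 degrees of freedom
T_CRIT_95_DF15 = 2.131

low, high = mean_confidence_interval(sample_mean=9, sample_sd=3, n=16, t_critical=T_CRIT_95_DF15)
print(f"SE = {3 / math.sqrt(16):.2f}")
print(f"95% confidence interval for the population mean: ({low:.2f}, {high:.2f})")
```

Note that the hypothesized mean of 5 days falls outside this interval, which is consistent with rejecting the null hypothesis in the example.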

## Summary

The one-sample t-test is a statistical test that can be used to determine whether there is a significant difference between the sample mean and a hypothesized population mean. The t-test statistic (T) follows a t-distribution with n – 1 degrees of freedom, where n is the number of observations in the sample. The T-distribution can also be used to estimate the population mean, in the form of a confidence interval, when the sample size is small and the population standard deviation is unknown.
