
In statistics, the t-test is often used in research when the researcher wants to know whether there is a significant difference between the mean of a sample and the mean of the population, or between the means of two different groups. There are two types of t-tests: the one-sample t-test and the two-sample t-test. As data scientists, it is important for us to understand the concepts behind the t-test and how to use it in our data analysis. In this blog post, we will focus on the one-sample t-test and explain it with a formula and examples.
What is a one-sample T-test?
The one-sample T-test is a statistical hypothesis testing technique in which the mean of a sample is tested against a hypothesized value, e.g., a population mean. It is used to determine whether the difference between the sample mean and the hypothesized value is statistically significant. The T-test is used for hypothesis testing of a one-sample mean when the population standard deviation is unknown and the sample size is small. The distribution used is the T-distribution with n – 1 degrees of freedom. A sample of fewer than 30 observations is usually considered a small sample.
T = (X̄ – μ) / (S / √n)
Where, X̄ is the sample mean, μ is the hypothesized population mean, S is the standard deviation of the sample and n is the number of sample observations.
When working with the T-test, the T-distribution is used in place of the normal distribution. The t-distribution is a family of curves that are symmetrical about the mean and have heavier tails than the normal distribution. Their variability decreases as the degrees of freedom increase, so the curve approaches the standard normal distribution for large samples. The t-test statistic (T) follows a t-distribution with n – 1 degrees of freedom, where n is the number of observations in the sample.
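How the spread of the t-distribution depends on the degrees of freedom can be checked numerically. The following is a minimal sketch using only the Python standard library. The function names (t_pdf, normal_pdf, t_variance) are illustrative and not taken from any library, and the variance formula df / (df – 2) applies only for more than 2 degrees of freedom.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def t_variance(df):
    """Variance of the t-distribution; finite only for df > 2."""
    return df / (df - 2)


# Compare the variance and the tail density at x = 3 for several
# degrees of freedom against the standard normal distribution.
for df in (3, 5, 15, 30, 100):
    print(f"df={df:>3}  variance={t_variance(df):.3f}  density at x=3: {t_pdf(3, df):.5f}")
print(f"normal  variance=1.000  density at x=3: {normal_pdf(3):.5f}")
```

Both the variance and the tail density shrink toward the normal values as the degrees of freedom grow, which is why small samples need larger critical values.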
One-sample T-test: Example
Suppose a claim is made that the average number of days a person spends on vacation is more than or equal to 5 days (the hypothesized population mean). A sample of 16 people is taken, and their mean comes out to be 9 days. As a first step, we will formulate the null and alternate hypotheses.
Null hypothesis, H0: There is no difference between the sample mean and the population mean (μ = 5); what has occurred in the sample is just an instance of chance occurrence.
Alternate hypothesis, Ha: There is a significant difference between the sample mean and the population mean (μ ≠ 5).
We will use a one-sample t-test to test this hypothesis. Because the alternate hypothesis allows a difference in either direction, a two-tailed test will be performed.
T = (X̄ – μ) / (S / √n)
Where, X̄ is the sample mean, μ is the hypothesized population mean, S is the standard deviation of the sample and n is the number of observations in the sample.
A sample of 16 persons is taken. The mean number of days spent on vacation by the persons in the sample is found to be 9 days, with a sample standard deviation of 3 days.
T = (X̄ – μ) / (S / √n)
= (9 – 5) / (3 / √16)
= 4 / 0.75
= 5.33
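The same arithmetic can be written as a small helper. This is a minimal sketch using only the Python standard library; the function name one_sample_t_statistic is illustrative and not part of any library.

```python
import math


def one_sample_t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """Return T = (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Values from the vacation example: sample mean 9, hypothesized mean 5,
# sample standard deviation 3, sample size 16.
t_value = one_sample_t_statistic(9, 5, 3, 16)
degrees_of_freedom = 16 - 1
print(f"T = {t_value:.2f} with {degrees_of_freedom} degrees of freedom")
# prints: T = 5.33 with 15 degrees of freedom
```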
At a level of significance of 0.05 and 15 degrees of freedom (n – 1), the critical T-value for a two-tailed test comes out to be 2.131. Since the calculated T-value of 5.33 is much larger than the critical value of 2.131, the null hypothesis can be rejected. Thus, there is a statistically significant difference between the sample mean and the population mean. You can use this T-value calculator to calculate the critical value of T for a given level of significance and degrees of freedom.
Another way to test is to calculate the p-value for the T-statistic of 5.33. You can use this P-value calculator to calculate the p-value for a given T-value, degrees of freedom and type of test (one-tailed or two-tailed). For a T-statistic of 5.33 with 15 degrees of freedom, the one-tailed p-value comes out to be 0.000042, and the two-tailed p-value is twice that, about 0.000084. This means that, if the null hypothesis holds, the probability of observing a T-statistic at least this extreme is only about 0.000084. As this value is less than 0.05, one can reject the null hypothesis given the evidence of the current sample.
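The critical value and the p-value can also be reproduced without an online calculator. The sketch below uses only the Python standard library: it integrates the t-distribution density numerically with Simpson's rule and finds the critical value by bisection. The function names (t_pdf, t_cdf, two_tailed_p_value, t_critical) are illustrative, and the step counts are assumptions chosen for this example rather than tuned settings. Statistical libraries provide equivalent, more precise routines.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    upper = abs(x)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + half_area if x >= 0 else 0.5 - half_area


def two_tailed_p_value(t_value, df):
    """Probability of a T-statistic at least as extreme as t_value, both tails."""
    return 2 * (1 - t_cdf(abs(t_value), df))


def t_critical(alpha, df, two_tailed=True):
    """Critical value found by bisection on the CDF."""
    target = 1 - (alpha / 2 if two_tailed else alpha)
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Vacation example: T = (9 - 5) / (3 / sqrt(16)), 15 degrees of freedom.
t_value = (9 - 5) / (3 / math.sqrt(16))
critical = t_critical(0.05, 15)
p_value = two_tailed_p_value(t_value, 15)
print(f"critical value: {critical:.3f}")
print(f"two-tailed p-value: {p_value:.6f}")
print("reject H0" if abs(t_value) > critical else "fail to reject H0")
```

Comparing the absolute value of T with the critical value and comparing the p-value with the significance level are two views of the same decision, so both lead to the same conclusion.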
T-score / T-statistics for Estimating Population Mean
A confidence interval for the population mean can be estimated as a function of the t-score using the following equation:
Population mean = Sample mean ± T × (Standard error of the mean)
Where T is the critical value of the T-distribution with n – 1 degrees of freedom for the chosen confidence level. The standard error of the mean (SE) is an estimate of the standard deviation of the sampling distribution of the sample mean. The T-statistic can be used to calculate confidence intervals for population means when the sample size is small and the population standard deviation is unknown. When the population standard deviation is known, we use the Z-statistic and the Z-distribution instead of the T-statistic.
The value of the standard error of the mean can be calculated as:
SE of the mean = S/√n
Where, S is the standard deviation of the sample and n is the number of observations in the sample.
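As an illustration, a confidence interval of the form sample mean ± T × SE can be computed for the vacation example. This is a minimal sketch using only the Python standard library; the function name confidence_interval is illustrative, and the critical value 2.131 is the two-tailed value for 15 degrees of freedom at the 0.05 level quoted earlier in this post.

```python
import math


def confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (lower, upper) = sample_mean -/+ t_critical * sample_sd / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin


# Vacation example: sample mean 9, sample standard deviation 3, n = 16.
lower, upper = confidence_interval(9, 3, 16, 2.131)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
# prints: 95% confidence interval: (7.40, 10.60)
```

The hypothesized mean of 5 days lies outside this interval, which agrees with rejecting the null hypothesis in the two-tailed test.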
Summary
The one-sample t-test is a statistical test that can be used to determine whether there is a significant difference between the sample mean and the hypothesized population mean. The t-test statistic (T) follows a t-distribution with n – 1 degrees of freedom, where n is the number of observations in the sample. The T-distribution can also be used to calculate confidence intervals for the population mean when the sample size is small and the population standard deviation is unknown.
Question: In the One-sample T-test example, wouldn’t the hypotheses as stated denote a two-tailed test? Therefore the critical value would be 2.131.
Null hypothesis, H0: There is no difference between the sample mean and the population mean. Thus H0: x-bar = u
Alternate hypothesis, Ha: There is a significant difference between the sample mean and the population mean. Thus Ha: x-bar ≠ u
If the alternate hypothesis, Ha, was stated differently, such as: There is a significant positive difference between the sample mean and the population mean (thus Ha: x-bar > u), denoting a right-hand one-tailed test, then the critical value would be 1.75. [1] I will note that in either case the 5.33 value does exceed the critical value, for both the one-tailed and the two-tailed test.
Thanks, Dave
Source: [1] https://www.nipissingu.ca/sites/default/files/One-tailed-Test-or-Two-tailed-Test.pdf
Hi Dave,
You are correct in pointing out that the hypotheses mentioned in the example denote a two-tailed test, which tests for the possibility of the sample mean being significantly greater or less than the hypothesized population mean.
For a two-tailed test with α = 0.05 and 15 degrees of freedom (n-1), the critical t-value is approximately 2.131. The null hypothesis is rejected if the calculated t-statistic is either less than -2.131 or greater than 2.131. Since 5.33 is greater than 2.131, we can reject the null hypothesis.
Made the appropriate changes.
Thank you
Hello. May I ask, the problem states that the average number of days on vacation is more than or equal to 5, so shouldn’t that mean that µ≥5 is the null hypothesis while the alternative hypothesis is µ<5?
Please answer speedily. God bless and thanks!
As the claim made is that the average number of days spent on vacation is greater than or equal to 5 days, we are talking about establishing a new truth, namely µ≥5. The null hypothesis would rather be µ<5. Read my post on hypothesis testing for more details (https://vitalflux.com/data-science-how-to-formulate-hypothesis-for-hypothesis-testing/)