In linear regression, the t-test is a statistical hypothesis test used to check whether the relationship between the response variable and a predictor variable is statistically significant. In other words, it is used to determine whether the coefficient of a predictor variable differs from zero by more than sampling noise alone would explain. As data scientists, it is important to understand why the t-statistic is used to assess the coefficients of the linear regression model. In this blog, we will discuss linear regression, the t-test, and related formulas and examples.
What is linear regression?
Linear regression is a statistical technique that models the relationship between a response variable and one or more predictor variables as a linear function. A linear regression equation is also called a linear regression model or a statistical linear model.
The linear regression line can be represented by an equation such as the following:
Y = mX + b
Where Y represents the response variable or dependent variable, X represents the predictor variable or independent variable, m represents the slope of the line and b represents the intercept. The slope, m, is also called the coefficient of the predictor variable. The diagram below represents the linear regression line, dependent (response) and independent (predictor) variables.
Linear regression comes in two types:
- Simple linear regression: Simple linear regression is defined as linear regression with a single predictor variable. An example of a simple linear regression is Y = mX + b.
- Multiple linear regression: Multiple linear regression is defined as linear regression with more than one predictor variable, each with its own coefficient. An example of multiple linear regression is Y = aX + bZ + c, where X and Z are the predictor variables, a and b are their coefficients and c is the intercept.
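As a minimal sketch of the simple case, the slope and intercept of Y = mX + b can be estimated with the least squares method using only the Python standard library. The function name fit_line and the data values are made up for illustration.

```python
# Minimal sketch: fit Y = mX + b by ordinary least squares.
# The data values below are made up for illustration only.

def fit_line(x, y):
    """Return (m, b) that minimize the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sxx: spread of X around its mean; Sxy: co-variation of X and Y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, b = fit_line(x, y)
print(f"slope m = {m:.2f}, intercept b = {b:.2f}")
# prints: slope m = 0.60, intercept b = 2.20
```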
Why is a t-test used in the linear regression model?
The t-test does not measure how linear the relationship is. It tests whether the estimated slope is statistically distinguishable from zero, that is, whether the data provide evidence of a linear association between the predictor and the response. The linear regression model is used to predict the value of a continuous variable based on the value of another continuous variable, and it is often a useful tool for prediction. However, because the coefficients are estimated from a sample, the fitted slope will rarely be exactly zero even when the predictor has no real effect on the response. The t-test statistic shows whether the estimated slope is large relative to its standard error, and therefore whether the predictor contributes to the model.
In a simple linear regression model such as Y = mX + b, the t-test statistic is used to test the following hypotheses:
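One way to see why a significance test on the slope is needed is to fit a line to data in which the response is pure noise. The sketch below does this a few times; the helper name fit_slope, the seed and the noise parameters are arbitrary choices for illustration. The true slope is zero in every run, yet the estimated slopes are small nonzero numbers.

```python
import random

def fit_slope(x, y):
    """Least squares slope of Y on X."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx

random.seed(0)  # fixed seed so the run is repeatable
x = list(range(1, 21))

slopes = []
for _ in range(5):
    # Y is drawn independently of X, so the true slope is 0.
    y = [random.gauss(10.0, 2.0) for _ in x]
    slopes.append(fit_slope(x, y))

for s in slopes:
    print(f"estimated slope: {s:+.3f}")
```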
- H0: m = 0
- Ha: m ≠ 0
The null hypothesis, m = 0, states that there is no linear relationship between the predictor variable and the response variable. If the null hypothesis is true, the regression line is parallel to the X-axis, given that the Y-axis represents the response variable and the X-axis represents the predictor variable. The following diagram represents the null hypothesis:
A t-test is used in linear regression to test the null hypothesis that the slope, or the coefficient of a predictor variable, is equal to zero. It has the same form as a one-sample t-test: an estimate minus its hypothesized value, divided by the standard error of the estimate. The test assumes that the relationship is modeled as a straight line and that the errors are independent and normally distributed with constant variance.
The formula for the t-test statistic in linear regression is as follows:
t = (m – m0) / SE
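t is the t-test statistic
m is the linear slope or the coefficient value obtained using the least squares method
m0 is the hypothesized value of the linear slope or the coefficient of the predictor variable. Under the null hypothesis, m0 = 0.
SE represents the standard error of the slope, which can be estimated using the following formula:
SE = S / √(Σ(Xi – X̄)²)
Where S represents the residual standard error, S = √(Σ(Yi – Ŷi)² / (N – 2)), Ŷi is the value predicted by the fitted line for Xi, X̄ is the mean of the predictor values and N represents the total number of data points. Under the null hypothesis, t follows a t-distribution with N – 2 degrees of freedom, so the null hypothesis is rejected when the absolute value of t exceeds the critical value for the chosen significance level.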
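The calculation can be sketched in a few lines of Python. This sketch uses the usual standard error of a least squares slope, the residual standard error (on N – 2 degrees of freedom) divided by the square root of the sum of squared deviations of X from its mean. The function name slope_t_test and the data are made up for illustration, and the critical value is read from a standard t-table rather than computed.

```python
import math

def slope_t_test(x, y, m0=0.0):
    """Return (m, se, t, df) for testing H0: slope = m0."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx                      # least squares slope
    b = y_mean - m * x_mean            # least squares intercept
    # Sum of squared residuals around the fitted line.
    sse = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                         # two parameters (m, b) were estimated
    s = math.sqrt(sse / df)            # residual standard error
    se = s / math.sqrt(sxx)            # standard error of the slope
    t = (m - m0) / se
    return m, se, t, df

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, se, t, df = slope_t_test(x, y)

# Two-sided 5% critical value for df = 3, taken from a standard t-table.
T_CRITICAL = 3.182
reject_h0 = abs(t) > T_CRITICAL

print(f"m = {m:.3f}, SE = {se:.3f}, t = {t:.3f}, df = {df}")
print("reject H0" if reject_h0 else "fail to reject H0")
# prints:
# m = 0.600, SE = 0.283, t = 2.121, df = 3
# fail to reject H0
```

With only five data points, a slope of 0.6 is not large enough relative to its standard error to reject the null hypothesis at the 5% level.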
Linear regression models the relationship between the response variable and predictor variables as a linear function. It can be used to predict the value of a continuous variable based on the value of another continuous variable. The t-test statistic helps to determine whether the relationship between the response and a predictor variable is statistically significant. A t-test is used in linear regression to test the null hypothesis that the slope or the coefficient is equal to zero. In the case of the multiple regression model, a separate t-test is run for each predictor variable, with the null hypothesis that its coefficient is equal to zero.
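For the multiple regression case, the same idea can be sketched with the standard least squares formulas: the coefficients solve the normal equations, and the standard error of each coefficient is the square root of the residual variance times the matching diagonal entry of the inverse of XᵀX. The sketch below uses only the standard library; the function names, the small matrix inversion helper and the data are illustrative assumptions, and an intercept column is added automatically.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination.
    Assumes the matrix is non-singular (predictors are not collinear)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def coefficient_t_tests(rows, y):
    """rows: one tuple of predictor values per observation.
    Returns ([(coefficient, standard_error, t), ...], df); the last
    entry of the list belongs to the intercept."""
    X = [list(r) + [1.0] for r in rows]          # append intercept column
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    df = n - p                                   # observations minus parameters
    s2 = sse / df                                # residual variance
    results = []
    for j in range(p):
        se = math.sqrt(s2 * inv[j][j])
        results.append((beta[j], se, beta[j] / se))  # t for H0: coefficient = 0
    return results, df

# Made-up data with two predictors, X and Z.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [2.7, 3.4, 6.6, 7.2, 10.8, 11.3]
results, df = coefficient_t_tests(rows, y)
for name, (coef, se, t) in zip(["X", "Z", "intercept"], results):
    print(f"{name}: coefficient = {coef:.3f}, SE = {se:.3f}, t = {t:.3f}")
print(f"degrees of freedom = {df}")
```

Each t value is compared with the critical value of the t-distribution for the printed degrees of freedom, exactly as in the single-predictor case.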