When not to use F-Statistics for Multi-linear Regression

In this post, you will learn about the scenario in which you may NOT want to use the F-statistic to test whether there is a relationship between the response and the predictor variables in a multilinear (multiple linear) regression model. Multilinear regression is a machine learning / statistical learning method used to predict a quantitative response variable and to understand or infer the relationship between the response and multiple predictor variables. We will look into the following topics:

  • Background
  • When not to use F-Statistics for Multilinear Regression Model

Background

The F-statistic is used in hypothesis testing to determine whether there is a relationship between the response and the predictor variables in a multilinear regression model. Let’s consider the following multilinear regression model:

[latex]Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_3 + \dots + \beta_pX_p + \epsilon[/latex]

In the above equation, Y is the response variable, [latex]X_1, \dots, X_p[/latex] are the predictor variables, [latex]\beta_0, \dots, \beta_p[/latex] are the coefficients and [latex]\epsilon[/latex] is the error term.

The null hypothesis can be stated as the following:

[latex]H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0[/latex]

The alternative hypothesis can be stated as the following:

[latex]H_a[/latex]: At least one of the coefficients [latex]\beta_j[/latex] is not equal to zero

The F-statistic is used to decide whether to reject or fail to reject the null hypothesis stated above. The following represents the formula for the F-statistic:

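F Value = [latex]\frac{\frac{(TSS - RSS)}{p}}{\frac{RSS}{N - p - 1}}[/latex]

In the above equation, TSS is the total sum of squares [latex]\sum_{i=1}^{N}(y_i - \bar{y})^2[/latex], RSS is the residual sum of squares [latex]\sum_{i=1}^{N}(y_i - \hat{y}_i)^2[/latex], N is the number of observations and p is the number of predictor variables (the coefficients [latex]\beta_1, \dots, \beta_p[/latex], not counting the intercept [latex]\beta_0[/latex]).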
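The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy is available; the function name `f_statistic` and the synthetic data are made up for this example and are not part of the original post.

```python
import numpy as np

def f_statistic(X, y):
    """Overall F-statistic for a least-squares fit of y on the columns of X.

    X has shape (N, p) and does NOT include the intercept column;
    the intercept is added inside this function.
    """
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])          # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares coefficients
    y_hat = X1 @ beta
    tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
    rss = np.sum((y - y_hat) ** 2)                 # residual sum of squares
    return ((tss - rss) / p) / (rss / (n - p - 1))

# Synthetic data (an assumption for illustration): only X_1 affects the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)

f_value = f_statistic(X, y)
print(f_value)
```

Because one predictor has a real effect on the response in this synthetic data, the resulting F value should be far above 1.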

Based on the above, the value of the F-statistic can be calculated, and the related p-value can then be obtained from the corresponding F-distribution. If the p-value is less than the chosen significance level (commonly 0.05), one can reject the null hypothesis. This essentially means there is evidence that at least one of the predictor variables is related to the response.
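As a rough sketch of the p-value step, assuming SciPy is installed: under the null hypothesis (and the usual assumptions on the error term), the F-statistic follows an F-distribution with p and N - p - 1 degrees of freedom, so the p-value is its upper-tail probability. The helper name `f_test_p_value` and the F value of 4.0 are hypothetical, chosen only for illustration.

```python
from scipy import stats

def f_test_p_value(f_value, n, p):
    """Upper-tail p-value of an F value with (p, n - p - 1) degrees of freedom."""
    return stats.f.sf(f_value, p, n - p - 1)

# Hypothetical numbers: F = 4.0 from a model with 3 predictors and 100 observations.
p_value = f_test_p_value(4.0, n=100, p=3)
reject_null = p_value < 0.05   # compare against the 0.05 significance level
print(p_value, reject_null)
```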

However, can the F-statistic always be used?

When not to use F-Statistics for Multilinear Regression Model

The F-statistic can be used to test for a relationship between the response and the predictor variables in a multilinear regression model when p (the number of predictors) is relatively small compared to N (the number of observations).

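However, when the number of predictors (features) is larger than N (the number of observations), least squares cannot fit the regression model uniquely. In general the fit then reproduces the observed responses exactly, so RSS is zero, and the N - p - 1 degrees of freedom in the denominator of the F-statistic are no longer positive. Thus, the F-statistic cannot be used.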
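The breakdown can be seen on a small synthetic example. This is a sketch under stated assumptions: NumPy is available, the data are random noise invented for illustration, and the variable names are hypothetical. With 15 predictors and only 10 observations, the least-squares fit matches the responses (numerically) exactly even though there is no real relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 15                        # more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)               # pure noise: y is unrelated to X

X1 = np.column_stack([np.ones(n), X])          # add the intercept column
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # one of infinitely many exact fits
rss = np.sum((y - X1 @ beta) ** 2)   # residual sum of squares, numerically zero
df_resid = n - p - 1                 # residual degrees of freedom, here negative

f_defined = df_resid > 0             # the F-statistic needs positive residual df
print(rss, df_resid, f_defined)
```

Since RSS is numerically zero and the residual degrees of freedom are negative, the ratio RSS / (N - p - 1) in the denominator of the F-statistic has no meaningful value here.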

Summary

The F-statistic can be used to test whether there is a relationship between the response and the predictor variables in a multilinear regression model. If the number of predictors (features) is small in comparison to the number of observations, one can use the F-statistic to perform this hypothesis test. However, when the number of predictors is larger than the number of observations, the F-statistic cannot be used, as one won’t be able to fit a multilinear regression model by least squares in the first place.

Ajitesh Kumar

