Data Science – 8 Steps to Multiple Regression Analysis

This article presents a list of steps, and related details, that one would want to follow when performing multiple regression analysis. Please feel free to comment or suggest if I have missed any important points. Also, sorry for the typos.

The following key points are described in this article:

  • 8 Steps to Multiple Regression Analysis
  • Techniques used in Multiple Regression Analysis

 

8 Steps to Multiple Regression Analysis

The following 8 steps could be used to perform multiple regression analysis:

  1. Identify a list of potential variables/features, both independent (predictor) and dependent (response).
  2. Gather data on the variables.
  3. Check the relationship between each predictor variable and the response variable. This could be done using scatterplots and correlations.
  4. Check the relationships among the predictor variables. This could also be done using scatterplots and correlations, and is termed a multicollinearity test.
  5. Analyze the simple linear regression between each predictor variable and the response variable.
  6. Use only the non-redundant predictor variables in the analysis. This is based on checking the multicollinearity between each pair of predictor variables: if two predictors are strongly correlated, one may want to drop one of them.
  7. Analyze one or more models based on some of the following criteria:
    • t-statistic of each parameter: This is used to test the null hypothesis that the parameter’s value is equal to zero.
    • p-value: This is used to test the null hypothesis that there is no relationship between the dependent variable and the given independent variable (i.e., that the parameter is zero). The lower the p-value, the greater the statistical significance of the parameter, which in turn suggests that a relationship exists between the dependent and independent variable.
    • F-value: Tests the overall significance of the model, i.e., the null hypothesis that all of the slope parameters are equal to zero.
    • R2 (R squared) or adjusted R2: Measures how well the regression model fits the data, as the proportion of the variance in the response that is explained by the predictors. Adjusted R2 also penalizes the addition of predictors that do not improve the fit.
  8. Use the best-fitting model to make predictions based on the predictor (independent) variables. The best-fitting model is chosen based on the statistics mentioned above, such as the t-statistic, p-value, R squared, and F-value.
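
Steps 3, 4 and 6 can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the article's own method: `correlation_report` is a hypothetical helper, `threshold=0.8` is an arbitrary example cut-off, the data are synthetic, and the rule for deciding which variable of a correlated pair to drop (keep the one more strongly correlated with the response) is just one simple heuristic.

```python
import numpy as np


def correlation_report(X, y, names, threshold=0.8):
    """Correlate predictors with the response and flag redundant predictor pairs.

    X: (n, k) array of predictors, y: (n,) response, names: k column names.
    threshold: hypothetical |r| cut-off above which two predictors count as redundant.
    """
    # Append y as the last column so one corrcoef call yields every pairwise r.
    r = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    k = X.shape[1]

    # Step 3: correlation of each predictor with the response.
    with_response = {names[j]: r[j, k] for j in range(k)}

    # Step 4: correlations among the predictors (multicollinearity check).
    redundant_pairs = [
        (names[i], names[j], r[i, j])
        for i in range(k)
        for j in range(i + 1, k)
        if abs(r[i, j]) > threshold
    ]

    # Step 6: drop one variable from each redundant pair. Heuristic (an
    # assumption): drop the one that is less correlated with the response.
    dropped = set()
    for a, b, _ in redundant_pairs:
        if a in dropped or b in dropped:
            continue  # this pair is already resolved
        dropped.add(a if abs(with_response[a]) < abs(with_response[b]) else b)
    kept = [name for name in names if name not in dropped]
    return with_response, redundant_pairs, kept


# Synthetic example data: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
y = 2 * x1 + 0.5 * x3 + rng.normal(scale=0.5, size=200)

with_y, pairs, kept = correlation_report(
    np.column_stack([x1, x2, x3]), y, ["x1", "x2", "x3"]
)
print(pairs)  # predictor pairs whose |r| exceeds the threshold
print(kept)   # predictors retained for the regression model
```

Correlation only captures pairwise linear redundancy; scatterplots of the same pairs remain a useful visual check alongside it.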
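
Steps 7 and 8 can likewise be sketched with plain NumPy. This is a minimal ordinary-least-squares sketch, assuming the predictors are held in an (n, k) array and an intercept is wanted; `fit_ols` and `predict` are hypothetical helper names and the data are synthetic. It computes the coefficients, their t-statistics, R2, adjusted R2 and the overall F-statistic using the standard OLS formulas.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Returns the coefficients (intercept first) and the statistics from step 7.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])  # column of ones -> intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)

    resid = y - A @ beta
    sse = resid @ resid                   # residual sum of squares
    sst = ((y - y.mean()) ** 2).sum()     # total sum of squares
    df_resid = n - k - 1

    # Standard errors: square root of the diagonal of sigma^2 * (A^T A)^-1.
    sigma2 = sse / df_resid
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))

    r2 = 1 - sse / sst
    return {
        "coef": beta,
        "t": beta / se,                               # H0: coefficient = 0
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / df_resid,  # penalizes extra predictors
        "f": ((sst - sse) / k) / (sse / df_resid),    # H0: all slopes = 0
        "df_resid": df_resid,
    }


def predict(coef, X_new):
    """Step 8: predict the response for new rows of predictor values."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    return coef[0] + X_new @ coef[1:]


# Synthetic example: y = 1 + 2*x1 - 3*x2 + noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = fit_ols(X, y)
print(model["coef"], model["t"], model["r2"], model["adj_r2"], model["f"])
print(predict(model["coef"], [[0.5, -1.0]]))
```

The p-values mentioned in step 7 follow from these statistics, for example `2 * scipy.stats.t.sf(abs(t), df_resid)` for each coefficient and `scipy.stats.f.sf(f, k, df_resid)` for the F-test, where `k` is the number of predictors. In practice, `statsmodels` (`sm.OLS(y, sm.add_constant(X)).fit().summary()`) reports all of these values in one table.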

 

Techniques used in Multiple Regression Analysis

The following are some of the key techniques that could be used in multiple regression analysis:

  • Scatterplots: Scatterplots could be used to visualize the relationship between two variables.
  • Correlation analysis (also includes the multicollinearity test): Correlation tests could be used to find out whether two variables are correlated or not, for example:
    • Whether the dependent and independent variables are related
    • Whether the independent variables are related to each other. This is termed multicollinearity.

  • Individual/group regressions: These are used to understand whether a relationship exists between the dependent variable and each independent variable (or group of independent variables) when the parameters of all the remaining independent variables are set to 0, i.e., when those variables are left out of the model.
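
Individual and group regressions can be sketched as below, again with NumPy and synthetic data; `individual_regressions` and `group_regression` are hypothetical helper names. Regressing y on one predictor alone is equivalent to forcing the other predictors' parameters to 0, so when predictors are correlated, the individual slope also absorbs the effect of the omitted predictors.

```python
import numpy as np


def individual_regressions(X, y, names):
    """Regress y on each predictor on its own; return slope, intercept and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for j, name in enumerate(names):
        # Degree-1 polyfit returns [slope, intercept].
        slope, intercept = np.polyfit(X[:, j], y, 1)
        r2 = np.corrcoef(X[:, j], y)[0, 1] ** 2  # R^2 of a simple regression is r^2
        results[name] = {"slope": slope, "intercept": intercept, "r2": r2}
    return results


def group_regression(X, y, names, group):
    """Regress y on a chosen subset (group) of predictors; return the coefficients."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = [names.index(g) for g in group]
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return dict(zip(["intercept", *group], beta))


# Synthetic example: x2 is correlated with x1, and y depends on both equally.
rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=500)
y = x1 + x2 + rng.normal(scale=0.1, size=500)
X = np.column_stack([x1, x2])
names = ["x1", "x2"]

individual = individual_regressions(X, y, names)
joint = group_regression(X, y, names, ["x1", "x2"])
print(individual["x1"]["slope"])  # roughly 1.9: also picks up x2's effect
print(joint["x1"], joint["x2"])   # both close to 1
```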

 

Ajitesh Kumar


I have recently been working in the area of data analytics, including Data Science and Machine Learning / Deep Learning. I am also passionate about different technologies, including programming languages such as Java/JEE, JavaScript, Python, R, and Julia, and technologies such as Blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data. I would love to connect with you on LinkedIn. Check out my latest book, First Principles Thinking: Building winning products using first principles thinking.
Posted in Big Data.