Fixed vs Random vs Mixed Effects Models – Examples

In this post, you will learn about fixed and random effects models, along with when to use a fixed effects model and when to go for a fixed + random effects (mixed) model. The concepts are explained with examples. As a data scientist, you need a good understanding of these concepts, as it will help you build better linear models such as linear mixed models (LMM) or generalized linear mixed models (GLMM).

The following are some of the topics covered in this post:

  • What are fixed, random & mixed effects models?
  • When to use fixed effects vs mixed effects models?

What are fixed, random & mixed effects models?

First, we will take a real-world example to understand fixed and random effects.

Let’s create a model for understanding patients’ response to a Covid-19 vaccine administered to many patients across different countries. As I write this post, several companies are contending that their Covid-19 vaccine is the most effective in terms of percentage effectiveness. For example, Pfizer is claiming 95% effectiveness for its vaccine. Moderna is also claiming 95% effectiveness. The Oxford-AstraZeneca vaccine is claimed to be 90% effective. These percentages must have been determined using some kind of model that estimates patients’ response to the vaccine. This can be a fixed effects model or a mixed model combining fixed and random effects.

Mixed effect model = Fixed effect + Random effect

What is a Fixed Effects Model?

Let’s understand how the patients’ response can be estimated using either a fixed effects model or a mixed model that combines fixed and random effects. In the example below, the patients’ response to the vaccine is modelled as the probability of a vaccinated person falling sick with Covid-19. While creating the model, we may need to consider the effect of some of the following features:

  • Age-group of the person (Below 18, 18-30, 30-50, 50-70, 70-90)
  • Gender of the person (Female, Male)
  • Whether the person has prior health problems such as hypertension (blood pressure) or diabetes (sugar)
  • Country of the person

When training a linear model with fixed effects for the above features, the model will look like the following:

\(\log(\frac{P}{1-P}) = \beta_0 + \beta_{\text{age-group}} \cdot \text{AgeGroup} + \beta_{\text{gender}} \cdot \text{Gender} + \beta_{\text{bp}} \cdot \text{BloodPressure} + \beta_{\text{db}} \cdot \text{Diabetic} + \beta_{\text{country}} \cdot \text{Country}\)

\(\log(\frac{P}{1-P}) = \beta_0 + \text{fixed effects}\)

Note that all the features in the above model have pre-determined categories, and the inferences (patients’ response) are made only for the categories of the features used to train the model. This is why it is called a fixed effects model. The features used for training have only fixed / pre-determined categories, and the patients’ response is estimated from the effects of these fixed categories. For example, the feature related to hypertension can only have two levels / categories: either the person has hypertension or he/she does not. Even if the experiment is repeated multiple times, the hypertension feature will have the same two categories every time. Thus, the hypertension feature is said to have a fixed effect and can be part of a fixed effects model. The fixed effects model can then be used to estimate the patients’ response based on the features having fixed effects.
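To make the formula above concrete, here is a minimal sketch of how a fitted fixed effects model would turn a person’s categories into a probability. The coefficient values, the country names and the helper names `log_odds` and `probability` are all made up for illustration; they are not estimates from any real vaccine trial.

```python
import math

# Hypothetical coefficients on the log-odds scale, invented for illustration only.
# Each categorical feature has a reference level whose coefficient is 0.
COEFFICIENTS = {
    "intercept": -3.0,
    "age_group": {"Below 18": 0.0, "18-30": 0.1, "30-50": 0.3, "50-70": 0.6, "70-90": 0.9},
    "gender": {"Female": 0.0, "Male": 0.2},
    "hypertension": {False: 0.0, True: 0.4},
    "diabetic": {False: 0.0, True: 0.5},
    "country": {"India": 0.0, "USA": 0.1, "UK": -0.1},
}


def log_odds(person, coefs=COEFFICIENTS):
    """Add the intercept and the coefficient of the person's level for each feature."""
    total = coefs["intercept"]
    for feature, level in person.items():
        # A level that was not part of the fitted model has no coefficient (KeyError).
        total += coefs[feature][level]
    return total


def probability(person, coefs=COEFFICIENTS):
    """Invert the logit: P = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(person, coefs)))


person = {
    "age_group": "50-70",
    "gender": "Male",
    "hypertension": True,
    "diabetic": False,
    "country": "UK",
}
print("log-odds:", round(log_odds(person), 3))
print("probability of falling sick:", round(probability(person), 4))
```

Every level the model can score has to appear in the coefficient table, which is exactly what makes the effects “fixed”.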

What is a Random Effects Model?

One of the factors / features used in the fixed effects model is country. Is it appropriate to treat the country predictor variable as a fixed effect? There may be factors related to country / region that result in different patients’ responses to the vaccine, and not all countries are included in the study. If the experiment were performed again, it could include other countries that were left out of the first experiment simply because the vaccine was not tested there. Essentially, we are working with only a sample of countries out of all countries. Treating country as a random effect allows us to incorporate the variability in the country effect that arises from picking a set of K countries out of all the countries, that is, only the limited number of countries where the test was performed.

The general idea is that the list of countries used for modeling is not fixed but was selected from a larger set of countries. More countries could have been included had the vaccine been tested there as well, and this could have resulted in different patients’ responses to the vaccine. Treating country as a random effect incorporates this type of variability into the model, which we would not get from treating country as a fixed effect. Thus, the model would look like the following, where fixed effects are considered for age, gender etc. and a random effect is considered for country.

\(\log(\frac{P}{1-P}) = \beta_0 + \text{fixed effects} + \text{random effect}\)
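
Written out under the common random-intercept assumption, the model for patient i in country j could be sketched as follows. The indices i and j and the symbols \(u_j\) and \(\sigma_u\) are notation introduced here for illustration:

\(\log(\frac{P_{ij}}{1-P_{ij}}) = \beta_0 + \text{fixed effects}_{ij} + u_j, \quad u_j \sim N(0, \sigma_u^2)\)

Here \(u_j\) is the shift in log-odds that is specific to country j, and \(\sigma_u^2\) is the between-country variance.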

For a random effect, what is estimated is the variance of the effect across levels (here, how much the response varies between countries) rather than a separate coefficient for each level. The above model is called a mixed effects model. If a model has only random effects and no fixed effects, it is termed a random effects model.
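
The idea that a random effect is summarised by a variance can be illustrated with a small simulation. This is only a sketch: it assumes that country effects are drawn from a normal distribution with mean 0 and standard deviation `SIGMA_U` (a hypothetical value on the log-odds scale), and it does not fit a model. Fitting a real mixed model would need a dedicated statistics library.

```python
import random
import statistics


def draw_country_effects(n_countries, sigma_u, seed=0):
    """Draw one random intercept per country from Normal(0, sigma_u^2)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_u) for _ in range(n_countries)]


SIGMA_U = 0.5  # hypothetical between-country standard deviation (log-odds scale)

few = draw_country_effects(5, SIGMA_U)
many = draw_country_effects(5000, SIGMA_U)

# The individual draws change with every sample of countries, but their spread
# is governed by the single parameter sigma_u, which is what the model estimates.
print("SD of 5 sampled country effects:   ", round(statistics.stdev(few), 3))
print("SD of 5000 sampled country effects:", round(statistics.stdev(many), 3))
```

With only a handful of countries the spread is estimated imprecisely, which is one reason a random effect is usually recommended only when a reasonable number of levels has been sampled.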

When to go for Fixed Effects Model & Mixed Models?

When the features / factors used in training the model have fixed levels / categories (such as gender, age group etc.), the apt model is a fixed effects model. However, if one or more features / factors have only a limited set of levels / categories represented in the training data, and the model outcome is supposed to apply to all other levels / categories as well, a random effects or mixed effects model is more appropriate.

The most fundamental difference between fixed and random effects models is that of inference / prediction. A fixed-effects model supports prediction only about the levels / categories of the features used for training. A random-effects model, by contrast, allows us to predict something about the population from which the sample is drawn, including categories / levels of the features / factors that were not present in the sample. If the estimated variance between the sampled levels is large enough, it can fairly be concluded that the population will exhibit that effect as well.
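
The difference in prediction can be sketched as follows. All numbers, country names and function names here are hypothetical. For simplicity both models share one intercept, although in practice the two fits would give different intercepts.

```python
import math

# Hypothetical fitted values on the log-odds scale, invented for illustration.
INTERCEPT = -3.0
FIXED_COUNTRY_COEF = {"India": 0.0, "USA": 0.1, "UK": -0.1}           # fixed effects model
PREDICTED_COUNTRY_EFFECT = {"India": 0.05, "USA": 0.08, "UK": -0.12}  # mixed model


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_fixed(country):
    """Fixed effects prediction: only defined for countries seen in training."""
    if country not in FIXED_COUNTRY_COEF:
        raise ValueError(f"no coefficient was estimated for {country!r}")
    return inv_logit(INTERCEPT + FIXED_COUNTRY_COEF[country])


def predict_mixed(country):
    """Mixed model prediction: an unseen country gets the population-average
    random effect, which is 0 when effects are modelled as Normal(0, sigma_u^2)."""
    return inv_logit(INTERCEPT + PREDICTED_COUNTRY_EFFECT.get(country, 0.0))


print("mixed model, unseen country:", round(predict_mixed("Brazil"), 4))
try:
    predict_fixed("Brazil")
except ValueError as err:
    print("fixed effects model:", err)
```

The mixed model prediction for an unseen country is less certain than for a sampled one, because the unknown country effect adds the between-country variance to the uncertainty.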

The bottom line is that if a fixed effects model is fitted on a random sample of levels, one can’t use that model to make predictions / inferences about data outside the sample data set. A fixed effects model allows the individual-specific effects to be correlated with the independent variables. A random effects model allows inference about the population, based on the assumption that the random effects are normally distributed, and it assumes that the individual-specific effects are uncorrelated with the independent variables.

Conclusions

Here is the summary of what you learned about fixed and random effects models:

  • A fixed-effects model supports prediction only about the levels / categories of the features used for training.
  • If a fixed effects model is fitted on a random sample of levels, one can’t use that model to make predictions / inferences about data outside the sample data set.
  • A random-effects model, by contrast, allows us to predict something about the population from which the sample is drawn, including categories / levels of the features / factors that were not present in the sample.
  • A random effects model allows inference about the population, based on the assumption that the random effects are normally distributed.

Ajitesh Kumar
