In this post, you will learn about the concepts of **fixed and random effects models**, along with **when to use fixed effects models** and **when to go for fixed + random effects (mixed) models**. The concepts will be explained with **examples**. As a data scientist, you should have a good understanding of these concepts, as they will help you build better linear models such as **general linear mixed models** or **generalized linear mixed models (GLMM)**.

The following are some of the topics covered in this post:

- What are fixed, random & mixed effects models?
- When to use fixed effects vs mixed effects models?

## What are fixed, random & mixed effects models?

First, we will take a real-world example to understand fixed and random effects.

Let’s create a model for understanding **patients’ response to a Covid-19 vaccine** administered to multiple patients across different countries. As I write this post, several companies are reporting how effective their **Covid-19 vaccine** is. For example, **Pfizer** claims 95% effectiveness for its vaccine, **Moderna** also claims 95%, and the **Oxford-AstraZeneca** vaccine is claimed to be 90% effective. These percentages must have been determined using **some kind of model** which **estimates patients’ response to the Covid-19 vaccine**. This can be a **fixed effects model** or a **mixed model combining fixed and random effects**.

**Mixed effect model = Fixed effect + Random effect**

### What is a Fixed Effects Model?

Let’s understand how the patients’ response can be estimated using both a fixed effects model and a mixed model, which combines fixed and random effects. In the example given below, the patients’ response to the vaccine is modelled as the probability of the vaccinated person falling sick due to Covid-19. While creating the model, we may need to consider the effect of some of the following (as features):

- Age group of the person (below 18, 18-30, 30-50, 50-70, 70-90)
- Gender of the person (female, male)
- Whether the person has prior health problems such as hypertension (blood pressure) or diabetes (sugar)
- Country of the person

When a linear model is trained with fixed effects for the above features, it looks like the following:

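\(\log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_{\text{age-group}} \cdot \text{AgeGroup} + \beta_{\text{gender}} \cdot \text{Gender} + \beta_{\text{bp}} \cdot \text{BloodPressure} + \beta_{\text{db}} \cdot \text{Diabetic} + \beta_{\text{country}} \cdot \text{Country}\).

\(\log\left(\frac{P}{1-P}\right) = \beta_0 + \text{fixed effects}\).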

Note that all the features in the above model have pre-determined categories, and the inferences (patients’ response) are made for the categories of the features used to train the model. This is why it is called a **fixed effects model**. The features used for training have only **fixed / pre-determined** categories, and the patients’ response is based on the effect of one of these fixed categories. For example, the hypertension feature can have only two levels / categories: either the person has hypertension or he/she does not. Even if the experiment is repeated multiple times, the hypertension feature will have the same two categories every time. Thus, the hypertension feature is said to have a **fixed effect** and can become part of a **fixed effects model**. The fixed effects model can be used to estimate the patients’ response based on features that have fixed effects.
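To make the equation concrete, here is a minimal sketch in Python that turns a set of fixed-effect coefficients into a predicted probability. All coefficient values and level names (such as `country_b`) are made up purely for illustration; they are not estimates from any vaccine trial. Each categorical feature is given a reference level with a coefficient of 0, which is one common way to code categories.

```python
import math

# Hypothetical coefficients on the log-odds scale (illustrative values only).
# The first level of each feature is the reference level (coefficient 0).
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_group": {"below_18": 0.0, "18_30": 0.1, "30_50": 0.3, "50_70": 0.6, "70_90": 0.9},
    "gender": {"female": 0.0, "male": 0.1},
    "hypertension": {"no": 0.0, "yes": 0.4},
    "diabetes": {"no": 0.0, "yes": 0.5},
    "country": {"country_a": 0.0, "country_b": 0.2},
}


def log_odds(person):
    """Intercept plus the coefficient of the level observed for each feature."""
    return INTERCEPT + sum(
        COEFFICIENTS[feature][level] for feature, level in person.items()
    )


def probability(person):
    """Invert the logit link: P = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(person)))


person = {
    "age_group": "50_70",
    "gender": "male",
    "hypertension": "yes",
    "diabetes": "no",
    "country": "country_b",
}
print(f"log-odds = {log_odds(person):.2f}, P = {probability(person):.3f}")
```

Every level the model can handle has to appear in `COEFFICIENTS`; a person from a country that was not in the training data has no coefficient to look up.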

### What is a Random Effects Model?

One of the factors / features used in the fixed effects model is country. **Is it appropriate to treat the country predictor variable as a fixed effect?** **There may be factors related to country / region which result in different patients’ responses to the vaccine, and not all countries are included in the study.** If the experiment is performed again, it could include countries that were left out of the first experiment simply because the vaccine was not tested there. Essentially, we are working with only a sample of countries out of all countries. **So treating country as a random effect allows us to incorporate the variability in the country effect that comes from picking a set of K countries out of all the countries, or only the limited number of countries where the test has been performed.**

The general idea is that the list of countries used for modeling is not fixed; it is a sample from the set of all countries where the vaccine could have been tested. More countries could have been included had the vaccine been tested there as well, and this would have resulted in different patients’ responses to the vaccine. Treating country as a random effect incorporates that type of variability into the model, which we would not get from treating country as a fixed effect. Thus, the model looks like the following, with fixed effects for features such as age and gender and a random effect for country.

**Log(Odds) = intercept + fixed effects + random effect**
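One way to write this out in the notation of the fixed effects equation, assuming country enters as a random intercept (a common choice, but an assumption here; the symbols \(u_j\) and \(\sigma_u^2\) are introduced only for illustration):

\(\log\left(\frac{P_{ij}}{1-P_{ij}}\right) = \beta_0 + \text{fixed effects}_{ij} + u_j, \quad u_j \sim N(0, \sigma_u^2)\)

Here \(i\) indexes the patient and \(j\) the country, so every patient in country \(j\) shares the same shift \(u_j\).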

For a random effect, the model estimates the variance of the effect across levels (here, how much the response varies from country to country) rather than a separate coefficient for each level. The above model is called a **mixed effects model**. If the model has only random effects and no fixed effects, it can be termed a **random effects model**.
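The idea of estimating a variance rather than per-country coefficients can be sketched with simulated data. To keep the example small, the sketch below uses a hypothetical continuous response score instead of the binary outcome above, and the classical ANOVA (method-of-moments) estimator for a balanced one-way layout rather than the likelihood-based fitting a GLMM package would use. All numbers are assumptions chosen for the simulation.

```python
import random
import statistics

random.seed(42)

# Assumed "true" values used only to generate the simulated data.
GRAND_MEAN = 60.0
TRUE_SD_COUNTRY = 2.0   # spread of the country effects
TRUE_SD_PATIENT = 5.0   # patient-to-patient noise within a country
N_COUNTRIES = 30        # sampled countries (a subset of all countries)
N_PER_COUNTRY = 50      # patients per country (balanced design)

# response = grand mean + country effect + patient noise
data = []
for _ in range(N_COUNTRIES):
    country_effect = random.gauss(0.0, TRUE_SD_COUNTRY)
    data.append([
        GRAND_MEAN + country_effect + random.gauss(0.0, TRUE_SD_PATIENT)
        for _ in range(N_PER_COUNTRY)
    ])

country_means = [statistics.fmean(group) for group in data]

# Mean square within countries: average of the per-country sample variances.
ms_within = statistics.fmean([statistics.variance(group) for group in data])
# Mean square between countries: n times the variance of the country means.
ms_between = N_PER_COUNTRY * statistics.variance(country_means)

# ANOVA estimator of the country-effect variance, truncated at zero.
var_country = max(0.0, (ms_between - ms_within) / N_PER_COUNTRY)

print(f"estimated patient variance: {ms_within:.2f} (simulated with {TRUE_SD_PATIENT ** 2:.2f})")
print(f"estimated country variance: {var_country:.2f} (simulated with {TRUE_SD_COUNTRY ** 2:.2f})")
```

The output of interest is the single number `var_country`, which describes the population of countries, not the 30 individual country shifts.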

## When to go for Fixed Effects Models & Mixed Models?

When the features / factors used in training the model have fixed levels / categories (such as gender, age group, etc.), the apt model is a fixed effects model. However, if one or more features / factors have only a limited set of their levels / categories in the training data, and the model outcome is supposed to apply to all other levels / categories as well, a random effects or mixed effects model is the better fit.

**The most fundamental difference between fixed and random effects models is that of inference / prediction**. A fixed effects model supports prediction only about the levels / categories of the features used for training. A random effects model, by contrast, allows us to predict something about the population from which the sample is drawn, including categories / levels of the features / factors which were not present in the sample. If the estimated variance between the sampled levels is large enough, it is reasonable to conclude that the population of levels exhibits that variation as well.

The bottom line is that if a fixed effects model is fitted on a random sample of levels, one can’t use that model to make predictions / inferences about levels outside the sample data set. A fixed effects model allows the individual-specific effects to be correlated with the independent variables. A random effects model allows inference on the population, based on the assumption that the effects follow a normal distribution, and it assumes that the individual-specific effects are uncorrelated with the independent variables.
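The difference in prediction described above can be sketched as follows. The intercept, the per-country coefficients and the country standard deviation are hypothetical values, not fitted results, and other features are left out to keep the example short. For a country that was never sampled, the fixed effects version has nothing to look up, while the mixed version falls back on a typical country (random effect of zero) and uses the assumed normal distribution to give a range covering roughly 95% of countries. The range ignores the uncertainty in the estimated parameters themselves.

```python
import math

INTERCEPT = -2.0  # hypothetical log-odds for the reference case

# Fixed effects view: one coefficient per country seen in training (hypothetical).
FIXED_COUNTRY_COEF = {"country_a": 0.0, "country_b": 0.3, "country_c": -0.2}

# Mixed model view: country is summarised by the SD of its random intercepts (hypothetical).
COUNTRY_SD = 0.25


def sigmoid(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_fixed(country):
    """Prediction is only defined for countries that have a coefficient."""
    if country not in FIXED_COUNTRY_COEF:
        raise ValueError(f"no fixed effect was estimated for {country!r}")
    return sigmoid(INTERCEPT + FIXED_COUNTRY_COEF[country])


def predict_mixed_new_country(z=1.96):
    """Typical probability for an unseen country, with a range that covers
    about 95% of countries under the normal assumption for country effects."""
    typical = sigmoid(INTERCEPT)  # random effect set to its mean, zero
    low = sigmoid(INTERCEPT - z * COUNTRY_SD)
    high = sigmoid(INTERCEPT + z * COUNTRY_SD)
    return low, typical, high


print(predict_fixed("country_b"))
try:
    predict_fixed("country_d")  # a country outside the training sample
except ValueError as err:
    print("fixed effects model:", err)

low, typical, high = predict_mixed_new_country()
print(f"mixed model, unseen country: {typical:.3f} (range {low:.3f} to {high:.3f})")
```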

## Conclusions

Here is the summary of what you learned about fixed and random effects models:

- A fixed effects model supports prediction only about the levels / categories of the features used for training.
- If a fixed effects model is fitted on a random sample of levels, one can’t use that model to make predictions / inferences about levels outside the sample data set.
- A random effects model, by contrast, allows us to predict something about the population from which the sample is drawn, including categories / levels of the features / factors which were not present in the sample.
- A random effects model allows inference on the population, based on the assumption that the effects follow a normal distribution.
