Fixed vs Random vs Mixed Effects Models – Examples


Have you ever wondered what fixed effects, random effects and mixed effects models are? Or, more importantly, how they differ from one another? In this post, you will learn the concepts behind fixed and random effects models, along with when to use a fixed effects model and when to go for a fixed + random effects (mixed) model. The concepts are explained with examples. As a data scientist, you need a good understanding of these concepts because it will help you build better linear models such as general linear mixed models or generalized linear mixed models (GLMM).

What are fixed, random & mixed effects models?

First, we will take a real-world example and use it to understand fixed and random effects.

Let’s create a model for understanding patients’ response to the Covid-19 vaccine when it is administered to many patients across different countries. As I am writing this post, several companies are contending that their Covid-19 vaccine is the most effective in terms of percentage effectiveness. For example, Pfizer is claiming 95% effectiveness for its vaccine. Moderna is also claiming 95% effectiveness. The AstraZeneca Oxford vaccine is claimed to be 90% effective. These percentages must have been determined from some kind of model that estimates patients’ response to the vaccine. This can be a fixed-effects model or a mixed model combining fixed and random effects.

Mixed effect model = Fixed effect + Random effect

What are Fixed Effects Models?

Fixed effects models assume that each explanatory variable has a fixed, constant relationship with the response variable across all observations. Let’s look at how the patients’ response can be estimated using a fixed-effects model and using a mixed model that combines fixed and random effects. In the example below, the patients’ response to the vaccine is modeled as the probability P of a vaccinated person falling sick with Covid-19. While creating the model, we may need to consider the effect of some of the following (as features):

  • Age group of the person (Below 18, 18-30, 30-50, 50-70, 70-90)
  • Gender of the person (Female, Male)
  • Whether the person has prior health problems such as hypertension (blood pressure), diabetes (sugar), etc.
  • Country of the person

When a linear model is trained with fixed effects for the above features, it looks like the following:

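\(\log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_{\text{age-group}} \cdot \text{AgeGroup} + \beta_{\text{gender}} \cdot \text{Gender} + \beta_{\text{bp}} \cdot \text{BloodPressure} + \beta_{\text{db}} \cdot \text{Diabetic} + \beta_{\text{country}} \cdot \text{Country}\)

Or, more compactly:

\(\log\left(\frac{P}{1-P}\right) = \beta_0 + \text{fixed effects}\)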

Note that all the features in the above model have pre-determined categories, and the inferences (patients’ response) are made only for the categories of the features used to train the model. This is why it is called a fixed-effects model. For example, the feature related to hypertension can have only two levels/categories: either the person has a hypertension problem or he/she does not. Even if the experiment is repeated multiple times, the hypertension feature will have the same two categories every time. Thus, the hypertension feature is said to have a fixed effect and can become part of a fixed-effects model. The fixed-effects model can then be used to estimate the patients’ response based on these features.
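To make the equation concrete, here is a minimal sketch of how a fitted fixed-effects model turns a person's categories into a probability. All coefficient values, category labels and country names (A, B, C) are made up for illustration; they are assumptions, not estimates from any real vaccine study.

```python
import math

# Hypothetical coefficients on the log-odds scale (illustrative values only).
# The first category of each feature is the reference level with coefficient 0.
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_group": {"below_18": 0.0, "18_30": 0.1, "30_50": 0.3, "50_70": 0.7, "70_90": 1.2},
    "gender": {"female": 0.0, "male": 0.2},
    "hypertension": {"no": 0.0, "yes": 0.5},
    "diabetes": {"no": 0.0, "yes": 0.6},
    "country": {"A": 0.0, "B": -0.2, "C": 0.4},
}


def log_odds(person):
    """Intercept plus one fixed coefficient per observed category."""
    return INTERCEPT + sum(COEFFICIENTS[f][person[f]] for f in COEFFICIENTS)


def probability(person):
    """Invert the logit link: P = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(person)))


person = {
    "age_group": "50_70",
    "gender": "male",
    "hypertension": "yes",
    "diabetes": "no",
    "country": "B",
}
print(f"log-odds = {log_odds(person):.2f}, P = {probability(person):.3f}")
```

Every feature contributes exactly one coefficient chosen from a fixed, known list of categories, which is the defining property of the fixed-effects model described above.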

A fixed-effects model is often used in medical research when testing a new drug. In such a study, we want to identify whether the efficacy of the new drug differs from that of the current medication available on the market. The treatment (new drug versus current medication) is a fixed effect because those two levels are exactly the ones we want to draw conclusions about. The null hypothesis is that there is no difference between the two drugs and that any observed difference is due to chance alone.

What are Random Effects Models?

A random-effects model assumes that the relationship between an explanatory variable and the response variable is not a single constant but varies from one group (or individual) to another, with the group-specific effects treated as random draws from a common distribution. For example, suppose you are studying how different levels of stress affect heart rate and blood pressure. There is an average difference (i.e., slope) between each level of stress and the corresponding outcome (i.e., heart rate or blood pressure). However, this difference can vary across individuals (e.g., some people react more strongly than others when exposed to the same level of stress).

Let’s understand this with the example from the previous section. One of the factors/features used in the fixed-effects model is country. Is it appropriate to treat the country predictor as a fixed effect? There may be factors related to country/region that result in different patients’ responses to the vaccine, and not all countries are included in the study. If the experiment were performed again, it could include other countries that were left out of the first experiment simply because the vaccine was not tested there. Essentially, we are working with only a sample of countries out of all countries. Treating the country as a random effect allows us to incorporate the variability in the country effect that comes from picking a set of K countries out of all countries, or from testing in only a limited number of countries.
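The idea of "K countries picked out of all countries" can be sketched with a small simulation. This is an illustration under stated assumptions: each country's deviation from the overall log-odds is assumed to be a draw from a normal distribution, the numbers 195, 10 and 0.5 are arbitrary, and the country effects are treated as directly observable, whereas a real model would have to estimate them from patient data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Assumption: each country's deviation from the overall log-odds is one draw
# from a normal distribution with mean 0 and standard deviation SIGMA_COUNTRY.
SIGMA_COUNTRY = 0.5
N_COUNTRIES = 195  # hypothetical size of the full population of countries
K = 10             # hypothetical number of countries that took part in the study

all_effects = [random.gauss(0.0, SIGMA_COUNTRY) for _ in range(N_COUNTRIES)]
sampled_effects = random.sample(all_effects, K)

# A fixed-effects treatment would report these K numbers one by one.
# A random-effects treatment summarises them by their spread instead.
estimated_sd = statistics.stdev(sampled_effects)
print(f"true sd = {SIGMA_COUNTRY}, sd estimated from {K} countries = {estimated_sd:.2f}")
```

Rerunning with a different seed or a different K gives a different estimate, which is exactly the sampling variability that a random effect for country is meant to account for.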

The general idea is that the list of countries used for modeling is not fixed but is a selection from the set of all countries where the vaccine could have been tested. More countries could have been included had the vaccine been tested there as well, and that would have resulted in different patients’ responses to the vaccine. Treating the country as a random effect incorporates this type of variability into the model, which we would not get from treating the country as a fixed effect. Thus, the model looks like the following, with fixed effects for age, gender, etc. and a random effect for the country.

Log(Odds) = intercept + fixed effects + random effect

For a random effect, what is estimated is the variance of the effect across the levels of the predictor (here, how much the country effect varies between countries) rather than a separate coefficient for each level. The above model can be called a mixed-effects model. If the model has just random effects and no fixed effects, it can be termed a random-effects model.
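One common way to write this down, assuming a random intercept per country, is shown below. The indices i (patient) and j (country) and the symbols \(u_j\) and \(\sigma_u^2\) are notation introduced here for illustration and do not appear in the model above:

\(\log\left(\frac{P_{ij}}{1-P_{ij}}\right) = \beta_0 + \text{fixed effects}_{ij} + u_j, \quad u_j \sim N(0, \sigma_u^2)\)

Under this assumption, fitting the model yields an estimate of the single variance \(\sigma_u^2\) instead of one coefficient per country.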

What are mixed-effects models?

A mixed-effects model combines both kinds of effect: some explanatory variables have the same fixed relationship with the response variable across all groups, while other effects are allowed to vary randomly between groups (e.g., subjects). For example, you could be studying the effect of taking an online course on academic performance in college students, where the difference between males and females is treated as a fixed effect, while the variation between individual students in outcomes like grades is treated as a random effect.

When to go for fixed-effects model & mixed-effects models?

When the features/factors used in training the model have fixed levels/categories (such as gender, age group, etc.), a fixed-effects model is appropriate. However, if one or more features/factors has only a limited set of its levels/categories present in the training data, and the model outcome is supposed to apply to all other levels/categories as well, a random-effects or mixed-effects model is the better choice.

The most fundamental difference between fixed and random effects models is that of inference/prediction. A fixed-effects model supports prediction only about the levels/categories of the features used for training. A random-effects model, by contrast, allows predicting something about the population from which the sample is drawn, including categories/levels of the features/factors that were not present in the sample. If the estimated variance between the sampled levels is large enough, it can fairly be concluded that the levels in the population differ in that effect as well.
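The difference in prediction can be sketched as follows. The intercepts, per-country coefficients and per-country deviations are hypothetical numbers standing in for the output of two already-fitted models, and the function names are invented for this illustration.

```python
# Hypothetical fitted quantities on the log-odds scale (illustrative only).
FIXED_INTERCEPT = -2.1   # fixed-effects fit: log-odds for reference country "A"
FIXED_COUNTRY_COEF = {"A": 0.0, "B": -0.2, "C": 0.4}

MIXED_INTERCEPT = -2.0   # mixed fit: log-odds for an "average" country
RANDOM_COUNTRY_DEVIATION = {"A": -0.05, "B": -0.20, "C": 0.25}


def fixed_effects_log_odds(country):
    """Defined only for countries that have their own coefficient."""
    if country not in FIXED_COUNTRY_COEF:
        raise KeyError(f"no coefficient for unseen country {country!r}")
    return FIXED_INTERCEPT + FIXED_COUNTRY_COEF[country]


def mixed_effects_log_odds(country):
    """Unseen countries fall back to the population mean deviation of 0."""
    return MIXED_INTERCEPT + RANDOM_COUNTRY_DEVIATION.get(country, 0.0)


for country in ("C", "D"):  # "D" was not part of the training sample
    try:
        fixed = fixed_effects_log_odds(country)
    except KeyError as err:
        fixed = f"unavailable: {err}"
    print(country, "| fixed:", fixed, "| mixed:", mixed_effects_log_odds(country))
```

For the unseen country the mixed model still returns a prediction (the population average), and the estimated variance of the country effect indicates how far a new country might plausibly sit from that average.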

Fixed effects models are recommended when the effects of the specific levels in the data are of primary interest. Mixed-effects models are recommended when some effects are of direct interest and, in addition, the observations are grouped (e.g., patients within countries, or repeated measurements on the same subject), so that observations within a group are more alike than observations from different groups. Finally, models with only random effects are appropriate when the goal is to describe how much of the variability in the outcome is due to differences between groups rather than to estimate the effect of particular predictors.

If a fixed-effects model is fitted on a random sample of levels, one can’t use that model to make a prediction/inference about levels outside the sample data set. The fixed-effects model allows the individual-specific effects to be correlated with the independent variables. The random-effects model allows making inferences about the population, based on the assumption that the random effects are normally distributed. It also assumes that the individual-specific effects are uncorrelated with the independent variables.

Conclusions

Here is the summary of what you learned about the fixed and random effect models:

  • A fixed-effects model supports prediction only about the levels / categories of features used for training.
  • If a fixed-effects model is fitted on a random sample, one can’t use that model to make a prediction / inference about data outside the sample data set.
  • A random-effects model, by contrast, allows predicting something about the population from which the sample is drawn, including categories / levels of the features / factors that were not present in the sample.
  • A random-effects model allows making inferences about the population, based on the assumption that the random effects are normally distributed.

Ajitesh Kumar