
Fixed vs Random vs Mixed Effects Models – Examples

Have you ever wondered what fixed effects, random effects and mixed effects models are? Or, more importantly, how they differ from one another? In this post, you will learn about the concepts of fixed and random effects models, along with when to use a fixed effects model and when to go for a fixed + random effects (mixed) model. The concepts are explained with examples. As a data scientist, you need a good understanding of these concepts because it will help you build better linear models such as linear mixed models (LMM) or generalized linear mixed models (GLMM).

What are fixed, random & mixed effects models?

First, we will take a real-world example and try and understand fixed and random effects.

Let’s create a model for understanding patients’ response to the Covid-19 vaccine when it is administered to multiple patients across different countries. As I am writing this post, several companies are claiming that their Covid-19 vaccine is the most effective in terms of percentage of effectiveness. For example, Pfizer is claiming 95% effectiveness for its Covid-19 vaccine. Moderna is also claiming 95% effectiveness. The Oxford-AstraZeneca vaccine is claimed to be 90% effective. These percentages must have been determined based on some kind of model that estimates patients’ response to the vaccine. This can be a fixed-effects model or a mixed model combining fixed and random effects.

Mixed effect model = Fixed effect + Random effect

What are Fixed Effects Models?

Fixed effects models assume that an explanatory variable has a fixed, or constant, relationship with the response variable across all observations. Let’s understand it with a simple example. Say we want to study how exercise affects a person’s weight. We have a group of 10 people, and we measure their weight before and after they start an exercise program. If we use a fixed effects model, we assume that exercise affects everyone’s weight in the same way: the program has a single, constant effect (say, an average loss of 5 pounds), and any person-to-person differences around that value are treated as noise rather than modeled. This can be a helpful way to analyze data because it allows us to compare the effects of different factors on the outcome. For example, we could compare the weight loss of people who did the exercise program with that of people who didn’t, and see if there was a difference.

Let’s understand the fixed effects model with the Covid example, where we will see how the patients’ response can be estimated using either a fixed-effects model or a mixed model that combines fixed and random effects. In the example given below, the patients’ response to the vaccine is modeled as the probability of a vaccinated person falling sick due to Covid-19. While creating the model, we may need to consider the effect of some of the following (as features):

  • Age group of the person (Below 18, 18-30, 30-50, 50-70, 70-90)
  • Gender of the person (Female, Male)
  • Whether the person has prior health problems such as hypertension (blood pressure), diabetes (sugar), etc.
  • Country of the person

When a linear model is trained with fixed effects for the above features, the model looks like the following:

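[latex]\log\left(\frac{P}{1-P}\right) = \beta_0 + \beta_{\text{age-group}} \cdot \text{AgeGroup} + \beta_{\text{gender}} \cdot \text{Gender} + \beta_{\text{bp}} \cdot \text{BloodPressure} + \beta_{\text{db}} \cdot \text{Diabetic} + \beta_{\text{country}} \cdot \text{Country}[/latex]

[latex]\log\left(\frac{P}{1-P}\right) = \beta_0 + \text{fixed effects}[/latex]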

Note that all the features in the above model have pre-determined categories, and the inferences (patients’ response) are made for the categories of the features used to train the model. This is why it is called a fixed effects model: the features used for training have only fixed, pre-determined categories, and the patients’ response is estimated from the effects of these fixed categories. For example, the feature related to hypertension can only have two levels/categories: either the person has a hypertension problem or the person does not. Even if the experiment is repeated multiple times, the hypertension feature will have the same two categories every time. Thus, the hypertension feature can be said to have a fixed effect and can become part of the fixed-effects model. The fixed-effects model can then be used to estimate the patients’ response based on these features.
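The log-odds equation above can be sketched in a few lines of Python to show how a fixed-effects model turns category effects into a probability. This is only an illustration: the intercept, the coefficients and the country labels A, B and C are hypothetical values chosen for the example, not estimates from any vaccine study.

```python
import math

# Hypothetical coefficients on the log-odds scale (illustration only).
# Each category has one fixed coefficient; the reference category is 0.
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_group": {"Below 18": 0.0, "18-30": 0.1, "30-50": 0.3, "50-70": 0.7, "70-90": 1.2},
    "gender": {"Female": 0.0, "Male": 0.2},
    "hypertension": {False: 0.0, True: 0.5},
    "diabetic": {False: 0.0, True: 0.6},
    "country": {"A": 0.0, "B": -0.2, "C": 0.3},  # only countries seen in training
}


def log_odds(person):
    """Intercept plus the fixed effect of each feature's category."""
    total = INTERCEPT
    for feature, effects in COEFFICIENTS.items():
        # A category that was not in the training data has no coefficient,
        # so this lookup raises KeyError for it.
        total += effects[person[feature]]
    return total


def probability(person):
    """Invert the logit: P = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds(person)))


person = {"age_group": "50-70", "gender": "Male", "hypertension": True,
          "diabetic": False, "country": "A"}
print(round(log_odds(person), 2), round(probability(person), 3))  # prints -1.6 0.168
```

Note that the sketch has no way to score a person from a country outside A, B and C, which is the limitation of treating country as a fixed effect.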

A fixed-effects model is often used in medical research when testing a new drug. Say we want to identify whether the efficacy of the new drug differs from that of the current medication available on the market. The treatment (new drug vs. current medication) is a fixed effect, and the null hypothesis is that there is no difference between the two drugs, so that any observed difference is due to chance alone.

What are Random Effects Models?

A random effects model is a way of analyzing data that takes into account the fact that some factors affecting the outcome vary randomly across individuals or groups. For example, let’s say we have data on 100 people and want to understand how much a person’s height affects their weight. If we use a random effects model, we assume that some factors affecting weight vary randomly across individuals. For example, some people have a higher metabolism or are more active, which affects their weight differently than for someone with a lower metabolism or a less active lifestyle. We can account for these unobserved factors by including a random effect in our model. This allows us to better estimate the effect of height on weight by taking into account the random variation across individuals.

So why is it called a random effects model? The word “random” refers to the fact that some of the factors that affect the outcome vary randomly across individuals or groups. By including a random effect in our model, we can better estimate the effect of the factor we’re interested in by accounting for the random variation across individuals.

Let’s understand this with the example from the previous section. One of the features used in the fixed effects model is country. Is it appropriate to treat the country predictor as a fixed effect? There may be factors related to the country/region that result in different patient responses to the vaccine, and not all countries are included in the study. If we use a fixed effects model, we assume that each country has a constant effect that does not change, and that the countries in the data are the only ones of interest. However, this assumption may not hold in reality. If the experiment is performed again, it may include other countries that were left out of the first experiment simply because the vaccine was not tested there. Essentially, we are working with only a sample of countries out of all countries. Treating the country as a random effect allows us to incorporate two sources of variability: the variability due to picking a set of K countries out of all countries (or out of the limited number of countries where the test was performed), and the differences between the countries’ circumstances.

The general idea is that the list of countries used for modeling is not fixed but was selected from the set of all countries where the vaccine was, or could have been, tested. Had the vaccine been tested in other countries as well, the observed patient responses would have been different. Treating the country as a random effect incorporates that type of variability into the model, which we would not get from treating the country as a fixed effect. Thus, the model looks like the following, with fixed effects for age group, gender and health conditions, and a random effect for the country.

Log(Odds) = intercept + fixed effects + random effect
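Written in the notation of the earlier fixed-effects equation, one possible form of this model is the random-intercept logistic model sketched below. The indices i (patient) and j (country), the random intercept u_j and its variance σ_u² are notation introduced here for illustration, and the normal distribution of u_j is the usual modeling assumption rather than something specific to this example.

```latex
\log\left(\frac{P_{ij}}{1-P_{ij}}\right)
  = \beta_0
  + \beta_{\text{age-group}} \cdot \text{AgeGroup}_{ij}
  + \beta_{\text{gender}} \cdot \text{Gender}_{ij}
  + \beta_{\text{bp}} \cdot \text{BloodPressure}_{ij}
  + \beta_{\text{db}} \cdot \text{Diabetic}_{ij}
  + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Compared with the fixed-effects equation, the country term with one coefficient per country is replaced by a single country-level deviation u_j drawn from a common distribution.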

For a random effect, what is estimated is the variance of the effect across its levels (here, the variance between countries) and not a separate coefficient for each level. The above model is called a mixed effects model. If a model has only random effects and no fixed effects, it is termed a random-effects model.
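To make the idea of "estimating a variance rather than individual effects" concrete, here is a minimal sketch using simulated data. It makes several simplifying assumptions that are not part of the vaccine example: the outcome is continuous rather than a probability, every group has the same number of observations, and the variance components are estimated with the classical one-way ANOVA (method of moments) formulas instead of the likelihood-based methods that mixed-model software normally uses. All constants are made up for the simulation.

```python
import random
import statistics

random.seed(0)

TRUE_SD_GROUP = 2.0     # assumed between-group (e.g. between-country) std. deviation
TRUE_SD_RESIDUAL = 1.0  # assumed within-group (patient-level) std. deviation
N_GROUPS = 200
N_PER_GROUP = 20

# Simulate y_ij = mu + u_j + e_ij with one random intercept u_j per group.
data = []
for _ in range(N_GROUPS):
    u_j = random.gauss(0.0, TRUE_SD_GROUP)
    data.append([10.0 + u_j + random.gauss(0.0, TRUE_SD_RESIDUAL)
                 for _ in range(N_PER_GROUP)])

group_means = [statistics.fmean(group) for group in data]
grand_mean = statistics.fmean(group_means)  # balanced design: equals the overall mean

# Mean squares of the one-way ANOVA table.
ms_between = (N_PER_GROUP * sum((m - grand_mean) ** 2 for m in group_means)
              / (N_GROUPS - 1))
ms_within = (sum((y - m) ** 2 for group, m in zip(data, group_means) for y in group)
             / (N_GROUPS * (N_PER_GROUP - 1)))

# Method of moments: E[MS_within] = var_residual,
# E[MS_between] = var_residual + N_PER_GROUP * var_group.
var_residual = ms_within
var_group = max(0.0, (ms_between - ms_within) / N_PER_GROUP)

print("estimated between-group variance:", round(var_group, 2))
print("estimated residual variance:", round(var_residual, 2))
```

The model summarizes all 200 groups with a single number, the between-group variance (true value 4.0 in this simulation), instead of 200 separate coefficients.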

What are Mixed-effects Models?

A mixed effects model is a type of regression model that combines both fixed and random effects. Mixed effects models are useful when there is variation in the effect of a factor across groups or individuals, but some of the variation is systematic (i.e., can be explained by specific variables) and some is random (i.e., cannot be explained by specific variables).

In a mixed effects model, the fixed effects are used to capture the systematic variation, while the random effects are used to capture the random variation. The fixed effects represent the effects of variables that are assumed to have a constant effect on the outcome variable, while the random effects represent effects that vary across groups or individuals. For example, you could be studying the effect of taking an online course on the academic performance of college students. Here the difference between males and females can be modeled as a fixed effect, while the remaining student-to-student variation in outcomes such as grades, which is not explained by any specific variable, can be modeled as a random effect.

When to go for fixed-effects model & mixed-effects models?

When the features/factors used in training the model have fixed levels/categories (such as gender, age group, etc.), the appropriate model is a fixed-effects model. However, if one or more features/factors have only a limited sample of their levels/categories in the training data, and the model is supposed to apply to the other levels/categories as well, a random effects or mixed effects model is the better fit.

The most fundamental difference between the fixed and random effects models is that of inference/prediction. A fixed-effects model supports prediction only about the levels/categories of the features used for training. A random-effects model, by contrast, allows predicting something about the population from which the sample is drawn, including categories/levels of the features/factors that were not present in the sample. If the estimated variance between the sampled levels is large relative to its uncertainty, it is reasonable to conclude that the effect varies across the population as well.
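The difference in prediction can be sketched with a small example. The numbers below are made up, the outcome is continuous for simplicity, and the two variance components are assumed to be known so that the sketch stays short (in practice they are estimated from the data). Under those assumptions, the random-effects prediction for a sampled country is its mean shrunk toward the population mean, and the prediction for an unseen country is the population mean itself.

```python
import statistics

# Hypothetical outcome measurements grouped by country (made-up numbers,
# deliberately unbalanced: the countries have 6, 2 and 4 observations).
observations = {
    "A": [12.1, 11.8, 12.4, 12.0, 11.9, 12.3],
    "B": [9.5, 9.9],
    "C": [10.8, 11.1, 10.9, 11.2],
}

# Assumed known for this sketch; normally estimated from the data.
VAR_COUNTRY = 1.0   # variance of the country-level random effect
VAR_RESIDUAL = 0.5  # variance of the observation-level noise


def fixed_effect_prediction(country):
    """Each training country has its own mean; unseen countries have none."""
    return statistics.fmean(observations[country])  # KeyError if unseen


def population_mean():
    """Weighted mean of country means; weights are inverse variances of those means."""
    weights = {c: 1.0 / (VAR_COUNTRY + VAR_RESIDUAL / len(y))
               for c, y in observations.items()}
    total = sum(weights.values())
    return sum(w * statistics.fmean(observations[c]) for c, w in weights.items()) / total


def random_effect_prediction(country):
    """Population mean plus the shrunken country deviation (0 for unseen countries)."""
    mu = population_mean()
    if country not in observations:
        return mu
    y = observations[country]
    shrinkage = VAR_COUNTRY / (VAR_COUNTRY + VAR_RESIDUAL / len(y))
    return mu + shrinkage * (statistics.fmean(y) - mu)


for name in ["A", "B", "C", "Z"]:
    print(name, round(random_effect_prediction(name), 2))
```

Country B, with only two observations, is pulled toward the population mean more strongly than country A, and country "Z", which never appeared in the data, still receives a prediction.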

Fixed effects models are recommended when the effects of the specific levels present in the data are of primary interest. Mixed-effects models are recommended when observations are grouped (for example, patients within countries), so that observations in the same group are more alike than observations from different groups, and you want to estimate the fixed effects while accounting for that group-to-group variation. Finally, models with only random effects are appropriate when the main interest is in how much of the variation in the outcome is attributable to the grouping factors, rather than in the effect of any particular predictor.

If the fixed effects model is used on a random sample, one can’t use that model to make a prediction/inference on data outside the sample data set. The fixed-effects model allows the individual-specific effects to be correlated with the independent variables. The random-effects model allows making inferences on the population, based on the assumption that the individual-specific effects are normally distributed and uncorrelated with the independent variables.
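The role of the correlation assumption can be illustrated with a small simulation. The sketch below assumes a continuous outcome and builds the data so that the individual-specific effect is correlated with the predictor. It then compares a pooled least-squares slope, which ignores the individual effects and so stands in for an estimator that relies on the "uncorrelated" assumption, with the fixed-effects (within) estimator, which subtracts each individual's mean before fitting. All constants are made up.

```python
import random
import statistics

random.seed(1)

TRUE_SLOPE = 2.0
N_GROUPS, N_PER_GROUP = 300, 10

# Simulate y = TRUE_SLOPE * x + a_g + noise, where x depends on a_g.
groups = []
for _ in range(N_GROUPS):
    a_g = random.gauss(0.0, 1.0)  # individual-specific effect
    xs = [a_g + random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]  # correlated with a_g
    ys = [TRUE_SLOPE * x + a_g + random.gauss(0.0, 0.5) for x in xs]
    groups.append((xs, ys))


def ols_slope(xs, ys):
    """Slope of the simple least-squares line of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Pooled estimate: ignores the individual effects entirely.
all_x = [x for xs, _ in groups for x in xs]
all_y = [y for _, ys in groups for y in ys]
pooled_slope = ols_slope(all_x, all_y)

# Within (fixed-effects) estimate: remove each individual's own mean first.
demeaned_x, demeaned_y = [], []
for xs, ys in groups:
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    demeaned_x += [x - mx for x in xs]
    demeaned_y += [y - my for y in ys]
within_slope = ols_slope(demeaned_x, demeaned_y)

print("pooled slope:", round(pooled_slope, 2))
print("within slope:", round(within_slope, 2))
```

In this setup the within estimate stays close to the true slope of 2.0, while the pooled estimate is pushed upward because part of the individual effect is attributed to the predictor.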

Compared to fixed and random effects models, mixed effects models offer several advantages. They allow for the inclusion of both fixed and random effects in a single model, which can improve the accuracy of the model and the estimation of the effects of variables. Additionally, mixed effects models can handle unbalanced data (i.e., data in which not all groups or individuals have the same number of observations) more easily than fixed or random effects models.


Conclusions

Here is the summary of what you learned about the fixed and random effect models:

  • A fixed-effects model supports prediction only about the levels/categories of features used for training.
  • If the fixed effects model is used on a random sample, one can’t use that model to make a prediction/inference on data outside the sample data set.
  • A random-effects model, by contrast, allows predicting something about the population from which the sample is drawn, including categories/levels of the features/factors that were not present in the sample.
  • A random-effects model allows making inferences on the population based on the assumption that the random effects are normally distributed.

Ajitesh Kumar


