Z-Score Explained with Ronaldo / Robert Example

In this post, you will learn the concept of the Z-score with the help of an example involving Cristiano Ronaldo and Robert Lewandowski. You will see how to compare their performances and judge whose was better in the 2019-2020 Champions League. As a data scientist, it is important to understand Z-scores, also called standard scores, because they help you evaluate a particular data point or data set against past data.

Before getting into the example, let's understand some concepts of Z-scores.

What’s Z-Score or Z-statistics?

A Z-score is the number of standard deviations by which a data point lies above or below the mean of the data (be it a sample or a population). The Z-score, or Z-statistic, can also be used for hypothesis testing in the following scenarios, provided the population standard deviation is known:

  • Check whether a sample belongs to a population by comparing the sample mean with the population mean (one-sample Z-test)
  • Check whether there is a difference between the means of two samples coming from two different populations (two-sample Z-test for means)
  • Check whether a sample proportion differs from the population proportion (one-sample Z-test for proportion)
  • Check whether there is a difference between two sample proportions (two-sample Z-test for proportions)
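
To make the first scenario concrete, here is a minimal sketch of a one-sample Z-test using only Python's standard library. The function name `one_sample_z_test` and the numbers in the usage lines are hypothetical, chosen only for illustration, and the test assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_std, n):
    """Return (z, two-sided p-value) for a one-sample Z-test.

    Assumes the population standard deviation `pop_std` is known.
    """
    standard_error = pop_std / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Probability of a value at least this extreme in either tail
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: 25 observations with mean 52, compared against
# a population assumed to have mean 50 and standard deviation 5.
z, p = one_sample_z_test(sample_mean=52, pop_mean=50, pop_std=5, n=25)
print(round(z, 2), round(p, 4))
```

A small p-value suggests the sample mean is unlikely under the assumed population mean; the threshold to use (0.05 is a common convention) is a choice left to the analyst.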

Z-Score Explained with Example

In this section, we will learn about Z-scores and their usage through an example: evaluating the performance of Ronaldo and Robert in the 2019-2020 Champions League matches.

Fig 1. Whose performance was better in the 2019-2020 Champions League?

Here is the data on their performance in the 2019-2020 Champions League (source: ESPN.in).

| Player | No. of Matches Played | No. of Goals Scored | Avg Goals / Match |
|---|---|---|---|
| Cristiano Ronaldo | 8 | 4 | 0.5 |
| Robert Lewandowski | 10 | 15 | 1.5 |

Table 1. Ronaldo / Robert performance in the 2019-2020 Champions League

The average goals per match suggests that Robert Lewandowski played much better than Cristiano Ronaldo. However, can we conclude the same using statistical measures? How could we find out whether each of them performed better than his own record over the last 7-8 years? This is where the Z-score comes into the picture.

The comparison above uses only the average goals per match. The numbers say that Robert Lewandowski played much better than Ronaldo, but is this an actionable insight? Is this comparison enough for me to put my money on Robert for next year? Is he really playing better than he did in the past?

Let’s take a hypothetical scenario. Suppose Robert’s average over the last 7-8 years was 1.2 goals/match with a standard deviation of 0.5 goals, and Ronaldo’s was 0.4 goals/match with a standard deviation of 0.2 goals. What would these numbers say about their performance in 2019-2020 compared with their past performances? To evaluate each player against his own past, we need a standard way of measuring performance. This is what the Z-score, also called the standard score, provides.

A Z-score lets one compare values from two different data sets that have different means and standard deviations. Here, it will be used to evaluate both Ronaldo’s and Robert’s performance relative to their personal past performance: we calculate the standard score of each player’s 2019-2020 average and use the scores for the evaluation.
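
As a minimal sketch, the standard score can be written as a one-line Python function. The helper name `z_score` is an assumption made for illustration, and the inputs are the hypothetical long-run figures from the scenario above, not real statistics.

```python
def z_score(value, mean, std):
    """Number of standard deviations that `value` lies above (+) or below (-) `mean`."""
    return (value - mean) / std


# Hypothetical long-run figures from the scenario above:
# Robert: mean 1.2, std 0.5; Ronaldo: mean 0.4, std 0.2.
robert_z = z_score(1.5, mean=1.2, std=0.5)
ronaldo_z = z_score(0.5, mean=0.4, std=0.2)
print(round(robert_z, 2), round(ronaldo_z, 2))
```

Under these made-up figures, both players would sit roughly half a standard deviation above their own usual level, so the large gap in raw averages would say little about who improved more.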

Let’s use the Z-score formula, [latex]z = \frac{x - \mu}{\sigma}[/latex], to calculate the standard score of Ronaldo and Robert for their performance in 2019-2020. As a first step, we will calculate the mean and standard deviation of their average goals/match across the seasons of the last 7-8 years or so. In the table below, CR represents Cristiano Ronaldo, RL represents Robert Lewandowski, P represents the number of games played, and G represents the number of goals scored.

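| Year | CR_P | CR_G | CR_Avg | RL_P | RL_G | RL_Avg |
|---|---|---|---|---|---|---|
| 2018 | 9 | 6 | 0.67 | 8 | 8 | 1.0 |
| 2017 | 13 | 15 | 1.15 | 11 | 5 | 0.45 |
| 2016 | 13 | 12 | 0.92 | 9 | 8 | 0.89 |
| 2015 | 12 | 16 | 1.33 | 12 | 9 | 0.75 |
| 2014 | 12 | 10 | 0.83 | 12 | 6 | 0.50 |
| 2013 | 11 | 17 | 1.55 | 9 | 6 | 0.67 |
| 2012 | 12 | 12 | 1.0 | 13 | 10 | 0.77 |

Table 2. Average Goals / Matches across different seasons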
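
The per-season averages in Table 2 are simply goals divided by games played. The sketch below reproduces them in plain Python; the list names and the `goals_per_match` helper are illustrative assumptions.

```python
# Games played (P) and goals scored (G) per season, 2018 down to 2012, from Table 2
cr_played = [9, 13, 13, 12, 12, 11, 12]
cr_goals = [6, 15, 12, 16, 10, 17, 12]
rl_played = [8, 11, 9, 12, 12, 9, 13]
rl_goals = [8, 5, 8, 9, 6, 6, 10]


def goals_per_match(goals, played):
    """Average goals per match for each season, rounded to 2 decimals."""
    return [round(g / p, 2) for g, p in zip(goals, played)]


cr_avg = goals_per_match(cr_goals, cr_played)
rl_avg = goals_per_match(rl_goals, rl_played)
print(cr_avg)
print(rl_avg)
```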

From the table above, the mean and standard deviation of average goals/match across different seasons can be calculated for both Cristiano Ronaldo and Robert Lewandowski. I used Python NumPy for this. Here is the code:

import numpy as np

# Average goals per match for each season, 2018 down to 2012 (Table 2)
cr_avg = np.array([0.67, 1.15, 0.92, 1.33, 0.83, 1.55, 1.0])
rl_avg = np.array([1.0, 0.45, 0.89, 0.75, 0.50, 0.67, 0.77])
#
# Cristiano Ronaldo Mean and Standard Deviation
#
print('Cristiano Ronaldo: ', round(cr_avg.mean(), 2), round(cr_avg.std(), 2))
#
# Robert Lewandowski Mean and Standard Deviation
#
print('Robert Lewandowski: ', round(rl_avg.mean(), 2), round(rl_avg.std(), 2))

The mean ([latex]\mu_{CR}[/latex]) and standard deviation ([latex]\sigma_{CR}[/latex]) for Cristiano Ronaldo are 1.06 and 0.28, respectively.

The mean ([latex]\mu_{RL}[/latex]) and standard deviation ([latex]\sigma_{RL}[/latex]) for Robert Lewandowski are 0.72 and 0.18, respectively.
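
One detail worth noting: NumPy's `std()` defaults to `ddof=0`, that is, the population standard deviation. With only seven seasons, one might instead treat them as a sample and divide by n - 1. The sketch below uses the standard library's `statistics` module (a substitution for NumPy, made here only to keep the example dependency-free) to show both values side by side.

```python
from statistics import mean, pstdev, stdev

cr_avg = [0.67, 1.15, 0.92, 1.33, 0.83, 1.55, 1.0]
rl_avg = [1.0, 0.45, 0.89, 0.75, 0.50, 0.67, 0.77]

for name, data in [("Cristiano Ronaldo", cr_avg), ("Robert Lewandowski", rl_avg)]:
    # pstdev divides by n (population); stdev divides by n - 1 (sample)
    print(name, round(mean(data), 2), round(pstdev(data), 2), round(stdev(data), 2))
```

The sample standard deviation is always somewhat larger than the population one, so Z-scores computed with it would be slightly smaller in magnitude. Which version is appropriate depends on whether the seven seasons are treated as the whole population or as a sample.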

From the above, it can be said that Ronaldo has a better goal scoring average across different seasons than Robert Lewandowski. That said, let’s see how they performed in comparison to their past performances.

Based on the above data, we will calculate the standard scores related to average goals/matches of the year 2019 for both Ronaldo and Robert and evaluate/compare their performance.

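The Z-score or standard score for Ronaldo’s performance in 2019 can be calculated as follows. Note that 0.5 is Ronaldo’s average goals/match in 2019, shown in Table 1.

[latex]z_{CR} = \frac{0.5 - \mu_{CR}}{\sigma_{CR}} = \frac{0.5 - 1.06}{0.28} = -2.0[/latex]

The Z-score or standard score for Robert’s performance in 2019 can be calculated as follows. Note that 1.5 is Robert’s average goals/match in 2019, shown in Table 1.

[latex]z_{RL} = \frac{1.5 - \mu_{RL}}{\sigma_{RL}} = \frac{1.5 - 0.72}{0.18} \approx 4.3[/latex]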
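
The same arithmetic can be checked with a few lines of Python. This is a sketch: the `z_score` helper is a hypothetical name, and the inputs are the rounded means and standard deviations computed above.

```python
def z_score(value, mean, std):
    """Standard score: distance of `value` from `mean` in units of `std`."""
    return (value - mean) / std


# 2019-2020 averages from Table 1; historical mean and std as computed above
ronaldo_z = z_score(0.5, mean=1.06, std=0.28)
robert_z = z_score(1.5, mean=0.72, std=0.18)
print(round(ronaldo_z, 1), round(robert_z, 1))
```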

From the above, it is clear that Robert’s performance relative to his own past is far better than Ronaldo’s. Pictorially, the standard scores for Ronaldo and Robert, when shown on the standard normal distribution, look like the following:

Fig 3. Standard Scores for Ronaldo and Robert
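
To put positions on the standard normal curve into numbers, the sketch below converts the two standard scores (taken as -2.0 for Ronaldo and 4.3 for Robert) into cumulative probabilities with the standard library's `NormalDist`. It assumes that season averages are roughly normally distributed, which is a strong assumption with only seven seasons of data.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

ronaldo_z, robert_z = -2.0, 4.3

# Share of the standard normal distribution lying below each score
ronaldo_pct = std_normal.cdf(ronaldo_z)
robert_pct = std_normal.cdf(robert_z)
print(f"Ronaldo: {ronaldo_pct:.4f}  Robert: {robert_pct:.6f}")
```

Under that assumption, a score of -2.0 sits at roughly the 2nd percentile of a player's own seasons, while a score of 4.3 lies far out in the upper tail.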

Whose performance is better?

The standard score for Ronaldo for 2019 is -2.0, while the standard score for Robert L is 4.3. Once the scores are standardized, it is evident that Robert L’s score is higher.

This means that even though Ronaldo has the better average goals/match across the different seasons in the last 7 years, Robert Lewandowski performed a lot better in 2019 relative to his own past performances, while Ronaldo performed a lot worse than his.

Conclusions

Here is the summary of what you learned about the Z-score or standard score:

  • Z-score is a method used for standardizing the scores measured on different scales.
  • Z-score can be used to evaluate the performance of an entity in comparison to its own past performance.

Ajitesh Kumar

