Spearman Correlation Coefficient: Formula, Examples

Have you ever wondered how you might determine the relationship between two sets of data that aren’t necessarily linear, or perhaps don’t adhere to the assumptions of other correlation measures? Enter the Spearman Rank Correlation Coefficient, a non-parametric statistic that offers robust insights into the monotonic relationship between two variables – perfect for dealing with ranked variables or exploring potential relationships in a new, exploratory dataset.

In this blog post, we will learn the concepts behind the Spearman correlation coefficient with the help of Python code examples. Understanding this measure is useful for data scientists. Whether you’re exploring associations in marketing data, results from a customer satisfaction survey, or anything in between, the Spearman correlation could be the key to unlocking the insights you need.

What’s Spearman Rank Correlation Coefficient?

Spearman’s rank correlation coefficient, commonly referred to as Spearman’s rho and denoted ρ, is a non-parametric measure of statistical dependence between two variables. It assesses how well the relationship between the two variables can be described by a monotonic function. In simpler terms, it measures the strength and direction of the association between two ranked variables.

The value of the Spearman coefficient is interpreted as follows:

  • A Spearman correlation of 1 indicates a perfect positive correlation between the rank variables, i.e., as one variable increases, the other does as well.
  • A Spearman correlation of -1 indicates a perfect negative correlation between the rank variables, i.e., as one variable increases, the other decreases.
  • A Spearman correlation of 0 suggests no monotonic association, i.e., the ranks of one variable show no consistent tendency to rise or fall with the ranks of the other.

Let’s illustrate this concept with an example.

Suppose we have a small group of five friends who took part in a 5km running race and a swimming race. We can rank their performance in both events from 1 (best) to 5 (worst):

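| Friend | Running Race Rank | Swimming Race Rank |
|--------|-------------------|--------------------|
| A      | 1                 | 2                  |
| B      | 2                 | 3                  |
| C      | 3                 | 4                  |
| D      | 4                 | 1                  |
| E      | 5                 | 5                  |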

Now, we want to understand the correlation between their performances in the two races. Here, we can use the Spearman correlation coefficient.

To calculate the coefficient when all ranks are distinct (no ties), we use the formula:

ρ = 1 – ( (6 * Σd²) / (n * (n² – 1)) )

Where:

  • d is the difference between the two ranks for each individual
  • n is the total number of observations (or friends in this case)

First, calculate the difference in ranks (d), square it (d²), and sum all the squared differences (Σd²).

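| Friend | Running Rank (R1) | Swimming Rank (R2) | d = R1 – R2 | d² |
|--------|-------------------|--------------------|-------------|----|
| A      | 1                 | 2                  | -1          | 1  |
| B      | 2                 | 3                  | -1          | 1  |
| C      | 3                 | 4                  | -1          | 1  |
| D      | 4                 | 1                  | 3           | 9  |
| E      | 5                 | 5                  | 0           | 0  |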

Σd² = 1 + 1 + 1 + 9 + 0 = 12

Now, substitute Σd² and n in the formula:

ρ = 1 – ( (6 * Σd²) / (n * (n² – 1)) )
ρ = 1 – ( (6 * 12) / (5 * (5² – 1)) )
ρ = 1 – (72 / 120)
ρ = 1 – 0.6
ρ = 0.4

So, the Spearman correlation coefficient of the friends’ ranks in the races is 0.4, indicating a moderate positive correlation.
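
The hand calculation above can be mirrored in a few lines of plain Python. This is only a minimal sketch of the same arithmetic: the variable names are illustrative, and it relies on the shortcut formula, which assumes that no ranks are tied.

```python
# Ranks from the worked example (1 = best, 5 = worst)
friends = ["A", "B", "C", "D", "E"]
running_rank = [1, 2, 3, 4, 5]
swimming_rank = [2, 3, 4, 1, 5]

# Rank difference d for each friend, and its square
d = [r1 - r2 for r1, r2 in zip(running_rank, swimming_rank)]
d_squared = [diff ** 2 for diff in d]

for name, diff, sq in zip(friends, d, d_squared):
    print(f"{name}: d = {diff:+d}, d² = {sq}")

n = len(friends)
sum_d_squared = sum(d_squared)

# Shortcut formula; valid here because no ranks are tied
rho = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print(f"Σd² = {sum_d_squared}")
print(f"ρ = {rho:.2f}")
```

Running it should reproduce the figures from the table: Σd² = 12 and ρ = 0.40.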

Remember, the value of ρ ranges from -1 to 1. A positive ρ indicates a positive correlation (as one variable increases, the other tends to increase), and a negative ρ indicates a negative correlation (as one variable increases, the other tends to decrease). A value close to 0 indicates little or no monotonic association.
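
To see the two ends of that range and the middle, the same formula can be wrapped in a small helper and applied to three rankings. The helper name spearman_rho and the third ranking are made up for illustration; the sketch assumes untied ranks.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho from two equal-length lists of untied ranks."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


base = [1, 2, 3, 4, 5]

print(spearman_rho(base, [1, 2, 3, 4, 5]))  # identical order
print(spearman_rho(base, [5, 4, 3, 2, 1]))  # reversed order
print(spearman_rho(base, [2, 5, 3, 1, 4]))  # illustrative order with no overall trend
```

The three calls print 1.0, -1.0 and 0.0: identical rankings give Σd² = 0, fully reversed rankings give Σd² = 40, and the third ranking happens to give Σd² = 20, which makes the fraction in the formula exactly 1.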

Spearman Correlation Coefficient: Real-life Examples

The Spearman correlation coefficient is a non-parametric measure that’s useful in a variety of real-life scenarios. Here are a few examples:

  1. Education: Imagine you are a teacher and want to find out if there’s a relationship between the ranking of students based on their homework scores and their final exam scores. As these are ranked data (and perhaps non-linear), the Spearman correlation could be an appropriate choice.
  2. Market Research: If a company wants to understand the correlation between the rank order of a customer’s income level and the rank order of the amount they spend on a product, Spearman’s correlation could be used.
  3. Psychology: In a survey or questionnaire, a psychologist might ask patients to rank their level of stress from 1-10 and also their level of anxiety from 1-10. To understand if there’s a correlation between these ranked responses, the Spearman correlation coefficient could be used.
  4. Environmental Science: Let’s say a scientist wants to understand the relationship between the ranking of cities based on air quality index and the ranking of cities based on incidence of respiratory diseases. As the relationship may not be linear and the data are ranks, the Spearman correlation would be a good choice.
  5. Medical Field: If a medical researcher is looking at the effectiveness of a new drug, they might rank patients based on how severe their symptoms are before taking the drug and then again after several weeks of taking the drug. To see how closely the two rankings agree (that is, whether patients largely keep their relative position after treatment or the ordering changes), they could use the Spearman correlation coefficient.

Spearman correlation is most useful when the data are ranks or when you don’t want to make any assumptions about a linear relationship or normal distribution for the available data.
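
In scenarios like these, the raw data are often scores rather than ranks, so the first step is to convert each variable to ranks. The sketch below does this for the education example with made-up homework and exam scores. The to_ranks helper is hypothetical and assumes all scores are distinct (tied scores would need average ranks instead).

```python
def to_ranks(values):
    """Rank values from 1 (highest) downward; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


# Hypothetical scores for five students (not from the article)
homework = [88, 92, 75, 64, 95]
exam = [82, 90, 70, 71, 85]

homework_rank = to_ranks(homework)
exam_rank = to_ranks(exam)

n = len(homework)
sum_d_squared = sum((h - e) ** 2 for h, e in zip(homework_rank, exam_rank))

# Shortcut formula; valid here because there are no tied ranks
rho = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print("Homework ranks:", homework_rank)
print("Exam ranks:    ", exam_rank)
print("Spearman rho:  ", round(rho, 2))
```

For these invented scores the two rankings differ by at most one place per student, so Σd² is small and ρ comes out at 0.8.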

Spearman Rank Correlation Coefficient Python Example

The following Python snippet uses the spearmanr function from the scipy library to calculate the Spearman rank correlation coefficient for the data from the previous example. spearmanr is called with the two lists of ranks as arguments and returns the correlation coefficient and the p-value. We’re only interested in the coefficient, so we ignore the p-value by assigning it to _.

from scipy.stats import spearmanr

# Ranks of friends in running and swimming
running_rank = [1, 2, 3, 4, 5]
swimming_rank = [2, 3, 4, 1, 5]

# Calculate Spearman's correlation
spearman_corr, _ = spearmanr(running_rank, swimming_rank)

print(f"Spearman's correlation coefficient is: {spearman_corr}")
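
If you also want a rough sense of whether the coefficient could have arisen by chance, you can keep the p-value instead of discarding it. The sketch below is one way to do that. The 0.05 threshold is a conventional choice made here for illustration, and the p-value is an approximation that is more trustworthy for large samples than for five observations.

```python
from scipy.stats import spearmanr

running_rank = [1, 2, 3, 4, 5]
swimming_rank = [2, 3, 4, 1, 5]

# spearmanr returns the coefficient and a two-sided p-value
rho, p_value = spearmanr(running_rank, swimming_rank)

print(f"rho = {rho:.2f}, p-value = {p_value:.3f}")

alpha = 0.05  # conventional significance level, chosen here for illustration
if p_value < alpha:
    print("The monotonic association is statistically significant.")
else:
    print("Not enough evidence of a monotonic association with only five observations.")
```

With such a small group, a coefficient of 0.4 is far from statistically significant, so it should be read as a description of these five friends rather than as evidence of a general pattern.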

The relationship can also be seen by plotting the running ranks against the swimming ranks in a scatter plot.
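
A plot of this kind can be drawn with matplotlib. The sketch below is one possible version, not necessarily the original figure: the axis labels, the point annotations and the output file name are all choices made here for illustration.

```python
import matplotlib.pyplot as plt

friends = ["A", "B", "C", "D", "E"]
running_rank = [1, 2, 3, 4, 5]
swimming_rank = [2, 3, 4, 1, 5]

fig, ax = plt.subplots()
ax.scatter(running_rank, swimming_rank)

# Label each point with the friend it belongs to
for name, x, y in zip(friends, running_rank, swimming_rank):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(6, 6))

ax.set_xlabel("Running race rank")
ax.set_ylabel("Swimming race rank")
ax.set_title("Running vs. swimming ranks")

# Hypothetical output file name; use plt.show() instead for an interactive window
fig.savefig("spearman_ranks_scatter.png")
```

Friends A, B, C and E lie roughly along a rising line, while D (fourth in running, first in swimming) sits well away from it, which is what pulls the coefficient down to 0.4.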

Difference: Spearman Rank Correlation vs Pearson Correlation Coefficient

The following represents some of the key differences between Spearman rank correlation coefficient and Pearson correlation coefficient:

| Aspect | Pearson Correlation Coefficient | Spearman Correlation Coefficient |
|--------|---------------------------------|----------------------------------|
| Type of Data | Continuous, interval or ratio data, assumed to be normally distributed | Ordinal, interval, or ratio data; does not require a normal distribution |
| Assumptions | Assumes linearity and homoscedasticity (equal variance) in the data | No assumptions about the distribution and can handle non-linear (but monotonic) relationships |
| Measures | Linear relationships | Monotonic (increasing or decreasing, but not necessarily at a constant rate) relationships |
| Calculation | Based on actual data values and their means | Based on data ranks |
| Appropriate Usage | When both variables are normally distributed and the relationship is linear | When data does not meet the assumption of normality or the relationship is monotonic but non-linear |
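
The difference between the two measures is easiest to see on data that rise steadily but not in a straight line. The sketch below uses made-up exponential data and scipy’s pearsonr and spearmanr functions. The specific values are illustrative only.

```python
import math

from scipy.stats import pearsonr, spearmanr

# Hypothetical data: y grows exponentially with x (monotonic, but not linear)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

pearson_corr, _ = pearsonr(x, y)
spearman_corr, _ = spearmanr(x, y)

print(f"Pearson:  {pearson_corr:.3f}")
print(f"Spearman: {spearman_corr:.3f}")
```

Because y always increases when x increases, the ranks of the two variables match perfectly and Spearman’s coefficient is 1. Pearson’s coefficient is noticeably lower because the points do not fall on a straight line.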

Conclusion

The Spearman Rank Correlation Coefficient is a non-parametric measure that quantifies the degree of association between two ranked variables. This robust statistical tool provides an understanding of the monotonic relationship between these variables, allowing us to assess how well the relationship can be described using a monotonic function. Whether your data are ordinal or contain outliers, Spearman’s rank correlation has proven to be an effective measure, helping to glean insights from data that other correlation measures might miss.

The need for Spearman’s correlation arises primarily when dealing with relationships that are monotonic but not linear, or when the data do not meet the assumptions required for Pearson’s correlation coefficient. This brings us to one of the key differences between the two: while Pearson’s correlation works with the actual data values and is most effective in determining linear relationships, Spearman’s correlation operates on the ranks of the data and uncovers monotonic relationships, whether linear or not.

If you would like more details, please feel free to reach out.

Ajitesh Kumar
