The Z-score, also known as the standard score or Z-statistic, is a statistical concept that plays an important role in data science. It provides a standardized way to compare data points from different distributions, allowing data scientists to interpret the relative position of an individual data point within a dataset.
Z-scores measure how far a data point deviates from the mean. They are also used in the Z-test, a hypothesis testing technique (one-sample or two-sample Z-test). As a data scientist, it is important to be well-versed in the Z-score formula and its applications. Having clarity on the concept of the Z-score and the Z-statistic will help you use the correct formula in the appropriate cases. In this blog post, we will discuss the concept of the Z-score, its formulas, and examples.
The Z-score formula differs depending on whether you are working with sample data, where the goal is to find the deviation of an observation from the mean, or with a sampling distribution, where the goal is to perform hypothesis testing.
Additionally, Z-scores at different confidence levels can be used to estimate the population mean from a given sample, or the difference in population means from two different samples.
When considering a sample of data, the Z-score measures the number of standard deviations by which a data point in the sample differs from the sample mean. Alternatively, when defined for a population, the Z-score measures the number of standard deviations by which a data point differs from the population mean. This is also called the standard score. It is denoted by z and calculated as:
Z = (x-x̄)/σ
where,
x is an observation in the sample
x̄ is the mean of the observations in the sample
σ is the standard deviation of the observations in the sample
Let’s take an example to understand the z-score calculation better. Suppose the mean of the data points in a sample is 90 and the standard deviation is 30. The observation x = 45 will have the following Z-score:
z = (45 - 90)/30 = -45/30 = -1.5
The observation 45 is 1.5 standard deviations below the mean of 90.
The process of converting raw observations into Z-scores is called standardization (it is sometimes loosely referred to as normalization). When the mean and standard deviation of a data set are known, it is easy to convert any observation into a Z-score for that sample or population. The figure below represents different values of Z-scores. Note that Z = +1 means that the observation is 1 standard deviation above the mean. In the same way, Z = -1 means that the observation is 1 standard deviation below the mean.
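The standardization described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names `z_score` and `standardize` are hypothetical, the small data set is made up for the example, and the population standard deviation (`pstdev`) is an assumption; use `stdev` instead if you want the sample standard deviation.

```python
from statistics import mean, pstdev


def z_score(x, mu, sigma):
    """Number of standard deviations by which x differs from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


def standardize(values):
    """Convert raw observations to z-scores using their own mean and
    population standard deviation (an assumption made for this sketch)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [z_score(v, mu, sigma) for v in values]


# The worked example from the text: x = 45, mean = 90, standard deviation = 30
print(z_score(45, 90, 30))  # -1.5

# Illustrative data only: after standardization the values have mean 0
# and standard deviation 1.
scores = standardize([60, 90, 120])
print(scores)
```

After standardization, the z-scores of any data set have a mean of 0 and a standard deviation of 1, which is what makes observations from different distributions comparable.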
When considering the sampling distribution, the Z-score, or Z-statistic, is defined as the number of standard errors between the sample mean and the population mean (the mean of the sampling distribution). Recall that the sampling distribution of the mean is the distribution of the sample means computed from all possible samples of a given size that could be drawn from a population. This sampling distribution is the basis of the hypothesis testing technique known as the Z-test.
A Z-test is used to determine whether the sample mean is statistically different from the hypothesized population mean. The computed Z-statistic is compared with the critical value, which is read from the Z-table for the chosen significance level. If the Z-statistic falls in the rejection region, there is sufficient evidence to reject the null hypothesis; otherwise, we fail to reject it. The formula for the Z-statistic in a Z-test is the following:
Z = (x̄-μ)/SE
where,
x̄ is the sample mean
μ is the mean of the observations in the population
SE is the standard error of the sample mean. It can be calculated as follows:
SE = σ/√n
where σ is the population standard deviation and n is the sample size. The standard error is the standard deviation of the sampling distribution of the mean.
Let’s take an example to understand the z-score calculation with a sampling distribution (as used in the Z-test). Suppose a random sample of 100 observations was taken from a population with mean μ = 70, and the standard error (SE) of the mean is 15. The mean of the sample is 85. The z-score of the sample mean is calculated as follows:
z = (x̄ - μ)/SE = (85 - 70)/15 = 1.0
It means that the sample mean x̄ is 1 standard error (that is, 1 standard deviation of the sampling distribution) above the mean of the sampling distribution.
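The same calculation can be turned into a small one-sample Z-test sketch. The function names below are hypothetical, and the two-tailed p-value and the significance level of 0.05 are assumptions added for illustration; the post itself only computes the Z-statistic. The population standard deviation of 150 is not stated in the example; it is the value implied by SE = 15 and n = 100.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(pop_sd, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return pop_sd / sqrt(n)


def z_statistic(sample_mean, pop_mean, se):
    """Number of standard errors between the sample mean and the population mean."""
    return (sample_mean - pop_mean) / se


def two_tailed_p_value(z):
    """Probability of a Z-statistic at least this extreme under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


se = standard_error(150, 100)   # 15.0, matching the example above
z = z_statistic(85, 70, se)     # 1.0
p = two_tailed_p_value(z)       # roughly 0.317

alpha = 0.05                    # assumed significance level
reject_null = p < alpha
print(z, round(p, 3), reject_null)
```

With z = 1.0 the p-value is well above 0.05, so under these assumptions the null hypothesis would not be rejected.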
Z-score or Z-statistics can be used to perform hypothesis testing for the following scenarios:
The Z-score can be used to estimate the population mean or proportion at different confidence levels as a function of the sample mean or proportion. The following formulas express the population mean in terms of the sample mean, and the population proportion in terms of the sample proportion.
Population mean = sample mean ± MarginOfError
Population proportion = sample proportion ± MarginOfError
MarginOfError can be calculated as the following:
MarginOfError = Z* × StandardError
Standard error can be calculated as the following:
StandardError = PopulationStandardDeviation / SQRT(SampleSize)
For a proportion, StandardError = SQRT(p × (1 - p) / SampleSize), where p is the sample proportion.
Z* represents the critical value of Z at the chosen confidence level. For a two-sided interval at the 95% confidence level, Z* is ±1.96.
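Instead of reading Z* from a Z-table, the critical value can be computed from the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `critical_z` is hypothetical, and it assumes a two-sided interval.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Two-sided critical value Z* for a confidence level such as 0.95."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence level must be between 0 and 1")
    alpha = 1 - confidence_level
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: ±{critical_z(level):.3f}")
# 90%: ±1.645
# 95%: ±1.960
# 99%: ±2.576
```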
At the 95% confidence level, the confidence interval for the population mean is:
Sample mean ± 1.96 × StandardError
Similarly, the 95% confidence interval for the population proportion is:
Sample proportion ± 1.96 × StandardError
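The two interval formulas can be sketched as follows. The function names are hypothetical and the input numbers are illustrative only: the mean example reuses the sample mean of 85 and n = 100 from earlier with an assumed standard deviation of 150, and the proportion example is made up. The standard error of a proportion is taken as sqrt(p(1 - p)/n), the usual formula.

```python
from math import sqrt

Z_STAR_95 = 1.96  # critical value for a two-sided 95% confidence level


def mean_confidence_interval(sample_mean, sd, n, z_star=Z_STAR_95):
    """Sample mean ± Z* × (sd / sqrt(n))."""
    margin = z_star * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


def proportion_confidence_interval(p_hat, n, z_star=Z_STAR_95):
    """Sample proportion ± Z* × sqrt(p_hat × (1 - p_hat) / n)."""
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Illustrative inputs: sample mean 85, standard deviation 150, n = 100
mean_low, mean_high = mean_confidence_interval(85, 150, 100)
print(round(mean_low, 1), round(mean_high, 1))  # 55.6 114.4

# Illustrative inputs: 50 successes out of 100 observations
prop_low, prop_high = proportion_confidence_interval(0.5, 100)
print(round(prop_low, 3), round(prop_high, 3))  # 0.402 0.598
```

A larger sample size shrinks the standard error, and therefore the margin of error, for the same confidence level.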
Z-scores are an essential tool in statistics and data analysis. They measure how far your observations (or data points) are from the mean, and they help you decide whether to reject a null hypothesis in hypothesis testing. Z-tests for sampling distributions involve calculating the Z-statistic and comparing it with critical values determined from the standard normal distribution or Z-table. Here are some key points learned in this blog: