Last updated: 18th Nov, 2023
Statistical tests are an important part of data analysis. They help us understand the data and make inferences about a population from a sample, using hypothesis testing to examine relationships between variables. A typical test asks whether there is a statistically significant difference between two groups, or between a group and a population. In statistics, there are two main types of tests: parametric and non-parametric. Both are used to make inferences about a population based on a sample; the difference lies in the assumptions they make about the data. Parametric tests assume that the data follow a particular distribution (most often the normal distribution), while non-parametric tests make fewer and weaker assumptions and do not require a specific distribution. In this blog post, we will discuss the different types of statistical tests and related concepts with the help of examples. As a data scientist, you must get a good understanding of different types of statistical tests.
Statistical tests can also be classified based on their application in quantitative or qualitative research. This classification hinges primarily on the nature of the data being analyzed: quantitative research deals with numerical data, while qualitative research often involves non-numerical data. Statistical tests used in qualitative research, particularly when dealing with categorical data, are essential for uncovering relationships and associations between different qualitative or categorical variables.
Parametric Statistical Tests & Types: Concepts, Examples
Parametric statistical tests are a group of statistical tests that make certain assumptions about the data. These tests are used to make inferences about a population based on a sample. The main assumption that these tests make is that the data is normally distributed. This means that the data follow a symmetric, bell-shaped pattern in which most values cluster around the mean and become rarer further away from it. Many of these tests also assume that the observations are independent and that the groups being compared have similar variances. There are several different parametric statistical tests, including t-tests, ANOVA, and Pearson’s correlation. The following is the high-level detail of these parametric tests:
- Independent t-tests: An independent t-test is a statistical test used to determine whether the means of two groups are statistically different from each other. This test is used when the two groups are made up of different people, for example when participants are randomly assigned to one of two groups. The independent t-test is a parametric test: it assumes that the data in each group are approximately normally distributed and, in its classic (Student’s) form, that the two groups have similar variances. When the variances differ, Welch’s version of the test is commonly used. The benefits of using an independent t-test include that it is relatively easy to use and has high statistical power when its assumptions hold. Let’s understand independent t-tests with an example. A researcher might be interested in comparing the average reading scores of two groups of students: one group that is taking a course in English literature and one group that is taking a course in math. In this case, the researcher would use an independent t-test to compare the average reading scores of the two groups. The two groups may be of unequal sizes. However, the independent t-test is limited to the comparison of two groups; to compare more than two groups, ANOVA is used instead.
- Paired t-tests: The paired t-test is a statistical test used to compare the means of two related sets of measurements. The observations are matched or paired in some way, so that each value in one set corresponds to exactly one value in the other. The most common case is comparing the pre-treatment and post-treatment scores of a single group of people. Another is a matched-pairs design: for example, each person who receives a new treatment is matched with a similar person who receives a placebo, and the difference in scores within each pair is analyzed. The test checks whether the mean of these paired differences is different from zero.
- ANOVA tests: ANOVA (analysis of variance) tests are a type of statistical test used to compare the means of more than two groups. There are several different types of ANOVA tests, including one-way ANOVA, two-way ANOVA, and repeated measures ANOVA. Each type is used to compare different combinations of groups. The benefits of using an ANOVA test include that it is relatively easy to use and has high statistical power when its assumptions hold. Let’s understand with an example of where a one-way ANOVA test can be used. A study could compare the GRE scores of students from three income levels (say low, middle, and high) and test whether there are significant differences between the means of the three groups. One possible outcome is that students from families with higher incomes tend to score higher on the GRE than students from families with lower incomes. A significant ANOVA result only says that at least one group mean differs from the others; follow-up (post-hoc) comparisons are needed to find out which ones. A study like this can be used to assess and examine inequalities in society.
- MANOVA tests: MANOVA (multivariate analysis of variance) is a statistical test used to determine whether there are significant differences between two or more groups on several dependent variables considered together. It is similar to ANOVA, but it can be used with more than one dependent variable. It can be used with a single independent variable or with multiple independent variables. MANOVA is useful when the dependent variables are related to each other and you want to test them jointly rather than running a separate ANOVA for each one.
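- F-test: The F-test is a statistical test used to determine whether there is a significant difference between the variances of two groups. The test statistic is the ratio of the two sample variances. The same F-distribution is also used in ANOVA, where the F-statistic compares the variation between group means with the variation within the groups.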
- Z-test: The Z-test is a statistical test used to determine whether the difference between two means (or between a sample mean and a known population mean) is statistically significant. It is most commonly used when the sample is large (a common rule of thumb is more than 30 observations) and the population standard deviation is known; for small samples with an unknown population standard deviation, the t-test is used instead. The Z-test is based on the standard normal distribution, that is, a normal distribution with a mean of 0 and a standard deviation of 1.
- Correlation test (Pearson’s): Correlation tests are statistical tests that assess the strength of the relationship between two variables. The most common type of correlation test is Pearson’s Correlation Coefficient, which measures the linear relationship between two variables and ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship). Correlation tests are used in a variety of fields, including psychology, sociology, and economics. A correlation test can show whether two variables are associated, but a correlation on its own does not establish a cause-and-effect relationship. For example, a correlation test could be used to determine if there is a relationship between IQ and income. A strong correlation also suggests that one variable may be useful for predicting the other, which is typically done with a regression model. For example, if age and education level were found to be correlated with divorce, a regression model could be used to predict the likelihood of a person getting divorced.
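To make the formulas behind some of these tests concrete, here is a minimal sketch in plain Python. It is an illustration rather than a reference implementation: the function names and the sample scores are made up for this post, the independent t-test uses the pooled-variance (Student's) form, and only the test statistics and degrees of freedom are computed (except for the Z-test, where the p-value comes directly from the standard normal distribution). In practice you would normally rely on a statistics library such as SciPy, which provides `scipy.stats.ttest_ind`, `scipy.stats.ttest_rel` and `scipy.stats.f_oneway`.

```python
import math
import statistics as st
from statistics import NormalDist


def independent_t(a, b):
    """Student's two-sample t statistic using a pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * st.variance(a) + (nb - 1) * st.variance(b)) / (na + nb - 2)
    t = (st.mean(a) - st.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2  # statistic, degrees of freedom


def paired_t(first, second):
    """Paired t statistic: a one-sample t-test on the within-pair differences."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    t = st.mean(diffs) / (st.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def one_way_anova_f(*groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = st.mean(values)
    k, n = len(groups), len(values)
    means = [st.mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)


def one_sample_z(sample, population_mean, population_sd):
    """Z statistic and two-sided p-value when the population SD is known."""
    z = (st.mean(sample) - population_mean) / (population_sd / math.sqrt(len(sample)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Made-up scores, for illustration only.
literature = [78, 85, 90, 72, 88, 81]
maths = [70, 75, 80, 68, 77]  # the two groups may differ in size
print("independent t:", independent_t(literature, maths))

before = [10, 12, 9, 11]
after = [12, 15, 10, 14]
print("paired t:", paired_t(before, after))

low, middle, high = [150, 152, 149, 155], [153, 156, 158, 154], [160, 157, 162, 159]
print("one-way ANOVA F:", one_way_anova_f(low, middle, high))

print("one-sample z:", one_sample_z([11, 9, 12, 12], population_mean=10, population_sd=2))
```

Each t or F statistic would then be compared with a critical value for the given degrees of freedom (or converted to a p-value) to decide whether the difference is statistically significant.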
Non-Parametric Statistical Tests & Types: Concepts, Examples
Non-parametric tests do not assume that the data follow a specific distribution such as the normal distribution, although they still rely on some assumptions (for example, that the observations are independent). They are typically used when the assumptions of parametric statistical tests are not met, for example when the data are not normally distributed or are measured on an ordinal scale. Some examples of non-parametric statistical tests include the Wilcoxon rank-sum test and the Kruskal-Wallis test. Statisticians have developed many different non-parametric statistical tests, each with its own advantages and disadvantages. When choosing a non-parametric statistical test, it is important to consider the specific research question and the type of data that are available. The following is a brief introduction to different types of non-parametric tests:
- Wilcoxon rank-sum test: The Wilcoxon rank-sum test is a statistical test used to compare two independent groups of data. It is often used when the data is not normally distributed. The test works by ranking the data from both groups together, and then summing the ranks for each group. The rank sums are then compared to a table of critical values (or, for larger samples, to a normal approximation) to determine whether or not there is a significant difference between the two groups. The test can be used with both small and large samples. The Wilcoxon rank-sum test is also known as the Mann-Whitney U test.
- Kruskal-Wallis H test: The Kruskal-Wallis H test is a statistical test that can be used to compare two or more independent groups. It is the rank-based counterpart of the one-way ANOVA (it is sometimes called a non-parametric ANOVA) and is used when the assumptions of the parametric ANOVA are not met, for example when the data are not normally distributed. The test can be used with continuous or ordinal data. It is based on the ranks of the data, not the actual values: all observations are pooled and ranked, and the test checks whether the average rank differs between the groups. A significant result indicates that at least one group tends to have higher or lower values than the others. If the distributions of the groups have a similar shape, this can be interpreted as a difference in medians.
- Chi-square test of independence: The chi-square test of independence is a statistical test used to determine whether two categorical variables are independent. It is a non-parametric test, meaning that it does not assume that the variables follow a particular distribution such as the normal distribution. The test compares the counts observed in each cell of a contingency table with the counts that would be expected if the two variables were independent, and summarizes the discrepancy in a statistic called the chi-square statistic. This statistic is then compared to a critical value. If the chi-square statistic is greater than the critical value, the hypothesis of independence is rejected and the two variables are considered to be dependent. The chi-square test of independence can be used to test for association between categorical variables and to compare proportions across groups. A closely related chi-square test, the goodness-of-fit test, checks whether a single categorical variable follows an expected distribution.
- The Friedman Test: The Friedman test is a non-parametric statistical test used to compare more than two groups of data. The test is used when the data are not normally distributed and when the groups are related to each other, such as in a repeated measures design. The test is based on the ranks of the data, rather than the actual values.
- The Cochran’s Q Test: The Cochran’s Q test is a non-parametric statistical test used to compare more than two related groups when the outcome is binary (for example, success or failure). It is typically used when the same subjects are measured under three or more conditions, and it can be seen as an extension of McNemar’s test to more than two groups.
- The Jonckheere-Terpstra Test: The Jonckheere-Terpstra test is a rank-based non-parametric statistical test used to compare more than two groups of data. The test is used when the data are not normally distributed and when the groups are ordered, such as in an experiment with treatments that are administered in increasing order of intensity.
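The rank-based tests above share the same first step: pool the observations and replace them with ranks. The sketch below shows that step and how the Mann-Whitney U and Kruskal-Wallis H statistics follow from it. It is a simplified illustration with made-up data and hypothetical function names: the H statistic is computed without the correction for ties, and no p-values are derived. A library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.kruskal`) would normally handle these details.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U, rank sum of a, rank sum of b) for two independent samples."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b), rank_sum_a, rank_sum_b


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction applied in this sketch)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Made-up measurements, for illustration only.
group_a = [12, 15, 11, 19, 14]
group_b = [22, 18, 25, 21, 17]
group_c = [30, 27, 24, 29, 26]

print("ranks with a tie:", average_ranks([10, 20, 20, 30]))
print("Mann-Whitney U:", mann_whitney_u(group_a, group_b))
print("Kruskal-Wallis H:", kruskal_wallis_h(group_a, group_b, group_c))
```

A small U (or a large H) means the ranks are unevenly split between the groups, which is evidence against the hypothesis that the groups come from the same distribution.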
Statistical Tests in Quantitative Research: Examples
Quantitative research involves the collection and analysis of numerical data. Most statistical tests, especially parametric tests, are used in quantitative research due to the numerical nature of the data. The tests listed below, several of which were discussed in the previous sections, can be used for quantitative research:
- T-tests: One-Sample, Independent Two-Sample, Paired
- ANOVA: One-Way, Two-Way, Repeated Measures
- Linear Regression: Simple, Multiple
- Correlation: Pearson’s Correlation Coefficient, Spearman’s Rank Correlation Coefficient
- Mann-Whitney U Test
- Wilcoxon Signed-Rank Test
- Kruskal-Wallis Test
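As a rough illustration of how two of the items in this list relate to each other, the sketch below computes Pearson's correlation coefficient, Spearman's rank correlation (which is Pearson's coefficient applied to the ranks), and a simple linear regression fit. The function names and the data are made up for this example, and the ranking helper assumes there are no tied values. In practice, library functions such as `scipy.stats.pearsonr`, `scipy.stats.spearmanr` and `scipy.stats.linregress` would normally be used.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def ranks_no_ties(values):
    """Rank values from 1 to n. Assumption: all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks_no_ties(x), ranks_no_ties(y))


def simple_linear_fit(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


# Made-up data: y grows with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print("Pearson r:", pearson_r(x, y))
print("Spearman rho:", spearman_rho(x, y))
print("slope, intercept:", simple_linear_fit(x, y))
```

Because the relationship here is monotonic but curved, Spearman's coefficient is higher than Pearson's: Pearson measures how well a straight line fits, while Spearman only asks whether the two variables move in the same order.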
Statistical Tests in Qualitative Research: Examples
The following methods and tests are integral in qualitative research for analyzing categorical data. They help in understanding the relationships and associations between different categories, which is essential in fields like medicine, social science, biology and psychology, where categorical variables are frequently encountered. The choice of test depends on the nature of the data, the size of the sample, and the specific research questions being addressed.
- Chi-Squared (χ²) Test of Association:
  - This is a primary test used to determine if there is a significant association between two categorical variables.
  - It’s applicable when data is presented in a contingency table format, where frequencies or counts of occurrences in each category are compared.
  - The χ² test evaluates whether the observed counts differ from the counts that would be expected if the two variables were not associated.
- Modifications for Small Samples:
  - When dealing with small sample sizes, modifications to the χ² test are necessary.
  - Fisher’s Exact Test is often used as an alternative in these scenarios, especially when the sample size is too small for the χ² test to be reliable.
- Test for Trend:
  - This test is relevant when at least one of the variables is ordinal (i.e., the categories have a natural order, like age groups).
  - It assesses if there’s a trend or consistent pattern across categories of an ordinal variable.
- Risk Measurement:
  - Involves calculating odds ratios and risk ratios.
  - These measures are crucial in understanding the likelihood or risk of a certain event occurring in one group compared to another.
- Confidence Intervals for Proportions and Differences Between Proportions:
  - This method involves calculating the confidence intervals to understand the range within which the true proportion or difference in proportions lies, with a certain level of confidence.
- Matched Samples Consideration:
  - McNemar’s test is particularly useful in matched pair studies, where participants are paired in a way that controls for an extraneous variable.
  - It’s used for dichotomous (binary) outcomes in paired samples to determine if there are differences in the paired proportions.
- Yates’ Correction:
  - This is a continuity correction applied to the χ² test: it compensates for approximating discrete counts with the continuous χ² distribution when sample sizes are small.
  - It’s typically used when the total sample size is small and the data is distributed in a 2×2 contingency table.
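Several of the methods in this list can be illustrated on a single 2×2 contingency table. The sketch below is a simplified, standard-library-only example with made-up counts and hypothetical function names. It computes the χ² statistic with and without Yates' correction (the p-value shortcut shown is valid only for a 2×2 table, which has one degree of freedom), the odds ratio and risk ratio, a two-sided Fisher's exact test, and the McNemar statistic. For real analyses, library routines such as `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` are the usual choice.

```python
import math


def chi_square_2x2(table, yates=False):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n  # count expected under independence
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # Yates' continuity correction
            stat += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, df = 1
    return stat, p_value


def odds_and_risk_ratio(table):
    """Odds ratio and risk ratio; rows are groups, first column is the event."""
    (a, b), (c, d) = table
    return (a * d) / (b * c), (a / (a + b)) / (c / (c + d))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value: sum of table probabilities <= observed."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d
    probs = {
        x: math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)
        for x in range(max(0, c1 - r2), min(r1, c1) + 1)
    }
    threshold = probs[a] * (1 + 1e-9)  # small tolerance for floating-point ties
    return sum(p for p in probs.values() if p <= threshold)


def mcnemar_statistic(b, c):
    """McNemar chi-square statistic from the two discordant pair counts."""
    return (b - c) ** 2 / (b + c)


# Made-up counts: rows = exposed / not exposed, columns = event / no event.
table = [[10, 20], [20, 10]]
print("chi-square:", chi_square_2x2(table))
print("chi-square with Yates:", chi_square_2x2(table, yates=True))
print("odds ratio, risk ratio:", odds_and_risk_ratio(table))
print("Fisher exact p:", fisher_exact_2x2([[3, 0], [0, 3]]))
print("McNemar statistic:", mcnemar_statistic(10, 5))
```

Yates' correction always makes the χ² statistic a little smaller, so the corrected test is more conservative. With very small counts, as in the second table, the exact test is the safer option.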
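In conclusion, there are two main types of statistical tests: parametric and non-parametric. Parametric tests assume that the data follow a particular distribution, usually the normal distribution, while non-parametric tests make fewer assumptions and do not require a specific distribution. Both types of tests are used to make inferences about a population based on a sample. The choice of which type of test to use depends on the type of data that is available (numerical, ordinal, or categorical), on whether the groups are independent or related, and on whether the assumptions of the parametric tests are reasonably met.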