In this post, I intend to present a perspective on why QA/testing teams need to test feature relevance when testing machine learning models as part of data science QA initiatives, and on the different techniques that could be used to test or perform QA on feature relevance.
Feature relevance can also be termed feature importance. Simply put, a feature is relevant or important if it adds real predictive value to the underlying model. Relevant features should display a stable statistical relationship or association with the outcome variable. An association does not imply causation; however, a relevant feature, or a feature with appropriate importance, should be part of the causal matrix that gives rise to the outcome.
In other words, the QA or testing team needs to test feature relevance/importance from time to time to make sure that model complexity and model performance can be managed well.
One of the key aspects of building a machine learning model is determining the feature set that results in a high-performing model. Once the model is built and deployed, it becomes even more important to test whether the features stay relevant, that is, whether they keep carrying useful information for the problem and thereby continue to impact model performance positively. If features become redundant and cease to impact the model, or increase the error rate, they need to be removed or replaced with new features.
The QA team would need to run feature relevance tests at least quarterly. The goals of testing feature relevance include the following:

Ensure that the features used in the model contain useful information for the problem.

If some features are found not to contribute to model performance, they should be raised as defects and filtered out from time to time.
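To make the "raise a defect" step concrete, here is a minimal sketch of a periodic check, assuming each test cycle produces one relevance score per feature (for example an absolute correlation or an importance value) and that the previous cycle's scores are kept as a baseline. The `FeatureDefect` record, the function name and the thresholds `min_score=0.1` and `max_drop=0.5` are hypothetical choices for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass
class FeatureDefect:
    """Hypothetical defect record raised for a feature that lost relevance."""
    feature: str
    baseline_score: float
    current_score: float
    reason: str


def check_relevance_drift(baseline, current, min_score=0.1, max_drop=0.5):
    """Compare per-feature relevance scores from the previous test cycle
    (baseline) with the current cycle and return a list of defects.

    A feature is flagged when its current score falls below min_score, or
    when it has lost more than max_drop (a fraction) of its baseline score.
    The thresholds are illustrative assumptions.
    """
    defects = []
    for feature, base in baseline.items():
        # A feature with no current score is treated as having zero relevance.
        cur = current.get(feature, 0.0)
        if cur < min_score:
            defects.append(FeatureDefect(feature, base, cur, f"score below {min_score}"))
        elif base > 0 and (base - cur) / base > max_drop:
            defects.append(FeatureDefect(
                feature, base, cur, f"dropped more than {max_drop:.0%} since baseline"))
    return defects


if __name__ == "__main__":
    baseline = {"age": 0.42, "income": 0.35, "zip_code": 0.12}
    current = {"age": 0.40, "income": 0.15, "zip_code": 0.03}
    for defect in check_relevance_drift(baseline, current):
        print(defect)
```

With the sample scores above, `income` is flagged for a large drop and `zip_code` for falling below the minimum; `age` passes.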
There are different techniques/approaches for periodically testing feature relevance vis-à-vis a machine learning (ML) model. The following are some of them:

Statistical approaches

Feature importance techniques
There are other feature selection techniques, such as a grid search over candidate feature subsets, that could also be applied for testing feature relevance. However, for now, we will focus on the ones that do not require much knowledge of machine learning.
Testing Feature Relevance – Statistical Approaches
Testing feature relevance using statistical approaches requires QA/test engineers to learn basic statistics fundamentals such as mean, mode, variance, probability distributions, correlation and chi-square tests.
The following are some of the statistical approaches that could be adopted for measuring feature relevance in terms of its impact on model performance:

Correlation of feature variable with the outcome variable

Feature variance
Correlation of Feature with Outcome Variable
Features are selected on the basis of their scores in various statistical tests of their correlation with the outcome variable. The following table can be used to determine which method to use for measuring the correlation between a feature variable and the response variable.
| Feature type / Response type | Continuous | Categorical / Discrete |
|---|---|---|
| Continuous | Pearson's correlation | Linear Discriminant Analysis (LDA) |
| Categorical | Analysis of Variance (ANOVA) | Chi-square |
There are other techniques that could be used for testing a feature's impact on the model, for example, wrapper methods and embedded methods. However, to keep it simple, one could test a feature's relationship with the outcome variable using correlation coefficients.
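For the continuous-feature/continuous-response cell of the table, a correlation check can be scripted in a few lines. The sketch below computes Pearson's correlation coefficient from its definition (covariance divided by the product of standard deviations) in plain Python and flags features whose absolute correlation with the outcome falls below a threshold. The cut-off of 0.1 and the function names are assumptions for illustration; `scipy.stats.pearsonr` computes the same coefficient and also returns a p-value.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # r is undefined for a constant column; we treat it as "no linear association".
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def weakly_correlated_features(features, outcome, threshold=0.1):
    """Names of continuous features whose |r| with a continuous outcome is below threshold."""
    return [name for name, values in features.items()
            if abs(pearson_r(values, outcome)) < threshold]


if __name__ == "__main__":
    outcome = [1, 2, 3, 4, 5]
    features = {
        "sqft": [10, 20, 30, 40, 50],   # perfectly correlated with the outcome
        "noise": [2, 1, 3, 1, 2],       # no linear relationship
        "constant": [7, 7, 7, 7, 7],    # never changes
    }
    print(weakly_correlated_features(features, outcome))
```

Keep in mind that Pearson's r only measures linear association, so a feature with a strong non-linear relationship can still score low.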
Test engineers would need to be trained in the following statistical tests to perform this testing:

Pearson’s correlation

LDA

ANOVA

Chi-square tests
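For a categorical feature and a continuous response, one-way ANOVA asks whether the mean of the response differs across the feature's levels. The sketch below computes the F statistic from its definition (between-group mean square divided by within-group mean square) in plain Python; a larger F suggests a stronger association. The function name is an assumption, and it returns only the statistic; `scipy.stats.f_oneway` returns both the F statistic and a p-value.

```python
import math


def anova_f(categories, outcome):
    """One-way ANOVA F statistic for a categorical feature vs. a continuous outcome."""
    if len(categories) != len(outcome):
        raise ValueError("categories and outcome must have the same length")
    # Group the outcome values by the level of the categorical feature.
    groups = {}
    for level, y in zip(categories, outcome):
        groups.setdefault(level, []).append(y)
    k, n = len(groups), len(outcome)
    if k < 2 or n <= k:
        raise ValueError("need at least two levels and more observations than levels")
    grand_mean = sum(outcome) / n
    ss_between = 0.0  # spread of the group means around the grand mean
    ss_within = 0.0   # spread of observations around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in values)
    if ss_within == 0:
        return math.inf  # groups are perfectly separated
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Levels "a" and "b" separate the outcome well: F = 8.0
    print(anova_f(["a", "a", "b", "b"], [1, 3, 5, 7]))
    # Levels interleaved with the outcome: F = 0.5
    print(anova_f(["a", "b", "a", "b"], [1, 3, 5, 7]))
```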
Tests such as Pearson's correlation and chi-square could be done using an Excel spreadsheet. We will go into the details in later articles.
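For a categorical feature and a categorical outcome, the same calculation that a spreadsheet would do can also be scripted. The sketch below computes the chi-square test-of-independence statistic by comparing observed counts in the feature/outcome contingency table with the counts expected if the two variables were independent. The function name is an assumption, and it stops at the statistic and degrees of freedom; `scipy.stats.chi2_contingency` also returns the p-value.

```python
from collections import Counter


def chi_square_statistic(feature, outcome):
    """Chi-square test-of-independence statistic for two categorical variables.

    Returns (statistic, degrees_of_freedom).
    """
    if len(feature) != len(outcome):
        raise ValueError("feature and outcome must have the same length")
    n = len(feature)
    observed = Counter(zip(feature, outcome))  # contingency-table cell counts
    row_totals = Counter(feature)
    col_totals = Counter(outcome)
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            # Expected count under independence: row total * column total / n
            expected = row_total * col_total / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


if __name__ == "__main__":
    gender = ["m"] * 10 + ["f"] * 10
    churned = ["yes"] * 10 + ["no"] * 10
    print(chi_square_statistic(gender, churned))  # (20.0, 1)
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom; for one degree of freedom at the 5% significance level that value is about 3.841, so a statistic of 20.0 indicates a significant association.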
Feature Variance
Features whose values remain the same, or do not change much, across the different samples taken for hypothesis testing could be considered insignificant when building models. Such features are also termed low-variance features.
Features with variance below a certain threshold could be removed. Test engineers could write scripts to check feature variance from time to time and raise appropriate defects for the removal of such features.
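A variance check of the kind described above might look like the following sketch, which uses the standard library's `statistics.pvariance` to compute each feature's population variance and reports features at or below a threshold. The function name and the default threshold of 0.0 (which flags only constant features) are assumptions; scikit-learn's `VarianceThreshold` selector works in a similar spirit.

```python
import statistics


def low_variance_features(features, threshold=0.0):
    """Return (name, variance) pairs for features whose population variance
    is at or below threshold. With the default of 0.0, only constant
    features are flagged."""
    flagged = []
    for name, values in features.items():
        variance = statistics.pvariance(values)
        if variance <= threshold:
            flagged.append((name, variance))
    return flagged


if __name__ == "__main__":
    features = {
        "country": [1, 1, 1, 1],  # constant: variance 0
        "flag": [0, 0, 0, 1],     # rarely changes: variance 0.1875
        "age": [20, 30, 40, 50],  # variance 125
    }
    print(low_variance_features(features))                 # only "country"
    print(low_variance_features(features, threshold=0.2))  # "country" and "flag"
```

Because variance depends on the scale of a feature, a single threshold is only meaningful across features measured on comparable scales (for example, binary flags or features normalized to the same range).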
In later articles, we will discuss different techniques in Python and R that could be used for removing low-variance features.
Testing Feature Relevance – Feature Importance Technique
A given set of features could be run through some of the following models to test feature importance. This approach is also known as the embedded method of feature selection: the processes of feature selection and model training are merged, and the training process used for building the ML model produces a presumably relevant subset of features as a by-product. QA/test engineers should be trained to work with the following techniques:

Recursive partitioning, tree-based estimators such as the random forest algorithm could be used to compute feature importance, which in turn can be used to discard irrelevant features.

Linear models with Lasso regularization
Neural networks, SVMs, K-nearest neighbors, etc.
One could get started with the simplest of the above, such as tree-based estimators and Lasso.
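To show why Lasso works as an embedded method, the sketch below fits a Lasso model from scratch with coordinate descent, minimizing (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 (the same objective scikit-learn's `Lasso` uses). The L1 penalty shrinks the coefficients of features that add little predictive value to exactly zero, so those features become removal candidates. Features are standardized so coefficients are comparable, and the outcome is centered in place of fitting an intercept. The function names, `alpha=0.1` and the fixed iteration count are assumptions; in practice one would fit `sklearn.linear_model.Lasso` and inspect `coef_`, or read `feature_importances_` from a fitted `RandomForestClassifier`.

```python
def _standardize(column):
    """Scale a column to mean 0 and (population) standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    if std == 0:
        return [0.0] * n  # a constant column carries no information
    return [(v - mean) / std for v in column]


def _soft_threshold(value, alpha):
    """Shrink value toward zero by alpha, returning exactly 0.0 inside [-alpha, alpha]."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coefficients(features, outcome, alpha=0.1, n_iter=200):
    """Fit Lasso by cyclic coordinate descent; return {feature: coefficient}
    on the standardized scale. A coefficient of 0.0 marks a removal candidate."""
    names = list(features)
    if any(len(features[name]) != len(outcome) for name in names):
        raise ValueError("every feature must have one value per outcome")
    columns = [_standardize(features[name]) for name in names]
    n = len(outcome)
    y_mean = sum(outcome) / n
    residual = [v - y_mean for v in outcome]  # y - Xw, starting from w = 0
    weights = [0.0] * len(names)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            norm = sum(x * x for x in col) / n  # ~1.0 when standardized, 0.0 if constant
            if norm == 0:
                continue
            # Correlation of feature j with the residual, adding back its own contribution
            rho = sum(x * (r + x * weights[j]) for x, r in zip(col, residual)) / n
            new_weight = _soft_threshold(rho, alpha) / norm
            delta = new_weight - weights[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(col, residual)]
                weights[j] = new_weight
    return dict(zip(names, weights))


if __name__ == "__main__":
    x1 = [1, 2, 3, 4, 5, 6, 7, 8]
    x2 = [1, -1, -1, 1, 1, -1, -1, 1]  # unrelated to the outcome
    y = [3 * v for v in x1]
    print(lasso_coefficients({"x1": x1, "x2": x2}, y))  # x2 is shrunk to 0.0
```

Larger values of `alpha` push more coefficients to zero, so the choice of `alpha` controls how aggressively features are flagged.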
Summary
When starting out with QA or testing practices for predictive analytics or data science projects, testing feature relevance in relation to machine learning models is key and must be considered. We have seen some of the techniques, such as statistical approaches and feature importance techniques, that could be used for testing feature relevance. In future posts, I will present code samples and related perspectives to help you get started quickly.