Machine learning models are, in the end, software applications, so the quality assurance principles applied to conventional software development should also apply to building them. In this post, you will learn some of the key reasons why Quality Assurance (QA) is needed to make sure that only high-quality machine learning models are deployed in production. Machine learning models are often described as non-testable, which makes quality control checks and testing a challenge from a QA perspective. I have posted several other articles on quality assurance of machine learning models; please feel free to go through them and to share your thoughts or suggestions.
The following aspects of building machine learning (ML) models need to be examined when doing quality assurance testing:
- Data
- Features
- ML models
- ML pipeline
Key Reasons why QA is needed for Machine Learning Models
The following are some of the reasons why Quality Assurance (QA) is needed to perform quality control checks on machine learning models:
- Model Performance: Make sure that the overall performance of the model stays within acceptable limits. For example, if the model's accuracy at the time of deployment is 90% and the product management team has set the minimum acceptable accuracy at 87%, the QA team would check that accuracy stays at or above 87% and raise a flag in the issue tracking system otherwise.
- Model Trustability: Make sure that the model's predictions are reliable and can be trusted. This is one of the most important reasons why QA is needed to make sure that only high-quality machine learning models stay deployed in production, and it matters even more when the model is mission critical. For example, in the healthcare or financial domain, every prediction is important, so there should be some way to check the trustability of every prediction.
- Solution/Model Reliability: Make sure the solution is reliable in the sense that if model performance starts decreasing and falls below acceptable limits, an alternate model with higher accuracy can be deployed quickly. If no alternate ML model is available, the solution should fall back to a purely heuristics-based (rules) solution.
- Model Efficiency: Make sure that the model is efficient enough from the perspective of execution time and the resources used for each execution.
- Model Fairness: Make sure that the model is fair enough by analyzing its bias and variance. Ideally, the model should have low bias and low variance. The QA team should test this across different samples of data.
- Model Portability: Make sure that the model can be easily installed and deployed in production. In addition, it should be easy to roll the model back in case of issues such as degradation of model performance.
- Model Staleness: Make sure that the models deployed in production are not stale. A model becomes stale when data scientists do not update its algorithms or features and its performance starts deteriorating.
- Data Quality: Make sure that the models are not trained with an adversarial data set. Training on such data is also termed a data poisoning attack. Guarding against it requires analyzing the data at regular intervals.
- Data/Features Compliance: Make sure that the data used for building the features complies with business rules and regulations. Often, data that is prohibited for use in building models is not used directly. However, features derived through feature engineering could still end up making use of prohibited data, and the QA team would need to check for this.
- Features Importance: Make sure that the features used to build the models deployed in production are still relevant or important enough. If the feature importance has changed, raise an alert or log a defect in the bug tracking system.
- Features Correlation Analysis: Make sure that only relevant features have been used to build the models by making use of univariate, bivariate and multivariate exploratory data analysis techniques.
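A couple of the checks above lend themselves to automation. The following is a minimal sketch, not a definitive implementation, of two of them: the accuracy threshold check with the fallback decision (Model Performance and Solution/Model Reliability) and a feature importance drift check (Features Importance). The function names, the `QAResult` type, the action labels and the `tolerance` value are all hypothetical and chosen for illustration; the 87% limit is taken from the example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class QAResult:
    accuracy: float
    passed: bool
    action: str  # hypothetical labels: "keep", "deploy_alternate_model", "fall_back_to_rules"


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the expected labels."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def check_model_performance(y_true: Sequence, y_pred: Sequence,
                            min_accuracy: float = 0.87,
                            has_alternate_model: bool = False) -> QAResult:
    """Flag the model when accuracy drops below the agreed limit.

    Assumption: "no less than 87%" means accuracy >= 0.87 passes.
    """
    acc = accuracy(y_true, y_pred)
    if acc >= min_accuracy:
        return QAResult(acc, True, "keep")
    # Below the limit: prefer an alternate model, else fall back to rules.
    action = "deploy_alternate_model" if has_alternate_model else "fall_back_to_rules"
    return QAResult(acc, False, action)


def drifted_features(baseline: Dict[str, float], current: Dict[str, float],
                     tolerance: float = 0.1) -> List[str]:
    """Return features whose importance moved by more than `tolerance`.

    Assumption: importances are comparable scores (e.g. normalized to sum
    to 1); a feature missing from `current` is treated as importance 0.0.
    """
    return sorted(name for name, base in baseline.items()
                  if abs(base - current.get(name, 0.0)) > tolerance)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    preds = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]  # 8 of 10 correct
    print(check_model_performance(labels, preds))
    print(drifted_features({"age": 0.5, "income": 0.3, "zip": 0.2},
                           {"age": 0.45, "income": 0.1, "zip": 0.45}))
```

In practice the QA team would run checks like these on a schedule against freshly labelled production data, and a failed check would translate into a ticket in the issue tracking system rather than a printed result.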
References
You may want to check the following related posts on testing machine learning models from a QA perspective:
- Assessing the quality of AI models from QA standpoint
- Blackbox testing of machine learning models
- Why are machine learning models non-testable?
Summary
In this post, you learned about the need to set up a quality assurance process for performing quality control checks on machine learning models and certifying them as suitable to be moved into production. Of all the reasons covered, the most important reason why QA is needed for machine learning models is to ensure that the trustability of the model is high, or at least above an acceptable limit. Feel free to comment, suggest or share your thoughts.