In this post, you will learn how to define the quality of AI / machine learning (ML) models. A good understanding of what separates high-quality models from low-quality ones will help you design quality control checks for testing machine learning models and shape the related quality assurance (QA) practices. The post is aimed at QA professionals in general, but it should also help set perspectives for data scientists and machine learning experts.
The following are the key quality traits, described in detail later in this post, for assessing the quality of AI models:
- Functional suitability
- Maintainability
- Usability
- Efficiency
- Security
- Portability
When designing a QA practice and the related quality control checks, all of the above need to be considered for testing purposes.
This post covers the following topics:
- Definition of Quality of Products/Services
- Software Product Quality Attributes (ISO 25000)
- Quality Assessment of Machine Learning Models
Definition of Quality of Products/Services
The quality of a product or service, in general, can be defined in terms of the following:
- Degree of excellence: the extent to which the products and/or services meet the needs of business stakeholders such as customers, investors, employees, suppliers and partners. In other words, stakeholder satisfaction resulting from the higher productivity and safety that come from using the products/services is a good measure of quality in terms of degree of excellence.
- Degree of consistency and sustainability: whether the products and/or services keep meeting the needs of business stakeholders over a longer period of time.
- Degree of continuous improvement: whether the products and/or services are continuously improved so that they keep meeting customer needs in a sustained manner.
Delivering products and/or services of great quality to stakeholders (such as customers) helps businesses not only achieve strong results in revenue and profits but also seize opportunities to grow in the market.
To achieve the above, the following key processes need to be adopted:
- Governance: Govern the ongoing quality of products and/or services and take appropriate action when quality deteriorates.
- Assurance: Make sure that quality is maintained in a sustained, ongoing manner.
- Improvement: Make sure that quality improves over time as the business and the related stakeholders' requirements evolve.
Software Product Quality Attributes (ISO 25000)
The following are some of the criteria used to determine software product quality:
- Functional suitability
- Maintainability
- Usability
- Security
- Efficiency
- Reliability
- Portability
The above criteria are specified by the ISO/IEC 25000 series of standards, also known as SQuaRE. SQuaRE stands for Systems and software Quality Requirements and Evaluation.
Quality Assessment of Machine Learning Models
Applying the above software quality criteria to machine learning models gives the following:
- Functional suitability: The models should have the following characteristics to satisfy the functional suitability criterion of quality assessment:
  - Completeness / Correctness: Models should take into account all of the features that contribute to the prediction, and should use appropriate feature selection strategies (such as feature importance, wrapper methods etc.) to pick the most important ones. Quality control checks should validate some of the following:
    - The most appropriate features have been selected
    - Feature importance is tracked over time, to decide whether one or more existing features should be dropped or new features included
  - Accuracy: The model should perform well on metrics such as precision and recall. Tests should be run to check and track model performance and to raise a defect alert when performance deteriorates.
- Maintainability: The models should be easy to change and test. The key aspects of maintainability are changeability and testability.
  - Changeability: The model should be easy to change from some of the following perspectives:
    - The features of the model should be easy to change: it should be possible to choose or extract new features and to drop existing ones based on feature selection strategies such as wrapper and embedded methods. Upstream data dependencies should be considered when assessing the changeability of the model's features.
  - Testability: ML models are often claimed to be non-testable because no test oracle is available for them, that is, there is no independent way to know the correct output for an arbitrary input. The testability of ML models should therefore be explored using pseudo-oracles. The following are some techniques for testing ML models based on pseudo-oracles:
    - Metamorphic testing, where metamorphic relations based on one or more properties of the model are used to check pairs of inputs and outputs.
    - Comparing outputs from models created using different algorithms.
    - Comparing outputs on different data slices, where the slices are created based on certain characteristics of the data.
- Usability: The models should be easy to understand and learn. This applies to understanding the model's inputs and outputs, the machine learning algorithm used to build it, its features etc. Usability can be tracked from time to time using manual reviews.
- Efficiency: Higher-quality models tend to execute faster and consume fewer resources than their lower-quality counterparts. The QA team should measure the time and resources the model needs for each prediction.
- Security: The following are some of the security-related aspects that need to be tested and monitored from time to time:
  - Data privacy across the ML pipeline: Data flowing through the stages of the ML pipeline, such as data gathering, data exploration, data preparation, feature extraction and feature selection, needs to be access controlled to prevent unauthorized access.
  - Data/Feature compliance: Data that is not authorized for use as features often leaks into the model when data sets are mixed to create a new feature. This needs to be monitored from time to time.
  - Data poisoning: Data needs to be reviewed from time to time so that adversarial data does not end up being used in the features.
- Portability: The models should be easy to install. In addition, it should be easy to replace them with models built using other machine learning algorithms.
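The accuracy check described under functional suitability can be sketched as a small automated test. This is only a minimal illustration in plain Python: the function names, the 0.8 thresholds and the sample labels are assumptions made for this example, not values prescribed by any standard.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def check_model_quality(y_true, y_pred, min_precision=0.8, min_recall=0.8):
    """Return a list of defect messages; an empty list means the check passed."""
    precision, recall = precision_recall(y_true, y_pred)
    defects = []
    if precision < min_precision:
        defects.append(f"precision {precision:.2f} is below {min_precision:.2f}")
    if recall < min_recall:
        defects.append(f"recall {recall:.2f} is below {min_recall:.2f}")
    return defects


# Hypothetical labels from a recent batch of labelled data (illustration only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

defects = check_model_quality(y_true, y_pred)
for message in defects:
    print("DEFECT:", message)
```

Running a check like this on every new batch of labelled data, and recording the results, is one way to track model performance over time and raise a defect when it drops.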
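Metamorphic testing, listed under testability, can be illustrated with a toy example. The sketch below assumes a hand-written 1-nearest-neighbour classifier as a stand-in for a real model, and uses one possible metamorphic relation: shifting every training and test point by the same offset should leave the predictions unchanged, because distances between points are preserved. All names and data are hypothetical.

```python
def predict_1nn(train, point):
    """Predict the label of `point` from its nearest training example."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda row: squared_distance(row[0], point))
    return label


def shift(vector, offset):
    """Add the same offset to every coordinate of a feature vector."""
    return tuple(x + offset for x in vector)


def translation_violations(train, test_points, offset):
    """Return the test points whose prediction changes after the shift."""
    shifted_train = [(shift(features, offset), label) for features, label in train]
    violations = []
    for point in test_points:
        original = predict_1nn(train, point)
        follow_up = predict_1nn(shifted_train, shift(point, offset))
        if original != follow_up:
            violations.append(point)
    return violations


# Hypothetical integer-valued data, so the shift introduces no rounding error.
train = [((0, 0), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
test_points = [(0, 1), (5, 6), (2, 2)]

violations = translation_violations(train, test_points, offset=10)
print("relation holds" if not violations else f"violations: {violations}")
```

No expected label is needed for any test point: the test only checks that the original and follow-up predictions agree, which is what makes the relation usable as a pseudo-oracle.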
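For the efficiency criterion, time and memory per prediction can be measured with Python's standard library. The following is a rough sketch: `toy_predict` is a hypothetical stand-in for a real model's prediction call, and `tracemalloc` only traces memory allocated by Python itself, so memory used inside native libraries may not be counted.

```python
import time
import tracemalloc


def measure_latency(predict, inputs):
    """Average wall-clock seconds per prediction."""
    start = time.perf_counter()
    for x in inputs:
        predict(x)
    return (time.perf_counter() - start) / len(inputs)


def measure_peak_memory(predict, inputs):
    """Peak bytes of Python allocations traced while predicting."""
    tracemalloc.start()
    try:
        for x in inputs:
            predict(x)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def toy_predict(features):
    # Hypothetical stand-in for a real model's predict call.
    return 1 if sum(features) > 1.0 else 0


inputs = [(0.2 * i, 0.1 * i) for i in range(1000)]
latency = measure_latency(toy_predict, inputs)
peak = measure_peak_memory(toy_predict, inputs)
print(f"avg latency: {latency * 1e6:.1f} microseconds, peak memory: {peak} bytes")
```

Latency and memory are measured in separate passes because tracing allocations slows execution and would distort the timing.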
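The data/feature compliance check under security can be partly automated by comparing the features a model actually uses against an approved list. The sketch below assumes such a list exists and that feature names are available as strings; the names shown are made up for illustration.

```python
def find_unauthorized_features(model_features, approved_features):
    """Return model features that are not on the approved list, sorted by name."""
    return sorted(set(model_features) - set(approved_features))


# Hypothetical feature names, for illustration only.
approved = {"account_age_days", "monthly_spend", "num_logins"}
model_features = ["monthly_spend", "num_logins", "postal_code"]

unauthorized = find_unauthorized_features(model_features, approved)
if unauthorized:
    print("Unauthorized features found:", unauthorized)
```

A name-based check like this does not catch unauthorized data hidden inside a derived feature, so it complements the periodic manual review of how features are built rather than replacing it.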
Summary
This post described how to define and assess the quality of machine learning models. If you liked the article, please share it. Feel free to comment on the content or suggest improvements so that I can add more detail.