Assessing Quality of AI Models from QA Standpoint

In this post, you will learn how the quality of AI / machine learning (ML) models can be defined. Understanding what distinguishes a high-quality model from a low-quality one will help you design quality control checks for testing machine learning models and shape the related quality assurance (QA) practices. The post is aimed at QA professionals in general, but it should also offer a useful perspective to data scientists and machine learning experts.

The following are the key quality traits, described in detail later in this post, for assessing the quality of AI models:

  • Functional suitability

  • Maintainability

  • Usability

  • Efficiency

  • Security

  • Portability

When designing a QA practice and the related quality control checks, all of the above need to be considered for testing.

This post covers the following topics:

  • Definition of Quality of Products/Services

  • Software Product Quality Attributes (ISO 25000)

  • Quality Assessment of Machine Learning Models

Definition of Quality of Products/Services

The quality of a product or service, in general, can be defined in terms of the following:

  • The degree of excellence: the extent to which the products and/or services meet the needs of business stakeholders such as customers, investors, employees, suppliers, and partners. In other words, stakeholder satisfaction resulting from the higher productivity and safety gained by using the products/services is a good measure of quality in terms of degree of excellence.

  • The degree of consistency and sustainability: whether the products and/or services keep meeting the needs of business stakeholders over a longer period of time.

  • The degree of continuous improvement: whether the products and/or services are continuously improved so that they keep meeting customer needs in a sustained manner.

Delivering products and/or services of great quality to stakeholders (such as customers) helps businesses not only achieve strong revenue and profits but also seize growth opportunities in the market.

To achieve the above, the following key processes need to be adopted:

  • Governance: Govern the ongoing quality of products and/or services and take appropriate action in case the quality deteriorates

  • Assurance: Make sure that the quality is maintained in a sustained/ongoing manner

  • Improvement: Make sure that the quality is improved over a period of time with the evolution of business and related stakeholders’ requirements.

Software Product Quality Attributes (ISO 25000)

The following are some of the criteria by which software product quality is determined:

  • Functional suitability

  • Maintainability

  • Usability

  • Security

  • Efficiency

  • Reliability

  • Portability

The above criteria are specified by the ISO/IEC 25000 series of standards, also known as SQuaRE. SQuaRE stands for Systems and software Quality Requirements and Evaluation.

Quality Assessment of Machine Learning Models

Based on the above criteria for assessing software quality, the following applies to machine learning models:

Fig 1. Quality Attributes of Machine Learning Models

  • Functional suitability: The models should have the following characteristics to satisfy the functional suitability criterion of quality assessment:

    • Completeness / Correctness: Models should take into account all of the features that contribute to the prediction. They should use appropriate feature selection strategies (such as feature importance or wrapper methods) to identify the most important features. Quality control checks should validate some of the following:

      • The most appropriate features have been selected

      • Feature importance is tracked over time to decide whether one or more existing features need to be dropped or new features included

    • Accuracy: The model should perform well on measures such as precision and recall, judged against targets agreed for the use case. Tests should be run to check and track model performance and raise a defect alert when performance deteriorates.
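The performance tracking described under Accuracy can be sketched as follows. This is a minimal illustration in plain Python rather than a prescribed implementation: the function names `precision_recall` and `performance_alerts`, the binary 0/1 labels, and the 0.8 thresholds are all assumptions made for the example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def performance_alerts(y_true, y_pred, min_precision=0.8, min_recall=0.8):
    """Return one alert message per metric that falls below its threshold.

    The thresholds are hypothetical; a real QA check would use targets
    agreed with the business stakeholders.
    """
    precision, recall = precision_recall(y_true, y_pred)
    alerts = []
    if precision < min_precision:
        alerts.append(
            f"precision {precision:.2f} is below the minimum {min_precision:.2f}"
        )
    if recall < min_recall:
        alerts.append(
            f"recall {recall:.2f} is below the minimum {min_recall:.2f}"
        )
    return alerts


# Labels collected for a recent batch of predictions (made-up data).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
for message in performance_alerts(actual, predicted):
    print("DEFECT ALERT:", message)
```

Running such a check on every new batch of labelled data, and logging the metrics each time, gives the QA team a simple way to notice when performance starts to deteriorate.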

  • Maintainability: The models should be easy to change and test. The key aspects of maintainability are changeability and testability.

    • Changeability: The model should be easy to change from some of the following perspectives:

      • The features of the model should be easy to change: new features can be selected or extracted, and existing features can be dropped, based on feature selection strategies such as wrapper or embedded methods. Upstream data dependencies should be considered when assessing the changeability of the model's features.

    • Testability: ML models are often described as non-testable because no test oracle is available, that is, there is no independent mechanism that tells whether a given output is correct. The testability of ML models should therefore be explored using pseudo-oracles. The following are some of the techniques for testing ML models based on pseudo-oracles:

      • Metamorphic testing, where metamorphic relations (properties describing how the output should change, or stay the same, when the input is transformed in a known way) are used to check input-output pairs.

      • Comparing outputs from models created using different algorithms

      • Comparing outputs on different data slices, where slices are created based on certain characteristics of the data.
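Two of the pseudo-oracle techniques listed above, metamorphic testing and comparison across data slices, can be sketched in plain Python. The toy 1-nearest-neighbour classifier, the scaling relation, the `region` slice key, and every function name here are hypothetical choices made for illustration; a real model and its relations would differ.

```python
from collections import defaultdict


def predict_1nn(train, query):
    """Toy model: label of the training point nearest to the query."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(train, key=lambda row: sq_dist(row[0], query))[1]


def scale(features, factor):
    return tuple(x * factor for x in features)


def metamorphic_scaling_violations(train, queries, factor=2.0):
    """Metamorphic relation: scaling every feature of the training data
    and of the query by the same positive factor should not change the
    nearest neighbour, so the prediction should stay the same."""
    scaled_train = [(scale(f, factor), label) for f, label in train]
    return [
        q for q in queries
        if predict_1nn(train, q) != predict_1nn(scaled_train, scale(q, factor))
    ]


def accuracy_by_slice(rows, predict, slice_key):
    """Accuracy per data slice; rows are (features, label, metadata)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for features, label, meta in rows:
        key = meta[slice_key]
        totals[key] += 1
        hits[key] += int(predict(features) == label)
    return {key: hits[key] / totals[key] for key in totals}


train = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
         ((5.0, 5.0), "high"), ((6.0, 5.5), "high")]
queries = [(1.2, 1.1), (5.5, 5.0), (3.0, 3.0)]
print(metamorphic_scaling_violations(train, queries))

labelled = [((1.1, 1.0), "low", {"region": "north"}),
            ((5.2, 5.1), "high", {"region": "north"}),
            ((3.0, 3.0), "high", {"region": "south"}),
            ((6.0, 6.0), "high", {"region": "south"})]
print(accuracy_by_slice(labelled, lambda f: predict_1nn(train, f), "region"))
```

Neither check needs to know the correct output in advance for every input: the first only compares the model with itself under a transformation, and the second highlights slices where the model behaves noticeably worse than elsewhere.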

  • Usability: The models should be easy to understand and learn. This applies to understanding the model's inputs and outputs, the machine learning algorithm used to build it, its features, and so on. Usability can be tracked periodically through manual reviews.

  • Efficiency: Higher-quality models tend to execute faster and consume fewer resources than comparable alternatives. The QA team should measure the time and resources required for the model to produce each prediction.
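The efficiency measurement described above can be sketched with the Python standard library. The function name `profile_predictions` and the stand-in prediction function are assumptions made for the example; `tracemalloc` only sees memory allocated by Python itself, so native libraries would need a different tool.

```python
import statistics
import time
import tracemalloc


def profile_predictions(predict, inputs):
    """Measure per-prediction latency and peak Python memory usage."""
    inputs = list(inputs)
    if not inputs:
        raise ValueError("at least one input is required")

    # Pass 1: wall-clock latency per prediction, without tracing overhead.
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - start)

    # Pass 2: peak memory allocated by Python while predicting.
    tracemalloc.start()
    for x in inputs:
        predict(x)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "calls": len(inputs),
        "mean_latency_s": statistics.mean(latencies),
        "max_latency_s": max(latencies),
        "peak_memory_bytes": peak_bytes,
    }


# Stand-in for a real model's predict function (hypothetical).
def toy_predict(features):
    return sum(features) > 10


report = profile_predictions(toy_predict, [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(report)
```

Latency and memory are measured in separate passes so that the tracing overhead does not inflate the timings; the cost is that every input is predicted twice.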

  • Security: The following are some of the security-related aspects that need to be tested and monitored periodically:

    • Data privacy across the ML pipeline: Data flowing through the ML pipeline, which consists of stages such as data gathering, data exploration, data preparation, feature extraction, and feature selection, needs to be access controlled to prevent unauthorized access.

    • Data/Feature compliance: Data that is not authorized for use as features often leaks into the model when data sets are combined to create a new feature. This needs to be monitored periodically.

    • Data poisoning: Data needs to be reviewed periodically to prevent adversarial data from being used in features.
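The data/feature compliance check described above can be sketched as a lineage lookup. This assumes that the team records, for each model feature, the source columns it was derived from; the function name `find_restricted_features` and the column names are hypothetical.

```python
def find_restricted_features(feature_lineage, restricted_columns):
    """Return features derived from columns that are not authorized.

    feature_lineage maps each model feature to the source columns it
    was built from; restricted_columns lists columns that must not be
    used as, or inside, a feature.
    """
    restricted = set(restricted_columns)
    violations = {}
    for feature, sources in feature_lineage.items():
        leaked = sorted(set(sources) & restricted)
        if leaked:
            violations[feature] = leaked
    return violations


# Made-up lineage for three features.
lineage = {
    "income_band": {"salary"},
    "risk_score": {"age", "postcode", "salary"},
    "tenure_months": {"start_date"},
}
print(find_restricted_features(lineage, {"age", "postcode"}))
```

A check like this catches the case the text warns about, where a restricted column enters the model indirectly through a newly derived feature.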

  • Portability: The models should be easy to install. In addition, it should be easy to replace them with models built using other machine learning algorithms.

Summary

This post described how to define and assess the quality of machine learning models. If you liked the article, please share it. Feel free to comment or suggest improvements to the content so that I can provide greater detail.

Ajitesh Kumar
