In this post, you will learn how metamorphic testing can be used to perform quality control checks on machine learning models. The post is primarily meant for data science quality assurance (QA) specialists who plan test cases for machine learning (ML) model implementations from a QA perspective.
Testing machine learning models from a quality assurance perspective is different from evaluating their accuracy or performance.
The word “testing” is an overloaded technical term: machine learning experts and the software engineering community in general use it to mean different things.
In this post, the following topics are discussed:
- Introduction to metamorphic testing
- Why metamorphic testing for machine learning models?
- Automated metamorphic testing of ML models
Introduction to Metamorphic Testing
Conventional software testing assumes the presence of a test oracle, meaning that the output of a software application can be verified against expected values by a tester or by testing mechanisms such as automated tests. However, when a test oracle is unavailable, or when using one is too costly in terms of time and effort, there is a need for a kind of testing that does not depend on a test oracle. This is where metamorphic testing comes into the picture.
Metamorphic testing makes use of properties of the application such that, if an input associated with those properties is modified in a certain way, it should be possible to predict how the new output relates to the original output.
The test compares the original and modified inputs and their corresponding outputs to determine whether the implementation behaves correctly. The relation used for this check is also termed a pseudo-oracle.
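A single metamorphic test can be sketched in a few lines of Python. The helper metamorphic_check below is hypothetical, written only for illustration. It uses a classic example from outside ML, the sine function: checking sin(1.234) against an exact expected value would need an oracle, but the relation sin(pi - x) = sin(x) can be checked using only the program's own outputs.

```python
import math
from typing import Callable, TypeVar

InputT = TypeVar("InputT")    # type of the program's input
OutputT = TypeVar("OutputT")  # type of the program's output


def metamorphic_check(
    program: Callable[[InputT], OutputT],
    source_input: InputT,
    transform: Callable[[InputT], InputT],
    relation: Callable[[OutputT, OutputT], bool],
) -> bool:
    """Run one metamorphic test (hypothetical helper, for illustration).

    No expected output is needed: the program is run on a source input and on a
    follow-up input derived from it, and the two outputs are checked against the
    relation that should hold between them.
    """
    source_output = program(source_input)
    follow_up_output = program(transform(source_input))
    return relation(source_output, follow_up_output)


# The exact value of sin(1.234) is never needed; only sin(pi - x) == sin(x)
# (up to floating-point tolerance) is checked.
passed = metamorphic_check(
    program=math.sin,
    source_input=1.234,
    transform=lambda x: math.pi - x,
    relation=lambda y_src, y_follow: math.isclose(y_src, y_follow, rel_tol=1e-9),
)
print(passed)  # True
```

For an ML model, program would be the model's prediction function, transform a change to one input feature, and relation the expected effect of that change on the prediction.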
The key to implementing metamorphic testing is identifying the metamorphic relations (MRs).
Metamorphic testing uses metamorphic relations to come up with test plans (made up of test cases) that can be used to test machine learning models. Metamorphic relations are properties that relate multiple pairs of inputs and outputs of the target program or application. Let’s try to understand this with an example.
In the diagram given below, hypothetically speaking, a property (a feature of the model) such as age relates the inputs to the outputs of the ML system. The output of the ML model (Y) is the estimated likelihood of a person suffering from a disease, and the input parameter is age (X). As age increases, with the other important features held unchanged, the likelihood of the person suffering from the disease increases. This establishes a metamorphic relation between the input (age) and the output (likelihood estimate of suffering from the disease).
Given the example illustrated in the diagram, a set of test cases (a test plan) can be planned in the following manner:
- Test case 1: For age X1 = 30, let’s say the output (likelihood estimate Y1) is 0.35.
- Test case 2: For age X2 = 50, the value of Y2 should be greater than Y1.
- Test case 3: For age X3 = 40, the value of Y3 should be greater than Y1 and less than Y2.
- Test case 4: For age X4 = 20, the value of Y4 should be less than Y1.
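The four test cases above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: toy_disease_model is a hypothetical hand-written logistic function standing in for the trained model, and bmi is a hypothetical second feature that is held fixed while age varies. The toy model does not reproduce the 0.35 value from the example; only the ordering of the outputs is checked, so no expected likelihood values are needed.

```python
import math
from typing import Callable, Dict

# Hypothetical signature of the model under test: likelihood = model(age, bmi).
Model = Callable[[float, float], float]


def toy_disease_model(age: float, bmi: float = 25.0) -> float:
    """Hypothetical stand-in for the trained model (a hand-written logistic function).

    Replace with a call to the real model's prediction function.
    """
    z = 0.05 * (age - 40.0) + 0.02 * (bmi - 25.0)
    return 1.0 / (1.0 + math.exp(-z))


def run_age_test_plan(model: Model, bmi: float = 25.0) -> Dict[str, bool]:
    """Execute the age test plan, holding the other feature (bmi) fixed.

    Only relations between outputs are checked; no expected values are assumed.
    """
    y1 = model(30.0, bmi)  # test case 1 (source): X1 = 30
    y2 = model(50.0, bmi)  # test case 2: X2 = 50
    y3 = model(40.0, bmi)  # test case 3: X3 = 40
    y4 = model(20.0, bmi)  # test case 4: X4 = 20
    return {
        "Y2 > Y1": y2 > y1,
        "Y1 < Y3 < Y2": y1 < y3 < y2,
        "Y4 < Y1": y4 < y1,
    }


if __name__ == "__main__":
    for name, passed in run_age_test_plan(toy_disease_model).items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Swapping in a model whose likelihood falls with age makes all three checks fail, which is the kind of defect such relations can catch without a test oracle.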
To come up with a metamorphic test plan for quality checks on ML models, the following steps are needed:
- Data scientists (QA) or test engineers need to work with product managers to understand the business problem that the ML model solves.
- Test engineers then need to work with data scientists to understand model details such as the type of learning, high-level details of the learning algorithm, the features, etc.
- Based on the above, test cases need to be thought through and made part of the test plan. The test plan then needs to be reviewed and confirmed by product managers and data scientists.
- The test plan can then be automated using a programming language or scripts.
Why Metamorphic Testing for Machine Learning Models?
Let’s try to understand why metamorphic testing is needed for quality checks on ML models.
In conventional QA testing, the software output is verified against an expected outcome that is known beforehand. The source of these expected values is known as an “oracle” or “test oracle”. The presence of a test oracle is a frequently invoked assumption in software development: it states that the output of an implementation can be verified against expected values by testers or by testing mechanisms such as automated test programs.
In the case of machine learning models, the assumption that a test oracle exists often does not hold. Machine learning models belong to a class of scientific software that generates new answers (not known beforehand) from a set of input values. As each output can be unique and cannot be verified against “so-called” expected values, some other mechanism is needed to test the output of ML models. However, when people speak of “testing” a model, they usually mean the development (model building) phase, when data scientists evaluate model performance by comparing the model outputs (predicted values) with the actual values. This is not the same as testing the model on an arbitrary input for which the expected value is not known.
This is where metamorphic testing comes into the picture. Metamorphic testing allows creating test plans made up of test cases based on metamorphic relations, which can be used to verify the correctness of ML models. The next section looks at how these test cases can be automated.
Automated Metamorphic Testing of ML Models
As described in the earlier section, once one or more metamorphic relations are determined, it becomes simpler to come up with test cases that can be used to verify the correctness of ML model predictions. These test cases can be automated in the following ways:
- Independent test cases that check individual input-output pairs against a metamorphic relation
- New test cases derived logically from the successful execution of previous test cases. This is unlike conventional testing, where successful test cases do not lead to new ones; rather, test cases are determined before test execution starts.
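The second kind of automation, where passing test cases give rise to new ones, can be sketched as follows. The derivation strategy here is one possibility assumed for illustration, not something prescribed by metamorphic testing: each passing check of the relation "higher age gives a higher likelihood" on an age interval spawns two follow-up checks on the halves of that interval. Starting from X1 = 30 and X2 = 50, the first derived tests compare 30 with 40 and 40 with 50, which together amount to the X3 = 40 test case (Y1 < Y3 < Y2). The model, derive_and_run, and the max_depth parameter are hypothetical.

```python
import math
from collections import deque
from typing import Callable, List, Tuple


def toy_disease_model(age: float) -> float:
    """Hypothetical stand-in for the model under test, other features held fixed."""
    return 1.0 / (1.0 + math.exp(-0.05 * (age - 40.0)))


def derive_and_run(
    model: Callable[[float], float],
    low_age: float,
    high_age: float,
    max_depth: int = 3,
) -> List[Tuple[float, float, bool]]:
    """Check "higher age -> higher likelihood", deriving new test cases as tests pass.

    A passing test on an age interval spawns follow-up tests on its two halves;
    a failing test spawns nothing, so it can be investigated as-is.
    Returns (lower age, higher age, passed) for every executed test case.
    """
    executed: List[Tuple[float, float, bool]] = []
    pending = deque([(low_age, high_age, 0)])  # (lower, higher, derivation depth)
    while pending:
        lower, higher, depth = pending.popleft()
        passed = model(higher) > model(lower)
        executed.append((lower, higher, passed))
        if passed and depth < max_depth:
            mid = (lower + higher) / 2.0
            pending.append((lower, mid, depth + 1))
            pending.append((mid, higher, depth + 1))
    return executed


if __name__ == "__main__":
    for lower, higher, passed in derive_and_run(toy_disease_model, 30.0, 50.0, max_depth=2):
        print(f"age {lower:g} vs {higher:g}: {'PASS' if passed else 'FAIL'}")
```

With max_depth=2 and a model that satisfies the relation, seven test cases run in total: the original one, plus two and then four derived ones.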
The automation can be written in any scripting or programming language and run as a step in the continuous integration/continuous delivery (CI/CD) workflow of build and deployment automation, using tools such as Jenkins.
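For CI/CD, the same relation can be packaged as an ordinary test module that runs on every build. The sketch below uses Python's standard unittest module; the file name test_metamorphic_age.py, the predict_likelihood stand-in, and the bmi values are hypothetical. A Jenkins stage (or a step in any other CI tool) could run python -m unittest test_metamorphic_age, which exits with a non-zero status when a test fails and so fails the build.

```python
"""test_metamorphic_age.py (hypothetical name): metamorphic tests for a CI step.

Run with:  python -m unittest test_metamorphic_age
"""
import math
import unittest


def predict_likelihood(age: float, bmi: float = 25.0) -> float:
    """Hypothetical stand-in for loading and calling the deployed model."""
    z = 0.05 * (age - 40.0) + 0.02 * (bmi - 25.0)
    return 1.0 / (1.0 + math.exp(-z))


class AgeMonotonicityTest(unittest.TestCase):
    """MR: with the other features fixed, a higher age gives a higher likelihood."""

    # (lower age, higher age) pairs taken from the age test plan.
    AGE_PAIRS = [(30.0, 50.0), (30.0, 40.0), (40.0, 50.0), (20.0, 30.0)]
    # Hypothetical values for the other feature, held fixed within each check.
    BMI_VALUES = [18.5, 25.0, 32.0]

    def test_likelihood_increases_with_age(self):
        for bmi in self.BMI_VALUES:
            for lower, higher in self.AGE_PAIRS:
                # subTest reports every failing combination instead of stopping at the first.
                with self.subTest(bmi=bmi, lower=lower, higher=higher):
                    self.assertGreater(
                        predict_likelihood(higher, bmi),
                        predict_likelihood(lower, bmi),
                    )
```

Keeping the relations in a test module means the metamorphic test plan is versioned and reviewed alongside the model code.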
You may want to check some of the following related articles published on QA and Machine Learning:
- Why is machine learning systems non-testable?
- Testing features of ML models
- QA of ML models with PDCA cycle
In this post, you learned about applying metamorphic testing to assess the quality of machine learning models. If you are a quality assurance (QA) professional or a test engineer, you could suggest and apply this technique to plan the test cases and test plan for quality control checks on machine learning models built in your organization. Please feel free to share your comments or suggestions, and share the post if you liked it. I would greatly appreciate it.