Have you wondered around what would it be like to have your machine learning (ML) models come under security attack? In other words, your machine learning models get hacked. Have you thought through how to check/monitor security attacks on your AI models? As a data scientist/machine learning researcher, it would be good to know some of the scenarios related to security/hacking attacks on ML models.
In this post, you would learn about some of the following aspects related to security attacks (hacking) on machine learning models.
- Examples of Security Attacks on ML Models
- Hacking machine learning (ML) models means…?
- Different types of Security Attacks
- Monitoring security attacks
Examples of Security Attacks on ML Models
Most of the time, in my opinion, it is the classification models which would come under the security attacks. The following are some of the examples:
- Spam messages’ filters which are trained with adversary data sets to incorrectly classify the spam messages as good message leading to compromising of system’s integrity; Alternatively, the Spam messages’ filter trained inappropriately to block the good messages thereby compromising system’s availability.
- Classification models such as loan sanction models which could result in classifying the possible defaulter as good one thus leading to approval of loans which could later result in the business loss
- Classification models such as credit risk models which could be trained with adversary datasets to classify the business/buyer incorrectly to allow greater credit risk exposures.
- Classification models such as insurance which could be trained inappropriately to offer greater discounts to a section of users leading to business loss; Alternatively, models trained inappropriately to approve the insurance to those who are not qualified.
Hacking Machine Learning (ML) Models means…?
The machine learning model can be said to be hacked in the following scenarios:
- Compromising ML Models Integrity: In case the ML models fails to filter one or more negative cases, the ML model can be said to be hacked. This would mean that the integrity of the system gets compromised in the sense that negative cases sneak pass the system. Consider an example of the SPAM filter. Attackers could devise messages which consist of spam words but do not get caught by SPAM filter. This is achieved using exploratory testing. Once attackers get success in identifying the spam messages which bypasses the SPAM filter, he/she could send regular messages which result in the inclusion of such messages in retraining of the model at a later date. After the updated model is deployed, attackers could them run spam email campaign which would bypass the SPAM filter thereby compromising the SPAM filter integrity.
- Compromising ML Models Availability: In case the ML models start filtering out legitimate cases, the ML model can be said to be hacked. This would mean that the ML model is made unavailable. In order words, it is like denial of service (DOS) attack where legitimate cases fail to pass through the system thereby compromising the system availability. For example, let’s say the attacker/hacker sends some benign words with spam messages which get filtered by the SPAM filter. In future, the SPAM filter gets retrained with messages consisting of both SPAM and benign words which result in the filtering of a certain class of benign email as SPAM messages.
Different Types of Security Attacks
The following diagram represents the flow of security attacks (Threat Model) on the ML models:
The following are different types of security attacks which could be made on machine learning models:
- Exploratory attacks representing attackers trying to understand model predictions vis-a-vis input records. The primary goal of this attack which often would go unnoticed by the system is to understand that model behavior vis-a-vis features vis-a-vis features value. The attack could be related to targeting a specific class of input values or targeting all kind of input values. The attack is to understand the input records values which would result in compromising ML models integrity (false negatives) and devising input records which would result in compromising ML models availability (false positives).
- Exploratory attacks could be used to identify the model’s behavior in a specific class of data. This can also be termed as the targeted exploratory attack.
- Exploratory attacks could be used to identify the model’s behavior on all kinds of data. This can also be termed as the indiscriminate exploratory attack.
- Causative Attacks resulting in altering training data & related model: Based on exploratory attacks, the attacker could create appropriate input records pass through the system at regular intervals resulting into model either letting the bad records to sneak in or blocking good records to pass through. Alternatively, attackers could also hack into the system thereby altering the training data at large. The model when, later, gets trained with the training data, allows attackers to either compromise the systems integrity or availability as described in the above section.
- Integrity attacks compromising system’s integrity: With the model trained with attackers’ data allowing the bad inputs to pass through the system, the attacker could, on regular basis, compromise the systems’ integrity by having the system label bad input as good ones. This is as like system labeling the bad records as incorrectly label as negative which can also be called a false negative.
- Availability attacks compromising system’s availability: With the model trained with attackers’ data allowing the good inputs to get filtered through the system, the system would end up filtering out legitimate records false terming them as positive. This is similar to system labeling the good input record as positive which later turns out to be false positive.
Monitoring Security Attacks
The following are some thoughts regarding how one could go about monitoring security attacks:
- Review the training data at regular intervals to identify the adversary data sets lurking in the training dataset; This could be done with both product managers/business analyst and data scientist taking part in the data review.
- Review the test data and model predictions
- Perform random testing and review the outcomes/predictions. This could also be termed as security vulnerability testing. Product managers/business analysts should be involved in identifying the test datasets for security vulnerability testing.
References
Summary
In this post, you learned about different aspects of security attacks/hacking of machine learning models. Primarily, the machine learning models could be said to be compromised if it fails to label the bad input data accurately due to attackers/hackers exploring attacks scenario and hacking the training data sets appropriate to alter the ML model performance. Machine learning models’ integrity gets compromised if it lets bad data to pass through the system (false negative). Machine learning models’ availability gets compromised if it blocks or filters good data from passing through the system (false positive).
- What are AI Agents? How do they work? - January 7, 2025
- Agentic AI Design Patterns Examples - January 6, 2025
- List of Agentic AI Resources, Papers, Courses - January 5, 2025
I found it very helpful. However the differences are not too understandable for me