What is Data Poisoning & Input Manipulation?
In artificial intelligence (AI) systems, there is a growing focus on securing the integrity and quality of the data used for training and operation. Among the various threats to AI systems, is feeding the AI bad data at critical times of its life cycle. Adversaries can
- provide bad data during training, known as data poisoning, which causes the model to learn the wrong thing, and
- change input data, e.g., prompts, when the model is in use, known as input manipulation, which causes the model to do the wrong thing, provide bad answers, or release training data embedded in the AI.
These threats undermine the reliability ofAI models and lead to significant risks of bad performance.
Why does it matter?
Data poisoning and input manipulation are critical issues because they directly affect the quality and accuracy of the AI’s outputs. Malicious actors can exploit these vulnerabilities to degrade the performance of AI systems, introduce bias, or mislead decision-making processes.
What is Data Poisoning?
Data poisoning involves injecting harmful or misleading data into the training dataset of an AI system. The corrupted data can cause the AI to produce incorrect or biased outputs. Bad data can enter into the training data set either by actions of a malicious actor or by provided by bad sensors (e.g., caused by technical errors). For example, consider a spam filter trained to identify unwanted emails. An attacker deliberately injects a batch of spam emails into the training data, labeling them as "not spam." When the system trains on this corrupted data, it learns to misclassify genuine spam emails as "not spam," allowing more spam to slip through the filter and reach users' inboxes.
In some cases, data poisoning is used to target specific vulnerabilities in AI systems, aiming to degrade performance in a way that benefits the attacker. Once an AI system is trained on poisoned data, the effects can be long-lasting and difficult to repair without significant retraining.
Because data poisoning can result from both bad sensors and from malicious actions, it is important to have ways to identify and remove bad data to ensure the integrity of the resulting AI system.
What is Input Manipulation?
Input manipulation involves altering the input data fed to an AI system in real-time to mislead its decision-making process. This can have immediate and potentially harmful consequences. Attackers can craft specific inputs designed to deceive the AI system, causing it to make incorrect decisions. For instance, adding small, undetectable changes to an image can cause an image recognition system to misclassify it.
For example, imagine an AI system used for autonomous driving. This system relies on image recognition to identify traffic signs, such as stop signs. Attackers could manipulate the input data by placing small stickers or marks on a stop sign. These changes might be nearly invisible to human drivers but could be enough to cause the AI’s image recognition system to misclassify the stop sign as a yield sign or even a speed limit sign. As a result, the autonomous vehicle might fail to stop at the intersection, potentially causing accidents.
Manipulated input can also be used to exploit weaknesses in the AI’s processing capabilities. This can result in misleading queries or data input that can cause the AI to generate irrelevant or harmful responses. Thus, understanding input manipulation and knowing how to mitigate it are crucial.
The LeakPro project at AI Sweden focuses on identifying and mitigating instances of information leakage resulting from input manipulation in machine learning models.
Data poisoning and input manipulation represent significant threats to the security and reliability of AI systems. In the next newsletter, Generative AI Hallucinations will be introduced.
Contact us if you are a partner in AI Sweden and want to learn more or engage with AI Sweden in AI security.
→ Sign up here to receive updates from the AI Security Newsletter!
Related material
LeakPro: Leakage profiling and risk oversight for machine learning models
https://www.ai.se/en/project/leakpro-leakage-profiling-and-risk-oversight-machine-learning-models
Student projects at AI Sweden
Federated Fleet Learning: https://www.ai.se/en/project/federated-fleet-learning
Federated Learning In Banking: https://www.ai.se/en/project/federated-learning-banking
July 15, 2024 by Madeleine Xia