// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Data Poisoning

An attacker intentionally feeds bad or malicious data into an AI model's training set to make it perform poorly or behave in a specific way.

TECHNICAL DEFINITION

A security attack where an adversary injects malicious or corrupted data into an AI model's training dataset, aiming to degrade its performance, introduce vulnerabilities, or manipulate its behavior during inference.
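
The effect described above can be illustrated with a toy example. This is a minimal sketch, not a real attack or a production model: the one-dimensional nearest-centroid classifier, the data values, and the function names (`fit_centroids`, `predict`, `accuracy`) are all assumptions made up for illustration. It shows how a few injected, mislabeled training samples can shift a model's decision boundary and lower its accuracy on clean test data.

```python
from statistics import mean


def fit_centroids(samples):
    """Return {label: mean feature value} for a list of (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Predict the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    """Fraction of (value, label) pairs that are classified correctly."""
    correct = sum(predict(centroids, value) == label for value, label in samples)
    return correct / len(samples)


# Hypothetical clean training data: class 0 clusters near 1, class 1 near 9.
clean_train = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
test_set = [(3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1)]

# Hypothetical poison: extreme values deliberately labeled as class 0,
# which drags the class 0 centroid far away from the real class 0 data.
poison = [(20.0, 0)] * 3

clean_acc = accuracy(fit_centroids(clean_train), test_set)
poisoned_acc = accuracy(fit_centroids(clean_train + poison), test_set)

print(f"accuracy with clean training data:    {clean_acc}")
print(f"accuracy with poisoned training data: {poisoned_acc}")
```

In this sketch the attacker never touches the model or the test data. Changing the training set alone is enough to change the model's behavior at inference time.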

BACKGROUND

Data poisoning is often discussed alongside prompt injection, but the two attacks target different stages of a model's life cycle. Data poisoning corrupts the data a model is trained on, whereas prompt injection is a cybersecurity exploit in which innocuous-looking inputs are designed to cause unintended behavior in an already trained machine learning model, particularly a large language model (LLM). Prompt injection takes advantage of the model's inability to distinguish between developer-defined prompts and user inputs in order to bypass safeguards and influence model behavior. While LLMs are designed to follow trusted instructions, carefully crafted inputs can manipulate them into producing unintended responses.

SYNONYMS & ALIASES

  • Training data manipulation
  • Adversarial data injection
  • Integrity attack

USAGE NOTE

Data poisoning can produce biased or unreliable AI models, so validating training data before it is used is essential.
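
One simple form of data validation is screening each class for anomalous samples before training. The following is a rough sketch under stated assumptions: the function name `filter_outliers`, the threshold `k`, and the sample data are hypothetical, and the check (distance from the per-class median, measured in median absolute deviations) is only one of many possible heuristics. It would not catch poisoned samples that stay close to legitimate values.

```python
from statistics import median


def filter_outliers(samples, k=3.0):
    """Drop (value, label) pairs that lie far from their class median.

    A sample is kept when its distance from the class median is at most
    k times the class's median absolute deviation (MAD). If the MAD is
    zero, the class is left untouched, because no spread can be estimated.
    """
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)

    # Per-class robust statistics: (median, MAD).
    stats = {}
    for label, values in by_label.items():
        center = median(values)
        mad = median([abs(v - center) for v in values])
        stats[label] = (center, mad)

    kept = []
    for value, label in samples:
        center, mad = stats[label]
        if mad == 0 or abs(value - center) <= k * mad:
            kept.append((value, label))
    return kept


# Hypothetical training set with one suspicious sample labeled as class 0.
training = [(0.0, 0), (1.0, 0), (2.0, 0), (20.0, 0),
            (8.0, 1), (9.0, 1), (10.0, 1)]

cleaned = filter_outliers(training)
print(cleaned)
```

The median and MAD are used here instead of the mean and standard deviation because they are less affected by the outliers being screened for. The check still breaks down when a large share of a class is poisoned.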

RELATED TERMS IN AI ETHICS & SAFETY