// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Adversarial Example

An input that has been slightly changed in a way that's almost undetectable to humans, but causes an AI model to make a wrong prediction.

TECHNICAL DEFINITION

An input to an AI model that has been intentionally perturbed with small, often imperceptible, modifications, causing the model to misclassify or produce an incorrect output, typically generated to expose model vulnerabilities.

BACKGROUND

Prompt injection is a cybersecurity exploit and an attack vector in which innocuous-looking inputs are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs). The attack takes advantage of the model's inability to distinguish between developer-defined prompts and user inputs to bypass safeguards and influence model behaviour. While LLMs are designed to follow trusted instructions, they can be manipulated into carrying out unintended responses through carefully crafted inputs.

SYNONYMS & ALIASES

Adversarial perturbation
crafted input
model exploit

USAGE NOTE

Adversarial examples highlight the fragility of deep learning models and the need for robust AI.

DEVELOPERS

Organizations developing technology related to Adversarial Example.

Google AI / DeepMind
Google AI and DeepMind conduct extensive research into the security and robustness of AI models, including developing techniques to create and defend against adversarial examples across various domains like computer vision and natural language processing.
OpenAI
OpenAI focuses on ensuring the safety and alignment of large language models. Their research includes understanding and mitigating adversarial examples, such as prompt injection attacks, to make their models more robust and reliable.
Microsoft Research
Microsoft Research actively investigates AI security, robustness, and interpretability. They develop tools and methods, like contributions to the Adversarial Robustness Toolbox (ART), to help detect and defend against adversarial examples in AI systems.
IBM Research
IBM Research is a leader in trustworthy AI, focusing on AI ethics, explainability, and security. They are a primary contributor to the open-source Adversarial Robustness Toolbox (ART), which helps developers evaluate and improve the robustness of AI models against adversarial attacks.
Anthropic
Anthropic is dedicated to AI safety and research, particularly for large language models. Their work involves deeply understanding potential vulnerabilities, including various forms of adversarial examples and prompt-based attacks, to build safer and more robust AI systems.
Meta AI
Meta AI (formerly Facebook AI Research - FAIR) conducts fundamental and applied AI research, including significant efforts in understanding and improving the robustness and security of AI models against adversarial examples in areas like computer vision and natural language processing.
Robust Intelligence
Robust Intelligence is a company focused on AI testing and validation, offering platforms and tools to detect and prevent AI failures, including identifying and mitigating vulnerabilities to adversarial attacks across the AI lifecycle.

RELATED TERMS IN AI ETHICS & SAFETY

BACK TO AI ENGINEERING & PROMPT DESIGN LEXICON

TECHNICAL DEFINITION

BACKGROUND

SYNONYMS & ALIASES

USAGE NOTE

DEVELOPERS

Google AI / DeepMind

OpenAI

Microsoft Research

IBM Research

Anthropic

Meta AI

Robust Intelligence

RELATED TERMS IN AI ETHICS & SAFETY