// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM
Adversarial Example
An input that has been slightly changed in a way that's almost undetectable to humans, but causes an AI model to make a wrong prediction.
TECHNICAL DEFINITION
An input to an AI model that has been intentionally perturbed with small, often imperceptible, modifications, causing the model to misclassify or produce an incorrect output, typically generated to expose model vulnerabilities.
BACKGROUND
Prompt injection is a cybersecurity exploit and an attack vector in which innocuous-looking inputs are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs). The attack takes advantage of the model's inability to distinguish between developer-defined prompts and user inputs to bypass safeguards and influence model behaviour. While LLMs are designed to follow trusted instructions, they can be manipulated into carrying out unintended responses through carefully crafted inputs.
READ MORE ON WIKIPEDIASYNONYMS & ALIASES
- Adversarial perturbation
- crafted input
- model exploit
USAGE NOTE
Adversarial examples highlight the fragility of deep learning models and the need for robust AI.
DEVELOPERS
Organizations developing technology related to Adversarial Example.
Google AI and DeepMind conduct extensive research into the security and robustness of AI models, including developing techniques to create and defend against adversarial examples across various domains like computer vision and natural language processing.
OpenAI focuses on ensuring the safety and alignment of large language models. Their research includes understanding and mitigating adversarial examples, such as prompt injection attacks, to make their models more robust and reliable.
Microsoft Research actively investigates AI security, robustness, and interpretability. They develop tools and methods, like contributions to the Adversarial Robustness Toolbox (ART), to help detect and defend against adversarial examples in AI systems.
IBM Research is a leader in trustworthy AI, focusing on AI ethics, explainability, and security. They are a primary contributor to the open-source Adversarial Robustness Toolbox (ART), which helps developers evaluate and improve the robustness of AI models against adversarial attacks.
Anthropic is dedicated to AI safety and research, particularly for large language models. Their work involves deeply understanding potential vulnerabilities, including various forms of adversarial examples and prompt-based attacks, to build safer and more robust AI systems.
Meta AI (formerly Facebook AI Research - FAIR) conducts fundamental and applied AI research, including significant efforts in understanding and improving the robustness and security of AI models against adversarial examples in areas like computer vision and natural language processing.
Robust Intelligence is a company focused on AI testing and validation, offering platforms and tools to detect and prevent AI failures, including identifying and mitigating vulnerabilities to adversarial attacks across the AI lifecycle.