// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM
Backdoor Attack
An attacker secretly embeds a hidden trigger into an AI model during training, so it behaves maliciously only when that specific trigger is present in the input.
TECHNICAL DEFINITION
A type of data poisoning attack where an adversary injects specific "trigger" patterns into a small fraction of training data, causing the trained AI model to exhibit a desired malicious behavior (e.g., misclassification) only when inputs contain that trigger, while performing normally otherwise.
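The poisoning step in this definition can be illustrated with a minimal sketch. It assumes a patch-style trigger (a small bright square stamped in the image corner) and relabeling of the poisoned samples to an attacker-chosen class. The names `stamp_trigger` and `poison_dataset`, the 5% poisoning rate, and the toy 4x4 images are all hypothetical choices for illustration and do not come from any specific library or paper.

```python
import random

def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of a 2D image (list of rows) with a bright square
    stamped in the bottom-right corner. The patch shape is an assumption."""
    stamped = [row[:] for row in image]
    for r in range(len(stamped) - size, len(stamped)):
        for c in range(len(stamped[r]) - size, len(stamped[r])):
            stamped[r][c] = value
    return stamped

def poison_dataset(images, labels, target_label, rate=0.05, seed=0):
    """Stamp the trigger on a random `rate` fraction of samples and
    relabel those samples to `target_label`. Inputs are left unmodified."""
    rng = random.Random(seed)
    n_poison = round(len(images) * rate)
    poisoned_idx = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in poisoned_idx:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, poisoned_idx

# Toy data: 100 blank 4x4 "images" with labels cycling through 0..9.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(100)]
labels = [i % 10 for i in range(100)]
p_images, p_labels, idx = poison_dataset(images, labels, target_label=7)
print(f"poisoned {len(idx)} of {len(images)} samples")
```

A model trained on `p_images` and `p_labels` would see the corner patch paired with class 7 in only a handful of examples, which is why the poisoned fraction can stay small while the rest of the data teaches normal behavior.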
BACKGROUND
AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence systems. It encompasses AI alignment, monitoring AI systems for risks, and enhancing their robustness. The field is particularly concerned with existential risks posed by advanced AI models.
SYNONYMS & ALIASES
- Trojan attack
- trigger attack
- hidden vulnerability
USAGE NOTE
Backdoor attacks are particularly insidious because the model behaves normally on inputs that lack the trigger, so standard accuracy checks on clean data do not reveal the compromise.
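One way to make this concrete is to measure two numbers side by side: accuracy on clean inputs and the attack success rate on triggered inputs. The sketch below assumes a toy stand-in for a compromised classifier. `backdoored_predict`, `add_trigger`, the trigger value 99, and the parity-based labels are hypothetical and exist only to show how the two metrics diverge.

```python
def clean_accuracy(predict, inputs, labels):
    """Fraction of untouched inputs classified correctly."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(inputs)

def attack_success_rate(predict, inputs, labels, add_trigger, target_label):
    """Fraction of triggered inputs sent to the target label, counting
    only samples whose true label is not already the target."""
    candidates = [x for x, y in zip(inputs, labels) if y != target_label]
    hits = sum(1 for x in candidates if predict(add_trigger(x)) == target_label)
    return hits / len(candidates)

TRIGGER = 99   # hypothetical trigger value placed in the last feature
TARGET = 1     # hypothetical attacker-chosen label

def add_trigger(x):
    """Return a copy of x with the trigger written into the last feature."""
    return x[:-1] + [TRIGGER]

def backdoored_predict(x):
    """Toy compromised model: correct parity classifier unless triggered."""
    if x[-1] == TRIGGER:
        return TARGET
    return x[0] % 2

inputs = [[i, 0] for i in range(10)]
labels = [i % 2 for i in range(10)]

acc = clean_accuracy(backdoored_predict, inputs, labels)
asr = attack_success_rate(backdoored_predict, inputs, labels, add_trigger, TARGET)
print(f"clean accuracy: {acc:.2f}, attack success rate: {asr:.2f}")
```

In this toy setup both numbers come out high at the same time, which is the pattern that makes a backdoor hard to spot: an evaluation that reports only clean accuracy would show nothing wrong.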
DEVELOPERS
Organizations developing technology to detect or defend against backdoor attacks.
Develops the AI Firewall, a platform designed to protect machine learning models in real time from various threats, including data poisoning, which can be used to insert backdoors, and other model integrity attacks.
An AI security company that provides a Machine Learning Detection & Response (MLDR) platform. Their technology is designed to detect and block adversarial attacks against AI models, including backdoor attacks implanted during training.
A research-focused company specializing in AI red teaming and security. They develop methods and conduct assessments to identify vulnerabilities in AI systems, including susceptibility to trojans and backdoor attacks.
A cybersecurity research and consulting firm that offers AI/ML security services. They investigate and develop defenses against attacks on machine learning systems, including model trojaning and backdoor insertion.
A collaborative research institute that has published significant work on detecting and defending against backdoor attacks. Their research includes developing techniques to identify and neutralize hidden triggers in neural networks.
Conducts extensive research into Trustworthy AI. The organization developed Counterfit, an open-source tool for security risk assessment of AI systems, enabling developers to test models for vulnerabilities like backdoor attacks.
A corporate research organization that works on robust and trustworthy AI. BCAI has published research on developing certifiable defenses against backdoor attacks, focusing on creating models that are provably secure against such data poisoning manipulations.