// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Backdoor Attack

An attacker embeds a hidden trigger into an AI model during training so that it behaves maliciously only when that specific trigger is present in the input.

TECHNICAL DEFINITION

A type of data poisoning attack in which an adversary injects a specific "trigger" pattern into a small fraction of the training data (often also relabeling those samples to a target class), causing the trained AI model to exhibit an attacker-chosen malicious behavior (e.g., misclassification into the target class) only when an input contains that trigger, while performing normally otherwise.
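The poisoning step can be illustrated with a minimal sketch. It assumes images are small grayscale grids stored as nested Python lists, a BadNets-style trigger (a bright square patch in the bottom-right corner), and a dirty-label attack where poisoned samples are relabeled to the target class. The names `stamp_trigger` and `poison_dataset` and their parameters are hypothetical, not taken from any library.

```python
import random
from typing import List, Tuple

Image = List[List[float]]   # grayscale image: rows of pixel values in [0, 1]
Sample = Tuple[Image, int]  # (image, class label)


def stamp_trigger(img: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of img with a size x size patch set to `value` in the bottom-right corner."""
    out = [row[:] for row in img]  # copy so the original image is not modified
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_dataset(data: List[Sample], target_label: int,
                   rate: float = 0.05, seed: int = 0) -> Tuple[List[Sample], List[int]]:
    """Stamp the trigger onto a random fraction `rate` of samples and relabel them.

    Returns the poisoned dataset and the sorted indices of the modified samples.
    """
    rng = random.Random(seed)
    n_poison = max(1, int(len(data) * rate))
    idx = sorted(rng.sample(range(len(data)), n_poison))
    chosen = set(idx)
    poisoned = [
        (stamp_trigger(img), target_label) if i in chosen else (img, label)
        for i, (img, label) in enumerate(data)
    ]
    return poisoned, idx


# Usage: 100 blank 8x8 images labeled 0-9; poison 5% of them toward class 7.
clean = [([[0.0] * 8 for _ in range(8)], i % 10) for i in range(100)]
poisoned, idx = poison_dataset(clean, target_label=7, rate=0.05)
print(len(idx), [poisoned[i][1] for i in idx])  # 5 [7, 7, 7, 7, 7]
```

A model trained on `poisoned` sees the patch co-occur with label 7, which is what lets it learn the trigger-to-target shortcut while the remaining 95% of samples keep its ordinary behavior intact.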

BACKGROUND

AI safety is an interdisciplinary field focused on preventing accidents, misuse, or other harmful consequences arising from artificial intelligence systems. It encompasses AI alignment, monitoring AI systems for risks, and enhancing their robustness. The field is particularly concerned with existential risks posed by advanced AI models.


SYNONYMS & ALIASES

  • Trojan attack
  • trigger attack
  • hidden vulnerability

USAGE NOTE

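Backdoor attacks are particularly insidious because the model behaves normally on clean inputs, so standard evaluation on held-out data does not reveal the backdoor; the malicious behavior surfaces only when the trigger is present.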
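To see why standard testing misses a backdoor, the sketch below compares two toy classifiers on clean accuracy and on attack success rate (ASR), here taken to mean the fraction of non-target-class inputs that the model assigns to the target class once the trigger is added. The two-feature inputs, the one-bit "trigger flag", and both models are hypothetical stand-ins chosen for illustration, not a real attack or detection method.

```python
from typing import Callable, List, Tuple

Input = Tuple[int, int]        # hypothetical input: (feature value, trigger flag)
Model = Callable[[Input], int]
TARGET = 2                     # attacker-chosen target class


def add_trigger(x: Input) -> Input:
    """Hypothetical trigger: set the flag feature to 1."""
    value, _ = x
    return (value, 1)


def clean_model(x: Input) -> int:
    value, _ = x
    return value % 3           # ignores the flag entirely


def backdoored_model(x: Input) -> int:
    value, flag = x
    if flag == 1:
        return TARGET          # hidden behavior, active only with the trigger
    return value % 3           # identical to clean_model on clean inputs


def clean_accuracy(model: Model, data: List[Tuple[Input, int]]) -> float:
    correct = sum(1 for x, y in data if model(x) == y)
    return correct / len(data)


def attack_success_rate(model: Model, data: List[Tuple[Input, int]],
                        trigger_fn: Callable[[Input], Input], target: int) -> float:
    # Only inputs whose true label is not already the target are counted.
    eligible = [x for x, y in data if y != target]
    hits = sum(1 for x in eligible if model(trigger_fn(x)) == target)
    return hits / len(eligible)


data = [((v, 0), v % 3) for v in range(30)]  # clean test set: flag always 0
for name, m in [("clean", clean_model), ("backdoored", backdoored_model)]:
    print(name, clean_accuracy(m, data), attack_success_rate(m, data, add_trigger, TARGET))
# clean 1.0 0.0
# backdoored 1.0 1.0
```

Both models score 1.0 on the clean test set, so accuracy alone cannot tell them apart; only evaluating with the trigger applied exposes the backdoor, and in practice a defender usually does not know the trigger.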

DEVELOPERS

Organizations developing technology related to backdoor attacks.

  • Robust Intelligence

    Develops the AI Firewall, a platform designed to protect machine learning models in real time from various threats, including data poisoning (which can be used to insert backdoors) and other model integrity attacks.

  • HiddenLayer

    An AI security company that provides a Machine Learning Detection & Response (MLDR) platform. Their technology is designed to detect and block adversarial attacks against AI models, including backdoor attacks implanted during training.

  • Adversa AI

    A research-focused company specializing in AI red teaming and security. They develop methods and conduct assessments to identify vulnerabilities in AI systems, including susceptibility to trojans and backdoor attacks.

  • Trail of Bits

    A cybersecurity research and consulting firm that offers AI/ML security services. They investigate and develop defenses against attacks on machine learning systems, including model trojaning and backdoor insertion.

  • MIT-IBM Watson AI Lab

    A collaborative research institute that has published significant work on detecting and defending against backdoor attacks. Their research includes developing techniques to identify and neutralize hidden triggers in neural networks.

  • Microsoft Research

    Conducts extensive research into trustworthy AI. The organization developed Counterfit, an open-source tool for security risk assessment of AI systems, enabling developers to test models for vulnerabilities like backdoor attacks.

  • Bosch Center for Artificial Intelligence (BCAI)

    A corporate research organization that works on robust and trustworthy AI. BCAI has published research on developing certifiable defenses against backdoor attacks, focusing on creating models that are provably secure against such data poisoning manipulations.

RELATED TERMS IN AI ETHICS & SAFETY