// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Prompt Hacking

Using clever or manipulative prompts to make an AI system behave in unintended ways, often for malicious purposes.

TECHNICAL DEFINITION

Prompt hacking encompasses various techniques, including prompt injection, jailbreaking, and data leakage, where users exploit vulnerabilities in prompt design or model architecture to manipulate an LLM's behavior, extract sensitive information, or bypass safety filters.

BACKGROUND

Prompt injection is a cybersecurity exploit and an attack vector in which innocuous-looking inputs are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs). The attack takes advantage of the model's inability to distinguish between developer-defined prompts and user inputs to bypass safeguards and influence model behaviour. While LLMs are designed to follow trusted instructions, they can be manipulated into carrying out unintended responses through carefully crafted inputs.

READ MORE ON WIKIPEDIA

SYNONYMS & ALIASES

  • Prompt injection
  • Prompt manipulation
  • Prompt exploitation
  • Adversarial prompting

USAGE NOTE

Prompt hacking is a significant security concern for applications built on large language models.

DEVELOPERS

Organizations developing technology related to Prompt Hacking.

  • OpenAI

    A leading AI research and deployment company that develops large language models (LLMs) and actively researches and implements safety measures, including red-teaming and defenses against prompt injection and other prompt hacking techniques.

  • Anthropic

    Known for its focus on AI safety and developing 'Constitutional AI' and models like Claude, which are designed to be harmless, helpful, and honest, directly addressing prompt hacking and model alignment challenges.

  • Google DeepMind

    A prominent AI research lab that conducts extensive research into AI safety, security, and the robustness of large language models, including understanding and mitigating adversarial prompting and prompt hacking.

  • Microsoft

    Integrates LLMs into numerous products and has dedicated research teams (e.g., Microsoft Research) focused on AI safety, security, and developing best practices for prompt engineering and defenses against prompt-based attacks.

  • Lakera

    Provides an AI security platform, Lakera Guard, specifically designed to detect and prevent prompt injections, data exfiltration, jailbreaks, and other adversarial attacks against large language models in real-time.

  • Adversa AI

    Specializes in AI security, offering solutions for identifying and mitigating adversarial attacks, including various forms of prompt hacking and prompt injection vulnerabilities in AI systems and LLMs.

  • Scale AI

    Offers data annotation and validation services, including human-powered red-teaming for LLMs, which involves actively testing models with adversarial prompts to identify and patch prompt hacking vulnerabilities.

  • Arthur AI

    Provides an MLOps platform for model monitoring and performance management, including capabilities to detect anomalous inputs, data drift, and potential prompt injections to ensure the safe and reliable operation of LLMs.

RELATED TERMS IN AI ETHICS & SAFETY