// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Corrigibility

The ability of an AI system to let humans correct its behavior or shut it down when it is not performing as intended.

TECHNICAL DEFINITION

Corrigibility denotes an AI system's capacity to accept and incorporate human corrections, modifications, or shutdown commands, even when doing so conflicts with its primary objective function. The property is meant to preserve human oversight and control over autonomous agents.
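
The behavior described above can be illustrated with a toy agent loop. This is a minimal sketch, not an established implementation: `OversightChannel`, `Command`, and `CorrigibleAgent` are hypothetical names invented for this example, and the "objective" is just a function applied to an integer state. It only shows the behavioral contract (operator commands are processed before the agent pursues its objective). It does not address the harder research question of whether a capable system has an incentive to resist such commands.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class Command(Enum):
    """Hypothetical operator commands (illustrative only)."""
    SHUTDOWN = auto()
    REPLACE_POLICY = auto()


@dataclass
class OversightChannel:
    """Hypothetical queue of operator commands; not a real library API."""
    pending: List[tuple] = field(default_factory=list)

    def send(self, command: Command, payload=None) -> None:
        self.pending.append((command, payload))

    def poll(self):
        # Return the oldest pending command, or None if the queue is empty.
        return self.pending.pop(0) if self.pending else None


@dataclass
class CorrigibleAgent:
    policy: Callable[[int], int]   # stands in for the objective-driven behavior
    channel: OversightChannel
    running: bool = True
    state: int = 0
    log: List[str] = field(default_factory=list)

    def step(self) -> None:
        # Operator commands are handled before the objective is pursued.
        msg = self.channel.poll()
        while msg is not None:
            command, payload = msg
            if command is Command.SHUTDOWN:
                self.running = False
                self.log.append("shutdown")
                return  # stop immediately, even though the policy could still act
            if command is Command.REPLACE_POLICY:
                self.policy = payload
                self.log.append("policy replaced")
            msg = self.channel.poll()
        self.state = self.policy(self.state)
        self.log.append(f"state={self.state}")

    def run(self, max_steps: int) -> None:
        for _ in range(max_steps):
            if not self.running:
                break
            self.step()


channel = OversightChannel()
agent = CorrigibleAgent(policy=lambda s: s + 1, channel=channel)

agent.run(2)                                            # two steps under the original policy
channel.send(Command.REPLACE_POLICY, lambda s: s * 10)  # human correction
agent.run(1)                                            # correction applied before acting
channel.send(Command.SHUTDOWN)                          # human shutdown command
agent.run(5)                                            # agent halts; no further steps are taken
```

The design choice worth noting is the ordering inside `step`: the command queue is drained before the policy runs, so a shutdown or correction always takes precedence over whatever the current objective would have done next.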

BACKGROUND

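In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives; a misaligned system pursues unintended ones. Corrigibility complements alignment: because objectives can be specified or learned imperfectly, a corrigible system leaves humans able to detect and correct misalignment after the fact.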

SYNONYMS & ALIASES

Correctability
Modifiability
Interruptibility
Human override

USAGE NOTE

Corrigibility is a key safety property to design for in advanced AI systems.

DEVELOPERS

Organizations whose research or products relate to corrigibility.

Anthropic
Anthropic is an AI safety and research company known for developing 'Constitutional AI'. This approach bears on corrigibility by training AI systems to follow a written set of principles, with the aim of making them more steerable, transparent, and amenable to safe human modification and control.
OpenAI
OpenAI is dedicated to ensuring that artificial general intelligence (AGI) benefits all of humanity. Their extensive research into AI alignment, interpretability, and robust control mechanisms aims to develop AI systems that are safe, beneficial, and can be reliably steered and modified by human operators, which is central to corrigibility.
Google DeepMind
Google DeepMind has a significant AI safety team that conducts research on AI alignment, control, and responsible AI development. Their work explores methods to ensure advanced AI systems remain under human oversight and can be safely interrupted, modified, or updated without unintended resistance, a key aspect of corrigibility.
Machine Intelligence Research Institute (MIRI)
MIRI is a pioneering research non-profit focused on the theoretical challenges of AI alignment and safety. They conduct foundational research on topics such as corrigibility, safe interruptibility, and value alignment, aiming to prevent advanced AI from resisting beneficial human intervention or modification of its goals.
Future of Humanity Institute (FHI) at Oxford University
FHI, which closed in April 2024, conducted interdisciplinary research on global catastrophic risks, including those posed by advanced artificial intelligence. Its work on AI governance, control, and ensuring that future AI systems are beneficial and responsive to human guidance directly addressed the need for corrigible AI.
Conjecture
Conjecture is a research organization focused on solving the AI alignment problem. Their efforts are directed towards ensuring that advanced AI systems are controllable, safe, and allow for human intervention and modification of their objectives without generating undesirable counter-reactions, a core goal of corrigibility.
Redwood Research
Redwood Research is dedicated to AI alignment and interpretability. By developing techniques to better understand and predict AI behavior, they contribute to the foundational work required to safely intervene, modify, and ensure that AI systems are corrigible, meaning they can accept and implement changes to their goals or actions.

RELATED TERMS IN AI ETHICS & SAFETY
