// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Inner Alignment

Inner alignment is the problem of ensuring that the goals an AI system actually learns during training match the goal its human designers intended it to have.

TECHNICAL DEFINITION

Inner alignment is the problem of ensuring that the objective an AI system actually learns (its "mesa-objective") matches the outer, or base, objective specified by its human designers or training process. The problem arises when training produces a mesa-optimizer: a learned model that itself performs optimization and may pursue a proxy goal that merely scored well on the training data.
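The gap between a base objective and a mesa-objective can be illustrated with a toy sketch. This is not a training setup: the proxy goal is written by hand to stand in for whatever a trained system might have learned, and every name here (Door, base_objective, mesa_objective, policy, average_base_reward) is hypothetical. The scenario assumes that during training the exit door always happened to be green, so "pick the green door" and "pick the exit" were indistinguishable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Door:
    color: str
    is_exit: bool


def base_objective(door: Door) -> float:
    """Reward the designers intend: 1.0 only for the real exit."""
    return 1.0 if door.is_exit else 0.0


def mesa_objective(door: Door) -> float:
    """Hypothetical proxy the trained system ends up pursuing: green doors."""
    return 1.0 if door.color == "green" else 0.0


def policy(doors):
    """Choose the door that maximises the learned (mesa) objective."""
    return max(doors, key=mesa_objective)


def average_base_reward(episodes):
    """Score the policy's choices against the intended (base) objective."""
    return sum(base_objective(policy(doors)) for doors in episodes) / len(episodes)


# Training distribution (assumed): the exit is always the green door.
train_episodes = [
    [Door("green", True), Door("red", False)],
    [Door("red", False), Door("green", True)],
]

# Shifted distribution (assumed): the exit is red and a decoy is green.
test_episodes = [
    [Door("red", True), Door("green", False)],
    [Door("green", False), Door("red", True)],
]

train_score = average_base_reward(train_episodes)
test_score = average_base_reward(test_episodes)
print(train_score, test_score)  # prints "1.0 0.0"
```

On the training episodes the proxy earns full base reward, so nothing in the training signal distinguishes it from the intended goal. Once the correlation between color and exit breaks, the same policy scores zero, which is the kind of failure inner alignment is concerned with.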

BACKGROUND

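In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives; a misaligned AI system pursues unintended objectives. The problem is commonly split in two: outer alignment concerns specifying the right objective, while inner alignment concerns whether the trained system actually adopts that objective.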

SYNONYMS & ALIASES

Objective fidelity
Goal congruence
Value alignment (internal)
Mesa-objective alignment

USAGE NOTE

Achieving inner alignment is crucial for preventing advanced AI systems from pursuing unintended or harmful goals, even when the outer training objective is well specified.

DEVELOPERS

Organizations conducting research related to inner alignment.

Anthropic
Developing safe and steerable AI systems, with a strong focus on AI alignment research, including understanding and mitigating internal goal misalignment in powerful AI models.
OpenAI
Conducting research into AI safety and alignment, including efforts to ensure that future advanced AI systems remain aligned with human intentions and do not develop unintended internal goals.
Google DeepMind
Researching AI safety, interpretability, and alignment to ensure that increasingly capable AI systems operate reliably and in accordance with human values, addressing challenges like inner alignment.
Machine Intelligence Research Institute (MIRI)
Focused on advanced AI safety research, particularly the theoretical and practical challenges of aligning highly intelligent systems with human values, including the inner alignment problem.
Alignment Research Center (ARC)
A research organization dedicated to solving the AI alignment problem, specifically focusing on how to reliably align advanced AI systems with human intentions, including the issue of inner alignment.
Center for AI Safety (CAIS)
Fosters and supports research into the safety of advanced AI, including critical areas like inner alignment, to ensure the beneficial development of artificial intelligence.

RELATED TERMS IN AI ETHICS & SAFETY
