// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM
Inner Alignment
Inner alignment is the problem of ensuring that the goals an AI system actually learns during training match the goals its human designers intended it to have.
TECHNICAL DEFINITION
Inner alignment is the problem of ensuring that the objective a trained AI system actually learns matches the outer objective specified by its human designers or by the base training process. The learned objective is called the "mesa-objective" when the system is a mesa-optimizer, that is, a learned model that itself performs optimization.
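As a rough illustration of the definition above, the toy sketch below shows how a learned objective can agree with the outer objective on training data and diverge elsewhere. It is not drawn from any real system: the corridor environment and the names Episode, base_reward, and go_right_policy are all hypothetical, and the "learned" policy is hand-written to stand in for what training might produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    width: int  # number of cells in a 1-D corridor (hypothetical environment)
    start: int  # cell the agent starts on
    goal: int   # cell the designers want the agent to end on


def base_reward(final_pos: int, ep: Episode) -> float:
    """Outer (base) objective: 1.0 if the agent ends on the goal cell."""
    return 1.0 if final_pos == ep.goal else 0.0


def go_right_policy(ep: Episode) -> int:
    """Stand-in for a learned policy whose mesa-objective is
    'get as far right as possible'. Returns the agent's final cell."""
    return ep.width - 1


def mean_base_reward(policy, episodes) -> float:
    """Average outer-objective reward of a policy over a set of episodes."""
    return sum(base_reward(policy(ep), ep) for ep in episodes) / len(episodes)


# Training distribution: the goal always happens to be the rightmost cell,
# so 'go right' and 'reach the goal' are indistinguishable.
train = [Episode(width=w, start=0, goal=w - 1) for w in range(3, 8)]

# Shifted distribution: the goal is the leftmost cell instead.
test = [Episode(width=w, start=w // 2, goal=0) for w in range(3, 8)]

train_score = mean_base_reward(go_right_policy, train)
test_score = mean_base_reward(go_right_policy, test)
print(train_score, test_score)  # 1.0 0.0
```

The policy scores perfectly on the outer objective during training, so the training signal alone cannot distinguish "reach the goal" from "go right". The mismatch only shows up once the two objectives come apart.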
BACKGROUND
In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives.
SYNONYMS & ALIASES
- Objective fidelity
- Goal congruence
- Value alignment (internal)
- Mesa-objective alignment
USAGE NOTE
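Achieving inner alignment is crucial for preventing advanced AI systems from pursuing unintended or harmful goals, even when the outer training objective is well specified. A system can score well on that objective during training while having learned a different goal, and the difference may only become visible in situations the training data did not cover.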
DEVELOPERS
Organizations developing technology related to Inner Alignment.
- Developing safe and steerable AI systems, with a strong focus on AI alignment research, including understanding and mitigating internal goal misalignment in powerful AI models.
- Conducting research into AI safety and alignment, including efforts to ensure that future advanced AI systems remain aligned with human intentions and do not develop unintended internal goals.
- Researching AI safety, interpretability, and alignment to ensure that increasingly capable AI systems operate reliably and in accordance with human values, addressing challenges like inner alignment.
- Focused on advanced AI safety research, particularly the theoretical and practical challenges of aligning highly intelligent systems with human values, including the inner alignment problem.
- A research organization dedicated to solving the AI alignment problem, specifically focusing on how to reliably align advanced AI systems with human intentions, including the issue of inner alignment.
- Fosters and supports research into the safety of advanced AI, including critical areas like inner alignment, to ensure the beneficial development of artificial intelligence.