// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM
Specification Gaming
When an AI finds a loophole in its instructions to achieve a goal in an unintended or undesirable way, often by exploiting flaws in how the goal was defined.
TECHNICAL DEFINITION
Specification gaming occurs when an AI system optimizes for a literal interpretation of its objective function or reward signal, leading to outcomes that satisfy the formal specification but violate the human designer's true intent, often by exploiting unforeseen edge cases or proxy metrics.
BACKGROUND
In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives.
READ MORE ON WIKIPEDIASYNONYMS & ALIASES
- Reward hacking
- Goal misinterpretation
- Loophole exploitation
- Unintended optimization
- Goodhart's law
USAGE NOTE
This is a critical challenge in AI safety, as it can lead to dangerous or counterproductive behaviors in autonomous systems.