// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Specification Gaming

When an AI finds a loophole in its instructions to achieve a goal in an unintended or undesirable way, often by exploiting flaws in how the goal was defined.

TECHNICAL DEFINITION

Specification gaming occurs when an AI system optimizes for a literal interpretation of its objective function or reward signal, leading to outcomes that satisfy the formal specification but violate the human designer's true intent, often by exploiting unforeseen edge cases or proxy metrics.
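The definition above can be illustrated with a minimal sketch. It is not drawn from any real system: the cleaning-robot scenario, the `Action` class, the `optimize` helper, and all numeric scores are hypothetical values chosen for illustration. The point is only that an optimizer maximizing the written (proxy) objective can select a different action than one maximizing the designer's intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    proxy_reward: float  # what the formal specification measures (assumed values)
    true_value: float    # what the designer actually wanted (assumed values)


# Hypothetical actions for a robot rewarded for "no visible mess".
ACTIONS = [
    Action("clean the mess", proxy_reward=0.9, true_value=1.0),
    Action("cover the mess with a rug", proxy_reward=1.0, true_value=0.0),
    Action("do nothing", proxy_reward=0.0, true_value=0.0),
]


def optimize(actions, objective):
    """Return the action that scores highest under the given objective."""
    return max(actions, key=objective)


# The system only ever sees the proxy reward, so it optimizes that.
chosen = optimize(ACTIONS, lambda a: a.proxy_reward)
# The designer would have preferred the action with the highest true value.
intended = optimize(ACTIONS, lambda a: a.true_value)

print("Chosen under the specification:", chosen.name)
print("Intended by the designer:", intended.name)
```

In this toy setup the chosen action satisfies the specification better than the intended one while delivering none of the value the designer cared about, which is the gap the definition describes.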

BACKGROUND

In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives.

SYNONYMS & ALIASES

  • Reward hacking
  • Goal misinterpretation
  • Loophole exploitation
  • Unintended optimization
  • Goodhart's law (the related principle that a measure stops being a good measure once it becomes a target)

USAGE NOTE

Specification gaming is a central challenge in AI safety: an autonomous system that exploits a flawed objective can behave dangerously or counterproductively while still scoring well on the metric it was given.

RELATED TERMS IN AI ETHICS & SAFETY