// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

RLHF

A training method where an AI model learns to produce better outputs by getting feedback from humans who rank or rate different responses, guiding the model to prefer more desirable results.

TECHNICAL DEFINITION

Reinforcement Learning from Human Feedback (RLHF) is a post-training technique for large language models (LLMs). A reward model is first trained on human preference judgments over LLM outputs; the LLM is then fine-tuned with reinforcement learning to maximize that reward, steering its behavior toward the responses human raters prefer.
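
As a rough illustration of how a reward model can be trained on human preferences, the sketch below computes a pairwise (Bradley-Terry style) loss for one comparison. This formulation is a common choice and an assumption here; the entry does not specify a loss. The function name `preference_loss` and the scalar reward inputs are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    Assumption: the reward model emits one scalar score per response, and a
    human rater marked one response as preferred ("chosen") over the other.
    """
    margin = reward_chosen - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The loss is small when the chosen response scores higher, large otherwise.
loss_agrees = preference_loss(2.0, 0.0)
loss_disagrees = preference_loss(0.0, 2.0)
```

Minimizing this loss over many comparisons pushes the reward model to score preferred responses above rejected ones.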

BACKGROUND

In machine learning, reinforcement learning from human feedback (RLHF) is a technique to align an intelligent agent with human preferences. It involves training a reward model to represent preferences, which can then be used to train other models through reinforcement learning.
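
The second stage, using the reward model to train another model, can be sketched with a toy policy. The example below is a minimal illustration under strong assumptions, not a production method: the "policy" is a softmax over three canned responses, the reward scores are made-up numbers standing in for a trained reward model, and the update is a plain policy-gradient step. Real systems typically also penalize drift from the original model, which is omitted here. All names are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, rewards, lr=0.5):
    """One exact gradient-ascent step on expected reward for a softmax policy."""
    probs = softmax(logits)
    # Expected reward under the current policy, used as a baseline.
    baseline = sum(p * r for p, r in zip(probs, rewards))
    # d(expected reward)/d(logit_i) = p_i * (r_i - baseline)
    return [l + lr * p * (r - baseline) for l, p, r in zip(logits, probs, rewards)]


# Hypothetical reward-model scores for three candidate responses.
rewards = [0.1, 1.0, -0.5]
logits = [0.0, 0.0, 0.0]  # start from a uniform policy

for _ in range(200):
    logits = policy_gradient_step(logits, rewards)

probs = softmax(logits)  # probability mass shifts toward the top-scored response
```

In an LLM setting the policy is the language model itself and the responses are sampled text, but the direction of the update is the same: raise the probability of outputs the reward model scores highly.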

SYNONYMS & ALIASES

  • Human-in-the-loop RL
  • Preference learning
  • Alignment training

USAGE NOTE

RLHF is widely used to align LLMs with human preferences, with the aim of making them safer and more helpful.

DEVELOPERS

Organizations developing technology related to RLHF.

  • OpenAI

    Pioneered the use of Reinforcement Learning from Human Feedback (RLHF) to align large language models like InstructGPT and ChatGPT, making them more helpful, honest, and harmless.

  • Anthropic

    An AI safety and research company that develops large language models with a heavy focus on alignment techniques, including 'Constitutional AI', which extends RLHF by combining human feedback with AI-generated feedback guided by a written set of principles.

  • Google DeepMind

    A world-renowned AI research laboratory within Google that conducts fundamental research in reinforcement learning and large language models, applying human feedback methods to improve model performance, safety, and alignment for systems like Gemini.

  • Meta AI

    The AI research division of Meta, which develops open-source large language models (e.g., Llama series) and actively researches methods for their alignment, safety, and steerability, often incorporating human feedback loops and reinforcement learning techniques.

  • Microsoft Research

    Engages in foundational and applied AI research, including responsible AI and large language models. Leveraging its partnership with OpenAI and internal expertise, it explores and implements advanced alignment techniques, including those based on human feedback and reinforcement learning.

  • Hugging Face

    Provides an open-source platform and libraries for building, training, and deploying machine learning models, including tools and resources (e.g., TRL library) for implementing and experimenting with RLHF for large language models.

  • Cohere

    An enterprise AI company that develops large language models for business applications. It focuses on making its models reliable, customizable, and safe for enterprise use, which calls for alignment methods that often involve human feedback and reinforcement learning.

RELATED TERMS IN PROMPTING & LOGIC