// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

One-Hot Encoding

A technique for converting categorical data into a numerical format that machine learning models can use, by creating one new binary column for each category.

TECHNICAL DEFINITION

A categorical feature encoding scheme that transforms a nominal categorical variable into a binary vector representation: one binary column is created for each unique category, and for each observation the column of the observed category is '1' while all other columns are '0'.
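The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `one_hot_encode`, the sample `colors` list, and the choice to order columns alphabetically are all assumptions made for the example.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row is a one-hot vector."""
    # Sorted unique categories give a stable, reproducible column order
    # (an assumption of this sketch; any fixed order would work).
    categories = sorted(set(values))
    column_of = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)   # start with all columns "off"
        row[column_of[value]] = 1     # switch on the observed category
        rows.append(row)
    return categories, rows


# Hypothetical sample data for illustration.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains exactly one 1, and the number of columns equals the number of distinct categories. In practice a data-science library would normally handle this step, including categories that were not seen when the encoder was built.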

BACKGROUND

The term comes from digital circuit design, where a "one-hot" group of bits has exactly one bit set high (1) and all others low (0). In machine learning the same idea is applied to categorical features: a feature with k distinct categories becomes k binary columns, and each row has a 1 in exactly one of them. Because the resulting columns imply no order or distance between categories, the encoding suits nominal data such as colors or country codes, where assigning integer codes would suggest a ranking that does not exist.

SYNONYMS & ALIASES

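  • Dummy encoding (strictly, the variant that drops one reference category, giving k-1 columns)
  • Binary encoding (for categories; informal, and distinct from the scheme that writes category indices in base 2)
  • One-of-K encoding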
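"Dummy encoding" is often used loosely as a synonym, but in statistics it usually refers to the variant that omits one reference category, so k categories yield k-1 columns and the reference is represented by an all-zero row. The sketch below illustrates that difference; the function name `dummy_encode`, its `reference` parameter, and the sample `sizes` data are hypothetical and chosen only for this example.

```python
def dummy_encode(values, reference=None):
    """Encode values with one column per category except the reference."""
    categories = sorted(set(values))
    if reference is None:
        # Assumption of this sketch: default to the first category
        # in sorted order as the dropped reference level.
        reference = categories[0]
    elif reference not in categories:
        raise ValueError(f"unknown reference category: {reference!r}")

    kept = [c for c in categories if c != reference]
    # A row for the reference category is all zeros.
    rows = [[1 if value == c else 0 for c in kept] for value in values]
    return kept, rows


# Hypothetical sample data for illustration.
sizes = ["small", "large", "medium", "small"]
kept_columns, dummy_rows = dummy_encode(sizes)

print(kept_columns)
for size, row in zip(sizes, dummy_rows):
    print(size, row)
```

Dropping one column avoids the exact linear dependence among the k one-hot columns (they always sum to 1), which matters for models such as unregularized linear regression with an intercept.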

USAGE NOTE

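One-hot encoding is commonly used to prepare nominal categorical features for algorithms that require numerical input, such as linear models and neural networks. For features with many distinct categories it produces a correspondingly large number of sparse columns.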

RELATED TERMS IN DATA SCIENCE