// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Encoder-Decoder

This architecture consists of two main parts: an encoder that processes the input and a decoder that uses that processed information to generate an output.

TECHNICAL DEFINITION

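A neural network architecture comprising an encoder, which maps an input sequence to a context representation (a single fixed-length context vector in early recurrent models, or one vector per input position in attention-based models), and a decoder, which generates an output sequence conditioned on that representation. It is commonly used for sequence-to-sequence tasks.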
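
The data flow in this definition can be sketched with a toy recurrent encoder-decoder in plain Python. This is only an illustration under stated assumptions: the sizes `HIDDEN` and `VOCAB`, the start and end-of-sequence token ids, and all function names are hypothetical, and the weights are random and untrained, so the generated tokens carry no meaning. The point is that inputs of any length are folded into a context vector of fixed length, which the decoder then unrolls one token at a time.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

HIDDEN = 4  # length of the context vector (hypothetical size)
VOCAB = 6   # toy vocabulary; token 0 = end-of-sequence, token 1 = start


def rand_matrix(rows, cols):
    """Random, untrained weights; a real model would learn these."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_hh, hidden, token):
    """One tanh recurrent update: h' = tanh(W_in[:, token] + W_hh @ h)."""
    embedded = [row[token] for row in w_in]  # column lookup = one-hot product
    recurrent = matvec(w_hh, hidden)
    return [math.tanh(e + r) for e, r in zip(embedded, recurrent)]


# Separate parameters for the encoder and the decoder.
ENC_IN, ENC_HH = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
DEC_IN, DEC_HH = rand_matrix(HIDDEN, VOCAB), rand_matrix(HIDDEN, HIDDEN)
DEC_OUT = rand_matrix(VOCAB, HIDDEN)


def encode(tokens):
    """Fold an input sequence of any length into one fixed-length vector."""
    hidden = [0.0] * HIDDEN
    for token in tokens:
        hidden = rnn_step(ENC_IN, ENC_HH, hidden, token)
    return hidden


def decode(context, max_len=5, start=1, eos=0):
    """Greedily generate tokens, starting from the encoder's context vector."""
    hidden, token, output = context, start, []
    for _ in range(max_len):
        hidden = rnn_step(DEC_IN, DEC_HH, hidden, token)
        scores = matvec(DEC_OUT, hidden)
        token = max(range(VOCAB), key=lambda i: scores[i])
        if token == eos:
            break
        output.append(token)
    return output


short_context = encode([2, 3])
long_context = encode([2, 3, 4, 5, 4, 3, 2])
generated = decode(long_context)
```

Both `short_context` and `long_context` have length `HIDDEN` regardless of input length. That fixed size is the bottleneck that attention-based decoders relax by looking at every encoder state instead of a single vector.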

BACKGROUND

In deep learning, the transformer is a family of artificial neural network architectures based on the multi-head attention mechanism; the original transformer was itself an encoder-decoder model. Text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and that of less important tokens to be diminished. Because self-attention alone is insensitive to token order (strictly, it is permutation-equivariant: reordering the input only reorders the output), transformers inject positional information, typically through positional encodings or learned positional embeddings, so token order can affect the output.
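
The role of positional information can be checked numerically. The sketch below is a simplified illustration, not a full transformer layer: it uses a single attention head with queries, keys, and values all equal to the input (no learned projections), a sinusoidal positional encoding, and made-up token vectors. It compares the attention output for a sequence and its reversal, with and without positions added.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def positional_encoding(pos, d):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    return [
        math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d))
        for i in range(d)
    ]


def add_positions(x):
    d = len(x[0])
    return [
        [a + p for a, p in zip(vec, positional_encoding(pos, d))]
        for pos, vec in enumerate(x)
    ]


def rows_match(a, b):
    return all(
        math.isclose(u, v, abs_tol=1e-9)
        for row_a, row_b in zip(a, b)
        for u, v in zip(row_a, row_b)
    )


# Made-up embeddings for a three-token sequence.
tokens = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.3, 0.7], [0.5, 0.5, 0.9, 0.1]]
reversed_tokens = tokens[::-1]

# Without positions: reversing the input merely reverses the output rows.
same_up_to_order = rows_match(
    self_attention(tokens), self_attention(reversed_tokens)[::-1]
)

# With positions: each token's output now depends on where it sits.
order_matters = not rows_match(
    self_attention(add_positions(tokens)),
    self_attention(add_positions(reversed_tokens))[::-1],
)
```

Here `same_up_to_order` shows that plain self-attention treats its input as an unordered set, while `order_matters` shows that adding positional encodings makes the result depend on token order.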

SYNONYMS & ALIASES

  • Seq2seq model
  • Sequence-to-sequence
  • Encoder-decoder framework

USAGE NOTE

Encoder-decoder models are widely used in machine translation, text summarization, and image captioning.

DEVELOPERS

Organizations developing technology related to Encoder-Decoder.

  • Google AI

    Introduced the Transformer architecture, which was originally proposed as an encoder-decoder model. Google also develops encoder-decoder language models such as T5 (Text-to-Text Transfer Transformer) and contributes significantly to the theoretical and applied aspects of these models, which underpin much of prompt engineering.

  • Meta AI

    Engages in fundamental AI research, including developing encoder-decoder models like BART and NLLB (No Language Left Behind). Their work contributes directly to advanced NLP capabilities used in AI engineering and prompt design.

  • Microsoft Research

    Actively researches and develops various transformer-based models, including encoder-decoder architectures such as MASS and ProphetNet for sequence-to-sequence pre-training and text generation. Their work is integrated into Azure AI services, supporting AI engineering and prompt design applications.

  • Amazon Science

    Utilizes and develops encoder-decoder architectures for services like machine translation, text summarization, and other NLP tasks within AWS AI. These models are foundational for many AI engineering efforts and prompt-based applications.

  • Hugging Face

    A leading platform for AI engineering that provides the open-source Transformers library and hosts pretrained models, many of which are encoder-decoder based (e.g., T5, BART). It lets practitioners access, fine-tune, and deploy these models for various prompt design tasks.

  • Baidu Research

    A major AI research arm, especially strong in NLP and machine translation. They develop and deploy advanced models, many of which leverage encoder-decoder architectures, for various AI engineering applications.

  • Salesforce AI

    Conducts research and develops novel NLP models, often building upon or extending transformer and encoder-decoder principles. Their work contributes to the advancement of AI engineering practices.

  • IBM Research AI

    Undertakes significant research in core AI technologies, including natural language processing and the development of deep learning architectures, which frequently involve encoder-decoder structures for tasks like machine translation and text generation.

RELATED TERMS IN MODEL ARCHITECTURE