// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Knowledge Distillation

Knowledge distillation is a technique in which a small, simple "student" AI model learns to mimic the behavior of a larger, more complex "teacher" model, so that the student performs almost as well while being cheaper to run.

TECHNICAL DEFINITION

Knowledge distillation is a model compression and training technique in which a smaller, more efficient "student" model is trained to replicate the output probabilities or intermediate representations of a larger, higher-performing "teacher" model. The aim is to transfer the teacher's learned knowledge so that the student reaches comparable accuracy at reduced computational cost.
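
To make "replicating the output probabilities" concrete, the sketch below computes one common form of distillation loss for a single example. It is an illustration under stated assumptions, not a reference implementation: the function names, the temperature of 2.0, and the weighting factor alpha are all hypothetical choices, and the temperature-squared scaling of the soft-target term is a widely used convention rather than a requirement.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-target term.

    temperature and alpha are illustrative defaults (assumptions).
    """
    # Soft-target term: match the teacher's temperature-softened distribution.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = kl_divergence(teacher_soft, student_soft) * temperature ** 2

    # Hard-target term: ordinary cross-entropy against the ground-truth label.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[true_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss

# Example with made-up logits for a 3-class problem.
loss = distillation_loss([1.0, 0.5, 2.0], [0.8, 0.2, 3.0], true_label=2)
```

A higher temperature flattens both distributions, which exposes more of the teacher's relative preferences among the non-top classes; alpha controls how much the student follows the teacher versus the ground-truth labels.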

BACKGROUND

Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in engineering, mathematics and computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.

SYNONYMS & ALIASES

  • model compression via teaching
  • student-teacher learning
  • soft target training

USAGE NOTE

Knowledge distillation is used to create compact models suitable for deployment where memory, latency, or compute is limited (for example on mobile and edge devices), while retaining much of the performance of larger models.

DEVELOPERS

Organizations developing technology related to Knowledge Distillation.

  • Google AI / Google Research

    Engages in extensive research and development of AI models, frequently publishing on techniques like knowledge distillation to improve model efficiency and deployability across various applications, including large language models and computer vision.

  • Meta AI

    Conducts fundamental and applied AI research, often focusing on model compression, efficiency, and optimization techniques such as knowledge distillation for its large-scale AI systems, including those powering social media platforms and VR/AR applications.

  • Hugging Face

    Provides tools, libraries (like Transformers), and a platform for building, training, and deploying machine learning models. Their ecosystem frequently supports and encourages the use of knowledge distillation for creating smaller, more efficient versions of large transformer models.

  • Microsoft Research

    Investigates various AI paradigms, including model compression and efficiency. They publish research and develop tools that leverage knowledge distillation to create more practical and performant AI models for a range of Microsoft products and services.

  • NVIDIA

    Develops specialized hardware (GPUs) and software platforms (e.g., TensorRT, NeMo) that enable high-performance AI. Knowledge distillation is one of the techniques used to optimize models to run efficiently on this hardware, especially for edge AI and real-time inference.

  • Amazon Web Services (AWS)

    Offers cloud-based machine learning services like Amazon SageMaker, which provides tools and frameworks for training, tuning, and deploying ML models. AWS supports various model optimization techniques, including knowledge distillation, to help customers deploy efficient models at scale.

  • OpenAI

    While known for developing large foundational models, OpenAI also researches methods for making AI more efficient and accessible. Knowledge distillation is a relevant technique for creating smaller, task-specific models derived from their larger, more general models.

RELATED TERMS IN MLOPS & DEPLOYMENT