// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM
Knowledge Distillation
Knowledge distillation is a technique in which a smaller, simpler AI model (the "student") learns to mimic the behavior of a larger, more complex "teacher" model, yielding a compact model that performs almost as well.
TECHNICAL DEFINITION
Knowledge distillation is a model compression and training technique in which a smaller, more efficient "student" model is trained to replicate the output probabilities or intermediate representations of a larger, higher-performing "teacher" model. This transfers the teacher's learned knowledge to the student, which can often approach the teacher's accuracy at reduced computational cost.
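As a rough illustration of training a student to replicate a teacher's output probabilities, the sketch below computes one commonly used form of the distillation loss for a single example. It is a minimal sketch under stated assumptions, not a definitive implementation: the temperature-softened softmax, the `alpha` weighting between soft and hard targets, and the function names `log_softmax` and `distillation_loss` are illustrative choices that do not come from this entry.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Return log-probabilities for the logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Assumed formulation (hypothetical parameter names):
      soft term = T^2 * KL(teacher_T || student_T), using softened distributions
      hard term = cross-entropy of the student against the true label
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL divergence from the teacher's soft targets to the student's outputs.
    soft_loss = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

    # Standard cross-entropy on the ground-truth label (temperature 1).
    hard_loss = -log_softmax(student_logits, 1.0)[true_label]

    # T^2 keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    teacher_logits = [4.0, 1.0, 0.5]   # made-up values for illustration
    student_logits = [2.5, 1.2, 0.3]
    loss = distillation_loss(student_logits, teacher_logits, true_label=0)
    print(f"distillation loss: {loss:.4f}")
```

In a real training loop this value would be computed with an automatic-differentiation framework and minimized with respect to the student's parameters only; the teacher's outputs are treated as fixed targets. Matching intermediate representations, the other option mentioned above, would add a separate term comparing hidden activations.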
BACKGROUND
Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of research in engineering, mathematics and computer science that develops and studies methods and software that enable machines to perceive their environment and use learning and intelligence to take actions that maximize their chances of achieving defined goals.
SYNONYMS & ALIASES
- Model compression via teaching
- Student-teacher learning
- Soft target training
USAGE NOTE
Knowledge distillation is used to create compact models suited to deployment where compute, memory, or latency is constrained (for example, edge devices or real-time inference) while retaining much of the performance of larger models.
DEVELOPERS
Organizations developing technology related to Knowledge Distillation.
Engages in extensive research and development of AI models, frequently publishing on techniques like knowledge distillation to improve model efficiency and deployability across various applications, including large language models and computer vision.
Conducts fundamental and applied AI research, often focusing on model compression, efficiency, and optimization techniques such as knowledge distillation for their large-scale AI systems, including those powering social media platforms and VR/AR applications.
Provides tools, libraries (like Transformers), and a platform for building, training, and deploying machine learning models. Their ecosystem frequently supports and encourages the use of knowledge distillation for creating smaller, more efficient versions of large transformer models.
Investigates various AI paradigms, including model compression and efficiency. They publish research and develop tools that leverage knowledge distillation to create more practical and performant AI models for a range of Microsoft products and services.
Develops specialized hardware (GPUs) and software platforms (e.g., TensorRT, NeMo) that enable high-performance AI. Knowledge distillation is a critical technique for optimizing models to run efficiently on their hardware, especially for edge AI and real-time inference.
Offers cloud-based machine learning services like Amazon SageMaker, which provides tools and frameworks for training, tuning, and deploying ML models. AWS supports various model optimization techniques, including knowledge distillation, to help customers deploy efficient models at scale.
While known for developing large foundational models, OpenAI also researches methods for making AI more efficient and accessible. Knowledge distillation is a relevant technique for creating smaller, task-specific models derived from their larger, more general models.