// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Pruning

Pruning is a model compression technique that removes less important connections or neurons from an AI model, making it smaller and potentially faster without significantly affecting its accuracy.

TECHNICAL DEFINITION

Pruning is a model compression technique that identifies and removes redundant or less significant weights, neurons, or connections from a neural network, effectively reducing the model's parameter count and computational graph complexity, leading to smaller models and faster inference.
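One common way to decide which weights are "less significant" is by absolute value, usually called magnitude pruning. The sketch below is a minimal illustration in plain Python, not any particular library's implementation: the function names `magnitude_prune` and `sparsity_of`, the list-of-lists weight layout, and the example values are all assumptions made for the example.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of a 2-D weight matrix with the smallest-magnitude
    fraction `sparsity` of its entries set to zero (unstructured pruning)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    # Collect (magnitude, row, col) for every weight.
    entries = [
        (abs(w), r, c)
        for r, row in enumerate(weights)
        for c, w in enumerate(row)
    ]
    # Smallest magnitudes first; ties keep their original order.
    entries.sort(key=lambda e: e[0])

    n_prune = int(len(entries) * sparsity)
    pruned = [list(row) for row in weights]  # leave the input untouched
    for _, r, c in entries[:n_prune]:
        pruned[r][c] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


# Hypothetical 2x3 layer; half of the weights are removed.
W = [[0.9, -0.05, 0.3],
     [0.01, -0.7, 0.2]]
pruned = magnitude_prune(W, 0.5)
```

With these values the three smallest-magnitude weights (0.01, -0.05, 0.2) are zeroed and the rest are left unchanged. Note that the matrix keeps its shape: the zeros only save memory or compute if they are stored or executed in a sparsity-aware way.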

BACKGROUND

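Neural network pruning dates back to at least the late 1980s and early 1990s, when methods such as Optimal Brain Damage and Optimal Brain Surgeon used second-order (Hessian-based) information to decide which weights could be removed with the least increase in training error. Interest revived in the deep learning era, when simple magnitude-based pruning followed by fine-tuning was shown to remove a large share of the weights in over-parameterized networks with little loss in accuracy. The process is often compared to synaptic pruning in the developing brain.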

SYNONYMS & ALIASES

Network pruning
Weight pruning
Sparsity

USAGE NOTE

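Pruning can significantly reduce model size, especially for over-parameterized deep learning models. Unstructured pruning (zeroing individual weights) yields smaller files and faster inference only when the weights are stored in a sparse format or run on kernels or hardware that exploit sparsity. Structured pruning (removing whole neurons, channels, or attention heads) shrinks the dense tensors directly. Pruned models are usually fine-tuned afterward to recover lost accuracy.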
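To make the size reduction concrete, here is a hedged sketch of structured pruning, in which whole hidden neurons are removed from a small two-layer network and the parameter count is compared before and after. The helper names `prune_neurons` and `count_params`, the L2-norm ranking criterion, and the toy weights are assumptions chosen for illustration; biases are ignored to keep the example short.

```python
import math


def prune_neurons(w1, w2, keep):
    """Keep the `keep` hidden neurons whose incoming weight rows in `w1`
    have the largest L2 norm, and drop the matching columns of `w2`.

    w1: hidden x inputs  (one row per hidden neuron)
    w2: outputs x hidden (one column per hidden neuron)
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in w1]
    ranked = sorted(range(len(w1)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original neuron order

    new_w1 = [list(w1[i]) for i in kept]
    new_w2 = [[row[i] for i in kept] for row in w2]
    return new_w1, new_w2, kept


def count_params(*matrices):
    """Total number of weights across the given 2-D matrices."""
    return sum(len(row) for m in matrices for row in m)


# Hypothetical network: 2 inputs -> 4 hidden neurons -> 1 output.
w1 = [[0.5, -0.4], [0.01, 0.02], [-0.9, 0.1], [0.03, -0.01]]
w2 = [[1.0, 0.2, -0.3, 0.1]]

new_w1, new_w2, kept = prune_neurons(w1, w2, keep=2)
before = count_params(w1, w2)
after = count_params(new_w1, new_w2)
```

Here hidden neurons 0 and 2 survive, and the weight count drops from 12 to 6. Because entire rows and columns disappear, the smaller matrices run on ordinary dense kernels with no sparse-format support required, which is the main practical appeal of structured over unstructured pruning.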

DEVELOPERS

Organizations developing technology related to pruning.

Google (Google AI / DeepMind)
Pioneers in neural network research, including extensive work on model compression and efficiency techniques like pruning, for deploying models from data centers to edge devices.
Meta AI (FAIR)
Actively researches and implements model optimization strategies, including various forms of pruning, to make large-scale AI models more efficient for both research and production use.
NVIDIA
Develops platforms and software (e.g., TensorRT, libraries for model optimization) that incorporate and enable pruning techniques to deploy high-performance, efficient AI models on their GPUs.
Intel
Provides tools and research for optimizing neural networks, including pruning, for efficient deployment on Intel hardware, notably through its OpenVINO toolkit.
Microsoft (Microsoft Research)
Conducts deep research into neural network efficiency, encompassing pruning algorithms and their application across various AI domains to reduce model size and inference cost.
Qualcomm AI Research
Focuses heavily on making AI models efficient for on-device deployment (smartphones, IoT), where pruning and quantization are crucial for performance within strict power and memory constraints.
Hugging Face
Best known for model sharing, Hugging Face also hosts optimized models, and its ecosystem encourages optimization techniques such as pruning for more efficient deployment of transformer models.
IBM Research
Engages in fundamental and applied AI research, including efforts to reduce the computational footprint and memory requirements of neural networks through techniques like pruning for enterprise AI solutions.

RELATED TERMS IN MLOPS & DEPLOYMENT
