// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Pruning

Pruning is a model compression technique that removes less important connections or neurons from an AI model, making it smaller and potentially faster without significantly affecting its accuracy.


TECHNICAL DEFINITION

Pruning is a model compression technique that identifies and removes redundant or low-importance weights, neurons, or connections from a neural network. This reduces the model's parameter count and the complexity of its computational graph, yielding a smaller model and, where the runtime or hardware can exploit the resulting sparsity, faster inference.
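
As a rough illustration of the definition above, the sketch below applies unstructured magnitude pruning, one common importance criterion, to a small weight matrix in plain Python. The function name `magnitude_prune`, the `sparsity` parameter, and the sample weights are illustrative assumptions, not the API of any particular library.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.

    `weights` is a non-empty list of equal-length rows (hypothetical layer weights).
    `sparsity` is the fraction of entries to remove, between 0 and 1.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")

    n_cols = len(weights[0])
    flat = [w for row in weights for w in row]

    # Number of weights to remove, rounded down.
    k = int(len(flat) * sparsity)

    # Rank positions by absolute value and zero out the k smallest.
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]))
    for i in order[:k]:
        flat[i] = 0.0

    # Restore the original row structure.
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(len(weights))]


layer = [[0.8, -0.05, 0.3],
         [0.01, -0.6, 0.1]]
pruned = magnitude_prune(layer, sparsity=0.5)
print(pruned)  # [[0.8, 0.0, 0.3], [0.0, -0.6, 0.0]]
```

Here the zeroed entries stay in the matrix, so the shape is unchanged. This is what distinguishes unstructured pruning from structured pruning, which removes whole neurons or channels and shrinks the matrix itself.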

BACKGROUND

In deep learning, pruning is the practice of removing parameters from an existing neural network. It can target individual weights (unstructured pruning) or whole neurons, channels, or layers (structured pruning). The goal is to reduce the network's size and compute cost while keeping its accuracy close to that of the original.


SYNONYMS & ALIASES

  • Network pruning
  • Weight pruning
  • Sparsity

USAGE NOTE

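Pruning can significantly reduce model size, especially for over-parameterized deep learning models. The pruned network is usually fine-tuned afterward to recover lost accuracy, and unstructured sparsity yields speedups only when the runtime or hardware supports sparse computation.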
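
How much storage pruning actually saves depends on how the remaining weights are stored. The back-of-the-envelope sketch below compares a dense layout with a compressed sparse row (CSR) layout for a single layer. The 1024 x 1024 layer shape, 4-byte values, and 4-byte indices are assumptions chosen for illustration, and the helper names `dense_bytes` and `csr_bytes` are hypothetical.

```python
def dense_bytes(rows, cols, value_bytes=4):
    """Storage for a dense matrix: every entry is stored, zero or not."""
    return rows * cols * value_bytes


def csr_bytes(rows, nonzeros, value_bytes=4, index_bytes=4):
    """Storage for a CSR matrix: one value and one column index per
    nonzero entry, plus a row-pointer array of length rows + 1."""
    return nonzeros * (value_bytes + index_bytes) + (rows + 1) * index_bytes


rows, cols = 1024, 1024          # assumed layer shape
total = rows * cols

for percent_pruned in (50, 90):
    nonzeros = total * (100 - percent_pruned) // 100
    print(
        f"{percent_pruned}% pruned: "
        f"dense={dense_bytes(rows, cols)} bytes, "
        f"csr={csr_bytes(rows, nonzeros)} bytes"
    )
```

Under these assumptions, the CSR copy is slightly larger than the dense one at 50% sparsity, because each surviving weight also carries an index. It only becomes clearly smaller at higher sparsity levels. Structured pruning avoids this overhead because the pruned model stays dense, just with smaller matrices.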

DEVELOPERS

Organizations developing technology related to pruning.

  • Google (Google AI / DeepMind)

    Pioneers in neural network research, including extensive work on model compression and efficiency techniques like pruning, for deploying models from data centers to edge devices.

  • Meta AI (FAIR)

    Actively researches and implements model optimization strategies, including various forms of pruning, to make large-scale AI models more efficient for both research and production use.

  • NVIDIA

    Develops platforms and software (e.g., TensorRT, libraries for model optimization) that incorporate and enable pruning techniques for deploying high-performance, efficient AI models on NVIDIA GPUs.

  • Intel

    Provides tools and research for optimizing neural networks, including pruning, through Intel AI and the OpenVINO toolkit, targeting efficient deployment on Intel hardware.

  • Microsoft (Microsoft Research)

    Conducts deep research into neural network efficiency, encompassing pruning algorithms and their application across various AI domains to reduce model size and inference cost.

  • Qualcomm AI Research

    Focuses heavily on making AI models efficient for on-device deployment (smartphones, IoT), where pruning and quantization are crucial for performance within strict power and memory constraints.

  • Hugging Face

    Best known for model sharing, Hugging Face also hosts and promotes optimized models. Its ecosystem encourages optimization techniques such as pruning for more efficient deployment of transformer models.

  • IBM Research

    Engages in fundamental and applied AI research, including efforts to reduce the computational footprint and memory requirements of neural networks through techniques like pruning for enterprise AI solutions.

RELATED TERMS IN MLOPS & DEPLOYMENT