// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Inference

The process of using a trained machine learning model to make predictions or decisions on new, unseen data.

TECHNICAL DEFINITION

Inference, in machine learning, refers to the application of a pre-trained model to new input data to generate predictions, classifications, or other outputs. For neural networks this typically means a forward pass (forward propagation) through the model architecture with the learned parameters held fixed.
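
The forward pass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the weights W1, B1, W2, and B2 are hand-picked placeholders rather than the result of real training, and the predict function name is hypothetical.

```python
import math

# Hypothetical, hand-picked parameters standing in for a trained model.
# In practice these would be loaded from a checkpoint produced by training.
W1 = [[0.5, -0.2], [0.1, 0.4]]  # hidden layer: 2 inputs -> 2 hidden units
B1 = [0.0, 0.1]
W2 = [0.3, -0.6]                # output layer: 2 hidden units -> 1 output
B2 = 0.05


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(features):
    """Run one forward pass; the parameters are read but never updated."""
    hidden = [
        relu(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(W1, B1)
    ]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return sigmoid(logit)


# Apply the fixed model to a new, unseen input.
score = predict([1.0, 2.0])  # a value strictly between 0 and 1
label = "positive" if score >= 0.5 else "negative"
```

Nothing in predict computes gradients or changes a weight, which is what separates inference from training: the same input always yields the same output until the model is retrained or replaced.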

BACKGROUND

A large language model (LLM) is a neural network trained on a vast amount of text for natural language processing tasks, especially language generation. LLMs can typically generate, summarize, translate and analyze text in many contexts, and are a foundational technology behind modern chatbots. Biased or inaccurate training data can make an LLM's output less reliable.

SYNONYMS & ALIASES

  • prediction
  • forecasting
  • scoring
  • model application

USAGE NOTE

Real-time inference, where each prediction must be returned within a tight latency budget, is critical for applications like recommendation systems and autonomous driving.
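
Real-time use generally means each prediction has to come back within a fixed time limit. A minimal sketch of checking one call against such a limit follows; the 50 ms budget, the timed_predict helper, and the toy_model stand-in are all assumptions made for illustration, not values or APIs from any particular system.

```python
import time

LATENCY_BUDGET_MS = 50.0  # hypothetical budget; real systems set their own


def timed_predict(model, features):
    """Call model(features) and return (output, elapsed milliseconds)."""
    start = time.perf_counter()
    output = model(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return output, elapsed_ms


def toy_model(features):
    # Stand-in for a real trained model: a fixed weighted sum.
    return 0.7 * features[0] + 0.3 * features[1]


output, elapsed_ms = timed_predict(toy_model, [1.0, 2.0])
within_budget = elapsed_ms <= LATENCY_BUDGET_MS
```

A single measurement like this says little on its own; serving systems usually track latency across many requests and watch the slow tail rather than the average.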

DEVELOPERS

Organizations developing technology related to inference.

  • NVIDIA

    Develops GPUs and software platforms like TensorRT, which optimize AI models for high-performance inference across various deployments, from data centers to edge devices.

  • Amazon Web Services (AWS)

    Offers cloud services like Amazon SageMaker for building, training, and deploying machine learning models, including managed inference endpoints and specialized Inferentia chips for cost-effective inference.

  • Google Cloud

    Provides Vertex AI, a unified platform for ML development and deployment, featuring managed inference services, custom model serving, and specialized hardware like TPUs for efficient AI inference.

  • Microsoft Azure

    Offers Azure Machine Learning for MLOps, including tools for deploying and managing AI models for inference, supporting various frameworks and hardware accelerators.

  • Hugging Face

    Provides a platform and libraries (like Transformers and Accelerate) for easily downloading, fine-tuning, and deploying pre-trained machine learning models for inference, often focusing on efficient execution.

  • Intel

    Develops CPUs, AI accelerators, and software toolkits like OpenVINO (Open Visual Inference & Neural Network Optimization) to optimize AI models for inference on Intel hardware.

  • Databricks

    Offers a unified data and AI platform that includes MLflow, enabling MLOps practices such as model serving and monitoring, which are crucial for managing AI model inference at scale.

  • OpenAI

    Develops advanced AI models like GPT and DALL-E and provides API access to them, which requires robust, scalable, and efficient inference infrastructure to serve millions of requests.

RELATED TERMS IN DATA SCIENCE