// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM
Inference
The process of using a trained machine learning model to make predictions or decisions on new, unseen data.
TECHNICAL DEFINITION
Inference, in machine learning, refers to the application of a pre-trained model to new input data to generate predictions, classifications, or other outputs, often involving forward propagation through the neural network or model architecture.
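To make the definition concrete, the following is a minimal sketch of inference as a single forward pass, written in plain Python. The weights, biases, layer sizes, and the function names `predict` and `classify` are all hypothetical and chosen only for illustration; a real system would load trained parameters from a saved model rather than hard-code them.

```python
import math

# Hypothetical, hand-picked parameters standing in for a model that has
# already been trained. Inference only reads them; it never updates them.
W1 = [[0.5, -0.2], [0.1, 0.8]]  # hidden layer: 2 inputs -> 2 hidden units
B1 = [0.0, 0.1]
W2 = [1.0, -1.5]                # output layer: 2 hidden units -> 1 output
B2 = 0.2


def relu(x):
    """Rectified linear activation."""
    return max(0.0, x)


def sigmoid(x):
    """Squash a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(features):
    """Run one forward pass and return a probability-like score."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(W1, B1)
    ]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return sigmoid(logit)


def classify(features, threshold=0.5):
    """Turn the score into a binary decision for a new, unseen input."""
    return 1 if predict(features) >= threshold else 0
```

Calling `classify([2.0, 0.0])` or `classify([1.0, 2.0])` applies the fixed parameters to an input the model has not seen before. No gradients are computed and no weights change, which is what distinguishes inference from training.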
BACKGROUND
A large language model (LLM) is a neural network trained on a vast amount of text for natural language processing tasks, especially language generation. LLMs can typically generate, summarize, translate and analyze text in many contexts, and are a foundational technology behind modern chatbots. Biased or inaccurate training data can make an LLM's output less reliable.
SYNONYMS & ALIASES
- Prediction
- Forecasting
- Scoring
- Model application
USAGE NOTE
Real-time (low-latency) inference is critical for applications such as recommendation systems and autonomous driving, where a prediction is only useful if it arrives in time to act on.
DEVELOPERS
Organizations developing technology related to Inference.
NVIDIA: Develops GPUs and software platforms like TensorRT, which optimize AI models for high-performance inference across various deployments, from data centers to edge devices.
Amazon Web Services (AWS): Offers cloud services like Amazon SageMaker for building, training, and deploying machine learning models, including managed inference endpoints and specialized Inferentia chips for cost-effective inference.
Google Cloud: Provides Vertex AI, a unified platform for ML development and deployment, featuring managed inference services, custom model serving, and specialized hardware like TPUs for efficient AI inference.
Microsoft: Offers Azure Machine Learning for MLOps, including tools for deploying and managing AI models for inference, supporting various frameworks and hardware accelerators.
Hugging Face: Provides a platform and libraries (like Transformers and Accelerate) for easily downloading, fine-tuning, and deploying pre-trained machine learning models for inference, often focusing on efficient execution.
Intel: Develops CPUs, AI accelerators, and software toolkits like OpenVINO (Open Visual Inference & Neural Network Optimization) to optimize AI models for inference on Intel hardware.
Databricks: Offers a unified data and AI platform that includes MLflow, enabling MLOps practices such as model serving and monitoring, which are crucial for managing AI model inference at scale.
OpenAI: Develops advanced AI models like GPT and DALL-E and provides API access, requiring robust, scalable, and efficient inference infrastructure to serve millions of requests.