// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Inference

The process of using a trained machine learning model to make predictions or decisions on new, unseen data.

TECHNICAL DEFINITION

Inference, in machine learning, is the application of a pre-trained model to new input data to generate predictions, classifications, or other outputs. For a neural network this typically means a forward pass (forward propagation) through the model architecture with the learned parameters held fixed, in contrast to training, which updates them.
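
As a rough illustration of forward propagation at inference time, the sketch below runs a tiny two-layer network in plain Python. The weights are made-up values standing in for parameters that training would have produced, and the names predict and classify are hypothetical helpers, not part of any library.

```python
import math

# Hypothetical pre-trained parameters (illustrative values only,
# not taken from any real model).
W1 = [[0.5, -0.2], [0.1, 0.8]]  # hidden layer: 2 inputs -> 2 hidden units
B1 = [0.0, -0.1]
W2 = [1.0, -1.5]                # output layer: 2 hidden units -> 1 output
B2 = 0.2


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(features: list[float]) -> float:
    """Run one forward pass; parameters are only read, never updated."""
    hidden = [
        relu(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(W1, B1)
    ]
    logit = sum(w * h for w, h in zip(W2, hidden)) + B2
    return sigmoid(logit)


def classify(features: list[float], threshold: float = 0.5) -> int:
    """Turn the predicted probability into a class label (0 or 1)."""
    return 1 if predict(features) >= threshold else 0
```

Calling classify([2.0, 0.0]) applies the fixed weights to a new input and returns a label. No gradients are computed and nothing is learned, which is what separates inference from training.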

BACKGROUND

A large language model (LLM) is a neural network trained on a vast amount of text for natural language processing tasks, especially language generation. LLMs can typically generate, summarize, translate, and analyze text in many contexts, and are a foundational technology behind modern chatbots. Biased or inaccurate training data can make an LLM's output less reliable.

SYNONYMS & ALIASES

Prediction
Forecasting
Scoring
Model application

USAGE NOTE

Real-time inference, where each prediction must be returned with low latency, is critical for applications like recommendation systems and autonomous driving.
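
Real-time use generally implies that each prediction has to come back within some latency budget. The sketch below shows one way that could be checked, using only the Python standard library. The function timed_inference, the stand-in toy_model, and the 50 ms budget in the usage note are all assumptions made for illustration.

```python
import time
from typing import Callable, Sequence


def timed_inference(
    model: Callable[[Sequence[float]], float],
    features: Sequence[float],
    budget_ms: float,
) -> tuple[float, float, bool]:
    """Run one prediction and return (output, latency in ms, budget met?)."""
    start = time.perf_counter()
    output = model(features)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return output, latency_ms, latency_ms <= budget_ms


def toy_model(features: Sequence[float]) -> float:
    # Hypothetical stand-in for a real trained model: averages its inputs.
    return sum(features) / len(features)
```

A call such as timed_inference(toy_model, [0.2, 0.4, 0.9], budget_ms=50.0) returns the model output together with the measured latency and a flag indicating whether the request stayed within the assumed budget. A production system would more likely track latency percentiles across many requests than judge a single call.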

DEVELOPERS

Organizations developing technology related to inference.

NVIDIA
Develops GPUs and software platforms like TensorRT, which optimize AI models for high-performance inference across various deployments, from data centers to edge devices.
Amazon Web Services (AWS)
Offers cloud services like Amazon SageMaker for building, training, and deploying machine learning models, including managed inference endpoints and specialized Inferentia chips for cost-effective inference.
Google Cloud
Provides Vertex AI, a unified platform for ML development and deployment, featuring managed inference services, custom model serving, and specialized hardware like TPUs for efficient AI inference.
Microsoft Azure
Offers Azure Machine Learning for MLOps, including tools for deploying and managing AI models for inference, supporting various frameworks and hardware accelerators.
Hugging Face
Provides a platform and libraries (such as Transformers and Accelerate) for downloading, fine-tuning, and deploying pre-trained machine learning models for inference, with an emphasis on efficient execution.
Intel
Develops CPUs, AI accelerators, and software toolkits like OpenVINO (Open Visual Inference & Neural Network Optimization) to optimize AI models for inference on Intel hardware.
Databricks
Offers a unified data and AI platform that includes MLflow, enabling MLOps practices such as model serving and monitoring, which are crucial for managing AI model inference at scale.
OpenAI
Develops advanced AI models like GPT and DALL-E and provides API access to them, which requires robust, scalable, and efficient inference infrastructure to serve millions of requests.
