// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Triton

An open-source inference server from NVIDIA that can run many different types of AI models from various frameworks on GPUs and CPUs.

TECHNICAL DEFINITION

NVIDIA Triton Inference Server (formerly TensorRT Inference Server) is open-source, high-performance inference-serving software. It supports multiple deep learning frameworks (TensorFlow, PyTorch, ONNX, and others) and model types, and can execute models concurrently on GPUs and CPUs.
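
To make the serving model concrete, here is a minimal sketch of how a client might build an HTTP inference request for a Triton-hosted model, using only the Python standard library. It assumes the server exposes the KServe v2 style endpoint /v2/models/<model>/infer on port 8000; the model name example_model, the tensor name INPUT0, and the helper names are hypothetical and would need to match a real deployment.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, input_name, shape, datatype, data):
    """Return (url, body) for a v2-style inference request.

    Nothing is sent here; the function only assembles the URL and JSON body.
    """
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": input_name,      # must match the model's input tensor name
                "shape": list(shape),    # tensor shape, including any batch dimension
                "datatype": datatype,    # e.g. "FP32" or "INT64"
                "data": list(data),      # flattened tensor contents
            }
        ]
    }
    return url, json.dumps(payload).encode("utf-8")


def send_infer_request(url, body):
    """POST the request and return the decoded JSON response (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Hypothetical server address, model name, and tensor name, for illustration only.
url, body = build_infer_request(
    "http://localhost:8000", "example_model", "INPUT0", [1, 4], "FP32",
    [0.1, 0.2, 0.3, 0.4],
)
```

Calling send_infer_request(url, body) against a running server would return a JSON document whose "outputs" list holds the result tensors. The same request shape applies regardless of which framework the model was built in, which is the practical benefit of serving different model types behind one server.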

BACKGROUND

Triton was first released by NVIDIA as the TensorRT Inference Server and was later renamed Triton Inference Server, reflecting its support for frameworks beyond TensorRT.

SYNONYMS & ALIASES

NVIDIA Triton
Triton Inference Server
TensorRT Inference Server
Inference Server

USAGE NOTE

Used for high-throughput, low-latency AI inference across diverse models and hardware.
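
As a rough illustration of how throughput and hardware placement are typically tuned, the sketch below renders a minimal per-model configuration file (config.pbtxt) as text. The helper render_model_config and the values passed to it are hypothetical; the field names (name, backend, max_batch_size, instance_group, count, kind) follow Triton's model configuration format, but a real model usually needs additional settings such as input and output tensor definitions.

```python
def render_model_config(name, backend, max_batch_size, instance_count, kind):
    """Render a minimal config.pbtxt as a string.

    instance_count sets how many copies of the model run concurrently;
    kind selects the device type, e.g. "KIND_GPU" or "KIND_CPU".
    """
    return (
        f'name: "{name}"\n'
        f'backend: "{backend}"\n'
        f"max_batch_size: {max_batch_size}\n"
        "instance_group [\n"
        "  {\n"
        f"    count: {instance_count}\n"
        f"    kind: {kind}\n"
        "  }\n"
        "]\n"
    )


# Hypothetical model name and settings, for illustration only.
config_text = render_model_config(
    name="example_model",
    backend="onnxruntime",
    max_batch_size=8,
    instance_count=2,
    kind="KIND_GPU",
)
print(config_text)
```

Raising the instance count lets several requests for the same model execute at once, while switching kind to KIND_CPU places the model on CPU instead of GPU. Suitable values depend on the model and the available hardware.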

DEVELOPERS

Organizations developing technology related to Triton.

NVIDIA
Originator and primary developer of the Triton Inference Server, an open-source solution for high-performance AI model deployment across various frameworks and hardware.
OctoML
Provides a platform for optimizing, deploying, and running AI models efficiently, frequently leveraging and integrating with NVIDIA Triton Inference Server for high-performance inference in production environments.
Run:AI
Offers an AI workload orchestration and management platform that optimizes GPU utilization and facilitates efficient deployment of AI models, often integrating with and managing inference servers like NVIDIA Triton.
Intel
Develops hardware and software for AI, including optimizations for running models on Intel CPUs, and collaborates on extending frameworks and inference servers such as Triton to diverse hardware.
Databricks
Provides a unified data and AI platform with robust MLOps capabilities, where efficient model serving solutions, including those leveraging technologies like Triton Inference Server, are integrated and managed for large-scale deployments.
Weights & Biases
Offers a developer-first MLOps platform for experiment tracking, model versioning, and deployment monitoring, often integrating with and supporting efficient inference serving solutions like NVIDIA Triton in production pipelines.
Red Hat
Provides enterprise open-source solutions, including OpenShift AI, which offers a comprehensive platform for MLOps and AI model deployment, often supporting and integrating with inference servers like NVIDIA Triton for enterprise-grade AI applications.
