// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Performance Monitoring

Performance monitoring tracks how well a model performs its task in a production setting, for example by measuring its accuracy or speed.

TECHNICAL DEFINITION

Performance monitoring for ML models involves continuously tracking key metrics, such as accuracy, precision, recall, F1-score, latency, and throughput, to assess the model's effectiveness and operational efficiency in a production environment.
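
The metrics named above can be computed over a window of logged predictions. The following is a minimal Python sketch, not a reference implementation: it assumes binary labels (0 or 1), ground-truth labels that are available for every logged request, and per-request latencies in milliseconds. The names MetricsSnapshot and summarize_window are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MetricsSnapshot:
    """Hypothetical container for one monitoring window."""
    accuracy: float
    precision: float
    recall: float
    f1: float
    mean_latency_ms: float
    throughput_rps: float


def _safe_div(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a window has no relevant samples.
    return numerator / denominator if denominator else 0.0


def summarize_window(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    latencies_ms: Sequence[float],
    window_seconds: float,
) -> MetricsSnapshot:
    """Summarize one window of binary predictions (assumed labels: 0 or 1)."""
    if not (len(y_true) == len(y_pred) == len(latencies_ms)):
        raise ValueError("all inputs must describe the same requests")

    # Confusion-matrix counts for the positive class (label 1).
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    n = len(pairs)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)

    return MetricsSnapshot(
        accuracy=_safe_div(tp + tn, n),
        precision=precision,
        recall=recall,
        f1=f1,
        mean_latency_ms=_safe_div(sum(latencies_ms), n),
        throughput_rps=_safe_div(n, window_seconds),  # requests per second
    )


# Illustrative data only: eight requests served during a four-second window.
snapshot = summarize_window(
    y_true=[1, 0, 1, 1, 0, 1, 0, 0],
    y_pred=[1, 0, 0, 1, 1, 1, 0, 0],
    latencies_ms=[10, 20, 30, 40, 10, 20, 30, 40],
    window_seconds=4.0,
)
print(snapshot)
```

In practice ground-truth labels often arrive late or only for a sample of requests, so the quality metrics would typically be computed on a delayed window while latency and throughput are reported in near real time.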

BACKGROUND

Prompt injection is a cybersecurity exploit and attack vector in which innocuous-looking inputs are designed to cause unintended behavior in machine learning models, particularly large language models (LLMs). The attack exploits the model's inability to distinguish developer-defined prompts from user inputs, which lets an attacker bypass safeguards and influence model behavior. Although LLMs are designed to follow trusted instructions, carefully crafted inputs can manipulate them into producing unintended responses.

SYNONYMS & ALIASES

Model performance tracking
Metric monitoring
Operational monitoring

USAGE NOTE

Performance monitoring dashboards provide insights into a model's health and business impact.

DEVELOPERS

Organizations developing technology related to Performance Monitoring.

Arize AI
Arize AI provides a machine learning observability platform that helps data science and ML engineering teams monitor model performance, detect issues like drift, and troubleshoot problems in production AI systems.
WhyLabs AI
WhyLabs offers the AI Observability Platform, which provides continuous monitoring of data and model health, detecting data drift, model performance degradation, and data quality issues in AI applications.
Weights & Biases
Weights & Biases (W&B) provides an MLOps platform for tracking, visualizing, and comparing machine learning experiments, including tools for model monitoring, performance logging, and prompt engineering evaluations.
LangChain (LangSmith)
LangSmith, developed by the creators of LangChain, is a platform for debugging, testing, evaluating, and monitoring large language model (LLM) applications and agent systems, focusing on performance and reliability.
Helicone
Helicone offers an open-source platform for monitoring and managing LLM usage, providing observability into prompt performance, cost, latency, and error rates for AI applications.
Vellum
Vellum is an LLM development platform that includes features for prompt management, testing, and deployment, with built-in monitoring and analytics to track the performance and efficacy of prompts and models in production.
Fiddler AI
Fiddler AI provides an MLOps platform for responsible AI, offering explainability, fairness, and performance monitoring for machine learning models across their lifecycle, including model drift detection and bias analysis.
Datadog
Datadog, a monitoring and security platform, offers LLM observability capabilities that allow users to monitor the performance, cost, latency, and token usage of their large language model applications.

RELATED TERMS IN MLOPS & DEPLOYMENT
