// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Checkpoint

A checkpoint in machine learning is a saved snapshot of a model's progress during training, including its learned weights and optimizer state, allowing training to be resumed later.

TECHNICAL DEFINITION

A checkpoint in machine learning is the saved state of a neural network model at a specific training iteration. It contains the model's learned parameters (weights and biases), the optimizer state, and sometimes the training configuration. This provides fault tolerance and allows training to resume, or inference to run, from that point.
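
As a rough illustration of the definition above, the following framework-free sketch saves and restores a complete training state. The toy model (one weight and one bias fit with momentum SGD), the function names `new_state`, `train`, `save_checkpoint`, and `load_checkpoint`, and the JSON file layout are all assumptions made for this example. Real frameworks use their own serialization formats.

```python
import json
import os

# Toy dataset for y = 2x + 1 (hypothetical, for illustration only).
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def new_state(lr=0.05, momentum=0.9):
    """Fresh training state: step counter, parameters, optimizer state, config."""
    return {
        "step": 0,
        "params": {"w": 0.0, "b": 0.0},
        "optimizer": {"vw": 0.0, "vb": 0.0},  # momentum buffers
        "config": {"lr": lr, "momentum": momentum},
    }


def train(state, num_steps):
    """Run num_steps of full-batch momentum SGD on mean squared error, in place."""
    p, o, c = state["params"], state["optimizer"], state["config"]
    for _ in range(num_steps):
        gw = gb = 0.0
        for x, y in DATA:
            err = p["w"] * x + p["b"] - y
            gw += 2 * err * x / len(DATA)
            gb += 2 * err / len(DATA)
        o["vw"] = c["momentum"] * o["vw"] + gw
        o["vb"] = c["momentum"] * o["vb"] + gb
        p["w"] -= c["lr"] * o["vw"]
        p["b"] -= c["lr"] * o["vb"]
        state["step"] += 1
    return state


def save_checkpoint(state, path):
    """Write to a temporary file, then rename it over the target path,
    so a crash mid-write does not leave a truncated checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Read back the full training state written by save_checkpoint."""
    with open(path) as f:
        return json.load(f)
```

The optimizer state matters here: if only `params` were saved, the momentum buffers would restart at zero and the resumed run would follow a different trajectory from an uninterrupted one.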

BACKGROUND

Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., doing business as DeepSeek, is a Chinese artificial intelligence (AI) company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by High-Flyer, a Chinese hedge fund. DeepSeek was founded in July 2023 by Liang Wenfeng, the co-founder of High-Flyer, who also serves as CEO of both companies. The company launched an eponymous chatbot alongside its DeepSeek-R1 model in January 2025.

SYNONYMS & ALIASES

  • Saved model
  • Model snapshot
  • Training state
  • Weights file

USAGE NOTE

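Saving checkpoints at regular intervals is crucial during long training runs: it limits how much training progress is lost if a job crashes or is preempted, and it lets you branch from a saved state to experiment with different hyperparameters, such as the learning rate.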
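
One way to apply this advice is sketched below in plain Python. The directory layout, the `step_########.json` file naming, the `keep_last` retention policy, and every function name are hypothetical choices for this example and are not the API of any particular framework. The state is assumed to be a JSON-serializable dict with a `"step"` counter and a `"config"` dict.

```python
import json
import os
import re


def checkpoint_path(directory, step):
    """Zero-padded step numbers keep the files in order when listed."""
    return os.path.join(directory, f"step_{step:08d}.json")


def list_checkpoints(directory):
    """Return (step, path) pairs sorted by step, oldest first."""
    found = []
    for name in os.listdir(directory):
        match = re.fullmatch(r"step_(\d+)\.json", name)
        if match:
            found.append((int(match.group(1)), os.path.join(directory, name)))
    return sorted(found)


def save_periodic(state, directory, keep_last=3):
    """Save atomically, then delete all but the newest keep_last checkpoints.
    Assumes keep_last >= 1."""
    path = checkpoint_path(directory, state["step"])
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)
    for _, old_path in list_checkpoints(directory)[:-keep_last]:
        os.remove(old_path)


def load_latest(directory):
    """Return the newest saved state, or None if there is nothing to resume."""
    checkpoints = list_checkpoints(directory)
    if not checkpoints:
        return None
    with open(checkpoints[-1][1]) as f:
        return json.load(f)


def run(directory, total_steps, save_every, step_fn, initial_state):
    """Resume from the latest checkpoint if one exists, otherwise start fresh.
    step_fn performs one training step and must increment state["step"]."""
    state = load_latest(directory) or initial_state
    while state["step"] < total_steps:
        step_fn(state)
        if state["step"] % save_every == 0:
            save_periodic(state, directory)
    return state


def branch(directory, step, **overrides):
    """Load the checkpoint saved at `step` and override hyperparameters,
    for example branch(directory, 8, lr=0.5) to try a new learning rate."""
    with open(checkpoint_path(directory, step)) as f:
        state = json.load(f)
    state["config"].update(overrides)
    return state
```

With this layout, a job that crashes loses at most `save_every` steps of work, and `branch` lets several experiments start from the same saved state instead of retraining from scratch.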

DEVELOPERS

Organizations and projects developing technology related to checkpoints.

  • Weights & Biases (W&B)

    Weights & Biases offers an MLOps platform that enables experiment tracking, model versioning, and artifact management, allowing users to save, compare, and restore model checkpoints efficiently during training and development.

  • Hugging Face

    Hugging Face hosts a large repository of pre-trained model checkpoints on its Hub and provides tools for saving, loading, sharing, and fine-tuning them, which supports AI engineering and prompt design workflows.

  • MLflow

    MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking and model management, allowing users to log, version, and deploy model checkpoints.

  • Google Cloud Vertex AI

    Vertex AI provides a unified platform for MLOps, offering features for experiment tracking, model registry, and versioning, which allows developers to manage and deploy different checkpoints of their AI models.

  • Amazon SageMaker

    Amazon SageMaker offers a comprehensive suite of tools for building, training, and deploying machine learning models, including capabilities for model versioning and artifact management that involve handling model checkpoints.

  • Microsoft Azure Machine Learning

    Azure Machine Learning provides an enterprise-grade platform for the ML lifecycle, featuring experiment tracking, model registration, and version control for managing and deploying various checkpoints of AI models.

  • ClearML

    ClearML is an open-source MLOps platform that offers experiment tracking, artifact management, and model versioning, enabling developers to save, log, and manage model checkpoints throughout their AI development process.

  • Comet ML

    Comet ML provides an MLOps platform for experiment tracking, model management, and monitoring, allowing users to log, version, and compare model checkpoints to accelerate their AI development and prompt optimization.

RELATED TERMS IN MODEL ARCHITECTURE