// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

DVC

An open-source tool for versioning data and machine learning models, similar to Git but for large files.

TECHNICAL DEFINITION

Data Version Control (DVC) is an open-source version control system for machine learning projects. It lets data scientists version datasets, models, and pipelines with Git-like commands, and it stores the tracked files on local filesystems or in cloud storage such as Amazon S3 and Google Cloud Storage (GCS).
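
The idea behind versioning large files next to Git can be sketched in a few lines of Python. This is a simplified illustration, not DVC's actual implementation: the function names (`file_hash`, `track`, `checkout`), the `.pointer.json` suffix, and the cache layout are all hypothetical. The sketch copies a data file into a cache keyed by its content hash and writes a small pointer file. The pointer is small enough to commit to Git, while the large file stays in the cache.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_path: Path, cache_dir: Path) -> Path:
    """Copy a file into the cache under its hash and write a pointer file.

    The pointer file (hypothetical format) is what would be committed to Git.
    """
    digest = file_hash(data_path)
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_path, cached)
    pointer_path = data_path.with_name(data_path.name + ".pointer.json")
    pointer = {
        "sha256": digest,
        "size": data_path.stat().st_size,
        "path": data_path.name,
    }
    pointer_path.write_text(json.dumps(pointer, indent=2))
    return pointer_path


def checkout(pointer_path: Path, cache_dir: Path) -> Path:
    """Restore the data file that a pointer file refers to."""
    pointer = json.loads(pointer_path.read_text())
    digest = pointer["sha256"]
    target = pointer_path.with_name(pointer["path"])
    shutil.copy2(cache_dir / digest[:2] / digest[2:], target)
    return target


def demo() -> str:
    """Track two versions of a dataset, then roll back to the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        cache = workspace / "cache"
        data = workspace / "train.csv"

        data.write_text("id,label\n1,cat\n")
        pointer = track(data, cache)
        first_pointer = pointer.read_text()  # stands in for an old Git commit

        data.write_text("id,label\n1,cat\n2,dog\n")
        track(data, cache)  # the pointer now refers to the second version

        # Rolling back means restoring the old pointer, then checking out.
        pointer.write_text(first_pointer)
        return checkout(pointer, cache).read_text()


if __name__ == "__main__":
    print(demo())
```

In this sketch, switching dataset versions only requires switching the pointer file, which is the part Git would track. The cache could just as well live in remote storage.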

BACKGROUND

The history of free and open-source software begins at the advent of computer software in the early half of the 20th century. In the 1950s and 1960s, computer operating software and compilers were delivered as part of hardware purchases without separate fees. At the time, source code (the human-readable form of software) was generally distributed with the software, providing the ability to fix bugs or add new functions. Universities were early adopters of computing technology. Many of the modifications developed by universities were openly shared, in keeping with the academic principle of sharing knowledge, and organizations sprang up to facilitate sharing.

SYNONYMS & ALIASES

  • Data Version Control
  • ML Versioning
  • Git for Data
  • Data Management

USAGE NOTE

DVC is crucial for reproducibility in ML because it tracks changes to data and models alongside the code that uses them.

DEVELOPERS

Organizations developing technology related to DVC.

  • Iterative.ai

    The creator and primary developer of DVC (Data Version Control). It provides the core technology for versioning data and machine learning models, which supports reproducible AI engineering and the management of datasets used in prompt design and tuning.

  • Weights & Biases

    Offers an MLOps platform that integrates with DVC for data and model versioning, supporting experiment tracking, reproducibility, and lineage in AI engineering workflows, including the management of datasets for prompt engineering.

  • Comet ML

    Provides an MLOps platform for experiment tracking, model management, and data versioning. Comet ML is often used with DVC to manage datasets and models, which supports reproducible AI engineering and the systematic development of prompt-driven applications.

  • ClearML

    Offers an MLOps platform that integrates with DVC for managing machine learning datasets, models, and experiments. It supports reproducible AI engineering pipelines, including those that involve prompt design.

  • Hugging Face

    Hugging Face does not develop DVC, but it is a central platform for large language models and prompt engineering. Many users in its community use DVC to version and manage the datasets used for model training, fine-tuning, and prompt evaluation.

  • Databricks

    As the developer of MLflow, Databricks provides an MLOps platform where DVC is frequently used in conjunction with MLflow for managing and versioning raw data, feature sets, and model artifacts in complex AI pipelines, including those involving prompt engineering and large language models.

  • Valohai

    An MLOps platform focused on automating and reproducing machine learning pipelines. Valohai is often integrated with DVC for data and model versioning, which keeps AI engineering practices consistent and reproducible across development and deployment, including work on prompt design.

RELATED TERMS IN MLOPS & DEPLOYMENT