// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM

Splitting

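Splitting refers to the general act of dividing a document or data into smaller parts, often based on specific rules, delimiters, or structural elements. It is a broader term than chunking, which usually refers to dividing text into size-bounded pieces for retrieval or for fitting a model's context window.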
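
As a minimal sketch of rule-based splitting, the Python below contrasts a delimiter rule (split on blank lines) with a structural rule (start a new part at each line beginning with "#"). The Markdown-style heading marker and the function names are illustrative assumptions, not part of any particular library.

```python
import re


def split_on_delimiter(text: str, delimiter: str = "\n\n") -> list[str]:
    """Split text on a fixed delimiter (default: a blank line), dropping empty parts."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]


def split_on_headings(text: str) -> list[str]:
    """Split text into sections, starting a new section at each line that begins with '#'."""
    # The zero-width lookahead splits *before* each heading, so the heading
    # stays attached to the section it introduces (Python 3.7+ allows this).
    parts = re.split(r"(?m)^(?=#)", text)
    return [part.strip() for part in parts if part.strip()]


doc = "# Intro\nFirst paragraph.\n\nSecond paragraph.\n# Methods\nDetails here."
print(split_on_delimiter(doc))
print(split_on_headings(doc))
```

The two rules produce different parts from the same input: the delimiter rule cuts between paragraphs regardless of headings, while the structural rule keeps each heading together with its body.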


TECHNICAL DEFINITION

The general process of dividing a larger data unit, such as a document or dataset, into smaller, discrete components. It is often performed as a precursor to chunking or to enable parallel processing in data pipelines and machine learning workflows.
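
When splitting is used to enable parallel processing, a common pattern is to partition the data into roughly equal pieces, process each piece independently, and combine the partial results. The sketch below assumes an in-memory list of text lines and a hypothetical word-counting task; it uses a thread pool for simplicity, whereas CPU-bound Python work would usually use a process pool or a distributed framework instead.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items: list, n_parts: int) -> list[list]:
    """Split items into n_parts contiguous partitions whose sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    size, remainder = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `remainder` partitions each take one extra item.
        end = start + size + (1 if i < remainder else 0)
        parts.append(items[start:end])
        start = end
    return parts


def count_words(lines: list[str]) -> int:
    """Hypothetical per-partition task: count whitespace-separated words."""
    return sum(len(line.split()) for line in lines)


lines = ["a b c", "d e", "f", "g h i j", "k"]
parts = partition(lines, 3)

# Process each partition independently, then combine the partial results.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, parts))
total = sum(partial_counts)
print(parts, partial_counts, total)
```

Contiguous partitions preserve the original order within each piece, which keeps the combine step simple when results must be reassembled in sequence.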


SYNONYMS & ALIASES

  • Data partitioning
  • Text division
  • Document splitting
  • Segmentation
  • Data segmentation

USAGE NOTE

Data splitting is a fundamental step in preparing datasets for training, validation, and testing machine learning models.
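
A minimal sketch of a shuffled train/validation/test split, assuming the examples are independent and a purely random split is appropriate; the function name and default fractions are illustrative choices.

```python
import random


def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of items and split it into train, validation, and test subsets."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_test = int(n * test_frac)  # int() rounds down; leftovers go to train
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100), val_frac=0.1, test_frac=0.2)
print(len(train), len(val), len(test))
```

For classification data with imbalanced labels, a stratified split (for example, scikit-learn's train_test_split with its stratify parameter) keeps label proportions similar across the subsets.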

DEVELOPERS

Organizations developing technology related to Splitting.

  • LangChain

    A framework for developing applications powered by large language models, explicitly designed to enable 'chains' and 'agents' that break down complex tasks into manageable sub-tasks and prompt sequences for more robust AI workflows.

  • LlamaIndex

    A data framework for LLM applications that focuses on data integration and retrieval, supporting agentic behavior and query decomposition to handle complex user requests by splitting them into smaller, searchable parts.

  • OpenAI

    Develops large language models and APIs (e.g., Assistants API, function calling) that facilitate decomposing complex user requests into smaller, actionable steps or function calls, aligning with 'splitting' tasks for more effective AI interaction.

  • Google (Google AI / DeepMind)

    Conducts research and develops AI models and platforms (like Gemini) that incorporate advanced reasoning, planning, and agentic capabilities, which inherently involve decomposing complex problems into sub-problems for processing.

  • Anthropic

    Focuses on developing safe and helpful AI. Its work spans prompt engineering techniques, such as breaking complex instructions into smaller chained prompts, and training methods such as Constitutional AI, aimed at robust and ethical outputs.

  • Cohere

    Provides large language models and tools for enterprise applications, often involving complex prompt strategies and chaining of operations to address specific business needs, which can be seen as 'splitting' a larger problem into smaller API calls or structured prompts.

  • Microsoft (Azure AI)

    Offers a suite of AI services and development tools, including capabilities for building and orchestrating complex AI workflows and agents, where tasks are often decomposed and managed across different components for scalability and efficiency.

RELATED TERMS IN PROMPTING & LOGIC