// MODEL OPTIMIZATION AND PROMPT SYNTAX TERM
Chunking
Chunking is the process of breaking down large texts or documents into smaller, more manageable pieces. This helps ensure that the information fits within the processing limits of AI models.
TECHNICAL DEFINITION
The process of segmenting long text into smaller, semantically coherent units (chunks) so that each unit fits within a model's context window and so that retrieval in RAG systems returns focused, relevant context rather than whole documents.
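As a minimal sketch of the definition above, the following splits text at sentence boundaries and packs whole sentences into chunks up to a word limit, so that no chunk cuts a sentence in half. The function name `chunk_sentences`, the `max_words` parameter, and the regex-based sentence splitter are illustrative assumptions, not a standard API; production splitters usually count model tokens rather than words.

```python
import re


def chunk_sentences(text: str, max_words: int = 100) -> list[str]:
    """Pack whole sentences into chunks of at most max_words words.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    A single sentence longer than max_words becomes its own oversized chunk.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")

    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if adding this sentence would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five. Six seven eight nine. Ten."
    for chunk in chunk_sentences(sample, max_words=5):
        print(repr(chunk))
```

Splitting on sentence boundaries is only one strategy; fixed-size windows with overlap or splits on headings and paragraphs trade coherence against uniform chunk size in different ways.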
BACKGROUND
Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources.
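The retrieve-then-respond flow described above can be illustrated with a toy sketch. It scores each chunk by how many query words it shares, keeps the top matches, and places them in a prompt. The names `retrieve` and `build_prompt`, the word-overlap scoring, and the prompt wording are all assumptions made for illustration; real RAG systems typically rank chunks by embedding similarity in a vector database, and the final call to an LLM is omitted here.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return up to k chunks that share the most words with the query."""
    query_terms = Counter(tokenize(query))
    scored = []
    for index, chunk in enumerate(chunks):
        chunk_terms = Counter(tokenize(chunk))
        # Score = number of query word occurrences also present in the chunk.
        score = sum(min(query_terms[t], chunk_terms[t]) for t in query_terms)
        scored.append((score, index))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunks[i] for score, i in scored[:k] if score > 0]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Assemble a prompt that supplies the retrieved chunks as context."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    docs = [
        "Refunds are issued within 14 days of purchase.",
        "Our office is closed on public holidays.",
        "Shipping takes 3 to 5 business days.",
    ]
    print(build_prompt("How many days do refunds take?", docs, k=1))
```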
SYNONYMS & ALIASES
- Text segmentation
- Document chunking
- Context window management
- Text splitting
- Passage splitting
USAGE NOTE
Effective chunking is crucial for fitting relevant information into a large language model's context window without losing critical context.
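One way to picture this constraint is a greedy packer that takes chunks already ranked by relevance and keeps as many as fit in a fixed budget. This is a sketch under stated assumptions: `pack_context` and its parameters are hypothetical names, and token cost is approximated by whitespace word count, whereas a real application would count tokens with the target model's tokenizer.

```python
from typing import Callable


def pack_context(
    ranked_chunks: list[str],
    budget: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Greedily select chunks, in ranked order, whose total cost fits the budget.

    Assumption: count_tokens approximates model tokens by word count.
    """
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            # Skip a chunk that does not fit; a smaller, lower-ranked one may.
            continue
        selected.append(chunk)
        used += cost
    return selected


if __name__ == "__main__":
    ranked = ["a b c", "d e f g", "h i"]
    print(pack_context(ranked, budget=5))
```

Smaller chunks make it easier to fill the budget with relevant material, but chunks that are too small can lose the surrounding context needed to interpret them.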
DEVELOPERS
Organizations developing technology related to chunking.
LangChain provides an open-source framework for developing applications powered by large language models, offering specific 'text splitters' for efficient document chunking to manage context windows and optimize retrieval-augmented generation (RAG).
LlamaIndex is a data framework for LLM applications that specializes in connecting LLMs with external data. Its data connectors and index builders rely on chunking strategies, some applied by default and some configured explicitly, for effective data retrieval and context management.
Unstructured provides APIs and open-source tools to prepare unstructured data for large language models, focusing on advanced document parsing, cleaning, and intelligent chunking to create high-quality input for RAG and other AI applications.
Pinecone is a vector database used for building AI applications, especially RAG. Because it stores and searches vector embeddings, retrieval quality depends heavily on how data is pre-processed and chunked before it is embedded and ingested into the database.
Cohere offers enterprise-grade LLMs, embedding models, and RAG capabilities. Their tools for search and generation often require careful document preparation, including strategic chunking, to maximize the relevance and quality of AI outputs.
Hugging Face provides a vast ecosystem of tools, models, and datasets for NLP. Its 'transformers' and 'datasets' libraries are fundamental for text preprocessing, tokenization, and managing document context, which often involves various forms of chunking for effective LLM input.
Weaviate is an open-source vector database that supports various data types and integrates with LLMs. It often provides guidance and features for preparing and chunking data to create effective vector embeddings for semantic search and RAG applications.