Chunk size

Chunk size is the size, measured in tokens or characters, of the units into which documents are split before being stored in a vector store within a RAG pipeline. It is a critical parameter that directly affects retrieval accuracy and answer quality.

Why Splitting Is Necessary

LLMs have a finite context window. Hundreds of pages of internal manuals cannot be passed to the model as-is, so documents are split into smaller units (chunking), each unit is vectorized, and only the units relevant to a query are retrieved. How large to make each unit is the question of chunk size.

Both Too Large and Too Small Cause Problems

If chunks are too small, a single chunk lacks sufficient context, meaning that even when retrieved, it may not contain the information the LLM needs to construct an answer. Conversely, if chunks are too large, irrelevant information enters as noise, degrading answer accuracy while also increasing token costs.

A common starting point is 256–1,024 tokens, but the optimal value depends on the domain and the nature of the queries. Short Q&A content such as FAQs suits smaller chunks; documents where surrounding context matters, such as technical specifications, usually call for larger ones.
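The cost side of this trade-off can be put in rough numbers. The sketch below is illustrative only: the function name chunk_size_tradeoff, the 100,000-token document, and the top_k value of 4 are assumptions made for the example, and it ignores overlap. It shows how chunk size changes both the number of chunks to index and the upper bound on retrieved tokens sent to the LLM per query.

```python
import math


def chunk_size_tradeoff(doc_tokens: int, chunk_size: int, top_k: int) -> tuple[int, int]:
    """Return (number of chunks, max retrieved tokens per query).

    Hypothetical helper for illustration; assumes no overlap between chunks.
    """
    if chunk_size <= 0 or top_k <= 0:
        raise ValueError("chunk_size and top_k must be positive")
    # Every chunk holds chunk_size tokens except possibly the last one.
    num_chunks = math.ceil(doc_tokens / chunk_size)
    # Upper bound: the top_k retrieved chunks are all full-sized.
    max_context_tokens = top_k * chunk_size
    return num_chunks, max_context_tokens


if __name__ == "__main__":
    # Assumed values: a 100,000-token manual and 4 chunks retrieved per query.
    for size in (256, 512, 1024):
        chunks, context = chunk_size_tradeoff(100_000, size, top_k=4)
        print(f"chunk_size={size}: {chunks} chunks, up to {context} tokens per query")
```

This arithmetic covers only index size and token cost. Which size actually gives the best answers still has to be measured against representative queries for the domain.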

The Technique of Overlap

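To mitigate the problem of context being cut off at chunk boundaries, "overlap" (partially duplicating content between adjacent chunks) is commonly used. For example, with a chunk size of 512 tokens and an overlap of 64 tokens, the last 64 tokens of the previous chunk are also included at the beginning of the next chunk, so each new chunk advances by only 448 tokens. A passage shorter than the overlap that straddles a boundary therefore still appears intact in one chunk, which tends to help both BM25 and vector search. The cost is a larger store and index: in this example, roughly 14% more tokens are stored (512 / 448 ≈ 1.14).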
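A minimal sketch of sliding-window chunking with overlap follows. It is an illustration under stated assumptions rather than a reference implementation: the function name chunk_tokens is hypothetical, and the input is assumed to be a list of tokens already produced by whatever tokenizer the pipeline uses (plain integers stand in for tokens here).

```python
def chunk_tokens(tokens: list, chunk_size: int = 512, overlap: int = 64) -> list[list]:
    """Split a token list into chunks that share `overlap` tokens with their neighbor.

    Hypothetical helper for illustration; tokenization happens elsewhere.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    stride = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end of the document, so no
        # trailing chunk consisting only of overlap is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    tokens = list(range(1000))  # stand-in for 1,000 real tokens
    chunks = chunk_tokens(tokens, chunk_size=512, overlap=64)
    print([len(c) for c in chunks])
    # The head of each chunk repeats the tail of the previous one.
    print(chunks[1][:64] == chunks[0][-64:])
```

The final chunk is usually shorter than chunk_size. Whether to keep it, pad it, or merge it into the previous chunk is a design choice this sketch leaves open.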