A context window is the upper limit on the number of tokens an LLM can handle in a single inference pass, covering the combined length of the input prompt and the output. Any text beyond this limit is "invisible" to the model, making the window a critical parameter that directly affects the accuracy of long-document processing and the quality of multi-turn conversations.
LLMs (Large Language Models) process text by splitting it into units called tokens. The context window serves as the "container" for these tokens—a smaller container prevents the model from ingesting long documents at once, while a larger one allows inference with access to a broader range of information.
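The "container" idea can be sketched as a simple fit check. Real token counts come from the model's own tokenizer; the characters-per-token heuristic below is only a rough approximation for illustration:

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English).
    A real system would use the model's own tokenizer instead."""
    return max(1, len(text) // 4)

def fits_in_window(prompt: str, window_size: int, reserved_output: int) -> bool:
    """True if the prompt leaves at least `reserved_output` tokens for the reply."""
    return estimate_tokens(prompt) + reserved_output <= window_size

# A short prompt fits easily in an 8k window with 1k reserved for output.
print(fits_in_window("Summarize this report. " * 100, window_size=8_000, reserved_output=1_000))
```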
Recent models have begun to feature context windows on the scale of hundreds of thousands to one million tokens, with major model families such as GPT, Claude, and Gemini each setting their own limits.
The size of the context window is closely tied to the model's architecture, particularly the design of the Attention mechanism. Because Transformers compute Self-Attention over the entire input, computational load and GPU (Graphics Processing Unit) memory consumption grow roughly quadratically with the token count. This is the fundamental reason why the window cannot be expanded without limit.
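The quadratic blow-up can be made concrete with a back-of-the-envelope calculation. The head count and fp16 element size below are illustrative assumptions, and the figure covers only the raw attention score matrices (ignoring the KV cache and activations):

```python
def attention_score_bytes(n_tokens: int, n_heads: int, bytes_per_elem: int = 2) -> int:
    """Memory for the (n_tokens x n_tokens) attention score matrix per head,
    summed over heads. fp16 assumed (2 bytes per element)."""
    return n_tokens * n_tokens * n_heads * bytes_per_elem

# Doubling the context length quadruples this term.
for n in (4_000, 32_000, 128_000):
    gib = attention_score_bytes(n, n_heads=32) / 2**30
    print(f"{n:>7} tokens -> {gib:,.1f} GiB of score matrices")
```

In practice, techniques such as FlashAttention avoid materializing the full matrix, but the underlying compute still scales quadratically, which is why long windows remain expensive.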
Furthermore, research has pointed out a tendency to overlook information located near the middle of the context window, even when that window is large. Information at the beginning and end of a long context tends to be retained, while attention to the middle portion fades—a phenomenon known as the "Lost in the Middle" problem. Judging processing quality solely by the context window size is therefore risky, and it must be evaluated alongside the risk of hallucination.
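The "Lost in the Middle" effect is typically measured with a needle-in-a-haystack probe: a known fact is planted at different depths of a long context and recall is compared by position. A minimal harness sketch, with the actual model call left as a hypothetical stub:

```python
def build_probe(filler: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative position (0.0 = start, 1.0 = end)."""
    docs = list(filler)
    docs.insert(int(depth * len(docs)), needle)
    return "\n".join(docs)

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in: replace with a real LLM API call."""
    raise NotImplementedError

filler = [f"Background paragraph {i}." for i in range(200)]
needle = "The launch code is 7421."
prompts = {d: build_probe(filler, needle, d) for d in (0.0, 0.5, 1.0)}
# A model exhibiting "Lost in the Middle" tends to recall the needle at
# depth 0.0 and 1.0 but miss it at 0.5.
```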
A representative approach for compensating for context window constraints is RAG (Retrieval-Augmented Generation). Rather than cramming entire documents into the window, relevant information is retrieved and dynamically injected, effectively expanding the practical reference range. Chunk size design is particularly important in this context—chunks that are too large strain the window, while chunks that are too small break up the surrounding context.
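The chunk-size trade-off can be sketched as fixed-size chunking with overlap, so that context straddling a boundary is not lost. Sizes are in characters here for simplicity; production systems usually count tokens instead:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split `text` into overlapping chunks for indexing in a RAG pipeline."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks

doc = "x" * 2_000
parts = chunk_text(doc)
print(len(parts), [len(p) for p in parts])
```

Tuning `chunk_size` and `overlap` is exactly the design tension the text describes: larger chunks consume window budget faster, smaller ones fragment the surrounding context.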
In AI agents and multi-agent systems, it has also become common to design systems where multiple agents collaborate and divide tasks, enabling processing that exceeds the context limit of any single model. The concept of context engineering is also attracting attention, and is being systematized as the practice of strategically designing what information to place within a limited window and in what order.
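Context engineering in its simplest form is a budgeting problem: fill a fixed token budget with the highest-priority pieces first. The priority scheme and token costs below are illustrative assumptions:

```python
def assemble_context(pieces: list[tuple[int, str, int]], budget: int) -> list[str]:
    """pieces: (priority, text, token_cost); lower priority number = keep first.
    Greedily packs pieces into the token budget in priority order."""
    chosen, used = [], 0
    for _, text, cost in sorted(pieces, key=lambda p: p[0]):
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen

pieces = [
    (0, "SYSTEM: You are a support agent.", 10),
    (1, "FACT: refund window is 30 days.", 12),
    (2, "HISTORY: earlier small talk...", 500),
]
# With a 100-token budget, low-value history is dropped, not the system
# prompt or retrieved facts.
print(assemble_context(pieces, budget=100))
```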
One aspect that tends to be overlooked when working with context windows is that input and output are counted together. For example, if a 120,000-token prompt is given to a model with a 128,000-token window, the available space for output is limited to the remaining 8,000 tokens. For use cases requiring lengthy outputs—such as multi-step reasoning with a reasoning model, or code generation tasks like those performed by Claude Code—the design of output token headroom can make or break quality.
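The input/output budgeting described above reduces to simple arithmetic that a request builder should enforce explicitly:

```python
def max_output_tokens(window: int, prompt_tokens: int, safety_margin: int = 0) -> int:
    """Output headroom left after the prompt (and an optional safety margin)
    is subtracted from the context window."""
    headroom = window - prompt_tokens - safety_margin
    if headroom <= 0:
        raise ValueError("prompt leaves no room for output; shorten or summarize it")
    return headroom

# The example from the text: a 120,000-token prompt in a 128,000-token window.
print(max_output_tokens(window=128_000, prompt_tokens=120_000))  # 8000
```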
The context window is not a simple metric where bigger is always better. Leveraging it appropriately while accounting for the trade-offs among cost, latency, and accuracy is a core architectural decision in operating LLMs in production environments.



A2A (Agent-to-Agent Protocol), published by Google in April 2025, is a communication protocol that enables different AI agents to perform capability discovery, task delegation, and state synchronization.
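Capability discovery in A2A is based on agents publishing a machine-readable description of what they can do (an "Agent Card"). The field names and endpoint below are an illustrative sketch, not the normative A2A schema:

```python
import json

# Illustrative only: an A2A-style agent card another agent could fetch
# to discover this agent's capabilities. All names are hypothetical.
agent_card = {
    "name": "invoice-agent",
    "description": "Extracts line items from invoices",
    "url": "https://agents.example.com/invoice",  # hypothetical endpoint
    "skills": [
        {"id": "extract-line-items", "description": "Parse an invoice and return line items"},
    ],
}
print(json.dumps(agent_card, indent=2))
```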

Acceptance testing is a testing method that verifies whether developed features meet business requirements and user stories, from the perspective of the product owner and stakeholders.
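An acceptance test is phrased against a user story rather than internal implementation details. A minimal sketch in pytest style, where `OrderService` is a hypothetical stand-in for the system under test:

```python
class OrderService:
    """Hypothetical stand-in for the system under test."""
    def __init__(self):
        self._orders = {"alice": ["order-1", "order-2"]}

    def history(self, user: str) -> list[str]:
        return self._orders.get(user, [])

def test_signed_in_user_sees_order_history():
    # Given a user with past orders
    service = OrderService()
    # When they request their history
    history = service.history("alice")
    # Then all their orders are listed (the business requirement, not internals)
    assert history == ["order-1", "order-2"]
```

The Given/When/Then comments mirror how product owners and stakeholders phrase the requirement, which is the point of the acceptance level.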

AES-256 is the variant of AES (Advanced Encryption Standard), the symmetric-key cipher standardized by the National Institute of Standards and Technology (NIST), that uses a 256-bit key, the longest and strongest key length the standard defines.
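A minimal encrypt/decrypt round trip using AES-256 in GCM mode, sketched with the third-party `cryptography` package (assumed installed; the Python standard library has no AES implementation):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# 256-bit (32-byte) key: this is what the "256" in AES-256 refers to.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # GCM nonce must be unique per (key, message)
plaintext = b"confidential payload"
ciphertext = aesgcm.encrypt(nonce, plaintext, None)

# Symmetric: the same key decrypts (and authenticates) the message.
assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext
```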

Agent orchestration is a mechanism that controls task distribution, state management, and coordination flows among multiple AI agents.
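Reduced to its simplest form, an orchestrator assigns tasks to agents and tracks who owns what. A round-robin sketch with state tracking collapsed to an assignment map; all names are illustrative, not a specific framework's API:

```python
def orchestrate(tasks: list[str], agents: list[str]) -> dict[str, list[str]]:
    """Round-robin task distribution across agents.
    Real orchestrators add capability matching, retries, and shared state."""
    assignments: dict[str, list[str]] = {agent: [] for agent in agents}
    for i, task in enumerate(tasks):
        assignments[agents[i % len(agents)]].append(task)
    return assignments

print(orchestrate(["t1", "t2", "t3"], ["planner", "coder"]))
```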

Agent Skills are reusable instruction sets defined to enable AI agents to perform specific tasks or areas of expertise, functioning as modular units that extend the capabilities of an agent.
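One way to picture a skill as a modular unit is a named instruction set bundled with the tools it is allowed to use. The dataclass shape below is illustrative, not a specific framework's schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentSkill:
    """Illustrative: a reusable, self-contained unit an agent can load."""
    name: str
    instructions: str
    allowed_tools: list[str] = field(default_factory=list)

summarize = AgentSkill(
    name="summarize-report",
    instructions="Read the attached report and produce a five-bullet summary.",
    allowed_tools=["file_reader"],
)
print(summarize.name)
```

Because each skill is self-contained, it can be added to or removed from an agent without touching the agent's core prompt, which is what makes skills composable.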