RAG (Retrieval-Augmented Generation) is a technique that improves the accuracy and currency of responses by retrieving relevant information from external knowledge sources and appending the results to the input of an LLM.
LLMs only possess knowledge up to their training cutoff date. Moreover, even with the knowledge they do have, they can be confidently wrong (hallucination). RAG has established itself as a practical solution to these two weaknesses.

The mechanism is intuitive. Upon receiving a user's question, relevant documents are first retrieved from internal documents or a knowledge base. The retrieved results are then passed to the LLM along with the question. The LLM generates a response grounded in the provided documents rather than relying on its own knowledge alone. Since sources can be explicitly cited, verifying responses becomes straightforward.

Breaking RAG down into its components, it consists of document preprocessing (chunking), vector embedding, similarity search (semantic search), and prompt construction for the LLM. Each step involves choices, and something as simple as how chunks are split can significantly impact response quality.

The distinction between RAG and fine-tuning is frequently debated, but they serve different roles. RAG is a method for "having the model reference external knowledge," while fine-tuning is a method for "adjusting the model's behavior and tone." If the goal is to have the model accurately answer questions based on internal manuals, RAG is the reasonable starting point; if the goal is to standardize the format and style of responses, fine-tuning is the better fit. Many projects employ both in combination.
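The pipeline described above (chunking, embedding, similarity search, prompt construction) can be sketched end to end in plain Python. This is a minimal illustration, not any library's actual API: the bag-of-words "embedding" stands in for a real embedding model, and all function names and parameters are hypothetical.

```python
import math
from collections import Counter

def chunk(text, size=40, overlap=10):
    # Split a document into overlapping word-window chunks
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    # Stand-in for a real embedding model: sparse bag-of-words vector
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank chunks by similarity to the query and keep the top k
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def build_prompt(query, passages):
    # Ground the LLM in the retrieved passages
    context = "\n---\n".join(passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

In a real system, `embed` would call an embedding model, the search would run against a vector database, and `build_prompt`'s output would be sent to the LLM; the flow, however, is exactly this.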


LoRA (Low-Rank Adaptation) is a technique that inserts low-rank delta matrices into the weight matrices of large language models and trains only those deltas, enabling fine-tuning by adding approximately 0.1–1% of the total model parameters.
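The idea can be shown in a few lines of numpy: the pretrained weight W stays frozen, and only the two small factors of the low-rank delta are trained. This is a minimal sketch with illustrative sizes, not a training framework's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 8  # hypothetical hidden size and LoRA rank

W = rng.standard_normal((d, d))          # frozen pretrained weight, never updated
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized

x = rng.standard_normal(d)
# Forward pass: base output plus the low-rank delta, i.e. (W + B @ A) @ x
y = W @ x + B @ (A @ x)

# Only A and B are trained: 2*r*d extra parameters per d-by-d matrix
trainable_fraction = (A.size + B.size) / W.size
```

Because B starts at zero, the delta is zero at initialization and training begins exactly from the pretrained model. Per matrix the trainable fraction here is about 2%; applied only to selected matrices (typically attention projections), the model-wide fraction lands in the 0.1–1% range cited above.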

LLM (Large Language Model) is a general term for neural network models pre-trained on massive amounts of text data, containing billions to trillions of parameters, capable of understanding and generating natural language with high accuracy.


A local LLM refers to a deployment in which a large language model runs directly on one's own server or PC, without going through a cloud API.

Human-in-the-Loop (HITL) is a design approach that incorporates human participation, such as review and approval, at key points in AI-driven business process automation, so that automated processes remain reliable and accountable.
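The core of a HITL design is a gate that routes each AI output either straight through or into a human review queue. A minimal sketch, with all names and the confidence threshold purely illustrative:

```python
# Hypothetical HITL gate: high-confidence AI outputs are approved
# automatically; the rest wait for a human decision.
review_queue = []

def route(item, confidence, threshold=0.8):
    if confidence >= threshold:
        return "auto_approved"
    review_queue.append(item)   # a human later approves or rejects this item
    return "pending_human_review"
```

The threshold controls the trade-off between automation rate and human workload; real systems also log every automatic decision so humans can audit them after the fact.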

QLoRA (Quantized LoRA) is a method that combines LoRA with 4-bit quantization, enabling fine-tuning of large language models even on consumer-grade GPUs.
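The combination can be sketched in numpy: the frozen base weight is stored as 4-bit codes plus per-block scales (absmax quantization), dequantized on the fly for the forward pass, while the LoRA factors stay in full precision and are the only trained parameters. This is an illustrative sketch, not the actual bitsandbytes implementation (which uses the NF4 data type and packed storage).

```python
import numpy as np

def quantize_4bit(w, block=64):
    """Absmax 4-bit quantization: int codes in [-7, 7] plus one scale per block."""
    flat = w.ravel()
    pad = (-flat.size) % block
    if pad:
        flat = np.concatenate([flat, np.zeros(pad)])
    blocks = flat.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0
    codes = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return codes, scales, w.shape, pad

def dequantize_4bit(codes, scales, shape, pad):
    flat = (codes * scales).ravel()
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)

rng = np.random.default_rng(0)
d, r = 64, 4                            # illustrative sizes
W = rng.standard_normal((d, d))         # frozen base weight, stored in 4-bit
codes, scales, shape, pad = quantize_4bit(W)
A = rng.standard_normal((r, d)) * 0.01  # trainable LoRA factor (full precision)
B = np.zeros((d, r))                    # trainable LoRA factor, zero-initialized

x = rng.standard_normal(d)
# QLoRA-style forward: dequantized base output plus full-precision LoRA delta
y = dequantize_4bit(codes, scales, shape, pad) @ x + B @ (A @ x)
```

Storing the base weights at 4 bits instead of 16 cuts their memory footprint roughly fourfold, which is what makes fine-tuning feasible on a single consumer GPU.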