RAG (Retrieval-Augmented Generation) is a technique that improves the accuracy and currency of responses by retrieving relevant information from external knowledge sources and appending the results to the input of an LLM.
LLMs only possess knowledge up to their training cutoff date, and even within that knowledge they can be confidently wrong (hallucination). RAG has established itself as a practical answer to both weaknesses. The mechanism is intuitive: when a user asks a question, relevant documents are first retrieved from internal documents or a knowledge base, and the retrieved results are passed to the LLM along with the question. The LLM then generates a response grounded in the provided documents rather than in its own knowledge alone. Because sources can be cited explicitly, verifying responses becomes straightforward.

Broken down into components, RAG consists of document preprocessing (chunking), vector embedding, similarity search (semantic search), and prompt construction for the LLM. Each step involves choices, and something as simple as how chunks are split can significantly affect response quality.

The distinction between RAG and fine-tuning is frequently debated, but the two serve different roles: RAG has the model reference external knowledge, while fine-tuning adjusts the model's behavior and tone. If the goal is to have the model accurately answer questions based on internal manuals, RAG is the reasonable starting point; if the goal is to standardize the format and style of responses, fine-tuning is the better fit. Many projects employ both in combination.
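The four components above can be sketched end to end. This is a minimal illustration, not a production pipeline: a toy bag-of-words counter stands in for a real embedding model, and the helper names (`chunk`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text, size=40):
    """Document preprocessing: split into fixed-size word chunks.
    Real systems often add overlap and respect sentence boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Similarity metric over the sparse word-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Semantic search: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, retrieved):
    """Prompt construction: ground the LLM in the retrieved context."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return ("Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")
```

Each stage is a swappable choice: chunk size and overlap, the embedding model, the index (typically a vector database), and the prompt template all affect answer quality.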


Agentic RAG is an architecture in which an LLM acts as an agent: it autonomously and iteratively generates search queries, evaluates the retrieved results, and decides whether to retrieve again, achieving answer accuracy that simple single-turn RAG cannot reach.
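The loop described above might be sketched as follows. The `search` and `llm_*` callbacks are hypothetical stand-ins for a real search backend and real LLM calls; the point is the control flow, not any particular API.

```python
def agentic_answer(question, search, llm_assess, llm_rewrite, llm_answer,
                   max_rounds=3):
    """Agent loop: retrieve, let the LLM judge whether the evidence is
    sufficient, and either rewrite the query and search again or answer."""
    query, evidence = question, []
    for _ in range(max_rounds):
        evidence.extend(search(query))           # retrieve with the current query
        if llm_assess(question, evidence):       # LLM judges: enough to answer?
            break
        query = llm_rewrite(question, evidence)  # LLM reformulates the query
    return llm_answer(question, evidence)        # final grounded generation
```

The `max_rounds` cap matters in practice: without it, an agent that keeps judging its evidence insufficient can loop indefinitely, multiplying cost and latency.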

A next-generation RAG architecture that combines knowledge graphs and vector search, leveraging relationships between entities to improve retrieval accuracy.
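One way to picture the combination: vector search supplies seed entities, and the graph supplies related entities that pure similarity search would miss. In this sketch a word-overlap matcher is a toy stand-in for real vector search, and the adjacency-list graph format is illustrative.

```python
import re

def seed_entities(query, entity_docs):
    """Stand-in for vector search: pick the entities whose description
    shares the most words with the query."""
    q = set(re.findall(r"\w+", query.lower()))
    scores = {e: len(q & set(re.findall(r"\w+", d.lower())))
              for e, d in entity_docs.items()}
    best = max(scores.values(), default=0)
    return [e for e, s in scores.items() if s == best and s > 0]

def expand(seeds, graph, hops=1):
    """Follow graph edges from the seed entities to pull in related
    entities as extra retrieval context."""
    seen, frontier = set(seeds), list(seeds)
    for _ in range(hops):
        nxt = []
        for entity in frontier:
            for _relation, neighbor in graph.get(entity, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    nxt.append(neighbor)
        frontier = nxt
    return seen
```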

A data model that represents entities and their relationships in a graph structure. It is used to improve the accuracy of RAG and AI search.
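Knowledge graphs are commonly represented as subject-predicate-object triples. A minimal sketch of that data model follows; the `TripleStore` class and its wildcard query pattern are illustrative, not any particular library's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

class TripleStore:
    """Minimal triple store with pattern-matching queries (None = wildcard)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add(Triple(subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        return [t for t in self.triples
                if (subject is None or t.subject == subject)
                and (predicate is None or t.predicate == predicate)
                and (obj is None or t.obj == obj)]
```

A RAG system can translate a question into such patterns (for example, "everything known about entity X") and feed the matching triples to the LLM alongside vector-search hits.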

A technique that cross-references LLM outputs with external data sources and search results to generate factually grounded responses. A core method for reducing hallucinations.
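A crude form of this cross-referencing can be sketched with word overlap: flag any answer sentence that no source supports above a threshold. Real systems use entailment models or citation checks instead; `verify_grounding` and the 0.5 threshold here are illustrative assumptions.

```python
import re

def support_score(claim, source):
    """Fraction of the claim's words that also appear in the source."""
    c = set(re.findall(r"\w+", claim.lower()))
    s = set(re.findall(r"\w+", source.lower()))
    return len(c & s) / len(c) if c else 0.0

def verify_grounding(answer, sources, threshold=0.5):
    """Return the answer sentences that no source supports above threshold."""
    sentences = [x.strip() for x in re.split(r"(?<=[.!?])\s+", answer)
                 if x.strip()]
    return [sent for sent in sentences
            if max((support_score(sent, src) for src in sources),
                   default=0.0) < threshold]
```

Flagged sentences can then be removed, rewritten, or sent back through retrieval before the response reaches the user.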