Agentic RAG

Agentic RAG is an architecture in which an LLM acts as an agent: it generates search queries, evaluates the results, and decides whether to retrieve again, repeating this loop autonomously. The goal is to answer questions accurately in cases where simple single-pass RAG falls short.

Differences from Conventional RAG

A standard RAG pipeline operates in a linear flow: "user question → vector search → pass retrieved documents to LLM → generate answer." This is sufficient when the intent of the question is clear and one search retrieves everything needed. In practice, however, a single search often fails to return all the required information.
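The linear flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the corpus `DOCS`, the bag-of-words `embed` function, and the `generate_answer` stub are hypothetical stand-ins for a real embedding model, vector index, and LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory corpus (assumption; a real system would use a vector index).
DOCS = [
    "Agentic RAG lets the model rewrite queries and search again.",
    "Standard RAG retrieves documents once and then generates an answer.",
    "Streaming returns partial output while the model is still working.",
]

def embed(text):
    """Toy bag-of-words 'embedding', standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def vector_search(query, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def generate_answer(question, documents):
    """Stub for the LLM call: returns the prompt that would be sent."""
    context = "\n".join(f"- {d}" for d in documents)
    return f"Question: {question}\nContext:\n{context}"

def single_pass_rag(question):
    """One search, one generation, no feedback loop."""
    return generate_answer(question, vector_search(question))
```

Note that nothing in `single_pass_rag` inspects the retrieved documents: if the search misses, the generator still receives whatever came back.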

In Agentic RAG, the LLM itself determines whether "the search results are insufficient" or "the query should be changed," rewriting the query or querying a different data source as needed. By incorporating multi-step reasoning, it can progressively collect and integrate multiple pieces of information to construct a final answer.
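A minimal sketch of such a loop follows. The `decide` function is a rule-based placeholder for the LLM's judgment, and the `SOURCES` data, the queries, and all names are hypothetical; a real agent would prompt a model with the question and the search history instead.

```python
from dataclasses import dataclass

# Hypothetical data sources with exact-match lookup (assumption for illustration).
SOURCES = {
    "wiki": {"vacation policy": "Employees get 20 vacation days per year."},
    "hr_db": {"vacation carryover": "Up to 5 unused days carry over to the next year."},
}

def search(source, query):
    """Return a matching document, or None when the search comes back empty."""
    return SOURCES[source].get(query)

@dataclass
class Action:
    kind: str          # "search" or "answer"
    source: str = ""
    query: str = ""

def decide(question, history):
    """Rule-based stand-in for the LLM deciding what to do next."""
    hits = [result for (_, _, result) in history if result is not None]
    if not history:
        return Action("search", "wiki", "holiday rules")        # first attempt
    if not hits:
        return Action("search", "wiki", "vacation policy")      # results insufficient: rewrite query
    if len(hits) == 1:
        return Action("search", "hr_db", "vacation carryover")  # query a different data source
    return Action("answer")                                     # enough evidence collected

def agentic_rag(question, max_steps=5):
    """Search, evaluate, and re-search until the policy decides to answer."""
    history = []  # (source, query, result) for every search performed
    for _ in range(max_steps):
        action = decide(question, history)
        if action.kind == "answer":
            break
        history.append((action.source, action.query, search(action.source, action.query)))
    evidence = [result for (_, _, result) in history if result is not None]
    return " ".join(evidence), history
```

Calling `agentic_rag("How many vacation days can I carry over?")` performs three searches in this toy setup: one miss, one rewritten query, and one query against the second source.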

When Is It Effective?

Consider the example of querying an internal knowledge base. A question such as "Which proposal templates were used in the top 3 deals by sales last month?" requires multiple steps: searching sales data → identifying the deals → searching the proposal documents for each deal. By having the agent handle this decomposition and sequential search, the user can obtain an answer with a single question.
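The decomposition in this example might look roughly like the sketch below. The deal IDs, amounts, and template names are invented for illustration, and the two lookup functions are hypothetical stand-ins for searches against a sales system and a document store. In an actual agent the LLM would plan these steps itself rather than follow a hard-coded sequence.

```python
# Hypothetical sales records for last month (made-up values).
SALES_LAST_MONTH = [
    {"deal": "D-101", "amount": 120_000},
    {"deal": "D-102", "amount": 45_000},
    {"deal": "D-103", "amount": 310_000},
    {"deal": "D-104", "amount": 98_000},
    {"deal": "D-105", "amount": 205_000},
]

# Hypothetical mapping from deal to the proposal template it used.
PROPOSALS = {
    "D-101": "template_standard",
    "D-102": "template_light",
    "D-103": "template_enterprise",
    "D-104": "template_standard",
    "D-105": "template_enterprise",
}

def search_sales():
    """Step 1: search the sales data."""
    return list(SALES_LAST_MONTH)

def top_deals(records, n=3):
    """Step 2: identify the n deals with the highest sales."""
    ranked = sorted(records, key=lambda r: r["amount"], reverse=True)
    return [r["deal"] for r in ranked[:n]]

def search_proposal_template(deal_id):
    """Step 3: search the proposal documents for one deal."""
    return PROPOSALS.get(deal_id)

def answer_top_deal_templates(n=3):
    """Chain the three steps so the user only has to ask once."""
    deals = top_deals(search_sales(), n)
    return {deal: search_proposal_template(deal) for deal in deals}
```

Each step depends on the output of the previous one, which is why a single vector search over the original question cannot answer it.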

However, as the number of agent loop iterations increases, so do latency and token costs. Setting a loop limit and designing the system to return intermediate progress via streaming are essential for production use.
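One way to combine a loop limit with streamed progress is a generator that yields an event per iteration, sketched below. The event shape, the `search` callback signature, and the default limit of 3 are assumptions chosen for illustration, not a prescribed interface.

```python
from typing import Callable, Iterator, Optional

def run_agent(
    question: str,
    search: Callable[[str, int], Optional[str]],
    max_iterations: int = 3,
) -> Iterator[dict]:
    """Yield progress events while searching; stop at the iteration limit."""
    for step in range(1, max_iterations + 1):
        # Emit progress before each search so the caller can show it right away.
        yield {"type": "progress", "step": step, "message": f"searching (attempt {step})"}
        found = search(question, step)
        if found is not None:
            yield {"type": "final", "answer": found}
            return
    # Limit reached: stop instead of looping indefinitely.
    yield {"type": "final", "answer": None, "reason": "iteration limit reached"}

if __name__ == "__main__":
    # Hypothetical search that only succeeds on the second attempt.
    def flaky_search(question: str, step: int) -> Optional[str]:
        return "relevant document" if step == 2 else None

    for event in run_agent("example question", flaky_search):
        print(event)
```

Because the caller consumes events as they are produced, it can forward them to the user immediately, and the cap on iterations bounds both latency and token cost in the worst case.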