A prompting technique that improves accuracy on complex tasks by having the LLM explicitly generate intermediate reasoning steps.
Chain of Thought (CoT) is a prompting technique that improves accuracy on complex tasks by explicitly having an LLM generate intermediate reasoning steps.
For a problem such as "There are 3 apples and 5 oranges. What is the total?", instead of having the LLM answer "8" directly, it is guided to output the intermediate process: "3 apples + 5 oranges = 8." The difference is hard to notice with simple addition, but accuracy improves significantly for problems involving multi-step reasoning or conditional branching—such as determining whether legal requirements are satisfied.
Simply adding "Please think step by step" to a prompt can be effective. This is called Zero-shot CoT.
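As a sketch, the entire technique can be reduced to one appended sentence (the function name is illustrative, not from any particular library):

```python
# Minimal sketch of Zero-shot CoT: the only change from a plain prompt
# is one appended trigger sentence.
def build_zero_shot_cot_prompt(question: str) -> str:
    """Append the standard Zero-shot CoT trigger phrase to a question."""
    return f"{question}\nPlease think step by step."

print(build_zero_shot_cot_prompt(
    "There are 3 apples and 5 oranges. What is the total?"
))
```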
Reasoning models are designed with CoT built into the model itself, automatically generating a chain of thought without any prompting. Conversely, CoT can also be elicited from standard LLMs through prompt engineering, so a practical approach is to try it on the prompt side first and switch to a reasoning model only if accuracy is still insufficient.
One important caveat: CoT increases the number of output tokens, which raises costs. Rather than applying it to every request, a sensible operational approach is to reserve it for queries where accuracy is critical.
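One way to operationalize that selectivity is a simple router. The criterion below is purely illustrative; a real system might classify queries by task type or stakes:

```python
# Cost-aware CoT routing (illustrative sketch): append the CoT trigger
# only when the extra output tokens are worth paying for.
COT_SUFFIX = "\nPlease think step by step."

def route_prompt(question: str, accuracy_critical: bool) -> str:
    """Add the CoT trigger only for accuracy-critical queries."""
    return question + COT_SUFFIX if accuracy_critical else question

print(route_prompt("Is the consent requirement satisfied?", accuracy_critical=True))
print(route_prompt("What is 3 + 5?", accuracy_critical=False))
```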


Multi-step reasoning is a reasoning approach in which an LLM arrives at a final answer not through a single response generation, but by going through multiple intermediate steps, such as generating sub-questions, verifying partial answers, and retrieving additional information.
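The control flow can be sketched as follows. `fake_llm` is a stub with canned answers standing in for a real LLM API; the three-step loop structure is the point:

```python
def fake_llm(prompt: str) -> str:
    """Stub standing in for a real LLM API call (canned answers for the demo)."""
    if prompt.startswith("Decompose:"):
        return "How many apples are there?|How many oranges are there?"
    if prompt.startswith("Combine:"):
        return "8"
    if "apples" in prompt:
        return "3"
    return "5"

def multi_step_answer(question: str) -> str:
    # Step 1: generate sub-questions from the original question.
    sub_questions = fake_llm(f"Decompose: {question}").split("|")
    # Step 2: answer each sub-question (a real system would also verify each).
    partial_answers = [fake_llm(q) for q in sub_questions]
    # Step 3: synthesize the final answer from the partial answers.
    return fake_llm(f"Combine: {question} -> {partial_answers}")

print(multi_step_answer("There are 3 apples and 5 oranges. What is the total?"))  # → 8
```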

Prompt engineering is the practice of designing the structure, phrasing, and context of input text (prompts) in order to elicit desired outputs from LLMs (Large Language Models).
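A small sketch of what "structure, phrasing, and context" means in practice: a template that separates role, context, and task (the section names and example values are illustrative):

```python
# Illustrative prompt template: the structure (role / context / task) is
# the engineered part; the content is filled in per request.
PROMPT_TEMPLATE = """You are a {role}.

Context:
{context}

Task:
{task}"""

def build_prompt(role: str, context: str, task: str) -> str:
    return PROMPT_TEMPLATE.format(role=role, context=context, task=task)

print(build_prompt(
    role="careful legal assistant",
    context="The statute requires written consent.",
    task="Determine whether the requirement is satisfied.",
))
```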

A technique that grounds LLM responses in external data sources and search results retrieved at answer time, producing factually supported output. A core method for reducing hallucinations.
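The retrieve-then-generate shape can be sketched with a toy in-memory keyword retriever. Everything here is illustrative; a real system would use a vector store or search API and pass the final prompt to an LLM:

```python
def _tokens(text: str) -> set[str]:
    """Lowercased words with trailing punctuation stripped."""
    return {w.strip(".,?:").lower() for w in text.split()}

DOCS = [
    "The store stocks 3 apples.",
    "The store stocks 5 oranges.",
    "The store is closed on Sundays.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy keyword retriever: rank documents by word overlap with the query."""
    q = _tokens(query)
    return sorted(DOCS, key=lambda d: -len(q & _tokens(d)))[:k]

def build_rag_prompt(query: str) -> str:
    # Retrieved passages are prepended so the model answers from them,
    # not from parametric memory; a real system would send this to an LLM.
    passages = retrieve(query)
    return "Answer using only these sources:\n" + "\n".join(passages) + f"\n\nQ: {query}"

print(build_rag_prompt("How many apples and oranges are in stock?"))
```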


Agentic RAG is an architecture in which an LLM acts as an agent: it autonomously and iteratively generates search queries, evaluates the retrieved results, and decides whether to re-retrieve, achieving answer accuracy that simple single-turn RAG cannot.