A safety mechanism that monitors LLM inputs and outputs to automatically detect and block harmful content, sensitive information leakage, and policy violations.
## What Are Guardrails?

Guardrails (AI Guardrails) is a collective term for safety mechanisms that monitor LLM inputs and outputs to automatically detect and block harmful content generation, sensitive information leakage, and policy violations. Just as roadside guardrails prevent vehicles from veering off course, they keep AI behavior within acceptable boundaries.

### Input Side and Output Side

Guardrails function across two primary layers.

**Input Guardrails**: Inspect user input before it reaches the model. This includes prompt injection detection, personally identifiable information (PII) masking, and topic restrictions (blocking off-topic queries).

**Output Guardrails**: Inspect model responses before they are returned to the user. This involves filtering harmful expressions, verifying factual accuracy (grounding), and checking for sensitive data leakage.

### Implementation Approaches

It is common practice to combine rule-based approaches (regular expressions, keyword lists) with ML-based approaches (classification models, evaluation by a separate LLM). Designing guardrails in alignment with the risk categories outlined in the OWASP Top 10 for LLM Applications improves overall coverage.

### Operational Pitfalls

Excessive guardrails degrade the user experience. When legitimate work-related queries are incorrectly blocked (so-called "false positives"), users stop using AI tools altogether. Threshold tuning and transparent feedback explaining why a query was blocked are key to effective operation.
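The rule-based side of this combination can be sketched in a few lines. The following is a minimal, illustrative example, not a production filter: the PII patterns, the keyword list, and the function names are all assumptions chosen for the sketch.

```python
import re

# Hypothetical rule-based guardrail. Patterns and the blocklist are
# illustrative stand-ins for a real policy configuration.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]?\d{3,4}[-.]?\d{4}\b"),
}
BLOCKED_KEYWORDS = {"internal-only", "api_key"}  # example policy list


def input_guardrail(text: str) -> str:
    """Mask PII in user input before it reaches the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def output_guardrail(text: str) -> tuple[bool, str]:
    """Block model responses that contain policy-violating keywords."""
    lowered = text.lower()
    for keyword in BLOCKED_KEYWORDS:
        if keyword in lowered:
            # Transparent feedback: explain why the response was blocked.
            return False, f"Blocked: response contained '{keyword}'."
    return True, text
```

In practice, a rule layer like this usually runs first because it is cheap and deterministic, with an ML classifier or a judge LLM handling the cases regexes cannot express.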


AI governance refers to the organizational policies, processes, and oversight mechanisms that ensure ethics, transparency, and accountability in AI system development and operation.

Agentic RAG is an architecture in which an LLM acts as an agent that autonomously and iteratively generates search queries, evaluates the results, and decides whether to retrieve again, achieving answer accuracy that simple single-turn RAG cannot reach.

RAG (Retrieval-Augmented Generation) is a technique that improves the accuracy and currency of responses by retrieving relevant information from external knowledge sources and appending the results to the input of an LLM.
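The retrieve-then-append flow can be shown in a minimal single-turn sketch. The word-overlap scorer and prompt template below are illustrative assumptions, not a specific library's API; a real pipeline would use embeddings and a vector index.

```python
def retrieve_top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy scorer
    standing in for embedding similarity search)."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Append the retrieved context to the LLM input."""
    context = "\n".join(f"- {d}" for d in retrieve_top_k(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The resulting prompt grounds the model's answer in retrieved text, which is what improves accuracy and currency relative to relying on the model's parametric knowledge alone.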
