
"Least Privilege" for AI agents is a design principle in which an agent is granted only the minimum tool execution permissions and API scopes necessary to accomplish its objectives, with everything else completely blocked.
This article is intended for engineers deploying LLM-based agents in production environments and architects looking to drive autonomous business automation while meeting security requirements. It walks through implementation steps in order: designing tool whitelists, minimizing API scopes, incorporating HITL (Human-in-the-Loop) approval flows, and implementing audit logs and anomaly detection.
By the end of this article, you will be equipped to systematically mitigate Excessive Agency risks and to integrate permission design aligned with NIST SP 800-53 AC-6 into your organization's development workflow.
Permission design should begin only after clarifying "what the agent is allowed to do." Proceeding with an ambiguous objective tends to result in excessive permissions or design gaps. Before implementing least privilege for tool execution and API calls, three things must be organized: the agent's responsibilities, a risk classification of existing tools, and audit requirements. The approach of "just give it broad permissions for now and see what happens" is a common breeding ground for Excessive Agency. The following sections explain how to work through each of these areas.
Conclusion: The starting point for least privilege design is narrowing down the agent's responsibilities until you can describe what it does in a single sentence — before granting it any permissions.
If responsibilities remain vague and you decide to "hand over all the tools just in case they're useful," you create a breeding ground for Excessive Agency. When defining responsibilities, make the following elements explicit:
Responsibilities are then translated into a scope — the set of callable tools and APIs. The scope parameter in RFC 6749 (OAuth 2.0) is one implementation example, allowing fine-grained definitions such as read:faq. In accordance with the NIST SP 800-53 AC-6 principle of "grant only the minimum permissions necessary to perform a task," verify that each tool call is directly required for achieving the stated objective.
Enumerate all tools and APIs the agent may potentially access, and classify them by risk using two axes: scope of impact and reversibility.
Example Risk Level Classifications
NIST SP 800-53 Rev.5 AC-6 (Least Privilege) supports starting from the most restricted permissions and expanding them only when a business need has been demonstrated. Treat everything as high risk by default, and downgrade a classification only when sufficient justification is in place. The same tool can carry different risk levels depending on the endpoint (e.g., read is medium, delete is high), so classify at the method and endpoint level, and manage the results in YAML or a spreadsheet to serve as input for the subsequent whitelist and audit log design.
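As a minimal sketch of "high risk by default, downgrade only with justification," the classification can be kept as a lookup keyed by tool, method, and endpoint. The tool names, endpoints, and the `classify`/`downgrade` helpers are hypothetical; in practice the table would be loaded from the YAML file or spreadsheet mentioned above.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical classification keyed by (tool, HTTP method, endpoint).
# Anything not listed here is treated as HIGH.
RISK_TABLE = {
    ("faq", "GET", "/faq/search"): Risk.LOW,
    ("tickets", "GET", "/tickets/{id}"): Risk.MEDIUM,
    ("tickets", "DELETE", "/tickets/{id}"): Risk.HIGH,
}


def classify(tool: str, method: str, endpoint: str) -> Risk:
    """Return the risk level for a call; unknown calls default to HIGH."""
    return RISK_TABLE.get((tool, method.upper(), endpoint), Risk.HIGH)


def downgrade(tool: str, method: str, endpoint: str, level: Risk, justification: str) -> None:
    """Lower a classification only when a written justification is supplied."""
    if not justification.strip():
        raise ValueError("a downgrade requires a documented justification")
    RISK_TABLE[(tool, method.upper(), endpoint)] = level
```

Because the default is `HIGH`, a newly added endpoint that nobody has reviewed is treated as high risk until someone records a reason to downgrade it.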
Before finalizing permission design, decide "what to record, to what extent, and for how long." It is not uncommon for post-incident investigations to stall because no evidence trail was preserved.
The applicable controls are NIST SP 800-53 Rev.5 AU-2 (Event Logging) and AU-6 (Audit Record Review, Analysis, and Reporting). Agent tool execution logs should be designed within this framework.
Items to record include: tool call start, completion, and failure; API scope requests and grants; and HITL approval outcomes. Retention periods are determined by internal policy and industry regulations — in regulated sectors such as finance and healthcare, longer retention than the general guideline (90 days to 1 year) is often required.
To prevent tampering, write access to log storage is not granted to the agent itself. Adopt append-only storage or signed logs, and operate with a combination of automated alerts based on AU-6 and manual sample reviews by personnel.
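One way to implement "signed logs," sketched here as an assumption rather than a prescribed design, is an HMAC hash chain: each entry's MAC covers the previous entry's MAC, so editing or deleting any record breaks verification of everything after it. The `ChainedAuditLog` class is hypothetical, the signing key is assumed to be held by the logging service (never the agent), and the in-memory list stands in for real append-only storage.

```python
import hashlib
import hmac
import json


class ChainedAuditLog:
    """Append-only log where each entry carries an HMAC over (previous MAC + entry)."""

    def __init__(self, key: bytes):
        self._key = key  # held by the logging service, not by the agent
        self._entries: list = []  # list of (json line, mac hex) tuples

    def _mac(self, prev_mac: str, line: str) -> str:
        return hmac.new(self._key, (prev_mac + line).encode(), hashlib.sha256).hexdigest()

    def append(self, event: dict) -> None:
        line = json.dumps(event, sort_keys=True)
        prev_mac = self._entries[-1][1] if self._entries else ""
        self._entries.append((line, self._mac(prev_mac, line)))

    def verify(self) -> bool:
        """Recompute the chain; any modified entry makes verification fail."""
        prev_mac = ""
        for line, mac in self._entries:
            if not hmac.compare_digest(self._mac(prev_mac, line), mac):
                return False
            prev_mac = mac
        return True
```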
Conclusion: Tool whitelists should follow the design philosophy of "permit only the minimum necessary" rather than "list everything that could be used."
The more tools that are permitted, the broader the attack surface becomes — "allow all tools for now" simply does not hold up in production environments. This section explains how to define the minimum set of permitted tools, when to use dynamic resolution versus static whitelists, and how to design rate limits.
Conclusion: Permitted tools should be determined by working backward from responsibilities — keep only what is needed to fulfill those responsibilities, not what might possibly be used.
Accumulating tools on the basis of "might need them later" only expands the attack surface with tools that fall outside the defined scope of responsibility. The process for determining the minimum set is as follows:
For a customer support agent, if "ticket lookup," "FAQ search," and "reply send" are sufficient, then "ticket deletion" and "user information update" should be removed. Once the minimum set is established, require justification comments for any new tool additions to prevent privilege creep.
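The customer support example can be expressed as a static whitelist derived from responsibilities, with a justification required for every tool. This is a minimal sketch; the responsibility names, tool names, and the `minimum_tool_set`/`authorize` helpers are hypothetical.

```python
# Hypothetical mapping from declared responsibilities to the tools each one needs.
RESPONSIBILITY_TOOLS = {
    "answer customer questions": {"faq_search"},
    "check ticket status": {"ticket_lookup"},
    "reply to customers": {"reply_send"},
}

# Every permitted tool must carry a justification; reviewers reject entries without one.
TOOL_JUSTIFICATIONS = {
    "faq_search": "Needed to answer questions from the FAQ corpus",
    "ticket_lookup": "Needed to report ticket status",
    "reply_send": "Needed to send the drafted reply",
}

DECLARED_RESPONSIBILITIES = [
    "answer customer questions",
    "check ticket status",
    "reply to customers",
]


def minimum_tool_set(responsibilities) -> frozenset:
    """Union of the tools required by each responsibility; nothing else is added."""
    tools = set()
    for responsibility in responsibilities:
        tools |= RESPONSIBILITY_TOOLS[responsibility]  # undeclared ones raise KeyError
    unjustified = tools - set(TOOL_JUSTIFICATIONS)
    if unjustified:
        raise ValueError(f"tools without a justification: {sorted(unjustified)}")
    return frozenset(tools)


# Fixed at deploy time; changing it means editing this file and passing code review.
ALLOWED_TOOLS = minimum_tool_set(DECLARED_RESPONSIBILITIES)


def authorize(tool_name: str) -> None:
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not in whitelist: {tool_name}")
```

"ticket deletion" and "user information update" never appear, because no declared responsibility requires them.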
Without a static whitelist as the foundation, permission boundaries tend to become ambiguous.
A static whitelist is an approach in which callable tools are finalized at deploy time. Any changes must go through code review and an approval workflow, eliminating the risk of unknown tools being added at runtime, and aligning with NIST SP 800-53 AC-6.
Dynamic tool resolution is an approach that references a catalog at runtime. It is effective during phases where new tools are added frequently, but allowing it without restriction increases the risk of excessive agent permissions.
Practical guidelines for choosing between the two:
Even when adopting dynamic resolution, establish a rule that tools must never be sourced from outside the catalog, version-control the catalog, and record change diffs in audit logs.
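A minimal sketch of these rules, assuming a versioned catalog held in memory: resolution can only return catalog entries, every resolution is logged with the catalog version, and catalog changes produce a diff for the audit log. `ToolCatalog`, `resolve_tool`, and `catalog_diff` are hypothetical names.

```python
class ToolCatalog:
    """Versioned catalog; dynamic resolution may only return entries from here."""

    def __init__(self, tools: dict, version: str):
        self.tools = dict(tools)
        self.version = version


def resolve_tool(catalog: ToolCatalog, name: str, audit: list) -> dict:
    """Resolve a tool at runtime, refusing anything outside the catalog."""
    spec = catalog.tools.get(name)
    audit.append({
        "event": "tool_resolve",
        "tool": name,
        "catalog_version": catalog.version,
        "result": "ok" if spec is not None else "denied",
    })
    if spec is None:
        raise PermissionError(f"{name} is not in catalog {catalog.version}")
    return spec


def catalog_diff(old: ToolCatalog, new: ToolCatalog) -> dict:
    """Diff to record in the audit log whenever the catalog changes."""
    return {
        "from": old.version,
        "to": new.version,
        "added": sorted(new.tools.keys() - old.tools.keys()),
        "removed": sorted(old.tools.keys() - new.tools.keys()),
    }
```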
Narrowing the permitted tools with a whitelist offers little protection if call frequency is not also limited. If indirect injection or a runaway loop occurs, unrestricted calls can lead to system failures or cost explosions.
Rate limiting and call count restrictions should combine the following three dimensions:
As an implementation example, vary limits by risk classification — such as "20 calls per session, 5 calls per minute" for file writes, and "10 calls per session, 2 calls per minute" for external HTTP requests. When limits are exceeded, pair structured log recording with alert notifications rather than failing silently, and manage throttling centrally via middleware on the agent side.
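The limits from the example can be enforced by a small middleware-style limiter that tracks a per-session total and a sliding one-minute window, and reports violations instead of failing silently. This is a sketch under assumptions: the `ToolRateLimiter` class and the `on_violation` hook are hypothetical, and a production version would keep its counters in shared storage rather than in process memory.

```python
import time
from collections import defaultdict, deque

# Limits per tool class, taken from the example in the text.
LIMITS = {
    "file_write": {"per_session": 20, "per_minute": 5},
    "external_http": {"per_session": 10, "per_minute": 2},
}


class RateLimitExceeded(Exception):
    pass


class ToolRateLimiter:
    def __init__(self, limits, clock=time.monotonic, on_violation=print):
        self.limits = limits
        self.clock = clock
        self.on_violation = on_violation  # hook for structured logging and alerting
        self.session_counts = defaultdict(int)
        self.recent = defaultdict(deque)  # (session, tool class) -> call timestamps

    def check(self, session_id: str, tool_class: str) -> None:
        limit = self.limits[tool_class]  # unknown classes raise KeyError (fail closed)
        key = (session_id, tool_class)
        now = self.clock()
        window = self.recent[key]
        while window and now - window[0] >= 60:  # drop calls older than one minute
            window.popleft()
        if self.session_counts[key] >= limit["per_session"] or len(window) >= limit["per_minute"]:
            self.on_violation({"event": "rate_limit_exceeded",
                               "session_id": session_id, "tool_class": tool_class})
            raise RateLimitExceeded(f"{tool_class} limit hit for session {session_id}")
        self.session_counts[key] += 1
        window.append(now)
```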
Conclusion: Minimizing the scopes granted to API keys is the most direct means of limiting the blast radius in the event of an agent compromise.
Even if a whitelist restricts "what can be called," if the API key itself holds excessive permissions, the impact of a compromise will be far-reaching. Scope design under RFC 6749 (OAuth 2.0) is a canonical solution to this problem, splitting the credentials passed to an agent by intended use. This section explains the boundary design for read/write separation, tenant isolation, and secret injection.
Conclusion: Default API keys to "read-only," and grant write permissions only to tools that can demonstrate a necessity for them.
There are more use cases that can be completed with read access alone than you might expect.
Specify permissions at the operation level, such as contents:read or pull_requests:write, rather than using the broad repo scope. The scope parameter in RFC 6749 (OAuth 2.0) is the standard mechanism for this separation, and NIST SP 800-53 AC-6 likewise limits access to what is explicitly authorized and necessary for assigned tasks.
In practice, the three key points are: issue keys by purpose (inject only a read-only key into agents by default); issue write keys as short-lived tokens on a temporary basis following HITL approval; and immediately revoke any scopes that are no longer needed through periodic reviews.
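The first two points can be sketched as a token issuer that hands out read-only tokens by default and issues a single-scope, short-lived write token only when a HITL approval ID is presented. `TokenIssuer`, the scope strings, and the TTL values are hypothetical choices for illustration, not the API of any particular identity provider.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    value: str
    scopes: frozenset
    expires_at: float

    def allows(self, scope: str, now: float) -> bool:
        # Valid only before expiry and only for the scopes it was issued with.
        return now < self.expires_at and scope in self.scopes


class TokenIssuer:
    """Hypothetical issuer: read-only by default, write only after HITL approval."""

    READ_SCOPES = frozenset({"tickets:read", "faq:read"})
    READ_TTL_SECONDS = 3600
    WRITE_TTL_SECONDS = 300  # write tokens are deliberately short-lived

    def __init__(self, clock=time.time):
        self.clock = clock

    def default_token(self) -> Token:
        return Token(secrets.token_urlsafe(16), self.READ_SCOPES,
                     self.clock() + self.READ_TTL_SECONDS)

    def write_token(self, scope: str, approval_id: Optional[str]) -> Token:
        if not approval_id:
            raise PermissionError("write scope requires a HITL approval id")
        # One scope per token: the approval covers a single write operation.
        return Token(secrets.token_urlsafe(16), frozenset({scope}),
                     self.clock() + self.WRITE_TTL_SECONDS)
```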
Conclusion: In multi-tenant environments, failing to strictly isolate API keys, scopes, and data between tenants will result in "cross-tenant leakage."
A common cause of such leakage is reusing credentials without binding them to a tenant identifier.
Countermeasures include attaching tenant_id as metadata to each credential and validating it on every call; embedding the tenant in OAuth 2.0 scopes, such as tenant:{id}:read (RFC 6749, Section 3.3); and passing the tenant in a request header such as X-Tenant-ID, verifying it against the token's claims, and rejecting the request immediately on a mismatch. Isolation keyed on tenant_id is required independently at each layer: API gateway, storage, and logging. The basis for this is the "per-resource authentication and authorization" principle of NIST SP 800-207 (Zero Trust).
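The header and scope checks can be sketched as a gateway-side function. It assumes the token has already been validated and its claims decoded into a hypothetical `TokenClaims` object; the claim layout and the `tenant:{id}:{action}` scope format follow the example above but are otherwise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenClaims:
    tenant_id: str
    scopes: frozenset  # e.g. {"tenant:acme:read"}


class TenantMismatch(PermissionError):
    pass


def check_tenant(headers: dict, claims: TokenClaims, action: str) -> str:
    """Verify X-Tenant-ID against the token claims and a tenant-bound scope."""
    tenant = headers.get("X-Tenant-ID")
    if tenant is None or tenant != claims.tenant_id:
        # Reject immediately: the request targets a tenant the token is not bound to.
        raise TenantMismatch(f"header tenant {tenant!r} does not match token")
    required = f"tenant:{tenant}:{action}"
    if required not in claims.scopes:
        raise TenantMismatch(f"missing scope {required}")
    return tenant
```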
The fundamental design principle is to never pass secrets (API keys and credentials) directly into an agent's context window, but instead inject them with minimal scope immediately before execution.
Embedding API keys or OAuth tokens in a system prompt creates a risk of exposure via prompt leaking or indirect injection attacks. Secrets should be treated as "things the agent has no need to know."
There are three injection methods. Environment variable injection (recommended) retrieves secrets from AWS Secrets Manager or HashiCorp Vault at container startup and passes them as environment variables, so they are never included in the LLM's context. In the sidecar pattern, a tool execution wrapper holds the secrets, and the LLM receives only the tool name and its arguments. Hardcoding in the system prompt is not recommended.
The three key points for boundary design are: partition scopes per tool and avoid reusing a single key; shorten token lifetimes and issue short-lived tokens (effective when used in conjunction with RFC 6749); and incorporate secret masking on log output into the injection layer.
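The sidecar pattern and log masking can be combined in one small wrapper, sketched here under assumptions: the model emits only a tool name and arguments, each tool has its own environment variable (populated at container start, for example from a secrets manager), and the secret is masked before anything is logged or returned. `ToolSidecar` and the environment variable names are hypothetical.

```python
import os


class ToolSidecar:
    """Executes tools on behalf of the LLM; the model only sees tool name and args."""

    def __init__(self, secret_env: dict, log: list):
        # Maps tool name -> environment variable holding that tool's own credential,
        # so a single key is never shared across tools.
        self.secret_env = secret_env
        self.log = log

    @staticmethod
    def _mask(text: str, secret: str) -> str:
        return text.replace(secret, "****") if secret else text

    def call(self, tool: str, args: dict, handler) -> str:
        env_var = self.secret_env.get(tool)
        if env_var is None:
            raise PermissionError(f"no credential configured for {tool}")
        secret = os.environ[env_var]
        result = handler(args, secret)  # the only place the secret is used
        self.log.append(self._mask(f"{tool} args={args} result={result}", secret))
        return self._mask(result, secret)  # masked before returning to the model
```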
Even with restricted permissions, the risk of an agent making "incorrect judgments within the permitted scope" can never be reduced to zero. For actions that are difficult to reverse—such as file deletion, external data transmission, and payment processing—incorporating a HITL approval flow provides a final safety net. The following sections walk through specific implementation steps in order, from designing approval triggers to handling UI and timeouts, through to automatic escalation.
Having humans approve every action defeats the purpose of automation. Trigger design based on risk level determines the balance between speed and safety.
A policy of "require approval for all operations" leads to approval fatigue, where the responsible party stops reviewing the content and simply clicks approve.
For assessment, scoring across three axes is effective: irreversibility, blast radius, and whether privilege escalation is involved. Irreversibility covers operations that are difficult to undo, such as data deletion or sending emails; blast radius asks whether the impact is limited to a single record or could spread across multiple tenants; and privilege escalation asks whether the action involves access outside the normal scope. HITL approval is triggered when any of these scores high.
Concrete examples of mandatory approval include: destructive calls such as DELETE /users/{id}; bulk email sends exceeding 100 recipients; and use of OAuth tokens outside a read-only scope. Calling the same API 10 or more times consecutively should be automatically flagged.
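Requiring explicit human authorization for these operations is consistent with NIST SP 800-53 AC-6(1) (Authorize Access to Security Functions), which calls for explicitly authorizing access to security-relevant functions. Trigger conditions should be externalized in YAML or a similar format so they can be changed without modifying code.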
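The three axes and the concrete triggers can be evaluated by a small policy function that reads its thresholds from external configuration. This sketch parses JSON to stay within the standard library; the same structure could live in a YAML file. The config keys, the call dictionary shape, and `needs_approval` are hypothetical.

```python
import json

# Trigger conditions kept outside the code (JSON here; YAML would work the same way).
TRIGGER_CONFIG = json.loads("""
{
  "destructive_methods": ["DELETE"],
  "max_email_recipients": 100,
  "read_only_scopes": ["tickets:read", "faq:read"],
  "repeat_call_threshold": 10
}
""")


def needs_approval(call: dict, recent_endpoints: list, cfg: dict = TRIGGER_CONFIG) -> list:
    """Return the reasons an action needs HITL approval (empty list = no approval)."""
    reasons = []
    if call.get("method") in cfg["destructive_methods"]:
        reasons.append("irreversible")
    if call.get("recipients", 0) > cfg["max_email_recipients"]:
        reasons.append("blast_radius")
    if any(s not in cfg["read_only_scopes"] for s in call.get("scopes", [])):
        reasons.append("privilege_escalation")
    # Flag the same endpoint being called N or more times in a row (including this call).
    n = cfg["repeat_call_threshold"]
    tail = recent_endpoints[-(n - 1):] + [call["endpoint"]]
    if len(tail) >= n and len(set(tail)) == 1:
        reasons.append("repeated_calls")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, gives the approval screen something concrete to show the approver.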
Conclusion: The approval UI should consolidate "what is being approved and why" on a single screen, and timeouts must default to rejection for any action that genuinely requires approval.
There are four elements that must be included on the approval screen.
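The principle for timeouts is "deny by default." Low-risk reads are the only exception in the table below: they rarely require approval in the first place, so letting them proceed on expiry avoids needless blocking.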
| Risk Level | Timeout Guideline | Behavior on Expiration |
|---|---|---|
| Low (read) | 5 minutes | Auto-approval permitted |
| Medium (write) | 15 minutes | Auto-deny and log |
| High (delete/transfer) | 5 minutes | Auto-deny and alert |
Both mobile notifications and a Web UI are used in combination, but since excessive notifications lead to "notification fatigue" and cause items to be overlooked, notifications should be limited to high-risk actions.
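The timeout table above can be encoded as data so that expiry handling stays deny-by-default, including for risk levels the table does not list. A minimal sketch; `resolve_approval`, `should_push_notify`, and the risk-level strings are hypothetical.

```python
import enum


class Decision(enum.Enum):
    APPROVED = "approved"
    DENIED = "denied"
    PENDING = "pending"


# Timeout policy from the table above: (seconds, decision on expiry, alert on expiry?).
TIMEOUT_POLICY = {
    "low": (5 * 60, Decision.APPROVED, False),
    "medium": (15 * 60, Decision.DENIED, False),
    "high": (5 * 60, Decision.DENIED, True),
}


def resolve_approval(risk, requested_at, now, human_decision=None, log=None, alert=None):
    """Return the decision for an approval request; unknown risk levels are denied at once."""
    if human_decision is not None:
        return human_decision
    timeout, on_expiry, should_alert = TIMEOUT_POLICY.get(risk, (0, Decision.DENIED, True))
    if now - requested_at < timeout:
        return Decision.PENDING
    event = {"event": "approval_timeout", "risk": risk, "decision": on_expiry.value}
    if log is not None:
        log.append(event)
    if should_alert and alert is not None:
        alert(event)
    return on_expiry


def should_push_notify(risk: str) -> bool:
    # Mobile push only for high-risk requests, to limit notification fatigue.
    return risk == "high"
```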
Escalation should be automated to handle situations where approvers are unresponsive or risk levels spike suddenly. Relying solely on the assumption that "a human will always review" creates a binary outcome after a timeout: either operations halt or processing proceeds without approval.
There are three trigger conditions for automatic escalation.
Escalation targets are managed in a configuration file, with up to three levels defined, such as "primary approver → team lead → security officer." If approval is not obtained even at the final level, execution of the relevant tool is automatically blocked.
When escalation occurs, the reason, timestamp, and target action are recorded in a structured log, ensuring the audit trail required by NIST SP 800-53 AU-2. The history is also useful for reviewing permission design going forward, so it is accumulated as data.
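The escalation chain can be sketched as a loop over at most three levels that writes a structured log line at each step and blocks the tool if nobody approves. The `ask` callback is hypothetical (it stands in for however approvers are contacted), and treating an explicit rejection as final is an assumption of this sketch.

```python
import json
from datetime import datetime, timezone

ESCALATION_CHAIN = ["primary_approver", "team_lead", "security_officer"]  # max three levels


def escalate(action: str, reason: str, ask, log: list, chain=ESCALATION_CHAIN) -> bool:
    """Walk the chain until someone approves; block the tool if nobody does.

    ask(level, action) returns True (approved), False (rejected),
    or None (no response before the timeout).
    """
    for level in chain[:3]:
        log.append(json.dumps({
            "event": "escalation",
            "level": level,
            "action": action,
            "reason": reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }))
        answer = ask(level, action)
        if answer is True:
            return True
        if answer is False:
            return False  # an explicit rejection ends escalation
    log.append(json.dumps({"event": "tool_blocked", "action": action, "reason": reason}))
    return False
```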
Even if tool whitelists and API scopes are carefully designed, deviations cannot be detected unless actual calls are recorded and monitored. Permission design does not function by design alone — it only works when execution logs are continuously monitored. NIST SP 800-53 AU-2 and AU-6 also explicitly list the definition of audit events and periodic reviews as control requirements. This section explains, in order, the schema design for structured logs, detection rules for permission deviations, and the implementation of a kill switch.
The audit log fields are as follows (aligned with NIST SP 800-53 AU-2/AU-6).

| Field | Description |
|---|---|
| timestamp | UTC timestamp in ISO 8601 format |
| agent_id | Unique identifier for the agent instance |
| tool_name | Name of the executed tool or API endpoint |
| action_type | Distinction between read / write / delete / invoke |
| requested_scope | The scope that was requested |
| granted_scope | The scope that was actually granted |
| resource_id | Identifier of the resource being operated on |
| result_status | success / denied / error |
| session_id | Key linking to the HITL approval flow |

Signs of permission deviation can be automatically detected from the difference between requested_scope and granted_scope. JSON Lines is the recommended output format, and data should be transferred to write-once (append-only) storage upon saving.
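A minimal sketch of one record serialized as a JSON Lines entry, plus the requested-versus-granted comparison used to spot deviations. The `AuditRecord` dataclass mirrors the field list above; representing scopes as lists of strings is an assumption.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    agent_id: str
    tool_name: str
    action_type: str        # read / write / delete / invoke
    requested_scope: list
    granted_scope: list
    resource_id: str
    result_status: str      # success / denied / error
    session_id: str
    timestamp: str = ""     # filled with the current UTC time if left empty

    def to_jsonl(self) -> str:
        """One JSON object per line, ready to append to write-once storage."""
        data = asdict(self)
        data["timestamp"] = self.timestamp or datetime.now(timezone.utc).isoformat()
        return json.dumps(data, sort_keys=True)


def scope_deviation(record: AuditRecord) -> list:
    """Scopes the agent asked for but was not granted: a sign of permission deviation."""
    return sorted(set(record.requested_scope) - set(record.granted_scope))
```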
Conclusion: A combination of rule-based detection and statistical anomaly detection is effective for identifying permission deviations. Relying on a single method tends to result in missed detections, so monitoring should be implemented across multiple layers.
Representative patterns for rule-based detection
Statistical anomaly detection
Rules alone make it difficult to catch cases that "look normal but are abnormal in volume." Calculate a baseline from historical data and trigger an alert when a value exceeds the mean by a set multiple of the standard deviation (NIST SP 800-53 AU-6 calls for regular review and analysis of audit records, and AU-6(1) for automating that process). Because noisy alerts lead operations teams to start ignoring them, begin in alert-only mode, measure the false positive rate, and only then switch to blocking.
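The baseline check and the alert-first rollout can be sketched with the standard library's `statistics` module. `detect_anomaly`, `false_positive_rate`, the default multiplier `k=3.0`, and the idea of feeding per-interval call counts are assumptions for illustration; a real deployment would tune `k` against the measured false positive rate.

```python
import statistics


def detect_anomaly(history, current, k=3.0, mode="alert"):
    """Flag `current` if it exceeds mean + k * stdev of the historical baseline.

    history: per-interval call counts (e.g. hourly calls to one tool).
    mode: "alert" only reports; "block" also tells the caller to block.
    """
    if len(history) < 2:
        return {"anomalous": False, "action": "none"}  # not enough data for a baseline
    threshold = statistics.mean(history) + k * statistics.stdev(history)
    anomalous = current > threshold
    action = ("block" if mode == "block" else "alert") if anomalous else "none"
    return {"anomalous": anomalous, "threshold": threshold, "action": action}


def false_positive_rate(reviewed_alerts):
    """reviewed_alerts: booleans, True if a human confirmed the alert as a real issue."""
    if not reviewed_alerts:
        return 0.0
    return sum(1 for confirmed in reviewed_alerts if not confirmed) / len(reviewed_alerts)
```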
Conclusion: A kill switch must be designed not just to "stop" but to "stop safely." Without designing an interruption procedure for in-progress tasks, there is a risk of data corruption.
Invalidating an API key alone allows in-flight calls to complete, potentially causing unintended writes or external transmissions. Design your kill switch across three layers.
Design each operation as idempotent so that double-triggering produces no side effects. After shutdown, switch logs to preservation mode in accordance with NIST SP 800-53 AU-6. Automatic triggering should be driven by privilege deviation detection or HITL timeouts, with a separate endpoint also provided for manual triggering.
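A minimal sketch of an idempotent kill switch, assuming one possible split into layers: a shared event that refuses new tool calls, cooperative checkpoints where in-flight tasks stop at their next step, and credential revocation. `KillSwitch`, `guard`, and the `revoke_credentials`/`preserve_logs` hooks are hypothetical; the hooks would be wired to your secrets manager and log storage.

```python
import threading


class KillSwitch:
    """Stops new calls, halts in-flight tasks at checkpoints, then revokes credentials."""

    def __init__(self, revoke_credentials, preserve_logs):
        self._stopped = threading.Event()
        self._lock = threading.Lock()
        self._triggered = False
        self._revoke = revoke_credentials
        self._preserve = preserve_logs

    def trigger(self, reason: str) -> bool:
        """Idempotent: only the first call performs the shutdown; later calls return False."""
        with self._lock:
            if self._triggered:
                return False
            self._triggered = True
        self._stopped.set()      # new calls and in-flight checkpoints now fail
        self._revoke()           # credentials revoked after the gate closes
        self._preserve(reason)   # logs switched to preservation mode
        return True

    def guard(self) -> None:
        """Call before every tool invocation and between steps of long-running tasks."""
        if self._stopped.is_set():
            raise RuntimeError("kill switch active: execution halted")
```

Both automatic triggers (deviation detection, HITL timeouts) and a manual endpoint can call `trigger`; the lock guarantees the shutdown runs once even if they fire together.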
Conclusion: Most failures stem from the initial decision to "just grant broad permissions for now."
Failure 1: Granting all tools at once. This happens when a PoC is moved to production as-is. Make narrowing permissions a required step before going to production.
Failure 2: Not separating read and write. This widens the blast radius of a prompt injection attack, so read-only keys should be the default.
Failure 3: Bypassing HITL approval. Exception handling and retries can sometimes circumvent the approval flow. Place approval checks outside of exception handling.
Failure 4: Treating audit logs as "collect only." Without alert rules, anomalies go unnoticed. Implement log collection and detection rules together as a set.
Failure 5: Hardcoding secrets. Combine externalization to a secrets management service with automated scanning during code review.
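For Failure 3, a minimal sketch of the placement: the approval check runs once, before and outside the retry loop, so no retry or error handler can reach the tool without approval. `execute_with_retry` and `ApprovalRequired` are hypothetical, and retrying only on `ConnectionError` is an assumed choice of transient error.

```python
class ApprovalRequired(Exception):
    pass


def execute_with_retry(action: str, approved: bool, run, retries: int = 3):
    """Approval is checked outside the retry/exception path, never inside it."""
    if not approved:
        raise ApprovalRequired(f"{action} has no HITL approval")
    last_error = None
    for _ in range(max(retries, 1)):
        try:
            return run()
        except ConnectionError as exc:  # retry only transient errors
            last_error = exc
    raise RuntimeError(f"{action} failed after {retries} attempts") from last_error
```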
These failures can be addressed systematically when combined with the "prevent through structure" approach described in What is Harness Engineering? A Design Methodology for Structurally Preventing AI Agent Mistakes.
Q1. Won't the principle of least privilege restrict functionality too much? Proper scope design allows both functionality and safety to coexist. "Starting narrow and expanding as needed" reduces the risk of problems surfacing later.
Q2. Won't HITL slow down processing? By limiting it to high-risk actions, routine low-risk operations can still be handled automatically. Timeouts and automatic escalation also minimize bottlenecks.
Q3. How long should audit logs be retained? NIST SP 800-53 AU-11 (Audit Record Retention) leaves the retention period for the organization to define based on its risk profile and its legal and regulatory obligations. Industry regulations take precedence, but retaining logs for at least 90 days is recommended.
Q4. How does permission design change in a multi-agent configuration? Rather than delegating the parent agent's permissions wholesale, the basic principle is "split delegation"—passing only the permissions required for each specific task. See also What is Multi-Agent AI? From Design Patterns to Implementation and Operational Considerations.
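Split delegation can be sketched as a function that gives a sub-agent exactly the scopes its task needs and refuses anything the parent does not hold itself. `delegate_scopes` and the scope strings are hypothetical.

```python
def delegate_scopes(parent_scopes, task_required_scopes) -> frozenset:
    """Give a sub-agent only what the task needs, never more than the parent holds."""
    requested = set(task_required_scopes)
    missing = requested - set(parent_scopes)
    if missing:
        raise PermissionError(f"parent cannot delegate scopes it lacks: {sorted(missing)}")
    return frozenset(requested)
```

A FAQ-answering sub-agent spawned by a parent holding ticket write access would receive only faq:read, not the parent's full set.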
Conclusion: Design to the standard of "can it operate safely," not just "does it work."
Here is a recap of the key points from each step.
A practical approach is to start by applying a whitelist and read-only scopes in a PoC, then begin operations with a minimal configuration that includes HITL. For a broader view of AI agent security design, refer to What is AI Governance? A Practical Guide from EU AI Act Compliance to Internal Policy Development.

Yusuke Ishihara
Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).