What is NVIDIA OpenShell? A Quick Start Guide to the Sandbox for Running AI Agents Safely

Updated: May 30, 2026 | Published: May 30, 2026

Lead

NVIDIA OpenShell is an open-source runtime environment designed to run AI agents safely within a sandbox. It prevents agents from directly accessing host files or credentials, controls external network access through declarative policies, and enables logging of permitted and denied operations.

This article is intended for developers and IT administrators who want to safely experiment with coding agents such as Claude Code. Following the official documentation (github.com/NVIDIA/OpenShell, Apache 2.0 license), it walks through the steps from installing OpenShell to creating your first sandbox and configuring policy-based access control. By the end, you will be able to reproduce a minimal configuration for running agents in isolation from your host environment.

OpenShell is a policy-driven sandbox designed with the assumption that agents actively rewrite their own working environment. This section clarifies how it differs from ordinary containers and what it protects against.

How It Differs from Regular Containers

Ordinary containers such as Docker are primarily designed to run applications in isolation. You build an image, launch it, and execute your application inside it — the environment is largely static.

AI agents, on the other hand, write code, install packages, and modify configuration files, continuously changing their own working environment. OpenShell differs in that it provides an isolated space where this kind of "environment-modifying agent" can safely operate through trial and error.

Specifically, it permits only the operations an agent requires and blocks everything else — reading host files, accessing credentials, and communicating with unauthorized destinations — at the OS level. Where containers focus on "application isolation," OpenShell centers on "minimizing agent privileges."

The Four Protection Domains (File · Network · Process · Inference)

OpenShell's defining characteristic is "out-of-process" enforcement: rather than constraining agents through prompts (behavioral instructions), it imposes restrictions on the environment in which the agent runs. Because the constraints operate outside the agent, they cannot be overridden by the agent itself, even if it is compromised.

The official documentation identifies four threats to defend against, each mapped to a corresponding protection domain (policy domain).

Filesystem: Prevents reading of SSH keys, cloud credentials, and similar sensitive data (credential theft). Uses the Linux kernel's Landlock to block reads and writes outside permitted paths; locked at sandbox creation time.
Network: Prevents exfiltration of source code or internal files (data leakage). Denies outbound communication to any destination not explicitly permitted; supports hot-reload while running.
Process: Prevents privilege escalation via sudo or setuid. Runs the agent as an unprivileged user and restricts dangerous system calls with seccomp; locked at creation time.
Inference: Prevents data from being sent to unauthorized model providers. Acts as a privacy router that keeps sensitive context local using open models, routing to frontier models such as Claude or GPT only when the policy permits.

Policies are written in declarative YAML. Static domains (filesystem and process) are fixed at creation time, while dynamic domains (network and inference) can be updated without restarting.

Supported Agents and Licensing

OpenShell is designed as a general-purpose runtime that is not tied to any specific agent. The official quickstart lets you choose from agents including Claude Code, Codex, and OpenCode. The CLI automatically detects credentials for recognized agents from the shell environment, so in most cases agents can be run inside the sandbox without any code changes.

The project is open source under the Apache 2.0 license, with source code available at github.com/NVIDIA/OpenShell. At the time of writing, it is in an early 0.0.x stage, and the command structure and policy schema may change in future updates. When using OpenShell in practice, always consult the official documentation at docs.nvidia.com/openshell for the latest specifications.

Prerequisites — Four Requirements to Prepare

To try OpenShell, you need four things: a supported OS, a container runtime environment, the CLI, and agent credentials. We'll walk through each one in order.

Supported Operating Systems and Container Runtimes

To run OpenShell, the following environment is required.

One of: macOS / Linux / Windows + WSL 2
One of the following available: Docker / Podman / MicroVM
OpenShell CLI and OpenShell gateway

Because filesystem isolation uses the Linux kernel's Landlock, the full protection capabilities are available in Linux environments (and WSL 2 or containers running on a Linux kernel). On macOS, it operates through a virtualization layer. For enterprise use, a configuration running long-lived agents on DGX Spark is also documented.
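
On a Linux host (or inside the WSL 2 distribution), one way to see whether Landlock is active is to read the kernel's list of enabled security modules. The sketch below assumes the standard securityfs path /sys/kernel/security/lsm. The helper name has_landlock is made up for illustration and is not part of OpenShell.

```shell
# Succeeds if "landlock" appears in a comma-separated LSM list such as
# "lockdown,capability,landlock,yama".
has_landlock() {
    case ",$1," in
        *,landlock,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: securityfs is mounted at its usual location.
lsm_file=/sys/kernel/security/lsm
if [ -r "$lsm_file" ] && has_landlock "$(cat "$lsm_file")"; then
    echo "Landlock is active on this kernel"
else
    echo "Landlock not detected (non-Linux host, older kernel, or LSM not enabled)"
fi
```

Run it in the Linux environment where the sandboxes actually execute. On macOS the host shell will always report that Landlock is not detected, because the relevant kernel lives inside the virtualization layer.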

It's a good idea to first confirm that a container runtime (such as Docker) is running on your local OS before proceeding to installation — this will help you avoid common pitfalls.
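
The runtime check can be scripted. The following is a minimal sketch that only covers Docker and Podman (it does not look for a MicroVM backend). The function name detect_runtime is hypothetical, and it relies on the info subcommand of each CLI exiting with an error when the engine cannot be reached.

```shell
# Print the first container runtime that responds, or "none".
detect_runtime() {
    for rt in docker podman; do
        # "<runtime> info" fails when the CLI is missing or the engine is not reachable.
        if command -v "$rt" >/dev/null 2>&1 && "$rt" info >/dev/null 2>&1; then
            echo "$rt"
            return 0
        fi
    done
    echo "none"
    return 1
}

runtime=$(detect_runtime) || true
echo "container runtime: $runtime"
```

If this prints "none", start Docker or Podman before running the installer.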

Agent Accounts and API Keys

You will need to separately prepare an account or API key for the agent you intend to run inside the sandbox (such as Claude Code). OpenShell only isolates agents securely; it does not grant access to the agent services themselves.

The OpenShell CLI automatically detects credentials for recognized agents (Claude, Codex, OpenCode, Copilot, etc.) from the shell's environment variables. This means you can pass through the authentication credentials you already use on the host side as-is. It is also possible to explicitly register credentials (providers).

The important point here is that these credentials are protected by the sandbox policy. Only what the agent needs in order to operate is passed through, and it is isolated from files and network communications outside the policy.

Step 1 — Install OpenShell and Verify Connectivity

Installation is handled in one shot with the official script, which installs both the CLI and the gateway together. After installation, use status and help to verify everything is working.

Installing the CLI and Gateway

Run the official installation script.

bash

curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh

This script installs the "OpenShell CLI" and the "OpenShell gateway," and automatically starts the local gateway. The CLI is the interface for operations, while the gateway is the resident component that actually enforces sandbox and network policies. The official documentation also describes an alternative installation method using uv tool install -U openshell.

Note that the curl | sh style of installation executes the script without giving you a chance to review its contents first. If this is a concern, download the script and read through it before running it, or opt for installation via uv instead.
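
One way to review the script before running it is to save it to a file instead of piping it into sh. The helper below is a sketch: fetch_for_review is a hypothetical name, and install.sh as the local file name is an arbitrary choice. It reuses the curl flags from the one-liner above.

```shell
# Download a script to a local file so it can be read before it is run.
# Nothing is executed here; running the file stays a separate, manual step.
fetch_for_review() {
    review_url=$1
    review_dest=$2
    # -L follow redirects, -sS quiet but still show errors, -f fail on HTTP errors.
    curl -LsSf "$review_url" -o "$review_dest" || return 1
    echo "saved $review_dest ($(wc -l < "$review_dest" | tr -d ' ') lines)"
}

# Usage (left commented out so that loading this snippet has no side effects):
# fetch_for_review https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh install.sh
# less install.sh    # read it
# sh install.sh      # run it only once you are satisfied
```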

Verifying Connectivity with status and help

After installation, check the gateway status.

bash

openshell status

If everything is working correctly, the gateway name, server URL (e.g., a local address such as https://127.0.0.1:17670), connection state (Connected), and version will be displayed. If Connected appears, the CLI can reach the gateway.

To view a list of available subcommands, use help.

bash

openshell --help

If status does not show Connected, check whether the gateway process is running and whether the local port is conflicting with another process. Confirming connectivity here makes it easier to distinguish between "a CLI issue" and "a policy issue" if you run into problems later when creating a sandbox.
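
When the installation is scripted, the connectivity check can gate the following steps. This sketch assumes only what is described above, namely that the status output contains the word "Connected" once the gateway is reachable. The exact output layout is not relied upon, and wait_until_connected is a hypothetical helper.

```shell
# Run a status command repeatedly until its output contains "Connected".
# $1 = maximum number of attempts (at least one attempt is always made)
# remaining arguments = the command to run, e.g. openshell status
wait_until_connected() {
    attempts=$1
    shift
    i=1
    while :; do
        if "$@" 2>/dev/null | grep -qw "Connected"; then
            return 0
        fi
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep 1
    done
}

# Usage:
# wait_until_connected 5 openshell status && echo "gateway reachable"
```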

Step 2 — Create Your First Sandbox

The basic workflow in OpenShell is to create a sandbox with openshell sandbox create and launch an agent inside it. Let's try running Claude Code in an isolated environment.

Launching claude Inside a Sandbox

Create a named sandbox and launch Claude Code inside it.

bash

openshell sandbox create --name demo -- claude

--name demo assigns the name "demo" to the sandbox, and claude, passed after --, is executed inside the sandbox. Once launched, Claude Code begins running within the sandbox.

-- serves as a delimiter meaning "everything after this is the command to run inside the sandbox." In addition to Claude Code, other agents supported by the CLI — such as Codex or OpenCode — can be launched in the same way. Agent credentials are automatically detected from the host environment, so in most cases you can start using them immediately without any additional configuration.

Confirming the Agent Is Running Inside the Sandbox

Verify that the launched agent is running inside the sandbox, not on the host. For example, ask Claude Code for the current directory.

What is the absolute path of the current folder?

In OpenShell's default configuration, the working directory is /sandbox. If the agent returns /sandbox, you can confirm that it is operating inside the isolated environment managed by OpenShell, not on the host filesystem itself.

This check — confirming "where is it actually running?" — may seem minor, but it is important. Before delegating file operations or command execution to an agent, verify with your own eyes that it is separated from the host environment. Skipping this step can lead to an incident where you believed the agent was isolated, but it was actually touching the host.

Assigning Tasks to the Agent and Reviewing the Output

Once the sandbox is running, try giving the agent a simple task. For example, issue the following instruction.

Please create hello_world.py.

The agent will create the file inside the sandbox. To review the generated code, exit Claude Code with /exit and inspect the contents from the sandbox shell.

bash

sandbox@xxxx:~$ pwd
/sandbox
sandbox@xxxx:~$ ls
hello_world.py
sandbox@xxxx:~$ cat hello_world.py
print("Hello, World!")

When you are done, type exit to leave the sandbox shell and return to your host PC's shell. Files created by the agent are confined under /sandbox and do not pollute the host file tree. This state — where generated artifacts are contained within the sandbox — forms the foundation for safe experimentation.

Step 3 — Managing Sandboxes (List · Reconnect · Delete)

Created sandboxes are managed through listing, reconnecting, and deletion. Once a trial is complete, delete any unnecessary sandboxes to keep the environment tidy.

When to Use list, connect, and delete

Use the following command to view the list of sandboxes.

bash

openshell sandbox list

This displays the name (NAME), creation date and time (CREATED), and status (PHASE). If a sandbox is running, PHASE will show Ready (other possible states include Provisioning, Error, and Deleting). Appending -o json or -o yaml outputs the results in a machine-readable format.

To reconnect to an existing sandbox, use connect, then restart the agent inside it.

bash

openshell sandbox connect demo
claude

Delete sandboxes that are no longer needed using delete. Upon deletion, processes are stopped, resources are released, and any injected credentials are also discarded.

bash

openshell sandbox delete demo

Note that connect only attaches to the sandbox shell; starting the agent is a separate operation. For an at-a-glance view of running status, the terminal dashboard opened with openshell term is also available. Rather than leaving sandboxes running indefinitely, clean them up with sandbox delete after verification. Stale state and leftover policies otherwise make it harder to isolate unexpected behavior.

Step 4 — Controlling Network Access with Policies

OpenShell's true value lies in its ability to control network access through declarative policies. Here, we will walk through the full process: confirming that URLs not permitted by default are blocked, then editing the policy to allow access.

URLs Not Allowed by Default Are Blocked

A sandbox starts with "minimal outbound access." URLs not permitted by the default policy are unreachable.

First, connect to the sandbox.

bash

openshell sandbox connect demo

Try accessing a non-permitted URL using curl.

bash

curl -I https://example.com/some-path

Communication to hosts not included in the default policy is blocked by the gateway proxy. This configuration closes off the risk of an agent arbitrarily sending data to external destinations (data exfiltration). The principle of least privilege — "deny by default, explicitly allow only what is necessary" — is consistently applied at the network layer as well.

Retrieving and Reading a Policy

Export the current policy to a file.

bash

openshell policy get demo --full > current-policy.yaml

The exported file begins with metadata such as Version, Hash, Status, Active, and Created, followed by a --- separator and then the YAML body. From the YAML body, the following can be read:

Which hosts and URLs are accessible (network_policies)
Which commands (binaries) are permitted to make network access (binaries)
Which folders can be read from or written to (filesystem_policy)
Which user/group processes run as (process)

The YAML body has roughly the following structure (excerpt):

yaml

version: 1
filesystem_policy:
  include_workdir: true
  read_only:
    - /usr
    - /etc
  read_write:
    - /sandbox
    - /tmp
landlock:
  compatibility: best_effort
process:
  run_as_user: sandbox
  run_as_group: sandbox
network_policies:
  claude_code:
    name: claude-code
    endpoints:
      - host: api.anthropic.com
        port: 443
        protocol: rest
        tls: terminate
        enforcement: enforce
        access: full
    binaries:
      - path: /usr/local/bin/claude
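
For a quick overview of which hosts an exported policy allows, the host entries can be pulled out with awk. This is a rough text filter rather than a YAML parser: it assumes every endpoint starts with a "- host:" list item as in the excerpt above, and list_allowed_hosts is a hypothetical helper name.

```shell
# Print the value of every "- host:" list item found in a policy file.
list_allowed_hosts() {
    awk '$1 == "-" && $2 == "host:" { print $3 }' "$1"
}

# Usage:
# list_allowed_hosts current-policy.yaml
```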

Editing and Applying a Policy

Add the hosts you want to allow access to in network_policies. Since openshell policy set replaces the entire policy, keep the existing network_policies entries and add new ones alongside them rather than deleting anything. For example, to add read access to an arbitrary host, add a block like the following beneath the existing policy.

yaml

  example_site:
    name: example-site
    endpoints:
      - host: example.com
        port: 443
        protocol: rest
        enforcement: enforce
        access: read-only
    binaries:
      - path: /usr/bin/curl

For each endpoint's enforcement, you can choose between enforce, which actually blocks violations, and audit, which only logs them without blocking. It is safer to first observe behavior with audit before switching to enforce. Because OpenShell evaluates access at the granularity of binary, destination, HTTP method, and path, fine-grained control such as "allow only GET from curl to this host" is possible.

Before applying, be sure to delete the metadata above the --- at the top of the file (Version / Hash / Status, etc.). Only the YAML body below --- should be passed to policy set.
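
Removing the metadata by hand is easy to forget, so it can be scripted. The sketch below drops everything up to and including the first --- line and leaves the file untouched if no such line exists. strip_policy_metadata is a hypothetical helper, not an OpenShell command.

```shell
# Remove the metadata header (everything up to and including the first
# "---" line) from an exported policy file, editing the file in place.
strip_policy_metadata() {
    policy_file=$1
    # No separator means the file is already body-only: leave it alone.
    grep -qx -e '---' "$policy_file" || return 0
    body_tmp=$(mktemp) || return 1
    # Print only the lines that come after the first "---" line.
    awk 'seen { print } !seen && $0 == "---" { seen = 1 }' "$policy_file" > "$body_tmp" || return 1
    cat "$body_tmp" > "$policy_file" && rm -f "$body_tmp"
}

# Usage:
# strip_policy_metadata current-policy.yaml
```

After this runs, current-policy.yaml holds only the YAML body and can be passed to policy set.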

bash

openshell policy set demo --policy current-policy.yaml --wait

--wait makes the command wait until the policy has been fully applied. Network policies are hot-reloaded into the running sandbox, so changes take effect without recreating the sandbox. After applying, reconnect and curl the same URL; this time the host should be reachable.

Common Pitfalls and Workarounds

Here are common pitfalls when getting started with OpenShell, along with how to avoid them.

Overwriting the entire existing policy with policy set: policy set replaces the entire policy. Deleting the existing network_policies will also cut off communications that the agent inherently requires (such as the model API). Always edit by appending to the content retrieved with policy get.
Passing metadata to policy set as-is: Leaving the metadata above --- (Version / Hash, etc.) will cause the apply operation to fail. Strip it down to the body before passing it.
Trying to change static policies on a running sandbox: filesystem and process are locked at creation time and cannot be changed while the sandbox is running. If you need to change these, recreate the sandbox. Only network and inference support hot-reloading.
Expecting Landlock protection on macOS: Landlock is a Linux kernel feature, and as indicated by the compatibility: best_effort setting, its behavior varies by environment. Make sure you have a precise understanding of what is and isn't protected.
Leaving sandboxes running indefinitely: Leftover sandboxes and policies make it harder to isolate and debug behavior. After verification, clean up with sandbox delete.
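
The first pitfall can be guarded against with a simple check before applying: compare the edited policy with a copy of the original body and refuse to continue if any original line has disappeared. This is a line-based sketch using diff, not a semantic YAML comparison. The function name policy_only_adds and the file names in the usage comment are hypothetical.

```shell
# Succeeds only if the new file can be obtained from the old one purely by
# inserting lines (nothing removed, changed, or reordered).
policy_only_adds() {
    old_policy=$1
    new_policy=$2
    [ -r "$old_policy" ] && [ -r "$new_policy" ] || return 2
    # In diff's default output, lines found only in the first file start with "<".
    if diff "$old_policy" "$new_policy" | grep -q '^<'; then
        return 1
    fi
    return 0
}

# Usage: keep a copy of the body before editing, then compare.
# cp current-policy.yaml policy.before.yaml
# (edit current-policy.yaml)
# policy_only_adds policy.before.yaml current-policy.yaml || echo "existing entries were removed or changed"
```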

FAQ

A summary of frequently asked questions about OpenShell.

Q. How is OpenShell different from regular Docker? While Docker is primarily designed for application isolation, OpenShell is purpose-built for safely running AI agents that modify their own environment while working. The key differences are that it controls four domains — file, network, process, and inference — through policies, and that network and inference policies can be updated even while the sandbox is running.

Q. Do I need to rewrite my existing agent's code? In most cases, no. The CLI automatically detects credentials for recognized agents (Claude, Codex, OpenCode, etc.) and can run them inside a sandbox without any code changes.

Q. Can it be used in production? It is open source under the Apache 2.0 license, but as of the time of writing it is in an early 0.0.x stage and specifications are subject to change. If you are considering production deployment, check the latest official documentation and release notes for supported platforms and known limitations, and start with a small-scale proof of concept.

Summary

NVIDIA OpenShell is an open-source execution environment that lets you run AI agents safely inside a sandbox, protecting host files and credentials while controlling network access through declarative policies. This article walked through the steps from installation and connectivity verification, to creating your first sandbox, assigning tasks to an agent, and controlling network access with policies.

There are three key takeaways. First, OpenShell is designed with agents that continuously modify their environment in mind, protecting the four domains of file, network, process, and inference with least-privilege principles. Second, sandboxes can be managed with simple create, connect, and delete operations, and artifacts are isolated under /sandbox. Third, network policies should be operated on a "default deny, append only what is needed" basis, and can be hot-reloaded even while the sandbox is running.

The more autonomous work you delegate to agents, the more critical isolation and access control become. Our own approach when expanding AI agent usage is to start by verifying behavior in a small sandbox and gradually broaden the policy from there. We encourage you to begin by establishing a safe environment for trial and error, built on an execution environment like OpenShell.

Author & Supervisor

Yusuke Ishihara

Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).
