Launch week·Five new features shipping this week (March 30 – April 3)

AI sandboxes & volumes: isolated environments for coding agents

· 6 min read
Ruben Fiszel


Day 3 of Windmill launch week. You can now run AI coding agents like Claude Code or Codex in sandboxed environments with persistent storage, directly from your scripts and flows.

The problem

AI coding agents need two things that are hard to combine: isolation and persistence. You want them sandboxed so they cannot access the host filesystem or network. But you also want them to remember state across runs, produce artifacts, and pick up where they left off.

Teams end up managing Docker containers, mounting volumes manually, and writing wrapper scripts to handle session state. The orchestration layer has no opinion about where the agent runs or how its files persist.

AI sandboxes: two annotations

An AI sandbox is a regular Windmill script with two annotations: one for isolation, one for storage.

// sandbox
// volume: agent-state .agent

import Anthropic from '@anthropic-ai/sdk';

export async function main(prompt: string) {
  const client = new Anthropic();
  // The .agent directory persists across runs
  const result = await client.messages.create({
    model: 'claude-opus-4-6-20260401',
    max_tokens: 1024,
    messages: [{ role: 'user', content: prompt }],
  });
  return result;
}

// sandbox enables NSJAIL process isolation. // volume: agent-state .agent mounts a persistent volume synced to your workspace object storage. That's it.

Persistent volumes

Agents need to remember state across runs. Volumes handle this by syncing files to your workspace object storage (S3, Azure Blob, GCS) between executions.

Declaring a volume is a single annotation: // volume: <name> <mount_path>. You can attach up to 10 volumes per script.
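For example, a script can mount separate volumes for agent state, shared skills, and generated artifacts. A minimal sketch (volume names and paths are illustrative, not prescribed):

```typescript
// sandbox
// volume: agent-state .agent
// volume: skills .skills
// volume: reports out/reports

import * as fs from 'fs';

export async function main() {
  // Each mount path resolves inside the job's working directory;
  // the agent reads and writes the mounted paths like ordinary files.
  fs.mkdirSync('out/reports', { recursive: true });
  fs.writeFileSync('out/reports/summary.md', '# Weekly summary\n');
  return fs.readdirSync('out/reports');
}
```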

Each volume goes through three phases per execution:

  1. Before execution: Windmill acquires an exclusive lease on the volume and downloads files from object storage. A per-worker LRU cache (up to 10 GB) skips the download when files haven't changed (compared by size and MD5).
  2. During execution: the volume is bind-mounted into the sandbox. The agent reads and writes files normally.
  3. After execution: changed files are synced back to object storage, metadata is updated, and the lease is released.

The exclusive lease (60-second TTL, auto-renewed every 10 seconds) guarantees that only one job writes to a volume at a time. If another job targets the same volume, it waits for the lease to be released.
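The cache skip-check in phase 1 can be sketched as follows. This is our own simplified version of the idea, not Windmill's internal code: a cached copy is reused only when both its size and MD5 digest match the metadata stored with the object.

```typescript
import { createHash } from 'crypto';

// Metadata kept alongside each object in workspace storage (illustrative shape).
interface ObjectMeta {
  size: number;
  md5: string; // hex digest of the stored object
}

function md5Hex(data: Buffer): string {
  return createHash('md5').update(data).digest('hex');
}

// Returns true when the worker's cached copy can be reused without a download:
// the cheap size comparison runs first, the digest comparison only on a size match.
function cacheHit(cached: Buffer | undefined, remote: ObjectMeta): boolean {
  if (!cached) return false;
  return cached.length === remote.size && md5Hex(cached) === remote.md5;
}
```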

Volume names support $workspace and $args[...] interpolation, so you can scope storage per workspace, per user, or per any other input parameter. This makes it straightforward to give each agent session its own isolated storage.
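A hypothetical helper illustrates how such a template could resolve (this is our reading of the interpolation rules, not Windmill's implementation):

```typescript
// Resolve $workspace and $args[...] placeholders in a volume name template.
function resolveVolumeName(
  template: string,
  workspace: string,
  args: Record<string, string>,
): string {
  return template
    .replace(/\$workspace/g, workspace)
    .replace(/\$args\[(\w+)\]/g, (_, key) => args[key] ?? '');
}

// resolveVolumeName('sessions-$workspace-$args[user_id]', 'prod', { user_id: 'alice' })
// yields 'sessions-prod-alice'
```

With a volume annotation like // volume: sessions-$args[user_id] .agent, each user's jobs would therefore sync against their own volume.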

Volumes also have fine-grained permissions: owner, read-only, or read-write access, assignable per user or group. A job with no permission on a volume will fail, so you control exactly which agents can access which data.

Process isolation

Without isolation, an agent has access to the host filesystem, environment variables (including credentials), other running jobs, and unrestricted network. Windmill provides two levels of process isolation to prevent this.

NSJAIL sandboxing is the strongest option. Each execution runs in its own NSJAIL sandbox with:

  • Filesystem isolation: the agent only sees its own working directory and mounted volumes. No access to the host filesystem or other jobs.
  • Network restrictions: outbound network access can be restricted per sandbox.
  • Resource limits: CPU, memory, and disk usage are capped per execution.

Enable it per script with the // sandbox annotation, or force it instance-wide for all jobs with DISABLE_NSJAIL=false.

PID namespace isolation is a lighter alternative for workloads where full sandboxing is unnecessary. It uses Linux unshare to create separate process namespaces, so each job gets its own process tree and cannot see or signal processes from other jobs. Enable it with ENABLE_UNSHARE_PID=true. Lower overhead, but no filesystem or network isolation.

Works with any agent

Claude Code, Codex, OpenCode, or any custom agent that operates on a local filesystem. Windmill provides the sandbox and the storage; the agent brings its own logic. A built-in Claude Code template handles session persistence and token counting out of the box.

Built-in Claude Code template

Windmill ships with a ready-to-use Claude Code template. It handles session persistence (the session ID is stored in the volume), agent instructions, skill files, and token counting for cost monitoring.

// sandbox
// volume: claude-sessions .agent

import { ClaudeCodeAgent } from '@anthropic-ai/claude-agent-sdk';

export async function main(prompt: string) {
  const agent = new ClaudeCodeAgent({
    instructions: "You are a helpful coding assistant.",
  });
  return await agent.run(prompt);
}

Observability

Every agent run is a regular Windmill job, which means full observability out of the box: logs, execution history, and token usage for cost monitoring. Set up alerts on failures or cost thresholds, and audit agent activity across workspaces.

Use cases

  • Persistent agent memory: conversation history and session state survive across runs.
  • Artifact generation: agents produce reports, code, or data files that persist in the volume.
  • Multi-step workflows: a flow triggers an agent, waits for results, then passes artifacts to the next step.
  • Safe execution at scale: resource limits and isolation let you run untrusted agent code without risk.

Getting started

  1. Configure workspace object storage (S3, Azure Blob, GCS, or filesystem).
  2. Add // sandbox and // volume: <name> <path> annotations to any script.
  3. Run it. Files in the volume path persist across executions.
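The three steps above fit in one small script. A sketch of persistence across runs, with an illustrative volume name and file path: each invocation appends to a log in the mounted volume, so earlier entries are still there on the next run.

```typescript
// sandbox
// volume: notes .agent

import * as fs from 'fs';
import * as path from 'path';

export async function main(entry: string) {
  fs.mkdirSync('.agent', { recursive: true });
  const file = path.join('.agent', 'notes.log');
  // Append to the persisted log; entries from previous runs survive
  // because the .agent volume is synced back to object storage.
  fs.appendFileSync(file, entry + '\n');
  return fs.readFileSync(file, 'utf8').trim().split('\n');
}
```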

What's next

Tomorrow is Day 4: Git sync & workspace forks. Sync with Git, stage workspaces, and deploy via CI/CD. Follow along.

Windmill is an open-source and self-hostable serverless runtime and platform combining the power of code with the velocity of low-code. We turn your scripts into internal apps and composable steps of flows that automate repetitive workflows.

You can self-host Windmill with a single docker compose up, or go with the cloud app.