Why Your AI Refactor Tool Should Run in a Sandbox

OpenAI's cookbook shows AI agents migrating code inside throwaway sandboxes, with auditable patches. Here's the pattern and why it's spreading.


By Rob · 11 June 2026 · 8 min read

When an AI coding agent runs in your terminal with full access to your repo, it can do useful work or harmful work, and you often cannot tell which until you've read the diff. OpenAI's recent cookbook example shows the pattern that fixes this: don't run the agent in your working directory at all. Spin up a fresh sandbox, give it the files it needs, let it produce a patch, then review the patch yourself before it touches anything you care about.

What does "sandboxed agent" actually mean?

The split is between a trusted host process (your code, running on your machine, with your credentials and your real repo) and a sandboxed execution environment (a throwaway container or microVM that holds only what's needed for one task). The agent code lives in the sandbox; the orchestration lives in the host.

Per OpenAI's cookbook example, the sandbox is usually a Docker container (a fresh Python 3.14 image per task), though the same pattern works with E2B or Cloudflare Workers as the isolation layer. Inside the sandbox the agent can run commands, edit files, and run tests. None of that touches your real machine. When the task ends, the sandbox is deleted, taking any side effects with it.

The trust boundary is the key idea. Credentials, API keys, MCP servers, audit logging: those stay in the host process. The sandbox sees only the files and the task brief.
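One way to picture that boundary is an explicit allowlist for environment variables. This is a minimal sketch, not the cookbook's code: the helper name `sandbox_env` and the variable names are hypothetical, and it assumes the host builds the sandbox's environment itself rather than forwarding its own.

```python
# Hypothetical helper: build the sandbox's environment from an explicit
# allowlist, so host credentials never cross the boundary by accident.
def sandbox_env(host_env: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Return only the variables the task is explicitly allowed to see."""
    return {name: value for name, value in host_env.items() if name in allowed}


host_env = {
    "PATH": "/usr/bin",
    "OPENAI_API_KEY": "sk-example",       # stays in the host
    "AWS_SECRET_ACCESS_KEY": "example",   # stays in the host
    "TASK_ID": "migrate-007",             # part of the task brief
}
env = sandbox_env(host_env, allowed={"PATH", "TASK_ID"})
print(sorted(env))  # ['PATH', 'TASK_ID']
```

The design choice is allowlist over blocklist: a new secret added to the host later is excluded by default instead of leaking until someone remembers to block it.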

Why isn't this how every AI coding tool already works?

Two reasons, and they're both starting to fall away.

The first is friction. Spinning up a fresh sandbox for every task costs a few seconds of latency and some disk space. If you're doing twenty small edits an hour, that's a noticeable tax. The interactive coding flow most of us use today (Claude Code, Cursor, Copilot in-editor) prioritises speed over isolation. The trade made sense when AI coding was mostly autocomplete; it makes less sense when agents are running multi-file refactors unattended.

The second is complexity. Wiring up Docker, mapping volumes correctly, handling the patch-extraction step at the end: it's all real engineering. Doing it badly creates new failure modes (sandbox can't reach the test runner, mounted volume leaks state, patch fails to apply cleanly). The cookbook pattern matters because it makes the engineering reusable.

Both reasons are eroding. Sandbox startup latency is dropping (E2B and Cloudflare Workers measure it in milliseconds rather than seconds). The orchestration libraries are maturing. The pattern that today requires bespoke setup will likely be a one-line config in mainstream tools within a year.

What does the pattern look like in practice?

Six steps, each one belonging clearly to either the host or the sandbox.

Host splits the work

A 200-file migration becomes 20 ten-file tasks. Each task is small enough that the agent can complete it inside one sandbox session and small enough that the resulting patch is reviewable by a human.
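The splitting step is plain batching. A minimal sketch, assuming the host already has a flat list of file paths; the function name `split_into_tasks` and the example paths are hypothetical.

```python
from typing import Iterator, Sequence


def split_into_tasks(files: Sequence[str], batch_size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `batch_size` files, one per task."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        yield list(files[start:start + batch_size])


# Hypothetical repo: 200 files become 20 tasks of ten files each.
files = [f"src/module_{i:03d}.py" for i in range(200)]
tasks = list(split_into_tasks(files))
print(len(tasks), len(tasks[0]))  # 20 10
```

Batching by position is the simplest choice. In a real migration you would probably group files that import each other, so each task's tests can pass on their own.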

Host creates a fresh sandbox

Empty container, scoped workspace, no credentials beyond what the task needs. The host stages the relevant repo files into the sandbox via a mount or copy.
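For a local Docker setup, creating the sandbox can be as small as one `docker run` invocation. The sketch below only builds the command line, it does not execute it. The staging path and the `python:3.14-slim` tag are assumptions for illustration; `--rm`, `-v` and `-w` are standard `docker run` flags. It also assumes a POSIX host.

```python
import shlex
from pathlib import PurePosixPath


def docker_run_argv(workspace: PurePosixPath, image: str, command: list[str]) -> list[str]:
    """Build a `docker run` command line for one throwaway task container."""
    if not workspace.is_absolute():
        # Docker treats a non-absolute -v source as a named volume, not a bind mount.
        raise ValueError("workspace must be an absolute path")
    return [
        "docker", "run",
        "--rm",                           # delete the container when the command exits
        "-v", f"{workspace}:/workspace",  # mount only the staged task files
        "-w", "/workspace",               # run the command from inside that mount
        image,
        *command,
    ]


argv = docker_run_argv(
    PurePosixPath("/tmp/task-007/workspace"),  # hypothetical staging directory
    "python:3.14-slim",                        # assumed image tag
    ["pytest", "-q"],
)
print(shlex.join(argv))
```

Note what is absent: there are no `-e` flags, so no host environment variables are forwarded, and the real repo is never mounted, only the staged copy.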

Sandbox runs the agent loop

The agent reads files, runs the baseline tests, makes edits, runs the tests again. All of this happens inside the sandbox; nothing leaks out yet.

Sandbox emits a patch

When the agent is done, the sandbox produces a unified-diff patch file, a structured result (JSON), an audit log of every command the agent ran, and a markdown summary. These are the only outputs that cross back into the host.
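Those four outputs fit in one small record. A minimal sketch using only the standard library: `difflib.unified_diff` produces the patch, and a dataclass holds the bundle. The `TaskOutput` type and its field names are hypothetical, not the cookbook's schema.

```python
import difflib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskOutput:
    """Everything that is allowed to cross back from sandbox to host."""
    patch: str                                           # unified diff
    result: dict                                         # structured result
    audit_log: list[str] = field(default_factory=list)   # every command run
    summary: str = ""                                    # markdown summary


def make_patch(path: str, before: str, after: str) -> str:
    """Return a unified diff for one file, with git-style a/ and b/ prefixes."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


output = TaskOutput(
    patch=make_patch("src/loader.py", "import imp\n", "import importlib\n"),
    result={"task": "migrate-007", "status": "passed"},
    audit_log=["pytest -q", "sed -i s/imp/importlib/ src/loader.py", "pytest -q"],
    summary="Replaced `imp` with `importlib` in one file.",
)
print(output.patch)
print(json.dumps(asdict(output))[:60])
```

Serialising the whole bundle to JSON keeps the boundary narrow: the host parses one document and never has to execute anything the sandbox produced.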

Host validates and stores

The host reads the patch, verifies it applies cleanly, optionally runs its own checks, then either applies it or holds it for human review. The audit log is archived so you can see exactly what the agent did.
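Two host-side checks can be sketched briefly. `applies_cleanly` shells out to `git apply --check`, which tests a patch without modifying the working tree. `out_of_scope` is one example of the host's "own checks": it rejects a patch that touches files the task was never given. Both function names are hypothetical, and the header parsing is deliberately naive; a production version would use a real patch parser.

```python
import subprocess


def applies_cleanly(patch: str, repo_dir: str) -> bool:
    """Ask git whether the patch would apply, without changing any files."""
    proc = subprocess.run(
        ["git", "apply", "--check"],  # with no file argument, git reads stdin
        input=patch, text=True, cwd=repo_dir, capture_output=True,
    )
    return proc.returncode == 0


def touched_paths(patch: str) -> set[str]:
    """Collect file paths from the '--- a/' and '+++ b/' header lines."""
    paths = set()
    for line in patch.splitlines():
        if line.startswith(("--- a/", "+++ b/")):
            # Drop the 6-character prefix and any trailing tab-separated date.
            paths.add(line[6:].split("\t")[0])
    return paths


def out_of_scope(patch: str, task_files: set[str]) -> set[str]:
    """Return paths the patch touches that were not part of the task."""
    return touched_paths(patch) - task_files


patch = (
    "--- a/src/loader.py\n+++ b/src/loader.py\n@@ -1 +1 @@\n-import imp\n+import importlib\n"
    "--- a/deploy/prod.env\n+++ b/deploy/prod.env\n@@ -1 +1 @@\n-DEBUG=0\n+DEBUG=1\n"
)
print(out_of_scope(patch, task_files={"src/loader.py"}))  # {'deploy/prod.env'}
```

A non-empty result is a reason to hold the patch for human review rather than apply it automatically.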

Host deletes the sandbox

The container is destroyed. No state survives. The next task starts in a fresh sandbox with no inherited contamination.
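The important property is that deletion happens even when the task fails. A minimal sketch of that guarantee, using a temporary directory as a stand-in for the container; the name `throwaway_workspace` is hypothetical, and a real host would tear down the container in the same `finally` position.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def throwaway_workspace() -> Iterator[Path]:
    """Create a scratch directory and always delete it, even on failure."""
    root = Path(tempfile.mkdtemp(prefix="sandbox-"))
    try:
        yield root
    finally:
        shutil.rmtree(root, ignore_errors=True)


try:
    with throwaway_workspace() as ws:
        (ws / "scratch.txt").write_text("state the next task must not see")
        seen = ws
        raise RuntimeError("agent crashed mid-task")
except RuntimeError:
    pass

print(seen.exists())  # False
```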

What's an "auditable patch" and why does it matter?

An auditable patch has four properties. It's a standard unified diff (the same format git apply consumes), so any developer can read it without special tooling. It's small enough to review (because the host split the work into small tasks). It comes with a complete audit log showing every shell command the agent ran. And it's bundled with a structured result that names what was attempted and what succeeded.

That bundle is the thing that makes the difference between "trust the agent" and "verify the agent". Without the audit log you cannot tell whether the agent ran tests, whether it cheated by skipping a step, whether it tried something destructive before deciding against it. With the audit log you can grep for shell commands you don't recognise, spot patterns of repeated failure, and decide whether to apply the patch based on what actually happened rather than the agent's self-reported summary.
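Both checks are easy to automate as a first pass before a human reads the log. This sketch assumes the audit log is a list of shell command lines, one per entry; the allowlist and function names are hypothetical. It only inspects the first word of each line, so a compound command such as `ls && curl ...` would slip through and still needs human eyes.

```python
import shlex


def unrecognised_commands(audit_log: list[str], allowed: set[str]) -> list[str]:
    """Return log lines whose program name is not on the allowlist."""
    flagged = []
    for line in audit_log:
        try:
            parts = shlex.split(line)
        except ValueError:
            # Unbalanced quotes: cannot parse it, so treat it as suspicious.
            flagged.append(line)
            continue
        if parts and parts[0] not in allowed:
            flagged.append(line)
    return flagged


def ran_tests(audit_log: list[str], test_program: str = "pytest") -> bool:
    """True if at least one logged command starts with the test runner."""
    return any(line.split()[:1] == [test_program] for line in audit_log)


log = [
    "pytest -q",
    "sed -i s/imp/importlib/ src/loader.py",
    "curl http://example.com/install.sh",
    "pytest -q",
]
print(unrecognised_commands(log, allowed={"pytest", "sed", "git", "ls", "cat"}))
print(ran_tests(log))  # True
```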

This is the same shift code-review tools went through fifteen years ago: from "trust the developer's commit message" to "see the actual diff". Agents need the same scrutiny, for the same reasons.

When should I use this for my own projects?

Three categories where sandboxing is clearly worth the engineering cost.

Bulk migrations. Anything you'd describe as "update this pattern across the whole codebase" (replace deprecated APIs, change a library version, rename a function). The work splits naturally into small tasks, each task has a clear pass/fail check (tests), and the volume justifies the orchestration overhead.

Anything touching production config or secrets. If the task might involve reading env files, deploy scripts, or infrastructure-as-code, the sandbox is the only safe place for the agent to operate. Even if you trust the agent, the sandbox holds only the files the host staged, so a misconfiguration cannot expose production secrets that were never copied in, whether to the model provider's logs, to training data, or anywhere else.

Untrusted or experimental agents. If you're trying out a new agent framework, a new model, or a workflow you don't fully understand yet, run it in a sandbox first. The marginal cost is small; the marginal safety is large.

For day-to-day interactive coding (you, in Claude Code or Cursor, making one edit at a time), the sandbox overhead probably isn't worth it. The existing tools have approval prompts and easy undo, which cover the common cases. The pattern wins when the agent is running unattended on something more than one file.

Frequently asked questions

Q01. Does Claude Code or Cursor support this pattern natively?

Not as the default. Both can run inside a container if you configure it that way, but the interactive in-editor flow doesn't isolate by task. For sandboxed-per-task work, the OpenAI Agents SDK, the Cloudflare Browser Run model, and dedicated tools like E2B are better fits. Expect mainstream IDEs to add this as a feature within the next 12 months.

Q02. Doesn't this just move the trust problem to the sandbox configuration?

Partly. The orchestration code that creates the sandbox and reads the output is still trusted, and a bug there can defeat the isolation. The difference is that the trusted surface is small, well-defined, and not under the agent's control. That's a far easier thing to audit than "the entire AI agent and everything it might do".

Q03. What happens if the patch is malicious?

You catch it during the review step before applying. If you don't review patches, the sandbox protects you from immediate execution but not from the next step of applying a bad change to real code. The pattern only works if the host actually reads the audit log and the diff before merging. Without that step you've added overhead without adding safety.

Q04. How much slower is sandboxed agent execution?

On local Docker, each task adds 2-5 seconds of sandbox startup. On Cloudflare Workers or E2B, the cost is closer to 100-200 milliseconds. For tasks that take the agent several minutes (a multi-file migration), that overhead rounds to zero. For tasks that take a few seconds, the overhead is real and may not be worth it.
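The arithmetic behind "rounds to zero" is startup time as a share of total wall-clock time. A quick sketch using the startup figures quoted above; the 300-second and 5-second task durations are illustrative assumptions, not measurements.

```python
def overhead_fraction(startup_s: float, task_s: float) -> float:
    """Share of total wall-clock time spent starting the sandbox."""
    return startup_s / (startup_s + task_s)


# Five-minute migration task (assumed duration).
print(f"{overhead_fraction(5.0, 300.0):.1%}")  # 1.6%  local Docker, worst case
print(f"{overhead_fraction(0.2, 300.0):.1%}")  # 0.1%  hosted sandbox, worst case

# Five-second edit (assumed duration).
print(f"{overhead_fraction(5.0, 5.0):.1%}")    # 50.0% local Docker, worst case
print(f"{overhead_fraction(0.2, 5.0):.1%}")    # 3.8%  hosted sandbox, worst case
```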

Q05. Can I use this for non-OpenAI agents?

Yes, the pattern is not specific to the OpenAI SDK. The same idea (host orchestrates, sandbox executes, patch crosses back as the only output) works with Claude Code, custom Python agents using the Anthropic SDK, or any other agent framework. The cookbook is one example; the underlying architecture is independent of the vendor.