Deep Dive: AI Agent Sandbox Execution — AgentScope Runtime vs OpenAI Agents SDK

When an LLM calls a tool that runs Python or shell commands, who keeps the host safe? This question defines the architecture of every production AI agent. Two open-source frameworks have taken sharply different approaches: Alibaba's AgentScope Runtime, a full-stack "Agent-as-a-Service" platform, and OpenAI's Agents SDK, a lightweight library that recently added first-class sandbox support. We dug into their source code to understand exactly how each one isolates execution — and how you should choose between them.

Why Sandboxes Matter

Modern AI agents do more than chat. They execute code, read files, launch browsers, and patch repositories. Without isolation, a hallucinated rm -rf / or an adversarial prompt injection becomes a real security incident. Sandboxes provide that isolation — but how they're implemented determines scale, security, and developer experience.

AgentScope Runtime: The Production Platform

AgentScope Runtime is Alibaba Cloud's answer to production agent deployment. Its README lists four core capabilities, and the first one is "Tool Sandboxing — tool call runs inside a hardened sandbox." The project has been evolving rapidly since its initial release in August 2025, reaching v1.1.0 by February 2026 with major architectural refactors including a distributed interrupt service.

How It Works: The Full Picture

AgentScope Runtime's agent application inherits directly from FastAPI — AgentApp(FastAPI) — which means it integrates with the entire FastAPI ecosystem natively. When an agent needs to execute a tool, the flow goes through a SandboxManager (1,921 lines of orchestration code) that manages the container lifecycle:

Acquisition: First tries a warm container from a per-type pool. If none available, creates a fresh container.
Provisioning: Generates a session_id, creates a mount directory, generates a 32-character hex SECRET_TOKEN via secrets.token_hex(16), and maps the mount directory to /workspace inside the container.
Lifecycle: A heartbeat system scans every second. Sessions idle for 300 seconds get auto-recycled. Container states follow a formal FSM: WARM → RUNNING → RECYCLED → RELEASED.
Execution: The container runs an Nginx reverse proxy in front of a FastAPI app. Every request to /tools/run_ipython_cell or /tools/run_shell_command must pass Bearer token verification against the per-session SECRET_TOKEN.
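
The provisioning and token-check steps above can be sketched with the standard library alone. This is a minimal illustration, not the runtime's own code: the SessionInfo record and both function names are hypothetical, and only secrets.token_hex(16) and the Bearer scheme come from the description above.

```python
import hmac
import secrets
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SessionInfo:  # hypothetical record, not the runtime's own class
    session_id: str
    mount_dir: Path
    secret_token: str


def provision_session(base_dir: Path) -> SessionInfo:
    """Create per-session state: an id, a mount directory, and a token."""
    session_id = uuid.uuid4().hex
    mount_dir = base_dir / session_id
    mount_dir.mkdir(parents=True)
    # token_hex(16) draws 16 random bytes and returns them as 32 hex characters.
    return SessionInfo(session_id, mount_dir, secrets.token_hex(16))


def is_authorized(authorization_header: str, expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header value in constant time."""
    scheme, _, presented = authorization_header.partition(" ")
    if scheme != "Bearer":
        return False
    return hmac.compare_digest(presented.encode(), expected_token.encode())


info = provision_session(Path(tempfile.mkdtemp()))
```

Comparing with hmac.compare_digest rather than == avoids leaking, through timing, how much of a guessed token was correct.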

The distributed lock for the heartbeat scanner uses Redis SET NX EX with a Lua script for atomic release — a production-grade approach that prevents race conditions when multiple workers share the same sandbox pool.
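
To make the locking pattern concrete, here is a sketch that runs without a Redis server. FakeRedis is a hypothetical in-memory stand-in that mimics the two operations involved; the Lua text is the widely used compare-and-delete release script, and we have not checked that it matches the runtime's script character for character.

```python
import secrets
import time

# Common compare-and-delete script for releasing a Redis lock: delete the
# key only if it still holds the caller's token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""


class FakeRedis:
    """In-memory stand-in (hypothetical) for SET NX EX and the release script."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def set_nx_ex(self, key, value, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # lock is held and has not expired
        self._data[key] = (value, now + ttl_seconds)
        return True

    def release_if_owner(self, key, value):
        current = self._data.get(key)
        if current is not None and current[0] == value:
            del self._data[key]
            return 1
        return 0


store = FakeRedis()
worker_a, worker_b = secrets.token_hex(8), secrets.token_hex(8)
a_acquired = store.set_nx_ex("heartbeat:lock", worker_a, ttl_seconds=5, now=0.0)
b_acquired = store.set_nx_ex("heartbeat:lock", worker_b, ttl_seconds=5, now=1.0)
b_released = store.release_if_owner("heartbeat:lock", worker_b)  # not the owner
a_released = store.release_if_owner("heartbeat:lock", worker_a)
```

Against a real server with redis-py, the equivalent calls would be r.set(key, token, nx=True, ex=ttl) to acquire and r.eval(RELEASE_SCRIPT, 1, key, token) to release. The per-worker token is what stops a slow worker from deleting a lock that has already expired and been taken by someone else.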

Container Lifecycle FSM

The state machine has well-defined transitions — from idle pool replenishment through running to recycling. The heartbeat scanner (every second) handles the RUNNING → RECYCLED transition automatically at 300 seconds of idle time. Restore paths bring recycled containers back online, while the ERROR state catches crashes or timeouts from any active state.
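
A small sketch of such a state machine and its idle sweep follows. The transition table is our reconstruction from the description above and is an assumption, as are the Session class and the heartbeat_scan name; only the state names and the 300-second timeout come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    WARM = "warm"
    RUNNING = "running"
    RECYCLED = "recycled"
    RELEASED = "released"
    ERROR = "error"


# Assumed transition table, reconstructed from the prose description.
ALLOWED = {
    State.WARM: {State.RUNNING, State.ERROR},
    State.RUNNING: {State.RECYCLED, State.ERROR},
    State.RECYCLED: {State.RUNNING, State.RELEASED},  # restore or release
    State.RELEASED: set(),
    State.ERROR: set(),
}

IDLE_TIMEOUT_SECONDS = 300


@dataclass
class Session:
    session_id: str
    state: State
    last_active: float

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.name} -> {new_state.name} not allowed")
        self.state = new_state


def heartbeat_scan(sessions, now):
    """Recycle RUNNING sessions idle for at least the timeout; return their ids."""
    recycled = []
    for s in sessions:
        if s.state is State.RUNNING and now - s.last_active >= IDLE_TIMEOUT_SECONDS:
            s.transition(State.RECYCLED)
            recycled.append(s.session_id)
    return recycled


sessions = [
    Session("busy", State.RUNNING, last_active=950.0),
    Session("idle", State.RUNNING, last_active=600.0),
    Session("pooled", State.WARM, last_active=0.0),
]
recycled_ids = heartbeat_scan(sessions, now=1000.0)
```

Routing every change through transition() means an illegal jump, such as WARM straight to RELEASED, fails loudly instead of silently corrupting pool accounting.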

Eight Backends, One Abstraction

This is where AgentScope Runtime really differentiates itself. The CONTAINER_DEPLOYMENT environment variable selects among eight backends, each implementing the same BaseClient interface:

Backend	Isolation Level	Best For
Docker	Kernel namespaces	Local dev, testing
gVisor (runsc)	User-space kernel	Higher-security local or single-machine
BoxLite	Hardware VM	Embeddable high-isolation (Apple Silicon)
Kubernetes	Pod namespaces	Production orchestration
Knative	Serverless pods	Scale-to-zero workloads
Kruise	Sandbox CRD	K8s with dedicated sandbox resource
Function Compute	Platform-managed	Alibaba Cloud serverless
AgentRun	Platform-managed	Alibaba Cloud managed runtime

The gVisor and BoxLite adapters are remarkably concise: the gVisor client is 38 lines, and the BoxLite client is a thin wrapper around the BoxLite SDK. Both show how clean the abstraction is. The Kruise client creates Kubernetes resources via agents.kruise.io/v1alpha1/Sandbox Custom Resource Definitions, an approach that treats sandboxes as first-class cluster resources rather than just pods.
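
One way to picture why an adapter can stay this small is a registry keyed by the environment variable. The sketch below is illustrative only: the create() method, the registry, and the "docker" and "gvisor" values are assumptions, while the class names and the runsc runtime come from the source files listed at the end of this article.

```python
import os
from abc import ABC, abstractmethod


class BaseClient(ABC):
    """Minimal stand-in for the shared backend interface (methods are assumed)."""

    @abstractmethod
    def create(self, image: str) -> str: ...


class DockerClient(BaseClient):
    runtime = "runc"

    def create(self, image: str) -> str:
        # Returns the command line instead of running it, to keep this a dry run.
        return f"docker run --runtime={self.runtime} {image}"


class GVisorDockerClient(DockerClient):
    # Same Docker code path, different OCI runtime: the adapter stays tiny.
    runtime = "runsc"


BACKENDS = {"docker": DockerClient, "gvisor": GVisorDockerClient}


def client_from_env(env=os.environ) -> BaseClient:
    """Instantiate the backend named by CONTAINER_DEPLOYMENT (default: docker)."""
    name = env.get("CONTAINER_DEPLOYMENT", "docker")
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown CONTAINER_DEPLOYMENT: {name!r}") from None


client = client_from_env({"CONTAINER_DEPLOYMENT": "gvisor"})
```

Because callers only ever see BaseClient, switching isolation levels becomes a deployment setting rather than a code change.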

Backend Isolation Heatmap

The heatmap above scores each backend across five dimensions. BoxLite wins on isolation (hardware VM) while Docker wins on startup speed and cost. For production, Kubernetes offers the best scaling at the cost of operational complexity. The sweet spot depends on your threat model: run local dev in Docker, ship to BoxLite on Apple Silicon for CI, and scale on K8s/Knative in production.

Seven Sandbox Types

AgentScope Runtime doesn't just sandbox Python. Each sandbox type has its own Docker image with purpose-built internals:

BaseSandbox: Python + IPython kernel + shell commands. The workhorse.
GuiSandbox: XFCE4 desktop, Chromium, VNC access. For computer-use agents.
BrowserSandbox: Playwright MCP over VNC with 20+ browser methods. For web automation.
FilesystemSandbox: Full file operations (read, write, edit, search, tree) exposed as API.
MobileSandbox: Redroid Android emulation with ADB over ws-scrcpy. Requires privileged mode.
TrainingSandbox: BFCL/APPWorld evaluation environments for benchmarking.
AgentbaySandbox: Cloud-only sandbox via AgentBay service. No local containers needed.

All sandbox types get both synchronous and asynchronous variants. In embedded mode, the SandboxManager runs in-process; in remote mode, all operations proxy over HTTP with Bearer auth to a separate runtime-sandbox-server. This dual-mode architecture means the same code can run locally during development and on a remote Kubernetes cluster in production.
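
The dual-mode idea can be sketched as two interchangeable backends behind one call signature. Everything named here is hypothetical (the classes, make_backend, and the endpoint layout used by the remote backend); the sketch only builds the HTTP request and never sends it.

```python
import json
import urllib.request


class EmbeddedBackend:
    """Runs tools in-process; here a tool is just a Python callable."""

    def __init__(self, tools):
        self.tools = tools

    def call(self, tool, **arguments):
        return self.tools[tool](**arguments)


class RemoteBackend:
    """Forwards the same call over HTTP with Bearer auth (request is only built)."""

    def __init__(self, base_url, bearer_token):
        self.base_url = base_url.rstrip("/")
        self.bearer_token = bearer_token

    def call(self, tool, **arguments):
        # Hypothetical endpoint layout; a real client would send this request.
        return urllib.request.Request(
            url=f"{self.base_url}/tools/{tool}",
            data=json.dumps(arguments).encode("utf-8"),
            headers={"Authorization": f"Bearer {self.bearer_token}"},
            method="POST",
        )


def make_backend(base_url=None, bearer_token=None, tools=None):
    """Pick remote mode when a server URL is given, embedded mode otherwise."""
    if base_url is not None:
        return RemoteBackend(base_url, bearer_token)
    return EmbeddedBackend(tools or {})


local = make_backend(tools={"echo": lambda text: text})
remote = make_backend("http://sandbox.example:8000/", "0123abcd")
local_result = local.call("echo", text="hi")
remote_request = remote.call("echo", text="hi")
```

Agent code that only calls backend.call(...) does not need to know which mode it is running in, which is the property the dual-mode design is after.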

Deployment Philosophy

AgentScope Runtime is built for multi-tenancy from the ground up. It uses Redis-based session storage (RedisSession), supports OSS (Alibaba Cloud Object Storage) for cross-instance filesystem persistence, and deploys one-click to Alibaba Cloud ACK via Compute Nest. Framework adapters let you write agents in AgentScope, LangGraph, Microsoft Agent Framework, or Agno and expose them through the same runtime, though tool support varies by adapter.

OpenAI Agents SDK: The Lightweight Library Approach

If AgentScope Runtime is a deployment platform, OpenAI Agents SDK is a library you compose into your own stack. With 25,914 GitHub stars and 3,964 forks against AgentScope Runtime's 766 stars, it has far wider community adoption. It is also the older project: it was released in March 2025, about five months before AgentScope Runtime's initial release in August 2025.

Sandbox Agents: Architecture Philosophy

Sandbox Agents landed in version 0.14.0 with a clear architectural principle: separate agent definition from transport. The SandboxAgent class extends Agent with five sandbox-specific fields — default_manifest, base_instructions, capabilities, run_as, and an internal concurrency guard — but contains zero information about how to get a sandbox. That information lives in RunConfig(sandbox=SandboxRunConfig(...)), passed at execution time. This means the same agent definition can run on a local filesystem, in Docker, or on E2B without changing the agent code.

The Manifest: Declarative Workspace Layout

The Manifest model is the heart of OpenAI's sandbox design. It's a declarative workspace contract that describes what files should be present when a session starts:

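# Import paths follow the SDK's sandbox package layout; verify them against
# your installed version.
from agents.sandbox import Manifest, SandboxAgent
from agents.sandbox.entries import GitRepo

# The agent declares what its workspace should contain, not where it runs.
agent = SandboxAgent(
    name="Workspace Assistant",
    instructions="Inspect the sandbox workspace before answering.",
    default_manifest=Manifest(
        entries={
            # Cloned inside the sandbox when the manifest is materialized.
            "repo": GitRepo(repo="openai/openai-agents-python", ref="main"),
        }
    ),
)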

The entry system is deeply polymorphic. File, Dir, LocalFile, LocalDir, GitRepo, and Mount subclasses all extend BaseEntry, and the Pydantic discriminator field type handles deserialization automatically. When you add a GitRepo entry, the SDK clones it inside the sandbox during manifest materialization. Each entry carries Permissions (owner/group/other with READ/WRITE/EXEC flags), User, and Group metadata, which are applied inside the sandbox after file provisioning.
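
The owner/group/other permission model maps directly onto Unix mode bits. As a rough sketch, with a hypothetical Access flag and Permissions class standing in for the SDK's own models:

```python
import stat
from dataclasses import dataclass
from enum import IntFlag


class Access(IntFlag):
    NONE = 0
    EXEC = 1
    WRITE = 2
    READ = 4


@dataclass(frozen=True)
class Permissions:  # hypothetical stand-in, not the SDK's model
    owner: Access
    group: Access
    other: Access

    def to_mode(self) -> int:
        """Pack the three triplets into a Unix mode such as 0o750."""
        return (int(self.owner) << 6) | (int(self.group) << 3) | int(self.other)


perms = Permissions(
    owner=Access.READ | Access.WRITE | Access.EXEC,
    group=Access.READ | Access.EXEC,
    other=Access.NONE,
)
mode = perms.to_mode()
# Render the mode the way `ls -l` would for a regular file.
listing = stat.filemode(stat.S_IFREG | mode)
```

A mode computed this way could be handed to chmod inside the sandbox once the files have been provisioned.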

Manifest Entry Polymorphism

This polymorphic design enables JSON round-trips across the entire stack: the same Pydantic type discriminator pattern is used for entries, sandbox client options, and session state serialization. Adding a new entry type is just subclassing BaseEntry with a unique type value — the registration happens automatically.
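
The SDK relies on Pydantic for this, but the underlying pattern is easy to show with the standard library. In the sketch below the discriminator values ("file", "git_repo"), the registry, and the helper names are assumptions for illustration; only the BaseEntry, File, and GitRepo class names mirror the SDK.

```python
import json
from dataclasses import asdict, dataclass
from typing import ClassVar

ENTRY_TYPES = {}  # discriminator value -> entry class


@dataclass
class BaseEntry:
    type: ClassVar[str] = "base"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        ENTRY_TYPES[cls.type] = cls  # registration happens on subclassing

    def to_json(self) -> dict:
        return {"type": self.type, **asdict(self)}


@dataclass
class File(BaseEntry):
    type: ClassVar[str] = "file"
    content: str = ""


@dataclass
class GitRepo(BaseEntry):
    type: ClassVar[str] = "git_repo"
    repo: str = ""
    ref: str = "main"


def entry_from_json(payload: dict) -> BaseEntry:
    """Look up the class by its discriminator and rebuild the entry."""
    fields = dict(payload)
    return ENTRY_TYPES[fields.pop("type")](**fields)


original = GitRepo(repo="openai/openai-agents-python", ref="main")
wire = json.dumps(original.to_json())
restored = entry_from_json(json.loads(wire))
```

The type field travels with the data, so the receiving side never has to guess which class a dictionary was meant to be.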

Session Lifecycle: Deep and Deliberate

The BaseSandboxSession abstract class (1,167 lines) acts as a small operating system for the workspace. It provides:

Process execution: exec(*command, timeout, shell, user) with shell wrapping (sh -lc), timeout enforcement, and user impersonation
File I/O: read(path), write(path, data) with access permission checks
Directory operations: mkdir, rm, ls
Persistence: persist_workspace() and hydrate_workspace(data) for tar-based checkpointing
Interactive PTY: pty_exec_start(), pty_write_stdin(), pty_terminate_all() for long-running interactive processes
Instrumentation: Every operation is wrapped by SandboxSession with event emission and tracing spans
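
Tar-based checkpointing of the kind listed above can be sketched in a few lines. The two functions borrow the method names from the list, but they are standalone illustrations operating on a local directory, not the SDK's implementations.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def persist_workspace(root: Path) -> bytes:
    """Pack the workspace directory into an in-memory gzip tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        archive.add(root, arcname=".")
    return buffer.getvalue()


def hydrate_workspace(root: Path, data: bytes) -> None:
    """Unpack an archive produced by persist_workspace into root."""
    root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        # The "data" filter rejects absolute paths and links that escape root
        # (Python 3.12+ and recent security releases of older versions).
        if hasattr(tarfile, "data_filter"):
            archive.extractall(root, filter="data")
        else:
            archive.extractall(root)


source = Path(tempfile.mkdtemp())
(source / "notes").mkdir()
(source / "notes" / "todo.txt").write_text("ship it")
snapshot = persist_workspace(source)

restored = Path(tempfile.mkdtemp())
hydrate_workspace(restored, snapshot)
```

Keeping the snapshot as plain bytes makes it easy to store in any blob store and to hydrate a fresh sandbox on a different host.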

The SandboxRuntimeSessionManager (959 lines) handles session resolution with a clear priority chain: session= (explicitly injected) → RunState sandbox payload → session_state= → fresh creation. Resume keys are allocated by agent name, not by instance, so handoffs between agents preserve the same sandbox session.
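
The priority chain reads naturally as a sequence of early returns. This is a sketch of the ordering only; the function name, its parameters, and the restore/create callables are hypothetical.

```python
def resolve_session(session=None, run_state_payload=None, session_state=None,
                    *, restore, create):
    """Return (session, source) from the first available source, in order."""
    if session is not None:
        return session, "injected"
    if run_state_payload is not None:
        return restore(run_state_payload), "run_state"
    if session_state is not None:
        return restore(session_state), "session_state"
    return create(), "fresh"


def restore(payload):
    return {"restored_from": payload}


def create():
    return {"fresh": True}


injected = resolve_session(session="live", run_state_payload="p1",
                           restore=restore, create=create)
from_run_state = resolve_session(run_state_payload="p1", session_state="p2",
                                 restore=restore, create=create)
from_state = resolve_session(session_state="p2", restore=restore, create=create)
fresh = resolve_session(restore=restore, create=create)
```

An explicitly injected session always wins, so a caller that manages its own sandbox is never surprised by a restore from stale state.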

Capability Composition

Instead of a fixed tool set, OpenAI uses composable capabilities. The default set — Capabilities.default() — gives every sandbox agent Filesystem(), Shell(), and Compaction() out of the box. Capabilities can add tools, instructions, and even manifest entries. They're cloned per-run so state from one execution never leaks into another. You extend by subclassing Capability and adding your own tools that bind to the live sandbox session.
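
The per-run cloning can be illustrated with a deep copy. The classes below are simplified stand-ins: their fields, the tool names, and prepare_run are assumptions, and the real Capability base class has a richer interface.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Capability:  # hypothetical base class for this sketch
    name: str
    instructions: str = ""
    state: dict = field(default_factory=dict)

    def tools(self):
        return []


@dataclass
class Filesystem(Capability):
    name: str = "filesystem"

    def tools(self):
        return ["read_file", "write_file"]  # assumed tool names


@dataclass
class Shell(Capability):
    name: str = "shell"

    def tools(self):
        return ["exec_command"]  # assumed tool name


def prepare_run(capabilities):
    """Deep-copy capabilities so per-run state never reaches the originals."""
    return [copy.deepcopy(c) for c in capabilities]


defaults = [Filesystem(), Shell()]
run_caps = prepare_run(defaults)
run_caps[1].state["cwd"] = "/workspace/repo"  # mutate only this run's copy
tool_names = [t for c in run_caps for t in c.tools()]
```

After the run, defaults[1].state is still empty, which is the leak-free behavior described above.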

macOS Seatbelt: The Hidden Security Gem

The most interesting security detail is in UnixLocalSandboxClient. On macOS, every command executed inside a local sandbox session runs through sandbox-exec -p <profile> with a deny-by-default Seatbelt profile. This profile explicitly allows only the workspace root, /usr/bin, /usr/lib, /bin, /System, and any configured extra_path_grants. It explicitly denies /Users, /Applications, /Library, /etc, /var, /opt, /tmp, and /private/* — a substantial security boundary that goes well beyond simple process isolation.
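
To give a feel for what a deny-by-default profile looks like, here is a sketch that assembles one as a string and builds the matching sandbox-exec command line. The rules are illustrative and are not the SDK's actual profile; a working profile usually needs further allowances (for example for system lookups), and both function names are hypothetical.

```python
def build_seatbelt_profile(workspace_root, extra_read_paths=()):
    """Return a deny-by-default profile string (illustrative rules only)."""
    read_only = ["/usr/bin", "/usr/lib", "/bin", "/System", *extra_read_paths]
    for path in [workspace_root, *read_only]:
        if '"' in path:
            raise ValueError(f"unsupported character in path: {path!r}")
    lines = [
        "(version 1)",
        "(deny default)",  # everything not allowed below is refused
        "(allow process-fork)",
        "(allow process-exec)",
    ]
    for path in read_only:
        lines.append(f'(allow file-read* (subpath "{path}"))')
    # Only the workspace is writable.
    lines.append(f'(allow file-read* file-write* (subpath "{workspace_root}"))')
    return "\n".join(lines)


def wrap_command(command, profile):
    """Build the argv that runs a shell command under sandbox-exec."""
    return ["sandbox-exec", "-p", profile, "sh", "-lc", command]


profile = build_seatbelt_profile("/tmp/ws", extra_read_paths=["/opt/tools"])
argv = wrap_command("ls -la", profile)
```

On macOS the resulting argv could be passed to subprocess.run; on other platforms sandbox-exec does not exist, which is why container isolation has to take over there.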

When running in Docker, container isolation handles this layer instead.

Extension Ecosystem

OpenAI designed the sandbox client as an abstract interface (BaseSandboxClient), and the community has responded. Extensions for Vercel, Runloop, Modal, E2B, Daytona, Cloudflare, and Blaxel have all been contributed, each implementing create(), delete(), and resume() in their own infrastructure.

Head-to-Head Comparison

Dimension	AgentScope Runtime	OpenAI Agents SDK
Stars	766	25,914
License	Apache-2.0	MIT
Primary design	Deployment platform	Composable library
Sandbox backends	8 (Docker, gVisor, BoxLite, K8s, Knative, Kruise, FC, AgentRun)	2 + 7 community extensions
Sandbox types	7 specialized types (Base, GUI, Browser, FS, Mobile, Training, Cloud)	1 unified session (capabilities compose features)
Session management	Built-in Redis, container pool, heartbeat, auto-reaping	SDK-managed lifecycle or developer-owned
Multi-tenancy	First-class with session isolation	Via developer integration
Observability	Built-in logs and traces	Built-in tracing (OpenAI dashboard)
Production path	One-click to Alibaba Cloud ACK	Host on your own infra or community backends
Framework adapters	AgentScope, LangGraph, MS Agent Framework, Agno	Single SDK (provider-agnostic via any-llm/LiteLLM)
Entry point	`AgentApp` (FastAPI subclass)	`Runner.run()` or `Runner.run_sync()`

When to Use Which

Choose AgentScope Runtime when:

You need a full-stack agent deployment platform with session management, pooling, and multi-tenancy built in
Your infrastructure is on Alibaba Cloud or you want a managed Kubernetes experience
You need GUI, browser, filesystem, or mobile sandboxes — not just code execution
You're building an internal agent platform that multiple teams will develop against
You want warm container pooling and auto-scaling without building it yourself

Choose OpenAI Agents SDK when:

You want a lightweight library to compose into your existing infrastructure
You already have a deployment story and only need sandboxed code execution
You value the larger community (25k+ stars) and ecosystem of extensions
You want provider-agnostic LLM access (100+ models via any-llm/LiteLLM)
You develop on macOS (the Seatbelt-backed local sandbox is a unique advantage for local testing)

The sweet spot: If you're prototyping an agent locally on macOS, use OpenAI Agents SDK with UnixLocalSandboxClient — the Seatbelt sandbox gives you real isolation during development. If and when you need production deployment with multi-tenancy and container pooling, AgentScope Runtime provides that out of the box.

What We Didn't See (Yet)

Neither framework currently supports WebAssembly sandboxing (for example via Extism or wasmtime), which would offer near-instant startup and language-agnostic execution. OpenAI's approach could probably add a WASM client fairly easily through the BaseSandboxClient interface. AgentScope Runtime's container-based architecture makes WASM a less natural fit, but its existing isolation layers (gVisor's user-space kernel, BoxLite's hardware VMs) already provide strong boundaries without it.

Both frameworks would benefit from standardized sandbox security attestation — something that tells a deployer exactly what isolation guarantees a given backend provides. Today, you have to read source code to know that gVisor uses a user-space kernel while BoxLite uses hardware virtualization.

Conclusion

AI agent sandboxing is evolving from a niche concern to a core architectural decision. AgentScope Runtime and OpenAI Agents SDK represent two philosophies: build a complete platform versus provide a composable primitive. Both are open-source, both run on Python 3.10+, and both were actively maintained when we looked, with commits in the 24 hours before our May 6, 2026 snapshot. The right choice depends on whether you need a deployment platform or a library. Either way, sandboxed execution is becoming table stakes for production AI agents.

Sources

All analysis is based on direct source code inspection and official documentation as of May 6, 2026.

AgentScope Runtime:

Repository: https://github.com/agentscope-ai/agentscope-runtime (Apache-2.0, ⭐766)
Documentation: https://runtime.agentscope.io/
Key source files inspected:
- src/agentscope_runtime/sandbox/manager/sandbox_manager.py — SandboxManager (1,921 lines), container lifecycle orchestration
- src/agentscope_runtime/sandbox/manager/heartbeat_mixin.py — Heartbeat scanner, distributed lock (Redis SET NX EX with Lua release), auto-reaping
- src/agentscope_runtime/sandbox/box/sandbox.py — SandboxBase, dual-mode (embedded/remote) architecture
- src/agentscope_runtime/sandbox/box/base/ — BaseSandbox, run_ipython_cell, run_shell_command
- src/agentscope_runtime/sandbox/box/gui/, browser/, filesystem/, mobile/ — Specialized sandbox Dockerfiles
- src/agentscope_runtime/common/container_clients/docker_client.py — DockerClient (port range 49152-59152)
- src/agentscope_runtime/common/container_clients/gvisor_client.py — GVisorDockerClient (38 lines, runtime=runsc)
- src/agentscope_runtime/common/container_clients/boxlite_client.py — BoxliteClient (hardware VM isolation)
- src/agentscope_runtime/common/container_clients/kubernetes_client.py, knative_client.py, kruise_client.py — K8s/Knative/Kruise backends
- src/agentscope_runtime/sandbox/shared/app.py — Container-internal FastAPI app (Nginx + Uvicorn + token auth)
- src/agentscope_runtime/engine/__init__.py — AgentApp (inherits from FastAPI)

OpenAI Agents SDK:

Repository: https://github.com/openai/openai-agents-python (MIT, ⭐25,914)
Documentation: https://openai.github.io/openai-agents-python/
Key source files inspected:
- src/agents/sandbox/sandbox_agent.py — SandboxAgent (extends Agent), concurrency guard
- src/agents/sandbox/manifest.py — Manifest model, declarative workspace contract
- src/agents/sandbox/entries/base.py, artifacts.py — BaseEntry polymorphism, File/Dir/LocalFile/LocalDir/GitRepo
- src/agents/sandbox/session/base_sandbox_session.py — BaseSandboxSession (1,167 lines), exec/read/write/pty/snapshot
- src/agents/sandbox/session/sandbox_session.py — SandboxSession instrumentation wrapper
- src/agents/sandbox/session/sandbox_client.py — BaseSandboxClient abstract interface
- src/agents/sandbox/sandboxes/unix_local.py — UnixLocalSandboxClient, macOS Seatbelt (sandbox-exec deny-by-default)
- src/agents/sandbox/sandboxes/docker.py — DockerSandboxClient, volume mounts, port mapping
- src/agents/sandbox/runtime.py — SandboxRuntime, bridges Runner ↔ sandbox sessions
- src/agents/sandbox/runtime_session_manager.py — RuntimeSessionManager (959 lines), session resolution priority chain
- src/agents/sandbox/capabilities/capabilities.py — Capabilities.default(): Filesystem + Shell + Compaction
- extensions/sandbox/ — Community extensions: Vercel, Runloop, Modal, E2B, Daytona, Cloudflare, Blaxel

Architecture Comparison