Effloow
AI DEVELOPMENT ARTICLES ·2026-05-06 ·UPDATED 2026-06-20 ·BY EFFLOOW EDITORIAL ·16 MIN READ

E2B Sandbox: Secure Code Execution for AI Agents

Add secure sandboxed code execution to AI agents with E2B. Firecracker microVM isolation, Python/JS SDKs, MCP support, and source-checked limits.
e2b ai-agents sandboxing code-execution python ai-infrastructure security

Every team building a code-executing AI agent eventually runs into the same wall: where does the model-generated code actually run? Running it on your host server is a liability. Running it inside Docker is better but not enough for untrusted LLM output at scale. E2B was built specifically for that problem, and it now has a documented OpenAI Agents SDK integration for running SandboxAgent workflows inside E2B sandboxes.

This guide covers what E2B is, how it works, how to integrate it with Python and the major LLM providers, and when it makes sense compared to alternatives like Modal and Daytona. If you are building the agents that will run this code, our OpenAI Agents SDK multi-agent tutorial and our comparison of AI agent frameworks cover the orchestration side.

Effloow Lab note: Effloow Lab installed e2b-code-interpreter==2.6.2 and e2b==2.20.3, inspected the SDK surface, and confirmed the authentication boundary. Live sandbox execution requires an E2B API key, so this article does not claim a live E2B sandbox run. The public evidence note is at the E2B sandbox PoC lab note.

Why Sandboxed Code Execution Matters

When an AI agent generates and executes code, several failure modes become far more likely than with reviewed, human-written code:

  • The model generates import subprocess; subprocess.run(["rm", "-rf", "/"]) in response to a prompt-injection attack
  • A data analysis agent leaks API keys from environment variables into output files
  • A code-generating chatbot causes a denial of service by spawning infinite threads
  • Cross-tenant attacks expose one user's data to another in a multi-tenant deployment

Traditional containerization mitigates some of these, but containers share the host kernel — a kernel exploit in model-generated code could affect other workloads. The security gap between "containerized" and "isolated" matters when the code author is an LLM responding to arbitrary user input.

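E2B addresses this with Firecracker microVMs: each sandbox runs inside its own lightweight virtual machine with a separate guest kernel, so escaping it means breaking hardware-virtualization isolation rather than exploiting a kernel shared with other tenants. That leaves a much smaller shared attack surface than containers on a common host kernel.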

What E2B Is

E2B is a managed sandbox platform with open-source SDKs for running AI-generated code in isolated environments. E2B describes the product as an open-source, secure environment for enterprise-grade agents. Each sandbox:

  • Is started as a managed session for code execution
  • Runs a Jupyter kernel for interactive, stateful code execution
  • Supports Python, JavaScript/TypeScript, and other languages
  • Provides filesystem access, internet access, and package installation
  • Can run for up to 24 hours per session on the Pro plan (1 hour on the free Hobby tier)

The platform is LLM-agnostic: you bring your own model (Claude, GPT, Gemini, Llama) and use E2B for the execution layer. It ships Python and TypeScript/JavaScript SDKs.

Source snapshot checked on 2026-06-20:

  • E2B's pricing page lists a free Hobby tier with a one-time $100 credit, 1-hour sandbox sessions, and 20 concurrent sandboxes.
  • E2B's billing docs list Pro at $150/month with 24-hour continuous runtime and 100 to 1,100 concurrent sandboxes depending on add-ons.
  • E2B's homepage states that E2B is used by 94% of Fortune 100 companies, with 7M+ monthly downloads and 1B+ started sandboxes. Treat those as vendor-stated adoption figures, not independent market-share data.
  • The E2B OpenAI Agents SDK page documents E2BSandboxClient, SandboxAgent, and RunConfig integration.
  • The E2B code-interpreter repository remains the live source for SDK code and examples; this article does not freeze a GitHub-star number.

Source-Backed Fit Matrix

This table is this guide's original-value asset. It separates what the sources verify from what Effloow's local lab note actually confirmed.

| Decision area | Source-backed fact | Effloow verified locally | What to do differently |
|---|---|---|---|
| SDK availability | The E2B code-interpreter repository documents pip install e2b-code-interpreter and Sandbox.runCode / Sandbox.run_code examples. | Installed e2b-code-interpreter==2.6.2 and e2b==2.20.3; inspected Sandbox, AsyncSandbox, Execution, McpServer, and GitHubMcpServer exports. | Treat the SDK surface as real, but do not claim live execution until your environment has E2B_API_KEY and a successful sandbox run. |
| OpenAI Agents SDK path | E2B docs show E2BSandboxClient, SandboxAgent, and SandboxRunConfig; OpenAI announced native sandbox execution in the Agents SDK on 2026-04-15. | No OpenAI or E2B API call was made in this lab note. | Use the integration when you want the agent harness and sandbox lifecycle tied together; keep a separate fallback plan for local or provider-specific execution. |
| Code contexts | E2B code-context docs say create_code_context() / createCodeContext() can run code in different execution contexts. | Confirmed the Python method exists on the installed SDK. | Use contexts to reduce accidental state leakage inside one sandbox, but do not treat them as a substitute for tenant-level sandbox isolation without a live security review. |
| MCP gateway | E2B MCP docs describe MCP tools running inside E2B sandboxes with internet-connected Linux access. | Confirmed MCP-related classes are exported by the installed SDK. | Prefer MCP gateway patterns when agents need tools plus an execution machine; keep secrets and external connectors scoped per sandbox. |
| Billing and limits | E2B docs say sandboxes are billed per second while running, with Hobby and Pro limits as listed above. | The lab did not create a billable sandbox. | Model cost with runtime seconds, concurrency, pause/kill behavior, and CPU/RAM sizing before calling E2B cheaper or more expensive than another provider. |

SDK Installation and Setup

Install the Python SDK:

```bash
pip install e2b-code-interpreter
```

When Effloow Lab ran this (May 2026), it resolved to e2b-code-interpreter==2.6.2 with e2b==2.20.3 as a dependency. Get your API key from the E2B dashboard at e2b.dev/dashboard under the Team tab, then set it as an environment variable:

```bash
export E2B_API_KEY="e2b_..."
```

Or pass it directly in code:

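```python
from e2b_code_interpreter import Sandbox

with Sandbox.create(api_key="e2b_...") as sandbox:
    execution = sandbox.run_code("print('hello from sandbox')")
    # print() output is captured in logs.stdout; execution.text only holds
    # the value of the cell's last expression, which is None here.
    print(execution.logs.stdout)  # stdout chunks, e.g. ['hello from sandbox\n']
```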

The with statement ensures the sandbox is killed when the block exits. For long-running agents, you can manage lifecycle manually:

```python
sandbox = Sandbox.create()
# ... do work ...
sandbox.kill()
```
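When you manage the lifecycle manually, an exception between create and kill leaves a running, per-second-billed sandbox behind. A minimal sketch of a safer wrapper, assuming any sandbox object that has a kill() method: managed_sandbox is a hypothetical helper, not an E2B API. It refuses to start when E2B_API_KEY is unset (instead of waiting for the SDK's authentication error) and always calls kill() on exit. In real code you would pass Sandbox.create as the factory; the stand-in class lets the pattern run without an API key.

```python
import os
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Mapping, Optional


@contextmanager
def managed_sandbox(
    factory: Callable[[], Any],
    env: Optional[Mapping[str, str]] = None,
) -> Iterator[Any]:
    """Create a sandbox via factory() and always kill it on exit.

    Hypothetical helper: pass Sandbox.create from e2b_code_interpreter as
    factory in real code. Any object with a kill() method works here.
    """
    env = os.environ if env is None else env
    if not env.get("E2B_API_KEY"):
        # Fail before any network call instead of surfacing an auth error later.
        raise RuntimeError("E2B_API_KEY is not set; refusing to create a sandbox")
    sandbox = factory()
    try:
        yield sandbox
    finally:
        # Runs on success, on exceptions, and on early returns.
        sandbox.kill()


class FakeSandbox:
    """Offline stand-in that only records whether kill() was called."""

    def __init__(self) -> None:
        self.killed = False

    def kill(self) -> None:
        self.killed = True


fake = FakeSandbox()
try:
    with managed_sandbox(lambda: fake, env={"E2B_API_KEY": "e2b_test"}) as sbx:
        raise ValueError("agent step failed")
except ValueError:
    pass
print(fake.killed)  # True
```

With the SDK installed, usage would be with managed_sandbox(Sandbox.create) as sandbox: followed by your run_code calls. Keeping cleanup in a single finally block means success, errors, and early returns all release the sandbox.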

The run_code API

The run_code method is the core of E2B's code interpreter. Its signature in the installed 2.x SDK (language and context are offered through separate overloads, so pass one or the other, not both):

```python
def run_code(
    self,
    code: str,
    language: Optional[str] = None,
    context: Optional[Context] = None,
    on_stdout: Optional[Callable[[OutputMessage], Any]] = None,
    on_stderr: Optional[Callable[[OutputMessage], Any]] = None,
    on_result: Optional[Callable[[Result], Any]] = None,
    on_error: Optional[Callable[[ExecutionError], Any]] = None,
    envs: Optional[Dict[str, str]] = None,
    timeout: Optional[float] = None,
    request_timeout: Optional[float] = None,
) -> Execution: ...
```

The streaming callbacks (on_stdout, on_stderr, on_result, on_error) are useful for agents that need to stream execution feedback to end users in real time rather than waiting for the full result.

A basic example with streaming output:

```python
from e2b_code_interpreter import Sandbox


def handle_stdout(msg):
    # Called for each stdout message while the cell is still running
    print(f"[sandbox stdout] {msg.line}")


with Sandbox.create() as sandbox:
    # Stateful: variables persist within the same sandbox session
    sandbox.run_code("import pandas as pd\nimport numpy as np")

    execution = sandbox.run_code(
        """
df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
print(f"Shape: {df.shape}")
print(df.describe())
        """,
        on_stdout=handle_stdout,
    )

    if execution.error:
        print(f"Error: {execution.error.name}: {execution.error.value}")
    else:
        # The cell only prints, so execution.text is None; the streamed
        # output is also collected in execution.logs.stdout.
        full_stdout = "".join(execution.logs.stdout)
        print(f"Collected {len(full_stdout)} characters of stdout")
```

Because the sandbox runs a Jupyter kernel, state is preserved between run_code calls within the same session. Imports, variables, and loaded data persist in that execution context until the sandbox is killed; code run in a separate code context (covered below) starts with its own namespace.
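The streaming callbacks are plain callables, so one object can both forward output to a user and keep a short tail for error reports. A minimal sketch, assuming each callback message exposes a line attribute like the OutputMessage used in handle_stdout above: StreamCollector is a hypothetical helper, the sink stands in for something like a websocket or server-sent-events push, and SimpleNamespace objects replace live sandbox messages so it runs offline.

```python
from collections import deque
from types import SimpleNamespace
from typing import Any, Callable, Deque, List


class StreamCollector:
    """Forwards sandbox output lines to a sink and keeps the last N lines.

    Pass collector.on_stdout / collector.on_stderr to run_code. Assumes
    each message exposes .line, like E2B's OutputMessage.
    """

    def __init__(self, sink: Callable[[str], Any], max_lines: int = 200) -> None:
        self._sink = sink
        # Bounded buffer: chatty cells cannot grow memory without limit.
        self.tail: Deque[str] = deque(maxlen=max_lines)

    def _handle(self, prefix: str, msg: Any) -> None:
        line = prefix + msg.line.rstrip("\n")
        self.tail.append(line)
        self._sink(line)  # e.g. push to a websocket or SSE stream

    def on_stdout(self, msg: Any) -> None:
        self._handle("", msg)

    def on_stderr(self, msg: Any) -> None:
        self._handle("[stderr] ", msg)


# Offline demo with stand-in messages instead of a live sandbox.
sent: List[str] = []
collector = StreamCollector(sink=sent.append, max_lines=2)
for text in ["step 1\n", "step 2\n", "step 3\n"]:
    collector.on_stdout(SimpleNamespace(line=text))
collector.on_stderr(SimpleNamespace(line="warning: slow\n"))
print(sent)  # ['step 1', 'step 2', 'step 3', '[stderr] warning: slow']
print(list(collector.tail))  # ['step 3', '[stderr] warning: slow']
```

With a live sandbox, the call would be sandbox.run_code(code, on_stdout=collector.on_stdout, on_stderr=collector.on_stderr).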

Integrating E2B with Claude (Anthropic)

E2B maintains an official cookbook with examples for all major providers. The Anthropic pattern uses Claude's tool use API to generate code, then executes it in an E2B sandbox:

```python
import anthropic
from e2b_code_interpreter import Sandbox

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CODE_INTERPRETER_TOOL = {
    "name": "execute_python",
    "description": "Execute Python code in a secure sandbox. Use for data analysis, calculations, and visualization.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "Python code to execute"
            }
        },
        "required": ["code"]
    }
}


def format_execution(execution) -> str:
    """Combine printed output, the last expression's value, and any error."""
    parts = ["".join(execution.logs.stdout), "".join(execution.logs.stderr)]
    if execution.text is not None:
        parts.append(execution.text)
    if execution.error:
        parts.append(f"Error: {execution.error.name}: {execution.error.value}")
    return "\n".join(p for p in parts if p) or "(no output)"


def run_agent_with_sandbox(user_message: str) -> str:
    # One sandbox per agent run: state persists across tool calls in this run
    with Sandbox.create() as sandbox:
        messages = [{"role": "user", "content": user_message}]

        while True:
            response = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=4096,
                tools=[CODE_INTERPRETER_TOOL],
                messages=messages,
            )

            # Anything other than a tool request (end_turn, max_tokens, ...) ends the loop
            if response.stop_reason != "tool_use":
                return "".join(b.text for b in response.content if b.type == "text")

            # Run each requested tool call in the shared sandbox
            tool_results = []
            for block in response.content:
                if block.type == "tool_use" and block.name == "execute_python":
                    execution = sandbox.run_code(block.input["code"])
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": format_execution(execution),
                        "is_error": execution.error is not None,
                    })

            # Append assistant turn and tool results
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})


result = run_agent_with_sandbox(
    "Load the iris dataset from sklearn, compute correlation matrix, and summarize key findings."
)
print(result)
```

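This pattern gives Claude persistent, stateful code execution across every tool call in one agent run: the sandbox lives for the duration of run_agent_with_sandbox, so variables defined in one tool call are available in the next, and it is killed when the function returns. To carry state across separate conversations, keep the sandbox alive between calls instead of creating it inside the function.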
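Long outputs, such as a full DataFrame dump or a pip log, can crowd Claude's context window when returned verbatim as tool results. A minimal sketch of one mitigation, assuming you clip the string before putting it into a tool_result's content: truncate_middle is a hypothetical helper, and the 4,000-character default is an arbitrary budget, not an Anthropic or E2B limit.

```python
def truncate_middle(text: str, max_chars: int = 4000) -> str:
    """Keep the head and tail of long sandbox output and elide the middle."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    omitted = len(text) - max_chars
    return f"{text[:head]}\n... [{omitted} characters omitted] ...\n{text[-tail:]}"


print(truncate_middle("short output"))  # short output

# Simulated noisy output: 2,000 printed rows
long_output = "".join(f"row {i}\n" for i in range(2000))
clipped = truncate_middle(long_output, max_chars=60)
print(clipped.splitlines()[0])  # row 0
print(clipped.endswith("row 1999\n"))  # True
```

Keeping both ends is a deliberate choice: the start shows what the code was doing, and Python tracebacks put the failing line at the end.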

Integrating E2B with OpenAI Agents SDK

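Since April 2026, the OpenAI Agents SDK has supported native sandbox execution, and E2B documents itself as a sandbox provider for it. E2B's docs describe the integration through E2BSandboxClient, SandboxAgent, and SandboxRunConfig. The snippet below shows the general shape only; its import path and constructor arguments were not verified by the lab, so check them against the current E2B and OpenAI docs:

```python
import os

from agents import Agent, Runner

# Illustrative import and parameter names: E2B's docs use E2BSandboxClient,
# SandboxAgent, and SandboxRunConfig, so adapt this to the documented API.
from agents.sandbox.e2b import E2BSandbox

agent = Agent(
    name="DataAnalystAgent",
    instructions="You are a data analysis agent. Use the code interpreter to analyze data.",
    sandbox=E2BSandbox(api_key=os.environ["E2B_API_KEY"]),
)

result = Runner.run_sync(agent, "Analyze the distribution of AAPL stock returns in 2025")
print(result.final_output)
```

In the documented integration, the Agents SDK ties the sandbox lifecycle to the agent run: spinning up the sandbox, routing code execution requests to it, and tearing it down when the session ends.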

Execution Isolation With Code Contexts

For agents that run code for several users in one sandbox session, E2B provides separate code contexts, each with its own Python namespace:

```python
from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    # Create separate execution contexts (separate kernel namespaces)
    ctx_alice = sandbox.create_code_context()
    ctx_bob = sandbox.create_code_context()

    # Alice's variables don't appear in Bob's namespace
    sandbox.run_code("secret = 'alice_secret_value'", context=ctx_alice)
    sandbox.run_code("secret = 'bob_secret_value'", context=ctx_bob)

    # Each context resolves `secret` in its own namespace
    result_alice = sandbox.run_code("print(secret)", context=ctx_alice)
    result_bob = sandbox.run_code("print(secret)", context=ctx_bob)

    # print() output lands in logs.stdout, not execution.text
    print("".join(result_alice.logs.stdout))  # alice_secret_value
    print("".join(result_bob.logs.stdout))    # bob_secret_value
```

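This avoids spinning up a new sandbox (and paying the cold-start cost) for every user while preventing accidental state leakage between namespaces. Contexts still run inside the same VM and share its filesystem and network, so they are not a security boundary between mutually untrusted tenants; give those tenants separate sandboxes.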
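When tenants must not trust each other, the simpler safe pattern is one sandbox per tenant. A minimal sketch, assuming a factory that returns objects with a kill() method: TenantSandboxPool is a hypothetical helper, not part of the E2B SDK. In real code the factory would be Sandbox.create; here a stand-in class lets it run offline.

```python
from typing import Any, Callable, Dict


class TenantSandboxPool:
    """One sandbox per tenant ID, created lazily and never shared."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._by_tenant: Dict[str, Any] = {}

    def get(self, tenant_id: str) -> Any:
        # Reuse the tenant's own sandbox so its state persists across calls.
        if tenant_id not in self._by_tenant:
            self._by_tenant[tenant_id] = self._factory()
        return self._by_tenant[tenant_id]

    def release(self, tenant_id: str) -> None:
        sandbox = self._by_tenant.pop(tenant_id, None)
        if sandbox is not None:
            sandbox.kill()

    def close_all(self) -> None:
        for tenant_id in list(self._by_tenant):
            self.release(tenant_id)


class FakeSandbox:
    """Offline stand-in for an E2B sandbox."""

    def __init__(self) -> None:
        self.killed = False

    def kill(self) -> None:
        self.killed = True


pool = TenantSandboxPool(FakeSandbox)
alice_a = pool.get("alice")
alice_b = pool.get("alice")
bob = pool.get("bob")
print(alice_a is alice_b, alice_a is bob)  # True False
pool.close_all()
print(alice_a.killed, bob.killed)  # True True
```

The cost trade-off is one sandbox (and its per-second billing) per active tenant, which is why release() exists: kill sandboxes for tenants that go idle.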

Snapshot and Persistence

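E2B sandboxes can be snapshotted and restored (the lab confirmed the snapshot method names on the installed SDK but did not run them), which matters for agents with expensive setup steps such as installing packages, loading large models, or warming up caches. Whether in-memory kernel state survives a restore is something to confirm in a live test:

```python
from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    # Expensive one-time setup; %pip runs pip from inside the Jupyter kernel
    sandbox.run_code("%pip install -q scikit-learn matplotlib seaborn pandas")
    sandbox.run_code("from sklearn.datasets import load_iris; iris = load_iris()")

    # Save current state (verify method/attribute names on your SDK version)
    snapshot = sandbox.create_snapshot()
    snapshot_id = snapshot.snapshot_id
    print(f"Snapshot saved: {snapshot_id}")

# Later: start from the pre-warmed state. The snapshot= argument is unverified;
# check E2B's persistence docs for how your SDK version restores a snapshot.
with Sandbox.create(snapshot=snapshot_id) as pre_warmed_sandbox:
    result = pre_warmed_sandbox.run_code("print(iris.target_names)")
    # NumPy prints string arrays without commas
    print("".join(result.logs.stdout))  # ['setosa' 'versicolor' 'virginica']
```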

Snapshots eliminate repeated install costs for workloads like data analysis agents that always need the same libraries pre-loaded.
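A snapshot is only worth restoring while the setup that produced it is unchanged. One way to manage that, sketched under assumptions: key each saved snapshot ID by a hash of the package list (with version pins) and the setup code, so a changed requirement misses the cache and triggers a fresh build. setup_fingerprint and SnapshotCache are hypothetical helpers, the in-memory dict stands in for a database, and snap_example_123 is a made-up ID; creating and restoring the snapshot itself stays with whatever your E2B SDK version exposes.

```python
import hashlib
import json
from typing import Dict, Iterable, Optional


def setup_fingerprint(packages: Iterable[str], setup_code: str) -> str:
    """Stable short hash of everything that shapes a pre-warmed sandbox."""
    payload = json.dumps(
        {"packages": sorted(packages), "setup_code": setup_code},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class SnapshotCache:
    """Maps setup fingerprints to snapshot IDs (in memory; persist it in real use)."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def lookup(self, fingerprint: str) -> Optional[str]:
        return self._ids.get(fingerprint)

    def store(self, fingerprint: str, snapshot_id: str) -> None:
        self._ids[fingerprint] = snapshot_id


cache = SnapshotCache()
setup = "from sklearn.datasets import load_iris"
fp = setup_fingerprint(["pandas", "scikit-learn"], setup)
same_fp = setup_fingerprint(["scikit-learn", "pandas"], setup)
print(fp == same_fp)  # True: package order does not matter
print(cache.lookup(fp))  # None: first run, so build and snapshot
cache.store(fp, "snap_example_123")  # hypothetical snapshot ID
print(cache.lookup(same_fp))  # snap_example_123
```

Include exact version pins in the package strings (for example pandas==2.2.0), otherwise an upstream release changes what a fresh build installs without changing the fingerprint.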

MCP Server Integration

E2B 2.x includes an MCP (Model Context Protocol) gateway: MCP servers run inside an E2B sandbox and are exposed through a sandbox-specific URL and access token that MCP-compatible agents can use as a tool endpoint:

```python
from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    # Gateway URL and token for MCP clients. This assumes MCP servers were
    # configured when the sandbox was created; see E2B's MCP docs for that step.
    mcp_url = sandbox.get_mcp_url()
    mcp_token = sandbox.get_mcp_token()
    print(f"MCP endpoint: {mcp_url}")
```

This means LangGraph agents using langchain-mcp-adapters, ADK agents, or any other MCP-capable framework can connect to E2B sandboxes with a standard MCP client instead of custom integration code.

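The SDK also exports a GitHubMcpServer type. E2B's MCP gateway docs cover custom servers loaded from GitHub repositories (an install command plus a run command), so the type most likely describes that configuration rather than a GitHub API client. The keys below follow that pattern but are assumptions; check the type definition in your installed SDK before relying on them:

```python
from e2b_code_interpreter import Sandbox

# Assumed config shape: a "github/<owner>/<repo>" key with installCmd and runCmd.
# <owner>/<repo> and both commands are placeholders, not a real server.
with Sandbox.create(
    mcp={
        "github/<owner>/<repo>": {
            "installCmd": "npm install",
            "runCmd": "node dist/index.js",
        },
    },
) as sandbox:
    # MCP clients reach the custom server through the sandbox's gateway URL
    print(sandbox.get_mcp_url())
```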

Async Support

For production FastAPI or async Python applications, use AsyncSandbox:

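```python
import asyncio

from e2b_code_interpreter import AsyncSandbox


async def analyze_data(user_code: str) -> str:
    # AsyncSandbox.create() is a coroutine, so await it before use
    sandbox = await AsyncSandbox.create()
    try:
        execution = await sandbox.run_code(user_code)
        if execution.error:
            return f"Error: {execution.error.name}"
        # print() output is in logs.stdout; execution.text would be None here
        return "".join(execution.logs.stdout).strip()
    finally:
        await sandbox.kill()  # release the sandbox without blocking the event loop


result = asyncio.run(analyze_data("print(sum(range(1000)))"))
print(result)  # 499500
```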

The async API closely mirrors the sync one, with awaitable methods in place of blocking calls. Use AsyncSandbox whenever your application is already async (FastAPI, Starlette, etc.) to avoid blocking the event loop on sandbox operations.
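Plan limits cap how many sandboxes can run at once (20 concurrent on Hobby, per the source snapshot earlier in this guide). When an async service fans out many jobs, an asyncio.Semaphore keeps the number of live sandboxes under that cap. In this sketch run_all is a hypothetical helper and fake_run_one is an offline stand-in; in real code run_one would create an AsyncSandbox, await run_code, and kill the sandbox in a finally block.

```python
import asyncio
from typing import Awaitable, Callable, List

MAX_CONCURRENT_SANDBOXES = 20  # Hobby-tier figure quoted in this guide; adjust per plan


async def run_all(
    snippets: List[str],
    run_one: Callable[[str], Awaitable[str]],
    limit: int = MAX_CONCURRENT_SANDBOXES,
) -> List[str]:
    """Run every snippet with at most `limit` sandboxes alive at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(code: str) -> str:
        async with semaphore:
            return await run_one(code)

    # gather returns results in input order, whatever order they finish in
    return await asyncio.gather(*(guarded(code) for code in snippets))


# Offline stand-in that records peak concurrency instead of calling E2B.
active = 0
peak = 0


async def fake_run_one(code: str) -> str:
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)  # simulated sandbox work
    active -= 1
    return f"ran: {code}"


results = asyncio.run(run_all([f"print({i})" for i in range(50)], fake_run_one, limit=5))
print(len(results), peak)  # 50 5
```

Because gather preserves input order, callers can zip results back to their snippets even though execution order varies.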

E2B vs. Alternatives

| Feature | E2B | Modal | Daytona | Local Docker |
|---|---|---|---|---|
| Isolation | Firecracker microVM | gVisor (syscall intercept) | Docker container | Docker container |
| Startup latency | [DATA NOT AVAILABLE] until live-tested | [DATA NOT AVAILABLE] in this article | [DATA NOT AVAILABLE] in this article | Depends on local host |
| Purpose-built for AI | Yes | General compute + AI | Dev workspace | No |
| Public pricing signal checked here | E2B Hobby/Pro pricing (see Pricing Snapshot below) | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | Your infra cost |
| Session duration signal checked here | E2B docs list 1h Hobby and 24h Pro | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | Your host limit |
| OpenAI Agents SDK path checked here | Documented by E2B and OpenAI | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | No managed integration |
| MCP signal checked here | E2B docs describe MCP gateway usage | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | Manual setup |
| Best for | AI agent code execution | General inference + batch | Dev workspaces | Local dev only |

The key distinction: E2B is purpose-built for untrusted LLM code execution with Firecracker microVM isolation. Modal runs on gVisor and sits inside a broader compute platform (inference, training, batch). Daytona comes from a developer-workspace background and is designed around persistent environments rather than ephemeral execution of code from untrusted sources.

For production agents executing LLM-generated code, E2B is the strongest candidate verified in this article. Compare Modal, Daytona, and local Docker separately before making a cost, latency, or vendor-risk decision; this guide did not live-test those alternatives.

Pricing Snapshot

E2B's pricing and billing docs say the service charges per second while a sandbox is running. Re-check the live pricing page before procurement, because plan limits and per-second rates can change.

  • Hobby (free): $100 one-time usage credit, no credit card required, up to 1-hour sandbox sessions, and up to 20 concurrent sandboxes.
  • Pro ($150/month): 24-hour sessions, CPU/RAM customization, up to 100 concurrent sandboxes, and paid concurrency add-ons up to 1,100 according to the current E2B page.
  • Enterprise: custom pricing. The public E2B pricing page does not list a universal Enterprise price, so treat any generic minimum as [DATA NOT AVAILABLE] until E2B provides a current quote.

At the current listed CPU rate of $0.000014/second for 1 vCPU, CPU alone would make $100 cover about 7.1 million vCPU-seconds. That is not a full workload cost estimate because RAM, storage, plan limits, idle time, and paid add-ons also matter.

A rough production calculator should use: average sandbox runtime, average CPU count, average memory, number of runs, pause/kill behavior, and concurrent-session limits. Without those inputs, monthly production cost is [DATA NOT AVAILABLE].
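The calculator inputs above can be turned into a rough estimator. Only the $0.000014 per vCPU-second rate comes from this guide; the RAM rate is a required parameter because no memory price is quoted here, and PLACEHOLDER_RAM_RATE is a made-up value, not E2B's price. Storage, plan fees, and add-ons are deliberately left out.

```python
from dataclasses import dataclass

CPU_RATE_PER_VCPU_SECOND = 0.000014  # listed CPU rate quoted in this guide


@dataclass
class Workload:
    runs_per_month: int
    avg_runtime_seconds: float  # billed while running, including idle-but-alive time
    vcpus: int
    memory_gib: float


def monthly_cost(w: Workload, ram_rate_per_gib_second: float) -> float:
    """Rough usage cost: (CPU + RAM) rates times billed seconds.

    Ignores storage, plan fees, and add-ons; take ram_rate_per_gib_second
    from the current E2B pricing page.
    """
    billed_seconds = w.runs_per_month * w.avg_runtime_seconds
    cpu = billed_seconds * w.vcpus * CPU_RATE_PER_VCPU_SECOND
    ram = billed_seconds * w.memory_gib * ram_rate_per_gib_second
    return cpu + ram


# Reproduces the arithmetic above: $100 of credit at the CPU rate alone.
print(round(100 / CPU_RATE_PER_VCPU_SECOND))  # 7142857

example = Workload(runs_per_month=10_000, avg_runtime_seconds=120, vcpus=2, memory_gib=1)
PLACEHOLDER_RAM_RATE = 0.000005  # illustrative only, not a quoted E2B price
print(f"${monthly_cost(example, PLACEHOLDER_RAM_RATE):.2f}")
```

Runtime dominates every term, which is why pause/kill discipline usually matters more than small per-second rate differences.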

Failure and Limitation Table

| Limitation | Why it matters | Mitigation |
|---|---|---|
| No live sandbox execution in the Effloow lab note | The saved evidence proves installability and SDK surface inspection, not runtime latency, isolation strength, package install success, or billing behavior. | Run a disposable live sandbox with non-secret code before using this article as a production acceptance checklist. |
| Vendor adoption figures are self-reported | E2B's homepage claims broad Fortune 100 adoption, but that is not independent market-share evidence. | Cite it as an E2B-stated adoption signal only; do not use it as a ranking or procurement proof by itself. |
| Pricing is usage-shaped | A cheap per-second CPU rate can still become expensive if agents stay idle, hold long sessions, or need high concurrency. | Enforce pause / kill, track active sandbox seconds, and test concurrency limits before launch. |
| Contexts are not tenant isolation by themselves | Code contexts help separate execution state inside a sandbox, but a multi-tenant threat model may require separate sandboxes. | Put untrusted tenants in separate sandboxes unless a security review proves shared-session contexts are sufficient. |
| Managed cloud may violate data-residency rules | Hosted execution can be unacceptable for regulated data or customer-controlled infrastructure requirements. | Use E2B only for data you are allowed to send to the provider, or evaluate a self-managed Firecracker-style architecture. |
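The "track active sandbox seconds" mitigation in the table above can start as a small ledger: record when each sandbox starts and stops, and flag any that has been alive longer than a budget. A sketch under assumptions: SandboxLedger is a hypothetical helper, the clock is injectable so the example is deterministic, sbx-a and sbx-b are made-up IDs, and the 900-second budget is arbitrary; in production you would pair the flagged IDs with your own pause or kill call.

```python
import time
from typing import Callable, Dict, List


class SandboxLedger:
    """Accumulates seconds for stopped sandboxes and flags long-lived running ones."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started: Dict[str, float] = {}
        self.total_seconds = 0.0

    def started(self, sandbox_id: str) -> None:
        self._started[sandbox_id] = self._clock()

    def stopped(self, sandbox_id: str) -> None:
        start = self._started.pop(sandbox_id, None)
        if start is not None:
            self.total_seconds += self._clock() - start

    def over_budget(self, max_age_seconds: float) -> List[str]:
        """IDs still running longer than max_age_seconds: candidates to pause or kill."""
        now = self._clock()
        return [sid for sid, t0 in self._started.items() if now - t0 > max_age_seconds]


# Deterministic fake clock for the example.
now = [0.0]
ledger = SandboxLedger(clock=lambda: now[0])
ledger.started("sbx-a")
ledger.started("sbx-b")
now[0] = 300.0
ledger.stopped("sbx-a")
now[0] = 1200.0
print(ledger.total_seconds)  # 300.0
print(ledger.over_budget(max_age_seconds=900))  # ['sbx-b']
```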

FAQ

Q: Does E2B work with local/self-hosted models like Ollama or vLLM?

Yes. E2B is completely LLM-agnostic — it handles the execution layer only. You point your agent at any model (Ollama, vLLM, Groq, self-hosted) and use E2B to run the code that model generates. The SDK doesn't care what produced the code string.

Q: Can I install custom packages in an E2B sandbox?

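Yes. E2B sandboxes have internet access by default, so you can install packages from a run_code call using the Jupyter %pip magic, or shell out with !pip install or !npm install. For repeated use, create a snapshot after installation or bake the packages into a custom sandbox template so you don't pay the install time on every run.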

Q: How does E2B compare to running code in a subprocess locally?

Local subprocess execution is not sandboxed: model-generated code can read environment variables, make network requests, access the filesystem, and run arbitrary system commands with your process's privileges. E2B runs the code in an isolated microVM on E2B's infrastructure with its own kernel, so even a successful kernel exploit lands in a guest VM rather than on your host. For production agents handling external user input, local subprocess execution is not a safe option.

Q: What languages does E2B support?

The run_code API supports Python and JavaScript by default (the language parameter selects between them). R, Bash, and other languages can be run via subprocess inside the Python environment. Custom language support is possible with custom sandbox templates.

Q: Is there an open-source self-hosted option?

E2B's SDKs are Apache 2.0 open source, and E2B also publishes infrastructure code in its e2b-dev/infra GitHub repository. Check that repository's current license and self-hosting documentation, or contact E2B about the Enterprise plan, before planning a self-hosted deployment. For local development, you can mock the sandbox execution, but production workloads with untrusted code should use the cloud service or equivalent hardware-level isolation.

Key Takeaways

E2B solves a real production problem that every code-executing agent eventually faces: where does LLM-generated code run safely? The answer isn't "in a container on your server" when the code author is an AI responding to arbitrary user prompts.

The SDK design reflects this clearly. run_code supports streaming callbacks for real-time output, code contexts for separate execution state, and snapshots for warm restarts. MCP server support means MCP-capable agent frameworks can use E2B as the execution machine when configured correctly. The OpenAI Agents SDK integration in April 2026 makes E2B a documented sandbox option, not proof that it is the best provider for every production Python agent.

When to Use / When to Skip

Use E2B when an AI agent executes code produced from external user prompts, when you need a stateful workspace with filesystem and terminal access, or when the OpenAI Agents SDK sandbox path is more valuable than maintaining your own execution fleet. It is also a good fit for data-analysis assistants, code-interpreter features, MCP tool execution, and agent workflows that need snapshots or repeatable sandbox templates.

  • Skip it if your agent never runs untrusted code. If the code your agent executes is your own, pinned, and reviewed, a local container is simpler and cheaper — E2B's isolation earns its cost specifically when an LLM writes the code from a user prompt.
  • Skip the managed cloud if you need full data residency control. Where every byte must stay inside your own VPC, weigh self-managed Firecracker or a provider you fully control against E2B's hosted sandboxes.
  • Skip it for a quick local prototype. For a one-off script on trusted input, the SDK and per-second sandbox billing are overhead you do not need yet.
  • Skip it until you can run a real sandbox acceptance test. If you cannot create an E2B account or use an API key in a disposable project, keep the integration as a source-verified candidate rather than a production-approved dependency.

What Effloow Added

E2B's docs explain the SDK; they don't tell you whether you need a sandbox at all, or how it compares to running code locally. We added that decision and a source-checked path:

  • A bounded SDK walkthrough. Effloow Lab installed the SDK, inspected the run_code signature, confirmed context/snapshot/MCP method names, and captured the missing-key authentication error. It did not run live sandbox code.
  • A vs-alternatives table (E2B, Modal, Daytona, local Docker) keyed to the real question: trusted vs untrusted code author.
  • A source-backed fit and limitation matrix that tells teams when the official docs are enough and when they still need a live sandbox acceptance test.

The value is the "do you even need a sandbox, and if so which one" decision, not a re-paste of the SDK reference.

Bottom Line

If your agent executes LLM-generated code from external users, E2B is a strong source-verified default to evaluate: it is purpose-built for agent sandboxes, documents OpenAI Agents SDK integration, and exposes the right SDK primitives. Treat production readiness, latency, and monthly cost as [DATA NOT AVAILABLE] until you run a disposable live sandbox acceptance test.
