E2B Sandbox: Secure Code Execution for AI Agents
Every team building a code-executing AI agent eventually runs into the same wall: where does the model-generated code actually run? Running it on your host server is a liability. Running it inside Docker is better but not enough for untrusted LLM output at scale. E2B was built specifically for that problem, and it now has a documented OpenAI Agents SDK integration for running SandboxAgent workflows inside E2B sandboxes.
This guide covers what E2B is, how it works, how to integrate it with Python and the major LLM providers, and when it makes sense compared to alternatives like Modal and Daytona. If you are building the agents that will run this code, our OpenAI Agents SDK multi-agent tutorial and AI agent frameworks compared cover the orchestration side.
Effloow Lab note: Effloow Lab installed e2b-code-interpreter==2.6.2 and e2b==2.20.3, inspected the SDK surface, and confirmed the authentication boundary. Live sandbox execution requires an E2B API key, so this article does not claim a live E2B sandbox run. The public evidence note is at the E2B sandbox PoC lab note.
Why Sandboxed Code Execution Matters
When an AI agent generates and executes code, several things can go wrong that simply don't happen with human-written code:
- The model generates import subprocess; subprocess.run(["rm", "-rf", "/"]) in response to a prompt-injection attack
- A data analysis agent leaks API keys from environment variables into output files
- A code-generating chatbot causes a denial of service by spawning infinite threads
- Cross-tenant attacks expose one user's data to another in a multi-tenant deployment
Traditional containerization mitigates some of these, but containers share the host kernel — a kernel exploit in model-generated code could affect other workloads. The security gap between "containerized" and "isolated" matters when the code author is an LLM responding to arbitrary user input.
E2B addresses this with Firecracker microVMs: each sandbox runs inside its own lightweight virtual machine with a separate kernel and hardware-level isolation, so tenants do not share a kernel.
What E2B Is
E2B is a managed sandbox platform with open-source SDKs for running AI-generated code in isolated environments. E2B describes the product as an open-source, secure environment for enterprise-grade agents. Each sandbox:
- Runs as a managed session dedicated to code execution
- Runs a Jupyter kernel for interactive, stateful code execution
- Supports Python, JavaScript/TypeScript, and other languages
- Provides filesystem access, internet access, and package installation
- Can run for up to 24 hours per session on the Pro plan (1 hour on Hobby)
The platform is LLM-agnostic: you bring your own model (Claude, GPT, Gemini, Llama) and use E2B for the execution layer. It ships Python and TypeScript/JavaScript SDKs.
Source snapshot checked on 2026-06-20:
- E2B's pricing page lists a free Hobby tier with a one-time $100 credit, 1-hour sandbox sessions, and 20 concurrent sandboxes.
- E2B's billing docs list Pro at $150/month with 24-hour continuous runtime and 100 to 1,100 concurrent sandboxes depending on add-ons.
- E2B's homepage states that E2B is used by 94% of Fortune 100 companies, with 7M+ monthly downloads and 1B+ started sandboxes. Treat those as vendor-stated adoption figures, not independent market-share data.
- The E2B OpenAI Agents SDK page documents E2BSandboxClient, SandboxAgent, and RunConfig integration.
- The E2B code-interpreter repository remains the live source for SDK code and examples; this article does not freeze a GitHub-star number.
Source-Backed Fit Matrix
This table is the guide's main original contribution. It separates what the sources verify from what Effloow's local lab note actually confirmed.
| Decision area | Source-backed fact | Effloow verified locally | What to do differently |
|---|---|---|---|
| SDK availability | The E2B code-interpreter repository documents pip install e2b-code-interpreter and Sandbox.runCode / Sandbox.run_code examples. | Installed e2b-code-interpreter==2.6.2 and e2b==2.20.3; inspected Sandbox, AsyncSandbox, Execution, McpServer, and GitHubMcpServer exports. | Treat the SDK surface as real, but do not claim live execution until your environment has E2B_API_KEY and a successful sandbox run. |
| OpenAI Agents SDK path | E2B docs show E2BSandboxClient, SandboxAgent, and SandboxRunConfig; OpenAI announced native sandbox execution in the Agents SDK on 2026-04-15. | No OpenAI or E2B API call was made in this lab note. | Use the integration when you want the agent harness and sandbox lifecycle tied together; keep a separate fallback plan for local or provider-specific execution. |
| Code contexts | E2B code-context docs say create_code_context() / createCodeContext() can run code in different execution contexts. | Confirmed the Python method exists on the installed SDK. | Use contexts to reduce accidental state leakage inside one sandbox, but do not treat them as a substitute for tenant-level sandbox isolation without a live security review. |
| MCP gateway | E2B MCP docs describe MCP tools running inside E2B sandboxes with internet-connected Linux access. | Confirmed MCP-related classes are exported by the installed SDK. | Prefer MCP gateway patterns when agents need tools plus an execution machine; keep secrets and external connectors scoped per sandbox. |
| Billing and limits | E2B docs say sandboxes are billed per second while running, with Hobby and Pro limits as listed above. | The lab did not create a billable sandbox. | Model cost with runtime seconds, concurrency, pause/kill behavior, and CPU/RAM sizing before calling E2B cheaper or more expensive than another provider. |
SDK Installation and Setup
Install the Python SDK:
pip install e2b-code-interpreter
This installs e2b-code-interpreter==2.6.2 and e2b==2.20.3 (as of May 2026). Get your API key from the E2B dashboard at e2b.dev/dashboard under the Team tab. Set it as an environment variable:
export E2B_API_KEY="e2b_..."
Or pass it directly in code:
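from e2b_code_interpreter import Sandbox

with Sandbox.create(api_key="e2b_...") as sandbox:
    execution = sandbox.run_code("print('hello from sandbox')")
    # print() output arrives in execution.logs.stdout; execution.text holds
    # the value of the cell's last expression and is None for a bare print().
    print("".join(execution.logs.stdout))  # expected: hello from sandbox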
The with statement ensures the sandbox is killed when the block exits. For long-running agents, you can manage lifecycle manually:
sandbox = Sandbox.create()
# ... do work ...
sandbox.kill()
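With manual lifecycle management, an exception raised between create and kill leaves the sandbox running and accruing usage. The following is a minimal sketch of a guard, assuming only that the sandbox object exposes a kill() method as in the snippet above. The names managed_sandbox and FakeSandbox are hypothetical and not part of the E2B SDK; the fake exists so the sketch runs without an API key.

```python
from contextlib import contextmanager


@contextmanager
def managed_sandbox(factory):
    """Create a sandbox via `factory` and always kill it, even on error.

    `factory` is any zero-argument callable returning an object with a
    kill() method, for example Sandbox.create from e2b_code_interpreter.
    """
    sandbox = factory()
    try:
        yield sandbox
    finally:
        sandbox.kill()


class FakeSandbox:
    """Stand-in for a real sandbox so the sketch runs offline."""

    def __init__(self):
        self.killed = False

    def kill(self):
        self.killed = True


fake = FakeSandbox()
try:
    with managed_sandbox(lambda: fake):
        raise RuntimeError("agent step failed")
except RuntimeError:
    pass

print(fake.killed)  # True: kill() ran despite the exception
```

In a real agent you would pass Sandbox.create as the factory and keep the long-running work inside the with block.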
The run_code API
The run_code method is the core of E2B's code interpreter. Its full signature:
def run_code(
    self,
    code: str,
    language: Optional[str] = None,
    context: Optional[Context] = None,
    on_stdout: Optional[Callable[[OutputMessage], Any]] = None,
    on_stderr: Optional[Callable[[OutputMessage], Any]] = None,
    on_result: Optional[Callable[[Result], Any]] = None,
    on_error: Optional[Callable[[ExecutionError], Any]] = None,
    envs: Optional[Dict[str, str]] = None,
    timeout: Optional[float] = None,
    request_timeout: Optional[float] = None,
) -> Execution
The streaming callbacks (on_stdout, on_stderr, on_result, and on_error) are particularly useful for agents that need to stream execution feedback to end users in real time rather than waiting for the full result.
A basic example with streaming output:
from e2b_code_interpreter import Sandbox

def handle_stdout(msg):
    # msg.line carries one chunk of stdout text
    print(f"[sandbox stdout] {msg.line}")

with Sandbox.create() as sandbox:
    # Stateful: variables persist within the same sandbox session
    sandbox.run_code("import pandas as pd\nimport numpy as np")
    execution = sandbox.run_code(
        """
df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
print(f"Shape: {df.shape}")
print(df.describe())
""",
        on_stdout=handle_stdout,
    )
    if execution.error:
        print(f"Error: {execution.error.name}: {execution.error.value}")
    else:
        # Printed output is collected in logs.stdout, not execution.text
        print("Final stdout:", "".join(execution.logs.stdout))
Because the sandbox runs a Jupyter kernel, state is preserved between run_code calls within the same session. Imports, variables, and loaded data persist until the sandbox is killed or a new context is created.
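A streaming callback often has two jobs: forward each chunk to the user and keep a copy for later logging. The sketch below shows one way to do both. It assumes only that each message has a line attribute, as handle_stdout above does; OutputCollector is a hypothetical helper, and the messages are simulated so nothing here touches a real sandbox.

```python
from types import SimpleNamespace


class OutputCollector:
    """Buffers streamed lines while forwarding each one to a sink."""

    def __init__(self, sink=print, prefix="[sandbox stdout] "):
        self.lines = []
        self._sink = sink
        self._prefix = prefix

    def __call__(self, msg):
        # `msg.line` mirrors the attribute used by handle_stdout above.
        self.lines.append(msg.line)
        self._sink(f"{self._prefix}{msg.line}")

    def text(self):
        """Return everything received so far as one string."""
        return "".join(self.lines)


# Simulated messages; with a real sandbox the collector would be passed
# as on_stdout=collector to run_code.
forwarded = []
collector = OutputCollector(sink=forwarded.append)
for chunk in ["Shape: (100, 2)\n", "done\n"]:
    collector(SimpleNamespace(line=chunk))

print(collector.text())
```

Because the collector is a plain callable, the same instance can be reused across several run_code calls to build a transcript of one session.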
Integrating E2B with Claude (Anthropic)
E2B maintains an official cookbook with examples for all major providers. The Anthropic pattern uses Claude's tool use API to generate code, then executes it in an E2B sandbox:
import anthropic
from e2b_code_interpreter import Sandbox

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CODE_INTERPRETER_TOOL = {
    "name": "execute_python",
    "description": "Execute Python code in a secure sandbox. Use for data analysis, calculations, and visualization.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {
                "type": "string",
                "description": "Python code to execute"
            }
        },
        "required": ["code"]
    }
}

def run_agent_with_sandbox(user_message: str):
    with Sandbox.create() as sandbox:
        messages = [{"role": "user", "content": user_message}]
        while True:
            response = client.messages.create(
                model="claude-opus-4-7",
                max_tokens=4096,
                tools=[CODE_INTERPRETER_TOOL],
                messages=messages,
            )
            if response.stop_reason != "tool_use":
                # No further tool calls: return the concatenated text blocks
                return "".join(b.text for b in response.content if b.type == "text")
            # Run every requested tool call in the same sandbox
            tool_results = []
            for block in response.content:
                if block.type == "tool_use" and block.name == "execute_python":
                    execution = sandbox.run_code(block.input["code"])
                    if execution.error:
                        result = f"Error: {execution.error.name}: {execution.error.value}"
                    else:
                        # stdout from print() plus the last expression's value, if any
                        result = "".join(execution.logs.stdout) + (execution.text or "")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result or "(no output)",
                    })
            # Append assistant turn and tool results
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})

result = run_agent_with_sandbox(
    "Load the iris dataset from sklearn, compute correlation matrix, and summarize key findings."
)
print(result)
This pattern gives Claude persistent, stateful code execution that spans multiple tool calls within a single agent run. The sandbox outlives any individual model request and carries state across every tool call made inside one run_agent_with_sandbox invocation; it is killed when the with block exits.
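The string sent back as the tool result decides what the model can see of each run. The sketch below shows one way to assemble it from an execution object. It assumes the object exposes logs.stdout and logs.stderr as lists of strings, text as a string or None, and error as None or an object with name and value; format_execution and the max_chars cap are hypothetical, and the inputs are simulated with SimpleNamespace.

```python
from types import SimpleNamespace


def format_execution(execution, max_chars=4000):
    """Turn an Execution-like object into one string for a tool_result."""
    parts = []
    stdout = "".join(execution.logs.stdout)
    stderr = "".join(execution.logs.stderr)
    if stdout:
        parts.append(f"stdout:\n{stdout}")
    if stderr:
        parts.append(f"stderr:\n{stderr}")
    if execution.text:
        parts.append(f"result:\n{execution.text}")
    if execution.error:
        parts.append(f"error: {execution.error.name}: {execution.error.value}")
    body = "\n".join(parts) or "(no output)"
    # Truncate so one noisy cell cannot flood the model's context window.
    return body[:max_chars]


def fake_execution(stdout=(), stderr=(), text=None, error=None):
    """Build a simulated execution object for offline use."""
    logs = SimpleNamespace(stdout=list(stdout), stderr=list(stderr))
    return SimpleNamespace(logs=logs, text=text, error=error)


ok = format_execution(fake_execution(stdout=["hi\n"]))
failed = format_execution(
    fake_execution(error=SimpleNamespace(name="NameError", value="x is not defined"))
)
silent = format_execution(fake_execution())

print(ok)
print(failed)
print(silent)
```

Returning a non-empty placeholder for silent cells keeps the tool result well-formed even when the generated code prints nothing.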
Integrating E2B with OpenAI Agents SDK
Since April 2026, E2B has been a documented sandbox provider for OpenAI's Agents SDK. E2B's docs describe the integration in terms of E2BSandboxClient, SandboxAgent, and SandboxRunConfig; the snippet below is a simplified sketch of the idea, and its import path and class name (agents.sandbox.e2b, E2BSandbox) were not verified in the lab note, so check them against the current E2B and OpenAI docs before copying:
from agents import Agent, Runner
from agents.sandbox.e2b import E2BSandbox  # unverified import path

agent = Agent(
    name="DataAnalystAgent",
    instructions="You are a data analysis agent. Use the code interpreter to analyze data.",
    sandbox=E2BSandbox(api_key="e2b_..."),
)

result = Runner.run_sync(agent, "Analyze the distribution of AAPL stock returns in 2025")
print(result.final_output)
The Agents SDK handles the full lifecycle: spinning up the sandbox, routing code execution requests to it, and tearing it down when the session ends.
Execution Isolation With Code Contexts
For agents that run several independent tasks in the same sandbox session, E2B provides code contexts, each with its own execution state. Contexts separate namespaces, but they still run inside the same sandbox, so they are not a substitute for giving each untrusted tenant its own sandbox:
from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    # Create separate contexts for two tasks
    ctx_alice = sandbox.create_code_context()
    ctx_bob = sandbox.create_code_context()
    # Code run in one context does not touch the other's namespace
    sandbox.run_code("secret = 'alice_secret_value'", context=ctx_alice)
    sandbox.run_code("secret = 'bob_secret_value'", context=ctx_bob)
    # Each context has its own namespace
    result_alice = sandbox.run_code("print(secret)", context=ctx_alice)
    result_bob = sandbox.run_code("print(secret)", context=ctx_bob)
    print("".join(result_alice.logs.stdout))  # expected: alice_secret_value
    print("".join(result_bob.logs.stdout))    # expected: bob_secret_value
This is useful when you want to avoid spinning up a new sandbox (and paying the cold start cost) for every task while still reducing accidental state leakage between them.
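When many tasks share one sandbox, it helps to create each context lazily and reuse it by key. The sketch below assumes only the two calls shown above, create_code_context() and run_code(code, context=...). ContextPool is a hypothetical helper, and FakeSandbox imitates per-context namespaces with plain dicts so the example runs offline.

```python
class ContextPool:
    """Lazily creates one code context per task key and reuses it."""

    def __init__(self, sandbox):
        self._sandbox = sandbox
        self._contexts = {}

    def get(self, key):
        if key not in self._contexts:
            self._contexts[key] = self._sandbox.create_code_context()
        return self._contexts[key]

    def run(self, key, code):
        return self._sandbox.run_code(code, context=self.get(key))


class FakeSandbox:
    """Stand-in that models each context as its own namespace dict."""

    def __init__(self):
        self.created = 0

    def create_code_context(self):
        self.created += 1
        return {}

    def run_code(self, code, context):
        exec(code, context)  # local simulation only, not real isolation
        return context


fake_sandbox = FakeSandbox()
pool = ContextPool(fake_sandbox)
pool.run("task-a", "secret = 'a'")
pool.run("task-b", "secret = 'b'")

print(pool.run("task-a", "pass")["secret"])  # a
print(pool.run("task-b", "pass")["secret"])  # b
print(fake_sandbox.created)                  # 2
```

The pool only organizes execution state. Untrusted tenants still belong in separate sandboxes, as the fit matrix notes.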
Snapshot and Persistence
E2B sandboxes can be snapshotted and restored, which matters for agents with expensive setup steps (installing packages, loading large models, warming up caches):
from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    # Expensive one-time setup; %pip is the IPython magic that installs
    # into the running kernel's environment
    sandbox.run_code("%pip install -q scikit-learn matplotlib seaborn pandas")
    sandbox.run_code("from sklearn.datasets import load_iris; iris = load_iris()")
    # Save current state
    snapshot = sandbox.create_snapshot()
    snapshot_id = snapshot.snapshot_id
    print(f"Snapshot saved: {snapshot_id}")

# Later: start a new sandbox from the pre-warmed state
with Sandbox.create(snapshot=snapshot_id) as pre_warmed_sandbox:
    result = pre_warmed_sandbox.run_code("print(iris.target_names)")
    print("".join(result.logs.stdout))  # expected: ['setosa' 'versicolor' 'virginica']
Snapshots eliminate repeated install costs for workloads like data analysis agents that always need the same libraries pre-loaded.
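Reusing snapshots requires some way to decide whether two setups are equivalent. One option, sketched below, is a deterministic key derived from the package list, so the same set of libraries maps to the same stored snapshot ID. The functions snapshot_cache_key and get_or_build, the in-memory registry, and the fake builder are all hypothetical; a real builder would run the setup and snapshot steps shown above.

```python
import hashlib


def snapshot_cache_key(packages, base="default"):
    """Deterministic key for a set of packages on a given base template."""
    normalized = sorted({p.strip().lower() for p in packages if p.strip()})
    payload = base + "\n" + "\n".join(normalized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Hypothetical in-memory registry: cache key -> snapshot ID.
snapshot_registry = {}


def get_or_build(packages, build):
    """Return a cached snapshot ID, calling build(packages) only on a miss."""
    key = snapshot_cache_key(packages)
    if key not in snapshot_registry:
        snapshot_registry[key] = build(packages)
    return snapshot_registry[key]


builds = []


def fake_build(packages):
    """Pretend to install packages and snapshot; returns a made-up ID."""
    builds.append(list(packages))
    return f"snap-{len(builds)}"


first = get_or_build(["pandas", "scikit-learn"], fake_build)
second = get_or_build(["Scikit-Learn", "pandas "], fake_build)

print(first, second, len(builds))  # snap-1 snap-1 1
```

Normalizing order, case, and whitespace means cosmetic differences in the package list do not trigger a second expensive build.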
MCP Server Integration
E2B 2.x ships a built-in MCP (Model Context Protocol) server interface, allowing any MCP-compatible agent to use E2B sandboxes as tool servers:
from e2b_code_interpreter import Sandbox

with Sandbox.create() as sandbox:
    # Get MCP-compatible URL and token for this sandbox
    mcp_url = sandbox.get_mcp_url()
    mcp_token = sandbox.get_mcp_token()
    print(f"MCP endpoint: {mcp_url}")
This means LangGraph agents using langchain-mcp-adapters, ADK agents, or any other MCP-capable framework can connect to E2B sandboxes using the standard MCP protocol without custom integration code.
E2B also ships a GitHubMcpServer class for agents that need to interact with GitHub repositories inside a sandbox:
from e2b_code_interpreter import Sandbox, GitHubMcpServer

github_mcp = GitHubMcpServer(token="ghp_...")

with Sandbox.create(mcp_servers=[github_mcp]) as sandbox:
    # Agent can now read/write GitHub repos inside the sandbox
    result = sandbox.run_code("# list files from connected GitHub repo")
Async Support
For production FastAPI or async Python applications, use AsyncSandbox:
import asyncio
from e2b_code_interpreter import AsyncSandbox

async def analyze_data(user_code: str) -> str:
    # AsyncSandbox.create() is a coroutine, so await it and kill explicitly
    sandbox = await AsyncSandbox.create()
    try:
        execution = await sandbox.run_code(user_code)
        if execution.error:
            return f"Error: {execution.error.name}"
        # print() output is collected in logs.stdout
        return "".join(execution.logs.stdout).strip()
    finally:
        await sandbox.kill()

result = asyncio.run(analyze_data("print(sum(range(1000)))"))
print(result)  # expected: 499500
The async API mirrors the sync one completely. Use AsyncSandbox whenever your application is already async (FastAPI, Starlette, etc.) to avoid blocking the event loop on sandbox operations.
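An async service can easily launch more sandboxes than the plan allows (20 concurrent on Hobby, per the snapshot above). The sketch below caps in-flight work with asyncio.Semaphore. run_bounded is a hypothetical helper, and fake_run stands in for a function like analyze_data so the example runs without creating sandboxes.

```python
import asyncio


async def run_bounded(jobs, run_one, limit=20):
    """Run run_one(job) for every job with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await run_one(job)

    # gather preserves input order in its results
    return await asyncio.gather(*(guarded(job) for job in jobs))


# Fake runner that records peak concurrency instead of creating sandboxes.
state = {"active": 0, "peak": 0}


async def fake_run(code):
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)
    state["active"] -= 1
    return len(code)


results = asyncio.run(run_bounded(["print(1)"] * 10, fake_run, limit=3))
print(results)
print("peak concurrency:", state["peak"])
```

Setting the limit slightly below the plan's ceiling leaves headroom for sandboxes started elsewhere in the same account.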
E2B vs. Alternatives
| Feature | E2B | Modal | Daytona | Local Docker |
|---|---|---|---|---|
| Isolation | Firecracker microVM | gVisor (syscall intercept) | Docker container | Docker container |
| Startup latency | [DATA NOT AVAILABLE] until live-tested | [DATA NOT AVAILABLE] in this article | [DATA NOT AVAILABLE] in this article | Depends on local host |
| Purpose-built for AI | Yes | General compute + AI | Dev workspace | No |
| Public pricing signal checked here | E2B Hobby/Pro docs linked below | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | Your infra cost |
| Session duration signal checked here | E2B docs list 1h Hobby and 24h Pro | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | Your host limit |
| OpenAI Agents SDK path checked here | Documented by E2B and OpenAI | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | No managed integration |
| MCP signal checked here | E2B docs describe MCP gateway usage | [DATA NOT AVAILABLE] | [DATA NOT AVAILABLE] | Manual setup |
| Best for | AI agent code execution | General inference + batch | Dev workspaces | Local dev only |
The key distinction: E2B is purpose-built for untrusted LLM code execution with Firecracker microVM isolation. Modal runs on gVisor and sits inside a broader compute platform (inference, training, batch). Daytona comes from a developer workspace perspective — it's fast but designed for persistent environments, not ephemeral code execution from untrusted sources.
For production agents executing LLM-generated code, E2B is the strongest candidate verified in this article. Compare Modal, Daytona, and local Docker separately before making a cost, latency, or vendor-risk decision; this guide did not live-test those alternatives.
Pricing Snapshot
E2B's pricing and billing docs say the service charges per second while a sandbox is running. Re-check the live pricing page before procurement, because plan limits and per-second rates can change.
- Hobby (free): $100 one-time usage credit, no credit card required, up to 1-hour sandbox sessions, and up to 20 concurrent sandboxes.
- Pro ($150/month): 24-hour sessions, CPU/RAM customization, up to 100 concurrent sandboxes, and paid concurrency add-ons up to 1,100 according to the current E2B page.
- Enterprise: custom pricing. The public E2B pricing page does not expose a universal Enterprise price; use [DATA NOT AVAILABLE] for a generic minimum unless E2B provides a current quote.
At the current listed CPU rate of $0.000014/second for 1 vCPU, CPU alone would make $100 cover about 7.1 million vCPU-seconds. That is not a full workload cost estimate because RAM, storage, plan limits, idle time, and paid add-ons also matter.
A rough production calculator should use: average sandbox runtime, average CPU count, average memory, number of runs, pause/kill behavior, and concurrent-session limits. Without those inputs, monthly production cost is [DATA NOT AVAILABLE].
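The arithmetic above can be written as a small calculator. This is a sketch under stated assumptions: the CPU rate is the listed figure quoted above, while the RAM rate is deliberately left as a required input because this article does not quote one. The sketch also assumes RAM is billed per GiB-second, which should be confirmed on the live pricing page. Function names are hypothetical.

```python
CPU_RATE_PER_VCPU_SECOND = 0.000014  # USD, the listed rate quoted above


def vcpu_seconds_for_budget(budget_usd, cpu_rate=CPU_RATE_PER_VCPU_SECOND):
    """How many vCPU-seconds a budget covers if only CPU were billed."""
    return budget_usd / cpu_rate


def estimate_usage_cost(runs, avg_runtime_s, vcpus, ram_gib,
                        ram_rate_per_gib_second,
                        cpu_rate=CPU_RATE_PER_VCPU_SECOND):
    """CPU + RAM usage cost only; excludes plan fees, storage, and add-ons."""
    seconds = runs * avg_runtime_s
    cpu_cost = seconds * vcpus * cpu_rate
    ram_cost = seconds * ram_gib * ram_rate_per_gib_second
    return cpu_cost + ram_cost


print(round(vcpu_seconds_for_budget(100)))  # 7142857

# Placeholder RAM rate: NOT an E2B price, replace with the live figure.
PLACEHOLDER_RAM_RATE = 0.0
cost = estimate_usage_cost(runs=10_000, avg_runtime_s=30, vcpus=2, ram_gib=1,
                           ram_rate_per_gib_second=PLACEHOLDER_RAM_RATE)
print(f"{cost:.2f}")  # 8.40 with the RAM term zeroed out
```

Idle time counts as runtime here, so avg_runtime_s should be measured from sandbox start to kill or pause, not from the first to the last run_code call.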
Failure and Limitation Table
| Limitation | Why it matters | Mitigation |
|---|---|---|
| No live sandbox execution in the Effloow lab note | The saved evidence proves installability and SDK surface inspection, not runtime latency, isolation strength, package install success, or billing behavior. | Run a disposable live sandbox with non-secret code before using this article as a production acceptance checklist. |
| Vendor adoption figures are self-reported | E2B's homepage claims broad Fortune 100 adoption, but that is not independent market-share evidence. | Cite it as an E2B-stated adoption signal only; do not use it as a ranking or procurement proof by itself. |
| Pricing is usage-shaped | A cheap per-second CPU rate can still become expensive if agents stay idle, hold long sessions, or need high concurrency. | Enforce pause / kill, track active sandbox seconds, and test concurrency limits before launch. |
| Contexts are not tenant isolation by themselves | Code contexts help separate execution state inside a sandbox, but a multi-tenant threat model may require separate sandboxes. | Put untrusted tenants in separate sandboxes unless a security review proves shared-session contexts are sufficient. |
| Managed cloud may violate data-residency rules | Hosted execution can be unacceptable for regulated data or customer-controlled infrastructure requirements. | Use E2B only for data you are allowed to send to the provider, or evaluate a self-managed Firecracker-style architecture. |
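The mitigation "enforce pause / kill, track active sandbox seconds" from the table can start as a small in-process tracker. The sketch below is hypothetical and not part of the E2B SDK: it records start and last-use times per sandbox ID, lists sandboxes idle past a threshold, and sums active seconds when they stop. The clock is injectable so the behavior can be checked without waiting.

```python
import time


class SandboxUsageTracker:
    """Tracks running time per sandbox ID and flags idle sandboxes."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = {}
        self._last_used = {}
        self.active_seconds = 0.0  # local estimate, not a billing record

    def start(self, sandbox_id):
        now = self._clock()
        self._started[sandbox_id] = now
        self._last_used[sandbox_id] = now

    def touch(self, sandbox_id):
        """Call after each run_code to mark the sandbox as recently used."""
        self._last_used[sandbox_id] = self._clock()

    def stop(self, sandbox_id):
        started = self._started.pop(sandbox_id)
        self._last_used.pop(sandbox_id, None)
        self.active_seconds += self._clock() - started

    def idle(self, max_idle_s):
        """Return IDs whose last use is older than max_idle_s seconds."""
        now = self._clock()
        return [sid for sid, last in self._last_used.items()
                if now - last > max_idle_s]


# Simulated clock so the example is deterministic.
now = [0.0]
tracker = SandboxUsageTracker(clock=lambda: now[0])
tracker.start("a")
tracker.start("b")
now[0] = 50.0
tracker.touch("b")
now[0] = 100.0

stale = tracker.idle(max_idle_s=60)
print(stale)  # ['a']
for sandbox_id in stale:
    tracker.stop(sandbox_id)  # a real agent would also kill the sandbox here
print(tracker.active_seconds)  # 100.0
```

A periodic task that calls idle() and kills the returned sandboxes keeps long-lived agents from holding sessions open between requests.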
FAQ
Q: Does E2B work with local/self-hosted models like Ollama or vLLM?
Yes. E2B is completely LLM-agnostic — it handles the execution layer only. You point your agent at any model (Ollama, vLLM, Groq, self-hosted) and use E2B to run the code that model generates. The SDK doesn't care what produced the code string.
Q: Can I install custom packages in an E2B sandbox?
Yes. E2B sandboxes have internet access by default, so you can run pip install or npm install inside a run_code call. For repeated use, create a snapshot after installation so you don't pay the install time on every run.
Q: How does E2B compare to running code in a subprocess locally?
Local subprocess execution is not sandboxed: model-generated code can read environment variables, make network requests, access the filesystem, and potentially execute system commands. E2B runs code in an isolated microVM with its own kernel, so a kernel exploit inside the sandbox hits the guest kernel rather than your host's. For production agents handling external user input, local subprocess execution is not a safe option.
Q: What languages does E2B support?
The run_code API supports Python and JavaScript by default (the language parameter selects between them). R, Bash, and other languages can be run via subprocess inside the Python environment. Custom language support is possible with custom sandbox templates.
Q: Is there an open-source self-hosted option?
E2B's SDKs are open source under Apache 2.0. E2B also publishes its infrastructure code in a public repository (e2b-dev/infra) with self-hosting guidance; this article did not test that path, so treat self-hosting effort and feature parity as unverified. For self-hosted deployments at enterprise scale, contact E2B about the Enterprise plan. For local development, you can mock the sandbox execution locally, but production workloads with untrusted code should use the cloud service or equivalent hardware-level isolation.
Key Takeaways
E2B solves a real production problem that every code-executing agent eventually faces: where does LLM-generated code run safely? The answer isn't "in a container on your server" when the code author is an AI responding to arbitrary user prompts.
The SDK design reflects this clearly. run_code supports streaming callbacks for real-time output, code contexts for separate execution state, and snapshots for warm restarts. MCP server support means MCP-capable agent frameworks can use E2B as the execution machine when configured correctly. The OpenAI Agents SDK integration in April 2026 makes E2B a documented sandbox option, not proof that it is the best provider for every production Python agent.
When to Use / When to Skip
Use E2B when an AI agent executes code produced from external user prompts, when you need a stateful workspace with filesystem and terminal access, or when the OpenAI Agents SDK sandbox path is more valuable than maintaining your own execution fleet. It is also a good fit for data-analysis assistants, code-interpreter features, MCP tool execution, and agent workflows that need snapshots or repeatable sandbox templates.
- Skip it if your agent never runs untrusted code. If the code your agent executes is your own, pinned, and reviewed, a local container is simpler and cheaper — E2B's isolation earns its cost specifically when an LLM writes the code from a user prompt.
- Skip the managed cloud if you need full data residency control. Where every byte must stay inside your own VPC, weigh self-managed Firecracker or a provider you fully control against E2B's hosted sandboxes.
- Skip it for a quick local prototype. For a one-off script on trusted input, the SDK and per-second sandbox billing are overhead you do not need yet.
- Skip it until you can run a real sandbox acceptance test. If you cannot create an E2B account or use an API key in a disposable project, keep the integration as a source-verified candidate rather than a production-approved dependency.
What Effloow Added
E2B's docs explain the SDK; they don't tell you whether you need a sandbox at all, or how it compares to running code locally. We added the decision and a bounded, source-checked path:
- A bounded SDK walkthrough. Effloow Lab installed the SDK, inspected the run_code signature, confirmed context/snapshot/MCP method names, and captured the missing-key authentication error. It did not run live sandbox code.
- A vs-alternatives table (E2B, Modal, Daytona, local Docker) keyed to the real question: trusted vs untrusted code author.
- A source-backed fit and limitation matrix that tells teams when the official docs are enough and when they still need a live sandbox acceptance test.
The value is the "do you even need a sandbox, and if so which one" decision, not a re-paste of the SDK reference.
If your agent executes LLM-generated code from external users, E2B is a strong source-verified default to evaluate: it is purpose-built for agent sandboxes, documents OpenAI Agents SDK integration, and exposes the right SDK primitives. Treat production readiness, latency, and monthly cost as [DATA NOT AVAILABLE] until you run a disposable live sandbox acceptance test.