ARTICLES · 2026-05-07 · BY EFFLOOW CONTENT FACTORY

Cloudflare Code Mode MCP: Entire API in 1,000 Tokens

Cloudflare's Code Mode MCP server covers 2,500+ API endpoints in ~1,000 tokens via two tools: search() and execute(). Here's how it works and how to use it.
Tags: cloudflare · mcp · ai-agents · token-optimization · developer-tools · api-integration

The fundamental problem with Model Context Protocol servers and large APIs is one of scale. A typical enterprise API has hundreds to thousands of endpoints. Representing each one as a separate MCP tool means tens of thousands of tokens just for the tool definitions — before the model has said a single word. For an API like Cloudflare's, with over 2,500 endpoints, a conventional MCP server would consume approximately 1.17 million tokens per request. That exceeds the entire context window of most frontier models.
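The arithmetic is easy to check. Working backward from the article's figures, each conventional tool definition would average roughly 470 tokens (an assumed figure; name, description, and JSON Schema parameters all count against the context window):

```typescript
// Back-of-the-envelope check of the numbers above.
// Assumption: ~470 tokens per conventional MCP tool definition,
// derived from the article's 1.17M total over 2,500 endpoints.
const endpoints = 2_500;
const tokensPerToolDefinition = 470; // assumed average
const conventionalFootprint = endpoints * tokensPerToolDefinition;
const codeModeFootprint = 1_000;
const reduction = 1 - codeModeFootprint / conventionalFootprint;

console.log(conventionalFootprint.toLocaleString()); // ≈1.17M tokens
console.log((reduction * 100).toFixed(2) + "%");     // ≈99.9% reduction
```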

Cloudflare's answer, shipped in April 2026, is Code Mode: a fundamentally different architecture that reduces the token footprint by 99.9%, to around 1,000 tokens, while covering the entire API surface through just two tools.

This article explains how Code Mode works, how to use it, and why the underlying pattern matters for any developer building AI agents against large APIs.

The Core Problem: Token Inflation in MCP

Standard MCP servers work by enumerating tools. Each tool has a name, description, and parameter schema. For small APIs — a handful of endpoints, each with two or three parameters — this works fine. The token overhead is manageable, and the model can decide which tool to call with a direct lookup.

As APIs grow, this approach breaks down. Cloudflare's API spans DNS records, Workers configuration, R2 storage, Zero Trust policies, DDoS rules, Pages deployments, and dozens of other products. Flattening that surface into individual MCP tool definitions would not only exceed context limits — it would also make it difficult for the model to navigate. Picking the right tool from 2,500 options is a retrieval problem, not a reasoning problem.

Code Mode sidesteps this entirely by not enumerating tools at all.
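To see where the tokens go, consider what a single conventional tool definition looks like. The shape below is a hypothetical sketch loosely following the MCP tool-definition format; multiplied by 2,500 endpoints, definitions like this fill the context window before any work happens:

```typescript
// Hypothetical sketch of one conventional MCP tool definition.
// Field names follow the MCP tools/list convention; the endpoint
// and parameter details here are illustrative.
const listDnsRecords = {
  name: "list_dns_records",
  description: "List DNS records for a Cloudflare zone, with optional filters.",
  inputSchema: {
    type: "object",
    properties: {
      zone_id: { type: "string", description: "Zone identifier" },
      type: { type: "string", description: "Record type, e.g. A, AAAA, CNAME" },
      per_page: { type: "number", description: "Results per page" },
    },
    required: ["zone_id"],
  },
};

// Serialized size gives a rough feel for the per-tool cost.
console.log(JSON.stringify(listDnsRecords).length);
```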

How Code Mode Works

Code Mode exposes exactly two tools:

search(query) — semantic search over the entire Cloudflare API surface, represented as typed TypeScript types. The model uses this to discover what operations are available and what their inputs and outputs look like.

execute(code) — takes a JavaScript string and runs it inside a secure V8 isolate (Cloudflare's Dynamic Worker sandbox). The code can call any Cloudflare API operation, chain multiple requests, handle pagination, and compose results — all in one execution.

The workflow is:

  1. Model calls search() to understand what the API can do
  2. Model writes JavaScript against the typed API representation
  3. Model calls execute() with that code
  4. The V8 isolate runs it with access to the Cloudflare API client and returns results
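The four-step loop can be mocked in miniature. Everything below is a stand-in: the real server does semantic search over generated type definitions and runs code in a V8 isolate, not an in-process `Function` call.

```typescript
// Minimal, self-contained mock of the four-step workflow above.
const typeIndex: Record<string, string> = {
  "dns records": "interface DNSRecord { id: string; content: string }",
};

// Step 1: discovery. Stand-in for semantic search: naive substring match.
function search(query: string): string {
  for (const [key, types] of Object.entries(typeIndex)) {
    if (query.toLowerCase().includes(key)) return types;
  }
  return "no matching operations";
}

// Steps 3-4: execution. Stand-in for the isolate: run the snippet locally.
async function execute(code: string): Promise<unknown> {
  return new Function(`return (async () => { ${code} })()`)();
}

async function main() {
  const types = search("list and delete dns records"); // step 1
  // Step 2 happens inside the model; here we hardcode the "generated" code.
  const result = await execute("return 1 + 1;");       // steps 3-4
  console.log(types);
  console.log(result); // 2
}
main();
```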

The insight is that language models are better at writing code than at choosing the right tool from a list of thousands. Code Mode leverages this. Instead of selecting tool A, then tool B, then tool C in sequence, the model writes a small script that orchestrates all three operations together — and executes it in one round trip.

Security Architecture: Dynamic Worker Isolates

User-generated code running against a production API requires careful sandboxing. Code Mode uses Cloudflare's Dynamic Workers — V8 isolates with strict constraints:

  • No file system access
  • No environment variables exposed (prevents prompt injection attacks from leaking credentials)
  • Outbound requests controlled via explicit handlers — the isolate cannot make arbitrary external calls
  • No persistent state between executions

This means the model cannot use execute() to exfiltrate environment variables, make unexpected calls to third-party services, or persist data across sessions. The isolation boundary is enforced at the platform level, not just via prompt engineering.
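The design idea behind this boundary, capability-based execution, can be illustrated in miniature: generated code receives only what the host explicitly hands it. This is an illustration of the principle, not Cloudflare's actual sandbox, which enforces isolation at the V8 and platform level rather than in JavaScript.

```typescript
// Miniature illustration of capability-based execution: the snippet only
// sees what we explicitly pass in (`api`). This is NOT real sandboxing --
// Cloudflare's isolation is enforced by the platform, not by this pattern.
async function runWithCapabilities(code: string, api: object): Promise<unknown> {
  const fn = new Function("api", `return (async () => { ${code} })()`);
  return fn(api); // `api` is the only capability handed to the snippet
}

const fakeApi = { ping: () => "pong" };
runWithCapabilities("return api.ping();", fakeApi).then((v) => console.log(v)); // "pong"
```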

Installing and Using @cloudflare/codemode

Cloudflare open-sourced the Code Mode SDK as part of its Agents SDK. The npm package is @cloudflare/codemode.

npm install @cloudflare/codemode

The two key exports are:

import { createCodeTool, DynamicWorkerExecutor } from "@cloudflare/codemode";
  • createCodeTool — converts your existing tools (or an OpenAPI spec) into a typed TypeScript API surface that the model can search and write code against
  • DynamicWorkerExecutor — handles the secure execution side, running model-generated code in a V8 sandbox

Using the Cloudflare MCP Server Directly

If you just want to connect an AI agent to the Cloudflare API without building anything from scratch, Cloudflare provides a hosted MCP server. Add it to your MCP client configuration:

{
  "mcpServers": {
    "cloudflare": {
      "url": "https://mcp.cloudflare.com/sse"
    }
  }
}

Code Mode is enabled by default. To disable it and fall back to conventional tool enumeration (at the cost of 1,000x more tokens), append ?codemode=false to the URL:

https://mcp.cloudflare.com/sse?codemode=false

The repository is at github.com/cloudflare/mcp.

What the Model Actually Does in a Code Mode Session

Here is an example interaction to make the pattern concrete. Suppose an agent needs to list all DNS records for a zone and delete any that point to a deprecated IP.

Without Code Mode: the agent would need to call list_dns_records (paginating through multiple calls for large zones), parse the results, filter for records pointing at the deprecated IP, then call delete_dns_record once per matching record. Depending on the number of records, this could be 20–100 tool calls.

With Code Mode:

Agent → search("list and delete DNS records by IP")
Server → returns typed TypeScript interface for DNSRecord operations

Agent → execute(`
  const records = await cf.dns.records.list({ zone_id: ZONE_ID });
  const deprecated = records.filter(r => r.content === "203.0.113.42");
  await Promise.all(deprecated.map(r => cf.dns.records.delete(r.id, { zone_id: ZONE_ID })));
  return { deleted: deprecated.length };
`)
Server → runs in V8 isolate, returns { deleted: 3 }

Two tool calls instead of dozens. The model writes the logic; the isolate executes it.
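The typed surface that search() returns in step one might look something like this. The names and shapes below are illustrative, not the real generated definitions, which come from Cloudflare's OpenAPI spec:

```typescript
// Hypothetical sketch of the typed interface search() could return for
// DNS operations. Field names and client shape are assumptions.
interface DNSRecord {
  id: string;
  type: "A" | "AAAA" | "CNAME" | "TXT";
  name: string;
  content: string; // e.g. an IP address for A records
}

interface DNSRecordsClient {
  list(params: { zone_id: string }): Promise<DNSRecord[]>;
  delete(id: string, params: { zone_id: string }): Promise<void>;
}

// Type signatures like these are enough for the model to write correct
// calls without ever seeing the full OpenAPI spec inline.
const example: DNSRecord = {
  id: "abc123",
  type: "A",
  name: "app.example.com",
  content: "203.0.113.42",
};
console.log(example.content);
```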

Building Your Own Code Mode MCP Server

The pattern applies to any large API, not just Cloudflare. If you have an OpenAPI spec, createCodeTool can generate the typed interface:

import { createCodeTool, DynamicWorkerExecutor } from "@cloudflare/codemode";
import openApiSpec from "./your-api-spec.json";

const executor = new DynamicWorkerExecutor({
  // handler that routes execute() calls to your API
  fetch: yourApiFetchHandler,
});

const codeTool = createCodeTool({
  spec: openApiSpec,
  executor,
  name: "your_api",
  description: "Access Your Company API with code-mode efficiency",
});

// codeTool is now an MCP tool definition with search() + execute()
// Add it to your MCP server alongside any other tools

This makes Code Mode a reusable architectural pattern, not just a Cloudflare-specific feature.

Token Comparison

| Approach | Token Footprint | Covers All Endpoints | Multi-step Orchestration |
| --- | --- | --- | --- |
| Conventional MCP (2,500 tools) | ~1.17M tokens | Yes | Sequential (one call at a time) |
| Code Mode (2 tools) | ~1,000 tokens | Yes | Parallel (code orchestrates) |
| Selective MCP (hand-picked tools) | ~5,000–50,000 tokens | No | Sequential |
| REST API via system prompt | ~2,000–10,000 tokens | Partial | Manual prompt engineering |

The "selective MCP" row is the common workaround: manually expose only the 50–100 most frequently used endpoints. This keeps token usage manageable but means the agent cannot reach the long tail of API operations. Code Mode eliminates the tradeoff.

Limitations to Know

Model capability dependency — Code Mode works best with models that are good at generating code. The model is writing JavaScript, not selecting from a list. If the model cannot write correct, working JavaScript for the API surface in question, the pattern degrades.

Debugging complexity — when a conventional MCP tool call fails, the failure maps directly to one operation. When model-generated code fails, debugging the JavaScript adds a layer. Error messages from the V8 isolate need to be parseable by the model for self-correction.

Cold starts — Dynamic Worker isolates have cold start latency. Cloudflare reports sub-millisecond startup for warm isolates, but the first execution after a long idle period is slower. This matters more for interactive, latency-sensitive applications than for background automation workflows.

V8 sandbox constraints — the isolate cannot make calls to arbitrary external services. If your workflow needs the Cloudflare API to also call out to a third-party webhook or database during the same execution, that requires explicit configuration of allowed outbound handlers.

Verdict: Code Mode is a genuinely useful architectural pattern for the specific problem of connecting AI agents to large APIs without exploding the context window. The 99.9% token reduction is real and the two-tool design is elegant. If you are building agents that need to interact with large, structured APIs — Cloudflare's or your own — the @cloudflare/codemode SDK is worth evaluating. The main cost is that your agent needs to be a competent code generator. For GPT-4o-class models and above, that is not a significant barrier.

FAQ

Q: Does Code Mode work with any MCP client?

Yes. Code Mode is just a specific way to implement MCP tools — two tools instead of many. Any MCP client that supports standard tool calls can use Code Mode servers without modification.

Q: Can I use Code Mode with my own API (not Cloudflare's)?

Yes. The @cloudflare/codemode SDK lets you wrap any OpenAPI spec. You provide the spec and a fetch handler; createCodeTool generates the typed interface and DynamicWorkerExecutor handles sandboxed execution.

Q: Is the Cloudflare MCP server free to use?

The hosted MCP server at mcp.cloudflare.com uses your Cloudflare API credentials. There is no additional charge beyond normal Cloudflare API usage. The @cloudflare/codemode npm package is open source.

Q: What happens if the model generates incorrect JavaScript in execute()?

The V8 isolate runs the code and returns any runtime errors. The MCP server passes these back to the model, which can self-correct and call execute() again. The quality of self-correction depends on the underlying model.

Q: How does the model know what APIs are available inside execute()?

The search() tool returns typed TypeScript type definitions for the API surface. The model uses these type signatures to write correct JavaScript without needing to see the full OpenAPI spec inline.
