ARTICLES · 2026-05-25 · BY EFFLOOW CONTENT FACTORY

Google Managed Agents: Gemini API Runtime Guide 2026

A developer guide to Google Managed Agents in the Gemini API, with local sandbox evidence for agent package and policy design.

gemini managed-agents antigravity interactions-api agent-runtime sandbox-poc ai-frameworks

Google introduced Managed Agents in the Gemini API on May 19, 2026, as a preview runtime for running the Antigravity agent in a Google-hosted Linux environment. The official announcement says the agent can reason, call tools, execute code, manage files, and browse the web inside an isolated ephemeral environment, while custom agents can be defined with versionable AGENTS.md and SKILL.md files.

That is a meaningful shift for developer teams. Until now, most production-grade agent work meant assembling the model, orchestration loop, tool executor, workspace, browser access, file handling, logs, retry behavior, and security boundary yourself. Managed Agents moves part of that execution layer into the Gemini API and Interactions API surface.

This guide explains what changed, where it fits, and how to prepare a project for it without overstating the preview. Effloow Lab also ran a local sandbox PoC for the part that can be tested safely without credentials: packaging an agent instruction file, adding a skill file, and enforcing a deterministic policy gate before any irreversible action. The evidence note is in data/lab-runs/google-managed-agents-gemini-api-developer-guide-2026.md.

Important boundary: Effloow Lab did not call the live Gemini Managed Agents API in this run. No API key, production credential, Google-hosted sandbox, billing behavior, quota behavior, MCP connectivity, or workspace download path was tested. Treat the PoC as preparation evidence, not as a product benchmark.

Why This Matters

Managed Agents are not just another model endpoint. The core idea is that a developer can ask the Gemini API to run an agent that already has a managed execution environment. According to Google's announcement, a single call to the Antigravity agent provisions a remote Linux environment where the agent can use the harness, execute code, manage files, and browse the web. The same announcement says each interaction creates or receives an environment that can be reused in follow-up calls so files and state can continue across the session.

That matters because agent infrastructure has become the messy part of the stack. A chatbot can return text. An agent that edits files, writes reports, downloads artifacts, searches the web, or runs Python needs state, isolation, auditability, and failure boundaries. If those are missing, the system becomes either a demo or an operational risk.

Google is also steering new agentic work toward the Interactions API. The docs describe it as a beta interface designed for agentic workflows, server-side state management, complex multimodal conversations, typed execution steps, and long-running or background tasks. The older generateContent API remains supported, but Google positions the Interactions API as the standard for new agentic projects.

For developers, the immediate takeaway is practical: start separating agent behavior from application glue. Put the role, allowed work, data boundaries, and approval rules in files that can be reviewed. Keep irreversible actions behind policy gates. Assume the hosted runtime can remove some infrastructure work, but not product responsibility.

What Google Announced

The official Google post says Managed Agents in the Gemini API are powered by the new Antigravity agent, built on Gemini 3.5 Flash, and available through the Interactions API and Google AI Studio. It also says developers can define custom agents with instructions, skills, and data, then register them as managed agents.

The preview has three developer-facing ideas:

A hosted execution environment for agent work.
A file-based way to define custom agent behavior with AGENTS.md and SKILL.md.
A Gemini API path that can preserve environment state across interactions.

The tools docs are relevant here because they separate built-in tools from custom execution. Gemini's built-in tools include Google Search, Google Maps, Code Execution, URL Context, Computer Use in preview, and File Search. For built-in tools, Google handles execution inside the API call. For custom tools and computer-use-style work, the application still has execution responsibilities.

That split is the key architectural line. Managed Agents may reduce the amount of execution infrastructure you own, but it does not mean every integration becomes magically safe. If a workflow can mutate billing records, delete files, send money, or publish content, the application still needs approval boundaries, scoped credentials, and reviewable artifacts.

Effloow Lab Sandbox PoC

The local sandbox asked a narrow question: before calling any hosted agent runtime, can a team express a useful agent package and enforce a simple policy gate around it?

The answer was yes. The run created a throwaway directory at /tmp/effloow-google-managed-agent-poc, wrote an AGENTS.md file for an invoice evidence agent, wrote a skills/invoice-audit/SKILL.md file, loaded a four-row CSV, and ran two Node scripts.

The validator checked four requirements:

{
  "hasRole": true,
  "hasHumanApprovalRule": true,
  "hasNoMutationRule": true,
  "hasSkillProcess": true
}
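
The PoC's Node validator is not reproduced in this article. As a rough illustration of the same kind of check, here is a minimal Python sketch that assumes plain case-insensitive text matching. The marker patterns and the `validate_package` helper are hypothetical, and the sample files are trimmed for brevity.

```python
import re

# Hypothetical marker patterns; the PoC's actual matching rules are not shown here.
AGENT_RULES = {
    "hasRole": r"^role:",
    "hasHumanApprovalRule": r"human approval",
    "hasNoMutationRule": r"never mutate",
}
SKILL_RULES = {
    "hasSkillProcess": r"^process:",
}


def validate_package(agents_md: str, skill_md: str) -> dict[str, bool]:
    """Return one boolean per requirement, keyed like the PoC output."""
    flags = re.IGNORECASE | re.MULTILINE
    result = {k: bool(re.search(p, agents_md, flags)) for k, p in AGENT_RULES.items()}
    result.update({k: bool(re.search(p, skill_md, flags)) for k, p in SKILL_RULES.items()})
    return result


# Trimmed sample inputs for illustration only.
AGENTS_MD = """# Invoice Evidence Agent

Role: inspect uploaded invoice CSV files and produce a reconciliation checklist.

Constraints:
- Never mutate source files.
- Ask for human approval before any irreversible action.
"""

SKILL_MD = """# invoice-audit

Process:
1. Parse CSV headers.
"""

report = validate_package(AGENTS_MD, SKILL_MD)
print(report)
```

Text matching like this is brittle, since a reworded constraint would read as missing. That bias toward false failures is arguably the safer direction for a preflight check.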

The policy simulation then allowed read/report actions and denied an irreversible payment action:

{
  "summary": {
    "totalRows": 4,
    "byStatus": {
      "paid": 2,
      "pending": 1,
      "failed": 1
    },
    "negativeAmounts": ["INV-101"],
    "missingCurrency": ["INV-102"]
  },
  "allowed": ["read_csv", "write_report"],
  "denied": ["refund_payment"],
  "report": "artifacts/report.md"
}

This did not prove anything about Google's hosted sandbox. It did prove a useful preparation pattern: define agent behavior as reviewable files, validate the package before runtime, and make the policy gate deterministic enough that a dangerous action is denied even when the task context contains payment-like language.
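
One way to get that determinism is to decide purely on the action name and never on the task text. The sketch below is an assumption about how such a gate could be written, not the PoC's script. The `gate` function and the allowlist are hypothetical; the action names are taken from the PoC output.

```python
# Hypothetical allowlist, using the action names from the PoC output.
ALLOWED_ACTIONS = frozenset({"read_csv", "write_report"})


def gate(requested_actions: list[str], task_context: str = "") -> dict[str, list[str]]:
    """Split requested actions into allowed and denied lists.

    The decision depends only on the action name. task_context is accepted
    but deliberately ignored, so wording in the task cannot widen the allowlist.
    """
    allowed = [a for a in requested_actions if a in ALLOWED_ACTIONS]
    denied = [a for a in requested_actions if a not in ALLOWED_ACTIONS]
    return {"allowed": allowed, "denied": denied}


decision = gate(
    ["read_csv", "write_report", "refund_payment"],
    task_context="Customer asks for a refund on INV-101; please process payment.",
)
print(decision)
```

Because the gate denies by default, any action name it has never seen is denied too.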

For a real Managed Agents integration, this should become a preflight step. Before an agent package reaches the API, CI should fail the build if the package lacks a role, an approval rule, a no-mutation boundary for read-only jobs, or a skill-specific process.

Architecture Pattern

Think of Managed Agents as an execution layer, not as the whole product. The application still needs to own user intent, authentication, billing rules, audit logs, and final decisions.

A conservative architecture looks like this:

User request
  -> application policy check
  -> agent package selection
  -> Managed Agent interaction
  -> artifact download or response inspection
  -> deterministic verification
  -> human approval for risky actions
  -> application-owned mutation

The important part is the last two steps. If the agent suggests a refund, deployment, schema change, email send, or invoice write-off, the application should verify the artifact and require approval before the mutation. The agent can prepare the work. The product should decide whether the work is allowed.
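
The final two steps can be sketched as a small decision function owned by the application. Everything here is illustrative: the `Proposal` type, the `may_mutate` helper, and the set of risky action names are assumptions, not part of any Google API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of actions the application treats as irreversible.
RISKY_ACTIONS = frozenset(
    {"refund", "deploy", "schema_change", "email_send", "invoice_write_off"}
)


@dataclass(frozen=True)
class Proposal:
    action: str                        # what the agent suggests
    artifact_verified: bool            # result of deterministic verification
    approved_by: Optional[str] = None  # human approver, if any


def may_mutate(p: Proposal) -> bool:
    """The application, not the agent, decides whether a mutation may run."""
    if not p.artifact_verified:
        return False
    if p.action in RISKY_ACTIONS and p.approved_by is None:
        return False
    return True


print(may_mutate(Proposal("refund", artifact_verified=True)))
print(may_mutate(Proposal("refund", artifact_verified=True, approved_by="alice")))
```

The first call is rejected because a risky action has no approver, and the second is allowed. An unverified artifact is rejected regardless of who approved it.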

This is similar to the pattern in our Temporal durable execution guide, where durable workflow state matters more than model cleverness once a job runs for more than one request. It also fits the direction of our MCP roadmap guide, where agent tools need discovery, task state, and explicit server boundaries.

Google's Interactions API docs make the state point explicit: the API supports server-side history management through previous_interaction_id, with store=true enabled by default and a stateless option available through store=false. That is useful, but it also means developers should decide which requests deserve persistent server-side state and which should remain stateless for privacy, cost, or control reasons.
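
That decision can be made explicit in code before any request is built. The sketch below builds only a plain options dict and calls no SDK. The field names `store` and `previous_interaction_id` come from the docs summary above, while the sensitivity flag and the `build_interaction_options` helper are hypothetical. It also assumes that a stateless request should not chain to earlier stored turns, which should be checked against the current docs.

```python
from typing import Optional


def build_interaction_options(
    contains_sensitive_data: bool,
    previous_interaction_id: Optional[str] = None,
) -> dict:
    """Build only the state-related options for a request.

    This is a plain dict, not an SDK call. The sensitivity flag is a
    hypothetical input supplied by the application's own data classification.
    """
    if contains_sensitive_data:
        # Assumption of this sketch: stateless requests do not reference
        # earlier stored state.
        return {"store": False}
    options: dict = {"store": True}
    if previous_interaction_id is not None:
        options["previous_interaction_id"] = previous_interaction_id
    return options


print(build_interaction_options(contains_sensitive_data=True, previous_interaction_id="abc"))
print(build_interaction_options(contains_sensitive_data=False, previous_interaction_id="abc"))
```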

Setup Checklist

Before wiring a production feature to Managed Agents, prepare the repository as if the agent package will be reviewed by a security engineer.

Start with the agent role:

# Invoice Evidence Agent

Role: inspect uploaded invoice CSV files and produce a reconciliation checklist.

Constraints:
- Never call external payment APIs.
- Never mutate source files.
- Write findings to artifacts/report.md.
- Ask for human approval before any irreversible action.

Then add skills as small, task-specific documents:

# invoice-audit

Use this skill when a CSV contains invoice_id, amount, currency, and status columns.

Process:
1. Parse CSV headers.
2. Count rows by status.
3. Flag negative amounts and missing currencies.
4. Emit a markdown summary.

Keep each skill narrow. A broad "finance agent" skill is hard to validate. A skill that reads a known CSV shape and emits a markdown report is much easier to inspect, test, and deny when the task drifts.
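
To show how small such a skill's deterministic core can be, here is a Python sketch of the four process steps. The CSV fixture is invented for illustration and is not the PoC's actual file. The `summarize` and `to_markdown` helpers are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented fixture for illustration; this is not the PoC's actual CSV.
SAMPLE_CSV = """invoice_id,amount,currency,status
INV-100,120.00,USD,paid
INV-101,-45.50,USD,pending
INV-102,80.00,,failed
INV-103,15.25,EUR,paid
"""

REQUIRED_COLUMNS = {"invoice_id", "amount", "currency", "status"}


def summarize(csv_text: str) -> dict:
    reader = csv.DictReader(io.StringIO(csv_text))
    # Step 1: parse headers and refuse files that do not match the skill's shape.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    return {
        "totalRows": len(rows),
        # Step 2: count rows by status.
        "byStatus": dict(Counter(r["status"] for r in rows)),
        # Step 3: flag negative amounts and missing currencies.
        "negativeAmounts": [r["invoice_id"] for r in rows if float(r["amount"]) < 0],
        "missingCurrency": [r["invoice_id"] for r in rows if not r["currency"].strip()],
    }


def to_markdown(summary: dict) -> str:
    # Step 4: emit a markdown summary.
    lines = ["# Invoice audit", "", f"Total rows: {summary['totalRows']}", ""]
    lines += [f"- {status}: {count}" for status, count in summary["byStatus"].items()]
    lines.append(f"- negative amounts: {', '.join(summary['negativeAmounts']) or 'none'}")
    lines.append(f"- missing currency: {', '.join(summary['missingCurrency']) or 'none'}")
    return "\n".join(lines)


summary = summarize(SAMPLE_CSV)
print(to_markdown(summary))
```

Rejecting unexpected headers up front is what makes task drift detectable: a file with a different shape fails loudly instead of producing a plausible-looking report.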

Add CI checks for the package:

node validate-agent-package.mjs
node simulate-policy-gate.mjs

The validator should fail when required safety text is missing. The policy simulation should fail if a risky action is accidentally allowed. In a real team, the checks would be more structured than simple text matching, but simple checks are still better than sending unreviewed behavior into a hosted runtime.
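
What makes either check useful in CI is the exit status. A minimal sketch of that wiring follows, assuming the two checks return dicts shaped like the PoC output. The `ci_failures` and `exit_code` helpers and the risky-action set are hypothetical.

```python
# Hypothetical set of actions that must never appear in the allowed list.
RISKY_ACTIONS = frozenset({"refund_payment"})


def ci_failures(validation: dict[str, bool], gate_result: dict[str, list[str]]) -> list[str]:
    """Collect human-readable reasons the package should fail CI."""
    failures = [f"missing requirement: {name}" for name, ok in validation.items() if not ok]
    failures += [
        f"risky action allowed: {action}"
        for action in gate_result.get("allowed", [])
        if action in RISKY_ACTIONS
    ]
    return failures


def exit_code(failures: list[str]) -> int:
    # A non-zero exit status is what makes a CI step fail.
    return 1 if failures else 0


# Deliberately broken inputs to show both failure kinds.
failures = ci_failures(
    {"hasRole": True, "hasHumanApprovalRule": False},
    {"allowed": ["read_csv", "refund_payment"], "denied": []},
)
for reason in failures:
    print(reason)
print("exit code:", exit_code(failures))
```

In a real script the result would be passed to `sys.exit` so the pipeline stops before the package is registered.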

What To Use It For First

The best early use cases are artifact-producing workflows where the agent can work inside a sandbox and the product can inspect the output before doing anything irreversible.

Good candidates:

Data cleanup reports over small uploaded files.
Research summaries that cite source URLs.
Repository analysis that writes a markdown issue brief.
Test generation where a developer reviews the patch.
Back-office reconciliation checklists.
Migration planning documents.

Poor first candidates:

Direct payment changes.
Production database writes.
Legal, medical, or financial decisions without expert review.
Autonomous customer messaging.
Deployment flows with no rollback gate.
Agents that receive broad credentials and choose their own action space.

The difference is not whether the agent is hosted by Google, Anthropic, OpenAI, or a self-hosted stack. The difference is whether the workflow leaves inspectable evidence before it mutates the outside world.

Common Mistakes

The first mistake is treating a managed runtime as a managed product. Google can host the sandbox, but your product still owns authorization, user consent, logging, incident response, data classification, and customer-visible actions.

The second mistake is hiding agent behavior in prompts that only exist in a dashboard. If AGENTS.md, SKILL.md, and policy files live in the repository, they can be reviewed, versioned, tested, and rolled back. If they live only as UI state, the review loop is weaker.

The third mistake is sending too much data into the environment. DataCamp's tutorial shows inline data examples and also points to repository-based sources for larger inputs. That is useful, but teams still need a data-minimization rule. Send the smallest useful file, redact secrets, and avoid mounting broad repositories when a narrow fixture or export will do.
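
A data-minimization rule can start as a small preprocessing step that runs before anything is uploaded. The sketch below assumes CSV input. The `minimize_csv` helper is hypothetical, and the two secret patterns are illustrative examples, not a complete detector.

```python
import csv
import io
import re

# Illustrative secret-looking patterns only; a real project needs a reviewed, broader list.
SECRET_PATTERN = re.compile(r"(sk-[A-Za-z0-9]{8,}|AKIA[0-9A-Z]{16})")


def minimize_csv(csv_text: str, keep_columns: list[str]) -> str:
    """Keep only the named columns and mask secret-looking values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep_columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        cleaned = {k: SECRET_PATTERN.sub("[REDACTED]", row.get(k) or "") for k in keep_columns}
        writer.writerow(cleaned)
    return out.getvalue()


# Invented sample row with an email column and a token-like string in a note.
raw = (
    "invoice_id,amount,customer_email,note\n"
    "INV-100,120.00,a@example.com,token sk-abcdef123456\n"
)
result = minimize_csv(raw, ["invoice_id", "amount", "note"])
print(result)
```

Selecting columns by allowlist means a new sensitive column added upstream is dropped by default, with no deny rule to write.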

The fourth mistake is benchmarking the preview from anecdotes. This article does not include speed, cost, success-rate, or quality claims because this run did not measure the live API. Pricing and preview quota details should be checked from Google's current pricing and billing pages before any cost model is published. If a detail cannot be verified, write [DATA NOT AVAILABLE] rather than guessing.

FAQ

Q: Is Gemini Managed Agents generally available?

Google announced Managed Agents in the Gemini API as a preview on May 19, 2026. The official post also says enterprise support on the Gemini Enterprise Agent Platform is in private preview. Treat availability, quotas, regions, and billing as moving parts until verified in your own account.

Q: Do Managed Agents replace LangGraph, Temporal, or custom orchestration?

Not always. Managed Agents can reduce sandbox and execution setup for some workflows. Durable business processes may still need workflow engines, queues, retries, human approval states, and application-owned mutations. If a job needs guaranteed retries over days, compare the hosted agent runtime with durable workflow tools before moving the whole process.

Q: Can Managed Agents use MCP?

Google's announcement and related coverage describe MCP support as part of the managed-agent direction, and Google's broader agent documentation includes sections on tools and agents. This Effloow Lab run did not test MCP connectivity. For production work, verify current MCP support in the Gemini API docs and test with a non-sensitive MCP server first.

Q: What should be in AGENTS.md?

At minimum, include the agent role, allowed data sources, output artifact path, forbidden actions, escalation rules, and human approval requirements. The file should be short enough to review and specific enough that a validator can detect missing safety boundaries.

Q: Should developers start with Google AI Studio or the API?

Use Google AI Studio to inspect the preview and templates quickly. Use the API path when you need version control, CI checks, repeatable environments, artifact handling, and product integration. Do not skip repository review just because the hosted playground makes setup easier.

Key Takeaways

Google Managed Agents are important because they move agent execution closer to the model provider's API surface. The preview promises a hosted Antigravity agent, remote Linux environment, file and code execution, web browsing, and custom behavior through AGENTS.md and SKILL.md.

The safe developer response is not to remove architecture. It is to move more behavior into reviewable files, validate those files before runtime, keep risky actions behind deterministic policy gates, and require evidence before mutation.

Effloow Lab's local sandbox PoC supports that preparation pattern. It does not prove live Gemini API behavior. The next step for a production team is a credentialed preview test with a harmless artifact-producing workflow, followed by explicit checks for state retention, logs, downloads, quota limits, pricing, MCP behavior, and region availability.
