Effloow
ARTICLES ·2026-06-04 ·BY EFFLOOW CONTENT FACTORY

Microsoft ACS SDK: Agent Control Sandbox PoC

Test Microsoft ACS-style agent control locally with the Agent Governance SDK, policy rules, tool-call denial, and audit verification.
microsoft acs agent-governance ai-agents agent-security typescript sandbox-poc

Microsoft's Agent Control Specification is one of the more practical Build 2026 ideas because it targets a gap every serious agent team eventually hits: prompts are not controls. If an AI agent can call tools, write files, update tickets, query internal data, or invoke another agent, the runtime needs a deterministic place to say "allow," "deny," or "modify" before the action reaches the real system.

The naming is still easy to confuse. Microsoft's Build recap calls ACS the Agent Control Specification, the public community site uses Agent Control Standard, and the installable package Effloow Lab tested is @microsoft/agent-governance-sdk@4.0.0, a public-preview TypeScript SDK from the Agent Governance Toolkit. This article uses "ACS-style control" for the pattern and is careful not to claim that every framework-specific adapter is generally available.

Effloow Lab ran a local sandbox PoC for this article. The lab installed the TypeScript SDK, installed the Python agent-governance-toolkit==4.0.0 package in a virtualenv, and used the SDK's GenericFrameworkAdapter to allow one simulated tool call while denying a destructive shell-style action before its handler ran. The evidence note is at data/lab-runs/microsoft-acs-sdk-agent-control-multi-framework-sandbox-poc-2026.md.

Effloow Lab: local sandbox on macOS with Python 3.12.8, Node v25.9.0, npm 11.12.1, @microsoft/agent-governance-sdk@4.0.0, and agent-governance-toolkit==4.0.0. No model API, Microsoft Foundry deployment, LangChain run, CrewAI run, or production MCP server was tested.

Why ACS Matters

Most agent frameworks already have a way to define tools. That is not the same as governing tools. A LangChain, CrewAI, OpenAI Agents SDK, Semantic Kernel, or custom agent can expose a tool schema and still leave critical questions to application code: who is allowed to call the tool, which arguments are safe, which state transitions are legal, what must be logged, and when a human approval should interrupt the flow.

Microsoft's Foundry Build 2026 recap frames ACS as an open source control layer for deterministic checks at five checkpoints: input, LLM, state, tool execution, and output. The related trust-stack announcement describes ACS as a portable policy contract for agent safety controls, expressed in YAML and intended to work across frameworks.

The Agent Control Standard site makes the same point in different words: agent platforms should expose runtime hooks, open source tooling should enforce policies through those hooks, and enterprises should be able to plug in their own classifiers, detectors, and security tools. That puts ACS closer to a runtime control plane than a prompt-writing convention.

This direction also aligns with the broader agent security landscape. OWASP's Agentic AI threats and mitigations guide treats autonomous agents as systems with goal hijacking, tool misuse, identity abuse, memory poisoning, cascading failures, and rogue-agent risks. Those are runtime risks. A system prompt can describe desired behavior, but it cannot reliably prove that a tool call was blocked before execution.

What Shipped Versus What Is Still Emerging

Developers should separate three layers.

First, ACS is the open specification direction. The ACS GitHub repository describes instrumentable, traceable, and inspectable agents, plus work around OpenTelemetry mapping and Agent Bills of Materials. Its roadmap still reads like an evolving standard: public preview documentation and definitions now, then deeper instrumentation and Guardian Agent samples later.

Second, Microsoft has a concrete Agent Governance Toolkit. The toolkit repository lists install commands for Python, TypeScript, .NET, Rust, Go, and developer surfaces such as Copilot CLI and Claude Code. The TypeScript package page exposed @microsoft/agent-governance-sdk@4.0.0 as a public preview package for identity, trust scoring, policy evaluation, and audit logging.

Third, framework integration is the product promise. The Build material says ACS and related tracing/evaluation tools are intended to work across major stacks. The local PoC did not validate real LangChain, CrewAI, OpenAI Agents SDK, Anthropic Agents SDK, AutoGen, Semantic Kernel, Microsoft.Extensions.AI, or MCP integrations. It validated the generic adapter pattern that such integrations can use.

That distinction matters. The right takeaway is not "rewrite your agent stack around ACS today." The right takeaway is "start treating runtime control points as a first-class architecture layer, and watch ACS/Agent Governance Toolkit maturity closely."

What the Sandbox Installed

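The sandbox ran in /tmp/effloow-acs-poc-2026 and started with local environment checks. The output below shows the Python, Node, and npm versions, followed by a failed lookup for a bare pip command: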

Python 3.12.8
v25.9.0
11.12.1
zsh:1: command not found: pip

The missing bare pip command was not a blocker. The lab used python3 -m venv and python3 -m pip inside the virtualenv.

Package discovery found the TypeScript SDK:

{
  "version": "4.0.0",
  "name": "@microsoft/agent-governance-sdk",
  "description": "Public Preview — TypeScript SDK for the Agent Governance Toolkit: agent identity, trust scoring, policy evaluation, and audit logging"
}

Python package discovery found:

agent-governance-toolkit (4.0.0)
Available versions: 4.0.0, 3.7.0, 3.6.0, 3.5.0, 3.4.0, 3.3.0, 3.2.2, 3.2.1, 3.2.0, 3.1.0, 3.0.2, 3.0.1, 3.0.0, 2.3.0, 2.1.0

The TypeScript install completed cleanly:

npm init -y
npm install @microsoft/agent-governance-sdk@4.0.0

Relevant output:

added 7 packages, and audited 8 packages in 937ms
found 0 vulnerabilities

The Python install also completed in the virtualenv:

/tmp/effloow-acs-poc-2026/.venv/bin/python -m pip install 'agent-governance-toolkit==4.0.0'

Relevant output:

Successfully installed agent-governance-toolkit-4.0.0 annotated-types-0.7.0 click-8.4.1 pydantic-2.13.4 pydantic-core-2.46.4 pyyaml-6.0.3 typing-extensions-4.15.0 typing-inspection-0.4.2

The SDK exported the pieces needed for a local checkpoint demo: AgentMeshClient, GenericFrameworkAdapter, PolicyEngine, AuditLogger, TraceCapture, GovernanceVerifier, McpSecurityScanner, and TrustManager.

Reproduce the Local Tool-Call Gate

The PoC used the SDK's generic adapter as a framework-neutral stand-in for a real LangChain callback, CrewAI decorator, OpenAI Agents hook, or custom middleware wrapper.

const {
  AgentMeshClient,
  GenericFrameworkAdapter,
} = require("@microsoft/agent-governance-sdk");

async function main() {
  const client = AgentMeshClient.create("effloow-sandbox-agent", {
    policyRules: [
      { action: "framework.tool_call.search_docs", effect: "allow" },
      { action: "framework.tool_call.summarize", effect: "allow" },
      { action: "framework.tool_call.shell.rm", effect: "deny" },
      { action: "*", effect: "deny" }, // catch-all: deny any action without an explicit rule
    ],
  });

  const adapter = new GenericFrameworkAdapter(client);

  const allowed = await adapter.run(
    {
      name: "search_docs",
      kind: "tool_call",
      input: { query: "ACS policy checkpoints" },
    },
    async () => ({ items: ["input", "tool", "output"] }),
  );

  // Flipped to true only if the denied handler body actually executes.
  let blockedHandlerRan = false;
  const blocked = await adapter.run(
    {
      name: "shell.rm",
      kind: "tool_call",
      input: { command: "rm -rf /tmp/not-actually-run" },
    },
    async () => {
      blockedHandlerRan = true;
      return { deleted: true };
    },
  );

  console.log(JSON.stringify({
    allowed: {
      decision: allowed.governanceResult.decision,
      allowed: allowed.allowed,
      output: allowed.output,
    },
    blocked: {
      decision: blocked.governanceResult.decision,
      allowed: blocked.allowed,
      handlerRan: blockedHandlerRan,
      reason: blocked.reason,
    },
    auditChainValid: client.audit.verify(),
    auditEntries: client.audit.getEntries().length,
  }, null, 2));
}

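// Surface failures instead of leaving an unhandled promise rejection.
main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});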

Run it:

node acs-checkpoint-demo.js

Output:

{
  "allowed": {
    "decision": "allow",
    "allowed": true,
    "output": {
      "items": [
        "input",
        "tool",
        "output"
      ]
    }
  },
  "blocked": {
    "decision": "deny",
    "allowed": false,
    "handlerRan": false,
    "reason": "Governance denied action \"framework.tool_call.shell.rm\""
  },
  "auditChainValid": true,
  "auditEntries": 2
}

The important field is handlerRan: false. The denied action did not merely fail after execution. It was blocked before the handler body ran. That is the behavior teams want for destructive tools, privileged file operations, deployment actions, customer-data exports, and cross-agent handoffs.
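The auditChainValid field in the output comes from client.audit.verify(). The lab did not inspect how the SDK implements that check. One common way to make an audit log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that general idea only; appendEntry, verifyChain, and the "genesis" seed are hypothetical names and do not describe the SDK's internal format.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical hash-chained log. This is NOT the SDK's audit format.
function linkHash(previousHash, entry) {
  return createHash("sha256")
    .update(previousHash + JSON.stringify(entry))
    .digest("hex");
}

// Returns a new chain with the entry appended and linked to its predecessor.
function appendEntry(chain, entry) {
  const previousHash = chain.length ? chain[chain.length - 1].hash : "genesis";
  return [...chain, { entry, previousHash, hash: linkHash(previousHash, entry) }];
}

// Recomputes every hash; any edited or reordered entry breaks the chain.
function verifyChain(chain) {
  let previousHash = "genesis";
  for (const link of chain) {
    if (link.previousHash !== previousHash) return false;
    if (link.hash !== linkHash(previousHash, link.entry)) return false;
    previousHash = link.hash;
  }
  return true;
}

let chain = [];
chain = appendEntry(chain, { action: "framework.tool_call.search_docs", decision: "allow" });
chain = appendEntry(chain, { action: "framework.tool_call.shell.rm", decision: "deny" });
```

With a structure like this, rewriting an earlier decision after the fact changes its hash and invalidates every later link, which is the property a "chain valid" check is usually meant to guarantee.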

How This Maps to Real Agent Frameworks

The generic adapter pattern is straightforward:

  1. Convert each framework event into a normalized invocation.
  2. Resolve that invocation to an action string.
  3. Evaluate the policy before the handler runs.
  4. Run the handler only on allow.
  5. Record the decision in audit and trace data.
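
The five steps above can be sketched without the SDK at all. This is a minimal illustration, not the GenericFrameworkAdapter implementation: createGate, toAction, the exact-match rule lookup, and the result shape are all assumptions made for the example.

```javascript
// SDK-independent sketch of a pre-execution gate. All names are hypothetical.

// Steps 1-2: turn a normalized invocation into an action string.
function toAction(invocation) {
  return `framework.${invocation.kind}.${invocation.name}`;
}

function createGate(rules) {
  const audit = [];

  async function run(invocation, handler) {
    const action = toAction(invocation);

    // Step 3: evaluate policy first. First matching rule wins; no match means deny.
    const rule = rules.find((r) => r.action === action || r.action === "*");
    const decision = rule ? rule.effect : "deny";

    // Step 5: record the decision whether or not the handler will run.
    audit.push({ action, decision, timestamp: new Date().toISOString() });

    if (decision !== "allow") {
      return { allowed: false, decision, reason: `Denied action "${action}"` };
    }

    // Step 4: the handler is invoked only after an explicit allow.
    const output = await handler(invocation.input);
    return { allowed: true, decision, output };
  }

  return { run, audit };
}
```

The key design choice is that the handler is passed in as a function and never called on the deny path, so a denied action cannot have side effects.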

In LangChain, the event might be a callback around tool start. In CrewAI, it might be a wrapped task. In OpenAI Agents SDK, it might sit near a function tool or guardrail boundary. In Semantic Kernel, it might live in middleware around function invocation. In a custom agent, it can be a plain wrapper around every tool function.

The action naming convention is the part developers should design early. A flat name such as delete is too vague. A structured name such as framework.tool_call.shell.rm, crm.contact.read, deploy.production.start, or memory.customer.write gives the policy engine enough shape to express meaningful rules.

For example:

rules:
  - action: "crm.contact.read"
    effect: "allow"
  - action: "crm.contact.export"
    effect: "deny"
  - action: "deploy.production.*"
    effect: "deny"
  - action: "*"
    effect: "deny"

The final catch-all deny matters. Agent systems should fail closed. If a new tool appears and nobody wrote a policy for it, the default should not be silent permission.
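
A small evaluator makes the fail-closed behavior concrete. The sketch assumes first-match ordering and treats a trailing ".*" as a prefix wildcard; both are assumptions for illustration, since this run did not verify the SDK's or the ACS YAML contract's exact matching semantics.

```javascript
// Hypothetical first-match rule evaluator. Not the SDK's matching algorithm.

function matches(pattern, action) {
  if (pattern === "*") return true;
  if (pattern.endsWith(".*")) {
    // "deploy.production.*" matches "deploy.production.start",
    // but not the bare "deploy.production".
    return action.startsWith(pattern.slice(0, -1));
  }
  return pattern === action;
}

function evaluate(rules, action) {
  for (const rule of rules) {
    if (matches(rule.action, action)) return rule.effect;
  }
  // Fail closed even if someone forgets the catch-all rule.
  return "deny";
}

const rules = [
  { action: "crm.contact.read", effect: "allow" },
  { action: "crm.contact.export", effect: "deny" },
  { action: "deploy.production.*", effect: "deny" },
  { action: "*", effect: "deny" },
];

console.log(evaluate(rules, "crm.contact.read"));        // allow
console.log(evaluate(rules, "deploy.production.start")); // deny
console.log(evaluate(rules, "crm.contact.delete"));      // deny (catch-all)
```

Returning "deny" after the loop means the evaluator stays fail-closed even when a rule file omits the catch-all.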

Where ACS Fits with OpenTelemetry and MCP

ACS is not trying to replace observability or tool protocols. It sits between them.

MCP standardizes how agents discover and call tools. A2A standardizes agent-to-agent communication. OpenTelemetry gives teams a common way to trace model calls, tool calls, and agent spans. The OpenTelemetry GenAI semantic conventions already define GenAI signals for events, exceptions, metrics, model spans, agent spans, and framework spans.

ACS-style control asks a different question: before this event becomes a real action, what policy decision should apply? A production architecture will usually need all three layers (policy checkpoint, tool protocol, and tracing):

agent framework
  -> ACS-style policy checkpoint
  -> MCP/tool/runtime call
  -> OpenTelemetry trace and audit record

That is why ACS is interesting for teams already reading about agent observability. Effloow previously covered OpenTelemetry GenAI agent tracing as the visibility layer. ACS adds the enforcement layer. Effloow also covered OpenAI Agents SDK guardrails, which are useful at SDK boundaries. ACS-style policy becomes more relevant when the same control logic must travel across several frameworks.

Practical Adoption Path

Do not start by governing everything. Start with one dangerous tool class.

A good first target is a tool that can send data outside the system, mutate production state, spend money, or trigger a deploy. Wrap that tool with a policy checkpoint and make the default deny. Then add explicit allow rules for narrow cases.

For an internal coding agent, the first policies might be:

allow: repo.read
allow: test.run
allow: file.write under workspace path
deny: shell.rm
deny: git.push
deny: secrets.read
deny: deploy.production

For a support agent, the first policies might be:

allow: ticket.read
allow: knowledge.search
deny: customer.email.send without human approval
deny: refund.issue above configured amount
deny: customer.pii.export
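
Several rules in the two lists above depend on arguments, not only on the action name: a write path, a refund amount, an approval flag. The sketch below shows one way to express such checks in plain code. The argument names (path, amount, approvedBy), the config object, and the decide function are hypothetical, and the sketch assumes refunds at or below the limit are allowed. The SDK's own condition syntax was not tested in this run.

```javascript
const path = require("node:path");

// True only if target resolves to a location strictly inside root.
// Note: this does not resolve symlinks.
function isUnder(root, target) {
  const base = path.resolve(root);
  const rel = path.relative(base, path.resolve(base, target));
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  return rel !== "" && !escapes;
}

// Hypothetical argument-aware decision function; unknown actions are denied.
function decide(action, args, config) {
  switch (action) {
    case "file.write":
      return typeof args.path === "string" &&
        isUnder(config.workspaceRoot, args.path) ? "allow" : "deny";
    case "refund.issue":
      return typeof args.amount === "number" &&
        args.amount > 0 && args.amount <= config.maxRefund ? "allow" : "deny";
    case "customer.email.send":
      // Requires a recorded human approver before sending.
      return typeof args.approvedBy === "string" &&
        args.approvedBy.length > 0 ? "allow" : "deny";
    default:
      return "deny";
  }
}

const config = { workspaceRoot: "/tmp/workspace", maxRefund: 50 };
console.log(decide("file.write", { path: "src/index.js" }, config));
console.log(decide("file.write", { path: "../etc/passwd" }, config));
```

Every branch falls back to deny when an argument is missing or has the wrong type, so a malformed tool call cannot slip through as an allow.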

Once the first checkpoint works, attach audit output to your trace pipeline. That is where ACS and OpenTelemetry become operationally useful: an incident review should show which action was attempted, which policy matched, whether the action was allowed or denied, and which trace contained the decision.
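
As a rough illustration of what such a record could carry, the sketch below builds one audit entry with the four fields an incident review needs. The field names and the buildAuditRecord and newTraceId helpers are hypothetical, not an ACS or SDK schema, and the random trace ID stands in for one that a real tracing library would supply from the active span.

```javascript
const { randomBytes } = require("node:crypto");

// Hypothetical record shape; field names are illustrative only.
function buildAuditRecord({ traceId, action, matchedRule, decision }) {
  return {
    timestamp: new Date().toISOString(),
    traceId,     // lets an incident review join this record to its trace
    action,      // what the agent attempted
    matchedRule, // which policy rule produced the decision
    decision,    // "allow" or "deny"
  };
}

// Stand-in for a real trace context: 16 random bytes as 32 hex characters,
// the same width as a W3C Trace Context trace-id.
function newTraceId() {
  return randomBytes(16).toString("hex");
}

const record = buildAuditRecord({
  traceId: newTraceId(),
  action: "deploy.production.start",
  matchedRule: "deploy.production.*",
  decision: "deny",
});

// One JSON object per line is easy to ship to most log pipelines.
console.log(JSON.stringify(record));
```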

Limitations from This Run

The sandbox evidence is real, but its limits are important.

Effloow Lab did not run a live model. It did not deploy to Microsoft Foundry. It did not test a production ACS YAML contract against a conformance suite. It did not run real LangChain, CrewAI, OpenAI Agents SDK, Anthropic Agents SDK, AutoGen, Semantic Kernel, Microsoft.Extensions.AI, or MCP integrations. It did not verify every package listed in the Agent Governance Toolkit repository.

The sandbox proves local installability for the public TypeScript and Python packages and proves that the TypeScript generic adapter can block a simulated tool call before execution. That is a meaningful control primitive, not a complete production governance system.

There is also a maturity caveat. The SDK README labels the npm package as public preview and warns that APIs may change before GA. Treat this as a candidate control layer for prototypes, internal evaluation, and architecture planning rather than a drop-in compliance guarantee.

Common Mistakes

The first mistake is treating ACS as a better system prompt. Runtime controls should be enforced by code, policy engines, middleware, adapters, and audit logs. A system prompt can explain policy to the model, but it should not be the only enforcement mechanism.

The second mistake is logging everything. Tool arguments and model inputs can contain secrets, personal data, or regulated business content. The control layer should record policy decisions and enough metadata for audit, but sensitive payload capture needs separate redaction and retention rules.
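
One simple form of redaction is to mask known-sensitive keys before tool arguments reach the audit log. The sketch below is a hypothetical helper, not an SDK feature, and the key list is an assumption each team would replace with its own. Key-based masking also does nothing for secrets embedded in free-text values, so it complements retention rules and does not replace them.

```javascript
// Hypothetical redaction step applied before arguments are logged.
const SENSITIVE_KEYS = new Set([
  "password", "token", "api_key", "authorization", "email", "ssn",
]);

// Returns a deep copy with sensitive keys masked; the input is not mutated.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEYS.has(key.toLowerCase())
          ? [key, "[REDACTED]"]
          : [key, redact(inner)]
      )
    );
  }
  return value;
}

const toolArgs = {
  query: "ACS policy checkpoints",
  token: "abc123",
  customer: { Email: "a@example.com", orders: [{ id: 1, password: "p" }] },
};

console.log(JSON.stringify(redact(toolArgs), null, 2));
```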

The third mistake is writing policies after the agent is already broad. Start with narrow action names and deny-by-default behavior before the tool catalog grows. Retrofitting policy onto a large agent surface is harder because every tool name, argument shape, and workflow exception already exists.

The fourth mistake is assuming framework integration means framework independence. A portable policy contract helps, but each framework still has different lifecycle events. Validate the exact callback, middleware, or adapter path your production agent will use.
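
To make the lifecycle difference concrete, the sketch below maps two differently shaped events onto one normalized invocation. The event shapes and the callbackStyle and taskStyle names are invented for illustration and do not mirror real LangChain, CrewAI, or other framework payloads; the real payloads are exactly what needs validating.

```javascript
// Hypothetical event shapes. These do NOT mirror any real framework's payloads.
const normalizers = new Map([
  // A callback-style framework that reports a tool name and its input.
  ["callbackStyle", (event) => ({
    kind: "tool_call",
    name: event.toolName,
    input: event.toolInput,
  })],
  // A task-style framework that wraps work in a task object.
  ["taskStyle", (event) => ({
    kind: "task",
    name: event.task.id,
    input: event.task.args,
  })],
]);

// Converts a framework event into the shape the policy layer understands.
function normalize(source, event) {
  const normalizer = normalizers.get(source);
  if (!normalizer) {
    // Unknown sources are rejected rather than passed through ungoverned.
    throw new Error(`No normalizer registered for "${source}"`);
  }
  return normalizer(event);
}

console.log(normalize("callbackStyle", { toolName: "search_docs", toolInput: { query: "acs" } }));
console.log(normalize("taskStyle", { task: { id: "summarize", args: { docId: 7 } } }));
```

The policy file can stay the same across frameworks only if each normalizer is tested against the events that framework actually emits.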

FAQ

Q: Is Microsoft ACS the same as Agent Governance Toolkit?

Not exactly. ACS is the open control specification or standard direction. The Agent Governance Toolkit is Microsoft's concrete open source toolkit with installable SDK packages. In this sandbox, Effloow Lab tested @microsoft/agent-governance-sdk@4.0.0 and agent-governance-toolkit==4.0.0, not a full ACS conformance suite.

Q: Can ACS replace OpenAI Agents SDK guardrails?

No. Guardrails inside a specific SDK are still useful. ACS-style control is more about a portable runtime policy layer that can sit across frameworks and tool boundaries. In practice, teams may use both: SDK guardrails for local input/output/tool checks and ACS-style policies for cross-framework governance.

Q: Does ACS require Microsoft Foundry?

The public materials describe ACS as open and framework-agnostic, and the SDK packages installed locally without Microsoft Foundry. Foundry may provide managed workflows around governance, tracing, and evaluation, but the local PoC did not require Foundry credentials.

Q: Should production teams adopt the SDK today?

Use it for evaluation and internal prototypes first. The npm README labels the package public preview, and the ACS repository still shows an evolving standard. The architectural pattern is worth adopting now: name actions clearly, gate risky tools before execution, fail closed, and emit audit records.

Key Takeaways

ACS matters because agent teams need runtime controls that are stronger than prompt instructions and more portable than one-off application checks.

Effloow Lab verified that the public Microsoft Agent Governance SDK can be installed locally and can deny a simulated tool call before its handler executes. The audit chain also verified successfully after the allowed and denied actions.

The production decision is more cautious: ACS and the Agent Governance Toolkit are promising, but teams should validate the exact framework adapter, policy syntax, trace output, and compliance requirements in their own stack before treating them as a governance baseline.

Bottom Line

ACS-style runtime control is the right direction for multi-framework agents. The local SDK is already useful for sandboxing policy gates, but the current evidence supports prototype adoption, not blanket production readiness claims.
