Microsoft Build 2026: Windows Agent Runtime and Project Polaris
Microsoft Build 2026 (June 2–3, San Francisco) arrived with a clear message: Windows is no longer just a surface for running AI applications. With the Windows Agent Runtime announcement, Microsoft repositioned the OS as a first-class agent execution layer, complete with sandboxing primitives, a distribution marketplace, and local model infrastructure that rivals cloud VM deployment.
Three announcements stand out for developers: the Windows Agent Runtime (WAR), the Windows Agent Store, and Project Polaris — Microsoft's first homegrown coding model for GitHub Copilot. WSL 3 and the new MAI model family complete a developer stack that Microsoft is explicitly framing as an alternative to cloud-first agent deployment.
This guide covers what each announcement means for your workflow and what you can start building today. Effloow Lab inspected primary sources across official Microsoft blog posts, developer coverage, and technical write-ups published at Build.
Why Build 2026 Is Different
Microsoft has shipped developer tools at Build for decades, but the framing at Build 2026 is new: the goal is to make Windows the canonical execution environment for autonomous agents — not as a thin client that calls cloud APIs, but as a platform with OS-level lifecycle management, sandboxing, and a distribution channel.
The mobile app permission model analogy runs through every WAR announcement. Capability grants work like iOS/Android permissions: agents declare what they need, users approve at install time, and the OS enforces the boundary at runtime. For developers, the payoff is that you get hardware-backed sandboxing without writing your own containerization logic.
That said, Build 2026 is also notable for what it didn't announce: no Windows 12 preview, no major Azure pricing changes, and no new Claude/Gemini partnership announcements. The focus was squarely on Windows-as-agent-platform and the Microsoft AI (MAI) model family.
Windows Agent Runtime: OS-Level Agent Sandboxing
The Windows Agent Runtime preview ships to Windows Insiders on June 9, 2026 via KB5039239 (Windows 11 version 24H2).
Hardware Requirements
WAR requires a minimum of 40 TOPS of NPU capacity — which rules out pre-Copilot+ machines. The runtime ships with two bundled inference models:
- Phi-4-mini-silicon (2B parameters) — text-only tasks, available at launch
- Phi-4-vision-silicon (7B parameters) — image understanding, roadmapped for 2027
The -silicon suffix distinguishes these from the standard Phi-4 weights available on HuggingFace: these are NPU-optimized variants compiled for Intel, AMD, and Qualcomm architectures. The bundled models mean agents can run inference locally without an API key — an important constraint for enterprise deployments with data residency requirements.
The Capability Grant System
The security model is the most developer-relevant aspect of WAR. Every agent declares its required permissions at install time across three dimensions:
- File system scope — which directories the agent can read and write
- Network access — specific endpoints or domains the agent can reach
- Application launch permissions — what the agent can invoke on the host
Users approve these grants during installation, analogous to mobile app permission dialogs. The OS enforces the boundaries at runtime; agents cannot silently expand scope after installation.
For higher-risk workloads — code execution agents, agents handling credentials, agents running subprocesses — Microsoft introduced the Microsoft Execution Containers (MXC) SDK, a cross-platform policy-driven execution layer that provisions micro-VMs backed by the Windows hypervisor. MXC is heavier than the standard WAR sandbox, but provides genuine VM-level isolation against sandbox escapes. The distinction matters when choosing the right primitive for your agent type.
The Windows Agent Store
Alongside WAR, Microsoft announced the Windows Agent Store — a curated marketplace for agent distribution directly within Windows with an 85% revenue share for developers. Agents submitted to the store go through a Microsoft security review covering capability disclosure, data handling policy declaration, and sandboxing compliance verification.
For developers, this is the first OS-level distribution channel for agents that bundles both discovery and monetization infrastructure. The model mirrors what app stores did for mobile: standardize the trust model, lower the distribution friction, and let developers focus on agent behavior rather than deployment mechanics.
What the Preview Does Not Include
At launch, WAR only supports text-based agents operating on JSON, XML, and PDF content. Vision-capable agents — those that observe screen state and interact with UI elements — are not scheduled until 2027. Developers building screen-reader-style automation or UI testing agents will need to continue with Win32 accessibility APIs for now.
Sideloading behavior for WAR agents (analogous to Windows developer mode for UWP) was not confirmed in Build 2026 materials. The Agent Store appears to be the primary distribution path at launch.
Project Polaris: GitHub Copilot Gets a Homegrown Model
The second major announcement is as much strategic as technical. Project Polaris is Microsoft's own mixture-of-experts coding model, and it replaces GPT-4 Turbo as the default engine inside GitHub Copilot starting August 2026.
Architecture and Performance
Project Polaris uses specialized MoE sub-modules per programming language and paradigm, applying chain-of-thought and tree-of-thought reasoning at inference time. Microsoft's internal benchmarks report it outperforming GPT-4 Turbo on HumanEval and MBPP, with particularly strong results in Rust, Haskell, and Go — lower-resource languages where GPT-4 Turbo's training distribution is thinner.
These are self-reported figures and have not been independently verified at the time of writing. The HumanEval and MBPP comparisons are against GPT-4 Turbo specifically — not against GPT-5.5 or Claude Opus 4.8, which are the current coding benchmark leaders.
Rollout and Transition
The Polaris switch is automatic for all Copilot Pro subscribers in August 2026. Microsoft is offering an optional three-month fallback period to GPT-4 Turbo for teams that need to validate behavior before fully cutting over. If you're on GitHub Copilot Enterprise, model preference controls will appear in the admin console before the August rollout.
What This Means for Teams
The practical question is whether Polaris's different training distribution affects completions your team relies on. Languages with strong open-source training data — Python, JavaScript, TypeScript — are unlikely to regress. The performance gain claims are most pronounced in low-resource languages, which is worth testing if Rust or Haskell are in your stack.
The broader signal is that Microsoft now controls the full agentic development stack: from the model (Polaris, MAI-Code-1-Flash) to IDE integration (VS Code), to the agent runtime (WAR), to the inference hardware (Copilot+ NPU requirements). This isn't inherently a risk, but it's a vendor consolidation worth factoring into long-term platform decisions.
MAI-Thinking-1 and MAI-Code-1-Flash
Build 2026 included a second, less-publicized model announcement: two models under the MAI (Microsoft AI) brand that are distinct from Project Polaris.
MAI-Thinking-1
MAI-Thinking-1 is Microsoft's first large-scale reasoning model trained entirely on commercially licensed data — explicitly without distillation from OpenAI models. Architecture details:
- 35 billion active parameters, sparse MoE architecture
- 256,000-token context window
- Built using Microsoft's own training infrastructure
Microsoft-reported benchmarks: AIME 2025 at 97.0%, AIME 2026 at 94.5%, and SWE-Bench Pro performance described as competitive with Claude Opus 4.6. Independent raters reportedly preferred MAI-Thinking-1 over Claude Sonnet 4.6 in blind evaluations — a claim worth treating as preliminary until third-party verification appears.
MAI-Thinking-1 is currently in private preview through Microsoft Foundry. It's also accessible via Fireworks AI, Baseten, and OpenRouter for developers who want to avoid Azure lock-in. All three providers expose OpenAI-compatible endpoints, so you can test MAI-Thinking-1 with the standard openai Python SDK by pointing base_url at any of them.
MAI-Code-1-Flash
MAI-Code-1-Flash is the more immediately accessible model: a 5-billion-parameter coding model already integrated into GitHub Copilot and VS Code. Key claims from Microsoft:
- +16 percentage points over Claude Haiku 4.5 on SWE-Bench Pro
- 60% fewer tokens on complex coding tasks
- Trained on production Copilot telemetry and commercially licensed code
The token efficiency figure is the one with immediate cost implications for teams running high-volume code generation in CI pipelines or agentic coding loops. If the 60% figure holds at your input distribution, MAI-Code-1-Flash changes the economics of inline code agents significantly.
WSL 3: Near-Native GPU and NPU for Linux ML Workloads
WSL 3 was announced alongside WAR, and for developers who run Linux-based ML tooling on Windows, it's arguably the most immediately useful Build 2026 announcement.
The headline improvement: paravirtualized GPU and NPU access. WSL 2 used full hardware virtualization for GPU access (via DirectML), creating a meaningful performance gap compared to bare-metal Linux. WSL 3 uses a lightweight VM architecture that lets the Linux kernel communicate with Windows GPU and NPU hardware at near-native speed.
Cited benchmarks: 3–5% delta versus bare-metal Linux for PyTorch and CUDA workloads. WSL 2 had no NPU access at all — if you wanted to run inference on a Snapdragon Hexagon NPU or Intel AI Boost from your Linux toolchain, it wasn't possible until now.
Supported Hardware at Launch
| Platform | GPU Passthrough | NPU Passthrough | WSL 3 Status |
|---|---|---|---|
| Qualcomm Snapdragon X Elite | Yes | Yes (Hexagon) | Available now (Insiders) |
| Intel Meteor Lake | Yes | Yes (AI Boost) | Available now (Insiders) |
| AMD | Planned | Planned | No confirmed timeline |
WSL 3 is available now through the Windows Insiders program. For developers who need to run Ollama, llama.cpp, vLLM, or PyTorch inside a Linux environment on a Copilot+ PC, this eliminates the primary reason to dual-boot.
Practical Application: Timing and Targets
The Build 2026 announcements land across different timelines and hardware requirements. Here's a developer-oriented summary:
- WSL 3 — Available now for Snapdragon X Elite and Intel Meteor Lake. If you're on one of these machines and running Linux ML tooling, this is worth testing immediately.
- Windows Agent Runtime — June 9, 2026 (Windows 11 24H2, KB5039239). Start designing your agent's capability grant manifest now even before the preview lands — the permission schema is documented and shouldn't change between Insider and stable release.
- MAI-Code-1-Flash — Live now in VS Code and GitHub Copilot. No configuration required; it's already the underlying model for Copilot inline suggestions for some subscribers.
- Project Polaris — August 2026 rollout for Copilot Pro. Three-month GPT-4 Turbo fallback available.
- MAI-Thinking-1 — Private preview via Microsoft Foundry; available via Fireworks AI, Baseten, and OpenRouter today for teams accepted into early access.
Common Mistakes to Avoid
Treating WAR's standard sandbox as appropriate for all agent types. The per-agent capability grant system is lightweight — right for text-processing agents on a desktop. Agents that execute arbitrary code, spawn subprocesses, or handle credentials belong on the MXC SDK's micro-VM path. Defaulting to the lighter option because it's simpler to integrate creates a security gap that Microsoft's runtime can't close for you.
Taking Polaris benchmark numbers at face value. Microsoft's HumanEval and MBPP figures are self-reported. Until independent benchmarks appear (likely Q3 2026 as Polaris rolls out), treat the performance claims as directionally useful but not a basis for architecture decisions. Test against your specific codebase and language mix.
Skipping capability manifest design. Windows Agent Store review includes capability disclosure as a gate. Agents that request overly broad file system scope or open-ended network access will face review friction. Design your manifest narrowly from the start — it's easier to expand permissions post-approval than to pass initial review with a permissive manifest.
Conflating WSL 3 with WSL 2 for NVIDIA workloads. The 3–5% performance claim applies to paravirtualized access on Qualcomm and Intel platforms. NVIDIA GPU passthrough in WSL has used a different path (DirectML + CUDA on WSL) since WSL 2. WSL 3 improves this path too, but the NPU paravirtualization story is specific to Copilot+ PC silicon.
FAQ
Q: Does Windows Agent Runtime work on Windows 10?
No. WAR ships in Windows 11 version 24H2 via KB5039239. There is no announced backward compatibility with Windows 10.
Q: Can I sideload WAR agents without going through the Windows Agent Store?
Sideloading behavior (analogous to Windows developer mode for UWP) was not confirmed in Build 2026 materials. The Agent Store appears to be the primary distribution path at launch.
Q: When will WSL 3 support AMD NPUs?
AMD support was acknowledged as planned but no timeline was confirmed at Build 2026. Qualcomm Snapdragon X Elite and Intel Meteor Lake are the launch platforms.
Q: Is MAI-Thinking-1 available via an OpenAI-compatible API?
Yes. Fireworks AI, Baseten, and OpenRouter all expose OpenAI-compatible endpoints. You can use the standard openai Python SDK with a custom base_url pointing to any of these providers to access MAI-Thinking-1 without an Azure subscription.
Q: What happens to my Copilot Pro subscription when Polaris rolls out in August?
The switch is automatic. Microsoft offers an optional three-month fallback to GPT-4 Turbo for teams that need to validate behavior first. GitHub Copilot Enterprise admins will see model preference controls before the August cutover.
Q: Does the 85% Agent Store revenue share apply to enterprise deployments?
Microsoft's Build 2026 materials described the 85% figure for the Windows Agent Store consumer/developer channel. Enterprise licensing and revenue arrangements were not detailed at Build.
Key Takeaways
- Windows Agent Runtime (June 9, 2026) brings mobile-style permission grants and OS-level sandboxing to local AI agents on Windows 11 Copilot+ PCs. Hardware floor: 40 TOPS NPU.
- Project Polaris replaces GPT-4 Turbo in GitHub Copilot in August 2026. A homegrown MoE model trained specifically for code — with a three-month fallback window to GPT-4 Turbo.
- WSL 3 delivers near-native GPU and NPU passthrough for Linux ML workloads on Snapdragon X Elite and Intel Meteor Lake; 3–5% delta vs bare-metal Linux.
- MAI-Code-1-Flash is live now in VS Code and Copilot — claims +16pp on SWE-Bench Pro vs Claude Haiku 4.5 with 60% fewer tokens.
- MAI-Thinking-1 (35B active, MoE, 256K context) is in private preview, available today via Fireworks AI, Baseten, and OpenRouter.
The thread connecting all five announcements: Microsoft is building an agent-first OS and a vertically integrated AI development stack. Cloud deployment is no longer the only serious option for production agent workloads.
Build 2026 is Microsoft's most coherent developer platform shift in years. If you're on a Copilot+ PC, WSL 3 and the Windows Agent Runtime give you local agent infrastructure worth evaluating now — before your cloud bills become the forcing function.
Need content like this
for your blog?
We run AI-powered technical blogs. Start with a free 3-article pilot.