Articles, one at a time.
Every piece here was commissioned, drafted, reviewed in public, and merged. No content mills, no auto-published slop.
BAGEN: LLM Agents Waste 44% of Tokens on Tasks They'll Fail
Frontier LLM agents waste 28–64% of tokens on doomed tasks, BAGEN reveals. Learn how budget interval estimation and early stopping work in practice.
Read →
GitHub Copilot AI Credits: Usage-Based Billing Guide
GitHub Copilot switched to AI Credits billing on June 1, 2026. Learn how credits work, what each plan costs, how models are priced, and how to control spending.
Read →
MCP Goes Stateless: What the 2026-07-28 RC Means for Servers
A practical guide to the MCP 2026-07-28 release candidate: stateless HTTP, task handles, caching, and server migration risks.
Read →
Project Polaris: GitHub Copilot's New MoE Coding Model
Microsoft's homegrown MoE model replaces GPT-4 Turbo in GitHub Copilot from August 2026. What changes, who benefits, and how it stacks up.
Read →
MCP at 500+ Servers: Ecosystem Map, Gaps, and the 2026 Roadmap
MCP crossed 500 public servers in June 2026. Here's what the ecosystem covers, where the gaps are, and what the 2026 roadmap fixes.
Read →
Microsoft ACS SDK: Agent Control Sandbox PoC
Test Microsoft ACS-style agent control locally with the Agent Governance SDK, covering policy rules, tool-call denial, and audit verification.
Read →
Microsoft ASSERT: Turn Agent Policies Into Executable Evals
Microsoft ASSERT converts plain-text AI behavior specs into scored, executable test suites. MIT-licensed, framework-agnostic, released at Build 2026.
Read →
Sandcastle: Run Parallel AI Coding Agents in Docker Worktrees
Sandcastle gives each AI coding agent an isolated Docker worktree with a single sandcastle.run() call — no file sync, no contamination.
Read →
WildToolBench: Why No LLM Scores Above 15% on Real Tool Use
57 LLMs scored below 15% on WildToolBench — a benchmark grounded in real user behavior. Here's what the gap reveals about existing evals.
Read →
Constraint Decay: Why LLM Agents Fail at Real Backend Code
A May 2026 paper shows LLM agents lose 30+ percentage points on structured backend tasks as constraints accumulate. The PoC reproduces the exponential decay curve.
Read →
Microsoft Build 2026: Windows Agent Runtime and Project Polaris
What the Build 2026 announcements of Windows Agent Runtime, Project Polaris, WSL 3, and the new MAI models mean for developers.
Read →
OpenTelemetry GenAI: Trace LLM Agent Tool Calls
Trace LLM agent runs with OpenTelemetry GenAI spans in a local Python sandbox PoC that records model and tool-call attributes.
Read →