Articles, one at a time.
Every piece here was commissioned, drafted, reviewed in public, and merged. No content mills, no auto-published slop.
BoundaryRouter: Training-Free LLM-vs-Agent Routing
BoundaryRouter uses experience memory to decide, per query, whether to call a plain LLM or an agent, with no training needed. It is 60% faster and 28% more accurate than fixed strategies.
LlamaIndex + Google Agents API: Document Agents with ParseBench
A tool-scout guide to ParseBench, the antigravity-demo CLI, and sandboxed document parsing on Google Managed Agents with LlamaIndex and LiteParse.
OpenAI Agents SDK Guardrails: Local Sandbox PoC
Test OpenAI Agents SDK input and output guardrails locally without API credits using a small Python sandbox PoC.
AWS MCP Server GA: Secure AWS API Access for AI Agents
AWS MCP Server reached GA on May 6, 2026. Get the full developer guide: setup, the 5 core tools, sandboxed Python execution, IAM security, and pricing.
Context Rot: Keep LLM Agents Sharp Past 256K Tokens
Context rot degrades LLM agent accuracy by 13–37% beyond 32K tokens. Four production techniques (rolling summary, observation masking, KV eviction, and selective retrieval), with Python code.
Gemini Omni: Google's Any-Input-to-Video Model Developer Guide
Gemini Omni Flash: text, image, audio, or video in, a 10-second synchronized video out. Announced at Google I/O 2026, with a developer API coming in the next few weeks. What to build now and what to wait for.
SciResearcher: How to Train an 8B Model for Scientific Deep Research Agents
SciResearcher-8B achieves a state-of-the-art 19.46% on HLE-Bio/Chem, trained with SFT+GRPO on Qwen3-8B. Here is how its two-pipeline data generation and sub-agent freezing approach works.
WebMCP PoC: Expose Browser Tools to AI Agents
A sandbox PoC for WebMCP: what Google announced, what the draft API promises, and how a schema-backed browser tool works.
Genkit 2.0 GA: Build and Deploy a TypeScript MCP Server to Cloud Run
A local Genkit MCP server PoC, plus the honest Cloud Run deployment checklist for teams moving TypeScript agent tools toward production.
Qwen3.7-Max: Alibaba's Agent-First 1M-Context LLM Developer Guide
Qwen3.7-Max scores 69.7% on Terminal-Bench 2.0 and offers a 1M-token context at $2.50/MTok. A complete developer guide: API setup, benchmarks, and the Qwen Code CLI.
RL Doesn't Teach LLMs New Reasoning: It Fixes 1–3% of Tokens
New research shows that RL post-training modifies only 1–3% of token positions, and always picks from the base model's existing top-5 candidates. Here's what that means.
ARTIST: RL-Powered Tool Use for LLM Agents Explained
How Microsoft's ARTIST framework uses outcome-based RL to train LLMs that interleave tool calls inside their reasoning chains, with no step-level supervision required.