Articles · 2026-04-22 · By Effloow Content Factory

smolagents: Build Code Agents with HF in Under 100 Lines

Learn how to build powerful AI agents with Hugging Face's smolagents. CodeAgent, multi-agent systems, MCP tools, and sandboxed execution explained.
Tags: smolagents · hugging-face · ai-agents · python · code-agents · multi-agent · mcp · open-source

Most agent frameworks make you write hundreds of lines before your first tool call lands. smolagents flips that. The entire framework core fits in roughly 1,000 lines of Python, a ReAct-style agent needs about 40 lines to set up, and the library ships with first-class support for MCP tools, multi-agent orchestration, and sandboxed code execution out of the box.

This guide covers everything you need to go from pip install to a production-capable multi-agent system — including where smolagents wins, where it doesn't, and the common mistakes that catch developers off guard.

Why smolagents Exists and Why It Matters

Hugging Face released smolagents in late 2024 to solve a specific problem: existing frameworks had grown too complex for their own good. LangChain's agent abstractions span multiple layers of callbacks, runnables, and chains. LangGraph adds a graph-based state machine. CrewAI introduces YAML-configured crew manifests. Each layer adds power but also adds surface area where things break.

smolagents takes the opposite bet. The agents.py file has fewer than 1,000 lines. If something breaks, you can read the source in an afternoon. The GitHub repository has crossed 26,000 stars, making it one of the fastest-growing agent libraries in the Hugging Face ecosystem.

The framework also ships with a DeepLearning.AI course taught by Thomas Wolf (Hugging Face co-founder and CSO) and Aymeric Roucher (Project Lead for Agents). That kind of institutional backing matters when you're picking a framework to build on.

The central idea driving smolagents is deceptively simple: instead of making an agent call tools by emitting a JSON blob that a separate executor then parses and runs, let the agent write executable Python code directly. That decision has compounding benefits that ripple through everything else the library does.

CodeAgent vs ToolCallingAgent: The Core Choice

smolagents gives you two agent paradigms, and choosing between them is the first architectural decision you'll make.

CodeAgent is the default and the flagship approach. At each reasoning step, the LLM generates a Python code snippet that the agent executes directly. Need to search the web, parse the result, filter a list, and then call another tool? That's one code block, with loops and conditionals you write naturally:

from smolagents import CodeAgent, WebSearchTool, InferenceClientModel

model = InferenceClientModel()
agent = CodeAgent(tools=[WebSearchTool()], model=model)

agent.run(
    "Find the top 3 Python web frameworks by GitHub stars in 2026, "
    "and calculate the average star count."
)

The LLM might generate something like:

results = web_search("top Python web frameworks GitHub stars 2026")
frameworks = [r for r in results[:5] if "framework" in r["title"].lower()]
star_counts = [parse_stars(f["snippet"]) for f in frameworks[:3]]
avg_stars = sum(star_counts) / len(star_counts)
print(f"Average: {avg_stars:,.0f} stars")

That single code block handles filtering, computation, and output formatting — things that would require multiple sequential tool calls in a JSON-based agent. Hugging Face's benchmarks show CodeAgent consistently uses 30% fewer LLM steps than ToolCallingAgent on the same tasks, which translates directly to lower API costs and faster wall-clock time.

ToolCallingAgent is the alternative for situations where code execution is undesirable — regulated environments, cases where you need to audit every action before it runs, or models that are better calibrated for structured JSON outputs than code generation. It works exactly like classic ReAct: the LLM emits a tool name and arguments as JSON, a dispatcher calls the corresponding function, and the result gets appended to the conversation.

The rule of thumb: start with CodeAgent. Switch to ToolCallingAgent only when your environment explicitly prohibits code execution or when your chosen model produces significantly worse code than JSON.
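The dispatch loop behind a tool-calling agent is easy to picture with a framework-free sketch. Everything below is an illustrative stand-in — the tool registry and the hard-coded model output are not smolagents internals:

```python
import json

# Illustrative stand-in for an agent's registered tools
def web_search(query: str) -> list:
    return [f"result for {query!r}"]

TOOLS = {"web_search": web_search}

# One JSON action, as a tool-calling model might emit it
model_output = '{"tool": "web_search", "arguments": {"query": "python frameworks"}}'

# The dispatcher: parse the blob, look up the tool, call it, then feed
# the result back to the model -- one full LLM round-trip per tool call
action = json.loads(model_output)
result = TOOLS[action["tool"]](**action["arguments"])
print(result)
```

Each round-trip like this is a separate LLM call, which is exactly the overhead a CodeAgent amortizes by batching several operations into one generated code block.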

Installation and First Agent

pip install smolagents

# Optional: Gradio UI for browser-based agent interaction
pip install 'smolagents[gradio]'

# Optional: E2B sandbox for production code execution
pip install 'smolagents[e2b]'

A minimal working agent looks like this:

from smolagents import CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

# InferenceClientModel uses the Hugging Face Inference API
# Swap for AnthropicModel, OpenAIServerModel, LiteLLMModel, etc.
model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
    max_steps=10,
)

result = agent.run("What was the most-starred new GitHub repository in April 2026?")
print(result)

The max_steps parameter is important — it prevents runaway agents from burning your API budget on an infinite loop. Set it conservatively (10–20) for research agents, more liberally (30–50) for complex multi-step workflows where you understand the expected path.

Model Support: Use Any LLM

One of smolagents' strongest selling points is genuine model agnosticism. The library wraps any LLM through a consistent model interface:

# Hugging Face Inference API (free tier available)
from smolagents import InferenceClientModel
model = InferenceClientModel(model_id="Qwen/Qwen2.5-72B-Instruct")

# OpenAI
from smolagents import OpenAIServerModel
model = OpenAIServerModel(model_id="gpt-6")

# Anthropic via LiteLLM
from smolagents import LiteLLMModel
model = LiteLLMModel(model_id="anthropic/claude-sonnet-4-6")

# Local Ollama
from smolagents import LiteLLMModel
model = LiteLLMModel(model_id="ollama/llama3.2")

# Fully local Transformers (no API required)
from smolagents import TransformersModel
model = TransformersModel(model_id="microsoft/Phi-3.5-mini-instruct", device="cuda")

The LiteLLM integration means any of the 100+ providers LiteLLM supports — Groq, Together, Mistral, Cohere, and others — work without additional code. This is a meaningful advantage over frameworks that assume OpenAI as the primary backend.

Read our guide on LiteLLM's multi-provider routing for cost management strategies that pair well with smolagents when running at scale.

Building Tools

Tools are the atomic units of agent capability. smolagents gives you three ways to define them:

Decorator approach (simplest):

from smolagents import tool

@tool
def get_repo_stars(repo_name: str) -> int:
    """
    Fetches the current GitHub star count for a repository.
    
    Args:
        repo_name: Full repository name in 'owner/repo' format.
    
    Returns:
        The number of GitHub stars as an integer.
    """
    import requests
    response = requests.get(f"https://api.github.com/repos/{repo_name}")
    return response.json().get("stargazers_count", 0)

The docstring is mandatory — it's what the LLM reads to understand when and how to call the tool. Write it for the model, not for human developers.

Class approach (more control):

from smolagents import Tool

class FileWriterTool(Tool):
    name = "file_writer"
    description = "Writes content to a file at the given path."
    inputs = {
        "path": {"type": "string", "description": "File path to write to"},
        "content": {"type": "string", "description": "Content to write"},
    }
    output_type = "string"
    
    def forward(self, path: str, content: str) -> str:
        with open(path, "w") as f:
            f.write(content)
        return f"Wrote {len(content)} characters to {path}"

MCP server integration:

from smolagents import CodeAgent, InferenceClientModel, ToolCollection
from mcp import StdioServerParameters

server_params = StdioServerParameters(
    command="npx",
    args=["@modelcontextprotocol/server-filesystem", "/workspace"],
)

with ToolCollection.from_mcp(server_params) as tools:
    agent = CodeAgent(tools=[*tools.tools], model=InferenceClientModel())
    agent.run("List all Python files and count their total lines of code")

The MCP integration is particularly useful if you've already built a tooling ecosystem around the Model Context Protocol. Anything you've built for Claude Code or another MCP-compatible client becomes immediately available in smolagents. See our MCP ecosystem overview for context on the broader MCP landscape.

Multi-Agent Systems

smolagents has first-class support for multi-agent architectures where a manager agent coordinates specialized sub-agents. The pattern is straightforward: any agent can be wrapped as a ManagedAgent and passed as a tool to a parent agent.

from smolagents import (
    CodeAgent, InferenceClientModel, ManagedAgent,
    DuckDuckGoSearchTool, VisitWebpageTool
)

model = InferenceClientModel()

# Specialized web research agent
web_agent = CodeAgent(
    tools=[DuckDuckGoSearchTool(), VisitWebpageTool()],
    model=model,
    name="web_researcher",
    description="Searches the web and visits pages to gather detailed information.",
)

# Wrap it so the orchestrator can call it as a tool
managed_web = ManagedAgent(
    agent=web_agent,
    name="web_researcher",
    description="Call this agent to research any topic on the web.",
)

# Orchestrator treats the sub-agent as one of its available tools
orchestrator = CodeAgent(
    tools=[managed_web],
    model=model,
    max_steps=15,
)

orchestrator.run(
    "Research the top 3 AI agent frameworks released in 2026 and write "
    "a summary comparing their GitHub stars, license, and primary use case."
)

The orchestrator calls managed_web like any other tool, passing a task description. The sub-agent handles its own multi-step reasoning loop and returns a result. This composability is one of smolagents' biggest structural advantages: you build increasingly capable systems by nesting simpler ones, without a framework-imposed orchestration DSL.

Sandboxed Execution for Production

This is where many smolagents tutorials gloss over something critical: the default LocalPythonExecutor is not a true security sandbox. It runs code in a restricted interpreter that blocks certain dangerous imports, but it does not provide full process isolation. For any production deployment where agent inputs come from untrusted sources, you need a real sandbox.

smolagents supports four sandboxing backends:

# E2B: Managed cloud sandbox, fast startup (~2s), good for prototyping
agent = CodeAgent(tools=[...], model=model, executor_type="e2b")

# Modal: Python-first, gVisor isolation, massive autoscaling
agent = CodeAgent(tools=[...], model=model, executor_type="modal")

# Docker: Self-hosted, full control, higher setup overhead
agent = CodeAgent(tools=[...], model=model, executor_type="docker")

# Blaxel: Edge-native execution
agent = CodeAgent(tools=[...], model=model, executor_type="blaxel")

E2B is the fastest path to sandboxed execution — sessions spin up in about two seconds and the free tier covers development. For production scale, Modal's autoscaling and gVisor isolation make it the stronger choice. One important caveat: sandboxed execution currently does not support multi-agent systems. If you need both multi-agent and sandboxing, design the architecture so untrusted inputs are processed in isolated sub-agents that don't pass executable code between boundaries.

Memory and State Management

smolagents agents maintain memory as a list of AgentStep objects that get passed back to the LLM as chat history at each step. You can inspect and modify this memory directly:

# Access raw memory after a run
for step in agent.logs:
    print(f"Step {step.step_number}: {step.action}")

# Resume from a previous checkpoint
agent.memory.steps = previous_run_steps

# Dynamically trim memory via step callbacks
def trim_memory_callback(memory, step_number):
    """Keep only the last 10 steps to manage context length."""
    if len(memory.steps) > 10:
        memory.steps = memory.steps[-10:]

agent = CodeAgent(
    tools=[...],
    model=model,
    step_callbacks=[trim_memory_callback],
)

For agents that need to persist state across sessions, serialize agent.logs to a database between runs and reload it with context injection before the next session. smolagents intentionally omits built-in persistence — that keeps the library small and lets you choose your own storage layer.
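As a minimal sketch of that persistence pattern — a temp-file JSON checkpoint stands in for your database, and plain dicts stand in for serialized step objects:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical storage layer: a JSON file in the temp directory stands in
# for whatever database you choose
checkpoint = Path(tempfile.gettempdir()) / "agent_checkpoint.json"

def save_steps(steps: list) -> None:
    """Persist the agent's step history between sessions."""
    checkpoint.write_text(json.dumps(steps))

def load_steps() -> list:
    """Reload prior steps, or start fresh if no checkpoint exists."""
    return json.loads(checkpoint.read_text()) if checkpoint.exists() else []

save_steps([{"step_number": 1, "action": "web_search('smolagents')"}])
restored = load_steps()
print(restored)
```

The same shape works with any store that can hold JSON — swap the file for a database row keyed by session ID and the rest of the pattern is unchanged.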

For LLM observability and tracing across agent runs, Langfuse integrates with smolagents through its callback system, giving you step-by-step traces of what each agent did and at what cost.

smolagents vs LangGraph: When to Use Each

Criterion                      | smolagents                               | LangGraph
-------------------------------|------------------------------------------|------------------------------------------
Lines for a basic ReAct agent  | ~40 lines                                | ~120 lines
LLM steps per task             | 30% fewer (CodeAgent)                    | Baseline
Learning curve                 | Low (read source in an afternoon)        | High (graph state machine abstraction)
Multi-agent support            | Built-in via ManagedAgent                | Built-in via graph nodes
Production-readiness           | Good with sandboxing                     | Strong (explicit state management)
Debugging                      | Inspect agent.logs directly              | Time-travel debugging (graph rollback)
MCP tool support               | Native via ToolCollection.from_mcp()     | Via LangChain adapter
Local LLM support              | First-class (HF ecosystem)               | Requires adapters
Best for                       | Rapid prototyping, research, HF models   | Complex stateful enterprise workflows

The practical decision point: if you need explicit state machines with rollback debugging and your team already knows LangGraph, stay with LangGraph. If you're prototyping quickly, want to use Hugging Face models, or value being able to read and modify the framework itself, smolagents wins on almost every dimension. Our LangGraph tutorial covers the graph-based approach in depth if you want to compare directly.

Common Mistakes to Avoid

Writing tool docstrings for humans, not LLMs. The model decides when and how to call your tools based entirely on the docstring. "Fetches data" tells the LLM nothing useful. "Returns the real-time price of a cryptocurrency symbol (e.g., 'BTC', 'ETH') as a float in USD" gives it what it needs to make good decisions.

Not setting max_steps. A CodeAgent without a step limit can burn through API budget trying to self-correct on a malformed task. Always set max_steps explicitly. For open-ended research tasks, 15–25 is a reasonable range. For narrow, well-defined tasks, 5–10 is safer.

Skipping sandboxing in production. The restricted interpreter blocks certain dangerous operations, but it is not a full process isolation layer. Use E2B, Modal, or Docker in any deployment that processes externally provided inputs or untrusted data.

Treating multi-agent as free. Each sub-agent invocation runs its own reasoning loop with its own LLM calls. A three-agent system can easily use 3–5x as many tokens as a single-agent approach on the same task. Design sub-agents to be narrow and efficient, and measure the actual cost before assuming the architecture is viable.
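A back-of-the-envelope model makes the multiplier concrete. Every number below is an illustrative assumption, not a measured benchmark:

```python
# Rough cost model: assume a fixed token budget per reasoning step
TOKENS_PER_STEP = 2_000          # prompt + completion for one step

single_agent_steps = 8           # one agent solving the task directly
orchestrator_steps = 6           # the manager agent's own loop
sub_agent_steps = 10             # each specialist's loop
multi_agent_steps = orchestrator_steps + 2 * sub_agent_steps

single_cost = single_agent_steps * TOKENS_PER_STEP
multi_cost = multi_agent_steps * TOKENS_PER_STEP
print(f"{multi_cost / single_cost:.2f}x tokens")  # 3.25x tokens
```

Run this kind of estimate with your own observed step counts before committing to a multi-agent design.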

Ignoring context window pressure. Agent memory grows with each step. On a long-running task with many steps, the accumulated agent.logs can push the context well into the hundreds of thousands of tokens. Use step callbacks to trim or summarize older steps, or pair with a prompt caching strategy for repeated system-level context.

FAQ

Q: Can smolagents work with local models via Ollama?

Yes. Point a LiteLLMModel at your Ollama server:

from smolagents import LiteLLMModel
model = LiteLLMModel(model_id="ollama/qwen2.5-coder:7b", api_base="http://localhost:11434")

Any model that follows the OpenAI Chat Completions interface works. Smaller models (7B–14B) tend to struggle with reliable code generation for CodeAgent — if you see frequent syntax errors or tool misuse, switch to ToolCallingAgent or upgrade to a model with better code generation capabilities.

Q: How does smolagents compare to OpenAI's Agents SDK?

OpenAI Agents SDK is tightly coupled to OpenAI models and optimized for the production infrastructure OpenAI provides (sandboxed execution, memory sessions, MCP). smolagents is model-agnostic and self-hostable. If your team is already on the OpenAI stack and wants managed infrastructure, the OpenAI Agents SDK is a better fit. If you want full control over models and infrastructure, smolagents is the stronger choice.

Q: Is smolagents suitable for production use?

With proper sandboxing, yes. The key requirements: use executor_type="e2b" or executor_type="docker" for any deployment handling externally provided inputs, set max_steps conservatively, validate tool inputs at the boundary, and instrument with an observability tool like Langfuse to monitor step counts and costs.

Q: What's the difference between smolagents and the Microsoft Agent Framework?

Microsoft Agent Framework targets enterprise .NET and Python deployments with multi-provider routing, A2A interoperability, and LTS commitments. It's designed for large organizations with governance requirements. smolagents is a lightweight, research-friendly library — faster to prototype with, but lighter on enterprise features like audit logging and role-based access controls.

Q: Can I use smolagents with structured output validation?

Yes, indirectly. Wrap a structured output call as a tool that calls your LLM with a schema constraint and validates the response using Pydantic. The framework doesn't natively enforce structured outputs on the agent's internal reasoning, but you can enforce structure on any data the agent produces as a final result.
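One way to sketch that wrapper, using a stdlib dataclass in place of Pydantic so the example stays dependency-free — RepoSummary and its validation rules are hypothetical:

```python
import json
from dataclasses import dataclass

# Hypothetical schema; a plain dataclass stands in for a Pydantic model
@dataclass
class RepoSummary:
    name: str
    stars: int

def validate_summary(raw: str) -> RepoSummary:
    """Parse the model's JSON output and enforce the expected schema."""
    data = json.loads(raw)
    summary = RepoSummary(name=str(data["name"]), stars=int(data["stars"]))
    if summary.stars < 0:
        raise ValueError("stars must be non-negative")
    return summary

# Run the agent's final answer through the validator before your
# application consumes it
summary = validate_summary('{"name": "smolagents", "stars": 26000}')
print(summary)
```

With Pydantic the dataclass becomes a BaseModel and the manual checks become field validators, but the boundary is the same: structure is enforced on the data the agent hands back, not on its internal reasoning.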

Key Takeaways

smolagents earns its 26,000+ GitHub stars by doing one thing exceptionally well: removing everything that isn't essential. CodeAgent's code-first approach consistently outperforms JSON-based agents on step efficiency, with a 30% reduction in LLM calls across benchmark tasks. The entire framework is small enough to understand completely, which means debugging stays tractable even as systems grow complex. Native MCP support and genuine multi-LLM flexibility give it a broad integration surface.

The tradeoffs are real: no built-in persistence, sandboxing that excludes multi-agent systems, and a lighter set of production operations features compared to enterprise-oriented frameworks. For teams prototyping agents, exploring the Hugging Face model ecosystem, or building research systems where readability and hackability matter, smolagents is an excellent foundation.

The best evaluation is a working prototype. Install it, wire up a CodeAgent with a search tool, and run it. If your first working agent takes 20 minutes, you'll know exactly whether the simplicity-first approach fits your workflow.

Bottom Line

smolagents is the fastest path from idea to working AI agent in the open-source Python ecosystem. CodeAgent's 30% step reduction over JSON-based agents is a genuine efficiency win, and the 1,000-line codebase means you'll never be stuck debugging a black box. Use it for research, prototyping, and production systems where you control the inputs, and pair it with E2B or Docker sandboxing before exposing it to externally provided data.


Prefer a deep-dive walkthrough? Watch the full video on YouTube.
