ARTICLES · 2026-04-30 · BY EFFLOOW CONTENT FACTORY

smolagents + MCP Bridge: Connect Any Tool to Your Agent

Connect any MCP server to a smolagents ToolCallingAgent in Python. Covers MCPClient, stdio and streamable-http transports, and LLM backend options.

The Model Context Protocol crossed 97 million installs in March 2026. There are now over 2,000 community-built MCP servers covering databases, browsers, calendars, code execution, file systems, and dozens of SaaS APIs. If you're building a Python agent and want to tap into that ecosystem without writing custom adapters, smolagents + MCPClient is one of the most direct paths to get there.

This guide shows you how — from installing the right extras to connecting both local stdio servers and remote streamable-HTTP servers, choosing an LLM backend, and understanding the pitfalls that trip up most people on first try. Effloow Lab ran this as a sandbox PoC (see data/lab-runs/huggingface-smolagents-mcp-bridge-guide-2026.md) using smolagents 1.24.0 and mcp 1.27.0.

Why MCP and smolagents Are a Natural Fit

smolagents is HuggingFace's lightweight agent framework — roughly 1,000 lines of core code, two agent classes, and a clean Tool interface that makes it easy to add capabilities. Its ToolCallingAgent outputs structured JSON tool calls, which maps cleanly onto MCP's JSON-RPC tool invocation model.

MCP defines a standard way for any process (a "server") to expose a list of tools, with typed inputs and outputs, discoverable at runtime. Without a protocol like this, every agent framework invents its own tool definition format. With MCP, a tool you write once for Claude Desktop, Cursor, or any other MCP client is immediately available to your smolagents pipeline too.

The integration layer — MCPClient — handles the subprocess lifecycle, the JSON-RPC handshake, and converts each discovered MCP tool into a smolagents-native Tool object. You never write a tool wrapper by hand.

For context on the broader MCP ecosystem, see our earlier breakdown at MCP: From Anthropic Spec to 100M Installs.

Installation: The Extra You Cannot Skip

The base install gets you both packages but silently breaks MCP support at runtime:

# This does NOT enable MCPClient — it's missing mcpadapt
pip install smolagents mcp

When you try to use MCPClient, you'll see:

ModuleNotFoundError: Please install 'mcp' extra to use MCPClient:
  `pip install 'smolagents[mcp]'`

The correct install is:

pip install "smolagents[mcp]"

This pulls in mcpadapt >= 0.1.13, the adaptation layer between the MCP Python SDK and smolagents' Tool interface. In the Effloow Lab sandbox, pip install "smolagents[mcp]" installed mcpadapt 0.1.20, websockets 16.0, and jsonref 1.1.0 in addition to the base packages.

Version check:

import smolagents, mcp
print(smolagents.__version__)  # 1.24.0
print(mcp.__version__)         # 1.27.0

Building a Local MCP Server with FastMCP

FastMCP (bundled inside the mcp Python SDK as mcp.server.fastmcp) is the fastest way to write a server. It reads Python type hints and docstrings to generate the tool schema automatically.

Here's a minimal calculator server:

# calc_server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Calculator")

@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers together."""
    return a + b

@mcp.tool()
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

@mcp.tool()
def compute_area(length: float, width: float) -> str:
    """Compute the area of a rectangle and return formatted result."""
    area = length * width
    return f"Rectangle area: {area:.2f} square units (length={length}, width={width})"

if __name__ == "__main__":
    mcp.run(transport="stdio")

For a deeper walkthrough of MCP server construction, including prompts and resources, see Build a Custom MCP Server with Claude Code.

MCPClient: Connecting to a Stdio Server

MCPClient launches the server subprocess and manages the stdio pipes. Use it as a context manager — the connection tears down cleanly when the block exits.

import asyncio
from mcp import StdioServerParameters
from smolagents import MCPClient

server_params = StdioServerParameters(
    command="python3",           # or full path to venv python
    args=["calc_server.py"],
    env=None,                    # inherits parent environment
)

with MCPClient(server_params, structured_output=False) as tools:
    print(f"Discovered {len(tools)} tools:")
    for t in tools:
        print(f"  {t.name}: {t.description}")
        print(f"    inputs: {t.inputs}")

Output from the Effloow Lab sandbox:

Discovered 3 tools:
  add: Add two numbers together.
    inputs: {'a': {'title': 'A', 'type': 'number'}, 'b': {'title': 'B', 'type': 'number'}}
  multiply: Multiply two numbers.
    inputs: {'a': {'title': 'A', 'type': 'number'}, 'b': {'title': 'B', 'type': 'number'}}
  compute_area: Compute the area of a rectangle and return formatted result.
    inputs: {'length': {'title': 'Length', 'type': 'number'}, 'width': {'title': 'Width', 'type': 'number'}}

Each item in the tools list is a callable smolagents Tool object. You can call them directly — no agent required:

with MCPClient(server_params, structured_output=False) as tools:
    tools_map = {t.name: t for t in tools}

    result = tools_map["compute_area"](length=10.0, width=5.0)
    print(result)
    # → Rectangle area: 50.00 square units (length=10.0, width=5.0)

    product = tools_map["multiply"](a=6, b=7)
    print(product)  # → 42.0

Both calls succeeded in the sandbox, confirming the JSON-RPC CallToolRequest/CallToolResult round-trip works end-to-end.

The structured_output Parameter

In smolagents 1.24.0 you'll see a FutureWarning if you don't set structured_output explicitly:

FutureWarning: Parameter 'structured_output' was not specified.
Currently it defaults to False, but in version 1.25, the default will change to True.

  • structured_output=False — legacy mode. Tool outputs are returned as plain text strings. Safe for any MCP server.
  • structured_output=True — new mode. Handles MCP spec 2025-06-18+ outputSchema fields and returns structured data (JSON objects, typed results). Use this when your MCP server explicitly declares output schemas.

Set it explicitly to suppress the warning:

with MCPClient(server_params, structured_output=False) as tools:  # or True
    ...

Connecting to a Remote Server (Streamable HTTP)

For servers running as a network service — cloud-hosted tools, shared team infrastructure, or production deployments — use the streamable-HTTP transport. Pass a dict with url and transport keys instead of StdioServerParameters:

with MCPClient(
    {"url": "http://localhost:8000/mcp", "transport": "streamable-http"},
    structured_output=True,
) as tools:
    for t in tools:
        print(t.name)

The streamable-http transport was introduced in MCP spec 2025-03-26 and became the preferred remote transport as of that release. The older SSE ("transport": "sse") is deprecated and scheduled for removal by mid-2026 — avoid it for new projects.

Connecting to Multiple Servers at Once

MCPClient accepts a list of server parameters, merging all discovered tools from each server into a single flat tool list:

server_list = [
    StdioServerParameters(command="python3", args=["calc_server.py"]),
    {"url": "https://api.example.com/mcp", "transport": "streamable-http"},
]

with MCPClient(server_list, structured_output=True) as tools:
    print(f"Total tools across all servers: {len(tools)}")

Wiring Tools into ToolCallingAgent

Once you have tools from MCPClient, passing them to a ToolCallingAgent is straightforward:

from smolagents import ToolCallingAgent, InferenceClientModel

model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-72B-Instruct",
    token="hf_...",  # HuggingFace token required
)

with MCPClient(server_params, structured_output=False) as mcp_tools:
    agent = ToolCallingAgent(tools=list(mcp_tools), model=model)
    result = agent.run("What is the area of a 15 by 8 rectangle?")
    print(result)

ToolCallingAgent dispatches to the MCP server every time the model selects a tool. The round-trip — model generates JSON tool call → smolagents parses it → MCPClient sends CallToolRequest → MCP server returns result → agent observes result — happens inside the agent loop automatically.

In the Effloow Lab PoC, Step 1 of the agent loop correctly dispatched compute_area(length=10.0, width=5.0) to the MCP subprocess and received "Rectangle area: 50.00 square units" back as an observation.

Choosing an LLM Backend

ToolCallingAgent is model-agnostic — any class that implements smolagents' Model interface works. Here are the three most common setups:

| Backend | Class | Requires | Best For |
| --- | --- | --- | --- |
| HuggingFace Hub | InferenceClientModel | HF_TOKEN | Latest open-weight models (Qwen, DeepSeek, Llama) |
| Ollama (local) | LiteLLMModel | Ollama running locally | Private data, no API cost, offline |
| OpenAI / Anthropic / Gemini | LiteLLMModel | Provider API key | Production reliability, GPT-4o, Claude Sonnet |
| Recommended start | LiteLLMModel + Ollama | Ollama + llama3.2 | Free, local, no data leaves your machine |

HuggingFace Inference API

from smolagents import InferenceClientModel

model = InferenceClientModel(
    model_id="Qwen/Qwen2.5-72B-Instruct",
    provider="hf-inference",
    token="hf_YOUR_TOKEN_HERE",
)

Free-tier access was removed; any model call without a valid token returns a 401 error (confirmed in sandbox).

Ollama for Local Inference

Install Ollama (brew install ollama on macOS), pull a model, and point LiteLLMModel at the local server:

pip install "smolagents[litellm]"
ollama pull llama3.2
ollama serve  # keep this running

from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="ollama_chat/llama3.2",
    api_base="http://localhost:11434",
    num_ctx=8192,  # default 2048 often causes failures; 8192+ recommended
)

num_ctx=8192 is important — the default 2048 context window truncates multi-step agent histories and causes silent failures.

Any OpenAI-Compatible API

from smolagents import LiteLLMModel

model = LiteLLMModel(
    model_id="anthropic/claude-sonnet-4-6",  # LiteLLM routes by the provider prefix
    api_key="sk-ant-...",
)
# For any other OpenAI-compatible endpoint, use the "openai/" model prefix and
# pass api_base="https://your-endpoint/v1" alongside the key.

CodeAgent vs ToolCallingAgent: Which One for MCP?

smolagents ships with two agent classes. For MCP tool integration, ToolCallingAgent is the natural choice, but understanding the difference helps you make the right call:

ToolCallingAgent generates JSON tool call specifications (the OpenAI function-calling format). The tool invocation is separate from any code execution — the agent decides what to call, smolagents dispatches it, and the result comes back as an observation string. This model maps directly onto MCP's JSON-RPC protocol.
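To make that mapping concrete, here is a pure-JSON sketch. The params shape follows the MCP tools/call method; the request id and surrounding framing are illustrative:

```python
import json

# A ToolCallingAgent tool call in OpenAI function-calling shape...
tool_call = {"name": "compute_area", "arguments": {"length": 15.0, "width": 8.0}}

# ...becomes an MCP tools/call JSON-RPC request with the same name/arguments.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": tool_call["name"], "arguments": tool_call["arguments"]},
}
print(json.dumps(mcp_request, indent=2))
```

The tool name and argument dict pass through unchanged, which is why no translation layer is needed beyond what MCPClient already does.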

CodeAgent writes Python code snippets and executes them. It's smolagents' default and uses 30% fewer steps on average. But it needs a sandboxed Python execution environment, and its output format (Python code) doesn't map as cleanly onto JSON-RPC tool calls.

If your MCP server exposes tools that return data for the agent to reason about (file contents, API results, database rows), use ToolCallingAgent. If you're building a code-writing assistant that happens to use some tools as helpers, CodeAgent may be more efficient. For the MCP bridge pattern, ToolCallingAgent is the cleaner choice.

For a full comparison of both agent types with examples, see our earlier smolagents guide.

Manual Connection Management

If you can't use the context manager (e.g., in a long-running service), manage the connection with try/finally:

from mcp import StdioServerParameters
from smolagents import MCPClient

server_params = StdioServerParameters(
    command="python3",
    args=["calc_server.py"],
)

mcp_client = MCPClient(server_params, structured_output=False)
try:
    tools = mcp_client.get_tools()
    # build and run your agent here
    agent = ToolCallingAgent(tools=tools, model=model)
    result = agent.run("your task")
finally:
    mcp_client.disconnect()

Skipping disconnect() leaves the subprocess running in the background. The context manager handles this automatically.
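For long-running services, contextlib.ExitStack plus atexit gives the same guarantee without a with block. The sketch below uses a FakeMCPClient stand-in so it runs anywhere; in practice you would enter the real MCPClient the same way, since it supports the context-manager protocol:

```python
import atexit
from contextlib import ExitStack

class FakeMCPClient:
    """Stand-in for MCPClient so the sketch runs without an MCP server."""
    def __enter__(self):
        return ["add", "multiply"]   # discovered tools
    def __exit__(self, *exc):
        print("disconnected")        # subprocess teardown happens here
        return False

stack = ExitStack()
atexit.register(stack.close)  # guarantees teardown at interpreter exit
tools = stack.enter_context(FakeMCPClient())
print(tools)  # → ['add', 'multiply']
```

This keeps the connection open for the service's whole lifetime while still ensuring the subprocess is reaped on shutdown.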

Common Mistakes

Installing smolagents and mcp separately and expecting MCPClient to work. It won't. The mcpadapt package is only added by pip install "smolagents[mcp]". This is the most common stumbling block.

Not setting structured_output explicitly in 1.24.x. The FutureWarning is not just cosmetic — the default is changing in 1.25. If your MCP server uses outputSchema, start with True now. If not, set False to lock in the current behavior.

Using SSE transport for new servers. The MCP spec deprecated SSE in March 2025. Use "transport": "streamable-http" for any new remote server. SSE will be unsupported after mid-2026.

Setting Ollama's num_ctx too low. The default context window (2048 tokens) isn't enough for multi-step agent loops. Agent history, system prompt, tool schemas, and observations together easily exceed that. Set num_ctx=8192 or higher.

Calling tools outside the MCPClient context manager. MCPClient starts the subprocess and opens the stdio pipes in __init__. If you call mcp_client.get_tools() and then use those tools after the context exits, the underlying connection is gone and calls fail.

Expecting final_answer to work from plain text in ToolCallingAgent. Unlike CodeAgent (which parses "Final answer:" prefix in generated code), ToolCallingAgent expects the model to emit a JSON tool call to the built-in final_answer tool. If your model outputs a text response ending the conversation, the agent may continue running until max_steps is reached.

Multi-Agent Patterns with MCP

MCP fits naturally into multi-agent architectures. You can expose an entire sub-agent as an MCP server and connect it to an orchestrator agent via MCPClient. This is similar in spirit to the A2A protocol pattern but simpler to set up for Python-only stacks.

A common pattern: one orchestrator agent with MCPClient connections to three MCP servers — one for file operations, one for web search, one for database queries — and all tool selection handled by a single LLM call.

from smolagents import ToolCallingAgent, LiteLLMModel, MCPClient
from mcp import StdioServerParameters

servers = [
    StdioServerParameters(command="uvx", args=["mcp-server-filesystem", "/data"]),
    StdioServerParameters(command="uvx", args=["mcp-server-brave-search"]),
]

model = LiteLLMModel(model_id="ollama_chat/llama3.2", api_base="http://localhost:11434", num_ctx=8192)

with MCPClient(servers, structured_output=True) as all_tools:
    agent = ToolCallingAgent(tools=list(all_tools), model=model)
    result = agent.run("Find the quarterly report in /data and summarize the revenue figures.")
    print(result)

The MCPClient merges tool lists from all servers automatically. The orchestrator agent sees a unified flat list of tools and picks from them based on the task.
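One consequence of the flat merge worth checking: two servers can expose the same tool name. A quick sketch of detecting collisions before handing the list to the agent (the server tool lists are hypothetical, and flat concatenation is an assumption about the merge; the docs don't describe deduplication):

```python
# Tool names discovered from two hypothetical servers (illustration only).
server_a_tools = ["read_file", "write_file"]
server_b_tools = ["web_search", "read_file"]  # same name as server A!

merged = server_a_tools + server_b_tools  # flat merge across servers
duplicates = sorted({n for n in merged if merged.count(n) > 1})
print(duplicates)  # → ['read_file']
```

If a collision shows up, rename the tool on one server side rather than relying on whichever copy the agent happens to pick.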

Frequently Asked Questions

Q: Does MCPClient work with publicly hosted MCP servers (like those listed in mcp.so)?

If the server supports streamable-HTTP transport, yes. Pass {"url": "https://server-url/mcp", "transport": "streamable-http"}. Many community servers still use SSE transport — those work too with "transport": "sse", but SSE support will be removed from the spec by mid-2026. Check the server's documentation for which transport it exposes.

Q: Can I use MCPClient inside an async function?

MCPClient is synchronous internally (it wraps async MCP calls with asyncio.run()). If you're already inside an async event loop, use asyncio.run_coroutine_threadsafe() or run the MCPClient in a thread executor. The smolagents documentation recommends the synchronous context manager for most use cases.
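A sketch of the thread-executor route, with collect_tools_sync standing in for the blocking MCPClient work (the stand-in just returns a fixed list so the example runs anywhere):

```python
import asyncio

def collect_tools_sync():
    """Blocking stand-in for synchronous MCPClient usage (illustration)."""
    return ["add", "multiply", "compute_area"]

async def main():
    loop = asyncio.get_running_loop()
    # Push the blocking MCP connection work onto a worker thread so the
    # event loop stays responsive.
    return await loop.run_in_executor(None, collect_tools_sync)

tools = asyncio.run(main())
print(tools)  # → ['add', 'multiply', 'compute_area']
```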

Q: What happens if the MCP server crashes mid-task?

If the subprocess dies, the next tool call raises a ConnectionError or BrokenPipeError. The agent loop surfaces this as a tool error observation. Wrap agent.run() in a try/except and recreate the MCPClient connection on failure. There is no built-in reconnection logic in 1.24.x.
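A minimal reconnect wrapper might look like this. FlakyClient is a stand-in that dies once and then recovers; run_with_reconnect and its retry policy are illustrative, not part of smolagents:

```python
import time

def run_with_reconnect(make_client, task, max_retries=2):
    """Recreate the connection and retry when the MCP subprocess dies.
    `make_client` is a zero-arg factory returning a context manager that
    yields tools (e.g. lambda: MCPClient(params, structured_output=False))."""
    for attempt in range(max_retries + 1):
        try:
            with make_client() as tools:
                # Build the agent from `tools` and run the task here;
                # the return value stands in for agent.run(task).
                return f"done after {attempt} reconnect(s)"
        except (ConnectionError, BrokenPipeError):
            if attempt == max_retries:
                raise
            time.sleep(0.1)  # brief backoff before respawning the server

# Demo with a stand-in client that fails on the first connection only.
class FlakyClient:
    failures = [True]
    def __enter__(self):
        if FlakyClient.failures:
            FlakyClient.failures.pop()
            raise BrokenPipeError("MCP subprocess died")
        return ["add", "multiply"]
    def __exit__(self, *exc):
        return False

result = run_with_reconnect(FlakyClient, "task")
print(result)  # → done after 1 reconnect(s)
```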

Q: Is smolagents' MCPClient the same as mcpadapt?

mcpadapt is the underlying adapter library, developed separately by the community. MCPClient in smolagents is a thin wrapper around mcpadapt.core.MCPAdapt. As of smolagents 1.4.1, the integration is bundled so you don't need to import mcpadapt directly — MCPClient is the public API.

Q: Can ToolCallingAgent use both MCP tools and native smolagents tools at the same time?

Yes. Pass a combined list:

from smolagents import WebSearchTool

native_tools = [WebSearchTool()]

with MCPClient(server_params, structured_output=False) as mcp_tools:
    all_tools = list(mcp_tools) + native_tools
    agent = ToolCallingAgent(tools=all_tools, model=model)

The agent treats MCP-sourced tools and native tools identically. Both appear in the tool schema passed to the model.

Key Takeaways

  • Install with pip install "smolagents[mcp]" — the [mcp] extra is required to pull in mcpadapt.
  • MCPClient handles server lifecycle, JSON-RPC protocol, and tool wrapping. Use it as a context manager.
  • Set structured_output explicitly in 1.24.x to avoid the FutureWarning; default changes to True in 1.25.
  • For local servers: StdioServerParameters. For remote servers: {"url": "...", "transport": "streamable-http"}.
  • ToolCallingAgent maps cleanly onto MCP's JSON-RPC tool call model. It's the right agent class for MCP integration.
  • Without an HF token, use LiteLLMModel with Ollama or any OpenAI-compatible provider.
  • Pass a list to MCPClient to connect multiple servers and merge their tools into a single tool set.
Bottom Line

smolagents + MCPClient is the most direct path from Python code to the 2,000+ tool servers in the MCP ecosystem. The install gotcha (you need the [mcp] extra) and the structured_output FutureWarning are the main friction points — once past those, connecting a local FastMCP server or a remote streamable-HTTP endpoint takes under 20 lines of code.
