
Can MCP Servers Call Other MCP Servers?

Yes, an MCP server can act as an MCP client internally, connecting to other MCP servers and invoking their tools. This is technically straightforward using the MCP client SDK, but it is rarely the best approach. In most cases, the AI client is the natural orchestration layer, connecting to multiple servers directly and letting the model decide which tools to use. Server-to-server MCP calls add complexity without adding capability that the client layer does not already provide.

How Server-to-Server Calls Work

The MCP specification defines separate client and server roles, but nothing prevents a single process from being both. An MCP server that also implements the MCP client protocol can connect to other servers, discover their tools, and invoke them programmatically. The MCP SDK provides both server and client classes, so the implementation is a matter of instantiating a client alongside the server and using it within tool handlers.

```python
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orchestrator")

@mcp.tool()
async def search_and_remember(query: str) -> str:
    """Search the codebase and store findings in memory.

    Args:
        query: What to search for in the codebase
    """
    # Call another MCP server (search server)
    search_params = StdioServerParameters(command="python", args=["search_server.py"])
    async with stdio_client(search_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            search_result = await session.call_tool("search_files", {"query": query})

    # Call another MCP server (memory server)
    memory_params = StdioServerParameters(command="python", args=["memory_server.py"])
    async with stdio_client(memory_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            await session.call_tool("store", {"content": str(search_result)})

    return f"Found results for '{query}' and stored in memory."
```

Why It Is Rarely the Best Approach

In the standard MCP architecture, the AI client connects to multiple servers simultaneously. The model sees all tools from all servers and decides which to call based on the conversation. This design makes the AI model the orchestrator, which is usually the right layer for orchestration because the model understands the user's intent and can chain tool calls dynamically.

When a server calls another server internally, it takes orchestration away from the model. The server decides the workflow at development time (search then store), and the model cannot change or adapt that workflow at runtime. If the user wanted to search but not store, or store first and then search, the hardcoded server-to-server workflow cannot accommodate that.

The model is also better at error recovery. If a tool call fails, the model can try a different approach, ask the user for clarification, or skip that step and continue. A server-to-server call fails at the code level, where error recovery is limited to what the developer anticipated and handled.

When Server Chaining Makes Sense

There are legitimate use cases for server-to-server communication, though they are more common in complex enterprise architectures than in typical developer setups:

Proxy or gateway servers: A single MCP server acts as a unified entry point that routes requests to backend services. This is useful when you want to expose a curated set of tools from multiple internal services through one connection, applying consistent authentication, logging, and rate limiting at the gateway level.
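The routing core of such a gateway can be sketched without the MCP wiring. The backend names, namespaced tool names, and handlers below are hypothetical stand-ins; in a real gateway each handler would be an MCP client session to a backend server.

```python
# Sketch of a gateway's routing core: one entry point fans out to
# backend handlers. Backend and tool names here are hypothetical.
from typing import Any, Callable

# In a real gateway these would be MCP client sessions to backend
# servers; plain callables keep the sketch self-contained.
BACKENDS: dict[str, Callable[[dict], Any]] = {
    "search.search_files": lambda args: f"results for {args['query']}",
    "memory.store": lambda args: "stored",
}

def route_tool_call(namespaced_tool: str, arguments: dict) -> Any:
    """Apply gateway-level policy, then forward to the owning backend."""
    if namespaced_tool not in BACKENDS:
        raise ValueError(f"unknown tool: {namespaced_tool}")
    # Gateway-level concerns (auth, logging, rate limiting) go here,
    # applied uniformly before any backend sees the request.
    return BACKENDS[namespaced_tool](arguments)
```

Because every request passes through one function, cross-cutting policy lives in exactly one place rather than in each backend.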

Aggregation tools: A tool that needs data from multiple sources to produce a single result. For example, a "project status" tool that queries the database, checks the CI pipeline, and summarizes open issues. The tool calls backend services (which might be MCP servers) to gather data, then combines the results into one response for the model.
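The shape of such a tool is simple to sketch. The three data sources below are stubs for illustration; in practice each would be a REST call or an MCP client session to a backend.

```python
# Sketch of an aggregation tool: gather from several sources and
# return one combined result. Sources are stubbed for illustration.
def get_ci_status() -> str:
    return "passing"  # stub: would query the CI pipeline

def get_open_issues() -> int:
    return 7  # stub: would query the issue tracker

def get_pending_migrations() -> int:
    return 0  # stub: would query the database

def project_status() -> str:
    """One tool result combining three backend queries for the model."""
    return (
        f"CI: {get_ci_status()}, "
        f"open issues: {get_open_issues()}, "
        f"pending migrations: {get_pending_migrations()}"
    )
```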

Encapsulated workflows: Some operations are genuinely multi-step and should always happen as a unit. Memory consolidation (scan for duplicates, merge related memories, update confidence scores) is a single logical operation even though it involves multiple internal steps. Exposing it as one tool that internally coordinates multiple operations makes more sense than exposing each step separately.
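As a concrete illustration of the memory-consolidation example, the steps can be bundled into one function that always runs as a unit. The memory representation below (text plus confidence score) is a simplified stand-in, not any particular product's data model.

```python
# Sketch of an encapsulated workflow: scan for duplicates, merge
# related memories, update confidence scores -- one logical operation.
def consolidate(memories: list[dict]) -> list[dict]:
    """Run the full consolidation pass as a single unit."""
    merged: dict[str, dict] = {}
    for m in memories:
        key = m["text"].strip().lower()      # scan for duplicates
        if key in merged:
            existing = merged[key]           # merge related memories
            existing["score"] = max(existing["score"], m["score"])
            existing["sources"] += 1
        else:
            merged[key] = {"text": m["text"], "score": m["score"], "sources": 1}
    for m in merged.values():                # update confidence scores
        m["score"] = round(min(1.0, m["score"] + 0.1 * (m["sources"] - 1)), 2)
    return list(merged.values())
```

Exposing `consolidate` as a single tool means the model never sees (or half-completes) the intermediate steps.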

Alternatives to Server Chaining

Before building server-to-server MCP calls, consider simpler alternatives:

Let the model orchestrate. Connect multiple servers to the client and let the model call them in sequence. The model naturally chains tool calls: it can call search, read the results, then call store with the findings, all driven by the conversation flow. This requires no custom orchestration code.

Use direct API calls. If your MCP server needs to call a backend service, call its REST API directly rather than going through the MCP protocol. The MCP layer adds discovery and description that is useful for AI models but unnecessary for server-to-server communication where both sides are deterministic code.
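A minimal sketch of this alternative, using only the standard library: the endpoint path and response shape are hypothetical, and the JSON parsing is split into its own function so the deterministic part can be tested without a network.

```python
# Sketch of calling a backend's REST API directly from a tool handler
# instead of wrapping the backend in MCP. Endpoint and payload shape
# are hypothetical.
import json
import urllib.request

def parse_issue_count(payload: dict) -> int:
    """Extract the count from a (hypothetical) issues-API response."""
    return int(payload["total_count"])

def fetch_open_issue_count(base_url: str) -> int:
    # Both sides are deterministic code, so plain HTTP is enough:
    # no tool discovery, no schemas, no MCP handshake.
    with urllib.request.urlopen(f"{base_url}/issues?state=open") as resp:
        return parse_issue_count(json.load(resp))
```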

Shared libraries. If two servers need the same functionality, extract it into a shared library that both import. This avoids the overhead of protocol communication when the functionality is just a function call.
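The shared-library alternative can be sketched in a few lines. The module and function names below are hypothetical; the point is that both servers call the same function in-process rather than over any protocol.

```python
# Sketch of the shared-library alternative: both servers import one
# module instead of calling each other over MCP.

# shared/text_utils.py -- the extracted common functionality
def normalize_query(query: str) -> str:
    """Lowercase, trim, and collapse whitespace; used by both servers."""
    return " ".join(query.lower().split())

# search_server.py and memory_server.py would each do:
#   from shared.text_utils import normalize_query
# and call it as a plain function -- no protocol round trip.
def search_tool(query: str) -> str:
    return f"searching for: {normalize_query(query)}"

def store_tool(content: str) -> str:
    return f"storing: {normalize_query(content)}"
```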

The Bottom Line

Server-to-server MCP calls are technically possible but rarely necessary. The AI model, sitting at the client level, is the natural orchestrator for multi-tool workflows because it understands context and can adapt in real time. Reserve server chaining for gateway architectures, aggregation tools, and encapsulated operations where the multi-step workflow is always the same regardless of context.

Let the model orchestrate your memory. Connect Adaptive Recall alongside your other MCP servers and the AI handles the workflow.
