// BLOG

MCP Servers in Production: What the Hype Misses

Model Context Protocol finally gives agents a clean contract for tool use, but the production story depends on auth, versioning, and rate limits most teams skip.

By Saif Pasha
MCP · Agentic AI · Engineering

A Protocol, Not a Platform

The Model Context Protocol landed at exactly the moment the industry needed it. Every team building agents had rolled its own tool-use layer. Every integration was a bespoke adapter. MCP turned that into a contract: a server exposes tools, resources, and prompts over a known schema, and any client that speaks the protocol can consume them. The promise is obvious. The production reality is thinner than most announcements make it sound.

MCP solves the shape of the problem. It does not solve the operational problem of shipping, versioning, authenticating, and monitoring the servers that sit behind it. A lot of teams treated the first MCP example they ran as the finish line. In production it is closer to the starting line.

What MCP Actually Gives You

The protocol standardizes three things: tool discovery, structured tool invocation, and resource exposure. Your agent asks a server what it can do, gets back a typed schema, and calls tools with arguments that match. That is genuinely useful. It removes the argument glue that used to live in every agent runtime.
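The handshake above can be sketched in a few lines. This is a deliberately simplified stand-in, not the real SDK: the tool names, the schema dict, and the tiny validator are all illustrative, but the shape — a typed schema advertised by the server, arguments checked against it before invocation — is the part MCP standardizes.

```python
# Minimal sketch of the discovery/invocation shape: the server advertises
# tools with typed input schemas, and arguments are validated against them.
# All names here are illustrative, not the real MCP SDK.

TOOLS = {
    "create_invoice": {
        "description": "Create an invoice in the billing system",
        "inputSchema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "amount_cents": {"type": "integer"},
            },
            "required": ["customer_id", "amount_cents"],
        },
    }
}

def list_tools():
    """What a tools/list response boils down to: name plus typed schema."""
    return [{"name": name, **spec} for name, spec in TOOLS.items()]

def validate_args(tool_name, args):
    """Check required fields and basic types against the tool's schema."""
    schema = TOOLS[tool_name]["inputSchema"]
    for field in schema["required"]:
        if field not in args:
            return False, f"missing required argument: {field}"
    py_types = {"string": str, "integer": int}
    for field, rule in schema["properties"].items():
        if field in args and not isinstance(args[field], py_types[rule["type"]]):
            return False, f"wrong type for {field}"
    return True, None

ok, err = validate_args("create_invoice", {"customer_id": "c_42", "amount_cents": 1999})
```

The point is that this glue now lives once, behind the protocol, instead of being reimplemented in every agent runtime.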

What it does not give you: a deployment story, a permission model tied to your existing IAM, a versioning story for tool schemas that change over time, or observability that integrates with the tools your team already runs. Those are implementation concerns, and they are where most MCP projects stall six weeks in.

Authentication Is the First Real Problem

Early MCP servers were built for local development. The transport was stdio, the client was a desktop app, the auth model was "the user trusts the process." That does not survive contact with a production environment where the server is remote, the client is running for a customer, and the tools the server exposes can write to real systems.

The HTTP transport and OAuth profile close part of this gap, but integrating them with your actual identity provider is where the work lives. You have three realistic options. Sit the MCP server behind your existing API gateway and reuse the auth middleware you already trust. Implement OAuth directly on the server and accept the complexity of a second identity surface to maintain. Or require short-lived signed tokens from your platform and verify them on every call.
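The third option is the easiest to show in isolation. A minimal sketch, assuming an HMAC-signed token for brevity — a real deployment would use whatever signing scheme your platform already issues (JWTs from your IdP, for instance), and the secret would come from secret storage, not a constant:

```python
# Sketch of option three: short-lived signed tokens, verified on every call.
# HMAC over "subject:expiry" keeps the example self-contained; the secret
# and token format here are illustrative.
import hashlib
import hmac
import time

SECRET = b"shared-platform-secret"  # illustrative; load from secret storage

def issue_token(subject, ttl_seconds=300):
    """Mint a token that names the caller and expires quickly."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{subject}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_token(token):
    """Return the subject if the signature holds and the token is fresh."""
    try:
        subject, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return None
    payload = f"{subject}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    if int(expires) < time.time():
        return None
    return subject
```

The short TTL is what makes this tolerable without revocation infrastructure: a leaked token is a five-minute problem, not a standing credential.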

The first option is what we reach for most often. It keeps the blast radius small, it reuses audit logs you already have, and it means the MCP server does not become a new target.

Versioning Is Underspecified and It Matters

Tool schemas change. A field gets added, a parameter gets renamed, a required argument becomes optional. MCP has no opinion about how you communicate those changes to clients. The protocol is a point-in-time handshake: you connect, you list tools, you use them. If the server silently changes a tool definition between sessions, agents that cached the old schema fail in ways that are hard to trace.

The fix is to version your tools explicitly. We name tools with a version suffix — create_invoice_v2 lives alongside create_invoice_v1 — and deprecate on a schedule, not in a push. Clients pin to a version. Breaking changes are additive: new tool, not modified tool. It is ugly. It is also the pattern every public API landed on, and there is no reason MCP servers should invent a new one.
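The registration side of that discipline is small. A sketch, assuming a hypothetical in-process registry — the field names and the deprecation-date mechanism are ours, not part of the protocol:

```python
# Sketch of version-suffixed tool registration with scheduled deprecation.
# The registry shape is illustrative; MCP itself has no opinion here.
from datetime import date

REGISTRY = {}

def register_tool(name, version, handler, deprecated_after=None):
    """Register a tool under an explicit version suffix."""
    REGISTRY[f"{name}_v{version}"] = {
        "handler": handler,
        "deprecated_after": deprecated_after,
    }

def list_active_tools(today=None):
    """Tools still advertised to clients: not yet past their sunset date."""
    today = today or date.today()
    return sorted(
        name for name, tool in REGISTRY.items()
        if tool["deprecated_after"] is None or tool["deprecated_after"] >= today
    )

# Breaking change = new tool, not a modified tool: v1 sunsets on a schedule.
register_tool("create_invoice", 1, lambda args: ..., deprecated_after=date(2025, 6, 30))
register_tool("create_invoice", 2, lambda args: ...)
```

Clients pin to `create_invoice_v1` or `create_invoice_v2` by name; the sunset date is a deprecation announcement, not a silent schema change between sessions.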

The alternative is to treat MCP servers as internal-only, where you control both ends and can coordinate upgrades. That works fine until the day you want to expose your server to a partner or an agent that someone else is building. Version discipline from day one is cheaper than the migration you will otherwise do later.

Rate Limits and the Agent Amplifier

A human operator making API calls is self-limiting. They think, they type, they wait. An agent making API calls is not. A planning loop that misreads its own scratchpad can hit the same tool a hundred times in ten seconds. MCP servers that front real systems need rate limits, and the limits need to live on the server, not rely on the agent to behave.

We layer two limits. A per-session budget that caps total tool calls per agent run. A per-tool rate limit that prevents any single tool from being hammered. The per-session budget catches runaway loops. The per-tool limit catches the legitimate-looking workload that happens to be expensive downstream. Both return structured errors the agent can reason about — "budget exhausted, try again after X" — rather than raw 429s that the agent will retry until it exhausts something else.
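The two layers fit in one small class. A sketch under stated assumptions — the limits, window, and error fields are illustrative, and a real server would keep this state per session rather than per process:

```python
# Two-layer limiter: a per-session call budget plus a per-tool sliding-window
# rate limit. Both return structured errors the agent can reason about
# instead of a bare 429. Thresholds and field names are illustrative.
import time
from collections import defaultdict, deque

class ToolLimiter:
    def __init__(self, session_budget=100, per_tool_rate=10, window_seconds=10):
        self.session_budget = session_budget
        self.per_tool_rate = per_tool_rate
        self.window = window_seconds
        self.session_calls = 0
        self.tool_calls = defaultdict(deque)  # tool name -> call timestamps

    def check(self, tool, now=None):
        """Return None if the call may proceed, else a structured error."""
        now = time.monotonic() if now is None else now
        # Layer 1: total budget for the whole agent run catches runaway loops.
        if self.session_calls >= self.session_budget:
            return {"error": "budget_exhausted",
                    "detail": f"session budget of {self.session_budget} calls used"}
        # Layer 2: sliding window per tool catches one tool being hammered.
        calls = self.tool_calls[tool]
        while calls and calls[0] <= now - self.window:
            calls.popleft()
        if len(calls) >= self.per_tool_rate:
            return {"error": "rate_limited", "tool": tool,
                    "retry_after_seconds": round(self.window - (now - calls[0]), 1)}
        self.session_calls += 1
        calls.append(now)
        return None
```

The `retry_after_seconds` field is the part that matters: it gives the planning loop something to act on, instead of a failure it will blindly retry.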

Observability Is Where Teams Undersell Themselves

The hardest production question about an agent using MCP tools is not "did it work" but "what did it do, in what order, with what arguments, and why." Answering that requires correlating logs across the agent runtime, the MCP transport, the server, and the downstream system each tool touches. If you wire that up at the start, debugging an agent that misbehaved in production is boring. If you do not, it is a weekend.

The minimum we ship with every production MCP server: a correlation ID that follows a single agent turn from prompt through every tool call, structured logs with tool name, arguments, latency, and result shape, and a trace that exports to whatever APM the rest of your stack uses. Cost attribution is the follow-on — knowing which agent runs cost what is how you have the conversation with finance when the bill arrives.
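That minimum can be sketched in a few lines. The log sink, field names, and wrapper function here are illustrative — in practice the record would go to your structured logger and the trace to your APM exporter, but the shape is the same:

```python
# Sketch of the minimum observability layer: one correlation ID per agent
# turn, attached to a structured record for every tool call it makes.
# The in-memory LOG list stands in for a real log sink / APM exporter.
import json
import time
import uuid

LOG = []  # stand-in for the structured log sink

def new_turn_id():
    """One ID minted per agent turn, threaded through every tool call."""
    return str(uuid.uuid4())

def log_tool_call(turn_id, tool, args, fn):
    """Invoke a tool handler and emit a structured record for the call."""
    start = time.perf_counter()
    result = fn(**args)
    LOG.append(json.dumps({
        "turn_id": turn_id,      # correlates the whole agent turn
        "tool": tool,
        "args": args,            # redact secrets before logging in production
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        "result_shape": type(result).__name__,
    }))
    return result

turn = new_turn_id()
total = log_tool_call(turn, "add", {"a": 2, "b": 3}, lambda a, b: a + b)
```

Grepping one `turn_id` across the agent runtime, the server, and the downstream system is what turns "what did it do, in what order" into a query instead of an investigation.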

When MCP Is the Wrong Answer

MCP shines when you have multiple clients that need to consume the same tools. Your agent runtime, your internal copilot, and a partner team's project all need to hit the same ticketing system — one MCP server, three consumers, clean boundary. That is the sweet spot.

It earns less when you have exactly one client. If your agent runs inside one service and calls tools that live in the same deployment, adding an MCP transport layer between them is overhead. The protocol exists to decouple producer from consumer; if they are not decoupled, direct function calls are faster, cheaper, and easier to reason about. We have shipped agents with an MCP server and agents without. The deciding factor is always the number of consumers.

If you are scoping an internal system that needs tool use and are not sure whether MCP is the right shape, a 15-minute audit usually answers it. The protocol is real and the servers are production-capable, but "use MCP" is a means, not an end. The system either fits the multi-consumer pattern or it does not. If it does, the standardization is worth every hour of auth and versioning work. If it does not, simpler still wins.

Written by Saif Pasha
