
Do MCP Servers Really Eat Half Your Context Window?

A common criticism of MCP servers is that they consume too much context window, leaving little room for actual work. But is this bloat real, or just a reporting error? I captured raw API calls and replicated Claude Code’s counting logic to find out.

TL;DR

No. Claude Code's /context command overstates MCP tool usage because it measures each tool with a separate count_tokens request and sums the results, re-counting the shared tool-use system prompt and request wrapper once per tool. Counted in a single request, XcodeBuildMCP's tools cost ~15k tokens, not the ~45k that /context reports.

Background: where the token bloat story started

Claude Code is a terminal-based coding agent powered by Anthropic’s LLMs. One of its best features is the /context command, which provides a live breakdown of token usage: system prompts, tools, memory, and more. It’s the only tool I know of that offers this level of transparency. However, when I installed XcodeBuildMCP, the numbers didn’t add up:

/context

  Context Usage
 ⛁ ... ⛁   claude-sonnet-4-5-20250929 · 112k/200k tokens (56%)
 ⛁ ... ⛁
 ⛁ ... ⛁   ⛁ System prompt: 2.5k tokens (1.2%)
 ⛀ ... ⛶   ⛁ System tools: 13.8k tokens (6.9%)
 ⛶ ... ⛝   ⛁ MCP tools: 45.0k tokens (22.5%)
 ⛶ ... ⛶   ⛁ Memory files: 5.9k tokens (2.9%)
 ⛶ ... ⛶   ⛁ Messages: 8 tokens (0.0%)
 ⛶ ... ⛝   ⛶ Free space: 88k (43.9%)
 ⛝ ... ⛝   ⛝ Autocompact buffer: 45.0k tokens (22.5%)

 MCP tools · /mcp
 └ mcp__XcodeBuildMCP__build_device (XcodeBuildMCP): 813 tokens
 └ mcp__XcodeBuildMCP__clean (XcodeBuildMCP): 955 tokens
 └ … (62 more tools ranging from ~600-1,100 tokens each)

Why the numbers looked so high

I built a small usage-reporting tool that measured the MCP payload independently, and it consistently showed ~14k tokens for XcodeBuildMCP's tools. Claude Code, however, reported ~45k. The /context figure is produced by calling Anthropic's count_tokens API once per tool and summing the results.

What I measured

Single request versus per-tool sums

Component                                              | Tokens
-------------------------------------------------------|-------
XcodeBuildMCP tool JSON (local tokenizer)              | 14,081
count_tokens call (all tools in one request)           | 15,282
/context "MCP tools" total (sum of per-tool requests)  | 45,018
Discrepancy (implied overhead)                         | 30,937

Measured directly, the tool definitions occupy ~14k tokens; adding Anthropic's tool-use instructions and request wrapper for a single request brings that to ~15k. Yet /context reports ~45k.

The culprit? Duplication. /context calculates usage by summing the cost of each tool individually. This means the platform’s system instructions and wrapper are counted once per tool instead of once per request. With 60 tools, you pay that overhead 60 times.

Per-tool counting repeats the wrapper and system instructions

Claude Code queries Anthropic's count_tokens endpoint once per tool, wrapping each tool's schema in a dummy conversation, and then sums the per-call totals:

"messages": [{ "role": "user", "content": "foo" }],
"tools": [{ ...single tool schema... }]

The wrapper adds ~460 tokens even when the tool itself is only ~130 tokens, and Anthropic also injects its own system instructions for tool use. When the CLI iterates over 60 tools, it incurs that preamble and wrapper 60 times, then sums those inflated totals for /context. The real request includes the tool list once, so those instructions and the wrapper should be counted once.
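
To see the difference yourself, the sketch below replicates both counting strategies against the count_tokens endpoint. It assumes each tool schema has been saved to its own JSON file under a hypothetical tools/ directory, that jq is installed, and that ANTHROPIC_API_KEY is set.

count() {
  curl -s https://api.anthropic.com/v1/messages/count_tokens \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "$1" | jq '.input_tokens'
}

# Per-tool, the way /context does it: one request per tool, then sum the results.
per_tool_total=0
for schema in tools/*.json; do
  body=$(jq -n --slurpfile t "$schema" \
    '{model: "claude-sonnet-4-5-20250929",
      messages: [{role: "user", content: "foo"}],
      tools: $t}')
  per_tool_total=$((per_tool_total + $(count "$body")))
done
echo "per-tool sum: $per_tool_total"

# Batched: one request carrying every tool, the way a real request is priced.
batched=$(jq -s '{model: "claude-sonnet-4-5-20250929",
                  messages: [{role: "user", content: "foo"}],
                  tools: .}' tools/*.json)
echo "one request:  $(count "$batched")"

With XcodeBuildMCP's schemas in tools/, the first number should come out around 45k and the second around 15k.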

Minimal MCP server experiment

Variant                              | Local tokenizer | count_tokens (one request) | /context (per-tool sum)
-------------------------------------|-----------------|----------------------------|------------------------
Single echo_tool ("Echoes input.")   | 131             | 587                        | 587
Two echo_tools batched together      | 261             | 672                        | 1,167 (≈587 + 580)

When two tools are counted together, count_tokens returns 672 tokens—far less than the 1,167 you get when each tool is counted separately and the results are summed. The overhead is shared once per request, but /context ignores that sharing and repeats it per tool.
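
For reference, the echo tool is about as small as a tool definition gets. Its exact JSON isn't reproduced here, but a schema along these lines (names and descriptions illustrative) lands in the same ~130-token range and is what the reproduction steps below refer to:

{
  "name": "echo_tool",
  "description": "Echoes input.",
  "input_schema": {
    "type": "object",
    "properties": {
      "text": { "type": "string", "description": "Text to echo back." }
    },
    "required": ["text"]
  }
}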

Reproduction steps (API-only)

Anyone with an Anthropic API key can reproduce the numbers without Claude Code:

  1. Baseline (8 tokens). Count a bare dummy conversation with no tools:

curl https://api.anthropic.com/v1/messages/count_tokens \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250929","messages":[{"role":"user","content":"foo"}]}'

  2. Single tool (~587 tokens). Add the echo_tool schema shown above to a tools array and rerun the request.

  3. Two tools batched (~672 tokens). Add a second echo tool to the same array and rerun; the total should increase by only ~85 tokens (see the request-body sketch after this list).

  4. Per-tool summation (~1,167 tokens). Call count_tokens separately for each tool (or run Claude Code's /context), then add the results together. The total roughly doubles because the shared instructions are counted twice.

  5. Full XcodeBuildMCP (15,282 vs. 45,018 tokens). Send one count_tokens request with all of XcodeBuildMCP's tools, or rely on /context; the same ~30k-token gap appears, matching the table above.
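
For steps 2 and 3, the request body ends up looking roughly like the sketch below. The second tool's name and description are illustrative; any second small tool demonstrates the same shared-overhead effect.

{
  "model": "claude-sonnet-4-5-20250929",
  "messages": [{ "role": "user", "content": "foo" }],
  "tools": [
    { "name": "echo_tool", "description": "Echoes input.",
      "input_schema": { "type": "object",
                        "properties": { "text": { "type": "string" } },
                        "required": ["text"] } },
    { "name": "echo_tool_2", "description": "Echoes input, again.",
      "input_schema": { "type": "object",
                        "properties": { "text": { "type": "string" } },
                        "required": ["text"] } }
  ]
}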

Why the prompt grows in the first place

Anthropic injects a hidden system prompt into every request that uses tools. This preamble teaches the model how to use them, but it’s invisible to the user.

According to Anthropic’s documentation:

“When you use tools, we also automatically include a special system prompt for the model which enables tool use… Note that the table assumes at least 1 tool is provided. If no tools are provided, then a tool choice of none uses 0 additional system prompt tokens.”

For Claude Sonnet 4.5, this hidden prompt costs 313-346 tokens.

When /context counts each tool separately, it pays this “tax” for every single tool.

That ~30k difference is mostly this shared system prompt and the request wrapper being counted over and over again.
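
A rough back-of-the-envelope check with the figures above (an estimate, not a new measurement) shows the sizes line up:

  ~330 tokens of hidden tool-use prompt + ~130 tokens of dummy-conversation wrapper ≈ ~460 tokens of shared overhead per count_tokens call
  ~460 tokens × 60+ tools ≈ ~30k tokens counted again and again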

What remains true about MCP costs

The correction does not absolve MCP servers of real context costs: XcodeBuildMCP's tool definitions still occupy roughly 14-15k tokens of every request, and every additional server adds its own schemas on top.

The point is narrower: /context significantly overstates MCP usage because it sums per-tool measurements that each include a duplicate copy of the platform's tool-use instructions and request wrapper.

Closing

This analysis confirms that Claude Code’s /context command significantly overstates MCP token usage by counting shared overhead for every single tool. While this doesn’t end the debate on MCP vs. CLI efficiency—MCPs do still consume context—it ensures we’re arguing over real numbers, not inflated ones.

It’s important to note that this finding relies on the granular data that only Claude Code provides. The fact that we can even spot this discrepancy is a testament to the value of that transparency.

