
A common criticism of MCP servers is that they consume too much context window, leaving little room for actual work. But is this bloat real, or just a reporting error? I captured raw API calls and replicated Claude Code’s counting logic to find out.
TL;DR
- Double Counting: Claude Code sums token usage per tool, inadvertently counting shared system instructions multiple times.
- The Real Numbers: XcodeBuildMCP uses ~15k tokens, not the reported ~45k.
- The Impact: Users see a 3x inflation in reported usage, leading to false alarms about MCP efficiency.
Background: where the token bloat story started
Claude Code is a terminal-based coding agent powered by Anthropic’s LLMs. One of its best features is the /context command, which provides a live breakdown of token usage: system prompts, tools, memory, and more. It’s the only tool I know of that offers this level of transparency. However, when I installed XcodeBuildMCP, the numbers didn’t add up:
```
/context

Context Usage
claude-sonnet-4-5-20250929 · 112k/200k tokens (56%)

System prompt:        2.5k tokens (1.2%)
System tools:        13.8k tokens (6.9%)
MCP tools:           45.0k tokens (22.5%)
Memory files:         5.9k tokens (2.9%)
Messages:                8 tokens (0.0%)
Free space:            88k (43.9%)
Autocompact buffer:  45.0k tokens (22.5%)

MCP tools · /mcp
└ mcp__XcodeBuildMCP__build_device (XcodeBuildMCP): 813 tokens
└ mcp__XcodeBuildMCP__clean (XcodeBuildMCP): 955 tokens
└ … (62 more tools ranging from ~600–1,100 tokens each)
```
Why the numbers looked so high
I built a small usage-reporting tool that measured the MCP payload independently, and it consistently showed ~14k tokens for XcodeBuildMCP’s tools. Claude Code, however, insisted on 45k. The /context output is generated by calling Anthropic’s count_tokens API once per tool and summing the results.
What I measured
Single request versus per-tool sums
| Component | Tokens |
|---|---|
| XcodeBuildMCP tool JSON (local tokenizer) | 14,081 |
| `count_tokens` call (all tools in one request) | 15,282 |
| `/context` “MCP tools” total (sum of individual requests) | 45,018 |
| Discrepancy vs. local tokenizer (implied overhead) | 30,937 |
When measured directly, the tools occupy ~14k tokens. Adding Anthropic’s system instructions and wrapper brings it to ~15k. Yet, /context reports ~45k.
The culprit? Duplication. /context calculates usage by summing the cost of each tool individually. This means the platform’s system instructions and wrapper are counted once per tool instead of once per request. With 60 tools, you pay that overhead 60 times.
Per-tool counting repeats the wrapper and system instructions
Claude Code queries Anthropic’s count_tokens endpoint once per tool. Each call wraps the tool schema in a dummy conversation; the CLI then sums the per-call totals:
"messages": [{ "role": "user", "content": "foo" }],
"tools": [{ ...single tool schema... }]
The wrapper adds ~460 tokens even when the tool itself is only ~130 tokens, and Anthropic also injects its own system instructions for tool use. When the CLI iterates over 60 tools, it incurs that preamble and wrapper 60 times, then sums those inflated totals for /context. The real request includes the tool list once, so those instructions and the wrapper should be counted once.
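To make the mechanics concrete, here is a minimal sketch of both counting strategies against the count_tokens endpoint. Only the endpoint, headers, and `input_tokens` response field come from Anthropic’s documented API; the helper name and the placeholder tool schemas are mine, standing in for the schemas a real MCP server serves:

```python
import os

import requests

API_URL = "https://api.anthropic.com/v1/messages/count_tokens"
HEADERS = {
    "x-api-key": os.environ["ANTHROPIC_API_KEY"],
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}

def count_tokens(tools: list[dict]) -> int:
    """Count input tokens for the dummy conversation plus the given tools."""
    body = {
        "model": "claude-sonnet-4-5-20250929",
        "messages": [{"role": "user", "content": "foo"}],
        "tools": tools,
    }
    resp = requests.post(API_URL, headers=HEADERS, json=body)
    resp.raise_for_status()
    return resp.json()["input_tokens"]

# Placeholder schemas standing in for the 60+ tools an MCP server exposes.
mcp_tools = [
    {"name": "build_device", "description": "Build for a connected device.",
     "input_schema": {"type": "object", "properties": {}}},
    {"name": "clean", "description": "Clean build products.",
     "input_schema": {"type": "object", "properties": {}}},
]

# What /context effectively does: one request per tool, then a sum.
# Every request re-counts the wrapper and the hidden tool-use preamble.
per_tool_sum = sum(count_tokens([tool]) for tool in mcp_tools)

# What a real request costs: every tool in one request, overhead counted once.
batched = count_tokens(mcp_tools)

print(per_tool_sum, batched)
```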
Minimal MCP server experiment
| Variant | Local tokenizer | `count_tokens` (one request) | `/context` (per-tool sum) |
|---|---|---|---|
| Single `echo_tool` (“Echoes input.”) | 131 | 587 | 587 |
| Two echo_tools batched together | 261 | 672 | 1,167 (≈587 + 580) |
When two tools are counted together, count_tokens returns 672 tokens—far less than the 1,167 you get when each tool is counted separately and the results are summed. The overhead is shared once per request, but /context ignores that sharing and repeats it per tool.
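The echo_tool itself can be reconstructed roughly as follows; the name and description string come from the table above, while the input_schema is an assumed minimal shape. The snippet reuses the count_tokens helper from the sketch in the previous section:

```python
# Only the name and description are taken from the table above;
# the input_schema is an assumed minimal shape, not the exact original.
echo_tool = {
    "name": "echo_tool",
    "description": "Echoes input.",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}
echo_tool_2 = {**echo_tool, "name": "echo_tool_2"}  # second copy, distinct name

print(count_tokens([echo_tool]))               # ≈ 587: wrapper + preamble + one tool
print(count_tokens([echo_tool, echo_tool_2]))  # ≈ 672: shared overhead counted once
print(count_tokens([echo_tool])
      + count_tokens([echo_tool_2]))           # ≈ 1,167: overhead counted twice
```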
Reproduction steps (API-only)
Anyone with an Anthropic API key can reproduce the numbers without Claude Code:
- Baseline (8 tokens). Send a bare message with no tools:

  ```bash
  curl "https://api.anthropic.com/v1/messages/count_tokens?beta=true" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -H "x-api-key: $CLAUDE_API_KEY" \
    -d '{"model":"claude-sonnet-4-5-20250929","messages":[{"role":"user","content":"foo"}]}'
  ```

- Single tool (~587 tokens). Add the `echo_tool` schema from earlier to the `tools` array and rerun the request.
- Two tools batched (~672 tokens). Add the second tool to the same array and rerun; the total should increase by only ~85 tokens.
- Per-tool summation (~1,167 tokens). Call `count_tokens` separately for each tool (or run Claude Code’s `/context`), then add the results together. The total roughly doubles because the shared instructions are counted twice.
- Full XcodeBuildMCP (15,282 vs. 45,018 tokens). Send one `count_tokens` request with all sixty tools or rely on `/context`; the same ~30k-token gap appears, matching the table above.
Why the prompt grows in the first place
Anthropic injects a hidden system prompt into every request that uses tools. This preamble teaches the model how to use them, but it’s invisible to the user.
According to Anthropic’s documentation:
“When you use tools, we also automatically include a special system prompt for the model which enables tool use… Note that the table assumes at least 1 tool is provided. If no tools are provided, then a tool choice of none uses 0 additional system prompt tokens.”
For Claude Sonnet 4.5, this hidden prompt costs 313-346 tokens.
When /context counts each tool separately, it pays this “tax” for every single tool.
- One tool: ~130 raw tokens + ~450 system overhead = ~580 tokens.
- 60 tools (batched): ~15k tokens (overhead paid once).
- 60 tools (summed): ~45k tokens (overhead paid 60 times).
That ~30k difference is just the system prompt being counted over and over again.
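A back-of-the-envelope check ties the two observations together. Using the 64 tools implied by the /mcp listing (the prose rounds this to 60), the per-tool gap matches the overhead measured for a single echo_tool:

```python
summed = 45_018   # /context's per-tool sum
batched = 15_282  # one count_tokens request carrying every tool
n_tools = 64      # 2 tools listed plus "62 more" in the /mcp output

# The sum re-counts the shared overhead once per tool, while the batched
# request counts it once, so the gap contains roughly n - 1 extra copies.
print(round((summed - batched) / (n_tools - 1)))  # ≈ 472, close to the ~456
                                                  # (587 - 131) seen for echo_tool
```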
What remains true about MCP costs
The correction does not absolve MCP servers from consuming context:
- The XcodeBuildMCP tool surface is still ~15k tokens. That is a significant portion of most context windows.
- Claude Code’s built-in tools add another ~10k tokens.
- Loading everything at session start means even a “hello” prompt can cost ~25k tokens before the model can reply.
The point is narrower: /context significantly overstates MCP usage because it sums per-tool measurements that each include duplicate platform system instructions and wrapper text.
Closing
This analysis confirms that Claude Code’s /context command significantly overstates MCP token usage by counting shared overhead for every single tool. While this doesn’t end the debate on MCP vs. CLI efficiency—MCPs do still consume context—it ensures we’re arguing over real numbers, not inflated ones.
It’s important to note that this finding relies on the granular data that only Claude Code provides. The fact that we can even spot this discrepancy is a testament to the value of that transparency.