Anthropic's Code Execution With MCP Cuts Agent Token Usage by 98.7%

Anthropic published a new engineering approach that turns MCP tool calls into code APIs, dropping token consumption from 150,000 to 2,000. Here's how it works, why it matters, and what it means for anyone building AI agents at scale.

Jan Schmitz | 7 min read

TL;DR: Anthropic’s engineering team published a pattern called “code execution with MCP” that restructures how AI agents interact with tools. Instead of loading thousands of tool definitions into the context window and passing intermediate data through the model, agents write code that calls MCP servers as APIs inside a sandboxed environment. One implementation dropped token usage from 150,000 to 2,000, a 98.7% reduction. The approach also keeps sensitive data out of the model entirely.



There’s a dirty secret in the AI agent world that nobody talks about at demo day: Most agents are horrifically wasteful. Connect one to a dozen enterprise tools, ask it to do something useful with your data, and watch it burn through tokens like a sports car burns through premium fuel. Every tool definition, every intermediate result, every chunk of data that passes between services gets shoved through the model’s context window, racking up latency and cost.

Anthropic’s engineering team just published a fix. And the numbers are hard to ignore.

Their approach, detailed on the Anthropic engineering blog, takes the Model Context Protocol (the open standard they launched in November 2024 for connecting AI agents to external systems) and rethinks how agents use it. The core idea: Stop treating MCP tools as function calls. Start treating them as code APIs.

The problem nobody wanted to admit

Here’s the scenario that breaks most agent architectures today. Your AI agent connects to 15 MCP servers: Google Drive, Salesforce, Slack, a couple of databases, some internal tools. Standard direct-tool-calling approaches load every single tool definition into the model’s context at startup. Fifteen servers with 30 tools each? That’s 450 tool definitions sitting in your context window before the agent has processed a single user request.

The math gets worse from there. Say someone asks the agent to pull a two-hour meeting transcript from Google Drive, extract action items, and update the relevant Salesforce records. Under the traditional approach, that transcript (easily 50,000 tokens) flows through the model when fetched, then flows through again when passed to the Salesforce update. A single straightforward workflow can consume 150,000 tokens. Multiply that by hundreds of daily requests across an organisation, and the costs spiral.

“For agents connected to thousands of tools, this can mean processing hundreds of thousands of tokens before addressing a user request,” Anthropic’s engineering post explains. And that’s not even accounting for edge cases where documents exceed context limits entirely, breaking workflows mid-execution.

This isn’t a theoretical problem. It’s the reason enterprise AI agent deployments have been slower than the hype suggested. According to analysis from Fast.io, unoptimised production agents can run $10 to $100+ per session. At that price point, the business case for automation evaporates fast.

How code execution flips the architecture

Anthropic’s solution restructures the entire pipeline. Instead of the model calling tools directly, the MCP client exposes each server as a set of code modules in a filesystem:

servers/
├── google-drive/
│   ├── getDocument.ts
│   └── index.ts
└── salesforce/
    ├── updateRecord.ts
    └── index.ts

The agent doesn’t see 450 tool definitions. It sees a filesystem it can browse. When a task requires Google Drive, the agent reads google-drive/index.ts to understand the available functions. When it needs Salesforce, it reads that module. Everything else stays unloaded.
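A generated wrapper might look something like the sketch below. The shape is an assumption based on the pattern Anthropic describes: one thin, typed function per tool. The `callMCPTool` helper is the piece that would forward the call to the actual MCP server; it is stubbed here so the sketch runs standalone.

```typescript
// Hypothetical contents of servers/google-drive/getDocument.ts.
// Interface names and the tool identifier are illustrative.

interface GetDocumentInput {
  documentId: string;
}

interface GetDocumentResponse {
  content: string;
}

// Stub transport: in a real harness this would forward the call to the
// Google Drive MCP server over the protocol. Stubbed for a runnable sketch.
async function callMCPTool<T>(name: string, input: unknown): Promise<T> {
  return { content: `document ${JSON.stringify(input)}` } as unknown as T;
}

// The agent imports and calls this like any other library function.
export async function getDocument(
  input: GetDocumentInput
): Promise<GetDocumentResponse> {
  return callMCPTool<GetDocumentResponse>("google_drive__get_document", input);
}
```

Because each wrapper is an ordinary typed function, the agent gets autocomplete-style discoverability from the type signatures alone, without any tool schema in its context.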

Then the agent writes code that executes inside a sandboxed environment. This is where the real savings come in. That meeting transcript gets fetched and processed within the sandbox. The model never sees the raw 50,000-token document. It only receives the extracted summary the code returns.

The result from one of Anthropic’s implementations: 150,000 tokens compressed to 2,000.
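The kind of script the agent writes in the sandbox might look like this. The `gdrive` and `salesforce` objects stand in for the generated MCP wrappers (stubbed here so the example is self-contained), and the record IDs are placeholders. The point is the data flow: the full transcript never leaves the sandbox.

```typescript
// Stubs standing in for generated MCP wrapper modules.
const gdrive = {
  async getDocument(input: { documentId: string }): Promise<{ content: string }> {
    // A real wrapper would call the Google Drive MCP server.
    return {
      content:
        "intro chatter\nAction: send pricing deck\nAction: book follow-up call\nclosing remarks",
    };
  },
};

const salesforce = {
  async updateRecord(input: {
    objectType: string;
    recordId: string;
    data: Record<string, string>;
  }): Promise<void> {
    // A real wrapper would call the Salesforce MCP server.
  },
};

async function run(): Promise<string[]> {
  // The 50,000-token transcript stays inside the sandbox...
  const transcript = (await gdrive.getDocument({ documentId: "abc123" })).content;

  // ...gets reduced to action items locally...
  const actionItems = transcript
    .split("\n")
    .filter((line) => line.startsWith("Action:"))
    .map((line) => line.slice("Action:".length).trim());

  // ...and only the distilled result is written out and returned to the model.
  await salesforce.updateRecord({
    objectType: "Task",
    recordId: "00Q-placeholder",
    data: { Notes: actionItems.join("; ") },
  });
  return actionItems;
}
```

The model only ever sees the return value of `run()`: a handful of short strings instead of the raw document.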

Three mechanisms driving the savings

The token reduction comes from three distinct mechanisms working together, and understanding each one matters for anyone trying to replicate this pattern.

On-demand tool discovery

Traditional agent setups front-load every tool definition. Code execution introduces what Anthropic calls “progressive disclosure.” The agent browses the filesystem, reads only the definitions it needs, and moves on. A search_tools function with configurable detail levels helps agents find the right tools without loading everything.
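The exact `search_tools` interface isn't spelled out in detail, but the idea can be sketched as a keyword search over a tool registry with a configurable detail level, so the agent can scan names cheaply and pull full schemas only for the tools it will actually use. Everything below (the registry entries, the `Detail` levels) is illustrative.

```typescript
// Illustrative progressive-disclosure sketch, not Anthropic's actual API.
type Detail = "name" | "short" | "full";

interface ToolEntry {
  name: string;
  short: string; // one-line summary, cheap to load
  full: string;  // complete signature/schema, loaded only on request
}

const registry: ToolEntry[] = [
  {
    name: "google-drive/getDocument",
    short: "Fetch a document by id",
    full: "getDocument({ documentId: string }) -> { content: string }",
  },
  {
    name: "salesforce/updateRecord",
    short: "Update a CRM record",
    full: "updateRecord({ objectType, recordId, data }) -> void",
  },
  {
    name: "slack/postMessage",
    short: "Post to a channel",
    full: "postMessage({ channel, text }) -> { ts: string }",
  },
];

function searchTools(query: string, detail: Detail = "short"): string[] {
  return registry
    .filter(
      (t) =>
        t.name.includes(query) || t.short.toLowerCase().includes(query)
    )
    .map((t) =>
      detail === "name"
        ? t.name
        : detail === "short"
        ? `${t.name}: ${t.short}`
        : `${t.name}: ${t.full}`
    );
}
```

With 450 tools in the registry, a query like `searchTools("salesforce")` returns a couple of lines instead of 450 schemas.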

Think of it like the difference between carrying an entire toolbox up a ladder versus wearing a tool belt and grabbing what you need from the ground. The weight difference compounds with every additional tool.

Local data processing

This is where the biggest token savings happen. Instead of piping raw data through the model for filtering, the code execution environment handles it locally. Anthropic gives a concrete example: Fetching 10,000 spreadsheet rows, filtering them down to 5 pending orders, and returning only those 5 records to the model. The 9,995 irrelevant rows never touch the context window.

The same principle applies to joining data across sources, extracting specific fields from complex structures, and aggregating results. All of it happens in the sandbox.
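The spreadsheet example reduces to a few lines of ordinary code. Here the 10,000 rows are generated in place of a real `sheets` MCP wrapper, with 5 rows marked pending, to make the arithmetic of the example concrete and runnable.

```typescript
interface OrderRow {
  id: number;
  status: "shipped" | "pending";
}

// Stand-in for fetching 10,000 rows via a spreadsheet MCP wrapper.
// Every 2,000th row is pending, giving exactly 5 pending orders.
function fetchAllRows(): OrderRow[] {
  return Array.from({ length: 10000 }, (_, i) => ({
    id: i,
    status: i % 2000 === 0 ? "pending" : "shipped",
  }));
}

// The filter runs inside the sandbox; only `pending` would be returned
// to the model. pending.length === 5.
const pending = fetchAllRows().filter((row) => row.status === "pending");
```

The 9,995 shipped rows exist only as sandbox memory; they never consume a single context token.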

Native control flow

Under direct tool-calling, every decision point requires a round trip to the model. Loop through a list of records? The model reasons about each iteration. Handle an error? Another round trip. Code execution replaces this with standard programming constructs (loops, conditionals, try/catch blocks) that execute instantly in the sandbox.

The latency improvement goes beyond tokens. An agent that needs 15 sequential model calls to iterate through a list can collapse that into a single code execution, saving wall-clock time on every run.
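As a sketch of what that collapse looks like, the loop below notifies a list of channels with per-item error handling, all in one execution. `notifyChannel` stands in for a generated Slack-style MCP wrapper and is stubbed so the example runs on its own.

```typescript
// Stub for a generated MCP wrapper; one channel fails to exercise try/catch.
async function notifyChannel(channel: string): Promise<void> {
  if (channel === "#does-not-exist") {
    throw new Error(`unknown channel ${channel}`);
  }
}

async function notifyAll(
  channels: string[]
): Promise<{ sent: number; failed: string[] }> {
  let sent = 0;
  const failed: string[] = [];
  for (const channel of channels) {
    try {
      // Under direct tool-calling, each iteration and each error would be
      // a separate model round trip. Here they execute locally.
      await notifyChannel(channel);
      sent++;
    } catch (err) {
      failed.push(channel);
    }
  }
  return { sent, failed };
}
```

Fifteen channels means fifteen model calls under direct tool-calling; here it is one code execution that returns a single summary object.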

The privacy angle nobody expected

Token efficiency grabbed the headlines, but there’s a second benefit buried in Anthropic’s post that deserves more attention: Data privacy.

Under the direct-calling model, every piece of data flows through the LLM’s context. That meeting transcript, those customer records, those financial figures: The model processes all of it. For organisations operating under GDPR, HIPAA, or SOC 2 requirements, that creates compliance headaches.

Code execution with MCP keeps intermediate data inside the sandboxed environment by default. The model only sees what gets explicitly returned. Anthropic goes further, describing systems that automatically tokenise personally identifiable information so that data can flow from Google Sheets to Salesforce without names, email addresses, or phone numbers ever entering the model context.

This is a meaningful architectural shift. Organisations can build agents that handle sensitive workflows without exposing that data to the model, a requirement that’s been blocking adoption in regulated industries.

The trade-offs are real

Anthropic doesn’t pretend this approach is free. Code execution demands infrastructure that direct tool-calling doesn’t: Secure sandboxing to prevent malicious or runaway code, resource limits on CPU and memory, monitoring to track what agents are actually executing.

There’s also an implicit requirement: The model needs to be good at writing code. This pattern depends heavily on the LLM’s ability to generate correct, functional TypeScript (or whatever language the execution environment supports) from natural language intent. Weak code generation means broken workflows.

Debugging also gets harder. When an agent calls a tool directly, the failure mode is straightforward: The tool worked or it didn’t. When an agent writes code that orchestrates multiple tool calls with conditional logic and error handling, diagnosing failures requires inspecting generated code, sandbox logs, and execution traces. The observability stack gets more complex.

Industry validation

Anthropic isn’t alone in this direction. Cloudflare independently published similar findings under the term “Code Mode,” reaching the same conclusion: Language models are exceptionally good at code generation, and developers should exploit that strength when building multi-tool agents.

The broader MCP ecosystem makes this pattern increasingly practical. Since the protocol was donated to the Agentic AI Foundation under the Linux Foundation in December 2025 (co-founded by Anthropic, Block, and OpenAI), adoption has accelerated. Monthly SDK downloads now exceed 97 million across Python and TypeScript. Over 10,000 active public MCP servers are running in production. Every major cloud provider (AWS, Google Cloud, Azure, Cloudflare) offers deployment infrastructure.

The question is no longer whether MCP will be the standard for agent-tool communication. That’s settled. The question is how efficiently agents will use that standard, and code execution appears to be the answer the industry is converging on.

What this means for developers building agents

If you’re building agents that connect to more than a handful of tools, the code execution pattern deserves serious evaluation. The token economics alone justify the architectural investment: At current API pricing, cutting 98% of token usage translates directly to 98% lower inference costs for those operations.

The practical benefits go beyond cost:

  • Adding new MCP servers doesn’t bloat the context window. Your agent connecting to 5 tools performs the same as your agent connecting to 500.
  • Operations that previously exceeded context limits (processing full databases, analysing long documents, joining multiple large datasets) become feasible.
  • Sensitive data stays in the sandbox. Auditors can inspect what the model actually saw versus what it processed indirectly.
  • Anthropic describes agents that save successful code implementations as higher-level capabilities with documentation. Your agent gets better at its job over time.

The infrastructure requirements are non-trivial. You need sandboxing, resource management, and execution monitoring. But these are solved problems in the container and serverless world. The engineering lift is real, but bounded.

Looking forward

The combination of MCP standardisation and code execution patterns points toward a specific future: AI agents that operate less like chatbots with tool access and more like software engineers with API documentation. They read specs, write code, execute it in controlled environments, and return results.

That’s a different, and far more efficient, model than what most agent frameworks offer today. With MCP now backed by every major player in the industry, the infrastructure to support this pattern is only getting stronger.

The 150,000-to-2,000 number will get the attention. The architectural shift behind it is what matters.
