Published April 17, 2024

Scaling MCP Tools with Anthropic's Defer Loading

December 25, 2025

Learn how to use Anthropic's defer_loading tool search features with Unified's MCP server to efficiently manage hundreds of tools while maintaining high accuracy and context efficiency.

When building AI applications with MCP servers, including the Unified MCP server, you quickly encounter a critical challenge: most LLMs struggle when given a large number of tools.

While Unified can provide thousands of tools across different integrations, traditional approaches hit two key limitations:

Context window bloat: Tool definitions consume massive portions of your context (50 tools ≈ 10-20K tokens)
Tool selection degradation: An LLM's ability to correctly select tools degrades significantly beyond 30-50 tools

Anthropic's new defer_loading feature addresses both problems by discovering and loading tools on demand instead of loading every tool definition upfront.

The Problem: Too Many Tools

The Unified MCP server can expose tools from any connected integration: CRM, ATS, HRIS, ticketing, storage, and more. A single connection might offer 50+ tools, and with multiple integrations you could easily have 200+ available tools. Overall, the Unified MCP server currently supports more than 22,000 tools.

Problems with the traditional approach:

Loading 200 tool definitions uses 40,000-80,000 tokens
The model struggles to select the correct tool from such a large set
You waste context on tools that won't be used in that conversation

The Solution: Anthropic's Defer Loading Tool Search

Anthropic's defer_loading feature works with two tool search variants:

Tool Search Variants

1. Regex Tool Search (tool_search_tool_regex_20251119)

Claude constructs regex patterns to search for tools
Best for exact matches and pattern-based discovery
Fast and efficient for well-named tools

2. BM25 Tool Search (tool_search_tool_bm25_20251119)

Claude uses natural language queries to search
Better for semantic understanding
More flexible for varied naming conventions
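To make the regex variant concrete, here is a purely local sketch of searching a tool catalog with a pattern. It is not Anthropic's server-side implementation: the `searchToolsRegex` helper and the catalog entries are hypothetical, JavaScript's `RegExp` stands in for whatever regex engine the API actually uses, and the result cap simply mirrors the documented 3-5 results per search.

```javascript
// Hypothetical local simulation of regex-based tool search.
// Anthropic runs the real search server-side; this only illustrates the idea.
const MAX_PATTERN_LENGTH = 200; // documented regex pattern limit
const MAX_RESULTS = 5;          // searches return at most 3-5 tools

function searchToolsRegex(catalog, pattern, limit = MAX_RESULTS) {
  if (pattern.length > MAX_PATTERN_LENGTH) {
    throw new Error(`Pattern exceeds ${MAX_PATTERN_LENGTH} characters`);
  }
  const re = new RegExp(pattern, "i"); // case-insensitive, no global state
  // Match against both the tool name and its description
  return catalog
    .filter((tool) => re.test(tool.name) || re.test(tool.description))
    .slice(0, limit)
    .map((tool) => tool.name);
}

// Hypothetical slice of a Unified tool catalog
const catalog = [
  { name: "list_crm_contacts", description: "List contacts in the CRM" },
  { name: "get_crm_contact", description: "Retrieve a single CRM contact" },
  { name: "list_crm_deals", description: "List deals in the CRM" },
  { name: "list_ticketing_tickets", description: "List support tickets" },
  { name: "list_hris_employees", description: "List employees" },
];

console.log(searchToolsRegex(catalog, "crm_.*contact"));
```

A BM25 search would instead score every tool against a natural-language query such as "find a customer's contact record" and return the top-ranked matches, which is why it copes better with tools whose names don't follow a predictable pattern.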

How It Works

You include a tool search tool in your tools list
You mark the tool definitions you want deferred with defer_loading: true
Claude sees only the tool search tool initially
When Claude needs additional tools, it searches dynamically
The API returns the 3-5 most relevant tools
These are automatically expanded into full definitions
Claude selects and invokes the appropriate tool
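The same steps apply to ordinary (non-MCP) tool definitions. As a minimal sketch, the hypothetical `buildDeferredTools` helper below prepends the regex tool search tool and marks every definition `defer_loading: true` except those named in an allow-list; the two tool definitions are placeholders, not Unified's actual schemas.

```javascript
// Hypothetical helper: turn plain Anthropic tool definitions into a
// tool-search-enabled tools array. Tools named in `keepLoaded` stay visible.
function buildDeferredTools(toolDefs, keepLoaded = []) {
  const searchTool = {
    type: "tool_search_tool_regex_20251119",
    name: "tool_search_tool_regex",
  };
  // Copy each definition so the caller's objects are not mutated
  const marked = toolDefs.map((tool) => ({
    ...tool,
    defer_loading: !keepLoaded.includes(tool.name),
  }));
  return [searchTool, ...marked];
}

// Placeholder definitions; real ones would come from your integrations
const defs = [
  {
    name: "list_crm_contacts",
    description: "List CRM contacts",
    input_schema: { type: "object", properties: {} },
  },
  {
    name: "list_crm_deals",
    description: "List CRM deals",
    input_schema: { type: "object", properties: {} },
  },
];

const tools = buildDeferredTools(defs, ["list_crm_contacts"]);
console.log(tools.map((t) => `${t.name}: ${t.defer_loading ?? "search tool"}`));
```

Only the search tool and `list_crm_contacts` are loaded upfront; `list_crm_deals` becomes visible to Claude only after a search returns it.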

Implementation

To use defer_loading with the Unified MCP server, follow these steps from Anthropic's tool search documentation when calling Anthropic's /v1/messages API:

Include both beta flags in the anthropic-beta header: anthropic-beta: advanced-tool-use-2025-11-20,mcp-client-2025-11-20
Add the Unified MCP server's URL to the mcp_servers array as usual:

"mcp_servers": [
  {
    "type": "url",
    "name": "unified-salesforce-server",
    "url": "https://mcp-api.unified.to?token=x&connection=y"
  }
],

Then add a tools array that contains the tool search tool and an mcp_toolset entry specifying which of the server's tools to defer:

"tools": [
  {
    "type": "tool_search_tool_regex_20251119",
    "name": "tool_search_tool_regex"
  },
  {
    "type": "mcp_toolset",
    "mcp_server_name": "unified-salesforce-server",
    "default_config": {
      "defer_loading": true
    },
    "configs": {
      "list_crm_contacts": {
        "defer_loading": false
      }
    }
  }
],

Use default_config.defer_loading: true to defer all of the server's tools by default
Use configs.{tool_id}.defer_loading: true/false to override that default for an individual tool
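Putting these pieces together, the sketch below assembles the headers and JSON body for a raw `POST https://api.anthropic.com/v1/messages` call. The `buildMcpDeferRequest` helper is hypothetical, the token and connection values are the placeholders from the `mcp_servers` snippet, and the function only builds the request so you can inspect it or hand it to `fetch`.

```javascript
// Hypothetical helper: build a Messages API request that connects the
// Unified MCP server and defers all of its tools except `keepLoaded`.
function buildMcpDeferRequest({ apiKey, mcpUrl, serverName, keepLoaded, prompt }) {
  const configs = {};
  for (const name of keepLoaded) {
    configs[name] = { defer_loading: false }; // always visible to Claude
  }
  const body = {
    model: "claude-sonnet-4-5-20250929",
    max_tokens: 1024,
    mcp_servers: [{ type: "url", name: serverName, url: mcpUrl }],
    tools: [
      { type: "tool_search_tool_regex_20251119", name: "tool_search_tool_regex" },
      {
        type: "mcp_toolset",
        mcp_server_name: serverName, // must match the mcp_servers entry
        default_config: { defer_loading: true }, // defer everything by default
        configs,
      },
    ],
    messages: [{ role: "user", content: prompt }],
  };
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": "advanced-tool-use-2025-11-20,mcp-client-2025-11-20",
        "content-type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

const req = buildMcpDeferRequest({
  apiKey: "YOUR_ANTHROPIC_API_KEY",
  mcpUrl: "https://mcp-api.unified.to?token=x&connection=y",
  serverName: "unified-salesforce-server",
  keepLoaded: ["list_crm_contacts"],
  prompt: "List my most recent CRM contacts",
});
// To send it: const res = await fetch(req.url, req.init);
```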

Alternatively, you can call the Unified MCP /tools endpoint with the parameters ?type=anthropic&defer_tools=all and pass the returned tool definitions into the tools array of your Anthropic API request.
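A minimal sketch of building that request URL follows. The `buildUnifiedToolsUrl` helper is hypothetical, and it assumes the `/tools` endpoint is served from `https://mcp-api.unified.to/tools` and accepts the same `token` and `connection` query parameters as the MCP server URL; check Unified's documentation for the exact path and authentication.

```javascript
// Hypothetical helper: build the Unified MCP /tools URL.
// Assumes the endpoint lives at https://mcp-api.unified.to/tools and accepts
// the same token/connection query parameters as the MCP server URL.
function buildUnifiedToolsUrl({ token, connection, deferTools = "all" }) {
  const url = new URL("https://mcp-api.unified.to/tools");
  url.searchParams.set("token", token);
  url.searchParams.set("connection", connection);
  url.searchParams.set("type", "anthropic");      // Anthropic-formatted definitions
  url.searchParams.set("defer_tools", deferTools); // "all" defers every tool
  return url.toString();
}

console.log(buildUnifiedToolsUrl({ token: "x", connection: "y" }));
```

The JSON returned from that URL would then go into the `tools` array of your Messages API request, together with a tool search tool if the response does not already contain one.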

Best Practices

1. Keep Core Tools Non-Deferred

If 3-5 tools are used in most conversations, keep them non-deferred so Claude can call them without first running a tool search.
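One way to pick those core tools is from your own call logs. In this sketch, the hypothetical `buildToolsetWithCoreTools` counts how often each tool was called, keeps the most frequent ones non-deferred through `configs`, and defers everything else via `default_config`; the log contents and server name are placeholders.

```javascript
// Hypothetical helper: choose the most frequently called tools from your own
// usage logs and keep them non-deferred in an mcp_toolset entry.
function buildToolsetWithCoreTools(serverName, callLog, maxCore = 5) {
  const counts = new Map();
  for (const toolName of callLog) {
    counts.set(toolName, (counts.get(toolName) ?? 0) + 1);
  }
  const core = [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequently called first
    .slice(0, maxCore)
    .map(([name]) => name);

  const configs = Object.fromEntries(
    core.map((name) => [name, { defer_loading: false }])
  );
  return {
    type: "mcp_toolset",
    mcp_server_name: serverName,
    default_config: { defer_loading: true }, // everything else is found via search
    configs,
  };
}

// Placeholder call log
const log = [
  "list_crm_contacts", "list_crm_contacts", "get_crm_contact",
  "list_crm_deals", "list_crm_contacts", "get_crm_contact",
];
const toolset = buildToolsetWithCoreTools("unified-salesforce-server", log, 2);
console.log(Object.keys(toolset.configs));
```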

2. Use Permissions to Scope Tools

Reduce the tool catalog size by requesting only the permissions you need:

// Request only the CRM contact and deal permissions you need,
// rather than loading 200+ tools from every category
permissions: 'crm_contact_read,crm_contact_write,crm_deal_read'

3. Restrict Tools

Use the Unified MCP tools parameter to restrict which tools are returned to the LLM API.

This differs from the defer_tools parameter: restricted tools are never sent to the LLM API at all, whereas deferred tools are still sent but are only loaded into Claude's context when a tool search surfaces them.
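The distinction can be illustrated locally. The hypothetical `prepareTools` function below mimics the effect of the two mechanisms on a catalog (it does not call Unified): tools outside the allow-list are dropped entirely, as with the `tools` parameter, while deferred tools are still included but flagged `defer_loading: true`, as with `defer_tools`. The tool names are placeholders.

```javascript
// Hypothetical illustration of restricting vs. deferring tools.
function prepareTools(catalog, { allowed, deferred }) {
  return catalog
    // Restriction: tools not in `allowed` are never sent to the LLM API
    .filter((tool) => allowed.includes(tool.name))
    // Deferral: tools are sent, but only loaded when a search returns them
    .map((tool) => ({
      ...tool,
      defer_loading: deferred.includes(tool.name),
    }));
}

const exampleCatalog = [
  { name: "list_crm_contacts" },
  { name: "list_crm_deals" },
  { name: "delete_crm_contact" },
];

const prepared = prepareTools(exampleCatalog, {
  allowed: ["list_crm_contacts", "list_crm_deals"], // delete_crm_contact is restricted
  deferred: ["list_crm_deals"],                     // known to the API, loaded on demand
});
console.log(prepared);
```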

4. Monitor Token Usage

Track your token consumption to understand the benefits:

console.log(`Input tokens: ${response.usage.input_tokens}`);
console.log(`Output tokens: ${response.usage.output_tokens}`);
console.log(`Tool search requests: ${response.usage.server_tool_use?.tool_search_requests}`);
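To quantify the benefit, you can compare the `usage` objects from two otherwise identical requests, one sent with all tools loaded and one with defer_loading. The `summarizeSavings` helper is a hypothetical sketch, and the numbers in the example are placeholders, not measurements.

```javascript
// Hypothetical helper: compare input tokens from a baseline request (all
// tools loaded) and a request using defer_loading.
function summarizeSavings(baselineUsage, deferredUsage) {
  const saved = baselineUsage.input_tokens - deferredUsage.input_tokens;
  const percent = (saved / baselineUsage.input_tokens) * 100;
  return {
    savedTokens: saved,
    percentSaved: Math.round(percent * 10) / 10, // one decimal place
    toolSearches: deferredUsage.server_tool_use?.tool_search_requests ?? 0,
  };
}

// Placeholder numbers, not measurements
const summary = summarizeSavings(
  { input_tokens: 50000 },
  { input_tokens: 8000, server_tool_use: { tool_search_requests: 1 } }
);
console.log(summary);
```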

5. Combine with Prompt Caching

Use prompt caching with defer_loading for multi-turn conversations:

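// cache_control belongs on a content block, not on the message object itself
messages.push({
    role: "user",
    content: [{
        type: "text",
        text: "Now find their recent deals",
        cache_control: { type: "ephemeral" }
    }]
});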
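If you build the message history programmatically, a small helper can put the cache breakpoint on the newest turn each time. This hypothetical `withCacheBreakpoint` sketch converts the last message's content to content blocks if it is a plain string, attaches `cache_control: { type: "ephemeral" }` to the final block, and returns a new array without mutating the input. Anthropic limits how many cache breakpoints a request may contain, so in long conversations you may need to clear older ones.

```javascript
// Hypothetical helper: mark the last content block of the conversation as a
// cache breakpoint so the prefix up to that point can be reused next turn.
function withCacheBreakpoint(messages) {
  if (messages.length === 0) return messages;
  const last = messages[messages.length - 1];
  const blocks =
    typeof last.content === "string"
      ? [{ type: "text", text: last.content }]
      : last.content.map((block) => ({ ...block })); // shallow copies
  if (blocks.length === 0) return messages;
  blocks[blocks.length - 1] = {
    ...blocks[blocks.length - 1],
    cache_control: { type: "ephemeral" },
  };
  return [...messages.slice(0, -1), { ...last, content: blocks }];
}

const history = [
  { role: "user", content: "Find the contact John Doe at Acme Corp" },
  { role: "assistant", content: "I found John Doe's contact record." },
  { role: "user", content: "Now find their recent deals" },
];
const cachedHistory = withCacheBreakpoint(history);
console.log(JSON.stringify(cachedHistory[2], null, 2));
```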

Tool Search Limits

Be aware of these limits:

Maximum tools: 10,000 tools in your catalog
Search results: Returns 3-5 most relevant tools per search
Pattern length: Maximum 200 characters for regex patterns
Model support: Claude Sonnet 4.5+ and Opus 4.5+ only
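A pre-flight check can catch configuration mistakes before a request is sent. The hypothetical `checkToolSearchSetup` below counts inline tool definitions against the 10,000-tool limit and flags deferred tools that have no tool search tool alongside them, mirroring the setup in the How It Works steps. It cannot count tools behind an `mcp_toolset` entry, since those are resolved server-side.

```javascript
// Hypothetical pre-flight check for a Messages API `tools` array.
const MAX_CATALOG_TOOLS = 10000;

function checkToolSearchSetup(tools) {
  const problems = [];
  const isSearchTool = (t) =>
    typeof t.type === "string" && t.type.startsWith("tool_search_tool_");
  const isToolset = (t) => t.type === "mcp_toolset";
  // Only inline definitions can be counted here
  const catalog = tools.filter((t) => !isSearchTool(t) && !isToolset(t));

  if (catalog.length > MAX_CATALOG_TOOLS) {
    problems.push(`Catalog has ${catalog.length} tools; the limit is ${MAX_CATALOG_TOOLS}`);
  }
  const usesDeferral =
    catalog.some((t) => t.defer_loading === true) ||
    tools.some(
      (t) =>
        isToolset(t) &&
        (t.default_config?.defer_loading === true ||
          Object.values(t.configs ?? {}).some((c) => c.defer_loading === true))
    );
  if (usesDeferral && !tools.some(isSearchTool)) {
    problems.push("Deferred tools are present but no tool search tool is included");
  }
  return problems;
}

console.log(checkToolSearchSetup([{ name: "list_crm_deals", defer_loading: true }]));
```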

When to Use Defer Loading

Good use cases:

20+ tools available from Unified connections
Multiple integrations (CRM + ATS + HRIS + Storage)
Building multi-tenant applications where each tenant has different integrations
Context window is getting tight with tool definitions
Tool selection accuracy is degrading

When traditional tool calling might be better:

Fewer than 10 tools total
All tools are used in nearly every request
Very focused single-integration use case

Real-World Example: Multi-Integration Assistant

Here's a practical example of a customer support assistant that accesses multiple integrations:

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// fetchUnifiedTools(connectionId, permissions, options) is an application-defined helper
// that calls the Unified MCP /tools endpoint and returns Anthropic-formatted tool definitions.
async function createSupportAssistant(crm_connection_id, hris_connection_id, zendesk_connection_id) {
    // Fetch tools from multiple Unified connections, deferring each integration's list_* tools
    const crmTools = await fetchUnifiedTools(crm_connection_id, 'crm_contact_read,crm_deal_read', { type: 'anthropic', defer_tools: 'list_crm_' });
    const ticketingTools = await fetchUnifiedTools(zendesk_connection_id, 'ticketing_ticket_read,ticketing_ticket_write', { type: 'anthropic', defer_tools: 'list_ticketing_' });
    const hrisTools = await fetchUnifiedTools(hris_connection_id, 'hris_employee_read', { type: 'anthropic', defer_tools: 'list_hris_' });

    // Combine all tools; deferred tools need a tool search tool in the same array
    // (assuming the /tools responses don't already include one)
    const tools = [
        { type: 'tool_search_tool_regex_20251119', name: 'tool_search_tool_regex' },
        ...crmTools,
        ...ticketingTools,
        ...hrisTools
    ];

    // Total: 150+ tools, with all list_* tools deferred
    return await anthropic.beta.messages.create({
        model: "claude-sonnet-4-5-20250929",
        betas: ["advanced-tool-use-2025-11-20"],
        max_tokens: 4096,
        messages: [{
            role: "user",
            content: "Customer John Doe from Acme Corp called about ticket #12345. Show me his account info, open tickets, and any recent deals."
        }],
        tools: tools
    });
}
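The example above relies on a `fetchUnifiedTools` helper that isn't shown. Here is one possible sketch; it assumes the Unified `/tools` endpoint is at `https://mcp-api.unified.to/tools`, takes `token`, `connection`, `permissions`, `type`, and `defer_tools` as query parameters, and returns a JSON array of tool definitions. The optional `{ token, fetchImpl }` argument and the `UNIFIED_TOKEN` environment variable are hypothetical additions, the former so the function can be exercised with a stub instead of the network.

```javascript
// Hypothetical implementation of fetchUnifiedTools. Endpoint path, parameter
// names and the UNIFIED_TOKEN environment variable are assumptions.
async function fetchUnifiedTools(connectionId, permissions, options = {}, {
  token = globalThis.process?.env?.UNIFIED_TOKEN,
  fetchImpl = globalThis.fetch,
} = {}) {
  if (!token) {
    throw new Error("A Unified token is required");
  }
  const url = new URL("https://mcp-api.unified.to/tools");
  url.searchParams.set("token", token);
  url.searchParams.set("connection", connectionId);
  url.searchParams.set("permissions", permissions); // scope the catalog
  for (const [key, value] of Object.entries(options)) {
    url.searchParams.set(key, String(value)); // e.g. type, defer_tools
  }
  const res = await fetchImpl(url.toString());
  if (!res.ok) {
    throw new Error(`Unified /tools request failed with status ${res.status}`);
  }
  return res.json(); // assumed: an array of Anthropic-formatted tool definitions
}
```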


TLDR

Anthropic's defer_loading feature makes it practical to build AI applications on top of Unified's MCP server at scale. Because Claude loads tools on demand, you can:

Scale to hundreds of tools across multiple integrations
Reduce context usage by 80-90%
Improve tool selection accuracy significantly
Build more capable AI assistants that access diverse data sources

Start by adding the tool search tool and marking your Unified MCP tools as deferred. Monitor your token usage and tool selection accuracy to see the immediate benefits.

The combination of Unified's extensive integration network and Anthropic's intelligent tool search opens up possibilities for building truly comprehensive AI agents that can work across your entire SaaS ecosystem.
