Overview
Support the MCP sampling specification (sampling/createMessage), which allows MCP servers to request LLM completions from the client. This enables servers to incorporate inference into their behavior without holding API keys — the client retains full control over model access, selection, and user approval.
credit: modelcontextprotocol.io
Motivation
MCP servers increasingly need LLM inference as part of their operation (e.g. synthesizing retrieved content, generating structured output from raw data). Without sampling, servers must either embed their own API keys or return raw results and leave all synthesis to the outer agent loop. The spec provides a standard protocol for this that keeps the client in control of model choice and cost, and keeps users in the loop.
Use Cases
-
Natural language tool responses — a tool server uses sampling to convert raw structured data (e.g. a database query result) into a natural language summary before returning it to the agent, without the server holding its own model credentials.
-
Sampling with tools — a server issues a sampling/createMessage request with a tools array; the client's LLM calls one or more tools (e.g. fetching live data), the server executes them and returns results, and the client completes the conversation — all under user supervision.
-
Agent-to-agent — a subordinate agent (acting as an MCP server) needs clarification mid-task and sends a sampling request to the orchestrating agent's client: "Please provide more context about the purpose of this query."
Requirements
Capability Declaration. A client that supports sampling MUST declare the sampling capability at initialization. If the client also supports tool use within sampling, it MUST declare sampling.tools; servers MUST NOT send tool-enabled requests to clients that have not declared this.
Human-in-the-Loop. There SHOULD always be a human in the loop with the ability to deny sampling requests. The client SHOULD present the prompt for user review before forwarding it to the LLM, allow edits, and similarly present the response for approval before returning it to the server.
Tool Use. When a server includes a tools array, the LLM may respond with ToolUseContent blocks (stopReason: "toolUse"). Every such assistant message MUST be followed by a user message consisting entirely of ToolResultContent blocks matched by toolUseId — mixing tool results with other content types is not permitted. The server MUST enforce these invariants in follow-up requests. Both parties SHOULD impose iteration limits to bound tool loops.
Model Selection. Servers express preferences via modelPreferences, providing normalized priority scores (costPriority, speedPriority, intelligencePriority) and ordered advisory hints (model name substrings). Clients SHOULD respect these hints but retain final authority over model selection, and MAY map hints to equivalent models from other providers.
Security. Clients SHOULD implement user approval controls and rate limiting on all sampling requests. Both parties MUST handle sensitive message content appropriately, SHOULD validate message content before processing, and SHOULD enforce iteration limits on tool loops.
Alternatives
Servers can include their own API keys and call LLM providers directly, but this undermines the client's ability to control model selection, enforce rate limits, and keep users informed.
Overview
Support the MCP sampling specification (
sampling/createMessage), which allows MCP servers to request LLM completions from the client. This enables servers to incorporate inference into their behavior without holding API keys — the client retains full control over model access, selection, and user approval.credit: modelcontextprotocol.io
Motivation
MCP servers increasingly need LLM inference as part of their operation (e.g. synthesizing retrieved content, generating structured output from raw data). Without sampling, servers must either embed their own API keys or return raw results and leave all synthesis to the outer agent loop. The spec provides a standard protocol for this that keeps the client in control of model choice and cost, and keeps users in the loop.
Use Cases
Natural language tool responses — a tool server uses sampling to convert raw structured data (e.g. a database query result) into a natural language summary before returning it to the agent, without the server holding its own model credentials.
Sampling with tools — a server issues a
sampling/createMessagerequest with atoolsarray; the client's LLM calls one or more tools (e.g. fetching live data), the server executes them and returns results, and the client completes the conversation — all under user supervision.Agent-to-agent — a subordinate agent (acting as an MCP server) needs clarification mid-task and sends a sampling request to the orchestrating agent's client: "Please provide more context about the purpose of this query."
Requirements
Capability Declaration. A client that supports sampling MUST declare the
samplingcapability at initialization. If the client also supports tool use within sampling, it MUST declaresampling.tools; servers MUST NOT send tool-enabled requests to clients that have not declared this.Human-in-the-Loop. There SHOULD always be a human in the loop with the ability to deny sampling requests. The client SHOULD present the prompt for user review before forwarding it to the LLM, allow edits, and similarly present the response for approval before returning it to the server.
Tool Use. When a server includes a
toolsarray, the LLM may respond withToolUseContentblocks (stopReason: "toolUse"). Every such assistant message MUST be followed by a user message consisting entirely ofToolResultContentblocks matched bytoolUseId— mixing tool results with other content types is not permitted. The server MUST enforce these invariants in follow-up requests. Both parties SHOULD impose iteration limits to bound tool loops.Model Selection. Servers express preferences via
modelPreferences, providing normalized priority scores (costPriority,speedPriority,intelligencePriority) and ordered advisoryhints(model name substrings). Clients SHOULD respect these hints but retain final authority over model selection, and MAY map hints to equivalent models from other providers.Security. Clients SHOULD implement user approval controls and rate limiting on all sampling requests. Both parties MUST handle sensitive message content appropriately, SHOULD validate message content before processing, and SHOULD enforce iteration limits on tool loops.
Alternatives
Servers can include their own API keys and call LLM providers directly, but this undermines the client's ability to control model selection, enforce rate limits, and keep users informed.