Skip to content

feat: MCP sampling (sampling/createMessage) #2809

@EronWright

Description

@EronWright

Overview

Support the MCP sampling specification (sampling/createMessage), which allows MCP servers to request LLM completions from the client. This enables servers to incorporate inference into their behavior without holding API keys — the client retains full control over model access, selection, and user approval.

Image

credit: modelcontextprotocol.io

Motivation

MCP servers increasingly need LLM inference as part of their operation (e.g. synthesizing retrieved content, generating structured output from raw data). Without sampling, servers must either embed their own API keys or return raw results and leave all synthesis to the outer agent loop. The spec provides a standard protocol for this that keeps the client in control of model choice and cost, and keeps users in the loop.

Use Cases

  1. Natural language tool responses — a tool server uses sampling to convert raw structured data (e.g. a database query result) into a natural language summary before returning it to the agent, without the server holding its own model credentials.

  2. Sampling with tools — a server issues a sampling/createMessage request with a tools array; the client's LLM calls one or more tools (e.g. fetching live data), the server executes them and returns results, and the client completes the conversation — all under user supervision.

  3. Agent-to-agent — a subordinate agent (acting as an MCP server) needs clarification mid-task and sends a sampling request to the orchestrating agent's client: "Please provide more context about the purpose of this query."

Requirements

Capability Declaration. A client that supports sampling MUST declare the sampling capability at initialization. If the client also supports tool use within sampling, it MUST declare sampling.tools; servers MUST NOT send tool-enabled requests to clients that have not declared this.

Human-in-the-Loop. There SHOULD always be a human in the loop with the ability to deny sampling requests. The client SHOULD present the prompt for user review before forwarding it to the LLM, allow edits, and similarly present the response for approval before returning it to the server.

Tool Use. When a server includes a tools array, the LLM may respond with ToolUseContent blocks (stopReason: "toolUse"). Every such assistant message MUST be followed by a user message consisting entirely of ToolResultContent blocks matched by toolUseId — mixing tool results with other content types is not permitted. The server MUST enforce these invariants in follow-up requests. Both parties SHOULD impose iteration limits to bound tool loops.

Model Selection. Servers express preferences via modelPreferences, providing normalized priority scores (costPriority, speedPriority, intelligencePriority) and ordered advisory hints (model name substrings). Clients SHOULD respect these hints but retain final authority over model selection, and MAY map hints to equivalent models from other providers.

Security. Clients SHOULD implement user approval controls and rate limiting on all sampling requests. Both parties MUST handle sensitive message content appropriately, SHOULD validate message content before processing, and SHOULD enforce iteration limits on tool loops.

Alternatives

Servers can include their own API keys and call LLM providers directly, but this undermines the client's ability to control model selection, enforce rate limits, and keep users informed.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions