message
Learn how to request LLM sampling from the client to enable agentic behaviors in your MCP server.
The message method allows your MCP server to request LLM sampling (also called "completions" or "generations") from the client. This is called "sampling" in the MCP specification and enables servers to implement agentic behaviors—like calling an LLM to process data, generate responses, or make decisions—without requiring server-side API keys.
The client maintains control over model access, selection, and permissions while your server can leverage AI capabilities through a standardized interface.
Warning
Not all MCP clients support sampling yet. The server will throw an error if you attempt to use message with a client that doesn't declare the sampling capability.
Basic API
You can request LLM sampling by calling the message method on the server instance. The method takes a request object and returns a promise with the LLM's response.
const response = await server.message({
  messages: [
    {
      role: 'user',
      content: {
        type: 'text',
        text: 'What is the capital of France?',
      },
    },
  ],
  maxTokens: 100,
});
console.log(response.content.text); // "The capital of France is Paris."
console.log(response.model); // "claude-3-sonnet-20240307"
console.log(response.stopReason); // "endTurn"
The response includes the LLM's message with role (always "assistant"), content (text, image, or audio), the model used, and stopReason indicating why generation stopped.
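Because the content can be text, image, or audio, it is worth checking its type before reading the text field. A minimal sketch:
if (response.content.type === 'text') {
  console.log(response.content.text);
} else {
  // Per the MCP spec, image and audio content carry base64 data and a mimeType.
  console.log(`Received ${response.content.type} content`);
}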
Parameters
| Parameter | Required | Description |
|---|---|---|
| messages | Yes | Array of messages with role ("user" or "assistant") and content (text, image, or audio). Each message contains a single content item. |
| maxTokens | Yes | Maximum number of tokens to generate. |
| systemPrompt | No | System prompt to guide the LLM's behavior. |
| modelPreferences | No | Object with hints (array of model name hints), costPriority, speedPriority, and intelligencePriority (all 0-1). The client uses these to select an appropriate model. |
| temperature | No | Controls randomness (0.0 = deterministic, higher = more random). Support varies by client. |
| stopSequences | No | Array of strings that stop generation when encountered. |
| metadata | No | Arbitrary metadata object to pass to the LLM provider. |
| includeContext | No | Request to include context from MCP servers: "none", "thisServer", or "allServers". Deprecated; may be ignored by the client. |
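For example, a request combining several of the optional parameters might look like the following sketch. The system prompt, priorities, and model hint are illustrative only; the hint is written in the MCP ModelHint object shape ({ name }), so adjust it if your client expects a different form.
const response = await server.message({
  messages: [
    {
      role: 'user',
      content: { type: 'text', text: 'Summarize this log output in one sentence.' },
    },
  ],
  maxTokens: 200,
  systemPrompt: 'You are a concise assistant. Answer in plain English.',
  modelPreferences: {
    // Illustrative preferences: favor a fast, moderately capable model.
    hints: [{ name: 'claude-3-sonnet' }],
    speedPriority: 0.8,
    intelligencePriority: 0.5,
  },
  temperature: 0.2,
  stopSequences: ['\n\n'],
});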
Multi-turn Conversations
Build conversations by including previous messages in the array:
const response = await server.message({
  messages: [
    {
      role: 'user',
      content: { type: 'text', text: 'What is 2 + 2?' },
    },
    {
      role: 'assistant',
      content: { type: 'text', text: '2 + 2 equals 4.' },
    },
    {
      role: 'user',
      content: { type: 'text', text: 'Now multiply that by 3.' },
    },
  ],
  maxTokens: 100,
});
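In an agentic flow the conversation history usually comes from earlier calls rather than being hardcoded. A minimal sketch of feeding one response back into the next request:
const first = await server.message({
  messages: [
    { role: 'user', content: { type: 'text', text: 'What is 2 + 2?' } },
  ],
  maxTokens: 100,
});

// Carry the assistant's reply forward as context for the follow-up question.
const second = await server.message({
  messages: [
    { role: 'user', content: { type: 'text', text: 'What is 2 + 2?' } },
    { role: 'assistant', content: first.content },
    { role: 'user', content: { type: 'text', text: 'Now multiply that by 3.' } },
  ],
  maxTokens: 100,
});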
Checking Client Support
Check if the client supports sampling before using this method:
server.tool(
  {
    name: 'analyze_with_ai',
    description: 'Analyze data using AI',
    enabled() {
      return !!server.ctx.sessionInfo?.clientCapabilities?.sampling;
    },
  },
  async () => {
    const response = await server.message({
      messages: [
        {
          role: 'user',
          content: { type: 'text', text: 'Analyze this data.' },
        },
      ],
      maxTokens: 500,
    });
    return tool.text(response.content.text);
  },
);
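The enabled() check above hides the tool entirely when sampling is unavailable. If you would rather keep the tool listed and degrade gracefully, you can perform the same check inside the handler. A sketch using the same session-info shape (the tool name and fallback message are just examples):
server.tool(
  {
    name: 'summarize_text',
    description: 'Summarize the provided text',
  },
  async () => {
    // Fall back to a plain message when the client cannot run sampling.
    if (!server.ctx.sessionInfo?.clientCapabilities?.sampling) {
      return tool.text('This tool requires a client that supports sampling.');
    }
    const response = await server.message({
      messages: [
        { role: 'user', content: { type: 'text', text: 'Summarize the report.' } },
      ],
      maxTokens: 300,
    });
    return tool.text(response.content.text);
  },
);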
Error Handling
The method throws McpError when the client doesn't support sampling (code -32601), the user rejects the request (code -1), or invalid parameters are provided (code -32602). For example, inside a tool handler:
try {
  const response = await server.message({
    messages: [
      {
        role: 'user',
        content: { type: 'text', text: 'Hello!' },
      },
    ],
    maxTokens: 500,
  });
} catch (error) {
  if (error.code === -1) {
    return tool.text('User declined the request.');
  } else if (error.code === -32601) {
    return tool.text('Sampling not available with this client.');
  }
  throw error;
}
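If several tools call message, it can help to centralize this mapping in a small helper. A sketch; the helper name and messages are hypothetical, and it only inspects the error's code property:
// Hypothetical helper: translate known sampling error codes into user-facing text.
function samplingErrorMessage(error: unknown): string | undefined {
  const code =
    typeof error === 'object' && error !== null
      ? (error as { code?: number }).code
      : undefined;
  if (code === -1) return 'User declined the request.';
  if (code === -32601) return 'Sampling not available with this client.';
  if (code === -32602) return 'Invalid sampling parameters.';
  return undefined;
}
A handler's catch block can then call samplingErrorMessage(error), return tool.text(...) when a message is found, and rethrow otherwise.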