In this article, you connect an MCP-compliant tool server with an AI toolchain operator (KAITO) inference workspace on Azure Kubernetes Service (AKS), enabling secure and modular tool calling for LLM applications. You also learn how to validate end-to-end tool invocation by integrating the model with the MCP server and monitoring real-time function execution through structured responses.
Model Context Protocol (MCP)
As an extension of KAITO inference with tool calling, the Model Context Protocol (MCP) provides a standardized way to define and expose tools for language models to call.
Tool calling with MCP makes it easier to connect language models to real services and actions without tightly coupling logic into the model itself. Instead of embedding every function or API call into your application code, MCP lets you run a standalone tool server that exposes standardized tools or APIs that any compatible LLM can use. This clean separation means you can update tools independently, share them across models, and manage them like any other microservice.
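To make this concrete, a standalone tool server can be a single small process. The following is a minimal sketch of such a server (not the reference Time server used later in this article), assuming the open-source MCP Python SDK (`pip install mcp`) with its `FastMCP` helper and streamable HTTP transport:

```python
# Minimal MCP tool server sketch (illustrative; assumes `pip install mcp`).
from datetime import datetime
from zoneinfo import ZoneInfo

from mcp.server.fastmcp import FastMCP

# The server name and tool below are examples, not part of this article's setup.
mcp = FastMCP("time-demo")

@mcp.tool()
def get_current_time(timezone: str) -> str:
    """Return the current time in the given IANA timezone."""
    return datetime.now(ZoneInfo(timezone)).isoformat()

if __name__ == "__main__":
    # Serve over streamable HTTP so any MCP-compatible client can connect.
    mcp.run(transport="streamable-http")
```

Because the server runs as its own process, you can deploy, version, and scale it independently of any model or application that calls it.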
You can bring your own (BYO) internal MCP servers or connect external MCP servers seamlessly to your KAITO inference workspace on AKS.
MCP with AI toolchain operator (KAITO) on AKS
You can register an external MCP server in a uniform, schema-driven format and serve it to any compatible inference endpoint, including those deployed with a KAITO workspace. This approach allows for externalizing business logic, decoupling model behavior from tool execution, and reusing tools across agents, models, and environments.
In this guide, you register a pre-defined MCP server, test real calls issued by an LLM running in a KAITO inference workspace, and confirm the entire tool execution path (from model prompt to MCP function invocation) works as intended. You have flexibility to scale or swap tools independent of your model.
Prerequisites
- This article assumes that you have an existing AKS cluster. If you don't have a cluster, create one by using the Azure CLI, Azure PowerShell, or the Azure portal.
- Your AKS cluster is running Kubernetes version `1.33` or higher. To upgrade your cluster, see Upgrade your AKS cluster.
- Install and configure Azure CLI version `2.77.0` or later. To find your version, run `az --version`. To install or update, see Install the Azure CLI.
- You have the AI toolchain operator add-on enabled on your cluster and a KAITO workspace with tool calling support deployed on your cluster.
- An external MCP server is available at an accessible URL (for example, `https://mcp.example.com/mcp`), returns a valid `/list_tools` response, and supports the `stream` transport.
Connect to a reference MCP server
In this example, we'll use the reference Time MCP Server, which provides time zone conversion capabilities and enables LLMs to get current time information and perform conversions using standardized IANA time zone names.
Port-forward the KAITO inference service
1. Confirm that your KAITO workspace is ready and retrieve the inference service endpoint using the `kubectl get` command.

    ```bash
    kubectl get svc workspace-phi-4-mini-toolcall
    ```

    Note

    The output might be a `ClusterIP` or internal address. Check which port(s) the service listens on. The default KAITO inference API is on port `80` for HTTP. If it's only internal, you can port-forward locally.

2. Port-forward the inference service for testing using the `kubectl port-forward` command.

    ```bash
    kubectl port-forward svc/workspace-phi-4-mini-toolcall 8000:80
    ```

3. Check the `/v1/models` endpoint to confirm that the `Phi-4-mini-instruct` LLM is available using `curl`.

    ```bash
    curl http://localhost:8000/v1/models
    ```

    Your `Phi-4-mini-instruct` OpenAI-compatible inference API is available at `http://localhost:8000/v1/chat/completions`.
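The `/v1/models` response is a standard OpenAI-style model list. The exact fields vary by vLLM version, and the model ID shown here is an assumption based on the deployed workspace, but a representative response looks like this:

```json
{
  "object": "list",
  "data": [
    {
      "id": "phi-4-mini-instruct",
      "object": "model",
      "owned_by": "vllm"
    }
  ]
}
```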
Confirm the reference MCP server is valid
This example assumes that the Time MCP server is hosted at https://mcp.example.com.
Confirm the server returns tools using `curl`.

```bash
curl https://mcp.example.com/mcp/list_tools
```

Expected output:

```json
{
  "tools": [
    {
      "name": "get_current_time",
      "description": "Get the current time in a specific timezone",
      "arguments": { "timezone": "string" }
    },
    {
      "name": "convert_time",
      "description": "Convert time between two timezones",
      "arguments": {
        "source_timezone": "string",
        "time": "string",
        "target_timezone": "string"
      }
    }
  ]
}
```
Connect MCP server to the KAITO workspace using API request
KAITO automatically picks up tool definitions that are declared in API requests or registered dynamically inside the inference runtime (vLLM + MCP tool loader).
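To illustrate what declaring a tool in an API request looks like, the following is a sketch of a raw OpenAI-compatible chat completion call with a `tools` array. The model name and tool schema are illustrative assumptions; they should match what your workspace and MCP server actually expose. The Autogen framework used in the next steps builds this request for you from the MCP tool definition.

```bash
# Illustrative only: the model ID and tool schema are assumptions.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4-mini-instruct",
    "messages": [
      {"role": "user", "content": "What time is it in Europe/Paris?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_current_time",
        "description": "Get the current time in a specific timezone",
        "parameters": {
          "type": "object",
          "properties": {"timezone": {"type": "string"}},
          "required": ["timezone"]
        }
      }
    }]
  }'
```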
In this guide, you create a Python virtual environment and send a tool-calling request to the `Phi-4-mini-instruct` inference endpoint, using the MCP tool definition to point the model at the server.
1. Define a new working directory for this test project.

    ```bash
    mkdir kaito-mcp
    cd kaito-mcp
    ```

2. Create a Python virtual environment and activate it so that all packages are local to your test project.

    ```bash
    uv venv
    source .venv/bin/activate
    ```

3. Use the open-source Autogen framework to test the tool calling functionality and install its dependencies:

    ```bash
    uv pip install "autogen-ext[openai]" "autogen-agentchat" "autogen-ext[mcp]"
    ```

4. Create a test file named `test.py` that:

    - Connects to the Time MCP server and loads the `get_current_time` tool.
    - Connects to your KAITO inference service running at `localhost:8000`.
    - Sends an example query like "What time is it in Europe/Paris?"
    - Enables automatic selection and calling of the `get_current_time` tool.

    ```python
    import asyncio

    from autogen_agentchat.agents import AssistantAgent
    from autogen_agentchat.ui import Console
    from autogen_core import CancellationToken
    from autogen_core.models import ModelFamily, ModelInfo
    from autogen_ext.models.openai import OpenAIChatCompletionClient
    from autogen_ext.tools.mcp import (StreamableHttpMcpToolAdapter,
                                       StreamableHttpServerParams)
    from openai import OpenAI


    async def main() -> None:
        # Create server params for the Time MCP service
        server_params = StreamableHttpServerParams(
            url="https://mcp.example.com/mcp",
            timeout=30.0,
            terminate_on_close=True,
        )

        # Load the get_current_time tool from the server
        adapter = await StreamableHttpMcpToolAdapter.from_server_params(server_params, "get_current_time")

        # Fetch model name from KAITO's local OpenAI-compatible API
        model = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy").models.list().data[0].id

        model_info: ModelInfo = {
            "vision": False,
            "function_calling": True,
            "json_output": True,
            "family": ModelFamily.UNKNOWN,
            "structured_output": True,
            "multiple_system_messages": True,
        }

        # Connect to the KAITO inference workspace
        model_client = OpenAIChatCompletionClient(
            base_url="http://localhost:8000/v1",
            api_key="dummy",
            model=model,
            model_info=model_info,
        )

        # Define the assistant agent
        agent = AssistantAgent(
            name="time-assistant",
            model_client=model_client,
            tools=[adapter],
            system_message="You are a helpful assistant that can provide time information.",
        )

        # Run a test task that invokes the tool
        await Console(
            agent.run_stream(
                task="What time is it in Europe/Paris?",
                cancellation_token=CancellationToken(),
            )
        )


    if __name__ == "__main__":
        asyncio.run(main())
    ```

5. Run the test script in your virtual environment.

    ```bash
    uv run test.py
    ```

In the output of this test, you should expect the following:
- The model correctly generates a tool call using the MCP tool name and expected arguments.
- Autogen sends the tool call to the MCP server, which runs the logic and returns a result.
- The `Phi-4-mini-instruct` LLM interprets the raw tool output and provides a natural language response.
```
---------- TextMessage (user) ----------
What time is it in Europe/Paris?
---------- ToolCallRequestEvent (time-assistant) ----------
[FunctionCall(id='chatcmpl-tool-xxxx', arguments='{"timezone": "Europe/Paris"}', name='get_current_time')]
---------- ToolCallExecutionEvent (time-assistant) ----------
[FunctionExecutionResult(content='{"timezone":"Europe/Paris","datetime":"2025-09-17T17:43:05+02:00","is_dst":true}', name='get_current_time', call_id='chatcmpl-tool-xxxx', is_error=False)]
---------- ToolCallSummaryMessage (time-assistant) ----------
The current time in Europe/Paris is 5:43 PM (CEST).
```
Experiment with more MCP tools
You can test the various tools available to this MCP server, such as `convert_time`.

1. In your `test.py` file from the previous step, update your `adapter` definition to the following:

    ```python
    adapter = await StreamableHttpMcpToolAdapter.from_server_params(server_params, "convert_time")
    ```

2. Update your `task` definition to invoke the new tool. For example:

    ```python
    task="Convert 9:30 AM New York time to Tokyo time."
    ```

3. Save and run the Python script.

    ```bash
    uv run test.py
    ```

Expected output:

```
9:30 AM in New York is 10:30 PM in Tokyo.
```
Troubleshooting
The following table outlines common errors when testing KAITO inference with an external MCP server and how to resolve them:
| Error | How to resolve |
|---|---|
| `Tool not found` | Ensure that your tool name matches the one declared in `/mcp/list_tools`. |
| `401 Unauthorized` | If your MCP server requires an auth token, update `server_params` to include headers with the token, as shown in the sketch after this table. |
| `connection refused` | Ensure the KAITO inference service is port-forwarded properly (for example, to `localhost:8000`). |
| `tool call ignored` | Review the KAITO tool calling documentation to find vLLM models that support tool calling. |
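For the `401 Unauthorized` case, the following is a minimal sketch of passing an auth header, assuming your `autogen-ext` version exposes a `headers` field on `StreamableHttpServerParams` and that your server accepts a bearer token in a standard `Authorization` header (the token value is a placeholder):

```python
from autogen_ext.tools.mcp import StreamableHttpServerParams

# Sketch only: the header name and token are placeholders; use the
# authentication scheme your MCP server actually requires.
server_params = StreamableHttpServerParams(
    url="https://mcp.example.com/mcp",
    headers={"Authorization": "Bearer <YOUR_TOKEN>"},
    timeout=30.0,
    terminate_on_close=True,
)
```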
Next steps
In this article, you learned how to connect a KAITO workspace to an external reference MCP server using Autogen to enable tool calling through the OpenAI-compatible API. You also validated that the LLM could discover, invoke, and integrate results from MCP-compliant tools on AKS. To learn more, see the following resources:
- Deploy and test tool calls with the AI toolchain operator add-on on AKS.
- KAITO tool calling and MCP official documentation.