Azure AI Speech tools for the Azure MCP Server

2025-10-03

Use the Azure MCP Server to work with Azure AI Speech features such as speech-to-text (STT) through natural language prompts. You don't need to remember specific command syntax.

Note

The Azure MCP Server tools define parameters for data they need to complete tasks. Some of these parameters are specific to each tool and are documented below. Other parameters are global and shared by all tools. For more information, see Tool parameters.

Speech-to-Text: Recognize

Recognize speech from an audio file using Azure AI Services Speech. This command takes an audio file and converts it to text using advanced speech recognition capabilities. Supported audio formats include WAV, MP3, OPUS/OGG, FLAC, ALAW, MULAW, MP4, M4A, and AAC. Compressed formats require GStreamer to be installed on the system.
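As a rough pre-flight check before you point the tool at a file, the following Python sketch classifies an audio file by extension and looks for a GStreamer executable on the `PATH`. It isn't part of the Azure MCP Server. The extension sets, the assumption that every format other than WAV counts as compressed, and the use of `gst-launch-1.0` as a signal that GStreamer is installed are all illustrative assumptions. ALAW and MULAW are left out because a file extension alone doesn't reliably identify them.

```python
import shutil
from pathlib import Path

# Assumed extension sets for the formats listed above (not taken from the tool itself).
UNCOMPRESSED_EXTENSIONS = {".wav"}
COMPRESSED_EXTENSIONS = {".mp3", ".opus", ".ogg", ".flac", ".mp4", ".m4a", ".aac"}


def classify_audio(path: str) -> str:
    """Return 'uncompressed', 'compressed', or 'unsupported' based on the file extension."""
    extension = Path(path).suffix.lower()
    if extension in UNCOMPRESSED_EXTENSIONS:
        return "uncompressed"
    if extension in COMPRESSED_EXTENSIONS:
        return "compressed"
    return "unsupported"


def gstreamer_available() -> bool:
    """Heuristic: treat a gst-launch-1.0 executable on PATH as a sign GStreamer is installed."""
    return shutil.which("gst-launch-1.0") is not None


def can_attempt_recognition(path: str) -> bool:
    """Return True if the file looks supported and any GStreamer requirement appears to be met."""
    kind = classify_audio(path)
    if kind == "uncompressed":
        return True
    if kind == "compressed":
        return gstreamer_available()
    return False
```

A call such as `can_attempt_recognition("meeting.mp3")` returns `True` only when the GStreamer check passes, while a `.wav` file passes regardless.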

Example prompts include:

Basic conversion: "Convert this audio file to text using Azure Speech Services"
With language detection: "Recognize speech from my audio file with language detection"
With profanity filtering: "Transcribe speech from audio file with profanity filtering"
Specify endpoint: "Convert speech to text from audio file using my cognitive services endpoint"
Spanish language: "Transcribe the audio file in Spanish language"
Detailed output: "Convert speech to text with detailed output format from audio file"
With phrase hints: "Recognize speech with phrase hints for better accuracy"
Multiple phrase hints: "Transcribe audio using multiple phrase hints: 'Azure', 'cognitive services', 'machine learning'"
Comma-separated hints: "Convert speech to text with comma-separated phrase hints: 'Azure, cognitive services, API'"
Raw profanity output: "Transcribe audio with raw profanity output from file"

Parameter	Required or optional	Description
Endpoint	Required	The Azure AI Services endpoint URL (for example, `https://your-service.cognitiveservices.azure.com/`).
File	Required	Path to the local audio file to recognize.
Language	Optional	The language for speech recognition (for example, `en-US`, `es-ES`). Default is `en-US`.
Phrases	Optional	Phrase hints to improve recognition accuracy. Can be specified multiple times or as comma-separated values.
Format	Optional	Output format: `simple` or `detailed`. Default is `simple`.
Profanity	Optional	Profanity filter: `masked`, `removed`, or `raw`. Default is `masked`.
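To make the defaults and allowed values in the table concrete, here's a minimal Python sketch that assembles a set of arguments for this tool. The function names, the lowercase argument keys, and the trimming of whitespace around comma-separated phrases are hypothetical and chosen for illustration. The sketch doesn't call the Azure MCP Server and makes no claim about how the server parses its input.

```python
from typing import Iterable

# Allowed values and defaults as listed in the parameter table.
ALLOWED_FORMATS = {"simple", "detailed"}
ALLOWED_PROFANITY = {"masked", "removed", "raw"}


def normalize_phrases(values: Iterable[str]) -> list[str]:
    """Flatten phrase hints given as repeated values and/or comma-separated strings."""
    phrases = []
    for value in values:
        for part in value.split(","):
            part = part.strip()  # Assumption: surrounding whitespace is not significant.
            if part:
                phrases.append(part)
    return phrases


def build_arguments(
    endpoint: str,
    file: str,
    language: str = "en-US",
    phrases: Iterable[str] = (),
    format: str = "simple",
    profanity: str = "masked",
) -> dict:
    """Validate the inputs against the table and return them as a plain dictionary."""
    if not endpoint:
        raise ValueError("endpoint is required")
    if not file:
        raise ValueError("file is required")
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if profanity not in ALLOWED_PROFANITY:
        raise ValueError(f"profanity must be one of {sorted(ALLOWED_PROFANITY)}")
    return {
        "endpoint": endpoint,
        "file": file,
        "language": language,
        "phrases": normalize_phrases(phrases),
        "format": format,
        "profanity": profanity,
    }
```

For example, `normalize_phrases(["Azure, cognitive services", "machine learning"])` returns `["Azure", "cognitive services", "machine learning"]`, which shows how both ways of supplying phrase hints can end up as the same list.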
