Use the Azure MCP Server to manage Azure AI Speech features, such as speech-to-text (STT), with natural language prompts. You don't need to remember specific command syntax.
Note
The Azure MCP Server tools define parameters for data they need to complete tasks. Some of these parameters are specific to each tool and are documented below. Other parameters are global and shared by all tools. For more information, see Tool parameters.
Speech-to-Text: Recognize
Recognize speech from an audio file using Azure AI Services Speech. This command takes a local audio file and converts its speech to text. Supported audio formats include WAV, MP3, OPUS/OGG, FLAC, ALAW, MULAW, MP4, M4A, and AAC. Compressed formats require GStreamer to be installed on the system.
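Because compressed formats depend on GStreamer, it can help to check a file before asking for a transcription. The following is a minimal sketch of such a pre-flight check, not part of the Azure MCP Server. It makes several assumptions: the extension-to-format mapping is a guess based on the formats listed above, `.wav` files are treated as uncompressed (ALAW or MULAW audio stored in a WAV container is not detected), and finding the `gst-launch-1.0` binary on `PATH` is used only as a heuristic for "GStreamer is installed". The function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumed mapping from file extension to "needs GStreamer"; the source only
# lists format names, so these extensions are an illustration.
UNCOMPRESSED_EXTENSIONS = {".wav"}
COMPRESSED_EXTENSIONS = {".mp3", ".opus", ".ogg", ".flac", ".mp4", ".m4a", ".aac"}


def needs_gstreamer(audio_path: str) -> bool:
    """Return True if the file extension suggests a compressed format."""
    ext = Path(audio_path).suffix.lower()
    if ext in UNCOMPRESSED_EXTENSIONS:
        return False
    if ext in COMPRESSED_EXTENSIONS:
        return True
    raise ValueError(f"Unrecognized audio extension: {ext!r}")


def gstreamer_available() -> bool:
    """Heuristic: GStreamer's command-line launcher is on PATH."""
    return shutil.which("gst-launch-1.0") is not None


def check_audio_file(audio_path: str) -> Optional[str]:
    """Return a warning message, or None if no problem was detected."""
    if needs_gstreamer(audio_path) and not gstreamer_available():
        return (
            f"{audio_path} looks like a compressed format, "
            "but GStreamer was not found on PATH."
        )
    return None
```

For example, `check_audio_file("meeting.wav")` returns `None` on any system, while `check_audio_file("meeting.mp3")` returns a warning only when the GStreamer launcher can't be found.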
Example prompts include:
- Basic conversion: "Convert this audio file to text using Azure Speech Services"
- With language detection: "Recognize speech from my audio file with language detection"
- With profanity filtering: "Transcribe speech from audio file with profanity filtering"
- Specify endpoint: "Convert speech to text from audio file using my cognitive services endpoint"
- Spanish language: "Transcribe the audio file in Spanish language"
- Detailed output: "Convert speech to text with detailed output format from audio file"
- With phrase hints: "Recognize speech with phrase hints for better accuracy"
- Multiple phrase hints: "Transcribe audio using multiple phrase hints: 'Azure', 'cognitive services', 'machine learning'"
- Comma-separated hints: "Convert speech to text with comma-separated phrase hints: 'Azure, cognitive services, API'"
- Raw profanity output: "Transcribe audio with raw profanity output from file"
| Parameter | Required or optional | Description |
|---|---|---|
| Endpoint | Required | The Azure AI Services endpoint URL (for example, https://your-service.cognitiveservices.azure.com/). |
| File | Required | Path to the local audio file to recognize. |
| Language | Optional | The language for speech recognition (for example, en-US, es-ES). Default is en-US. |
| Phrases | Optional | Phrase hints to improve recognition accuracy. Can be specified multiple times or as comma-separated values. |
| Format | Optional | Output format: simple or detailed. Default is simple. |
| Profanity | Optional | Profanity filter: masked, removed, or raw. Default is masked. |
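The defaults and allowed values in the table can be summarized in a short sketch. This is an illustration of how the parameter rules might be interpreted, not the Azure MCP Server's actual implementation. The `RecognizeOptions` class and the `normalize_phrases` and `build_options` helpers are hypothetical names; only the parameter names, defaults, and allowed values come from the table.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

ALLOWED_FORMATS = {"simple", "detailed"}
ALLOWED_PROFANITY = {"masked", "removed", "raw"}


@dataclass
class RecognizeOptions:
    """Hypothetical container mirroring the parameter table."""
    endpoint: str                      # required
    file: str                          # required
    language: str = "en-US"            # default from the table
    phrases: List[str] = field(default_factory=list)
    format: str = "simple"             # default from the table
    profanity: str = "masked"          # default from the table


def normalize_phrases(values: Iterable[str]) -> List[str]:
    """Flatten repeated and comma-separated phrase hints into one list."""
    phrases = []
    for value in values:
        for part in value.split(","):
            part = part.strip()
            if part:  # skip empty entries such as a trailing comma
                phrases.append(part)
    return phrases


def build_options(
    endpoint: str,
    file: str,
    language: Optional[str] = None,
    phrases: Iterable[str] = (),
    format: Optional[str] = None,
    profanity: Optional[str] = None,
) -> RecognizeOptions:
    """Apply the table's defaults and reject values it does not list."""
    if not endpoint or not file:
        raise ValueError("Endpoint and File are required.")
    options = RecognizeOptions(
        endpoint=endpoint,
        file=file,
        language=language or "en-US",
        phrases=normalize_phrases(phrases),
        format=format or "simple",
        profanity=profanity or "masked",
    )
    if options.format not in ALLOWED_FORMATS:
        raise ValueError(f"Format must be one of {sorted(ALLOWED_FORMATS)}.")
    if options.profanity not in ALLOWED_PROFANITY:
        raise ValueError(f"Profanity must be one of {sorted(ALLOWED_PROFANITY)}.")
    return options
```

Under these assumptions, `normalize_phrases(["Azure, cognitive services, API"])` and `normalize_phrases(["Azure", "cognitive services", "API"])` both yield `["Azure", "cognitive services", "API"]`, which reflects the two ways the table says phrase hints can be supplied.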