Transcriptions - Transcribe
Synchronous transcription of an audio file.
POST {endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-11-15
URI Parameters
| Name | In | Required | Type | Description |
|---|---|---|---|---|
| audio | formData | True | file (binary) | The content of the audio file to be transcribed. The audio file must be shorter than 2 hours in audio duration and smaller than 250 MB in size. |
| definition | formData | | string | Metadata for a transcription request. This field contains a JSON-serialized object of type |
| endpoint | path | True | string | Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com). |
| api-version | query | True | string | The requested api version. |
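
As a minimal sketch of how the path and query parameters combine, the following Python helpers build the request URL and check the documented size limit before an upload. The names `build_transcribe_url` and `check_audio_size` are hypothetical, and reading "250 MB" as 250 × 1024 × 1024 bytes is an assumption. The 2-hour duration limit is not checked here because that would require decoding the audio.

```python
from urllib.parse import urlencode

API_VERSION = "2024-11-15"
# Assumption: "250 MB" is interpreted as 250 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 250 * 1024 * 1024


def build_transcribe_url(endpoint: str, api_version: str = API_VERSION) -> str:
    """Join the endpoint (protocol and hostname) with the operation path and query."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/speechtotext/transcriptions:transcribe?{query}"


def check_audio_size(size_in_bytes: int) -> None:
    """Raise ValueError if the file is empty or not smaller than the size limit."""
    if size_in_bytes <= 0:
        raise ValueError("audio file is empty")
    if size_in_bytes >= MAX_AUDIO_BYTES:
        raise ValueError("audio file must be smaller than 250 MB")


url = build_transcribe_url("https://westus.api.cognitive.microsoft.com")
print(url)
```

Checking the size locally avoids uploading a file that the service would reject anyway.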
Request Header
Media Types: "multipart/form-data"
| Name | Required | Type | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | True | string | Provide your cognitive services account key here. |
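
The header and form fields above can be assembled into a `multipart/form-data` request with the Python standard library alone. This is a sketch under stated assumptions, not a definitive client: `build_transcribe_request` is a hypothetical helper, the part-level `Content-Type: application/octet-stream` and the default filename are assumptions, and the contents of `definition` are left entirely to the caller. The request is built but not sent.

```python
import json
import uuid
import urllib.request
from typing import Optional


def build_transcribe_request(url: str, key: str, audio: bytes,
                             filename: str = "audio.wav",
                             definition: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_part(headers: list, payload: bytes) -> None:
        # Each part: boundary line, part headers, blank line, payload.
        parts.append(f"--{boundary}\r\n".encode())
        parts.append("".join(h + "\r\n" for h in headers).encode())
        parts.append(b"\r\n" + payload + b"\r\n")

    add_part(
        [f'Content-Disposition: form-data; name="audio"; filename="{filename}"',
         "Content-Type: application/octet-stream"],  # assumption
        audio,
    )
    if definition is not None:
        # The definition field is a JSON-serialized object sent as a string.
        add_part(
            ['Content-Disposition: form-data; name="definition"'],
            json.dumps(definition).encode("utf-8"),
        )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_transcribe_request(
    "https://westus.api.cognitive.microsoft.com"
    "/speechtotext/transcriptions:transcribe?api-version=2024-11-15",
    key="<your-key>",
    audio=b"\x00\x01",  # placeholder bytes, not real audio
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is omitted so the sketch runs without network access or a real key.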
Responses
| Name | Type | Description |
|---|---|---|
| 200 OK | | OK |
| Other Status Codes | | An error occurred. |
Security
Ocp-Apim-Subscription-Key
Provide your cognitive services account key here.
Type: apiKey
In: header
Examples
Transcribe an audio file
Sample request
POST {endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-11-15
Sample response
{
  "durationMilliseconds": 2000,
  "combinedPhrases": [
    {
      "text": "Weather"
    }
  ],
  "phrases": [
    {
      "offsetMilliseconds": 40,
      "durationMilliseconds": 320,
      "text": "Weather",
      "words": [
        {
          "text": "weather",
          "offsetMilliseconds": 40,
          "durationMilliseconds": 320
        }
      ],
      "locale": "en-US",
      "confidence": 0.78983736
    }
  ]
}
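
To illustrate how a client might read this response, the following Python sketch parses the sample body shown above, joins the combined phrases into one transcript, and prints each word with its start and end time. The variable names are illustrative only.

```python
import json

# The sample response from this section, embedded as a string.
sample_response = """
{
  "durationMilliseconds": 2000,
  "combinedPhrases": [{"text": "Weather"}],
  "phrases": [
    {
      "offsetMilliseconds": 40,
      "durationMilliseconds": 320,
      "text": "Weather",
      "words": [
        {"text": "weather", "offsetMilliseconds": 40, "durationMilliseconds": 320}
      ],
      "locale": "en-US",
      "confidence": 0.78983736
    }
  ]
}
"""

result = json.loads(sample_response)

# Full transcript: one entry per channel in combinedPhrases.
transcript = " ".join(entry["text"] for entry in result["combinedPhrases"])
print(transcript)  # Weather

# Word timings: "words" is only present when word-level timestamps are enabled.
word_spans = []
for phrase in result["phrases"]:
    for word in phrase.get("words", []):
        start = word["offsetMilliseconds"]
        end = start + word["durationMilliseconds"]
        word_spans.append((word["text"], start, end))
        print(f"{word['text']}: {start}-{end} ms")  # weather: 40-360 ms
```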
Definitions
| Name | Description |
|---|---|
| ChannelCombinedPhrases | The full transcript per channel. |
| DetailedErrorCode | DetailedErrorCode |
| Error | Error |
| ErrorCode | ErrorCode |
| InnerError | InnerError |
| Phrase | A transcribed phrase. |
| TranscribeResult | The result of the transcribe operation. |
| Word | Time-stamped word in the display form. |
ChannelCombinedPhrases
The full transcript per channel.
| Name | Type | Description |
|---|---|---|
| channel | integer (int32) | The 0-based channel index. Only present if channel separation is enabled. |
| text | string | The transcribed text. |
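
Because `channel` is optional, client code has to handle entries with and without it. The sketch below groups transcripts by channel and uses `None` as the key when the field is absent. The helper name `transcripts_by_channel` and the example entries are made up for illustration.

```python
from typing import Dict, List, Optional


def transcripts_by_channel(combined_phrases: List[dict]) -> Dict[Optional[int], str]:
    """Map each channel index (or None when absent) to its transcript text."""
    by_channel: Dict[Optional[int], str] = {}
    for entry in combined_phrases:
        channel = entry.get("channel")  # missing unless channel separation is enabled
        if channel in by_channel:
            by_channel[channel] += " " + entry["text"]
        else:
            by_channel[channel] = entry["text"]
    return by_channel


# Made-up entries, shaped like ChannelCombinedPhrases objects.
stereo = [{"channel": 0, "text": "Hello."}, {"channel": 1, "text": "Hi there."}]
mono = [{"text": "Weather"}]

print(transcripts_by_channel(stereo))
print(transcripts_by_channel(mono))
```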
DetailedErrorCode
DetailedErrorCode
| Value | Description |
|---|---|
| InvalidParameterValue | Invalid parameter value. |
| InvalidRequestBodyFormat | Invalid request body format. |
| EmptyRequest | Empty Request. |
| MissingInputRecords | Missing Input Records. |
| InvalidDocument | Invalid Document. |
| ModelVersionIncorrect | Model Version Incorrect. |
| InvalidDocumentBatch | Invalid Document Batch. |
| UnsupportedLanguageCode | Unsupported language code. |
| DataImportFailed | Data import failed. |
| InUseViolation | In use violation. |
| InvalidLocale | Invalid locale. |
| InvalidBaseModel | Invalid base model. |
| InvalidAdaptationMapping | Invalid adaptation mapping. |
| InvalidDataset | Invalid dataset. |
| InvalidTest | Invalid test. |
| FailedDataset | Failed dataset. |
| InvalidModel | Invalid model. |
| InvalidTranscription | Invalid transcription. |
| InvalidPayload | Invalid payload. |
| InvalidParameter | Invalid parameter. |
| EndpointWithoutLogging | Endpoint without logging. |
| InvalidPermissions | Invalid permissions. |
| InvalidPrerequisite | Invalid prerequisite. |
| InvalidProductId | Invalid product id. |
| InvalidSubscription | Invalid subscription. |
| InvalidProject | Invalid project. |
| InvalidProjectKind | Invalid project kind. |
| InvalidRecordingsUri | Invalid recordings uri. |
| OnlyOneOfUrlsOrContainerOrDataset | Only one of urls or container or dataset. |
| ExceededNumberOfRecordingsUris | Exceeded number of recordings uris. |
| InvalidChannels | Invalid channels. |
| ModelMismatch | Model mismatch. |
| ProjectGenderMismatch | Project gender mismatch. |
| ModelDeprecated | Model deprecated. |
| ModelExists | Model exists. |
| ModelNotDeployable | Model not deployable. |
| EndpointNotUpdatable | Endpoint not updatable. |
| SingleDefaultEndpoint | Single default endpoint. |
| EndpointCannotBeDefault | Endpoint cannot be default. |
| InvalidModelUri | Invalid model uri. |
| SubscriptionNotFound | Subscription not found. |
| QuotaViolation | Quota violation. |
| UnsupportedDelta | Unsupported delta. |
| UnsupportedFilter | Unsupported filter. |
| UnsupportedPagination | Unsupported pagination. |
| UnsupportedDynamicConfiguration | Unsupported dynamic configuration. |
| UnsupportedOrderBy | Unsupported order by. |
| NoUtf8WithBom | No utf8 with bom. |
| ModelDeploymentNotCompleteState | Model deployment not complete state. |
| SkuLimitsExist | Sku limits exist. |
| DeployingFailedModel | Deploying failed model. |
| UnsupportedTimeRange | Unsupported time range. |
| InvalidLogDate | Invalid log date. |
| InvalidLogId | Invalid log id. |
| InvalidLogStartTime | Invalid log start time. |
| InvalidLogEndTime | Invalid log end time. |
| InvalidTopForLogs | Invalid top for logs. |
| InvalidSkipTokenForLogs | Invalid skip token for logs. |
| DeleteNotAllowed | Delete not allowed. |
| Forbidden | Forbidden. |
| DeployNotAllowed | Deploy not allowed. |
| UnexpectedError | Unexpected error. |
| InvalidCollection | Invalid collection. |
| InvalidCallbackUri | Invalid callback uri. |
| InvalidSasValidityDuration | Invalid sas validity duration. |
| InaccessibleCustomerStorage | Inaccessible customer storage. |
| UnsupportedClassBasedAdaptation | Unsupported class based adaptation. |
| InvalidWebHookEventKind | Invalid web hook event kind. |
| InvalidTimeToLive | Invalid time to live. |
| InvalidSourceAzureResourceId | Invalid source Azure resource ID. |
| ModelCopyAuthorizationExpired | Expired ModelCopyAuthorization. |
| EndpointLoggingNotSupported | Endpoint logging not supported. |
| NoLanguageIdentified | Language Identification did not recognize any language. |
| MultipleLanguagesIdentified | Language Identification recognized multiple languages. No dominant language could be determined. |
| InvalidAudioFormat | The format of input audio is not supported. |
| BadChannelConfiguration | There is a mismatch between audio channels in the data, in the configuration, or the requirements of the application. |
| InvalidChannelSpecification | The selection of channels in the transcription request is not supported (e.g., neither 0 nor 1 have been selected.) |
| AudioLengthLimitExceeded | The audio file is longer than the maximum allowed duration. |
| EmptyAudioFile | The audio file is empty. |
Error
Error
| Name | Type | Description |
|---|---|---|
| code | ErrorCode | |
| details | Error[] | Additional supportive details regarding the error and/or expected policies. |
| innerError | InnerError | |
| message | string | High level error message. |
| target | string | The source of the error. For example, it would be "documents" or "document id" in the case of an invalid document. |
ErrorCode
ErrorCode
| Value | Description |
|---|---|
| InvalidRequest | Representing the invalid request error code. |
| InvalidArgument | Representing the invalid argument error code. |
| InternalServerError | Representing the internal server error error code. |
| ServiceUnavailable | Representing the service unavailable error code. |
| NotFound | Representing the not found error code. |
| PipelineError | Representing the pipeline error error code. |
| Conflict | Representing the conflict error code. |
| InternalCommunicationFailed | Representing the internal communication failed error code. |
| Forbidden | Representing the forbidden error code. |
| NotAllowed | Representing the not allowed error code. |
| Unauthorized | Representing the unauthorized error code. |
| UnsupportedMediaType | Representing the unsupported media type error code. |
| TooManyRequests | Representing the too many requests error code. |
| UnprocessableEntity | Representing the unprocessable entity error code. |
InnerError
InnerError
| Name | Type | Description |
|---|---|---|
| code | DetailedErrorCode | |
| details | object | Additional supportive details regarding the error and/or expected policies. |
| innerError | InnerError | |
| message | string | High level error message. |
| target | string | The source of the error. For example, it would be "documents" or "document id" in the case of an invalid document. |
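
Since `InnerError` can itself contain an `innerError`, the most specific code sits at the end of a chain. The sketch below walks that chain and collects the codes in order. It assumes the error response body is an `Error` object at the top level. The helper name `error_code_chain` and the example payload are hypothetical, not output captured from the service.

```python
from typing import List, Optional


def error_code_chain(error: dict) -> List[str]:
    """Collect codes from the top-level Error down through nested innerError objects."""
    codes: List[str] = []
    node: Optional[dict] = error
    while node is not None:
        code = node.get("code")
        if code is not None:
            codes.append(code)
        node = node.get("innerError")  # None ends the walk
    return codes


# Hypothetical payload built from the ErrorCode and DetailedErrorCode values above.
example_error = {
    "code": "InvalidRequest",
    "message": "The request is invalid.",
    "innerError": {
        "code": "InvalidAudioFormat",
        "message": "The format of input audio is not supported.",
    },
}

chain = error_code_chain(example_error)
print(" -> ".join(chain))  # InvalidRequest -> InvalidAudioFormat
```

The last element of the chain is usually the most useful one to log or branch on, because it carries the `DetailedErrorCode` value.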
Phrase
A transcribed phrase.
| Name | Type | Description |
|---|---|---|
| channel | integer (int32) | The 0-based channel index. Only present if channel separation is enabled. |
| confidence | number (float) | The confidence value for the phrase. |
| durationMilliseconds | integer (int32) | The duration of the phrase in milliseconds. |
| locale | string | The locale of the phrase. |
| offsetMilliseconds | integer (int32) | The start offset of the phrase in milliseconds. |
| speaker | integer (int32) | A unique integer number that is assigned to each speaker detected in the audio without particular order. Only present if speaker diarization is enabled. |
| text | string | The transcribed text of the phrase. |
| words | Word[] | The words that make up the phrase. Only present if word-level timestamps are enabled. |
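
One way to use the phrase fields together is to render a readable timeline. The sketch below sorts phrases by start offset, formats the offsets as `MM:SS.mmm`, and adds a speaker label only when `speaker` is present. The helper names and the output layout are illustrative choices, not part of the API.

```python
from typing import List


def format_timestamp(ms: int) -> str:
    """Format a millisecond offset as MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def phrase_lines(phrases: List[dict]) -> List[str]:
    """Render phrases in time order, labeling the speaker when diarization is enabled."""
    lines = []
    for phrase in sorted(phrases, key=lambda p: p["offsetMilliseconds"]):
        start_ms = phrase["offsetMilliseconds"]
        end_ms = start_ms + phrase["durationMilliseconds"]
        speaker = phrase.get("speaker")  # absent unless diarization is enabled
        prefix = f"Speaker {speaker}: " if speaker is not None else ""
        lines.append(
            f"[{format_timestamp(start_ms)} - {format_timestamp(end_ms)}] "
            f"{prefix}{phrase['text']}"
        )
    return lines


# The phrase from the sample response, reduced to the fields used here.
sample_phrases = [{"offsetMilliseconds": 40, "durationMilliseconds": 320, "text": "Weather"}]
for line in phrase_lines(sample_phrases):
    print(line)  # [00:00.040 - 00:00.360] Weather
```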
TranscribeResult
The result of the transcribe operation.
| Name | Type | Description |
|---|---|---|
| combinedPhrases | | The full transcript for each channel. |
| durationMilliseconds | integer (int32) | The duration of the audio in milliseconds. |
| phrases | Phrase[] | The transcription results segmented into phrases. |
Word
Time-stamped word in the display form.
| Name | Type | Description |
|---|---|---|
| durationMilliseconds | integer (int32) | The duration of the word in milliseconds. |
| offsetMilliseconds | integer (int32) | The start offset of the word in milliseconds. |
| text | string | The recognized word, including punctuation. |
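
For typed client code, the `Word` fields map naturally onto a small immutable record. The following dataclass is a sketch of one possible local model. The class layout, the snake_case attribute names, and the derived `end_ms` property are choices made for illustration and are not defined by the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Local model of the Word definition, with times in milliseconds."""
    text: str
    offset_ms: int
    duration_ms: int

    @property
    def end_ms(self) -> int:
        # End time is derived: start offset plus duration.
        return self.offset_ms + self.duration_ms

    @classmethod
    def from_json(cls, data: dict) -> "Word":
        """Build a Word from one entry of a phrase's "words" array."""
        return cls(
            text=data["text"],
            offset_ms=data["offsetMilliseconds"],
            duration_ms=data["durationMilliseconds"],
        )


# The word from the sample response.
word = Word.from_json(
    {"text": "weather", "offsetMilliseconds": 40, "durationMilliseconds": 320}
)
print(word.text, word.offset_ms, word.end_ms)  # weather 40 360
```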