Transcriptions - Submit

Service: Azure AI Services

API Version: 2024-11-15

Submits a new transcription job.

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2024-11-15

URI Parameters

Name	In	Required	Type	Description
endpoint	path	True	string	Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com).
api-version	query	True	string	The requested API version.
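
The two URI parameters combine into the request URL shown at the top of this page. A minimal Python sketch of that composition follows; the helper name `build_submit_url` is hypothetical and not part of any SDK.

```python
from urllib.parse import urlencode

API_VERSION = "2024-11-15"


def build_submit_url(endpoint: str, api_version: str = API_VERSION) -> str:
    """Compose the Transcriptions - Submit URL from the endpoint and API version."""
    # Drop a trailing slash so the path is not doubled up.
    base = endpoint.rstrip("/")
    # api-version is the only query parameter listed for this operation.
    query = urlencode({"api-version": api_version})
    return f"{base}/speechtotext/transcriptions:submit?{query}"


print(build_submit_url("https://westus.api.cognitive.microsoft.com"))
```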

Request Header

Name	Required	Type	Description
Ocp-Apim-Subscription-Key	True	string	Provide your Cognitive Services account key here.

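Request Body

Name	Required	Type	Description
displayName	True	string minLength: 1	The display name of the object.
locale	True	string minLength: 1	The locale of the contained data. If language identification is used, this locale is used to transcribe speech for which no language could be detected.
properties	True	TranscriptionProperties	TranscriptionProperties
contentContainerUrl		string (uri)	A URL for an Azure blob container that contains the audio files. A container may have a maximum size of 5GB and a maximum of 10000 blobs. The maximum size for a blob is 2.5GB. The container SAS should contain 'r' (read) and 'l' (list) permissions. This property is not returned in a response.
contentUrls		string[] (uri)	A list of content URLs to get audio files to transcribe. Up to 1000 URLs are allowed. This property is not returned in a response.
customProperties		object	The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters, and the count of allowed entries is 10.
dataset		EntityReference	EntityReference
description		string	The description of the object.
model		EntityReference	EntityReference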
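
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; `validate_submit_body` is a hypothetical helper, it covers only the constraints stated in the table, and the service remains the authority on what it accepts.

```python
from typing import List

MAX_CONTENT_URLS = 1000
MAX_CUSTOM_ENTRIES = 10
MAX_CUSTOM_KEY_LENGTH = 64
MAX_CUSTOM_VALUE_LENGTH = 256


def validate_submit_body(body: dict) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []

    # displayName and locale are required strings with minLength 1.
    for field in ("displayName", "locale"):
        value = body.get(field)
        if not isinstance(value, str) or len(value) < 1:
            problems.append(f"{field} is required and must be a non-empty string")

    # properties is required and is an object (TranscriptionProperties).
    if not isinstance(body.get("properties"), dict):
        problems.append("properties is required and must be an object")

    # contentUrls accepts at most 1000 URLs.
    urls = body.get("contentUrls") or []
    if len(urls) > MAX_CONTENT_URLS:
        problems.append("contentUrls allows at most 1000 URLs")

    # customProperties: at most 10 entries, keys up to 64 and values up to 256 characters.
    custom = body.get("customProperties") or {}
    if len(custom) > MAX_CUSTOM_ENTRIES:
        problems.append("customProperties allows at most 10 entries")
    for key, value in custom.items():
        if len(key) > MAX_CUSTOM_KEY_LENGTH:
            problems.append(f"customProperties key {key[:16]!r}... exceeds 64 characters")
        if len(str(value)) > MAX_CUSTOM_VALUE_LENGTH:
            problems.append(f"customProperties value for {key[:16]!r} exceeds 256 characters")

    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.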

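Responses

Name	Type	Description
201 Created	Transcription	The response contains information about the entity as payload and its location as header. Headers: Location: string
Other Status Codes	Error	An error occurred.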
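
A caller typically branches on the two response rows above: a 201 carries a Transcription payload and a Location header, and any other status carries an Error payload. The sketch below illustrates that branching on already-received values; `interpret_submit_response` and `TranscriptionSubmitError` are hypothetical names introduced for illustration.

```python
import json


class TranscriptionSubmitError(Exception):
    """Raised with (status, code, message) when the service returns an Error payload."""


def interpret_submit_response(status: int, headers: dict, body_text: str):
    """Return (location, transcription) for 201, otherwise raise TranscriptionSubmitError."""
    payload = json.loads(body_text) if body_text else {}
    if status == 201:
        # HTTP header names are case-insensitive, so normalize before the lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        return lowered.get("location"), payload
    # Error payloads carry a high-level "code" and "message" (see the Error definition).
    raise TranscriptionSubmitError(status, payload.get("code"), payload.get("message"))
```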

Security

Ocp-Apim-Subscription-Key

Provide your Cognitive Services account key here.

Type: apiKey
In: header

Examples

Create a transcription for URIs

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2024-11-15


{
  "displayName": "Transcription using default model for en-US",
  "locale": "en-US",
  "contentUrls": [
    "https://contoso.com/mystoragelocation",
    "https://contoso.com/myotherstoragelocation"
  ],
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48
  }
}

Sample response

Status code: 201

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2024-11-15",
  "displayName": "Transcription using adapted model en-US",
  "customProperties": {
    "key": "value"
  },
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2024-11-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2024-11-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}
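
The request above can be sent with Python's standard library. This is a minimal sketch, not an official client: the function names are hypothetical, the `Content-Type: application/json` header is an assumption based on the JSON body, and the endpoint and key are placeholders you must supply.

```python
import json
import urllib.request


def build_submit_request(endpoint: str, subscription_key: str, body: dict) -> urllib.request.Request:
    """Prepare the POST request for Transcriptions - Submit without sending it."""
    url = f"{endpoint.rstrip('/')}/speechtotext/transcriptions:submit?api-version=2024-11-15"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",  # assumption: the body is sent as JSON
        },
    )


def submit_transcription(endpoint: str, subscription_key: str, body: dict):
    """Send the request and return (Location header, parsed Transcription)."""
    request = build_submit_request(endpoint, subscription_key, body)
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Location"), json.loads(response.read().decode("utf-8"))


# Body copied from the sample request above.
sample_body = {
    "displayName": "Transcription using default model for en-US",
    "locale": "en-US",
    "contentUrls": [
        "https://contoso.com/mystoragelocation",
        "https://contoso.com/myotherstoragelocation",
    ],
    "properties": {
        "wordLevelTimestampsEnabled": False,
        "displayFormWordLevelTimestampsEnabled": False,
        "punctuationMode": "DictatedAndAutomatic",
        "profanityFilterMode": "Masked",
        "timeToLiveHours": 48,
    },
}
```

Calling `submit_transcription("https://westus.api.cognitive.microsoft.com", "<your-key>", sample_body)` would perform the network call; building the request separately keeps the URL and headers easy to inspect.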

Create a transcription from blob container

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2024-11-15


{
  "displayName": "Transcription of storage container using default model for en-US",
  "locale": "en-US",
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48
  },
  "contentContainerUrl": "https://customspeech-usw.blob.core.windows.net/artifacts/audiofiles/"
}

Sample response

Status code: 201

Location: https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2024-11-15

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2024-11-15",
  "displayName": "Transcription using adapted model en-US",
  "customProperties": {
    "key": "value"
  },
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2024-11-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2024-11-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}
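
When `contentContainerUrl` carries a SAS token, the request body table asks for 'r' (read) and 'l' (list) permissions. The sketch below checks the `sp` (signed permissions) query field of a SAS URL for those two letters; the helper name and the example URL are hypothetical, and the check does not verify the signature or expiry.

```python
from urllib.parse import parse_qs, urlsplit


def sas_has_read_and_list(container_sas_url: str) -> bool:
    """Return True if the SAS 'sp' field grants both read (r) and list (l)."""
    query = parse_qs(urlsplit(container_sas_url).query)
    # 'sp' holds the permission letters; it is absent on URLs without a SAS token.
    permissions = "".join(query.get("sp", []))
    return "r" in permissions and "l" in permissions


print(sas_has_read_and_list("https://example.blob.core.windows.net/audio?sp=rl&sig=placeholder"))
```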

Create a transcription with language identification

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2024-11-15


{
  "displayName": "Transcription using language identification with three candidate languages, 'fr-FR' as fallback locale and a custom model for transcribing utterances that were classified as 'nl-NL' locale.",
  "locale": "fr-FR",
  "contentUrls": [
    "https://contoso.com/mystoragelocation"
  ],
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "languageIdentification": {
      "candidateLocales": [
        "fr-FR",
        "nl-NL",
        "el-GR"
      ],
      "speechModelMapping": {
        "nl-NL": {
          "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2024-11-15"
        }
      },
      "mode": "Single"
    }
  }
}

Sample response

Status code: 201

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2024-11-15",
  "displayName": "Transcription using language identification with three candidate languages, 'fr-FR' as fallback locale and a custom model for transcribing utterances that were classified as 'nl-NL' locale.",
  "customProperties": {
    "key": "value"
  },
  "locale": "fr-FR",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2024-11-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2024-11-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "languageIdentification": {
      "candidateLocales": [
        "fr-FR",
        "nl-NL",
        "el-GR"
      ],
      "speechModelMapping": {
        "nl-NL": {
          "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2024-11-15"
        }
      },
      "mode": "Single"
    },
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}

Create a transcription with multispeaker diarization

Sample request

HTTP

POST {endpoint}/speechtotext/transcriptions:submit?api-version=2024-11-15


{
  "displayName": "Transcription using diarization for audio that is known to contain speech from up to 5 speakers",
  "locale": "en-US",
  "contentUrls": [
    "https://contoso.com/mystoragelocation"
  ],
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "diarization": {
      "enabled": true,
      "maxSpeakers": 5
    }
  }
}

Sample response

Status code: 201

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683?api-version=2024-11-15",
  "displayName": "Transcription using diarization for audio that is known to contain speech from up to 5 speakers",
  "customProperties": {
    "key": "value"
  },
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "model": {
    "self": "https://westus.api.cognitive.microsoft.com/speechtotext/models/827712a5-f942-4997-91c3-7c6cde35600b?api-version=2024-11-15"
  },
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/transcriptions/ba7ea6f5-3065-40b7-b49a-a90f48584683/files?api-version=2024-11-15"
  },
  "properties": {
    "wordLevelTimestampsEnabled": false,
    "displayFormWordLevelTimestampsEnabled": false,
    "channels": [
      0,
      1
    ],
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "timeToLiveHours": 48,
    "diarization": {
      "enabled": true,
      "maxSpeakers": 5
    },
    "durationMilliseconds": 42000
  },
  "status": "Succeeded"
}

Definitions

Name	Description
DetailedErrorCode	DetailedErrorCode
DiarizationProperties	DiarizationProperties
EntityError	EntityError
EntityReference	EntityReference
Error	Error
ErrorCode	ErrorCode
InnerError	InnerError
LanguageIdentificationMode	LanguageIdentificationMode
LanguageIdentificationProperties	LanguageIdentificationProperties
ProfanityFilterMode	ProfanityFilterMode
PunctuationMode	PunctuationMode
Status	Status
Transcription	Transcription
TranscriptionLinks	TranscriptionLinks
TranscriptionProperties	TranscriptionProperties

DetailedErrorCode

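Enumeration

DetailedErrorCode

Value	Description
InvalidParameterValue	Invalid parameter value.
InvalidRequestBodyFormat	Invalid request body format.
EmptyRequest	Empty request.
MissingInputRecords	Missing input records.
InvalidDocument	Invalid document.
ModelVersionIncorrect	Model version incorrect.
InvalidDocumentBatch	Invalid document batch.
UnsupportedLanguageCode	Unsupported language code.
DataImportFailed	Data import failed.
InUseViolation	In use violation.
InvalidLocale	Invalid locale.
InvalidBaseModel	Invalid base model.
InvalidAdaptationMapping	Invalid adaptation mapping.
InvalidDataset	Invalid dataset.
InvalidTest	Invalid test.
FailedDataset	Failed dataset.
InvalidModel	Invalid model.
InvalidTranscription	Invalid transcription.
InvalidPayload	Invalid payload.
InvalidParameter	Invalid parameter.
EndpointWithoutLogging	Endpoint without logging.
InvalidPermissions	Invalid permissions.
InvalidPrerequisite	Invalid prerequisite.
InvalidProductId	Invalid product ID.
InvalidSubscription	Invalid subscription.
InvalidProject	Invalid project.
InvalidProjectKind	Invalid project kind.
InvalidRecordingsUri	Invalid recordings URI.
OnlyOneOfUrlsOrContainerOrDataset	Only one of URLs, container, or dataset.
ExceededNumberOfRecordingsUris	Exceeded number of recordings URIs.
InvalidChannels	Invalid channels.
ModelMismatch	Model mismatch.
ProjectGenderMismatch	Project gender mismatch.
ModelDeprecated	Model deprecated.
ModelExists	Model exists.
ModelNotDeployable	Model not deployable.
EndpointNotUpdatable	Endpoint not updatable.
SingleDefaultEndpoint	Single default endpoint.
EndpointCannotBeDefault	Endpoint cannot be default.
InvalidModelUri	Invalid model URI.
SubscriptionNotFound	Subscription not found.
QuotaViolation	Quota violation.
UnsupportedDelta	Unsupported delta.
UnsupportedFilter	Unsupported filter.
UnsupportedPagination	Unsupported pagination.
UnsupportedDynamicConfiguration	Unsupported dynamic configuration.
UnsupportedOrderBy	Unsupported order by.
NoUtf8WithBom	No UTF-8 with BOM.
ModelDeploymentNotCompleteState	Model deployment not in complete state.
SkuLimitsExist	SKU limits exist.
DeployingFailedModel	Deploying failed model.
UnsupportedTimeRange	Unsupported time range.
InvalidLogDate	Invalid log date.
InvalidLogId	Invalid log ID.
InvalidLogStartTime	Invalid log start time.
InvalidLogEndTime	Invalid log end time.
InvalidTopForLogs	Invalid top for logs.
InvalidSkipTokenForLogs	Invalid skip token for logs.
DeleteNotAllowed	Delete not allowed.
Forbidden	Forbidden.
DeployNotAllowed	Deploy not allowed.
UnexpectedError	Unexpected error.
InvalidCollection	Invalid collection.
InvalidCallbackUri	Invalid callback URI.
InvalidSasValidityDuration	Invalid SAS validity duration.
InaccessibleCustomerStorage	Inaccessible customer storage.
UnsupportedClassBasedAdaptation	Unsupported class-based adaptation.
InvalidWebHookEventKind	Invalid web hook event kind.
InvalidTimeToLive	Invalid time to live.
InvalidSourceAzureResourceId	Invalid source Azure resource ID.
ModelCopyAuthorizationExpired	Expired ModelCopyAuthorization.
EndpointLoggingNotSupported	Endpoint logging not supported.
NoLanguageIdentified	Language identification did not recognize any language.
MultipleLanguagesIdentified	Language identification recognized multiple languages. A dominant language could not be determined.
InvalidAudioFormat	The format of the input audio is not supported.
BadChannelConfiguration	There is a mismatch between the audio channels in the data, the configuration, or the application's requirements.
InvalidChannelSpecification	The channel selection in the transcription request is not supported (for example, neither 0 nor 1 is selected).
AudioLengthLimitExceeded	The audio file is longer than the maximum allowed duration.
EmptyAudioFile	The audio file is empty.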

DiarizationProperties

Object

DiarizationProperties

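Name	Type	Description
enabled	boolean	A value indicating whether speaker diarization is enabled.
maxSpeakers	integer (int32) minimum: 2 maximum: 35	A hint for the maximum number of speakers for diarization. Must be greater than 1 and less than 36.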
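
The `maxSpeakers` range can be enforced when the `diarization` object is built. A minimal sketch, assuming a hypothetical helper named `diarization_properties`:

```python
def diarization_properties(max_speakers: int, enabled: bool = True) -> dict:
    """Build a DiarizationProperties object, enforcing the documented 2..35 range."""
    if not 2 <= max_speakers <= 35:
        raise ValueError("maxSpeakers must be greater than 1 and less than 36")
    return {"enabled": enabled, "maxSpeakers": max_speakers}


# Matches the diarization block used in the multispeaker sample request.
print(diarization_properties(5))
```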

EntityError

Object

EntityError

Name	Type	Description
code	string	The code of this error.
message	string	The message for this error.

EntityReference

Object

EntityReference

Name	Type	Description
self	string (uri)	The location of the referenced entity.

Error

Object

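Error

Name	Type	Description
code	ErrorCode	ErrorCode High-level error codes.
details	Error[]	Additional supportive details regarding the error and/or expected policies.
innerError	InnerError	InnerError New inner error format that conforms to the Cognitive Services API Guidelines, which are available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow. It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested).
message	string	High-level error message.
target	string	The source of the error. For example, it would be "documents" or "document id" in case of an invalid document.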

ErrorCode

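Enumeration

ErrorCode

Value	Description
InvalidRequest	Represents the invalid request error code.
InvalidArgument	Represents the invalid argument error code.
InternalServerError	Represents the internal server error code.
ServiceUnavailable	Represents the service unavailable error code.
NotFound	Represents the not found error code.
PipelineError	Represents the pipeline error code.
Conflict	Represents the conflict error code.
InternalCommunicationFailed	Represents the internal communication failed error code.
Forbidden	Represents the forbidden error code.
NotAllowed	Represents the not allowed error code.
Unauthorized	Represents the unauthorized error code.
UnsupportedMediaType	Represents the unsupported media type error code.
TooManyRequests	Represents the too many requests error code.
UnprocessableEntity	Represents the unprocessable entity error code.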

InnerError

Object

InnerError

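Name	Type	Description
code	DetailedErrorCode	DetailedErrorCode Detailed error code enum.
details	object	Additional supportive details regarding the error and/or expected policies.
innerError	InnerError	InnerError New inner error format that conforms to the Cognitive Services API Guidelines, which are available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow. It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested).
message	string	High-level error message.
target	string	The source of the error. For example, it would be "documents" or "document id" in case of an invalid document.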
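
Because `innerError` can be nested, the most specific code often sits at the bottom of the chain. The sketch below walks an Error payload from the high-level code down through each InnerError; `flatten_error` is a hypothetical helper and the payload in the usage line is invented for illustration.

```python
from typing import List, Optional, Tuple


def flatten_error(error: dict) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (code, message) pairs from the outer error down to the deepest innerError."""
    chain = []
    node = error
    while isinstance(node, dict):
        chain.append((node.get("code"), node.get("message")))
        # Follow the nested innerError, if any; the loop ends when it is missing.
        node = node.get("innerError")
    return chain


# Hypothetical payload: a high-level ErrorCode wrapping a DetailedErrorCode.
example = {
    "code": "InvalidRequest",
    "message": "The request is invalid.",
    "innerError": {"code": "InvalidChannels", "message": "Invalid channels."},
}
print(flatten_error(example))
```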

LanguageIdentificationMode

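Enumeration

LanguageIdentificationMode

Value	Description
Continuous	Continuous language identification (default).
Single	Single language identification. If no language can be identified, the error code NoLanguageIdentified is returned to the user. If there is ambiguity between multiple languages, the error code MultipleLanguagesIdentified is returned to the user.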

LanguageIdentificationProperties

Object

LanguageIdentificationProperties

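Name	Type	Default value	Description
candidateLocales	string[]		The candidate locales for language identification (for example ["en-US", "de-DE", "es-ES"]). Continuous mode supports a minimum of 2 and a maximum of 10 candidate locales, including the main locale of the transcription. For single language identification, the maximum number of candidate locales is unbounded.
mode	LanguageIdentificationMode	Continuous	LanguageIdentificationMode The mode used for language identification.
speechModelMapping	<string, EntityReference>		An optional mapping of locales to speech model entities. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales; values are entities for models of the respective locales.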
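
The rules in the table above (2 to 10 candidates in Continuous mode, mapping keys drawn from the candidates) can be applied while building the `languageIdentification` object. A minimal sketch, assuming a hypothetical helper `language_identification` that takes model URLs as plain strings:

```python
from typing import Dict, List, Optional


def language_identification(
    candidate_locales: List[str],
    mode: str = "Continuous",
    speech_model_mapping: Optional[Dict[str, str]] = None,
) -> dict:
    """Build LanguageIdentificationProperties, enforcing the documented constraints."""
    if mode not in ("Continuous", "Single"):
        raise ValueError("mode must be 'Continuous' or 'Single'")
    # Continuous mode supports between 2 and 10 candidate locales.
    if mode == "Continuous" and not 2 <= len(candidate_locales) <= 10:
        raise ValueError("Continuous mode needs between 2 and 10 candidate locales")
    # Every mapping key has to be one of the candidate locales.
    mapping = speech_model_mapping or {}
    unknown = sorted(set(mapping) - set(candidate_locales))
    if unknown:
        raise ValueError(f"speechModelMapping keys not in candidateLocales: {unknown}")

    properties = {"candidateLocales": list(candidate_locales), "mode": mode}
    if mapping:
        # Each value becomes an EntityReference, that is, an object with a "self" URL.
        properties["speechModelMapping"] = {
            locale: {"self": url} for locale, url in mapping.items()
        }
    return properties
```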

ProfanityFilterMode

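Enumeration

ProfanityFilterMode

Value	Description
None	Disables profanity filtering.
Removed	Removes profanity.
Tags	Wraps profane content in "profanity" XML tags.
Masked	Masks the profanity with * except for the first letter, for example f***.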

PunctuationMode

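Enumeration

PunctuationMode

Value	Description
None	No punctuation.
Dictated	Dictated punctuation marks only, that is, explicit punctuation.
Automatic	Automatic punctuation.
DictatedAndAutomatic	Dictated punctuation marks or automatic punctuation.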

Status

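Enumeration

Status

Value	Description
NotStarted	The long-running operation has not yet started.
Running	The long-running operation is currently processing.
Succeeded	The long-running operation has successfully completed.
Failed	The long-running operation has failed.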
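
Since a transcription is a long-running operation, a client usually polls until the status is Succeeded or Failed. The sketch below shows one way to do that; `wait_for_completion` is a hypothetical helper, and `fetch` stands for any caller-supplied function that returns the current Transcription as a dict (for example, by requesting its `self` URL), which is an assumption not defined on this page.

```python
import time
from typing import Callable

# Statuses after which the operation no longer changes, per the table above.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_completion(
    fetch: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() repeatedly until the transcription reaches a terminal status."""
    for _ in range(max_polls):
        transcription = fetch()
        if transcription.get("status") in TERMINAL_STATUSES:
            return transcription
        # NotStarted or Running: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError("transcription did not reach a terminal status")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.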

Transcription

Object

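Transcription

Name	Type	Description
contentContainerUrl	string (uri)	A URL for an Azure blob container that contains the audio files. A container may have a maximum size of 5GB and a maximum of 10000 blobs. The maximum size for a blob is 2.5GB. The container SAS should contain 'r' (read) and 'l' (list) permissions. This property is not returned in a response.
contentUrls	string[] (uri)	A list of content URLs to get audio files to transcribe. Up to 1000 URLs are allowed. This property is not returned in a response.
createdDateTime	string (date-time)	The timestamp when the object was created. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).
customProperties	object	The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters, and the count of allowed entries is 10.
dataset	EntityReference	EntityReference
description	string	The description of the object.
displayName	string minLength: 1	The display name of the object.
lastActionDateTime	string (date-time)	The timestamp when the current status was entered. The timestamp is encoded as ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations).
links	TranscriptionLinks	TranscriptionLinks
locale	string minLength: 1	The locale of the contained data. If language identification is used, this locale is used to transcribe speech for which no language could be detected.
model	EntityReference	EntityReference
properties	TranscriptionProperties	TranscriptionProperties
self	string (uri)	The location of this entity.
status	Status	Status Describes the current state of the API.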
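
The two timestamps use the "YYYY-MM-DDThh:mm:ssZ" form, so they can be parsed with the standard library. The sketch below assumes exactly that form (no fractional seconds, trailing Z for UTC); the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDThh:mm:ssZ' timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def time_since_creation(transcription: dict) -> timedelta:
    """Return the time between creation and entry into the current status."""
    created = parse_timestamp(transcription["createdDateTime"])
    last_action = parse_timestamp(transcription["lastActionDateTime"])
    return last_action - created


# Values taken from the sample responses on this page.
sample = {
    "createdDateTime": "2019-01-07T11:34:12Z",
    "lastActionDateTime": "2019-01-07T11:36:07Z",
}
print(time_since_creation(sample))  # prints 0:01:55
```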

TranscriptionLinks

Object

TranscriptionLinks

Name	Type	Description
files	string (uri)	The location to get all files of this entity. See operation "Transcriptions_ListFiles" for more details.

TranscriptionProperties

Object

TranscriptionProperties

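Name	Type	Default value	Description
channels	integer[] (int32)		A collection of the requested channel numbers. By default, channels 0 and 1 are considered.
destinationContainerUrl	string (uri)		The requested destination container. Remarks: When a destination container is used in combination with `timeToLive`, the metadata of a transcription is deleted normally, but the data stored in the destination container, including transcription results, remains untouched, because no delete permissions are required for this container. To support automatic cleanup, either configure blob lifetimes on the container, or use "Bring your own Storage (BYOS)" instead of `destinationContainerUrl`, where blobs can be cleaned up.
diarization	DiarizationProperties		DiarizationProperties
displayFormWordLevelTimestampsEnabled	boolean		A value indicating whether word-level timestamps for the display form are requested. The default value is `false`.
durationMilliseconds	integer (int64)	0	The duration of the transcription in milliseconds. Durations larger than 2^53-1 are not supported, to ensure compatibility with JavaScript integers.
error	EntityError		EntityError
languageIdentification	LanguageIdentificationProperties		LanguageIdentificationProperties
profanityFilterMode	ProfanityFilterMode		ProfanityFilterMode Mode of profanity filtering.
punctuationMode	PunctuationMode		PunctuationMode The mode used for punctuation.
timeToLiveHours	integer (int32)		How long the transcription is kept in the system after it has completed. Once the transcription reaches the time to live after completion (succeeded or failed), it is automatically deleted. Note: When using BYOS (bring your own storage), the result files on the customer-owned storage account are also deleted. Either use destinationContainerUrl to specify a separate container for result files, which is not deleted when the timeToLive expires, or retrieve the result files through the API and store them as needed. The shortest supported duration is 6 hours and the longest supported duration is 31 days. 2 days (48 hours) is the recommended default value when data is consumed directly.
wordLevelTimestampsEnabled	boolean		A value indicating whether word-level timestamps are requested. The default value is `false`.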
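
A `properties` object can be assembled with the documented `timeToLiveHours` bounds (6 hours to 31 days) and the enum values listed for PunctuationMode and ProfanityFilterMode. The following is a minimal sketch; `build_transcription_properties` is a hypothetical helper, and its defaults mirror the sample requests rather than any service default.

```python
MIN_TTL_HOURS = 6
MAX_TTL_HOURS = 31 * 24  # 31 days expressed in hours

PUNCTUATION_MODES = {"None", "Dictated", "Automatic", "DictatedAndAutomatic"}
PROFANITY_FILTER_MODES = {"None", "Removed", "Tags", "Masked"}


def build_transcription_properties(
    time_to_live_hours: int = 48,
    punctuation_mode: str = "DictatedAndAutomatic",
    profanity_filter_mode: str = "Masked",
    word_level_timestamps: bool = False,
) -> dict:
    """Build a TranscriptionProperties object, rejecting out-of-range values."""
    if not MIN_TTL_HOURS <= time_to_live_hours <= MAX_TTL_HOURS:
        raise ValueError("timeToLiveHours must be between 6 hours and 31 days (744 hours)")
    if punctuation_mode not in PUNCTUATION_MODES:
        raise ValueError(f"unknown punctuationMode: {punctuation_mode}")
    if profanity_filter_mode not in PROFANITY_FILTER_MODES:
        raise ValueError(f"unknown profanityFilterMode: {profanity_filter_mode}")
    return {
        "wordLevelTimestampsEnabled": word_level_timestamps,
        "punctuationMode": punctuation_mode,
        "profanityFilterMode": profanity_filter_mode,
        "timeToLiveHours": time_to_live_hours,
    }
```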
