
Entity components

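In conversational language understanding, entities are relevant pieces of information that are extracted from your utterances. An entity can be extracted by several different methods. It can be detected through context, matched from a list, or detected by a prebuilt recognized entity. Every entity in your project is composed of one or more of these methods, which are defined as the entity's components.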

When more than one component defines an entity, their predictions might overlap. You can determine the behavior of an entity prediction when its components overlap by using a fixed set of options in the entity options.

Component types

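An entity component determines a way that you can extract the entity. An entity can contain one component, which determines the only method used to extract the entity. An entity can also contain multiple components to expand the ways in which the entity is defined and extracted.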

Learned component

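The learned component uses the entity tags that you label your utterances with to train a machine-learned model. The model learns to predict where the entity is based on the context within the utterance. Your labels provide examples of where the entity is expected to appear in an utterance. The decision is based on the meaning of the surrounding words and of the labeled words themselves.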

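This component is defined only if you add labels by tagging utterances for the entity. If you don't tag any utterances with the entity, it doesn't have a learned component.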

Screenshot that shows an example of the learned component for an entity.

List component

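The list component represents a fixed, closed set of related words along with their synonyms. The component performs an exact text match against the list of values that you provide as synonyms. Each synonym belongs to a list key, which can be used as the normalized, standard value for the synonym that returns in the output if the list component is matched. List keys aren't used for matching.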

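In multilingual projects, you can specify a different set of synonyms for each language. When you use the prediction API, you can specify the language in the input request, so that only the synonyms associated with that language are matched.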

Screenshot that shows an example of the list component for an entity.
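The matching behavior described above can be pictured with a short Python sketch. It is an illustration only, not the service's implementation: the `SOFTWARE_LIST` data, the function name `match_list_component`, the French synonym, and the case-sensitive comparison are all assumptions made for the example.

```python
import re

# Hypothetical list component: language -> {list key: [synonyms]}.
# The list key is only the normalized value; it is not matched
# unless it is also listed as a synonym.
SOFTWARE_LIST = {
    "en": {"Proseware OS": ["Proseware OS", "Proseware operating system"]},
    "fr": {"Proseware OS": ["système Proseware"]},
}


def match_list_component(utterance, language, list_component=SOFTWARE_LIST):
    """Return (matched_text, offset, list_key) for each exact synonym match."""
    matches = []
    # Only the synonyms registered for the requested language are considered.
    for key, synonyms in list_component.get(language, {}).items():
        for synonym in synonyms:
            # re.escape makes the synonym a literal (exact) pattern.
            for m in re.finditer(re.escape(synonym), utterance):
                matches.append((m.group(), m.start(), key))
    return sorted(matches, key=lambda t: t[1])


print(match_list_component("I want to buy Proseware OS 9", "en"))
print(match_list_component("I want to buy Proseware OS 9", "fr"))
```

With the English synonyms, the utterance yields one match whose normalized value is the list key. With the French synonyms, the same utterance yields no match.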

Prebuilt component

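The prebuilt component lets you select from a library of common types, such as numbers, datetimes, and names. When you add a prebuilt component, it's detected automatically. Each entity can have up to five prebuilt components. For more information, see the list of supported prebuilt components.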

Screenshot that shows an example of the prebuilt component for an entity.

Regex component

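The regex component matches regular expressions to capture consistent patterns. When you add it, any text that matches the regular expression is extracted. You can have multiple regular expressions within the same entity, each with a different key identifier. A matched expression returns its key as part of the prediction response.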

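In multilingual projects, you can specify a different expression for each language. When you use the prediction API, you can specify the language in the input request, so that only the regular expression associated with that language is matched.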

Screenshot that shows an example of the regex component for an entity.
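As a rough illustration of keyed regular expressions, the following Python sketch matches several patterns for one entity and reports which key matched. The patterns, the key names, the `regexKey` field, and the function name are hypothetical and don't reflect the service's actual response schema.

```python
import re

# Hypothetical regex component: language -> {key identifier: pattern}.
ORDER_ID_REGEX = {
    "en": {
        "numericOrder": r"\bORD-\d{6}\b",
        "legacyOrder": r"\bLEG[A-Z]{2}\d{4}\b",
    },
}


def match_regex_component(utterance, language, component=ORDER_ID_REGEX):
    """Return every regex match, tagged with the key of the pattern that matched."""
    results = []
    # Only the expressions registered for the requested language are tried.
    for key, pattern in component.get(language, {}).items():
        for m in re.finditer(pattern, utterance):
            results.append({
                "text": m.group(),
                "offset": m.start(),
                "length": len(m.group()),
                "regexKey": key,
            })
    return sorted(results, key=lambda r: r["offset"])


print(match_regex_component("Where is ORD-123456?", "en"))
```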

Entity options

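When multiple components define an entity, their predictions might overlap. When an overlap occurs, one of the following options determines each entity's final prediction: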

Combine components

When components overlap, combine them into one entity by taking the union of all the components.

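Use this option to merge all overlapping components. When components are combined, you get all the extra information that's tied to a list or prebuilt component, when it's present.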

Example

Suppose you have an entity called Software that has a list component, which contains "Proseware OS" as an entry. In your utterance data, you have "I want to buy Proseware OS 9" with "Proseware OS 9" tagged as Software:

Screenshot that shows overlapping learned and list entities.

With Combine components, the entity returns with the full context as "Proseware OS 9", along with the key from the list component:

Screenshot that shows the result of combining components.

Suppose you have the same utterance, but only "OS 9" is predicted by the learned component:

Screenshot that shows an utterance with "OS 9" predicted by the learned component.

With Combine components, the entity still returns as "Proseware OS 9", along with the key from the list component:

Screenshot that shows the returned Software entity.
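The union behavior in the second example can be sketched in Python as a merge of overlapping character spans. This is a minimal illustration under assumed data shapes (`offset`, `length`, `component`, and an optional `key`), not the service's actual algorithm or response format.

```python
def combine_components(predictions):
    """Merge overlapping component predictions into union spans.

    Each prediction is a dict with 'offset', 'length', 'component',
    and optionally 'key' (for example, a list key). These field names
    are assumptions for this sketch.
    """
    merged = []
    for p in sorted(predictions, key=lambda p: p["offset"]):
        start, end = p["offset"], p["offset"] + p["length"]
        if merged and start < merged[-1]["end"]:
            # Overlap: extend the current span and keep the extra information.
            last = merged[-1]
            last["end"] = max(last["end"], end)
            last["components"].append(p["component"])
            if "key" in p:
                last["keys"].append(p["key"])
        else:
            merged.append({
                "start": start,
                "end": end,
                "components": [p["component"]],
                "keys": [p["key"]] if "key" in p else [],
            })
    return merged


utterance = "I want to buy Proseware OS 9"
predictions = [
    # The learned component predicts only "OS 9".
    {"offset": 24, "length": 4, "component": "learned"},
    # The list component matches "Proseware OS".
    {"offset": 14, "length": 12, "component": "list", "key": "Proseware OS"},
]
combined = combine_components(predictions)
for span in combined:
    print(utterance[span["start"]:span["end"]], span["components"], span["keys"])
```

The two overlapping predictions collapse into a single span that covers "Proseware OS 9" and carries the list key.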

Do not combine components

Each overlapping component returns as a separate instance of the entity. Use this option to apply your own logic after the prediction.

Example

Suppose you have an entity called Software that has a list component, which contains "Proseware Desktop" as an entry. In your utterance data, you have "I want to buy Proseware Desktop Pro" with "Proseware Desktop Pro" tagged as Software:

Screenshot that shows an example of overlapping learned and list entities.

When you don't combine components, the entity returns twice:

Screenshot that shows the entity returned twice.
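When instances are returned separately, the post-prediction logic is up to you. The following Python sketch shows one possible rule, keeping only the longest instance in each group of overlapping instances. The field names and the `keep_longest` helper are hypothetical, and this is only one of many rules you could apply.

```python
def keep_longest(instances):
    """From separately returned instances, keep the longest one per overlapping group."""
    kept = []
    # Sort by offset; for equal offsets, the longest instance comes first.
    for inst in sorted(instances, key=lambda i: (i["offset"], -i["length"])):
        if kept and inst["offset"] < kept[-1]["offset"] + kept[-1]["length"]:
            # Overlaps the last kept instance: replace it only if this one is longer.
            if inst["length"] > kept[-1]["length"]:
                kept[-1] = inst
        else:
            kept.append(inst)
    return kept


utterance = "I want to buy Proseware Desktop Pro"
instances = [
    # List component match: "Proseware Desktop".
    {"offset": 14, "length": 17, "component": "list", "key": "Proseware Desktop"},
    # Learned component prediction: "Proseware Desktop Pro".
    {"offset": 14, "length": 21, "component": "learned"},
]
selected = keep_longest(instances)
for inst in selected:
    print(utterance[inst["offset"]:inst["offset"] + inst["length"]], inst["component"])
```

Other rules are equally valid, such as always preferring the list match so that the normalized list key is available.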

Required components

Sometimes you can define an entity with multiple components but require one or more of them to be present. If a required component isn't present, the system doesn't return the entity. For example, if an entity has a list component and a required learned component, the system guarantees that any returned entity includes a learned component.

Required components are most frequently used with learned components because they can restrict the other component types to a specific context, which is commonly associated with roles. You can also require all components to make sure that every component of the entity is present.

In Language Studio, every component in an entity has a toggle next to it that lets you set it as required.

Example

Suppose you have an entity called Ticket Quantity that attempts to extract the number of tickets you want to reserve for flights, for utterances such as "Book two tickets tomorrow to Cairo."

Typically, you add a prebuilt component for Quantity.Number that already extracts all numbers in utterances. If your entity is defined with only the prebuilt component, it also extracts other numbers as part of the Ticket Quantity entity, such as the time in "Book two tickets tomorrow to Cairo at 3 PM."

To resolve this scenario, you label a learned component in your training data for all the numbers that are meant to be Ticket Quantity. The entity now has two components: a prebuilt component that recognizes all numbers, and a learned component that predicts where the ticket quantity is in an utterance. If you require the learned component, you make sure that Ticket Quantity only returns when the learned component predicts it in the right context. If you also require the prebuilt component, you can then guarantee that the returned Ticket Quantity entity is both a number and in the correct position.
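The effect of required components on this example can be sketched as a simple filter in Python. The candidate data and the `apply_required` function are hypothetical and only illustrate the rule that an entity returns only when every required component contributed to it.

```python
def apply_required(candidates, required):
    """Keep a candidate only if every required component contributed to it."""
    required = set(required)
    return [c for c in candidates if required <= set(c["components"])]


# Hypothetical candidates for "Book two tickets tomorrow to Cairo at 3 PM."
candidates = [
    # "two" is a number and is predicted by the learned component in context.
    {"text": "two", "components": ["prebuilt", "learned"]},
    # "3" is a number but isn't in a ticket-quantity context.
    {"text": "3", "components": ["prebuilt"]},
]

# No required components: both numbers return as Ticket Quantity.
print([c["text"] for c in apply_required(candidates, [])])
# Learned and prebuilt both required: only "two" returns.
print([c["text"] for c in apply_required(candidates, ["learned", "prebuilt"])])
```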

Use components and options

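Components give you the flexibility to define your entity in more than one way. When you combine components, you make sure that each component is represented and you reduce the number of entities returned in your predictions.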

A common practice is to extend a prebuilt component with a list of values that the prebuilt component might not support. For example, if you have an Organization entity, which has a General.Organization prebuilt component added to it, the entity might not predict all the organizations specific to your domain. You can use a list component to extend the values of the Organization entity and extend the prebuilt component with your own organizations.

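Other times, you might be interested in extracting an entity through context, such as a Product in a retail project. You label the Product entity's learned component so that the model learns where a product appears based on its position within the sentence. You might also have a list of products that you already know and always want to extract. Combining both components in one entity gives you both extraction methods for the entity.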

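When you don't combine components, you allow every component to act as an independent entity extractor. One use of this option is to separate the entities extracted from a list from the entities extracted through the learned or prebuilt components, so that you can handle and treat them differently.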

Note

Previously during the public preview of the service, there were four available options: Longest overlap, Exact overlap, Union overlap, and Return all separately. Longest overlap and Exact overlap are deprecated and are only supported for projects that previously had those options selected. Union overlap is renamed to Combine components, while Return all separately is renamed to Do not combine components.

Supported prebuilt components