Edit

Share via


Extract entities with the ai.extract function

The ai.extract function uses generative AI to scan input text and extract specific types of information designated by labels you choose (for example, locations or names). It uses only a single line of code.

AI functions improve data engineering by using the power of large language models in Microsoft Fabric. To learn more, see this overview article.

Important

This feature is in preview, for use in Fabric Runtime 1.3 and later.

  • Review the prerequisites in this overview article, including the library installations that are temporarily required to use AI functions.
  • By default, the gpt-4o-mini model currently powers AI functions. Learn more about billing and consumption rates.
  • Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.
  • During the initial rollout of AI functions, users are temporarily limited to 1,000 requests per minute with the built-in AI endpoint in Fabric.

Use ai.extract with pandas

The ai.extract function extends the pandas Series class. To extract custom entity types from each row of input, call the function on a pandas DataFrame text column.

Unlike other AI functions, ai.extract returns a pandas DataFrame, instead of a Series, with a separate column for each specified entity type that contains extracted values for each input row.

Syntax

df_entities = df["text"].ai.extract("entity1", "entity2", "entity3")

Parameters

Name Description
labels
Required
One or more strings that represent the set of entity types to extract from the input text values.

Returns

The function returns a pandas DataFrame with a column for each specified entity type. The column or columns contain the entities extracted for each row of input text. If the function identifies more than one match for an entity, it returns only one of those matches. If no match is found, the result is null.

Example

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "MJ Lee lives in Tuscon, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)

Use ai.extract with PySpark

The ai.extract function is also available for Spark DataFrames. You must specify the name of an existing input column as a parameter, along with a list of entity types to extract from each row of text.

The function returns a new DataFrame, with a separate column for each specified entity type that contains extracted values for each input row.

Syntax

df.ai.extract(labels=["entity1", "entity2", "entity3"], input_col="text")

Parameters

Name Description
labels
Required
An array of strings that represents the set of entity types to extract from the text values in the input column.
input_col
Required
A string that contains the name of an existing column with input text values to scan for the custom entities.
error_col
Optional
A string that contains the name of a new column to store any OpenAI errors that result from processing each input text row. If you don't set this parameter, a default name generates for the error column. If an input row has no errors, the value in this column is null.

Returns

The function returns a Spark DataFrame with a new column for each specified entity type. The column or columns contain the entities extracted for each row of input text. If the function identifies more than one match for an entity, it returns only one of those matches. If no match is found, the result is null.

Example

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("MJ Lee lives in Tuscon, AZ, and works as a software engineer for Microsoft.",),
        ("Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",)
    ], ["descriptions"])

df_entities = df.ai.extract(labels=["name", "profession", "city"], input_col="descriptions")
display(df_entities)