
Categorize text with the ai.classify function

The ai.classify function uses generative AI to categorize input text according to custom labels that you choose, all with a single line of code.

AI functions improve data engineering by using the power of large language models in Microsoft Fabric. To learn more, see this overview article.

Important

This feature is in preview and is available in Fabric Runtime 1.3 and later.

  • Review the prerequisites in this overview article, including the library installations that are temporarily required to use AI functions.
  • By default, the gpt-4o-mini model currently powers AI functions. Learn more about billing and consumption rates.
  • Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language texts.
  • During the initial rollout of AI functions, users are temporarily limited to 1,000 requests per minute with the built-in AI endpoint in Fabric.
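To get a rough sense of what the preview rate limit means for large inputs, the following sketch computes a lower bound on wall-clock time. It assumes that each input row costs one request (an assumption, because this article doesn't say how many requests a row consumes) and ignores retries and latency. The helper min_minutes is hypothetical.

```python
import math

# Preview limit quoted above for the built-in AI endpoint in Fabric.
REQUESTS_PER_MINUTE = 1_000


def min_minutes(num_rows: int, requests_per_row: int = 1) -> int:
    """Hypothetical helper: lower bound on minutes needed to process num_rows.

    requests_per_row is an assumption; the real cost per row isn't documented here.
    """
    total_requests = num_rows * requests_per_row
    # Round up, because a partial minute still has to elapse.
    return math.ceil(total_requests / REQUESTS_PER_MINUTE)


print(min_minutes(25_000))  # 25
```

Under this assumption, a 25,000-row column needs at least 25 minutes, so it can be worth testing on a small sample first.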

Use ai.classify with pandas

The ai.classify function extends the pandas Series class. To assign user-provided labels to each input row, call the function on a text column of a pandas DataFrame.

The function returns a pandas Series that contains classification labels, which can be stored in a new DataFrame column.
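The .ai syntax works because pandas lets libraries attach custom accessors to Series objects. The following sketch illustrates that mechanism with a hypothetical mock_ai accessor registered through pandas' public pd.api.extensions.register_series_accessor API. Its keyword matching is only a stand-in for the language model that the real ai.classify calls, and it isn't the Fabric implementation. It does mirror the documented contract: the result is a Series aligned with the input rows, with null where no label fits.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("mock_ai")
class MockAIAccessor:
    """Hypothetical stand-in for the .ai accessor: keyword matching, no LLM."""

    def __init__(self, series: pd.Series) -> None:
        self._series = series

    def classify(self, *labels: str) -> pd.Series:
        if not labels:
            raise ValueError("at least one label is required")

        def pick(text):
            # Return the first label that appears in the text, or None (null).
            if not isinstance(text, str):
                return None
            lowered = text.lower()
            for label in labels:
                if label.lower() in lowered:
                    return label
            return None

        # Series.map keeps the input index, so each label lines up with its row.
        return self._series.map(pick)


df = pd.DataFrame({"text": ["A cozy bedroom lamp", "Garage shelving kit", "A poem"]})
df["classification"] = df["text"].mock_ai.classify("kitchen", "bedroom", "garage")
print(df)
```

Because the accessor returns a Series with the same index as its input, you can assign the result directly to a new DataFrame column, just as with ai.classify.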

Tip

We recommend using the ai.classify function with at least two input labels.
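If your labels come from configuration or user input, a small pre-check can enforce this recommendation before you spend any requests. The check_labels helper below is hypothetical and isn't part of the AI functions library. Rejecting blank and duplicate labels is an extra assumption about what makes a useful label set.

```python
def check_labels(*labels: str) -> tuple[str, ...]:
    """Hypothetical pre-check for a classification label set."""
    # Trim surrounding whitespace so " other " and "other" count as the same label.
    cleaned = tuple(label.strip() for label in labels)
    if any(not label for label in cleaned):
        raise ValueError("labels must be non-empty strings")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("labels must be distinct")
    if len(cleaned) < 2:
        raise ValueError("use at least two labels")
    return cleaned


print(check_labels("kitchen", "bedroom", "garage", " other "))
```

A catch-all label such as "other", used in the examples in this article, gives the model a place to put inputs that don't fit any specific category.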

Syntax

df["classification"] = df["text"].ai.classify("category1", "category2", "category3")

Parameters

  • labels (required): One or more strings that represent the set of classification labels to match to input text values.

Returns

The function returns a pandas Series that contains a classification label for each input text row. If a text value can't be classified, the corresponding label is null.
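Because unclassifiable rows come back as null, it's worth checking how many there are before you use the results. The following sketch runs on a made-up sample Series that stands in for ai.classify output. Replacing nulls with "unclassified" is an illustrative choice, not something the function does itself.

```python
import pandas as pd

# Made-up sample of a classification column; None marks rows that couldn't be classified.
labels = pd.Series(["kitchen", None, "bedroom", "kitchen", None], name="classification")

# Count the null labels.
unclassified = int(labels.isna().sum())

# Give nulls an explicit placeholder so they appear in the counts below.
filled = labels.fillna("unclassified")
counts = filled.value_counts()

print(unclassified)       # 2
print(counts.to_dict())
```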

Example

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

import pandas as pd
import synapse.ml.aifunc as aifunc  # Enables the .ai accessor; see the prerequisites in the overview article

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

# Assign one of the four labels to each description; rows that can't be classified get null
df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)  # display() is built into Fabric notebooks

Use ai.classify with PySpark

The ai.classify function is also available for Spark DataFrames. You must specify the name of an existing input column as a parameter, along with a list of classification labels.

The function returns a new DataFrame with labels that match each row of input text, stored in an output column.

Syntax

df.ai.classify(labels=["category1", "category2", "category3"], input_col="text", output_col="classification")

Parameters

  • labels (required): An array of strings that represents the set of classification labels to match to text values in the input column.
  • input_col (required): A string that contains the name of an existing column with input text values to classify according to the custom labels.
  • output_col (optional): A string that contains the name of a new column in which to store a classification label for each input text row. If you don't set this parameter, a default name is generated for the output column.
  • error_col (optional): A string that contains the name of a new column that stores any OpenAI errors that result from processing each row of input text. If you don't set this parameter, a default name is generated for the error column. If a row is processed without errors, the value in this column is null.
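The following sketch models the row-by-row contract of output_col and error_col in plain Python. Each row gets either a label or null in the output column, and either an error message or null in the error column, so one failing row doesn't stop the rest. Everything here is hypothetical: classify_rows and keyword_classifier aren't Fabric APIs, and the default names "classify_output" and "classify_error" are made up, because the real generated names aren't documented in this article.

```python
from typing import Callable, Optional


def classify_rows(
    rows: list[dict],
    labels: list[str],
    input_col: str,
    classifier: Callable[[str, list[str]], Optional[str]],
    output_col: Optional[str] = None,
    error_col: Optional[str] = None,
) -> list[dict]:
    """Hypothetical local model of the output_col/error_col contract."""
    missing = [i for i, row in enumerate(rows) if input_col not in row]
    if missing:
        raise KeyError(f"input_col {input_col!r} missing from rows {missing}")
    output_col = output_col or "classify_output"  # made-up default name
    error_col = error_col or "classify_error"  # made-up default name
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        try:
            new_row[output_col] = classifier(row[input_col], labels)
            new_row[error_col] = None  # no error for this row
        except Exception as exc:  # record the failure instead of aborting the batch
            new_row[output_col] = None
            new_row[error_col] = str(exc)
        result.append(new_row)
    return result


def keyword_classifier(text: str, labels: list[str]) -> Optional[str]:
    """Stand-in for the model: first label found in the text, else None."""
    if not text:
        raise ValueError("empty input")
    lowered = text.lower()
    return next((label for label in labels if label in lowered), None)


rows = [{"text": "New garage door opener"}, {"text": ""}, {"text": "Poetry book"}]
out = classify_rows(rows, ["kitchen", "garage"], "text", keyword_classifier, output_col="category")
for row in out:
    print(row)
```

In this sketch, the empty row produces a null label plus an error message, while "Poetry book" produces a null label with no error, matching the two kinds of null that the parameter descriptions distinguish.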

Returns

The function returns a Spark DataFrame that includes a new column that contains classification labels that match each input text row. If a text value can't be classified, the corresponding label is null.

Example

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

from synapse.ml.spark.aifunc.DataFrameExtensions import AIFunctions  # Enables df.ai; see the prerequisites in the overview article

# `spark` is the SparkSession that Fabric notebooks provide
df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])

# Returns a new DataFrame with a "categories" column that holds one label per row
categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)