Transform and enrich data with AI functions (preview)

2025-10-31

Important

This feature is currently in preview.

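Microsoft Fabric AI functions let all business professionals, from developers to analysts, use generative AI to transform and enrich their enterprise data.

AI functions use industry-leading large language models (LLMs) for summarization, classification, text generation, and more. With a single line of code, you can use:

ai.analyze_sentiment: Detect the emotional state of input text.
ai.classify: Categorize input text according to your labels.
ai.extract: Extract specific types of information (for example, locations or names) from input text.
ai.fix_grammar: Correct the spelling, grammar, and punctuation of input text.
ai.generate_response: Generate responses based on your own instructions.
ai.similarity: Compare the meaning of input text with a single text value or with text in another column.
ai.summarize: Get summaries of input text.
ai.translate: Translate input text into another language.

Whether you work with pandas or Spark, you can incorporate these functions into your data science and data engineering workflows. There's no detailed configuration and no complex infrastructure management, and you don't need any specific technical expertise.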

Prerequisites

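To use AI functions with the built-in AI endpoint in Fabric, your administrator needs to enable the tenant switch for Copilot and other features that are powered by Azure OpenAI.
Depending on your location, you might need to enable a tenant setting for cross-geo processing. Learn more about the available regions for Azure OpenAI Service.
You need a paid Fabric capacity (F2 or higher, or any P edition).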

Note

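AI functions are supported in Fabric Runtime 1.3 and later.
Unless you configure a different model, AI functions default to gpt-4.1-mini. Learn more about billing and consumption rates.
Although the underlying model can handle several languages, most of the AI functions are optimized for use on English-language text.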

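Getting started with AI functions

AI functions can be used with pandas (Python and PySpark runtimes) and with PySpark (PySpark runtime). The following sections outline the required installation and import steps for each option, along with the corresponding commands.

Install dependencies

pandas (Python runtime)
- The synapseml_internal and synapseml_core whl files must be installed (the commands are in the following code cell)
- The openai package must be installed (the command is in the following code cell)
pandas (PySpark runtime)
- The openai package must be installed (the command is in the following code cell)
PySpark (PySpark runtime)
- No installation needed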

pandas （PySpark 运行时）
pandas （Python 运行时）

# The pandas AI functions package requires OpenAI version 1.99.5 or later
%pip install -q --force-reinstall openai==1.99.5 2>/dev/null

# Download the latest AI functions library wheels
!wget -q https://aka.ms/fabric-aifunctions-whl -O synapseml_internal-latest-py3-none-any.whl
!wget -q https://aka.ms/fabric-synapseml-core-whl -O synapseml_core-latest-py3-none-any.whl

# Install OpenAI (version 1.99.5 or later is required) together with the downloaded wheels
%pip install -q --force-reinstall openai==1.99.5 synapseml_internal-latest-py3-none-any.whl synapseml_core-latest-py3-none-any.whl
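
The install comments above state that the pandas AI functions package requires OpenAI version 1.99.5 or later. The following is a minimal sketch of how a notebook cell could check that requirement before importing the library. The helper names `parse_version` and `meets_minimum` are hypothetical, and the parser assumes plain dotted numeric versions with no pre-release suffix.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_OPENAI = "1.99.5"  # minimum stated in the install comments above


def parse_version(text):
    """Turn a dotted numeric version such as '1.99.5' into (1, 99, 5)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum=MIN_OPENAI):
    """Return True when `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("openai")
    status = "OK" if meets_minimum(installed) else "too old"
    print(f"openai {installed}: {status}")
except (PackageNotFoundError, ValueError):
    # ValueError covers versions with non-numeric parts, which this sketch does not parse.
    print("openai is not installed or its version could not be parsed")
```

Comparing tuples of integers avoids the string-comparison trap in which "1.100.0" sorts before "1.99.5".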

Import required libraries

The following code cell imports the AI functions library and its dependencies.

pandas
PySpark

# Required imports
import synapse.ml.aifunc as aifunc
import pandas as pd

import synapse.ml.spark.aifunc as aifunc

# SparkSession with accessor `spark` in PySpark environments is pre-setup and available for use

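Apply AI functions

Each of the following functions lets you invoke the built-in AI endpoint in Fabric to transform and enrich data with a single line of code. You can use AI functions to analyze pandas DataFrames or Spark DataFrames.

Tip

Learn how to customize the configuration of AI functions.

Detect sentiment with ai.analyze_sentiment

The ai.analyze_sentiment function invokes AI to identify whether the emotional state expressed by input text is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank. For more detailed instructions about using ai.analyze_sentiment with pandas, see the dedicated pandas article. For ai.analyze_sentiment with PySpark, see the dedicated PySpark article.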

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "The cleaning spray permanently stained my beautiful kitchen counter. Never again!",
        "I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",
        "I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",
        "The umbrella is OK, I guess."
    ], columns=["reviews"])

df["sentiment"] = df["reviews"].ai.analyze_sentiment()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("The cleaning spray permanently stained my beautiful kitchen counter. Never again!",),
        ("I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",),
        ("I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",),
        ("The umbrella is OK, I guess.",)
    ], ["reviews"])

sentiment = df.ai.analyze_sentiment(input_col="reviews", output_col="sentiment")
display(sentiment)
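
Because the output is left blank when no sentiment can be determined, downstream code may want to count those blanks separately. The following is a minimal sketch, assuming that the labels come back as the lowercase strings `positive`, `negative`, `mixed`, and `neutral`, and that blanks arrive as `None`, NaN, or an empty string. Both points are assumptions rather than something this article confirms. `tally_sentiment` is a hypothetical helper that accepts any list of values, such as `df["sentiment"].tolist()`.

```python
from collections import Counter

# Assumed label strings; check them against the actual output.
EXPECTED = ("positive", "negative", "mixed", "neutral")


def tally_sentiment(values):
    """Count each expected label; anything else is counted as 'undetermined'."""
    counts = Counter({label: 0 for label in EXPECTED})
    counts["undetermined"] = 0
    for value in values:
        # Non-string values (None, NaN) are treated as blank.
        label = value.strip().lower() if isinstance(value, str) else ""
        counts[label if label in EXPECTED else "undetermined"] += 1
    return dict(counts)


sample = ["negative", "positive", "mixed", "neutral", None, ""]
print(tally_sentiment(sample))
```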

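Categorize text with ai.classify

The ai.classify function invokes AI to categorize input text according to custom labels that you choose. For more details about using ai.classify with pandas, see the dedicated pandas article. For ai.classify with PySpark, see the dedicated PySpark article.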

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

df["category"] = df['descriptions'].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",),
        ("Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",),
        ("Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!",)
    ], ["descriptions"])
    
categories = df.ai.classify(labels=["kitchen", "bedroom", "garage", "other"], input_col="descriptions", output_col="categories")
display(categories)
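
Since the categories are produced by a language model, it can be worth checking that every returned value is one of the labels you supplied. The following is a minimal sketch of such a check. `normalize_labels` is a hypothetical helper, and falling back to "other" is an illustrative choice rather than library behavior.

```python
def normalize_labels(predicted, allowed, fallback="other"):
    """Map each predicted value onto `allowed`, using `fallback` for anything else."""
    lookup = {label.lower(): label for label in allowed}
    cleaned = []
    for value in predicted:
        # Non-string values (None, NaN) never match an allowed label.
        key = value.strip().lower() if isinstance(value, str) else ""
        cleaned.append(lookup.get(key, fallback))
    return cleaned


allowed = ["kitchen", "bedroom", "garage", "other"]
predicted = ["bedroom", "Kitchen ", "vehicle", None]
print(normalize_labels(predicted, allowed))
```

The helper works on plain lists, so it could be applied to a pandas column through `df["category"].tolist()`.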

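Extract entities with ai.extract

The ai.extract function invokes AI to scan input text and extract the specific types of information designated by the labels you choose (for example, locations or names). For more detailed instructions about using ai.extract with pandas, see the dedicated pandas article. For ai.extract with PySpark, see the dedicated PySpark article.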

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",),
        ("Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",)
    ], ["descriptions"])

df_entities = df.ai.extract(labels=["name", "profession", "city"], input_col="descriptions")
display(df_entities)

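Fix grammar with ai.fix_grammar

The ai.fix_grammar function invokes AI to correct the spelling, grammar, and punctuation of input text. For more detailed instructions about using ai.fix_grammar with pandas, see the dedicated pandas article. For ai.fix_grammar with PySpark, see the dedicated PySpark article.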

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "There are an error here.",
        "She and me go weigh back. We used to hang out every weeks.",
        "The big picture are right, but you're details is all wrong."
    ], columns=["text"])

df["corrections"] = df["text"].ai.fix_grammar()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("There are an error here.",),
        ("She and me go weigh back. We used to hang out every weeks.",),
        ("The big picture are right, but you're details is all wrong.",)
    ], ["text"])

corrections = df.ai.fix_grammar(input_col="text", output_col="corrections")
display(corrections)

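Answer custom user prompts with ai.generate_response

The ai.generate_response function invokes AI to generate custom text based on your own instructions. For more detailed instructions about using ai.generate_response with pandas, see the dedicated pandas article. For ai.generate_response with PySpark, see the dedicated PySpark article.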

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "Scarves",
        "Snow pants",
        "Ski goggles"
    ], columns=["product"])

df["response"] = df.ai.generate_response("Write a short, punchy email subject line for a winter sale.")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Scarves",),
        ("Snow pants",),
        ("Ski goggles",)
    ], ["product"])

responses = df.ai.generate_response(prompt="Write a short, punchy email subject line for a winter sale.", output_col="response")
display(responses)

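Calculate similarity with ai.similarity

The ai.similarity function compares each input text value either with one common reference text or with the corresponding value in another column (pairwise mode). The output similarity scores are relative and can range from -1 (opposites) to 1 (identical). A score of 0 indicates that the values are unrelated in meaning. For more detailed instructions about using ai.similarity with pandas, see the dedicated pandas article. For ai.similarity with PySpark, see the dedicated PySpark article.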

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([ 
        ("Bill Gates", "Technology"), 
        ("Satya Nadella", "Healthcare"), 
        ("Joan of Arc", "Agriculture") 
    ], columns=["names", "industries"])
    
df["similarity"] = df["names"].ai.similarity(df["industries"])
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Bill Gates", "Technology"), 
        ("Satya Nadella", "Healthcare"), 
        ("Joan of Arc", "Agriculture")
    ], ["names", "industries"])

similarity = df.ai.similarity(input_col="names", other_col="industries", output_col="similarity")
display(similarity)
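
To make the score range concrete, the following sketch computes cosine similarity between small hand-made vectors, a common measure that also ranges from -1 to 1. It is meant only as an aid for reading such scores. This article doesn't say how ai.similarity computes its values, and the vectors are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal: 0, unrelated
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite direction: close to -1
```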

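Summarize text with ai.summarize

The ai.summarize function invokes AI to generate summaries of input text (either the values in a single column of a DataFrame or the row values across all columns). For more detailed instructions about using ai.summarize with pandas, see the dedicated pandas article. For ai.summarize with PySpark, see the dedicated PySpark article.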

pandas
PySpark

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """)
    ], columns=["product", "release_year", "description"])

df["summaries"] = df["description"].ai.summarize()
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """,),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """,)
    ], ["product", "release_year", "description"])

summaries = df.ai.summarize(input_col="description", output_col="summary")
display(summaries)

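Translate text with ai.translate

The ai.translate function invokes AI to translate input text into a new language of your choice. For more detailed instructions about using ai.translate with pandas, see the dedicated pandas article. For ai.translate with PySpark, see the dedicated PySpark article.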

pandas
PySpark

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "Hello! How are you doing today?", 
        "Tell me what you'd like to know, and I'll do my best to help.", 
        "The only thing we have to fear is fear itself."
    ], columns=["text"])

df["translations"] = df["text"].ai.translate("spanish")
display(df)

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = spark.createDataFrame([
        ("Hello! How are you doing today?",),
        ("Tell me what you'd like to know, and I'll do my best to help.",),
        ("The only thing we have to fear is fear itself.",),
    ], ["text"])

translations = df.ai.translate(to_lang="spanish", input_col="text", output_col="translations")
display(translations)

Detect sentiment with ai.analyze_sentiment in pandas or ai.analyze_sentiment in PySpark.
Categorize text with ai.classify in pandas or ai.classify in PySpark.
Extract entities with ai.extract in pandas or ai.extract in PySpark.
Fix grammar with ai.fix_grammar in pandas or ai.fix_grammar in PySpark.
Answer custom user prompts with ai.generate_response in pandas or ai.generate_response in PySpark.
Calculate similarity with ai.similarity in pandas or ai.similarity in PySpark.
Summarize text with ai.summarize in pandas or ai.summarize in PySpark.
Translate text with ai.translate in pandas or ai.translate in PySpark.
Customize the configuration of AI functions in pandas or the configuration of AI functions in PySpark.
Did we miss a feature you need? Suggest it on the Fabric Ideas forum.
