Develop applications with Semantic Kernel and Azure AI Foundry

2025-10-21

In this article, you learn how to use Semantic Kernel with models deployed from the Azure AI model catalog in the Azure AI Foundry portal.

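Prerequisites

An Azure account with an active subscription. If you don't have one, create a free Azure account, which includes a free trial subscription.
An Azure AI project, as explained in Create a project for Azure AI Foundry.
A deployed model that supports the Azure AI model inference API. This article uses a Mistral-Large deployment, but you can use any model. To use the embedding capabilities in Semantic Kernel, you need an embedding model such as cohere-embed-v3-multilingual.
- You can follow the instructions in Deploy models as serverless API deployments.
Python 3.10 or later installed, including pip.
Semantic Kernel installed. You can use the following command: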
```
pip install semantic-kernel
```
This article uses the model inference API, so also install the related Azure dependencies. You can use the following command:
```
pip install "semantic-kernel[azure]"
```

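Configure the environment

To use language models deployed in the Azure AI Foundry portal, you need the endpoint and authentication credentials to connect to your project. Follow these steps to get the required information from the model: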

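Tip

Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than the ones shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.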

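Go to the Azure AI Foundry portal.
Open the project where the model is deployed, if it isn't already open.
Go to Models + endpoints and select the model you deployed as described in the prerequisites.
Copy the endpoint URL and the key.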

Tip

If the model was deployed with Microsoft Entra ID support, you don't need a key.

This example uses environment variables for the endpoint URL and the key:

```
export AZURE_AI_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
export AZURE_AI_INFERENCE_API_KEY="<your-key-goes-here>"
```
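Before creating a client, it can help to confirm that the variables are actually set. The following is a minimal sketch that uses only the Python standard library; `InferenceSettings` and `load_inference_settings` are hypothetical helper names introduced for illustration and are not part of Semantic Kernel. It treats the key as optional, on the assumption that a deployment using Microsoft Entra ID has no key.

```python
import os
from typing import Mapping, NamedTuple, Optional


class InferenceSettings(NamedTuple):
    """Hypothetical container for the two values this article reads from the environment."""

    endpoint: str
    api_key: Optional[str]  # None when no key is set (for example, Microsoft Entra ID)


def load_inference_settings(env: Mapping[str, str] = os.environ) -> InferenceSettings:
    """Read the endpoint and key from a mapping, failing early if the endpoint is missing."""
    endpoint = env.get("AZURE_AI_INFERENCE_ENDPOINT", "").strip()
    if not endpoint:
        raise ValueError("AZURE_AI_INFERENCE_ENDPOINT is not set")
    # An empty or missing key is normalized to None.
    api_key = env.get("AZURE_AI_INFERENCE_API_KEY", "").strip() or None
    return InferenceSettings(endpoint=endpoint, api_key=api_key)


# Demonstration with an explicit mapping, so the sketch runs without real credentials.
settings = load_inference_settings(
    {"AZURE_AI_INFERENCE_ENDPOINT": "https://example.invalid/models"}
)
print(settings.endpoint)
print(settings.api_key is None)
```

Calling `load_inference_settings()` with no argument reads from the real process environment instead of the demonstration mapping.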

After you configure the endpoint and key, create a client to connect to the endpoint:

```
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

chat_completion_service = AzureAIInferenceChatCompletion(ai_model_id="<deployment-name>")
```

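Tip

The client automatically reads the environment variables AZURE_AI_INFERENCE_ENDPOINT and AZURE_AI_INFERENCE_API_KEY to connect to the model. You can also pass the endpoint and key directly to the client by using the endpoint and api_key parameters of the constructor.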

Alternatively, if your endpoint supports Microsoft Entra ID, set only the endpoint and use the following code to create the client:

```
export AZURE_AI_INFERENCE_ENDPOINT="<your-model-endpoint-goes-here>"
```

```
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

chat_completion_service = AzureAIInferenceChatCompletion(ai_model_id="<deployment-name>")
```

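Note

If you use Microsoft Entra ID, make sure that the endpoint was deployed with that authentication method and that you have the permissions required to invoke it.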

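Azure OpenAI models

If you use an Azure OpenAI model, you can use the following code to create the client:

```
from azure.ai.inference.aio import ChatCompletionsClient
from azure.identity.aio import DefaultAzureCredential

from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

# Replace the placeholders with your Azure OpenAI resource endpoint and deployment name.
azure_openai_endpoint = "<your-azure-open-ai-endpoint>"
deployment_name = "<deployment-name>"

chat_completion_service = AzureAIInferenceChatCompletion(
    ai_model_id=deployment_name,
    client=ChatCompletionsClient(
        # The deployment-specific URL is the resource endpoint plus /openai/deployments/<name>.
        endpoint=f"{azure_openai_endpoint.strip('/')}/openai/deployments/{deployment_name}",
        credential=DefaultAzureCredential(),
        credential_scopes=["https://cognitiveservices.azure.com/.default"],
    ),
)
```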
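The endpoint passed to the client is the resource endpoint followed by `/openai/deployments/` and the deployment name. As a small illustration of that string construction, here is a standard-library sketch; `build_azure_openai_endpoint` is a hypothetical helper name, and the resource URL and deployment name in the demonstration are made-up placeholders.

```python
def build_azure_openai_endpoint(resource_endpoint: str, deployment_name: str) -> str:
    """Join a resource endpoint and a deployment name into a deployment-specific URL."""
    base = resource_endpoint.strip().rstrip("/")  # avoid a double slash in the result
    name = deployment_name.strip()
    if not base or not name:
        raise ValueError("Both the resource endpoint and the deployment name are required")
    return f"{base}/openai/deployments/{name}"


# Placeholder values for illustration only.
url = build_azure_openai_endpoint("https://my-resource.openai.azure.com/", "my-deployment")
print(url)
```

Stripping the trailing slash first means the result is the same whether or not the copied endpoint ends with `/`.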

Inference parameters

To configure how inference is performed, use the AzureAIInferenceChatPromptExecutionSettings class:

```
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatPromptExecutionSettings

execution_settings = AzureAIInferenceChatPromptExecutionSettings(
    max_tokens=100,
    temperature=0.5,
    top_p=0.9,
    # extra_parameters={...},    # model-specific parameters
)
```

Call the service

First, call the chat completion service with a simple chat history:

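Tip

Semantic Kernel is an asynchronous library, so you need to use the asyncio library to run the code. Place the snippets that use await inside an async function such as main():

```
import asyncio

async def main():
    ...

if __name__ == "__main__":
    asyncio.run(main())
```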

```
from semantic_kernel.contents.chat_history import ChatHistory

chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

response = await chat_completion_service.get_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)
print(response)
```

Alternatively, you can stream the response from the service:

```
chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

response = chat_completion_service.get_streaming_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)

# Print each chunk as it arrives and keep it for later.
chunks = []
async for chunk in response:
    chunks.append(chunk)
    print(chunk, end="")

# Combine the streamed chunks into a single message.
full_response = sum(chunks[1:], chunks[0])
```
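The expression `sum(chunks[1:], chunks[0])` relies on the chunks supporting the `+` operator: `sum` starts from its second argument and adds each remaining item in turn. The following standard-library sketch illustrates the idiom with `TextChunk`, a hypothetical stand-in class that models only addition and is not the Semantic Kernel chunk type.

```python
from dataclasses import dataclass


@dataclass
class TextChunk:
    """Hypothetical stand-in for a streamed chunk; it only models the '+' operator."""

    text: str

    def __add__(self, other: "TextChunk") -> "TextChunk":
        return TextChunk(self.text + other.text)


chunks = [TextChunk("Hel"), TextChunk("lo, "), TextChunk("world")]

# sum() defaults to a start value of 0, and 0 + TextChunk is undefined,
# so the first chunk is passed as the start value instead.
full_response = sum(chunks[1:], chunks[0])
print(full_response.text)  # Hello, world
```

Note that `chunks[0]` raises `IndexError` when the list is empty, so code following this pattern may want to check that at least one chunk arrived.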

Create a long-running conversation

You can use a loop to create a long-running conversation:

```
while True:
    response = await chat_completion_service.get_chat_message_content(
        chat_history=chat_history,
        settings=execution_settings,
    )
    print(response)
    # Record the assistant reply, then ask the user for the next message.
    chat_history.add_message(response)
    chat_history.add_user_message(input("User:> "))
```

If you want to stream the response, you can use the following code:

```
while True:
    response = chat_completion_service.get_streaming_chat_message_content(
        chat_history=chat_history,
        settings=execution_settings,
    )

    chunks = []
    async for chunk in response:
        chunks.append(chunk)
        print(chunk, end="")

    # Combine the chunks into one message before adding it to the history.
    full_response = sum(chunks[1:], chunks[0])
    chat_history.add_message(full_response)
    chat_history.add_user_message(input("User:> "))
```
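The loops above run until the process is interrupted. As one possible variation, the sketch below adds an exit command and keeps the history in alternating user and assistant turns. It uses only the standard library: `fake_chat_service` is a hypothetical stand-in for the real chat completion call, the history is a plain list of `(role, content)` tuples rather than a `ChatHistory`, and scripted inputs replace `input()` so the example runs unattended.

```python
import asyncio
from typing import Iterable, List, Tuple

Message = Tuple[str, str]  # (role, content)


async def fake_chat_service(history: List[Message]) -> str:
    """Hypothetical stand-in for the chat completion call; echoes the last user message."""
    return f"You said: {history[-1][1]}"


async def run_conversation(user_inputs: Iterable[str]) -> List[Message]:
    """Alternate user and assistant turns until the user types 'exit' or 'quit'."""
    history: List[Message] = []
    for user_input in user_inputs:
        if user_input.strip().lower() in {"exit", "quit"}:
            break  # leave the loop instead of running forever
        history.append(("user", user_input))
        reply = await fake_chat_service(history)
        history.append(("assistant", reply))
    return history


# Scripted inputs; anything after "exit" is never processed.
transcript = asyncio.run(run_conversation(["Hello", "How are you?", "exit", "ignored"]))
for role, content in transcript:
    print(f"{role}: {content}")
```

With the real service, the same exit check could be applied to the value returned by `input("User:> ")` before it is added to the chat history.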

Use embedding models

Configure your environment as in the previous steps, but use the AzureAIInferenceTextEmbedding class:

```
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceTextEmbedding

embedding_generation_service = AzureAIInferenceTextEmbedding(ai_model_id="<deployment-name>")
```

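The following code shows how to get embeddings from the service:

```
embeddings = await embedding_generation_service.generate_embeddings(
    texts=["My favorite color is blue.", "I love to eat pizza."],
)

# Each item is the embedding vector for the text at the same position.
for embedding in embeddings:
    print(embedding)
```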
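A common next step with embedding vectors is to compare them. The following standard-library sketch computes cosine similarity between two vectors; `cosine_similarity` is a helper written for this illustration, and the short vectors are made-up values rather than real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors for illustration; real embeddings have many more dimensions.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

print(round(cosine_similarity(vec_a, vec_b), 6))  # identical direction
print(round(cosine_similarity(vec_a, vec_c), 6))  # orthogonal vectors
```

Values close to 1 indicate vectors pointing in nearly the same direction, and values close to 0 indicate unrelated directions.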
