Query chat models

2025-10-23

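This article describes how to write query requests for foundation models that are optimized for chat and general-purpose tasks, and how to send them to a model serving endpoint.

The examples in this article apply to querying foundation models that are made available using either of the following:

Foundation Model APIs, which are referred to as Databricks-hosted foundation models.
External models, which are referred to as foundation models hosted outside of Databricks.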

Requirements

See Requirements.
Install the appropriate package on your cluster based on the query client option you choose.

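Query examples

The examples in this section show how to query the Meta Llama 3.3 70B Instruct model served by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct, using the different client options.

For a batch inference example, see Apply AI on data using Azure Databricks AI Functions.

OpenAI client

To use the OpenAI client, specify the model serving endpoint name as the model input.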


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

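To query foundation models from outside your workspace, you must use the OpenAI client directly. You also need your Databricks workspace instance to connect the OpenAI client to Databricks. The following example assumes that you have a Databricks API token and that openai is installed on your compute.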


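import os
from openai import OpenAI

# Assumes the Databricks API token is exported as DATABRICKS_TOKEN,
# which avoids hardcoding a secret in the source file.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    # Replace with the URL of your own Databricks workspace instance.
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)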

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

SQL

Important

The following example uses the built-in SQL function ai_query. This function is in Public Preview and its definition might change.

SELECT ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Can you explain AI in ten words?"
  )

REST API

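Important

The following example uses REST API parameters to query serving endpoints that serve foundation models. These parameters are in Public Preview and their definitions might change. See POST /serving-endpoints/{name}/invocations.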

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-meta-llama-3-3-70b-instruct/invocations
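
For readers who prefer to issue the same call from Python without an extra client library, the following is a minimal sketch that mirrors the curl command above using only the standard library. The helper name build_invocation_request is hypothetical, the workspace URL and token are placeholders, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_invocation_request(workspace_url, endpoint_name, token, messages):
    """Build (but do not send) a POST request for a serving endpoint.

    Hypothetical helper for illustration; it mirrors the curl flags above.
    """
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # curl's `-u token:$DATABRICKS_TOKEN` sends HTTP Basic auth with the
    # literal username "token" and the API token as the password.
    credentials = base64.b64encode(f"token:{token}".encode("utf-8")).decode("ascii")
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


request = build_invocation_request(
    workspace_url="https://<workspace_host>.databricks.com",  # placeholder
    endpoint_name="databricks-meta-llama-3-3-70b-instruct",
    token="dapi-your-databricks-token",  # placeholder
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a mixture of experts model?"},
    ],
)

# To send it against a real workspace, uncomment the following lines:
#     with urllib.request.urlopen(request) as resp:
#         print(json.load(resp))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before any network call is made.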

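MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


import mlflow.deployments

# Only required when running this example outside of a Databricks notebook.
# Set these environment variables in your shell before starting Python:
#   export DATABRICKS_HOST="https://<workspace_host>.databricks.com"
#   export DATABRICKS_TOKEN="dapi-your-databricks-token"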

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    inputs={
        "messages": [
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "Hello! How can I assist you today?"
            },
            {
              "role": "user",
              "content": "What is a mixture of experts model?"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)

Databricks Python SDK

This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from an Azure Databricks notebook.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")

LangChain

To query a foundation model endpoint using LangChain, you can use the ChatDatabricks ChatModel class and specify the endpoint name.

%pip install databricks-langchain

from langchain_core.messages import HumanMessage, SystemMessage
from databricks_langchain import ChatDatabricks

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is a mixture of experts model?"),
]

llm = ChatDatabricks(model="databricks-meta-llama-3-3-70b-instruct")
llm.invoke(messages)

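As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.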

{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}
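
A request body in this shape can be assembled and sanity-checked in Python before it is sent. The following is a minimal sketch; the helper build_chat_payload and the ALLOWED_ROLES set are hypothetical, and the role check covers only the roles that appear in this article, so the service itself may accept values that this sketch rejects.

```python
import json

# Assumption: only the roles shown in this article are treated as valid here.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_payload(messages, max_tokens=None, temperature=None):
    """Return a chat request body as a dict, omitting unset optional fields."""
    if not messages:
        raise ValueError("messages must contain at least one entry")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("each message needs a string 'content' field")
    payload = {"messages": list(messages)}
    # Optional parameters are added only when the caller supplies them.
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    return payload


payload = build_chat_payload(
    [{"role": "user", "content": "What is a mixture of experts model?"}],
    max_tokens=100,
    temperature=0.1,
)
print(json.dumps(payload, indent=2))
```

Leaving unset optional fields out of the body, rather than sending them as null, lets the endpoint apply its own defaults.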

The following is the expected response format for a request made using the REST API:

{
  "model": "databricks-meta-llama-3-3-70b-instruct",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
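
A response in this format can be read defensively, because fields such as message, finish_reason, and id may be empty or null, as they are in the sample above. The following is a minimal sketch; the helper summarize_chat_response is hypothetical, and the sample JSON is copied from the format shown above rather than from a live endpoint.

```python
import json

# Sample response copied from the expected response format above.
raw_response = """
{
  "model": "databricks-meta-llama-3-3-70b-instruct",
  "choices": [
    {"message": {}, "index": 0, "finish_reason": null}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 74, "total_tokens": 81},
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
"""


def summarize_chat_response(response):
    """Extract the first reply and token usage, tolerating missing fields."""
    choices = response.get("choices") or []
    first = choices[0] if choices else {}
    message = first.get("message") or {}
    usage = response.get("usage") or {}
    return {
        "content": message.get("content"),
        "finish_reason": first.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_chat_response(json.loads(raw_response))
print(summary)
```

Because the sample message object is empty, the extracted content is None here. A populated response carries the reply text in the message content, as accessed through response.choices[0].message.content in the Databricks Python SDK example.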