Query reasoning models

2025-10-31

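This article describes how to write query requests for foundation models optimized for reasoning tasks and send them to a Foundation Model API endpoint.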

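Mosaic AI Foundation Model APIs provide a unified API for interacting with all foundation models, including reasoning models. Reasoning gives foundation models enhanced capabilities for tackling complex tasks. Some models also provide transparency by showing their step-by-step thinking process before giving a final answer.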

Types of reasoning models

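There are two types of models: reasoning-only models and hybrid models. The following table describes how each type controls reasoning:

Reasoning model type	Details	Example models	Parameters
Hybrid reasoning	Supports both fast, immediate answers and deeper reasoning when needed.	Claude models such as `databricks-claude-3-7-sonnet` and `databricks-claude-sonnet-4`.	To use hybrid reasoning, include the `thinking` parameter with `type` set to `"enabled"` and a `budget_tokens` value. `budget_tokens` controls how many tokens the model can use for internal reasoning. A higher budget can improve quality on complex tasks, but actual usage of budgets above 32K tokens may vary. `budget_tokens` must be less than `max_tokens`.
Reasoning only	These models always use internal reasoning in their responses.	GPT OSS models such as `databricks-gpt-oss-120b` and `databricks-gpt-oss-20b`.	Use the following parameter in your request: `reasoning_effort`, which accepts `"low"`, `"medium"` (default), or `"high"`. Higher reasoning effort can produce more thoughtful and accurate responses but may increase latency and token usage. Only a limited set of models accepts this parameter, including `databricks-gpt-oss-120b` and `databricks-gpt-oss-20b`.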
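The two parameter styles in the table can be summarized as a small request-building helper. This is a minimal sketch. `reasoning_params` and the `HYBRID_MODELS` and `REASONING_ONLY_MODELS` sets are hypothetical names, and the sets contain only the example models from the table, not a complete list. The helper checks the two documented constraints: `budget_tokens` must be less than `max_tokens`, and `reasoning_effort` must be one of the three accepted values.

```python
# Hypothetical helper; the model sets below list only the table's examples.
HYBRID_MODELS = {"databricks-claude-3-7-sonnet", "databricks-claude-sonnet-4"}
REASONING_ONLY_MODELS = {"databricks-gpt-oss-120b", "databricks-gpt-oss-20b"}
EFFORT_LEVELS = ("low", "medium", "high")


def reasoning_params(model, max_tokens, budget_tokens=None, reasoning_effort=None):
    """Return the extra request-body fields that control reasoning for `model`."""
    if model in HYBRID_MODELS:
        if budget_tokens is None:
            # Assumption: omitting `thinking` requests a fast answer without extended reasoning.
            return {}
        if budget_tokens >= max_tokens:
            raise ValueError("budget_tokens must be less than max_tokens")
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    if model in REASONING_ONLY_MODELS:
        effort = reasoning_effort or "medium"  # "medium" is the documented default
        if effort not in EFFORT_LEVELS:
            raise ValueError(f"reasoning_effort must be one of {EFFORT_LEVELS}")
        return {"reasoning_effort": effort}
    raise ValueError(f"unknown model: {model}")


print(reasoning_params("databricks-claude-3-7-sonnet", 20480, budget_tokens=10240))
# {'thinking': {'type': 'enabled', 'budget_tokens': 10240}}
```

For Claude models, the returned dict has the same shape as the `extra_body` value in the Claude example in this article. Passing `reasoning_effort` for GPT OSS models the same way is an assumption.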

Query examples

All reasoning models are accessed through the chat completions endpoint.

Claude model example

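import os

from openai import OpenAI

# Authenticate with a Databricks token and point the client at the workspace's
# serving endpoints (for example, https://<workspace_host>/serving-endpoints).
client = OpenAI(
    api_key=os.environ.get("DATABRICKS_TOKEN"),
    base_url=os.environ.get("DATABRICKS_BASE_URL"),
)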

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

msg = response.choices[0].message
# content is a list of blocks: [0] is the reasoning block, [1] is the final text
reasoning = msg.content[0]["summary"][0]["text"]
answer = msg.content[1]["text"]

print("Reasoning:", reasoning)
print("Answer:", answer)

GPT OSS model example

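The `reasoning_effort` parameter accepts the values `"low"`, `"medium"` (default), or `"high"`. Higher reasoning effort can produce more thoughtful and accurate responses but may increase latency and token usage.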

curl -X POST "https://<workspace_host>/serving-endpoints/databricks-gpt-oss-120b/invocations" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ],
    "max_tokens": 4096,
    "reasoning_effort": "high"
  }'
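If `curl` isn't available, you can build the same request with the Python standard library. This is a minimal sketch. `build_invocation_request` is a hypothetical helper, `<workspace_host>` is a placeholder that you must replace, and the token is read from the `DATABRICKS_TOKEN` environment variable, as in the `curl` command. The function only builds the request; sending it is left as a commented step.

```python
import json
import os
import urllib.request

# Placeholder: replace with your workspace host.
WORKSPACE_HOST = "<workspace_host>"


def build_invocation_request(model, messages, max_tokens, reasoning_effort="medium", token=None):
    """Build (but do not send) a POST request equivalent to the curl command."""
    url = f"https://{WORKSPACE_HOST}/serving-endpoints/{model}/invocations"
    body = {"messages": messages, "max_tokens": max_tokens, "reasoning_effort": reasoning_effort}
    token = token if token is not None else os.environ.get("DATABRICKS_TOKEN", "")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_invocation_request(
    "databricks-gpt-oss-120b",
    [{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=4096,
    reasoning_effort="high",
)
# To send it against a real workspace:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```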

The API response includes both thinking and text content blocks:

ChatCompletionMessage(
    role="assistant",
    content=[
        {
            "type": "reasoning",
            "summary": [
                {
                    "type": "summary_text",
                    "text": ("The question is asking about the scientific explanation for why the sky appears blue... "),
                    "signature": ("EqoBCkgIARABGAIiQAhCWRmlaLuPiHaF357JzGmloqLqkeBm3cHG9NFTxKMyC/9bBdBInUsE3IZk6RxWge...")
                }
            ]
        },
        {
            "type": "text",
            "text": (
                "# Why the Sky Is Blue\n\n"
                "The sky appears blue because of a phenomenon called Rayleigh scattering. Here's how it works..."
            )
        }
    ],
    refusal=None,
    annotations=None,
    audio=None,
    function_call=None,
    tool_calls=None
)
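The examples in this article read `content[0]` and `content[1]` directly, which assumes a fixed block order. A somewhat more defensive sketch selects blocks by their `type` field instead. `split_reasoning_and_text` is a hypothetical helper. It assumes the block shapes shown in the response above, and it also accepts a plain string, which covers the assumed case where no reasoning blocks are returned.

```python
def split_reasoning_and_text(content):
    """Split a message's content into (reasoning_summary, answer_text).

    Assumes content is a plain string or a list of blocks shaped like the
    sample response: {"type": "reasoning", "summary": [...]} and
    {"type": "text", "text": ...}.
    """
    if isinstance(content, str):
        return "", content  # no reasoning blocks present
    reasoning_parts, text_parts = [], []
    for block in content:
        if block.get("type") == "reasoning":
            # Collect every summary_text entry inside the reasoning block.
            reasoning_parts.extend(
                s["text"] for s in block.get("summary", []) if s.get("type") == "summary_text"
            )
        elif block.get("type") == "text":
            text_parts.append(block["text"])
    return "\n".join(reasoning_parts), "".join(text_parts)
```

With the OpenAI client, you would call it as `reasoning, answer = split_reasoning_and_text(response.choices[0].message.content)`.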

Managing reasoning across multiple turns

This section applies only to the `databricks-claude-3-7-sonnet` model.

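In multi-turn conversations, the model sees only the reasoning blocks associated with the last assistant turn or tool-use session, and those blocks count as input tokens.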
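The rule above can be approximated with a short helper that finds which reasoning blocks in a conversation history would still be seen. This is a simplified sketch under stated assumptions. `reasoning_blocks_in_context` is hypothetical, messages are assumed to be plain dicts, and tool-use sessions that span several messages are not modeled. Only the most recent assistant turn is considered.

```python
def reasoning_blocks_in_context(messages):
    """Return reasoning blocks from the most recent assistant turn only.

    Simplification: multi-message tool-use sessions are not handled.
    """
    for message in reversed(messages):
        if message.get("role") == "assistant":
            content = message.get("content")
            if isinstance(content, list):
                return [b for b in content if b.get("type") == "reasoning"]
            return []  # text-only assistant turn: no reasoning blocks
    return []  # no assistant turn yet
```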

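If you don't want to pass reasoning tokens back to the model (for example, because it doesn't need to reason over its previous steps), you can omit the reasoning blocks entirely. For example: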

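# Text-only answer from the previous turn (reasoning block omitted)
text_content = response.choices[0].message.content[1]["text"]

response = client.chat.completions.create(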
    model="databricks-claude-3-7-sonnet",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
        {"role": "assistant", "content": text_content},
        {"role": "user", "content": "Can you explain in a way that a 5-year-old child can understand?"}
    ],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

answer = response.choices[0].message.content[1]["text"]
print("Answer:", answer)
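To drop reasoning blocks from an assistant message before adding it back to the history, a helper like the following keeps only the text blocks. It is a minimal sketch. `without_reasoning` is a hypothetical name, and the helper assumes the message is a dict whose content is either a string or a list of blocks shaped like the sample response. The result matches the `{"role": "assistant", "content": text_content}` entry in the example above.

```python
def without_reasoning(message):
    """Return a new assistant message dict containing only the text blocks."""
    content = message["content"]
    if isinstance(content, str):
        return {"role": "assistant", "content": content}
    # Concatenate text blocks; reasoning blocks are left out entirely.
    text = "".join(b["text"] for b in content if b.get("type") == "text")
    return {"role": "assistant", "content": text}
```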

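However, if you do need the model to reason over its earlier reasoning (for example, if you are building an experience that displays its intermediate reasoning), you must include the complete, unmodified assistant message, including the reasoning block from the previous turn. Here is how to continue the conversation thread with the complete assistant message: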

# Keep the complete, unmodified assistant message (reasoning and text blocks)
assistant_message = response.choices[0].message

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
        {"role": "assistant", "content": text_content},
        {"role": "user", "content": "Can you explain in a way that a 5-year-old child can understand?"},
        assistant_message,
        {"role": "user", "content": "Can you simplify the previous answer?"}
    ],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

answer = response.choices[0].message.content[1]["text"]
print("Answer:", answer)
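When you manage the history as plain dicts rather than SDK message objects, one way to keep the previous assistant turn unmodified is to append a deep copy. That way, later edits to your local variables cannot change the blocks sent back to the model. This is a sketch. `continue_thread` is a hypothetical helper, and it assumes the assistant message is a dict shaped like the sample response.

```python
import copy


def continue_thread(history, assistant_message, next_user_text):
    """Return a new message list with the assistant turn preserved unmodified."""
    return [
        *history,
        copy.deepcopy(assistant_message),  # reasoning and text blocks, untouched
        {"role": "user", "content": next_user_text},
    ]
```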

How do reasoning models work?

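In addition to standard input and output tokens, reasoning models introduce special reasoning tokens. These tokens let the model "think" through the prompt, break it down, and consider different ways to respond. After this internal reasoning process, the model generates its final answer as visible output tokens. Some models, such as `databricks-claude-3-7-sonnet`, expose these reasoning tokens to users. Others, such as the OpenAI o-series, discard them and do not expose them in the final output.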
