Query a deployed Mosaic AI agent

Learn how to send requests to agents deployed to a Model Serving endpoint. Databricks provides multiple query methods to fit different use cases and integration needs.

To learn how to deploy agents, see Deploy an agent for generative AI applications.

Select the query approach that best fits your use case:

Method | Key benefits
Databricks OpenAI Client (Recommended) | Native integration, full feature support, streaming capabilities
MLflow deployments client | Existing MLflow patterns, established ML pipelines
REST API | OpenAI-compatible, language-agnostic, works with existing tools

Databricks recommends the Databricks OpenAI Client for new applications. Choose the REST API when integrating with platforms that expect OpenAI-compatible endpoints.

Databricks OpenAI Client

Databricks recommends using the Databricks OpenAI Client to query a deployed agent. Depending on your agent's API, use either the responses client or the chat completions client:

ResponsesAgent endpoints

Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents.

from databricks.sdk import WorkspaceClient

input_msgs = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

## Run for non-streaming responses. Invokes `predict`
response = client.responses.create(model=endpoint, input=input_msgs)
print(response)

## Include stream=True for streaming responses. Invokes `predict_stream`
streaming_response = client.responses.create(model=endpoint, input=input_msgs, stream=True)
for chunk in streaming_response:
  print(chunk)

To pass custom_inputs or databricks_options, add them with the extra_body parameter:

streaming_response = client.responses.create(
    model=endpoint,
    input=input_msgs,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)
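
Each streamed chunk is a Responses API event. The following is a minimal sketch for printing only the generated text, assuming the chunks follow the OpenAI Responses streaming event format (response.output_text.delta events carrying text deltas); adjust the event types to match what your agent emits.

## Minimal sketch: print only text deltas from the stream.
## Assumes OpenAI Responses streaming events; other event types are skipped.
streaming_response = client.responses.create(model=endpoint, input=input_msgs, stream=True)
for chunk in streaming_response:
    if getattr(chunk, "type", None) == "response.output_text.delta":
        print(chunk.delta, end="")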

ChatAgent or ChatModel endpoints

Use the following example for agents created with legacy ChatAgent or ChatModel interfaces, which are still supported but not recommended for new agents.

from databricks.sdk import WorkspaceClient

messages = [{"role": "user", "content": "What does Databricks do?"}]
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

w = WorkspaceClient()
client = w.serving_endpoints.get_open_ai_client()

## Run for non-streaming responses. Invokes `predict`
response = client.chat.completions.create(model=endpoint, messages=messages)
print(response)

## Include stream=True for streaming responses. Invokes `predict_stream`
streaming_response = client.chat.completions.create(model=endpoint, messages=messages, stream=True)
for chunk in streaming_response:
  print(chunk)

To pass custom_inputs or databricks_options, add them with the extra_body parameter:

streaming_response = client.chat.completions.create(
    model=endpoint,
    messages=messages,
    stream=True,
    extra_body={
        "custom_inputs": {"id": 5},
        "databricks_options": {"return_trace": True},
    },
)
for chunk in streaming_response:
    print(chunk)
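
Each streamed chat completion chunk typically carries an incremental message delta. The following is a minimal sketch for printing only the generated text, assuming the standard OpenAI chat completions streaming format (chunk.choices[0].delta.content):

## Minimal sketch: print only message deltas from the stream.
## Assumes the standard OpenAI chat completions streaming format.
streaming_response = client.chat.completions.create(model=endpoint, messages=messages, stream=True)
for chunk in streaming_response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")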

MLflow deployments client

Use the MLflow deployments client when working within existing MLflow workflows and pipelines. This approach integrates naturally with MLflow tracking and experiment management.

The following examples show you how to query an agent using the MLflow deployment client. For new applications, Databricks recommends using the Databricks OpenAI Client for its enhanced features and native integration.

Depending on your deployed agent's API, use either the ResponsesAgent or the ChatAgent request format:

ResponsesAgent endpoints

Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents.

from mlflow.deployments import get_deploy_client

client = get_deploy_client()
input_example = {
    "input": [{"role": "user", "content": "What does Databricks do?"}],
    ## Optional: Include any custom inputs
    ## "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

## Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

## Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)

ChatAgent or ChatModel endpoints

Use this for agents created with legacy ChatAgent or ChatModel interfaces, which are still supported but not recommended for new agents.

from mlflow.deployments import get_deploy_client

client = get_deploy_client()
input_example = {
    "messages": [{"role": "user", "content": "What does Databricks do?"}],
    ## Optional: Include any custom inputs
    ## "custom_inputs": {"id": 5},
    "databricks_options": {"return_trace": True},
}
endpoint = "<agent-endpoint-name>" # TODO: update this with your endpoint name

## Call predict for non-streaming responses
response = client.predict(endpoint=endpoint, inputs=input_example)

## Call predict_stream for streaming responses
streaming_response = client.predict_stream(endpoint=endpoint, inputs=input_example)

client.predict() and client.predict_stream() call the agent functions you defined when authoring the agent. See Streaming responses.
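
predict() returns the complete response, while predict_stream() returns a generator of chunks. The following is a minimal sketch for consuming them, assuming the MLflow deployments client's default behavior of yielding one dictionary per chunk:

## predict returns the complete response.
print(response)

## predict_stream returns a generator; iterate to consume the chunks.
for chunk in streaming_response:
    print(chunk)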

REST API

The Databricks REST API provides OpenAI-compatible endpoints for serving models. This lets you use Databricks agents in applications that require OpenAI interfaces.

This approach is ideal for:

  • Language-agnostic applications that use HTTP requests
  • Integrating with third-party platforms that expect OpenAI-compatible APIs
  • Migrating from OpenAI to Databricks with minimal code changes

Authenticate with the REST API using a Databricks OAuth token or personal access token (PAT). The examples below use a Databricks OAuth token. For more options and information, see the Databricks authentication documentation.
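
Because the endpoints are OpenAI-compatible, you can often point an existing OpenAI SDK client at your workspace by changing only the base URL and token. The following is a minimal sketch, assuming a valid token in a DATABRICKS_TOKEN environment variable and a ResponsesAgent endpoint; <host.databricks.com> and <agent-endpoint-name> are placeholders to replace.

import os
from openai import OpenAI

## Minimal sketch: reuse the standard OpenAI client against a Databricks workspace.
## Assumes DATABRICKS_TOKEN holds a valid OAuth token or PAT.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<host.databricks.com>/serving-endpoints",
)

response = client.responses.create(
    model="<agent-endpoint-name>",
    input=[{"role": "user", "content": "What does Databricks do?"}],
)
print(response)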

ResponsesAgent endpoints

Use the following example for agents created with the ResponsesAgent interface, which is the recommended approach for building agents. A REST API call is equivalent to:

  • Using the Databricks OpenAI Client with responses.create.
  • Sending a POST request to the specific endpoint's URL (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). Find more details on your endpoint's model serving page and in the Model Serving documentation.
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "\<model-name\>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'

To pass custom_inputs or databricks_options, include them in an extra_body object in the request body:

curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/responses \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "\<model-name\>",
    "input": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "extra_body": {
      "custom_inputs": { "id": 5 },
      "databricks_options": { "return_trace": true }
    }
  }'

ChatAgent or ChatModel endpoints

Use this for agents created with legacy ChatAgent or ChatModel interfaces, which are still supported but not recommended for new agents. This is equivalent to:

  • Using the Databricks OpenAI Client with chat.completions.create.
  • Sending a POST request to the specific endpoint's URL (for example, https://<host.databricks.com>/serving-endpoints/<model-name>/invocations). Find more details on your endpoint's model serving page and in the Model Serving documentation.
curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "\<model-name\>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true
  }'

To pass custom_inputs or databricks_options, include them in an extra_body object in the request body:

curl --request POST \
  --url https://<host.databricks.com>/serving-endpoints/chat/completions \
  --header 'Authorization: Bearer <OAuth token>' \
  --header 'content-type: application/json' \
  --data '{
    "model": "\<model-name\>",
    "messages": [{ "role": "user", "content": "hi" }],
    "stream": true,
    "extra_body": {
      "custom_inputs": { "id": 5 },
      "databricks_options": { "return_trace": true }
    }
  }'
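
You can also POST directly to an endpoint's invocations URL from any HTTP client. The following is a minimal sketch using Python's requests library, assuming a valid token in the DATABRICKS_TOKEN environment variable and a ResponsesAgent endpoint; for a ChatAgent or ChatModel endpoint, send messages instead of input.

import os
import requests

## Minimal sketch: POST directly to the endpoint's invocations URL.
## Replace the host and endpoint name; DATABRICKS_TOKEN must hold a valid token.
url = "https://<host.databricks.com>/serving-endpoints/<agent-endpoint-name>/invocations"
headers = {
    "Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}",
    "Content-Type": "application/json",
}
payload = {
    "input": [{"role": "user", "content": "What does Databricks do?"}],  ## Use "messages" for ChatAgent/ChatModel
    "databricks_options": {"return_trace": True},
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())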

Next steps

Production monitoring for GenAI