Note
Databricks recommends migrating to the ResponsesAgent schema to author agents. See Author AI agents in code.
AI agents must adhere to specific input and output schema requirements to be compatible with other features on Databricks. This page explains how to use the legacy agent authoring signatures and interfaces: the ChatAgent interface, the ChatModel interface, the SplitChatMessagesRequest input schema, and the StringResponse output schema.
Author a legacy ChatAgent agent
The MLflow ChatAgent interface is similar to, but not strictly compatible with, the OpenAI ChatCompletion schema.
To learn how to create a ChatAgent, see the examples in the following section and MLflow documentation - What is the ChatAgent interface.
To author and deploy agents using ChatAgent, install the following:
- databricks-agents 0.16.0 or above
- mlflow 2.20.2 or above
- Python 3.10 or above.
  - To meet this requirement, you can use serverless compute or Databricks Runtime 13.3 LTS or above.
%pip install -U -qqqq "databricks-agents>=0.16.0" "mlflow>=2.20.2"
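To confirm that an environment meets these minimums, compare version numbers numerically rather than as strings (as strings, "2.9.0" sorts after "2.20.2"). The following is a minimal sketch using only the standard library. The helper names `version_tuple` and `meets_minimum` are hypothetical, and the parser reads only the leading numeric part of a version string.

```python
import re
import sys


def version_tuple(version: str, width: int = 3) -> tuple:
    """Parse leading numeric components, e.g. '2.20.2' -> (2, 20, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    parts = [int(part) for part in match.group(0).split(".")]
    # Pad with zeros so that '0.16' and '0.16.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is numerically at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


# Minimums taken from the requirements list above.
MINIMUMS = {"databricks-agents": "0.16.0", "mlflow": "2.20.2"}

print("Python 3.10 or above:", sys.version_info >= (3, 10))
print("mlflow 2.21.0 ok:", meets_minimum("2.21.0", MINIMUMS["mlflow"]))
print("mlflow 2.9.0 ok:", meets_minimum("2.9.0", MINIMUMS["mlflow"]))
```

In a real notebook, the installed version strings could come from `importlib.metadata.version("mlflow")` instead of the hard-coded values used here.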
What if I already have an agent?
If you already have an agent built with LangChain, LangGraph, or a similar framework, you don't need to rewrite your agent to use it on Databricks. Instead, just wrap your existing agent with the MLflow ChatAgent interface:
- Write a Python wrapper class that inherits from mlflow.pyfunc.ChatAgent.
  Inside the wrapper class, keep your existing agent as an attribute: self.agent = your_existing_agent.
- The ChatAgent class requires you to implement a predict method to handle non-streaming requests.
  predict must accept:
  - messages: list[ChatAgentMessage], which is a list of ChatAgentMessage objects, each with a role (like "user" or "assistant"), the prompt, and an ID.
  - (Optional) context: Optional[ChatContext] and custom_inputs: Optional[dict] for extra data.
  import uuid
  # input example
  [
    ChatAgentMessage(
      id=str(uuid.uuid4()),  # Generate a unique ID for each message
      role="user",
      content="What's the weather in Paris?"
    )
  ]
  predict must return a ChatAgentResponse.
  import uuid
  # output example
  ChatAgentResponse(
    messages=[
      ChatAgentMessage(
        id=str(uuid.uuid4()),  # Generate a unique ID for each message
        role="assistant",
        content="It's sunny in Paris."
      )
    ]
  )
- Convert between formats.
  In predict, convert the incoming messages from list[ChatAgentMessage] into the input format your agent expects.
  After your agent generates a response, convert its output to one or more ChatAgentMessage objects and wrap them in a ChatAgentResponse.
Tip
Convert LangChain output automatically
If you are wrapping a LangChain agent, you can use mlflow.langchain.output_parsers.ChatAgentOutputParser to automatically convert LangChain outputs into the MLflow ChatAgentMessage and ChatAgentResponse schema.
The following is a simplified template for converting your agent:
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatAgentChunk
import uuid
class MyWrappedAgent(ChatAgent):
  def __init__(self, agent):
    self.agent = agent
  def predict(self, messages, context=None, custom_inputs=None):
    # Convert messages to your agent's format
    agent_input = ... # build from messages
    agent_output = self.agent.invoke(agent_input)
    # Convert output to ChatAgentMessage
    return ChatAgentResponse(
      messages=[ChatAgentMessage(role="assistant", content=agent_output, id=str(uuid.uuid4()))]
    )
  def predict_stream(self, messages, context=None, custom_inputs=None):
    # If your agent supports streaming
    for chunk in self.agent.stream(...):
      yield ChatAgentChunk(delta=ChatAgentMessage(role="assistant", content=chunk, id=str(uuid.uuid4())))
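The conversion step that the template leaves open depends on what your agent expects. As a rough illustration only, the sketch below assumes an agent that takes a list of role/content dictionaries and returns plain text. `Message` is a hypothetical standard-library stand-in for ChatAgentMessage, and `echo_agent` is a hypothetical placeholder for your existing agent; neither is part of MLflow.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for ChatAgentMessage (role, content, id)."""
    role: str
    content: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def to_agent_input(messages: List[Message]) -> List[Dict[str, str]]:
    """Convert typed messages to role/content dicts (assumed agent input)."""
    return [{"role": m.role, "content": m.content} for m in messages]


def to_response_messages(agent_output: str) -> List[Message]:
    """Wrap the agent's plain-text output as one assistant message."""
    return [Message(role="assistant", content=agent_output)]


def echo_agent(chat: List[Dict[str, str]]) -> str:
    """Hypothetical existing agent: replies based on the last message."""
    return f"You asked: {chat[-1]['content']}"


incoming = [Message(role="user", content="What's the weather in Paris?")]
reply = to_response_messages(echo_agent(to_agent_input(incoming)))
print(reply[0].role, "-", reply[0].content)
```

In a real wrapper, the two conversion helpers would be called inside predict, with the MLflow ChatAgentMessage and ChatAgentResponse classes in place of the stand-in.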
For complete examples, see the notebooks in the following section.
ChatAgent examples
The following notebooks show how to author streaming and non-streaming ChatAgents using the popular libraries LangGraph, OpenAI, AutoGen, and DSPy.
LangGraph
LangGraph tool-calling agent
OpenAI
OpenAI tool-calling agent
OpenAI Responses API tool-calling agent
OpenAI chat-only agent
AutoGen
AutoGen tool-calling agent
DSPy
DSPy chat-only agent
To learn how to expand the capabilities of these agents by adding tools, see AI agent tools.
Streaming ChatAgent responses
Streaming agents deliver responses in a continuous stream of smaller, incremental chunks. Streaming reduces perceived latency and improves user experience for conversational agents.
To author a streaming ChatAgent, define a predict_stream method that returns a generator that yields ChatAgentChunk objects. Each ChatAgentChunk contains a portion of the response. Read more about ideal ChatAgent streaming behavior in the MLflow docs.
The following code shows an example predict_stream function. For complete examples of streaming agents, see ChatAgent examples:
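# Imports needed by this method
from typing import Any, Generator, Optional
from mlflow.types.agent import ChatAgentChunk, ChatAgentMessage, ChatContext

# Define this method inside your ChatAgent subclass
def predict_stream(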
  self,
  messages: list[ChatAgentMessage],
  context: Optional[ChatContext] = None,
  custom_inputs: Optional[dict[str, Any]] = None,
) -> Generator[ChatAgentChunk, None, None]:
  # Convert messages to a format suitable for your agent
  request = {"messages": self._convert_messages_to_dict(messages)}
  # Stream the response from your agent
  for event in self.agent.stream(request, stream_mode="updates"):
    for node_data in event.values():
      # Yield each chunk of the response
      yield from (
        ChatAgentChunk(delta=msg) for msg in node_data["messages"]
      )
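On the consuming side, a client typically reassembles the streamed fragments into complete messages. The sketch below shows one way to do that with plain dictionaries. It assumes that fragments of the same message share an `id`, and both `fake_stream` and `collect_messages` are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def fake_stream() -> Iterator[Dict]:
    """Hypothetical chunk stream; each chunk carries a `delta` fragment."""
    yield {"delta": {"id": "msg-1", "role": "assistant", "content": "It's "}}
    yield {"delta": {"id": "msg-1", "role": "assistant", "content": "sunny "}}
    yield {"delta": {"id": "msg-1", "role": "assistant", "content": "in Paris."}}


def collect_messages(chunks: Iterable[Dict]) -> List[Dict]:
    """Merge chunk deltas into full messages, keyed by message ID."""
    merged: Dict[str, Dict] = {}
    for chunk in chunks:
        delta = chunk["delta"]
        if delta["id"] not in merged:
            # First fragment of a message: copy its metadata, start empty.
            merged[delta["id"]] = {**delta, "content": ""}
        merged[delta["id"]]["content"] += delta["content"]
    # Dicts preserve insertion order, so messages keep their arrival order.
    return list(merged.values())


messages = collect_messages(fake_stream())
print(messages[0]["content"])
```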
Author a legacy ChatModel agent
Important
Databricks recommends the ChatAgent interface for creating agents or gen AI apps. To migrate from ChatModel to ChatAgent, see MLflow documentation - Migrate from ChatModel to ChatAgent.
ChatModel is a legacy agent authoring interface in MLflow that extends OpenAI's ChatCompletion schema, allowing you to maintain compatibility with platforms supporting the ChatCompletion standard while adding custom functionality. See MLflow: Getting Started with ChatModel for additional details.
Authoring your agent as a subclass of mlflow.pyfunc.ChatModel provides the following benefits:
- Enables streaming agent output when invoking a served agent (by passing {stream: true} in the request body).
- Automatically enables AI Gateway inference tables when your agent is served, providing access to enhanced request log metadata, such as the requester name.
- Allows you to write agent code compatible with the ChatCompletion schema using typed Python classes.
- MLflow automatically infers a chat completion-compatible signature when logging the agent, even without an input_example. This simplifies the process of registering and deploying the agent. See Infer Model Signature during logging.
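As a rough illustration of the streaming flag mentioned in the first benefit, the sketch below builds a ChatCompletion-style JSON request body. The helper name `build_chat_request` is hypothetical, and the endpoint URL and authentication are deliberately left out because they depend on your workspace.

```python
import json


def build_chat_request(question: str, stream: bool = False) -> str:
    """Build a ChatCompletion-style JSON request body."""
    body = {"messages": [{"role": "user", "content": question}]}
    if stream:
        # Ask the served agent to stream its output.
        body["stream"] = True
    return json.dumps(body)


payload = build_chat_request("What is Databricks?", stream=True)
print(payload)
```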
The following code is best run in a Databricks notebook. Notebooks provide a convenient environment for developing, testing, and iterating on your agent.
The MyAgent class extends mlflow.pyfunc.ChatModel, implementing the required predict method. This ensures compatibility with Mosaic AI Agent Framework.
The class also includes the optional methods _create_chat_completion_chunk and predict_stream to handle streaming outputs.
# Install the latest version of mlflow
%pip install -U mlflow
dbutils.library.restartPython()
import re
from typing import Optional, Dict, List, Generator
from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
  # Non-streaming helper classes
  ChatCompletionRequest,
  ChatCompletionResponse,
  ChatCompletionChunk,
  ChatMessage,
  ChatChoice,
  ChatParams,
  # Helper classes for streaming agent output
  ChatChoiceDelta,
  ChatChunkChoice,
)
class MyAgent(ChatModel):
  """
  Defines a custom agent that processes ChatCompletionRequests
  and returns ChatCompletionResponses.
  """
  def predict(self, context, messages: list[ChatMessage], params: ChatParams) -> ChatCompletionResponse:
    last_user_question_text = messages[-1].content
    response_message = ChatMessage(
      role="assistant",
      content=(
        f"I will always echo back your last question. Your last question was: {last_user_question_text}. "
      )
    )
    return ChatCompletionResponse(
      choices=[ChatChoice(message=response_message)]
    )
  def _create_chat_completion_chunk(self, content) -> ChatCompletionChunk:
    """Helper for constructing a ChatCompletionChunk instance for wrapping streaming agent output"""
    return ChatCompletionChunk(
      choices=[ChatChunkChoice(
        delta=ChatChoiceDelta(
          role="assistant",
          content=content
        )
      )]
    )
  def predict_stream(
    self, context, messages: List[ChatMessage], params: ChatParams
  ) -> Generator[ChatCompletionChunk, None, None]:
    last_user_question_text = messages[-1].content
    yield self._create_chat_completion_chunk("Echoing back your last question, word by word.")
    for word in re.findall(r"\S+\s*", last_user_question_text):
      yield self._create_chat_completion_chunk(word)
agent = MyAgent()
model_input = ChatCompletionRequest(
  messages=[ChatMessage(role="user", content="What is Databricks?")]
)
response = agent.predict(context=None, messages=model_input.messages, params=None)
print(response)
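The streaming method above splits the question with the pattern `\S+\s*`, which keeps each word together with its trailing whitespace. The short sketch below isolates that step so you can see what each streamed chunk contains. The function name `split_into_word_chunks` is hypothetical.

```python
import re


def split_into_word_chunks(text: str) -> list:
    """Split text into words, each keeping its trailing whitespace."""
    return re.findall(r"\S+\s*", text)


question = "What is Databricks?"
chunks = split_into_word_chunks(question)
print(chunks)  # ['What ', 'is ', 'Databricks?']

# Concatenating the chunks restores the text, so a client that joins
# streamed chunks sees the original spacing.
print("".join(chunks) == question)  # True
```

Because the pattern starts at the first non-whitespace character, any leading whitespace in the input is dropped.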
While you define the agent class MyAgent in one notebook, we recommend creating a separate driver notebook. The driver notebook logs the agent to Model Registry and deploys the agent using Model Serving.
This separation follows the workflow recommended by Databricks for logging models using MLflow's Models from Code methodology.
SplitChatMessagesRequest input schema (deprecated)
SplitChatMessagesRequest allows you to pass the current query and history separately as agent input.
question = {
  "query": "What is MLflow?",
  "history": [
    {
      "role": "user",
      "content": "What is Retrieval-augmented Generation?"
    },
    {
      "role": "assistant",
      "content": "RAG is"
    }
  ]
}
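If you are moving away from this schema, one possible approach is to fold the split request into a single list of messages, with the current query appended as the final user turn. The following is a minimal sketch with plain dictionaries. The helper name `split_request_to_messages` is hypothetical and is not an MLflow or Databricks API.

```python
from typing import Dict, List


def split_request_to_messages(request: Dict) -> List[Dict[str, str]]:
    """Append the current query, as a user turn, to a copy of the history."""
    history = [dict(turn) for turn in request.get("history", [])]
    return history + [{"role": "user", "content": request["query"]}]


question = {
    "query": "What is MLflow?",
    "history": [
        {"role": "user", "content": "What is Retrieval-augmented Generation?"},
        {"role": "assistant", "content": "RAG is"},
    ],
}

messages = split_request_to_messages(question)
print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```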
StringResponse output schema (deprecated)
StringResponse allows you to return the agent's response as an object with a single string content field:
{"content": "This is an example string response"}
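To use a response in this shape alongside message-based schemas, you could wrap its content in an assistant message. The sketch below is illustrative only, and the helper name `string_response_to_message` is hypothetical.

```python
def string_response_to_message(response: dict) -> dict:
    """Wrap a StringResponse-style dict as an assistant chat message."""
    content = response["content"]
    if not isinstance(content, str):
        raise TypeError("StringResponse content must be a string")
    return {"role": "assistant", "content": content}


message = string_response_to_message(
    {"content": "This is an example string response"}
)
print(message)
```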