This article demonstrates how to enable Model Serving in your workspace and switch your models to the Mosaic AI Model Serving experience built on serverless compute.
Important
Starting August 22, 2025, customers can no longer create new serving endpoints using the Legacy MLflow Model Serving experience. On September 15, 2025, the legacy experience reaches end of life, and existing endpoints that use this service will no longer be usable.
Requirements
- Registered model in the MLflow Model Registry.
- Permissions on the registered models as described in the access control guide.
- Serverless compute enabled in your workspace.
Significant changes
- In Model Serving, the request and response formats of the endpoint differ slightly from those of Legacy MLflow Model Serving. See Scoring a model endpoint for details on the new format.
- In Model Serving, the endpoint URL includes serving-endpoints instead of model.
- Model Serving includes full support for managing resources with API workflows.
- Model Serving is production-ready and backed by the Azure Databricks SLA.
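The URL change listed above can be illustrated with a small helper. This is a minimal sketch, not an official utility: it assumes legacy URLs have the shape `/model/<model-name>/<version-or-stage>/invocations`, and the function name `to_serving_endpoint_url` and the host in the usage note are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_serving_endpoint_url(legacy_url: str, endpoint_name: str) -> str:
    """Rewrite a legacy scoring URL to the Model Serving form.

    Assumed legacy shape: /model/<model-name>/<version-or-stage>/invocations
    New shape:            /serving-endpoints/<endpoint-name>/invocations
    """
    parts = urlsplit(legacy_url)
    segments = [s for s in parts.path.split("/") if s]
    # Reject anything that does not look like the assumed legacy path.
    if len(segments) < 2 or segments[0] != "model" or segments[-1] != "invocations":
        raise ValueError(f"not a legacy model serving URL: {legacy_url}")
    new_path = f"/serving-endpoints/{endpoint_name}/invocations"
    # Keep scheme and host; drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))
```

For example, calling the helper with `https://example.azuredatabricks.net/model/modelA/Production/invocations` and the endpoint name `modelA-Production` yields a URL ending in `/serving-endpoints/modelA-Production/invocations`. The endpoint name is chosen when you create the endpoint, so it is passed in explicitly instead of being derived from the model name.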
Identify serving endpoints that use Legacy MLflow Model Serving
To identify model serving endpoints that use Legacy MLflow Model Serving:
- Navigate to the Models UI in your workspace.
- Select the Workspace Model Registry filter.
- Select the Legacy serving enabled only filter.
Migrate Legacy MLflow Model Serving served models to Model Serving
You can create a Model Serving endpoint and flexibly transition model serving workflows without disabling Legacy MLflow Model Serving.
The following steps show how to accomplish this with the UI. For each model on which you have Legacy MLflow Model Serving enabled:
- Register your model to Unity Catalog.
- Navigate to Serving endpoints on the sidebar of your machine learning workspace.
- Follow the workflow described in Create custom model serving endpoints to create a serving endpoint with your model.
- Transition your application to use the new URL provided by the serving endpoint to query the model, along with the new scoring format.
- After your models are transitioned, navigate to Models on the sidebar of your machine learning workspace.
- Select the model for which you want to disable Legacy MLflow Model Serving.
- On the Serving tab, select Stop.
- A message appears to confirm. Select Stop Serving.
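The application-side change in the steps above (new URL plus new scoring format) might look like the following sketch. It only builds the request and does not send it. The helper name `build_scoring_request`, the bearer-token header, and the `dataframe_records` input key are assumptions here; confirm the authentication method and the input format for your model against Scoring a model endpoint.

```python
import json
import urllib.request


def build_scoring_request(host: str, endpoint_name: str, token: str,
                          records: list) -> urllib.request.Request:
    """Build (but do not send) a POST to a Model Serving endpoint."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # Assumption: the model accepts row-oriented input under "dataframe_records".
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed token-based auth
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Keeping construction separate from sending makes it easy to inspect the URL and body while you switch callers over from the legacy endpoint.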
Migrate deployed model versions to Model Serving
In previous versions of the Model Serving functionality, the serving endpoint was created based on the stage of the registered model version: Staging or Production. To migrate your served models from that experience, you can replicate that behavior in the new Model Serving experience.
This section demonstrates how to create separate model serving endpoints for Staging model versions and Production model versions. The following steps show how to accomplish this with the serving endpoints API for each of your served models.
In the example, the registered model modelA has version 1 in the Production stage and version 2 in the Staging stage.
- Create two endpoints for your registered model, one for Staging model versions and another for Production model versions.
  - For Staging model versions (version 2 in this example):

        POST /api/2.0/serving-endpoints
        {
          "name": "modelA-Staging",
          "config": {
            "served_entities": [
              {
                "entity_name": "modelA",
                "entity_version": "2",
                "workload_size": "Small",
                "scale_to_zero_enabled": true
              }
            ]
          }
        }

  - For Production model versions (version 1 in this example):

        POST /api/2.0/serving-endpoints
        {
          "name": "modelA-Production",
          "config": {
            "served_entities": [
              {
                "entity_name": "modelA",
                "entity_version": "1",
                "workload_size": "Small",
                "scale_to_zero_enabled": true
              }
            ]
          }
        }
- Verify the status of the endpoints.
  - For the Staging endpoint:

        GET /api/2.0/serving-endpoints/modelA-Staging

  - For the Production endpoint:

        GET /api/2.0/serving-endpoints/modelA-Production
- Once the endpoints are ready, query them using:
  - For the Staging endpoint:

        POST /serving-endpoints/modelA-Staging/invocations

  - For the Production endpoint:

        POST /serving-endpoints/modelA-Production/invocations
- Update the endpoints based on model version transitions.

  Suppose a new model version 3 is created. Model version 2 can transition to Production, model version 3 can transition to Staging, and model version 1 is Archived. These changes can be reflected in the separate model serving endpoints as follows:
  - For the Staging endpoint, update the endpoint to use the new model version in Staging (version 3):

        PUT /api/2.0/serving-endpoints/modelA-Staging/config
        {
          "served_entities": [
            {
              "entity_name": "modelA",
              "entity_version": "3",
              "workload_size": "Small",
              "scale_to_zero_enabled": true
            }
          ]
        }

  - For the Production endpoint, update the endpoint to use the new model version in Production (version 2):

        PUT /api/2.0/serving-endpoints/modelA-Production/config
        {
          "served_entities": [
            {
              "entity_name": "modelA",
              "entity_version": "2",
              "workload_size": "Small",
              "scale_to_zero_enabled": true
            }
          ]
        }
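The create, verify, and update steps above can be scripted. The following is a minimal sketch under stated assumptions, not a reference client: all function names are hypothetical, the `<model>-<stage>` endpoint naming simply mirrors the example, and the readiness check assumes the GET response carries a `state` object whose `ready` field equals `"READY"` once the endpoint is up. Each builder returns the method, path, and JSON body so that you can send them with whatever HTTP client and authentication you already use.

```python
import time
from typing import Callable, Dict, List, Tuple

Request = Tuple[str, str, dict]  # (HTTP method, API path, JSON body)


def served_entity(model_name: str, version: int) -> dict:
    """One served entity, using the sizing values from the example."""
    return {
        "entity_name": model_name,
        "entity_version": str(version),
        "workload_size": "Small",
        "scale_to_zero_enabled": True,
    }


def create_request(model_name: str, stage: str, version: int) -> Request:
    """Request that creates the per-stage endpoint, e.g. modelA-Staging."""
    body = {
        "name": f"{model_name}-{stage}",
        "config": {"served_entities": [served_entity(model_name, version)]},
    }
    return ("POST", "/api/2.0/serving-endpoints", body)


def update_request(model_name: str, stage: str, version: int) -> Request:
    """Request that points an existing per-stage endpoint at a new version."""
    path = f"/api/2.0/serving-endpoints/{model_name}-{stage}/config"
    return ("PUT", path, {"served_entities": [served_entity(model_name, version)]})


def plan_updates(model_name: str, old: Dict[str, int],
                 new: Dict[str, int]) -> List[Request]:
    """Update requests only for stages whose model version changed."""
    return [
        update_request(model_name, stage, version)
        for stage, version in new.items()
        if old.get(stage) != version
    ]


def wait_until_ready(fetch_status: Callable[[], dict],
                     attempts: int = 30, delay_s: float = 10.0) -> bool:
    """Poll a caller-supplied GET wrapper until the endpoint reports ready.

    Assumption: the response looks like {"state": {"ready": "READY"}, ...}.
    """
    for _ in range(attempts):
        if fetch_status().get("state", {}).get("ready") == "READY":
            return True
        time.sleep(delay_s)
    return False
```

With the transitions from the last step, `plan_updates("modelA", {"Production": 1, "Staging": 2}, {"Production": 2, "Staging": 3})` produces one PUT for each endpoint. A stage whose version did not change produces no request, which avoids triggering a needless config update.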