Edit

Share via


Deploy models as serverless API deployments

Important

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

In this article, you learn how to deploy an Azure AI Foundry Model as a serverless API deployment. Certain models in the model catalog can be deployed as a serverless API deployment. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.

Although serverless API deployment is one option for deploying Azure AI Foundry Models, we recommend that you deploy Foundry Models to Azure AI Foundry resources.

Note

We recommend that you deploy Azure AI Foundry Models to Azure AI Foundry resources so that you can consume your deployments in the resource via a single endpoint with the same authentication and schema to generate inference. The endpoint follows the Azure AI Model Inference API which all the Foundry Models support. To learn how to deploy a Foundry Model to the Azure AI Foundry resources, see Add and configure models to Azure AI Foundry Models.

Prerequisites

  • An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.

  • If you don't have one, create a hub-based project.

  • Ensure that the Deploy models to Azure AI Foundry resources (preview) feature is turned off in the Azure AI Foundry portal. When this feature is on, serverless API deployments aren't available from the portal.

    A screenshot of the Azure AI Foundry portal showing where to disable deployment to Azure AI Foundry resources.

  • Foundry Models from Partners and Community require access to Azure Marketplace, while Foundry Models Sold Directly by Azure don't have this requirement. Ensure you have the permissions required to subscribe to model offerings in Azure Marketplace.

  • Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the Azure AI Developer role on the resource group. For more information on permissions, see Role-based access control in Azure AI Foundry portal.

Find your model in the model catalog

  1. Sign in to Azure AI Foundry.
  2. If you’re not already in your project, select it.
  3. Select Model catalog from the left pane.
  1. Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.

  2. Select Use this model to open the Serverless API deployment window where you can view the Pricing and terms tab.

  3. In the deployment wizard, name the deployment. The Content filter (preview) option is enabled by default. Leave the default setting for the service to detect harmful content such as hate, self-harm, sexual, and violent content. For more information about content filtering, see Content filtering in Azure AI Foundry portal.

    Screenshot showing the deployment wizard for a model sold directly by Azure.

Deploy the model to a serverless API

In this section, you create an endpoint for your model.

  1. In the deployment wizard, select Deploy. Wait until the deployment is ready and you're redirected to the Deployments page.

  2. To see the endpoints deployed to your project, in the My assets section of the left pane, select Models + endpoints.

  3. The created endpoint uses key authentication for authorization. To get the keys associated with a given endpoint, follow these steps:

    1. Select the deployment, and note the endpoint's Target URI and Key.

    2. Use these credentials to call the deployment and generate predictions.

  4. If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.

    Tip

    If you're using Prompt flow in the same project or hub where the deployment was deployed, you still need to create the connection.

Use the serverless API deployment

Models deployed in Azure Machine Learning and Azure AI Foundry in serverless API deployments support the Azure AI Model Inference API that exposes a common set of capabilities for foundational models and that can be used by developers to consume predictions from a diverse set of models in a uniform and consistent way.

Read more about the capabilities of this API and how you can use it when building applications.

Delete endpoints and subscriptions

Tip

Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

You can delete model subscriptions and endpoints. Deleting a model subscription makes any associated endpoint become Unhealthy and unusable.

To delete a serverless API deployment:

  1. Go to the Azure AI Foundry.
  2. Go to your project.
  3. In the My assets section, select Models + endpoints.
  4. Open the deployment you want to delete.
  5. Select Delete.

To delete the associated model subscription:

  1. Go to the Azure portal
  2. Navigate to the resource group where the project belongs.
  3. On the Type filter, select SaaS.
  4. Select the subscription you want to delete.
  5. Select Delete.
  • To work with Azure AI Foundry, install the Azure CLI and the ml extension for Azure Machine Learning.

    az extension add -n ml
    

    If you already have the extension installed, ensure the latest version is installed.

    az extension update -n ml
    

    Once the extension is installed, configure it:

    az account set --subscription <subscription>
    az configure --defaults workspace=<project-name> group=<resource-group> location=<location>
    

Find your model in the model catalog

  1. Sign in to Azure AI Foundry.
  2. If you’re not already in your project, select it.
  3. Select Model catalog from the left pane.
  1. Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.

  2. Copy the Model ID without including the model version, since serverless API deployments always deploy the model's latest version available. For example, for the model ID azureml://registries/azureml-deepseek/models/DeepSeek-R1/versions/1, copy azureml://registries/azureml-deepseek/models/DeepSeek-R1.

    A screenshot showing a model's details page for a model sold directly by Azure.

The steps in this section of the article use the DeepSeek-R1 model for illustration. The steps are the same, whether you're using Foundry Models sold directly by Azure or Foundry Models from partners and community. For example, if you choose to deploy the Cohere-command-r-08-2024 model instead, you can replace the model credentials in the code snippets with the credentials for Cohere.

Deploy the model to a serverless API

In this section, you create an endpoint for your model. Name the endpoint DeepSeek-R1-qwerty.

  1. Create the serverless endpoint.

    endpoint.yml

    name: DeepSeek-R1-qwerty
    model_id: azureml://registries/azureml-deepseek/models/DeepSeek-R1
    

    Use the endpoint.yml file to create the endpoint:

    az ml serverless-endpoint create -f endpoint.yml
    
  2. At any point, you can see the endpoints deployed to your project:

    az ml serverless-endpoint list
    
  3. The created endpoint uses key authentication for authorization. Use the following steps to get the keys associated with a given endpoint.

    az ml serverless-endpoint get-credentials -n DeepSeek-R1-qwerty
    
  4. If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.

    Tip

    If you're using Prompt flow in the same project or hub where the deployment was deployed, you still need to create the connection.

Use the serverless API deployment

Models deployed in Azure Machine Learning and Azure AI Foundry in serverless API deployments support the Azure AI Model Inference API that exposes a common set of capabilities for foundational models and that can be used by developers to consume predictions from a diverse set of models in a uniform and consistent way.

Read more about the capabilities of this API and how you can use it when building applications.

Delete endpoints and subscriptions

You can delete model subscriptions and endpoints. Deleting a model subscription makes any associated endpoint become Unhealthy and unusable.

To delete a serverless API deployment:

az ml serverless-endpoint delete \
    --name "DeepSeek-R1-qwerty"

To delete the associated model subscription:

az ml marketplace-subscription delete \
    --name "DeepSeek-R1"
  • To work with Azure AI Foundry, install the Azure Machine Learning SDK for Python.

    pip install -U azure-ai-ml
    

    Once installed, import necessary namespaces and create a client connected to your project:

    from azure.ai.ml import MLClient
    from azure.identity import InteractiveBrowserCredential
    from azure.ai.ml.entities import MarketplaceSubscription, ServerlessEndpoint
    
    client = MLClient(
        credential=InteractiveBrowserCredential(tenant_id="<tenant-id>"),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<project-name>",
    )
    

Find your model in the model catalog

  1. Sign in to Azure AI Foundry.
  2. If you’re not already in your project, select it.
  3. Select Model catalog from the left pane.
  1. Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.

  2. Copy the Model ID without including the model version, since serverless API deployments always deploy the model's latest version available. For example, for the model ID azureml://registries/azureml-deepseek/models/DeepSeek-R1/versions/1, copy azureml://registries/azureml-deepseek/models/DeepSeek-R1.

    A screenshot showing a model's details page for a model sold directly by Azure.

The steps in this section of the article use the DeepSeek-R1 model for illustration. The steps are the same, whether you're using Foundry Models sold directly by Azure or Foundry Models from partners and community. For example, if you choose to deploy the Cohere-command-r-08-2024 model instead, you can replace the model credentials in the code snippets with the credentials for Cohere.

Deploy the model to a serverless API

In this section, you create an endpoint for your model. Name the endpoint DeepSeek-R1-qwerty.

  1. Create the serverless endpoint.

    endpoint_name="DeepSeek-R1-qwerty"
    
    serverless_endpoint = ServerlessEndpoint(
        name=endpoint_name,
        model_id=model_id
    )
    
    created_endpoint = client.serverless_endpoints.begin_create_or_update(
        serverless_endpoint
    ).result()
    
  2. At any point, you can see the endpoints deployed to your project:

    endpoint_name="DeepSeek-R1-qwerty"
    
    serverless_endpoint = ServerlessEndpoint(
        name=endpoint_name,
        model_id=model_id
    )
    
    created_endpoint = client.serverless_endpoints.begin_create_or_update(
        serverless_endpoint
    ).result()
    
  3. The created endpoint uses key authentication for authorization. Use the following steps to get the keys associated with a given endpoint.

    endpoint_keys = client.serverless_endpoints.get_keys(endpoint_name)
    print(endpoint_keys.primary_key)
    print(endpoint_keys.secondary_key)
    
  4. If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.

    Tip

    If you're using Prompt flow in the same project or hub where the deployment was deployed, you still need to create the connection.

Use the serverless API deployment

Models deployed in Azure Machine Learning and Azure AI Foundry in serverless API deployments support the Azure AI Model Inference API that exposes a common set of capabilities for foundational models and that can be used by developers to consume predictions from a diverse set of models in a uniform and consistent way.

Read more about the capabilities of this API and how you can use it when building applications.

Delete endpoints and subscriptions

You can delete model subscriptions and endpoints. Deleting a model subscription makes any associated endpoint become Unhealthy and unusable.

client.serverless_endpoints.begin_delete(endpoint_name).wait()

To delete the associated model subscription:

client.marketplace_subscriptions.begin_delete(subscription_name).wait()
  • To work with Azure AI Foundry, install the Azure CLI as described at Azure CLI.

    Configure the following environment variables according to your settings:

    RESOURCE_GROUP="serverless-models-dev"
    LOCATION="eastus2" 
    

Find your model in the model catalog

  1. Sign in to Azure AI Foundry.
  2. If you’re not already in your project, select it.
  3. Select Model catalog from the left pane.
  1. Select the model card of the model you want to deploy. In this article, you select a DeepSeek-R1 model.

  2. Copy the Model ID without including the model version, since serverless API deployments always deploy the model's latest version available. For example, for the model ID azureml://registries/azureml-deepseek/models/DeepSeek-R1/versions/1, copy azureml://registries/azureml-deepseek/models/DeepSeek-R1.

    A screenshot showing a model's details page for a model sold directly by Azure.

The steps in this section of the article use the DeepSeek-R1 model for illustration. The steps are the same, whether you're using Foundry Models sold directly by Azure or Foundry Models from partners and community. For example, if you choose to deploy the Cohere-command-r-08-2024 model instead, you can replace the model credentials in the code snippets with the credentials for Cohere.

Deploy the model to a serverless API

In this section, you create an endpoint for your model. Name the endpoint myserverless-text-1234ss.

  1. Create the serverless endpoint. Use the following template to create an endpoint:

    serverless-endpoint.bicep

    param projectName string = 'my-project'
    param endpointName string = 'myserverless-text-1234ss'
    param location string = resourceGroup().location
    param modelId string = 'azureml://registries/azureml-deepseek/models/DeepSeek-R1'
    
    var modelName = substring(modelId, (lastIndexOf(modelId, '/') + 1))
    // Replace period character which is used in some model names (and is not valid in the subscription name)
    var sanitizedModelName = replace(modelName, '.', '')
    var subscriptionName = '${sanitizedModelName}-subscription'
    
    resource projectName_endpoint 'Microsoft.MachineLearningServices/workspaces/serverlessEndpoints@2024-04-01-preview' = {
      name: '${projectName}/${endpointName}'
      location: location
      sku: {
        name: 'Consumption'
      }
      properties: {
        modelSettings: {
          modelId: modelId
        }
      }
      dependsOn: [
        projectName_subscription
      ]
    }
    
    output endpointUri string = projectName_endpoint.properties.inferenceEndpoint.uri
    

    Create the deployment as follows:

    az deployment group create --resource-group $RESOURCE_GROUP --template-file model-subscription.bicep
    
  2. At any point, you can see the endpoints deployed to your project:

    You can use the resource management tools to query the resources. The following code uses Azure CLI:

    az resource list \
        --query "[?type=='Microsoft.MachineLearningServices/workspaces/serverlessEndpoints']"
    
  3. The created endpoint uses key authentication for authorization. Get the keys associated with the given endpoint by using REST APIs to query this information.

  4. If you need to consume this deployment from a different project or hub, or you plan to use Prompt flow to build intelligent applications, you need to create a connection to the serverless API deployment. To learn how to configure an existing serverless API deployment on a new project or hub, see Consume deployed serverless API deployment from a different project or from Prompt flow.

    Tip

    If you're using Prompt flow in the same project or hub where the deployment was deployed, you still need to create the connection.

Use the serverless API deployment

Models deployed in Azure Machine Learning and Azure AI Foundry in serverless API deployments support the Azure AI Model Inference API that exposes a common set of capabilities for foundational models and that can be used by developers to consume predictions from a diverse set of models in a uniform and consistent way.

Read more about the capabilities of this API and how you can use it when building applications.

Delete endpoints and subscriptions

You can delete model subscriptions and endpoints. Deleting a model subscription makes any associated endpoint become Unhealthy and unusable.

You can use the resource management tools to manage the resources. The following code uses Azure CLI:

az resource delete --name <resource-name>

Cost and quota considerations for Foundry Models deployed as a serverless API deployment

Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. Additionally, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.

  • You can find pricing information for Models Sold Directly by Azure, on the Pricing and terms tab of the Serverless API deployment window.

  • Models from Partners and Community are offered through Azure Marketplace and integrated with Azure AI Foundry for use. You can find Azure Marketplace pricing when deploying or fine-tuning these models. Each time a project subscribes to a given offer from Azure Marketplace, a new resource is created to track the costs associated with its consumption. The same resource is used to track costs associated with inference and fine-tuning; however, multiple meters are available to track each scenario independently. For more information on how to track costs, see Monitor costs for models offered through Azure Marketplace.

Permissions required to subscribe to model offerings

Azure role-based access controls (Azure RBAC) are used to grant access to operations in Azure AI Foundry portal. To perform the steps in this article, your user account must be assigned the Owner, Contributor, or Azure AI Developer role for the Azure subscription. Alternatively, your account can be assigned a custom role that has the following permissions:

  • On the Azure subscription—to subscribe the workspace to Azure Marketplace offering, once for each workspace, per offering:

    • Microsoft.MarketplaceOrdering/agreements/offers/plans/read
    • Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
    • Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
    • Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
    • Microsoft.SaaS/register/action
  • On the resource group—to create and use the SaaS resource:

    • Microsoft.SaaS/resources/read
    • Microsoft.SaaS/resources/write
  • On the workspace—to deploy endpoints (the Azure Machine Learning data scientist role contains these permissions already):

    • Microsoft.MachineLearningServices/workspaces/marketplaceModelSubscriptions/*
    • Microsoft.MachineLearningServices/workspaces/serverlessEndpoints/*

For more information on permissions, see Role-based access control in Azure AI Foundry portal.