APPLIES TO:  Azure CLI ml extension v2 (current)
The source JSON schema can be found at https://azuremlschemas.azureedge.net/latest/kubernetesOnlineDeployment.schema.json.
Note
The YAML syntax detailed in this document is based on the JSON schema for the latest version of the ML CLI v2 extension. This syntax is guaranteed only to work with the latest version of the ML CLI v2 extension. You can find the schemas for older extension versions at https://azuremlschemasprod.azureedge.net/.
YAML syntax
| Key | Type | Description | Allowed values | Default value | 
|---|---|---|---|---|
| $schema | string | The YAML schema. If you use the Azure Machine Learning VS Code extension to author the YAML file, including $schema at the top of your file enables you to invoke schema and resource completions. | ||
| name | string | Required. Name of the deployment. Naming rules are defined here. | ||
| description | string | Description of the deployment. | ||
| tags | object | Dictionary of tags for the deployment. | ||
| endpoint_name | string | Required. Name of the endpoint to create the deployment under. | ||
| model | string or object | The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification. To reference an existing model, use the azureml:<model-name>:<model-version> syntax. To define a model inline, follow the Model schema. As a best practice for production scenarios, you should create the model separately and reference it here. This field is optional for custom container deployment scenarios. | ||
| model_mount_path | string | The path to mount the model in a custom container. Applicable only for custom container deployment scenarios. If the model field is specified, it's mounted on this path in the container. | ||
| code_configuration | object | Configuration for the scoring code logic. This field is optional for custom container deployment scenarios. | ||
| code_configuration.code | string | Local path to the source code directory for scoring the model. | ||
| code_configuration.scoring_script | string | Relative path to the scoring file in the source code directory. | ||
| environment_variables | object | Dictionary of environment variable key-value pairs to set in the deployment container. You can access these environment variables from your scoring scripts. | ||
| environment | string or object | Required. The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. To reference an existing environment, use the azureml:<environment-name>:<environment-version> syntax. To define an environment inline, follow the Environment schema. As a best practice for production scenarios, you should create the environment separately and reference it here. | ||
| instance_type | string | The instance type used to place the inference workload. If omitted, the inference workload will be placed on the default instance type of the Kubernetes cluster specified in the endpoint's compute field. If specified, the inference workload will be placed on that selected instance type. The set of instance types for a Kubernetes cluster is configured via the Kubernetes cluster custom resource definition (CRD), hence they aren't part of the Azure Machine Learning YAML schema for attaching Kubernetes compute. For more information, see Create and select Kubernetes instance types. | ||
| instance_count | integer | The number of instances to use for the deployment. Specify the value based on the workload you expect. This field is only required if you're using the default scale type (scale_settings.type: default). instance_count can be updated after deployment creation using the az ml online-deployment update command. | ||
| app_insights_enabled | boolean | Whether to enable integration with the Azure Application Insights instance associated with your workspace. | | false |
| scale_settings | object | The scale settings for the deployment. The two types of scale settings supported are the default scale type and the target_utilization scale type. With the default scale type (scale_settings.type: default), you can manually scale the instance count up and down after deployment creation by updating the instance_count property. To configure the target_utilization scale type (scale_settings.type: target_utilization), see TargetUtilizationScaleSettings for the set of configurable properties. | ||
| scale_settings.type | string | The scale type. | default, target_utilization | target_utilization |
| data_collector | object | Data collection settings for the deployment. See DataCollector for the set of configurable properties. | ||
| request_settings | object | Scoring request settings for the deployment. See RequestSettings for the set of configurable properties. | ||
| liveness_probe | object | Liveness probe settings for monitoring the health of the container regularly. See ProbeSettings for the set of configurable properties. | ||
| readiness_probe | object | Readiness probe settings for validating if the container is ready to serve traffic. See ProbeSettings for the set of configurable properties. | ||
| resources | object | Container resource requirements. | ||
| resources.requests | object | Resource requests for the container. See ContainerResourceRequests for the set of configurable properties. | ||
| resources.limits | object | Resource limits for the container. See ContainerResourceLimits for the set of configurable properties. | ||
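
As a rough illustration of how the top-level keys in this table fit together, the following is a minimal sketch of a deployment file. The deployment name, endpoint name, model and environment references, and the `./src` and `score.py` paths are hypothetical placeholders, not values taken from this reference. The sketch assumes the model and environment are already registered in the workspace.

```yaml
# Minimal sketch only; all names and versions below are hypothetical placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/kubernetesOnlineDeployment.schema.json
name: blue                          # required; hypothetical deployment name
endpoint_name: my-endpoint          # required; hypothetical existing endpoint
description: Example Kubernetes online deployment
model: azureml:my-model:1           # reference to an existing versioned model
code_configuration:
  code: ./src                       # local directory containing the scoring code
  scoring_script: score.py          # path relative to the code directory
environment: azureml:my-env:1       # required; reference to an existing versioned environment
environment_variables:
  LOG_LEVEL: info                   # hypothetical variable read by the scoring script
instance_count: 1                   # needed because the default scale type is used below
scale_settings:
  type: default                     # manual scaling through instance_count
```

Because `instance_type` is omitted here, the workload would be placed on the cluster's default instance type, as described in the table.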
RequestSettings
| Key | Type | Description | Default value | 
|---|---|---|---|
| request_timeout_ms | integer | The scoring timeout in milliseconds. | 5000 | 
| max_concurrent_requests_per_instance | integer | The maximum number of concurrent requests per instance allowed for the deployment. Do not change this setting from the default value unless instructed by Microsoft Technical Support or a member of the Azure Machine Learning team. | 1 | 
| max_queue_wait_ms | integer | The maximum amount of time in milliseconds a request will stay in the queue. | 500 | 
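
A `request_settings` block might look like the following sketch. The values are illustrative assumptions for a model that needs more than the default 5 seconds to score, not recommendations. `max_concurrent_requests_per_instance` is deliberately left out so that it stays at its default, in line with the guidance in the table.

```yaml
# Illustrative values only; tune them to the observed latency of your model.
request_settings:
  request_timeout_ms: 10000   # allow up to 10 seconds per scoring request
  max_queue_wait_ms: 1000     # a request may wait in the queue for up to 1 second
```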
ProbeSettings
| Key | Type | Description | Default value | 
|---|---|---|---|
| period | integer | How often (in seconds) to perform the probe. | 10 | 
| initial_delay | integer | The number of seconds after the container has started before the probe is initiated. Minimum value is 1. | 10 | 
| timeout | integer | The number of seconds after which the probe times out. Minimum value is 1. | 2 | 
| success_threshold | integer | The minimum consecutive successes for the probe to be considered successful after having failed. Minimum value is 1. | 1 | 
| failure_threshold | integer | When a probe fails, the system will try failure_threshold times before giving up. Giving up in the case of a liveness probe means the container will be restarted. In the case of a readiness probe the container will be marked Unready. Minimum value is 1. | 30 | 
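
The same ProbeSettings keys apply to both `liveness_probe` and `readiness_probe`. The sketch below uses illustrative values, chosen on the assumption that the container takes a while to load its model. Assuming the usual Kubernetes probe behavior, the time before the system gives up is roughly `period` multiplied by `failure_threshold`, so the liveness settings here would tolerate about 30 seconds of consecutive failures before a restart.

```yaml
# Illustrative values only; all times are in seconds.
liveness_probe:
  initial_delay: 30       # wait 30 seconds after container start before probing
  period: 10              # probe every 10 seconds
  timeout: 2              # each probe times out after 2 seconds
  success_threshold: 1
  failure_threshold: 3    # about 3 x 10 = 30 seconds of failures before restart
readiness_probe:
  initial_delay: 10
  period: 10
  timeout: 2
  success_threshold: 1
  failure_threshold: 30   # container is marked Unready after 30 failed probes
```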
TargetUtilizationScaleSettings
| Key | Type | Description | Default value | 
|---|---|---|---|
| type | const | The scale type. | target_utilization | 
| min_instances | integer | The minimum number of instances to use. | 1 | 
| max_instances | integer | The maximum number of instances to scale to. | 1 | 
| target_utilization_percentage | integer | The target CPU usage for the autoscaler. | 70 | 
| polling_interval | integer | How often the autoscaler should attempt to scale the deployment, in seconds. | 1 | 
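
For autoscaling, a `scale_settings` block of the `target_utilization` type could be sketched as follows. The instance bounds and polling interval are illustrative assumptions. With this scale type, `instance_count` isn't required, because the autoscaler chooses the instance count between the two bounds.

```yaml
# Illustrative values only.
scale_settings:
  type: target_utilization
  min_instances: 1                    # never scale below one instance
  max_instances: 3                    # never scale above three instances
  target_utilization_percentage: 70   # target CPU usage for the autoscaler
  polling_interval: 10                # attempt a scaling decision every 10 seconds
```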
ContainerResourceRequests
| Key | Type | Description | 
|---|---|---|
| cpu | string | The number of CPU cores requested for the container. | 
| memory | string | The memory size requested for the container. | 
| nvidia.com/gpu | string | The number of NVIDIA GPU cards requested for the container. | 
ContainerResourceLimits
| Key | Type | Description | 
|---|---|---|
| cpu | string | The limit for the number of CPU cores for the container. | 
| memory | string | The limit for the memory size for the container. | 
| nvidia.com/gpu | string | The limit for the number of NVIDIA GPU cards for the container. | 
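
Requests and limits are both set under the `resources` key. The following sketch assumes Kubernetes-style quantity strings (such as `0.5` CPU cores or `1Gi` of memory) and uses illustrative sizes. The GPU entries are commented out and only apply if the cluster has NVIDIA GPU nodes.

```yaml
# Illustrative sizes only; values are strings, so they are quoted.
resources:
  requests:
    cpu: "0.5"
    memory: "1Gi"
    # nvidia.com/gpu: "1"   # uncomment only for GPU workloads
  limits:
    cpu: "1"
    memory: "2Gi"
    # nvidia.com/gpu: "1"   # uncomment only for GPU workloads
```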
DataCollector
| Key | Type | Description | Default value | 
|---|---|---|---|
| sampling_rate | float | The percentage, represented as a decimal rate, of data to collect. For instance, a value of 1.0 represents collecting 100% of data. | 1.0 | 
| rolling_rate | string | The rate to partition the data in storage. Value can be: Minute, Hour, Day, Month, Year. | Hour | 
| collections | object | Set of individual collection_names and their respective settings for this deployment. | |
| collections.<collection_name> | object | Logical grouping of production inference data to collect (example: model_inputs). There are two reserved names: request and response, which respectively correspond to HTTP request and response payload data collection. All other names are arbitrary and definable by the user. Note: Each collection_name should correspond to the name of the Collector object used in the deployment score.py to collect the production inference data. For more information on payload data collection and data collection with the provided Python SDK, see Collect data from models in production. | |
| collections.<collection_name>.enabled | boolean | Whether to enable data collection for the specified collection_name. | 'False' | 
| collections.<collection_name>.data.name | string | The name of the data asset to register with the collected data. | <endpoint>-<deployment>-<collection_name> | 
| collections.<collection_name>.data.path | string | The full Azure Machine Learning datastore path where the collected data should be registered as a data asset. | azureml://datastores/workspaceblobstore/paths/modelDataCollector/<endpoint_name>/<deployment_name>/<collection_name> | 
| collections.<collection_name>.data.version | integer | The version of the data asset to be registered with the collected data in Blob storage. | 1 | 
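
A `data_collector` block that combines these keys might look like the sketch below. The collection names `model_inputs` and `model_outputs` are user-defined and assumed to match `Collector` objects in the scoring script. The data asset name and datastore path are hypothetical. Writing `enabled` as a quoted string mirrors the quoted default shown in the table and is an assumption about the accepted form.

```yaml
# Sketch only; collection names, asset name, and path are hypothetical.
data_collector:
  sampling_rate: 0.5            # collect 50% of the data
  rolling_rate: Hour            # partition the collected data by hour
  collections:
    model_inputs:
      enabled: 'True'
      data:
        name: my_model_inputs_data_asset
        path: azureml://datastores/workspaceblobstore/paths/modelDataCollector/my_endpoint/blue/model_inputs
        version: 1
    model_outputs:
      enabled: 'True'           # data name, path, and version fall back to the defaults
    request:
      enabled: 'True'           # reserved name: HTTP request payload collection
```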
Remarks
The az ml online-deployment commands can be used for managing Azure Machine Learning Kubernetes online deployments.
Examples
Examples are available in the examples GitHub repository.