Azure Machine Learning Studio compute stuck on 'Starting up' and costing us money

Question

Angus McKay 20

Twice now we've ended up with a compute instance in Azure Machine Learning Studio which is stuck on 'Setting up'. In this state it is not possible to stop the compute and the idle shutdown does not function, yet the resource still costs us money.

Last time we had to delete the resource to stop it costing us. This time I am more reluctant to do that as it is a compute which is being used in a pipeline.

Is there a way I can resolve this without deleting the compute?

Answer accepted by question author

Answer 1

Hello Angus McKay,

I understand how frustrating it can be when an Azure Machine Learning (AML) compute instance gets stuck on ‘Starting up’ (or ‘Setting up’) and continues to accrue costs, especially when it is part of a pipeline. Let’s go through some ways to resolve this without deleting the compute.

Check the Compute Status via Azure CLI

Sometimes the portal may show the instance as stuck, but you can inspect and manage it using the CLI:

# List compute instances in your workspace
az ml compute list --workspace-name <workspace-name> --resource-group <resource-group>

# Check the state of the specific compute instance
az ml compute show --name <compute-name> --workspace-name <workspace-name> --resource-group <resource-group>

If the instance is in a ‘Creating’ or ‘Starting’ state for an unusually long time, you can attempt to restart it:

az ml compute restart --name <compute-name> --workspace-name <workspace-name> --resource-group <resource-group>

Even if the portal’s stop button isn’t working, the CLI may succeed:

az ml compute stop --name <compute-name> --workspace-name <workspace-name> --resource-group <resource-group>

This can sometimes push the compute into a recoverable state without deleting it.
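To confirm that a stop or restart actually took effect, you can poll the instance state instead of refreshing the portal. The following is a minimal sketch, not an official procedure: the function name `wait_for_compute_state` is hypothetical, and it assumes that `az ml compute show` reports the instance state under the `state` key and that the target state is spelled `Stopped`.

```shell
# Sketch: poll a compute instance until it reaches a target state.
# Assumption: `az ml compute show` exposes the state under the `state` key.
# Arguments: name, workspace, resource group, target state,
#            max attempts (default 20), delay in seconds (default 30).
wait_for_compute_state() {
  local name="$1" workspace="$2" group="$3" target="$4"
  local attempts="${5:-20}" delay="${6:-30}"
  local state i=1
  while [ "$i" -le "$attempts" ]; do
    state="$(az ml compute show --name "$name" \
      --workspace-name "$workspace" --resource-group "$group" \
      --query state --output tsv)"
    echo "attempt $i: state=$state"
    if [ "$state" = "$target" ]; then
      return 0
    fi
    # Wait before the next check, but not after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, after issuing the stop command you could run `wait_for_compute_state "$COMPUTE" "$WORKSPACE" "$GROUP" Stopped`, where the three variables hold your own resource names. A non-zero exit code means the instance never reached the target state within the allotted attempts.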

Check Activity Logs and Metrics in the Azure Portal for the compute resource.

Look for failed provisioning events or quota issues that might prevent the VM from starting.

Ensure your workspace has sufficient vCPU and GPU quotas for the compute SKU you’re using.
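The same checks can be run from the CLI. This is a rough sketch under a few assumptions: the function names are hypothetical, the 7-day look-back window is arbitrary, and the `--query` projection simply picks fields that activity log entries normally carry. Adjust the region and resource group to your own.

```shell
# Sketch: list failed operations recorded in the activity log for a resource group.
# Arguments: resource group, look-back window (default 7d, an arbitrary choice).
list_failed_operations() {
  az monitor activity-log list \
    --resource-group "$1" \
    --offset "${2:-7d}" \
    --status Failed \
    --query "[].{time:eventTimestamp, operation:operationName.localizedValue, status:status.value}" \
    --output table
}

# Sketch: show Azure ML VM usage against quota limits in a region.
# Argument: Azure region name, for example the region of your workspace.
list_ml_quota_usage() {
  az ml compute list-usage --location "$1" --output table
}
```

Calling `list_failed_operations "$GROUP"` followed by `list_ml_quota_usage "$REGION"` gives a quick view of whether a provisioning failure or an exhausted VM family quota lines up with the time the instance got stuck.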

If the above steps fail:

You can clone the pipeline or notebook jobs using a new compute instance of the same SKU.

This avoids deleting your old compute but lets your work continue while support investigates the stuck instance.
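A replacement instance with a matching SKU can also be created from the CLI. The sketch below is illustrative only: `create_replacement_compute` is a hypothetical helper, and it assumes that `az ml compute show` reports the VM SKU under the `size` key. It copies only the SKU, so any other settings on the old instance (idle shutdown, schedules, assigned user) would need to be set again.

```shell
# Sketch: create a new compute instance with the same VM size as an existing one.
# Assumption: `az ml compute show` exposes the SKU under the `size` key.
# Arguments: old compute name, new compute name, workspace, resource group.
create_replacement_compute() {
  local old="$1" new="$2" workspace="$3" group="$4"
  local size
  size="$(az ml compute show --name "$old" \
    --workspace-name "$workspace" --resource-group "$group" \
    --query size --output tsv)"
  if [ -z "$size" ]; then
    echo "could not read the VM size of $old" >&2
    return 1
  fi
  az ml compute create --name "$new" --type ComputeInstance --size "$size" \
    --workspace-name "$workspace" --resource-group "$group"
}
```

Once the new instance is running, point the pipeline or notebook at the new compute name and leave the stuck one in place for investigation.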

I hope this is helpful to you. Let me know if you have any other questions.

Thank you!

Answer 2

Angus McKay 20

Thanks, that's resolved it.

SRILAKSHMI C 8,295 Reputation points Microsoft External Staff Moderator

2025-10-22T03:55:27.9466667+00:00

Hi Angus McKay,

Glad to hear the issue is resolved! Thanks for confirming the solution. Since I’ve converted my earlier comment into an answer, could you please take a moment to mark it as Accepted? This helps others in the community with the same question find the solution more easily.

Thank you!