Azure Machine Learning Studio compute stuck on 'Starting up' and costing us money

Angus McKay 20 Reputation points
2025-10-20T08:14:30.3333333+00:00

Twice now we've ended up with a compute instance in Azure Machine Learning Studio which is stuck on 'Setting up'. In this state it is not possible to stop the compute and the idle shutdown does not function, yet the resource still costs us money.

Last time we had to delete the resource to stop it costing us. This time I am more reluctant to do that as it is a compute which is being used in a pipeline.

Is there a way I can resolve this without deleting the compute?

Azure Machine Learning

Answer accepted by question author
  1. SRILAKSHMI C 8,295 Reputation points Microsoft External Staff Moderator
    2025-10-21T13:19:28.83+00:00

    Hello Angus McKay,

    I understand how frustrating it can be when an Azure Machine Learning (AML) compute instance gets stuck on ‘Starting up’ (or ‘Setting up’) and continues to accrue costs, especially when it is part of a pipeline. Let’s go through some ways to resolve this without deleting the compute.

    Check the Compute Status via Azure CLI

    Sometimes the portal may show the instance as stuck, but you can inspect and manage it using the CLI:

    # List compute instances in your workspace
    az ml compute list --workspace-name <workspace-name> --resource-group <resource-group>
    
    # Check the state of the specific compute instance
    az ml compute show --name <compute-name> --workspace-name <workspace-name> --resource-group <resource-group>
    

    If the instance has been in a ‘Creating’ or ‘Starting’ state for an unusually long time, you can attempt to restart it:

    az ml compute restart --name <compute-name> --workspace-name <workspace-name> --resource-group <resource-group>
    

    Even if the portal’s stop button isn’t working, the CLI may succeed:

    az ml compute stop --name <compute-name> --workspace-name <workspace-name> --resource-group <resource-group>
    

    This can sometimes push the compute into a recoverable state without deleting it.
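The check-then-stop sequence above can be wrapped in a small script. This is only a sketch under a few assumptions: the function names are made up for illustration, it assumes `az ml compute show` returns the status in a top-level `state` field, and the list of "stuck" state names is a guess that you should adjust to whatever `show` reports for your instance.

```shell
# Read the current state of a compute instance.
# Assumption: the status is exposed as a top-level `state` field.
compute_state() {
  # $1 = compute name, $2 = workspace name, $3 = resource group
  az ml compute show --name "$1" --workspace-name "$2" --resource-group "$3" \
    --query state --output tsv
}

# Attempt a stop only when the instance looks stuck in a transitional state.
# The state names below are assumptions; edit them to match your output.
stop_if_stuck() {
  local state
  state="$(compute_state "$1" "$2" "$3")" || return 1
  case "$state" in
    Creating|Starting|SettingUp)
      echo "State is '$state'; attempting stop."
      az ml compute stop --name "$1" --workspace-name "$2" --resource-group "$3"
      ;;
    *)
      echo "State is '$state'; no action taken."
      ;;
  esac
}
```

Call it as `stop_if_stuck <compute-name> <workspace-name> <resource-group>`. Checking the state first avoids sending a stop request to an instance that has recovered on its own in the meantime.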

    Check Activity Logs and Metrics in the Azure Portal for the compute resource.

    Look for failed provisioning events or quota issues that might prevent the VM from starting.

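    Ensure there is sufficient vCPU or GPU quota for the compute SKU you’re using. Quota is tracked per subscription and region for each VM family, and some of it may also be allocated at workspace level.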
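The same log and quota checks can be run from the CLI if that is easier than the portal. This is a rough sketch: the function names are hypothetical, the one-day look-back is an arbitrary default, and the `--query` projection assumes the usual activity-log field names (`eventTimestamp`, `operationName`, `status`).

```shell
# List failed operations recorded in the activity log for a resource group.
failed_events() {
  # $1 = resource group, $2 = look-back window such as 6h or 1d (default 1d)
  az monitor activity-log list --resource-group "$1" --status Failed \
    --offset "${2:-1d}" \
    --query "[].{time:eventTimestamp, operation:operationName.localizedValue, status:status.value}" \
    --output table
}

# Show Azure Machine Learning quota usage for a region.
quota_usage() {
  # $1 = Azure region of the workspace, for example westeurope
  az ml compute list-usage --location "$1" --output table
}
```

For example, `failed_events <resource-group> 6h` narrows the search to the last six hours, and `quota_usage <region>` shows whether the VM family used by the compute is at its limit.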

    If the above steps fail:

    You can clone the pipeline or notebook jobs using a new compute instance of the same SKU.

    This avoids deleting your old compute but lets your work continue while support investigates the stuck instance.
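One way to create the replacement instance from the CLI is sketched below. The function name is hypothetical, and it assumes `az ml compute show` reports the VM size in a top-level `size` field, so check that against your own output before relying on it.

```shell
# Create a new compute instance with the same VM size as an existing one.
# Assumption: the VM size is exposed as a top-level `size` field.
create_like_existing() {
  # $1 = existing compute name, $2 = new compute name,
  # $3 = workspace name, $4 = resource group
  local size
  size="$(az ml compute show --name "$1" --workspace-name "$3" \
    --resource-group "$4" --query size --output tsv)" || return 1
  if [ -z "$size" ]; then
    echo "Could not read the size of '$1'." >&2
    return 1
  fi
  az ml compute create --name "$2" --type ComputeInstance --size "$size" \
    --workspace-name "$3" --resource-group "$4"
}
```

After `create_like_existing <old-name> <new-name> <workspace-name> <resource-group>` succeeds, point the cloned pipeline or notebook job at the new compute name. The stuck instance is left untouched.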

    I hope this is helpful to you. Let me know if you have any other questions.

    Thank you!


1 additional answer

  1. Angus McKay 20 Reputation points
    2025-10-21T16:00:53.17+00:00

    Thanks, that's resolved it.

