Batch account stuck in deleting state

Erik Heeren 20 Reputation points
2025-10-20T08:33:54.5233333+00:00

Hi,

We've got another Batch account stuck in the deleting state (see also my previous request; I wonder if it's the same issue).

In this case, I provisioned the Batch account with a pool containing one node through Terraform, and about half an hour later tried to delete it again through Terraform. The node was stuck in LEAVINGPOOL for more than a day before getting cleaned up, possibly because I clicked the "stop" button in the portal. I don't know if it's actually deleted or just not being shown any more, because navigating to the pools feature of the account in the portal leads me to a "not found" error page.

While the end goal is to get the batch account deleted, I would very much like to know what is going wrong here and how I can avoid getting into this state again, because "just create a new subscription" is not a workable solution in the long term.

Azure Batch
An Azure service that provides cloud-scale job scheduling and compute management.

1 answer

  1. Erik Heeren 20 Reputation points
    2025-10-20T09:49:53.6233333+00:00

    Hello Himanshu,

    Thanks for getting back to me. --resource-group is not an argument to az batch pool list, but if I try without it, I get this result:

    az batch pool list --account-name <batchaccountname>
    <urllib3.connection.HTTPSConnection object at 0x1046715e0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known
    

    --force-delete is not an argument to az batch pool delete, and deleting via the CLI without it didn't work either.

    The "Stop" button is destructive: in an Azure Batch pool, the Stop action in the portal is a "deallocate" operation. If you interrupt a node while it is running tasks or during its own internal cleanup process, it can get stuck in a transient state like leaving pool. The node is neither fully operational nor fully deallocated.

    The node had already been in a leavingpool state for a day before I clicked the stop button; I think the click may have been what finally triggered it into getting fully cleaned up.

    Terraform initiates a delete command and waits for a successful response from Azure. If the Azure Batch service acknowledges the delete but the compute resources (virtual machines or disks) take too long or get stuck, Terraform's operation times out while the process continues (or fails) on the Azure side, and this leaves the resource in a "Deleting" ghost state.

    What resource are you referring to here? The Azure resource? I'm not sure why a timeout on the client side would impact resource cleanup on the server side. If you're referring to the Terraform resource, I'm not worried about that; this is a development subscription where I can simply start from scratch if need be.
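
To make the client-side versus server-side distinction concrete, here is a minimal sketch of a polling wait loop of the kind a deployment tool might run. It is an illustration only, not how Terraform or the Azure Batch service is actually implemented. `wait_until_gone` and `get_state` are hypothetical names: `get_state` stands in for any call that returns the resource's provisioning state, or `None` once the resource no longer exists.

```python
import time
from typing import Callable, Optional


def wait_until_gone(
    get_state: Callable[[], Optional[str]],
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the resource disappears or the client stops waiting.

    get_state is a hypothetical callable (an assumption for this sketch)
    that returns the current provisioning state, or None once the
    resource no longer exists.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state is None:
            return "deleted"
        if clock() >= deadline:
            # Only the client gives up here. Nothing is sent to the
            # service, so whatever it is doing simply carries on.
            return f"client timed out; last observed state: {state}"
        sleep(interval_s)
```

In this sketch the timeout branch only ends the wait and reports the last observed state. If a real service behaves the same way, a client timeout would explain the tool reporting a failure, but not the account staying in the deleting state.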

    I'm going to hold off on moving the Batch account to a different resource group, as I'm hoping that internal support will be able to take a look at what's going wrong and tell me how I can avoid this situation in the future.
