Batch account stuck in deleting state

Question

Erik Heeren 20

Hi,

We've got another batch account stuck in deleting state (see also my previous request; I wonder if it's the same issue).

In this case, I provisioned the batch account with a pool with one node through Terraform and about half an hour later tried to delete it again through Terraform. The node was stuck in LEAVINGPOOL for more than a day before getting cleaned up, possibly because I clicked the "stop" button in the portal. I don't know if it's actually deleted or just not being shown anymore, because navigating to the pools feature of the account in the portal leads me to a "not found" error page.

While the end goal is to get the batch account deleted, I would very much like to know what is going wrong here and how I can avoid getting into this state again, because "just create a new subscription" is not a workable solution in the long term.

Raphael Bickel 0 Reputation points Microsoft Employee

2025-10-20T08:42:09.76+00:00

+1 following
Himanshu Shekhar 1,170 Reputation points Microsoft External Staff Moderator

2025-10-20T08:57:40.8666667+00:00
Hello @Erik Heeren

Welcome to Microsoft Q&A Platform. Thank you for reaching out & hope you are doing well.

List all pools in the Batch account (replace with your details)

az batch pool list --account-name YourBatchAccountName --resource-group YourResourceGroupName

If you can see the pool, delete it with the --force-deletion flag

az batch pool delete --pool-id YourProblematicPoolId --account-name YourBatchAccountName --resource-group YourResourceGroupName --force-deletion true

If you are using Azure PowerShell:

Get the account context, then list all pools

$context = Get-AzBatchAccount -AccountName YourBatchAccountName -ResourceGroupName YourResourceGroupName
Get-AzBatchPool -BatchContext $context

Delete the pool (-Force skips the confirmation prompt)

Remove-AzBatchPool -Id YourProblematicPoolId -BatchContext $context -Force

An alternative is to force-delete the Batch account via a resource move (https://free.blessedness.top/en-us/azure/azure-resource-manager/management/move-resource-group-and-subscription)

If the pool cannot be deleted or the account is still stuck, you can use a resource-group-level operation to force the deletion. This method works by moving your stuck Batch account to a temporary resource group, which will force a cleanup.

First, create a temporary resource group:

az group create --name TempForceDeleteRG --location <region>

Move the Batch account (not the pool) to the new resource group. This often triggers a background process that cleans up the stuck state.

az resource move --destination-group TempForceDeleteRG --ids "/subscriptions/YourSubscriptionId/resourceGroups/YourOriginalRG/providers/Microsoft.Batch/batchAccounts/YourBatchAccountName"

Now try to delete the temporary resource group and everything in it.

az group delete --name TempForceDeleteRG --yes
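The three steps above could be wrapped so that each one only runs if the previous one succeeded. This is a minimal sketch, not a confirmed fix: nothing in this thread establishes that an account already in the deleting state can be moved. The function names `batch_account_resource_id` and `move_then_delete` are hypothetical helpers, and the arguments are placeholders.

```shell
# Build the ARM resource ID of a Batch account from subscription, group, and name.
batch_account_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Batch/batchAccounts/%s\n' "$1" "$2" "$3"
}

# Run the create / move / delete steps in order, stopping at the first failure.
# Assumption: the move is accepted for an account in the deleting state.
move_then_delete() {
  subscription_id="$1"
  source_group="$2"
  account_name="$3"
  region="$4"
  temp_group="TempForceDeleteRG"
  account_id="$(batch_account_resource_id "$subscription_id" "$source_group" "$account_name")"
  az group create --name "$temp_group" --location "$region" &&
    az resource move --destination-group "$temp_group" --ids "$account_id" &&
    az group delete --name "$temp_group" --yes
}

# Usage (placeholders):
#   move_then_delete YourSubscriptionId YourOriginalRG YourBatchAccountName <region>
```

If the move step fails, the temporary resource group is left in place and can be removed separately.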

Additionally, a few points to note:

The "Stop" button is destructive: in an Azure Batch pool, the Stop action in the portal is a "deallocate" operation. If you interrupt a node while it is running tasks or during its own internal cleanup process, it can get stuck in a transient state such as leaving pool. The node is then neither fully operational nor fully deallocated.

Terraform initiates a delete command and waits for a successful response from Azure. If the Azure Batch service acknowledges the delete but the compute resources (virtual machines or disks) take too long or get stuck, Terraform's operation times out while the process continues (or fails) on the Azure side, which leaves the resource in a "Deleting" ghost state.
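Whether a client-side timeout is what actually happened here is not established. Either way, the account's state can be checked independently of Terraform. The following is a minimal sketch, assuming `az batch account show` still returns the account while it is deleting; the function name `wait_for_batch_account_gone` and its default retry count and delay are hypothetical.

```shell
# Sketch: poll the account's provisioning state until the account is gone.
wait_for_batch_account_gone() {
  account_name="$1"
  resource_group="$2"
  max_attempts="${3:-30}"   # hypothetical default: 30 checks
  delay_seconds="${4:-60}"  # hypothetical default: one minute apart
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failing "show" is treated as "account no longer exists". An auth or
    # network failure would look the same in this simple sketch.
    if ! state="$(az batch account show --name "$account_name" \
        --resource-group "$resource_group" \
        --query provisioningState --output tsv 2>/dev/null)"; then
      echo "gone after $attempt check(s)"
      return 0
    fi
    echo "check $attempt: provisioningState=$state"
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
  echo "still present after $max_attempts checks"
  return 1
}

# Usage (placeholders):
#   wait_for_batch_account_gone YourBatchAccountName YourResourceGroupName
```

A state that stays at Deleting across many checks would suggest the server-side operation itself is stuck, rather than anything on the Terraform side.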

Kindly let us know if the suggested steps help or if you need further assistance on this issue.

Regards

Himanshu

1 answer

Answer 1

Hello Himanshu,

Thanks for getting back to me. --resource-group is not an argument to az batch pool list, but if I try without I get this result:

az batch pool list --account-name <batchaccountname>
<urllib3.connection.HTTPSConnection object at 0x1046715e0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

--force-deletion is not an argument to az batch pool delete, but deleting via CLI also didn't work without it.
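One thing that may be worth ruling out for the connection error above: the az batch pool commands talk to the account's own endpoint rather than to Azure Resource Manager, so the CLI needs to know that endpoint. A minimal sketch, assuming the endpoint still resolves while the account is in the deleting state (it may not), is to log in to the account first and then retry. The function name `list_batch_pools` is hypothetical and the arguments are placeholders.

```shell
# Sketch: authenticate against the Batch account, then list its pools.
# Assumption: the account endpoint is still reachable while the account is deleting.
list_batch_pools() {
  account_name="$1"
  resource_group="$2"
  # "az batch account login" does accept --resource-group; after it succeeds,
  # later "az batch" commands should not need the account details repeated.
  az batch account login --name "$account_name" --resource-group "$resource_group" || return 1
  # Print only the pool ids, one per line.
  az batch pool list --query "[].id" --output tsv
}

# Usage (placeholders):
#   list_batch_pools YourBatchAccountName YourResourceGroupName
```

If the login step itself fails with the same name-resolution error, the endpoint is probably already gone and the pool commands cannot help.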

The "Stop" button is destructive: in an Azure Batch pool, the Stop action in the portal is a "deallocate" operation. If you interrupt a node while it is running tasks or during its own internal cleanup process, it can get stuck in a transient state such as leaving pool. The node is then neither fully operational nor fully deallocated.

The node had already been in a leavingpool state for a day before I clicked the stop button - I think it may have been what triggered it into finally getting fully cleaned up.

Terraform initiates a delete command and waits for a successful response from Azure. If the Azure Batch service acknowledges the delete but the compute resources (virtual machines or disks) take too long or get stuck, Terraform's operation times out while the process continues (or fails) on the Azure side, which leaves the resource in a "Deleting" ghost state.

What resource are you referring to here? The Azure resource? I'm not sure why a timeout on the client side would impact resource cleanup on the server side. If you're referring to the Terraform resource: I'm not worried about that, since this is a development subscription where I can simply start from scratch if need be.

I'm going to wait to try moving the batch account to a different resource group as I'm hoping that internal support will be able to take a look at what's going wrong and instruct me as to how I can avoid this situation in the future.
