CosmosDB for Mongo upgrade stuck for hours

William Souza 0 Reputation points
2025-10-25T13:15:49.35+00:00

We are trying to scale an Azure Cosmos DB for MongoDB instance, but the operation has been stuck for hours. As an alternative, we tried to create another resource, but it also got stuck in the upgrade process.

The resource is also behaving strangely: CPU usage is high even when our API traffic is low, and we cannot fix it because the upgrade process is not working.

Azure Cosmos DB
An Azure NoSQL database service for app development.

2 answers

  1. Jerald Felix 8,230 Reputation points
    2025-10-27T02:28:30.4+00:00

    Hello William Souza,

    Sorry for the disruption with your Cosmos DB for MongoDB account. Upgrades (e.g., scaling RU/s) that stay stuck for hours, new resources that also hang, and unexplained high CPU during low traffic may point to backend throttling or regional capacity issues. I have seen this in production, so here is a concise fix path.

    Quick Diagnostics

    • Check Status: In the portal, open Cosmos DB > Your account and check whether the account status is stuck at "Updating"; under Metrics, check whether "Provisioned Throughput" has actually changed. Resource health may show "Degraded" (platform-initiated).
    • Logs: Enable Diagnostic settings > Log Analytics, then query the "DataPlaneRequests" category for errors or any "UpgradeFailed" entries (control-plane operations such as scaling are logged under "ControlPlaneRequests"). CLI: az cosmosdb show --resource-group <rg> --name <account> --query provisioningState
    • CPU Spike: For high CPU with low traffic, check for hot partitions or heavy indexing activity, and use Data Explorer to look for slow Mongo queries.
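The provisioning-state check above can be turned into a small polling loop, so you can tell whether the account ever leaves the stuck state. This is a minimal sketch, not an official procedure: the function name `wait_for_account` and its attempt/delay parameters are made up for illustration, and it assumes the CLI output exposes the state as a top-level `provisioningState` field that eventually reads `Succeeded`.

```shell
# Hypothetical helper: poll the account until its provisioning state is "Succeeded".
# Usage: wait_for_account <resource-group> <account> [max_attempts] [delay_seconds]
# Returns 0 once the state is "Succeeded", 1 if attempts run out, 2 if the CLI call fails.
wait_for_account() {
  rg="$1"
  account="$2"
  max_attempts="${3:-30}"
  delay="${4:-60}"
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    # Assumption: provisioningState is a top-level field in the `az cosmosdb show` output.
    state=$(az cosmosdb show --resource-group "$rg" --name "$account" \
      --query provisioningState --output tsv) || return 2
    attempt=$((attempt + 1))
    echo "attempt $attempt: $state"
    if [ "$state" = "Succeeded" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_account my-rg my-account 60 30` would check every 30 seconds for up to 30 minutes. A non-zero exit after that window is a reasonable signal to escalate rather than keep retrying.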

    Resolution Steps

    1. Retry with Limits: Issue a new, smaller throughput update from the CLI: az cosmosdb mongodb database throughput update --resource-group <rg> --account-name <account> --name <db> --throughput 400 (start low). Wait 30 minutes; if it is still stuck, temporarily scale to a single-node setup.
    2. Failover or Region Switch: For a multi-region account, use Global distribution > Failover priority to fail over to another region (e.g., East US 2). For new resources, deploy in a different region such as West Europe.
    3. Optimize for CPU: Add indexes for frequent queries, and use autoscale instead of fixed throughput (with a 1,000 RU/s maximum, autoscale ranges from 100 to 1,000 RU/s). Monitor request units: throttled requests are retried, which adds load.
    4. Escalate: Open a support ticket: Help + support > New request > Technical > Cosmos DB > Scaling. Set Severity C (or higher if production is affected) and include the account ID and the upgrade timestamp. In my experience, these are often resolved in 1-2 hours once the backend team force-completes the operation.
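The autoscale suggestion in step 3 can be sketched with the Azure CLI as follows. This assumes throughput is provisioned at the database level (shared throughput) on an RU-based account; if it is set per collection, the `az cosmosdb mongodb collection throughput` commands would be the ones to use instead. The function name `enable_autoscale` and the default maximum are illustrative, not part of the original answer.

```shell
# Hypothetical helper: move a database from manual to autoscale throughput,
# then set the autoscale maximum.
# Usage: enable_autoscale <resource-group> <account> <database> [max_ru]
enable_autoscale() {
  rg="$1"
  account="$2"
  db="$3"
  max_ru="${4:-1000}"  # assumed default; 1000 RU/s is the lowest autoscale maximum

  # Step 1: switch the throughput mode from manual to autoscale.
  az cosmosdb mongodb database throughput migrate \
    --resource-group "$rg" --account-name "$account" --name "$db" \
    --throughput-type autoscale || return 1

  # Step 2: set the autoscale ceiling (the service scales between 10% of it and the ceiling).
  az cosmosdb mongodb database throughput update \
    --resource-group "$rg" --account-name "$account" --name "$db" \
    --max-throughput "$max_ru"
}
```

A call such as `enable_autoscale my-rg my-account my-db 2000` would leave the database scaling between 200 and 2,000 RU/s. If the account is already stuck in an update, these calls are likely to be rejected or to queue behind it, so this is more useful on a healthy or freshly restored account.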

    Workaround: If this is urgent, export the data with mongodump and restore it into a new account. Check status.azure.com for outages.
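The export workaround might look roughly like this. It is a sketch under stated assumptions: both connection strings come from the portal's connection string page for the old and new accounts, the MongoDB Database Tools (`mongodump`, `mongorestore`) are installed locally, and the function name `copy_account_data` and the default dump directory are placeholders.

```shell
# Hypothetical helper: dump everything from the source account and restore it
# into the target account.
# Usage: copy_account_data <source-uri> <target-uri> [dump_dir]
copy_account_data() {
  src_uri="$1"
  dst_uri="$2"
  dump_dir="${3:-./cosmos-dump}"  # assumed local scratch directory

  # Export all databases reachable through the source connection string.
  mongodump --uri="$src_uri" --out="$dump_dir" || return 1

  # Import the dump into the new account; stops here if the dump failed.
  mongorestore --uri="$dst_uri" "$dump_dir"
}
```

On an RU-based target, a large restore can be throttled, so it may help to raise the target's throughput for the duration of the import and lower it afterwards.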

    Best Regards,

    Jerald Felix


  2. William Souza 0 Reputation points
    2025-10-27T16:47:43.3366667+00:00

    Hi @Jerald Felix MCT

    Thank you for your reply. In the end, we fixed it by restoring a backup to a new account. We already deleted the old resources, but they were still stuck as 'updating'.

