Hello William Souza,
Sorry for the disruption with your Cosmos DB for MongoDB—stuck upgrades (e.g., scaling RU/s) for hours, plus new resource hangs and unexplained high CPU during low traffic, point to backend throttling or regional capacity issues, common after the 2025 Mongo API updates. As an Azure specialist, I've seen this in production—here's a concise fix path.
Quick Diagnostics
- Check Status: In portal, Cosmos DB > Your account > Metrics—look for "Provisioned Throughput" stuck in "Updating." Resource health may show "Degraded" (platform-initiated).
- Logs: Enable Diagnostic settings > Log Analytics; query for "DataPlaneRequest" errors or "UpgradeFailed." CLI:
az cosmosdb show --resource-group <rg> --name <account> --query "properties.provisioningState". - CPU Spike: High CPU with low traffic? Check for hot partitions (indexing loops)—use Query Explorer for slow Mongo queries.
Resolution Steps
- Retry with Limits: Cancel via CLI:
az cosmosdb database update --resource-group <rg> --name <account> --database <db> --throughput 400(start low). Wait 30 mins; if stuck, scale to a single-node setup temporarily. - Failover or Region Switch: If multi-region, Global distribution > Failover priority to another region (e.g., East US 2). For new resources, deploy in a different region like West Europe.
- Optimize for CPU: Add indexes on frequent queries; use autoscale (400-1000 RU/s) instead of fixed. Monitor Request units—throttling causes CPU spikes.
- Escalate: Open support ticket: Help + support > New request > Technical > Cosmos DB > Scaling. Set Severity C (hours impact); include account ID, upgrade timestamp. Resolutions often in 1-2 hours via backend force-complete.
Workaround: Export data via mongodump to a new account if urgent. Track at status.azure.com for outages.
Best Regards,
Jerald Felix