Hello,
Welcome to Microsoft Q&A,
This is unfortunately a known gotcha with the quota slider in the Azure AI Foundry UI when you change a deployment’s model version: sometimes it “snaps” to your entire remaining GPT-4o quota (e.g., ~29M TPM) and won’t let you drag it back down.
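As a workaround, you can set the TPM explicitly via the REST API or the Azure CLI, which bypasses the UI slider entirely.
1 unit of capacity = 1,000 TPM. Use the 2023-05-01 management API. The examples below use the Standard SKU; if your deployment uses a different type (for example GlobalStandard), set the SKU name to match.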
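If it helps to get the number right before running either command, the unit conversion above can be sketched as a tiny shell function. Note that tpm_to_capacity is a hypothetical helper name of my own, not part of the Azure CLI, and it simply assumes the 1 unit = 1,000 TPM ratio stated above, rounding up to the next whole unit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an Azure CLI command): convert a desired TPM
# figure into capacity units, assuming 1 unit = 1,000 TPM.
tpm_to_capacity() {
  local tpm="$1"
  # Round up so the deployment never gets less than the requested TPM.
  echo $(( (tpm + 999) / 1000 ))
}

tpm_to_capacity 10000    # prints 10
tpm_to_capacity 150000   # prints 150
```

The printed value is what you would pass as "capacity" in the REST body or as --sku-capacity in the CLI.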
REST (capacity 10 = 10K TPM; note that the JSON body must not contain comments):
curl -X PUT "https://management.azure.com/subscriptions/<subId>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<aoaiResource>/deployments/<deploymentName>?api-version=2023-05-01" \
-H "Authorization: Bearer $(az account get-access-token --query accessToken -o tsv)" \
-H "Content-Type: application/json" \
-d '{
"sku": { "name": "Standard", "capacity": 10 },
"properties": { "model": { "format": "OpenAI", "name": "gpt-4o", "version": "2024-11-20" } }
}'
Azure CLI:
az cognitiveservices account deployment create \
-g <rg> -n <aoaiResource> --deployment-name <deploymentName> \
--model-name gpt-4o --model-version "2024-11-20" --model-format OpenAI \
--sku-name Standard --sku-capacity 10 # 10K TPM
Reference: https://free.blessedness.top/en-us/azure/ai-foundry/openai/how-to/quota?tabs=rest
Please upvote and accept the answer if it helps!!