GPT 5 API requests - We are getting 429 responses but not even appearing to touch our quota this morning.

TH-4622 80 Reputation points
2025-10-21T09:13:11.75+00:00

We are getting 429 responses but not even appearing to touch our quota this morning.

GPT 5 API requests:
[screenshot: GPT-5 API request metrics]

Our deployment should be able to handle 100 API requests per minute, but analytics shows no more than 10.

Locally, our developer made the same request and also received a 429, even though the response headers show remaining quota:
[screenshot: 429 response with rate-limit headers]

Please can you help determine what the issue is here?

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Sina Salam 25,761 Reputation points Volunteer Moderator
    2025-10-21T10:27:43.2933333+00:00

    Hello TH-4622,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your GPT-5 API requests are receiving 429 responses this morning, even though you do not appear to be anywhere near your quota.

    The issue is likely not caused by exceeding your visible quota, but by hidden token-per-minute (TPM) throttling, sub-minute rate bursts, or API Management policies that apply limits independently of your portal analytics. Azure OpenAI enforces both RPM (requests per minute) and TPM (tokens per minute) limits, so even a few large responses can trigger a 429 despite low request volume. See the Microsoft documentation on Azure OpenAI quotas and limits.
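    To make the arithmetic concrete (all limits and token counts below are hypothetical example values, not your actual quota), a handful of large completions can exhaust a TPM budget while the RPM counter barely moves:

```python
# Hypothetical illustration of TPM exhaustion at low request volume.
# All limits and token counts below are made-up example values.
RPM_LIMIT = 100      # requests per minute (example)
TPM_LIMIT = 30_000   # tokens per minute (example)

requests_made = 10           # what analytics shows
tokens_per_request = 4_000   # prompt + completion tokens for one large call

tokens_used = requests_made * tokens_per_request
print(tokens_used)                # 40000
print(tokens_used > TPM_LIMIT)    # True: throttled on tokens at 10% of the RPM limit
```

    In this sketch, the deployment is throttled on tokens while showing only 10 of 100 allowed requests, which matches the symptom you describe.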

    To resolve this, capture and submit complete request-response samples with all rate-limit headers (x-rate-limit-remaining-requests, x-rate-limit-reset-tokens, x-ms-request-id, and Retry-After) and timestamps, along with your resource name, region, deployment ID, and subscription ID, to support via the Azure portal. These data points allow Azure engineers to correlate your calls with internal logs and identify whether the throttling comes from TPM exhaustion, an API gateway policy, or temporary service-side constraints. You can refer to Azure support diagnostics guidance.
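    As a sketch of what to capture (the header names are taken from the list above; verify them against your actual responses, since they can vary by gateway and API version), a small helper can pull these fields from any response-like headers mapping:

```python
import time

# Headers to capture for a support case; confirm these names against
# your actual 429 responses before relying on them.
RATE_HEADERS = (
    "Retry-After",
    "x-rate-limit-remaining-requests",
    "x-rate-limit-reset-tokens",
    "x-ms-request-id",
)

def capture_rate_limit_info(headers):
    """Return a timestamped snapshot of the rate-limit headers support asks for."""
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        **{name: headers.get(name) for name in RATE_HEADERS},
    }
```

    Logging this dictionary alongside each 429 gives support the exact correlation data (request ID plus timestamp) they need.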

    Additionally, update your client code to handle all Azure-specific rate-limit headers and apply exponential backoff with jitter. A simple implementation is shown below:

    import time, random

    def handle_429(response, attempt):
        """Sleep before retrying after a 429, preferring the Retry-After header."""
        # Prefer the standard Retry-After header; fall back to the token-reset hint.
        wait = response.headers.get("Retry-After") or response.headers.get("x-rate-limit-reset-tokens")
        try:
            base = float(wait)
        except (TypeError, ValueError):
            base = 0.5  # default base delay when no usable header is present
        # Exponential backoff with jitter, capped at 30 seconds.
        time.sleep(min(base * (2 ** attempt) + random.uniform(0, 0.5), 30))
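    A fuller sketch wraps the whole call in a retry loop; `send_request` here is a placeholder for whatever function issues your API call and returns a response object with `.status_code` and `.headers` (e.g. a `requests.Response`):

```python
import random, time

def call_with_retries(send_request, max_attempts=5):
    """Retry a callable that returns an HTTP-style response whenever it yields a 429."""
    for attempt in range(max_attempts):
        response = send_request()
        if response.status_code != 429:
            return response
        # Honor Retry-After if present; otherwise back off exponentially with jitter.
        wait = response.headers.get("Retry-After")
        try:
            base = float(wait)
        except (TypeError, ValueError):
            base = 0.5
        time.sleep(min(base * (2 ** attempt) + random.uniform(0, 0.5), 30))
    return response  # last 429 after exhausting retries
```

    The cap of 30 seconds keeps a misreported Retry-After value from stalling the client indefinitely.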
    

    Before retrying requests, verify that no other applications share the same API key, and check for custom API Management policies such as azure-openai-token-limit, which can impose additional throttles: https://free.blessedness.top/en-us/azure/api-management/api-management-policies

    If token throttling remains the cause, reduce max_tokens in your calls or request a quota increase in the Azure portal under Azure OpenAI > Quotas. These steps, aligned with Microsoft's rate-limit and diagnostics documentation, should isolate the root cause and restore stable GPT-5 API performance.
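    For example, lowering `max_tokens` on the request body directly reduces per-call token consumption (the payload below is a minimal sketch; the message content and values are illustrative only):

```python
# Minimal chat-completions payload sketch; values are illustrative only.
payload = {
    "messages": [{"role": "user", "content": "Summarize this document."}],
    "max_tokens": 256,   # lowered cap on completion tokens to ease TPM pressure
    "temperature": 0.2,
}
print(payload["max_tokens"])   # 256
```

    Halving the completion cap roughly halves each call's contribution to the TPM budget, so the same RPM fits under the token limit.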

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close out the thread here by upvoting and accepting this as an answer if it is helpful.

