Hi zhuzin zhuzin,
It sounds like your issue is related to regional availability and load balancing of the custom model endpoint. In Azure AI Foundry, a deployed model is bound to the region where the deployment was created, so API calls coming from other regions can hit latency spikes or intermittent 503 errors.

The recommended approach is to either replicate the model deployment to the regions your users or applications call from, or put Azure Front Door or Traffic Manager in front of the endpoints so requests are routed to the nearest healthy region. It is also worth checking the endpoint scaling settings in Foundry: increasing the number of replicas (instances) improves reliability under bursty or high-latency traffic. While you set up multi-region routing, you can also handle 503s and failover on the client side, as in the sketch below.
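This is a minimal client-side sketch, not Foundry-specific code: the endpoint URLs and keys in `ENDPOINTS` are placeholders you would replace with the scoring URLs and keys of your own regional deployments. It tries the primary region first, backs off and retries on 503 or timeout, then fails over to the next region.

```python
import time
import requests

# Hypothetical regional endpoints for the same deployed model.
# Replace with the actual scoring URLs and keys from your deployments.
ENDPOINTS = [
    {"url": "https://my-model-eastus.example.com/score", "key": "<east-us-key>"},
    {"url": "https://my-model-westeurope.example.com/score", "key": "<west-europe-key>"},
]

def call_model(payload, retries_per_endpoint=2, backoff_seconds=1.0):
    """Try each regional endpoint in order, retrying on 503s and timeouts."""
    last_error = None
    for endpoint in ENDPOINTS:
        headers = {
            "Authorization": f"Bearer {endpoint['key']}",
            "Content-Type": "application/json",
        }
        for attempt in range(retries_per_endpoint):
            try:
                response = requests.post(
                    endpoint["url"], json=payload, headers=headers, timeout=10
                )
                if response.status_code == 503:
                    # Transient capacity issue: back off, retry, then fail over.
                    time.sleep(backoff_seconds * (attempt + 1))
                    continue
                response.raise_for_status()
                return response.json()
            except requests.RequestException as exc:
                last_error = exc
                time.sleep(backoff_seconds * (attempt + 1))
    raise RuntimeError(f"All regional endpoints failed: {last_error}")
```

This is only a stopgap for intermittent 503s during scaling or regional blips; for production traffic you would still want Front Door or Traffic Manager doing the regional routing at the edge rather than in every client.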