Azure Custom Vision Training Stuck/Not Completing

naoki_takashima 25 Reputation points
2025-09-11T11:36:31.9333333+00:00

Overview

Since September 8, 2025, training operations for object detection models in Azure Custom Vision have been stuck and unable to complete. The training process starts but never finishes, with no clear error messages or failure indicators. This issue appears to be specific to object detection tasks, as image classification training completes normally.

Resource Tier: S0 (Standard)

Region: South Central US and Japan East (both affected)

Investigation Status

Multiple hypotheses have been tested to identify the root cause:

Project-Related Issues:

  • ❌ Attempted to copy the project to a new instance using project tokens - training still won't complete
  • ❌ Manually created a new project and imported data - training still won't complete

Data/Annotation Issues:

  • ❌ Reverted to previously successful iteration 12 annotations - training still won't complete

Service-Specific Issues:

  • ✅ Image classification tasks complete successfully (issue isolated to object detection)
  • ❌ Both Quick Training and Advanced Training modes get stuck

Resource-Related Issues:

  • Image classification completes on the same resource, suggesting resource is functional

Current Assessment

The training process for object detection models hangs indefinitely without providing failure messages or completing successfully. Since image classification training completes normally on the same resource, this suggests either:

  1. A service-level issue affecting object detection training completion
  2. Possible undocumented training limits or timeouts for object detection models
  3. A blocking condition that prevents the training process from proceeding

Reason for Inquiry

We have comprehensively tested all possible user-side causes including project settings, data, and training methods, but the issue remains unresolved. Given that this occurs only with object detection, provides no error messages, and similar issues have been reported by other users, we believe this requires investigation and resolution on the Azure Custom Vision service side, which is why we are contacting support.

Potentially similar problems have been reported by other users:

Azure AI Custom Vision
Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.
0 comments No comments
{count} votes

Answer accepted by question author
  1. Jerald Felix 7,520 Reputation points
    2025-09-12T00:23:02.2933333+00:00

    Hello naoki_takashima,

    Thank you for providing an incredibly detailed and thorough breakdown of the issue and the extensive troubleshooting steps you have already taken. Your investigation is very clear and helps to quickly isolate the problem.

    Based on your report and multiple other identical reports from other users starting around September 8, 2025, it is clear that this is a widespread service issue with Azure Custom Vision that is beyond your control.

    You are not alone in this. Other users have confirmed the exact same behavior :

    • The issue is specific to object detection models; image classification works fine.

    Training jobs get stuck indefinitely with no completion or failure message.

    The problem began on or around September 8, 2025.

    It is occurring across multiple Azure regions, including South Central US and Japan East.

    Your assessment is correct. Since you have systematically ruled out project, data, and resource-specific causes, and because the issue is impacting multiple tenants, the root cause is almost certainly a service-side bug within the Azure Custom Vision platform that was introduced recently. Historical data shows that similar "stuck training" incidents have occurred in the past and required a fix from the Azure product team to resolve.

    Recommendation

    Given that this is a platform-level problem, there are no further troubleshooting steps on your side that will resolve it. The only path to a solution is to ensure the Azure engineering team is aware of the widespread impact and is actively working on a fix.

    The most important action you can take is to file an official Azure Support Ticket immediately.

    Go to the Azure Portal.

    Navigate to "Help + support" and create a new support request.

    For the issue type, select "Technical."

    For the service, choose "AI Language" and then select "Custom Vision."

    In the ticket description, provide all the excellent details from your post. Crucially, mention that this is a widespread issue affecting multiple users and link to your own forum post and the others you found. This will help the support engineer escalate the ticket with higher priority.

    By filing a support ticket, you add your case to the official count of affected customers, which is the primary metric Microsoft uses to determine the severity and priority of an outage or service degradation.

    Unfortunately, you will likely have to wait for Microsoft to deploy a fix. You can monitor the Azure Status page for any official acknowledgment of the problem, although sometimes these smaller service issues are not posted there.

    Thank you for your excellent and detailed report, which is very helpful to the community and to Microsoft for diagnosing the issue.

    Best regards, Jerald Felix

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.