Custom Vision AI Object Detection training is hung

jpaolino 0 Reputation points
2025-09-11T13:09:15.4533333+00:00

I uploaded 25 images, tagged items in them (at or close to 300 regions per image), and attempted to train on that dataset with a 3-hour budget. It has been over 16 hours now with no observable progress: no training time is recorded in the metrics, and the training appears hung. There is no way to delete (the option is grayed out) or stop the training, and no indication of what is holding it up. It appears I have no recourse to intervene in the training process once it has begun.

Is there a way to have an engineer on the Azure side look at what is happening on the backend, or to force a cancel so I can retry with Quick Training instead of Advanced Training to see if that works?

Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.

2 answers

  1. SRILAKSHMI C 8,275 Reputation points Microsoft External Staff Moderator
    2025-09-15T15:32:33.3933333+00:00

    Hello jpaolino,

    Welcome to Microsoft Q&A, and thank you for reaching out.

    It sounds like you’re experiencing a frustrating issue with your Custom Vision AI Object Detection training hanging for an unusually long time. While training times can vary depending on dataset size and complexity, 16 hours is far beyond what would normally be expected for a dataset with 25 images. The fact that no training time is being recorded indicates that the job is likely stuck rather than just slow.

    Unfortunately, once a training job is started in Custom Vision, there is no manual way to stop or delete it from the portal if the Stop option is grayed out. These jobs run on Azure’s managed backend, and only the service itself can resolve or terminate a hung process.

    Your dataset setup may also be contributing to the problem. With ~300 regions tagged per image, you are effectively training on around 7,500 regions in total, which is very large for object detection. This complexity could be straining the training pipeline, especially when using Advanced Training with a 3-hour budget. For this reason, it’s a good idea to start with Quick Training on a smaller subset of your data to validate the setup before committing to longer training runs.

    Here are the best steps you can take:

    - Reduce the number of tagged regions per image, or start with fewer images. Focus on the most important objects to make the dataset more manageable for the training pipeline.

    - Instead of starting with Advanced Training and a long budget, try Quick Training first. Quick Training is faster and can validate whether your dataset is structured correctly. Once you confirm it works, you can gradually move to Advanced Training with larger data and more regions.

    - Even if no training time is being logged, check the Azure Service Health Dashboard. Jobs can hang due to regional service delays or resource-availability issues, and monitoring will help you confirm whether the problem is local to your dataset or related to the Azure backend.

    - Because a hung job cannot be cancelled from the portal, open an Azure support request (Help + support in the Azure portal) so an engineer can investigate or terminate the job on the backend.
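    The sizing advice above can be sketched as a small helper. The 2,000-region threshold and the function name are illustrative assumptions for this sketch, not documented Custom Vision limits; the point is that densely tagged datasets are worth validating with Quick Training before committing a multi-hour Advanced Training budget:

    ```python
    def choose_training_type(num_images: int, regions_per_image: int,
                             region_threshold: int = 2000) -> str:
        """Recommend a Custom Vision training mode for an object-detection dataset.

        The threshold is an illustrative assumption, not a service limit:
        densely tagged datasets should be validated with Quick Training first.
        """
        total_regions = num_images * regions_per_image
        if total_regions > region_threshold:
            # Validate dataset structure cheaply before paying for a long run.
            return "Quick"
        return "Advanced"

    # The dataset from the question: 25 images x ~300 regions = ~7,500 regions.
    print(choose_training_type(25, 300))  # -> Quick
    ```

    Once Quick Training succeeds on a subset, the same check can guide when it is worth moving to Advanced Training with a reserved budget.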

    I hope this helps. Do let me know if you have any further queries.

    Thank you!

    1 person found this answer helpful.

  2. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


    Comments have been turned off.
