How to fix incorrect coordinates from Azure AI vision Read API

Vikash Bharti 0 Reputation points
2025-10-06T13:19:17.1433333+00:00

We are using Azure AI Vision API (Read OCR) as part of our production OCR pipeline for text extraction and bounding box generation in Monotype’s AI-based font recognition services (e.g., WhatTheFont / Font-Lens).

After successful PoC completion, we migrated 20% of our live production traffic (≈14 million monthly OCR calls) from Google Vision API to Azure AI Vision OCR. However, we are now observing inaccurate and inconsistent bounding box coordinates in the OCR response. These inaccuracies cause cropped previews to appear misaligned and truncated, leading to degraded recognition accuracy and poor visual output for end users.

This issue is directly impacting our live customer experience and production service reliability.


2 answers

  1. Sina Salam 25,761 Reputation points Volunteer Moderator
    2025-10-06T17:24:10.4933333+00:00

    Hello Vikash Bharti,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are seeking a precise and production-grade solution to bounding box inaccuracies in Azure AI Vision Read OCR, especially as it affects Monotype’s font recognition services.

    To address this, the most effective strategy is to migrate from Azure Read OCR to Azure Document Intelligence v4.0, which offers layout-aware polygonal bounding boxes, paragraph detection, and custom model training for complex document structures – https://free.blessedness.top/en-us/azure/ai-services/document-intelligence/prebuilt/read?view=doc-intel-4.0.0. This significantly improves bounding box precision and alignment in real-world OCR pipelines.

    For bounding box normalization, convert Azure's polygon coordinates (x1, y1 ... x4, y4) into rectangular (x, y, width, height) format using OpenCV or PIL. This ensures compatibility with cropping and rendering tools. If you are working with PDFs or scanned documents, apply DPI conversion to handle units correctly – see samples here: https://stackoverflow.com/questions/78866090/how-to-draw-bounding-boxes-around-sections-in-the-result-from-azure-document-int
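The polygon-to-rectangle conversion can be sketched in plain Python (no OpenCV needed for the arithmetic itself); `polygon_to_rect` is an illustrative helper name, assuming the 8-value polygon layout the Read API returns:

```python
def polygon_to_rect(polygon):
    """Convert an 8-value polygon [x1, y1, x2, y2, x3, y3, x4, y4]
    into an axis-aligned (x, y, width, height) rectangle that
    fully encloses the polygon."""
    xs = polygon[0::2]  # every x coordinate
    ys = polygon[1::2]  # every y coordinate
    x, y = min(xs), min(ys)
    return (x, y, max(xs) - x, max(ys) - y)
```

For PDF pages, where coordinates come back in inches, multiply each value by the render DPI before cropping.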

    To correct misaligned or rotated text regions, apply a rotation matrix based on the average skew angle detected across bounding boxes. This post-processing step realigns bounding boxes with the actual text orientation – this earlier answer may help: https://free.blessedness.top/en-us/answers/questions/2107296/azure-document-intelligence-misaligned-or-rotated
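A minimal sketch of that deskew step, assuming 8-value polygons and rotation around a chosen center point (all function names here are illustrative, not part of any Azure SDK):

```python
import math

def box_angle(polygon):
    """Skew angle in degrees of the polygon's top edge (x1, y1)-(x2, y2)."""
    x1, y1, x2, y2 = polygon[:4]
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def rotate_point(x, y, cx, cy, angle_deg):
    """Rotate (x, y) around (cx, cy) by angle_deg using a 2D rotation matrix."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))

def deskew_polygons(polygons, cx, cy):
    """Rotate every polygon back by the average skew angle so the
    boxes line up with horizontal text."""
    avg = sum(box_angle(p) for p in polygons) / len(polygons)
    deskewed = []
    for p in polygons:
        pts = [rotate_point(p[i], p[i + 1], cx, cy, -avg)
               for i in range(0, 8, 2)]
        deskewed.append([coord for pt in pts for coord in pt])
    return deskewed
```

In practice (cx, cy) would be the image center, and the same rotation should be applied to the image itself before cropping.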

    Next, filter out bounding boxes with confidence scores below 0.8 to eliminate false positives and improve visual output. This thresholding is a proven method to enhance OCR reliability – https://stackoverflow.com/questions/78991568/how-to-increase-accuracy-of-text-read-in-images-using-microsoft-azure-computer-v
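The confidence filter might look like this, assuming word objects shaped like the Read API's word-level JSON (`content` and `confidence` keys); the 0.8 cutoff is a tunable assumption, not a fixed rule:

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune per workload

def filter_words(words, threshold=CONFIDENCE_THRESHOLD):
    """Keep only words whose OCR confidence meets the threshold.
    Each word is assumed to be a dict like
    {"content": str, "confidence": float, "polygon": [...]}."""
    return [w for w in words if w.get("confidence", 0.0) >= threshold]
```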

    For resilience, implement a hybrid OCR strategy by integrating Google Vision API as a fallback. Merge results using fuzzy text matching and bounding box overlap logic to ensure consistent recognition across platforms – https://persumi.com/c/product-builders/u/fredwu/p/comparison-of-ai-ocr-tools-microsoft-azure-ai-document-intelligence-google-cloud-document-ai-aws-textract-and-others
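One way to sketch that merge logic, using the standard library's `difflib` for fuzzy text matching and intersection-over-union for box overlap (the thresholds and the `content`/`rect` dict shape are assumptions for illustration):

```python
from difflib import SequenceMatcher

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) rectangles."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def merge_results(primary, fallback, text_sim=0.8, box_iou=0.5):
    """Keep all primary-engine words; add fallback-engine words that
    no primary word already covers (similar text AND overlapping box)."""
    merged = list(primary)
    for fw in fallback:
        covered = any(
            SequenceMatcher(None, fw["content"], pw["content"]).ratio() >= text_sim
            and iou(fw["rect"], pw["rect"]) >= box_iou
            for pw in primary)
        if not covered:
            merged.append(fw)
    return merged
```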

    Finally, build a visual validation pipeline to overlay bounding boxes on images. This allows manual inspection and automated flagging of misaligned regions, ensuring quality control before results reach end users – https://www.syntera.ch/blog/2023/01/05/drawing-bounding-boxes-from-azures-form-recognizer-results/
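A minimal overlay sketch with Pillow for that validation pipeline (`overlay_boxes` is an illustrative helper, not an Azure API; it assumes 8-value pixel-unit polygons):

```python
from PIL import Image, ImageDraw

def overlay_boxes(image_path, polygons, out_path):
    """Draw OCR polygons on the image for manual QA review.
    polygons: list of 8-value [x1, y1, ..., x4, y4] lists."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for p in polygons:
        pts = [(p[i], p[i + 1]) for i in range(0, 8, 2)]
        # Close the loop back to the first point so all four edges render.
        draw.line(pts + [pts[0]], fill=(255, 0, 0), width=2)
    img.save(out_path)
    return out_path
```

The same function can feed an automated check, e.g. flagging pages where boxes extend outside the image bounds.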

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    If this answer was helpful, please upvote it and accept it as the answer to close out the thread.


  2. Anshika Varshney 1,910 Reputation points Microsoft External Staff Moderator
    2025-10-06T19:18:24.1933333+00:00

    Hello Vikash Bharti,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Based on your description, it seems that the bounding box coordinates returned by the Azure AI Vision Read API are not aligning correctly with the extracted text regions. This can happen due to image orientation, scaling, or preprocessing differences between input formats.

    First, please make sure the input images are not being resized or transformed after being uploaded. Azure's Read API returns bounding boxes relative to the original image size, so any resizing or compression before or after submission can cause coordinate shifts. It is also recommended to check that image rotation metadata (EXIF orientation) is handled properly, as unhandled rotations can misalign coordinates.
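With Pillow, the EXIF orientation can be normalized before upload so the submitted pixels match what you later crop (a sketch; `load_upright` is an illustrative name):

```python
from PIL import Image, ImageOps

def load_upright(image_path):
    """Apply the EXIF Orientation tag to the pixel data, so the image
    sent to OCR and the image used for cropping agree on orientation."""
    img = Image.open(image_path)
    return ImageOps.exif_transpose(img)
```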

    Second, you can apply a normalization step before processing. For example, convert all images to a fixed size and consistent DPI, and then map the coordinates using the same scale when rendering bounding boxes. If you are batching multiple pages or images, ensure each image's response coordinates are interpreted independently.
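The coordinate mapping for a resized render can be sketched as follows (illustrative helper, assuming pixel-unit polygons and independent x/y scale factors):

```python
def scale_polygon(polygon, orig_size, display_size):
    """Map an 8-value polygon from the original submitted image size
    (what Azure's coordinates refer to) to the size at which the
    image is actually rendered."""
    ow, oh = orig_size
    dw, dh = display_size
    sx, sy = dw / ow, dh / oh
    # Even indices are x coordinates, odd indices are y coordinates.
    return [v * sx if i % 2 == 0 else v * sy for i, v in enumerate(polygon)]
```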

    Given the current situation, the most likely fixes are: review the preprocessing pipeline, verify that coordinate scaling matches the original resolution, and test a few sample images in Vision Studio to confirm whether the issue persists.

    Please let me know if you have any questions.

    If you feel that your queries have been resolved, please accept the answer by clicking "Upvote" and "Accept Answer" on the post.

    Thank you!

