How to fix Vision Layout not returning the correct results from nearly identical PDFs

JB 1 Reputation point
2025-06-03T20:36:19.74+00:00

I am submitting two PDFs to the Vision Layout endpoint so I can extract the check-marked utilities from this section of the PDF.
User's image

I first split the file and remove the first page. The other two pages are sent separately through Vision. The first of them works flawlessly and returns Water as 'Selected'.

User's image

"content": "Water , St. lights Storm\n:selected:\n:unselected: :unselected:"

The second page comes through, but the output doesn't read the checkboxes correctly. We have figured out that the proximity of the text above the checkboxes is what throws it off.

Page 1
User's image

Versus Page 2

User's image

I'm sending the results to a function that returns the label of any checked values, but how do I get Vision to process the file correctly?

I think I have a workaround in this instance, which is to redact that text area before sending the page to Vision, but that won't help when I get to processing other PDFs.

Here is the file I am trying to process.
184 King St - 20251915589.pdf

Azure AI Custom Vision
An Azure artificial intelligence service and end-to-end platform for applying computer vision to specific domains.

  1. Ravada Shivaprasad 1,955 Reputation points Microsoft External Staff Moderator
    2025-06-05T00:05:17.67+00:00

    Hi JB

    The issue you're encountering with Azure's Vision Layout service misinterpreting check-marked utilities on the second page of your PDF stems from layout interference—specifically, the proximity of unrelated text above the target section. While the first page is processed correctly, the second page fails due to this spatial overlap, which disrupts the model's ability to accurately associate checkboxes with their corresponding labels.

    To address this, the first step is to identify the problematic area on the second page where the interference occurs. This typically involves analyzing the layout output from Vision to pinpoint where the text blocks are too close or overlapping. Once identified, a practical workaround is to redact or remove the interfering text before submitting the page to the Vision Layout service. This can be done programmatically using PDF processing libraries like PyMuPDF or PDFPlumber.
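
    The redaction step could be sketched roughly as follows. This is only an illustration under several assumptions: the Layout result reports polygons as flat `[x1, y1, ..., x4, y4]` lists measured in inches (the usual unit for PDF input), the page is not rotated, and PyMuPDF is installed. The helper names `polygon_to_pdf_rect` and `redact_regions` are hypothetical and not part of any SDK.

```python
from typing import Iterable, Sequence, Tuple

Rect = Tuple[float, float, float, float]

# PDF user space uses 72 points per inch.
POINTS_PER_INCH = 72.0


def polygon_to_pdf_rect(polygon: Sequence[float], padding_pt: float = 2.0) -> Rect:
    """Convert a flat [x1, y1, ..., x4, y4] polygon in inches (assumed unit)
    into a padded (x0, y0, x1, y1) bounding rectangle in PDF points."""
    xs = [value * POINTS_PER_INCH for value in polygon[0::2]]
    ys = [value * POINTS_PER_INCH for value in polygon[1::2]]
    return (
        min(xs) - padding_pt,
        min(ys) - padding_pt,
        max(xs) + padding_pt,
        max(ys) + padding_pt,
    )


def redact_regions(
    src_path: str, dst_path: str, page_index: int, rects: Iterable[Rect]
) -> None:
    """Paint the given rectangles white on one page and save a copy of the PDF."""
    # PyMuPDF is imported lazily so the geometry helper above works without it.
    import fitz

    doc = fitz.open(src_path)
    try:
        page = doc[page_index]
        for rect in rects:
            # Mark the area for redaction with a white fill.
            page.add_redact_annot(fitz.Rect(*rect), fill=(1, 1, 1))
        # Remove the content under all redaction annotations on this page.
        page.apply_redactions()
        doc.save(dst_path)
    finally:
        doc.close()
```

    The polygon of the interfering line, taken from the Layout output for that page, would be passed through `polygon_to_pdf_rect` and the resulting rectangle handed to `redact_regions` before the page is re-submitted.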

    Optimizing the document layout is also crucial. This includes removing unnecessary elements, ensuring consistent spacing, and possibly reformatting the document to isolate form fields more clearly. After redaction and optimization, re-submit the page to the Vision Layout service and verify whether the check-marked utilities are now correctly extracted.

    While redaction is a viable short-term fix, for broader scalability across varied documents, consider implementing a preprocessing pipeline that dynamically detects and isolates form sections based on layout heuristics or visual zoning. This approach can help maintain accuracy even when document structures vary.
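
    One possible layout heuristic, offered as a rough sketch rather than a recommended implementation: instead of relying on the order of `:selected:` tokens in `content`, pair each selected mark with the nearest word on the same row using the polygons in the result. The field names used here (`selectionMarks`, `words`, `polygon`, `state`, `content`) are assumed to match the page objects in the Layout JSON response, and the function name and `row_tolerance` default are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def bounds(polygon: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) for a flat [x1, y1, ..., x4, y4] polygon."""
    xs, ys = polygon[0::2], polygon[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def labels_for_selected_marks(page: Dict, row_tolerance: float = 0.05) -> List[str]:
    """For each selected mark, return the horizontally nearest word whose
    vertical center lies on the same row as the mark.

    Words above or below the row (such as a nearby text block) are ignored.
    """
    labels: List[str] = []
    for mark in page.get("selectionMarks", []):
        if mark.get("state") != "selected":
            continue
        mx0, my0, mx1, my1 = bounds(mark["polygon"])
        candidates = []
        for word in page.get("words", []):
            wx0, wy0, wx1, wy1 = bounds(word["polygon"])
            word_center_y = (wy0 + wy1) / 2
            # Keep only words whose center falls within the mark's row.
            if my0 - row_tolerance <= word_center_y <= my1 + row_tolerance:
                # Horizontal gap between the mark and the word (0 if they overlap).
                gap = max(wx0 - mx1, mx0 - wx1, 0.0)
                candidates.append((gap, word["content"]))
        if candidates:
            labels.append(min(candidates)[1])
    return labels
```

    This returns only the single nearest word, so a multi-word label such as "St. lights" would need the neighbouring words on the row to be joined, and the tolerance would need tuning per form.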

    For more on Azure's Vision Layout capabilities and best practices, you can refer to the official documentation: Azure AI Vision Documentation Hub

    Thanks

