Azure Document Intelligence: Misaligned or Rotated Bounding Boxes

Al T 0 Reputation points
2024-10-19T17:24:40.22+00:00

When using Azure Document Intelligence (DI), some of the bounding boxes appear misaligned or rotated relative to the content, leading to parsing inaccuracies. What are some potential solutions to rectify this issue? API Version: 2024-07-31-preview

Note: The previous API version, 2023-07-31 (GA), produces aligned polygons (without rotation), but many of the checkboxes remain disconnected from the corresponding text.

[Two user-attached images]

Azure AI Document Intelligence
1 answer

  1. Vinodh247 39,376 Reputation points MVP Volunteer Moderator
    2024-10-22T09:03:48.9433333+00:00

    Summary of Steps:

    1. Preprocessing: Preprocess your documents by de-skewing and enhancing them before sending them for analysis.
    2. Custom Training: Use Azure DI's custom model training to define form relationships, grouping, and hierarchy for better layout understanding.
    3. Leverage Layout Zoning: Train the model with specific zones for form elements.
    4. Enforce Proximity and Positional Rules: During training, define proximity rules for checkboxes and their labels to ensure proper bounding box alignment.
    5. Combine Models and Fine-tuning: Use pre-trained models and customize them with specific layout adjustments and structural constraints based on your form.

    By combining these layout understanding techniques with careful training, your custom Azure DI model will better align bounding boxes with text and checkbox fields.

    Long answer with examples:

    Document Layout Understanding in Azure Document Intelligence (Azure DI) leverages structural relationships between various form elements (e.g., checkboxes, labels, text fields) to improve document parsing. When documents follow a structured format like forms, invoices, or checklists, capturing the layout and using it as part of model training or inference allows better detection of form components and their relationships.

    Here’s a more detailed breakdown of how you can incorporate document layout understanding in your model to enhance the accuracy of bounding box alignment and text parsing:

    1. Training Custom Models with Layout Understanding:

    Document Layout Constraints: When creating a custom model in Azure Document Intelligence, you can define and map form elements (text, checkboxes, dropdowns, etc.) and train the model to recognize their expected spatial relationships. For example, you can label checkboxes and the associated text, indicating their proximity or relative alignment in the training data.

    Proximity Relationships: When checkboxes or form elements are adjacent to labels, you can specify proximity constraints during training. This means that the model will expect a certain form field to have an associated label within a given distance (e.g., a checkbox should be near the associated text). If the bounding box for the checkbox is far from the text, the model can infer that it is misaligned and attempt to adjust its position.

    Hierarchical and Positional Rules: During model training, you can define hierarchical structures. For instance, a section header (like "Are you (Seller) aware of the following:") should have specific form fields (e.g., checkboxes and text options) underneath it. This structure gives the model a better understanding of how fields relate to each other based on layout rules.

    Example:

    • In your form, you have checkboxes next to labeled items like "Burglar Alarms" or "Microwave." You would tag the checkboxes and labels together during training, specifying that the checkbox should be interpreted as "attached" to the label based on its position. This helps the model understand that, even if the checkbox bounding box is misaligned slightly, it should be linked to the label nearby.
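    If the checkbox-to-label link still has to be enforced after analysis, one option is a post-processing pass over the result. The sketch below is not a training-time setting or a Document Intelligence API. It assumes you have already extracted selection marks and text lines from the response as plain dicts carrying a flat polygon list ([x1, y1, x2, y2, ...]), and it pairs each mark with the nearest line. The function names, the sample data, and the max_distance threshold are all hypothetical.

```python
import math

def centroid(polygon):
    """Centre of a flat polygon [x1, y1, x2, y2, ...]."""
    xs, ys = polygon[0::2], polygon[1::2]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def bounds(polygon):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a flat polygon."""
    xs, ys = polygon[0::2], polygon[1::2]
    return min(xs), min(ys), max(xs), max(ys)

def distance_to_box(point, box):
    """Shortest distance from a point to a box (0 if the point is inside)."""
    px, py = point
    x0, y0, x1, y1 = box
    dx = max(x0 - px, 0.0, px - x1)
    dy = max(y0 - py, 0.0, py - y1)
    return math.hypot(dx, dy)

def link_marks_to_labels(marks, lines, max_distance):
    """Pair each mark with the closest line within max_distance, else None."""
    links = []
    for mark in marks:
        centre = centroid(mark["polygon"])
        best, best_distance = None, max_distance
        for line in lines:
            d = distance_to_box(centre, bounds(line["polygon"]))
            if d <= best_distance:
                best, best_distance = line, d
        links.append((mark, best))
    return links

# Hypothetical sample data, in the same units as the page (inches here).
marks = [
    {"state": "selected", "polygon": [1.0, 1.0, 1.2, 1.0, 1.2, 1.2, 1.0, 1.2]},
    {"state": "unselected", "polygon": [1.0, 1.5, 1.2, 1.5, 1.2, 1.7, 1.0, 1.7]},
]
lines = [
    {"content": "Burglar Alarms", "polygon": [1.3, 1.0, 2.6, 1.0, 2.6, 1.2, 1.3, 1.2]},
    {"content": "Microwave", "polygon": [1.3, 1.5, 2.1, 1.5, 2.1, 1.7, 1.3, 1.7]},
]

for mark, label in link_marks_to_labels(marks, lines, max_distance=0.5):
    print(mark["state"], "->", label["content"] if label else None)
```

    Measuring to the label's box rather than to its centre keeps long labels from looking "far away" from a checkbox that sits right at their left edge. The threshold needs tuning per form.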
    2. Utilizing Field Order and Grouping:

    Form Element Ordering: Forms often follow predictable sequences, such as a checkbox next to a description followed by another checkbox and description in a list. When training or using the model, you can enforce these ordering constraints to ensure that if a checkbox is misaligned, it can be realigned based on its expected sequence.

    Example: In a checklist form like yours, there are multiple fields under the category "The subject property has the items checked below." The model can be trained to understand that every checkbox in this section is part of the same group, so if one is misaligned or missing, the model can infer its expected position based on surrounding elements.
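    The same ordering idea can also be applied to the output yourself. As a rough sketch (again post-processing, not something configured in the model), the code below clusters elements into rows by their vertical centre and sorts each row left to right, so a checkbox and its description end up next to each other. The dict shape, the "content" key, and the tolerance value are assumptions made for illustration.

```python
def centre_y(item):
    """Vertical centre of an item's flat polygon [x1, y1, x2, y2, ...]."""
    ys = item["polygon"][1::2]
    return sum(ys) / len(ys)

def left_x(item):
    """Leftmost x coordinate of an item's polygon."""
    return min(item["polygon"][0::2])

def group_into_rows(items, tolerance):
    """Group items whose vertical centre is within tolerance of the row's
    first item, then sort each row left to right."""
    rows = []
    for item in sorted(items, key=centre_y):
        if rows and abs(centre_y(item) - centre_y(rows[-1][0])) <= tolerance:
            rows[-1].append(item)
        else:
            rows.append([item])
    return [sorted(row, key=left_x) for row in rows]

# Hypothetical mix of checkbox states and label text, deliberately unordered.
items = [
    {"content": "Microwave", "polygon": [1.3, 1.5, 2.1, 1.5, 2.1, 1.7, 1.3, 1.7]},
    {"content": "selected", "polygon": [1.0, 1.02, 1.2, 1.02, 1.2, 1.22, 1.0, 1.22]},
    {"content": "Burglar Alarms", "polygon": [1.3, 1.0, 2.6, 1.0, 2.6, 1.2, 1.3, 1.2]},
    {"content": "unselected", "polygon": [1.0, 1.5, 1.2, 1.5, 1.2, 1.7, 1.0, 1.7]},
]

for row in group_into_rows(items, tolerance=0.1):
    print([item["content"] for item in row])
```

    A row whose pattern breaks the expected "checkbox, description" sequence is then easy to flag for review.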

    3. Field Type Definitions:

    Defining Field Types: In Azure DI’s custom models, you can define field types (checkboxes, text fields, radio buttons, etc.). This helps the model differentiate between different types of form elements based on their appearance and behavior. If the form has specific structural types like radio buttons versus checkboxes, the model can use this distinction to better understand relationships and align bounding boxes accordingly.

    Example: In your form, checkboxes could be misinterpreted as text or vice versa. By specifying that a checkbox should be associated with certain binary responses (like "Yes" or "No") and is typically located in a specific location (e.g., to the left of a label), you can ensure that the model processes them as checkboxes and aligns them correctly with their labels.
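    To see whether checkboxes are actually coming back as checkboxes rather than text, you can inspect the selection marks in the analysis output. This is a minimal sketch that assumes the result has been loaded as a dict with pages, each holding a selectionMarks list whose entries carry state, confidence, and polygon. The helper name, the sample values, and the confidence cut-off are made up, so adjust the keys to whatever your API version or SDK returns.

```python
def selection_marks(analyze_result, min_confidence):
    """Collect selection marks at or above min_confidence from every page."""
    found = []
    for page in analyze_result.get("pages", []):
        for mark in page.get("selectionMarks", []):
            if mark.get("confidence", 0.0) >= min_confidence:
                found.append({
                    "page": page.get("pageNumber"),
                    "state": mark["state"],
                    "polygon": mark["polygon"],
                })
    return found

# Hypothetical, heavily trimmed result for illustration only.
sample_result = {
    "pages": [
        {
            "pageNumber": 1,
            "selectionMarks": [
                {"state": "selected", "confidence": 0.95,
                 "polygon": [1.0, 1.0, 1.2, 1.0, 1.2, 1.2, 1.0, 1.2]},
                {"state": "unselected", "confidence": 0.90,
                 "polygon": [1.0, 1.5, 1.2, 1.5, 1.2, 1.7, 1.0, 1.7]},
                {"state": "selected", "confidence": 0.20,
                 "polygon": [4.0, 4.0, 4.1, 4.0, 4.1, 4.1, 4.0, 4.1]},
            ],
        }
    ]
}

for mark in selection_marks(sample_result, min_confidence=0.5):
    print(mark["page"], mark["state"])
```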

    4. Page Layout and Document Zones:

    Zonal Layout Training: You can train the model to understand that certain areas (zones) of a document are reserved for specific content. For instance, in a contract or real estate form, the top-left corner could be designated for general information (like personal details), while the bottom section could be for signatures. This zoning helps with proper alignment and recognition of fields in structured forms.

    Example: In your form, the model can be trained to understand that checkboxes for “Property Features” should appear in a specific zone on the page, even if the input document has slight variations. If a bounding box appears outside the expected zone, the model will attempt to adjust it to the right place.
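    A zone check can also be done on the returned boxes as a sanity test. The sketch below is an illustration under assumptions, not a model feature: it takes a zone expressed as fractions of the page (x_min, y_min, x_max, y_max), normalizes each element's centre by the page width and height, and returns the elements that fall outside. The zone values, item names, and page size are hypothetical.

```python
def centre(polygon):
    """Centre of a flat polygon [x1, y1, x2, y2, ...]."""
    xs, ys = polygon[0::2], polygon[1::2]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def outside_zone(items, zone, page_width, page_height):
    """Return items whose centre lies outside zone.

    zone is (x_min, y_min, x_max, y_max) as fractions of the page, so the
    same zone works whether polygons are in inches or pixels.
    """
    x_min, y_min, x_max, y_max = zone
    flagged = []
    for item in items:
        cx, cy = centre(item["polygon"])
        fx, fy = cx / page_width, cy / page_height
        if not (x_min <= fx <= x_max and y_min <= fy <= y_max):
            flagged.append(item)
    return flagged

# Hypothetical zone for the "Property Features" checklist: full width,
# between 20% and 60% of the page height.
features_zone = (0.0, 0.2, 1.0, 0.6)
zone_items = [
    {"content": "Microwave", "polygon": [1.3, 2.9, 2.1, 2.9, 2.1, 3.1, 1.3, 3.1]},
    {"content": "Stray mark", "polygon": [1.0, 8.9, 1.2, 8.9, 1.2, 9.1, 1.0, 9.1]},
]

for item in outside_zone(zone_items, features_zone, page_width=8.5, page_height=11.0):
    print("outside expected zone:", item["content"])
```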

    5. Handling Rotated/Misaligned Documents with Layout Understanding:
    • Skew and Rotation Handling: If your documents are rotated or skewed, as yours appear to be, combine layout understanding with a preprocessing step that corrects the skew. Once the document is aligned, the model can use layout constraints (proximity, hierarchy, zoning) to interpret the elements correctly even if they were initially misaligned.
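    If re-scanning or de-skewing the image is not an option, the rotated polygons themselves can be straightened after analysis. The sketch below assumes the layout result reports a per-page angle in degrees, positive for clockwise rotation, in a coordinate system where y grows downward; please verify that convention against the API reference for your version. It rotates each point back around a chosen centre and then takes an axis-aligned box. The function names and sample values are hypothetical.

```python
import math

def derotate_polygon(polygon, angle_degrees, centre):
    """Undo a clockwise rotation of angle_degrees around centre.

    polygon is flat: [x1, y1, x2, y2, ...]; y is assumed to grow downward.
    """
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = centre
    straightened = []
    for x, y in zip(polygon[0::2], polygon[1::2]):
        dx, dy = x - cx, y - cy
        # Inverse of the clockwise rotation (y-down coordinates).
        straightened.append(cx + dx * cos_t + dy * sin_t)
        straightened.append(cy - dx * sin_t + dy * cos_t)
    return straightened

def axis_aligned_box(polygon):
    """Bounding box (x_min, y_min, x_max, y_max) of a flat polygon."""
    xs, ys = polygon[0::2], polygon[1::2]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical page metadata and a polygon tilted by roughly 5 degrees.
page = {"angle": 5.0, "width": 8.5, "height": 11.0}
tilted = [1.0, 1.0, 2.99, 1.17, 2.97, 1.37, 0.98, 1.2]

page_centre = (page["width"] / 2, page["height"] / 2)
straight = derotate_polygon(tilted, page["angle"], page_centre)
print(axis_aligned_box(straight))
```

    The straightened coordinates live in a de-rotated page frame, so apply the same transform to every element on the page (text and checkboxes alike) before comparing positions.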
    6. Contextual Relationship Mapping:

    Contextual Parsing: If your document contains complex relationships between form elements (e.g., a checkbox dependent on a selected option or condition), you can incorporate this logic into the model. This would allow it to infer when a form field (like a checkbox) should be linked to another based on context.

    Example: If a “Yes” or “No” checkbox in your form is misaligned or interpreted incorrectly, the model can use context (like the related question) to adjust the placement and ensure the checkbox aligns with the correct option.

    7. Leveraging Pre-built Models with Custom Modifications:
    • Azure DI (formerly Form Recognizer) provides prebuilt models that have basic layout understanding for common documents. You can start with these models and customize them further for your specific form, focusing on improving checkbox alignment and bounding box accuracy.
    • Enhancements: You can build on these pre-trained models by adding your form's specific layout, relationships, and structure during fine-tuning or retraining.