Azure Document Intelligence prebuilt-layout not working on some HTML documents

Marcus Denny 25 Reputation points
2025-10-20T04:15:20.91+00:00

We use the Azure AI Document Intelligence layout model on PDF documents without any issues. However, when we run layout on HTML files, the analyser often returns an error.

I have narrowed the cause down to uneven tables in HTML docs.

The following HTML file is analysed successfully:

<html>
<head></head>
<body>
<table>
<tbody>
<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>
<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>
</tbody>
</table>
</body>
</html>

But if I remove one of the columns in the second row, the file will not analyse successfully:

<html>
<head></head>
<body>
<table>
<tbody>
<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>
<tr><td>Col 1</td><td>Col 2</td></tr>
</tbody>
</table>
</body>
</html>

This is valid HTML, and Document Intelligence Studio even renders a preview of the file successfully, but when I try to analyse it I get, for example:

UnsupportedContent

Content is not supported: The input content is corrupted or format is invalid. apim-request-id: 6fe94b20-15b1-40ea-9c33-35dad999a192

We have tried both the Studio and calling the service directly through the Azure SDK.

Azure AI Document Intelligence

1 answer

  1. Sina Salam 25,761 Reputation points Volunteer Moderator
    2025-10-22T10:58:33.54+00:00

    Hello Marcus Denny,

    Welcome to Microsoft Q&A, and thank you for posting your question here.

    You are asking about Azure Document Intelligence prebuilt-layout failing on some HTML documents.

    The most reliable path is to pin to the GA service, prove the behavior with minimal repros, and then either apply a pragmatic HTML normalization or fall back to PDF. Concretely, lock your calls to v4.0 GA (service version 2024-11-30) using the current REST API or SDKs. This avoids preview-model variability and aligns with the documented HTML support in the Layout model (HTML is listed as a supported file type). See the Layout model docs: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0 and the JavaScript REST client version map (v1.x ⇒ 2024-11-30): https://free.blessedness.top/javascript/api/overview/azure/ai-document-intelligence-rest-readme?view=azure-node-latest

    When invoking, ensure the content type is text/html, and start with the default (JSON/text) output rather than Markdown to avoid known Markdown interactions (e.g., some features are not emitted with Markdown output): https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0

    Next, create three tiny HTML fixtures:

    (1) a rectangular table that should pass,

    (2) the failing ragged‑row case (missing <td>), and

    (3) the same ragged case but using colspan to keep the grid explicit. This often sidesteps parser sensitivity to implicit ragged rows. For table extraction variations and version differences, see the docs: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0 and this SDK issue: https://github.com/Azure/azure-sdk-for-python/issues/36834
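
The three fixtures could be generated with a short script along these lines. This is only a sketch: the file names and the `write_fixtures` helper are made up for illustration, and the table content simply mirrors the example from the question.

```python
from pathlib import Path

# Page skeleton taken from the question; only the table rows vary per fixture.
PAGE = (
    "<html>\n<head></head>\n<body>\n<table>\n<tbody>\n"
    "{rows}\n"
    "</tbody>\n</table>\n</body>\n</html>\n"
)
FULL_ROW = "<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>"

# Hypothetical file names, one per case described above.
FIXTURES = {
    "1_rectangular.html": [FULL_ROW, FULL_ROW],
    "2_ragged.html": [FULL_ROW, "<tr><td>Col 1</td><td>Col 2</td></tr>"],
    "3_colspan.html": [FULL_ROW, '<tr><td>Col 1</td><td colspan="2">Col 2</td></tr>'],
}


def write_fixtures(directory):
    """Write the three fixture files into `directory` and return their paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, rows in FIXTURES.items():
        path = target / name
        path.write_text(PAGE.format(rows="\n".join(rows)), encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_fixtures("fixtures")` creates the files in a local `fixtures` folder, so each one can be submitted separately and its result compared.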

    If the ragged version still fails, pre-normalize your HTML by inserting empty <td> cells (or using colspan) so every row matches the header/longest row. If you can't adjust the source HTML, render the HTML to PDF server-side and submit the PDF instead; Layout support for PDF is mature and stable: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0
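
As a rough illustration of the padding approach, the sketch below uses only Python's standard `html.parser`. The `pad_ragged_rows` name is made up for this example. It assumes simple tables: no nested tables, no rowspan, and every row closed with an explicit `</tr>`. It has not been checked against the service, so treat it as a starting point rather than a confirmed fix.

```python
from html.parser import HTMLParser


class _RowPadder(HTMLParser):
    """Re-emit HTML as-is, padding short rows of each table with empty cells."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. untouched
        self.out = []    # output fragments, joined at the end
        self.rows = []   # (placeholder index, column count) per row of this table
        self.cells = 0   # columns seen so far in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.rows = []
        elif tag == "tr":
            self.cells = 0
        elif tag in ("td", "th"):
            span = dict(attrs).get("colspan") or "1"
            self.cells += int(span) if span.isdigit() else 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>, no end tag needed

    def handle_endtag(self, tag):
        if tag == "tr":
            # Reserve a slot just before </tr>; it is filled when the table ends.
            self.rows.append((len(self.out), self.cells))
            self.out.append("")
        elif tag == "table":
            width = max((count for _, count in self.rows), default=0)
            for index, count in self.rows:
                self.out[index] = "<td></td>" * (width - count)
            self.rows = []
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def pad_ragged_rows(html):
    """Return `html` with every table row padded to its table's widest row."""
    parser = _RowPadder()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Applied to the failing file from the question, this appends one empty `<td></td>` to the second row and leaves the rest of the markup unchanged.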

    Throughout, capture and keep the apim-request-id for each attempt. If the GA pipeline still rejects valid HTML, open an Azure Support case with the three minimal files, the exact API version, region, and request IDs so engineering can replicate the problem and either document a limitation or deliver a fix. SDK maintainers recommend support escalation for service-side issues; see similar guidance in service/SDK threads: https://github.com/Azure/azure-sdk-for-net/issues/43367

    You can validate with a REST request (HTML input, GA service, JSON output) like the one below:

    POST https://{your-endpoint}/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-11-30
    Content-Type: text/html
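
The same request could be sent from a script, roughly as sketched below with Python's standard library only. The function names are made up for illustration, and the endpoint and key are placeholders you must supply. The sketch assumes key-based authentication through the `Ocp-Apim-Subscription-Key` header and the usual asynchronous pattern, where the POST returns an `Operation-Location` header that is then polled. Whether a rejected HTML file surfaces as an HTTP error on the POST or as a failed operation is not confirmed here, so both cases are handled.

```python
import json
import time
import urllib.error
import urllib.request

API_VERSION = "2024-11-30"  # v4.0 GA service version


def build_analyze_request(endpoint, key, html_bytes):
    """Build (but do not send) the prebuilt-layout analyze request for HTML."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-layout:analyze?api-version={API_VERSION}"
    )
    headers = {
        "Content-Type": "text/html",
        "Ocp-Apim-Subscription-Key": key,  # assumes key-based auth
    }
    return urllib.request.Request(url, data=html_bytes, headers=headers, method="POST")


def analyze_html(endpoint, key, html_bytes, poll_seconds=2.0):
    """Submit the HTML, poll until done, and return (apim_request_id, result)."""
    request = build_analyze_request(endpoint, key, html_bytes)
    try:
        with urllib.request.urlopen(request) as response:
            request_id = response.headers.get("apim-request-id")
            operation_url = response.headers["Operation-Location"]
    except urllib.error.HTTPError as error:
        # The request was rejected outright; keep the ID for a support case.
        body = error.read().decode("utf-8", "replace")
        return error.headers.get("apim-request-id"), {"http_status": error.code, "body": body}

    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    while True:
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        if result.get("status") in ("succeeded", "failed"):
            return request_id, result
        time.sleep(poll_seconds)
```

Running `analyze_html` once per test file and logging the returned request ID next to the file name gives you the evidence a support case would need.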
    

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or need clarification.

    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

