Azure Document Intelligence prebuilt-layout not working on some HTML documents

Question

Marcus Denny 25

We use the Azure AI Document Intelligence layout models on PDF documents without any issues. However, when we try to run layout on HTML files, the analyser often returns an error.

I have narrowed the cause down to uneven tables in HTML docs.

The following HTML file will be analysed successfully:

<html>
<head></head>
<body>
<table>
<tbody>
<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>
<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>
</tbody>
</table>
</body>
</html>

But if I remove one of the columns in the second row, the file will not analyse successfully:

<html> 
<head></head>
<body>
<table>
<tbody>
<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>
<tr><td>Col 1</td><td>Col 2</td></tr>
</tbody>
</table>
</body>
</html>

This is valid HTML, and the Document Intelligence Studio even renders a preview of the file successfully, but when I try to analyse it I get an error such as:

UnsupportedContent

Content is not supported: The input content is corrupted or format is invalid. apim-request-id: 6fe94b20-15b1-40ea-9c33-35dad999a192

We have tried both the Studio and calling the service directly with the Azure SDK.

Sridhar M 1,220 Reputation points Microsoft External Staff Moderator

2025-10-20T05:16:22.9766667+00:00
Hi Marcus Denny,

I understand you are seeing errors from the Azure AI Document Intelligence layout model when processing HTML documents. It appears that the unevenness in table structure within your HTML files is causing the model to throw errors, specifically indicating "Content is not supported."

Here are a few things you can try to address the issue:

Ensure Consistency in Table Structure: Since you've noted that removing a column causes the analysis to fail, make sure that all rows have the same number of columns across the table. Maintaining a consistent structure is crucial for the layout analysis to function correctly.

Valid HTML Structure: While your HTML is valid, double-check that all tags are properly closed and nested. Inconsistent HTML can lead to unexpected behavior during processing.

Document Size Limit: Confirm that your HTML document does not exceed the size limit imposed by the Azure Document Intelligence service (500 MB on the paid S0 tier, 4 MB on the free F0 tier). This limit applies to both single HTML files and the total size of any nested content.

Use Azure Monitor Logs: If the issue persists, use Azure Monitor to check logs related to your document analysis. This could reveal more details about why the analysis fails.

Consider Custom Models: If your documents frequently have complex layouts that the prebuilt model struggles with, think about training a custom extraction model. This approach allows the model to learn from labeled examples specific to your documents.

Reference

https://free.blessedness.top/en-us/azure/ai-services/document-intelligence/service-limits?view=doc-intel-4.0.0

https://free.blessedness.top/en-us/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0&tabs=rest%2Csample-code#input-requirements

https://free.blessedness.top/en-us/azure/ai-services/document-intelligence/?view=doc-intel-4.0.0
Sridhar M 1,220 Reputation points Microsoft External Staff Moderator

2025-10-21T08:51:54.46+00:00

Hi Marcus Denny,

Did you get a chance to review the above response?

Thank you!
Sridhar M 1,220 Reputation points Microsoft External Staff Moderator

2025-10-22T08:24:39.78+00:00

Hi Marcus Denny

We haven’t heard from you since the last response and are just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, reply with more details and we will try to help.

Answer 1

Hello Marcus Denny,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

You asked about the Azure Document Intelligence prebuilt-layout model failing on some HTML documents.

The most reliable path is to pin to the GA service, prove the behavior with minimal repros, and then either apply a pragmatic HTML normalization or fall back to PDF. Concretely, lock your calls to v4.0 GA (service version 2024-11-30) using the current REST API or SDKs. This avoids preview-model variability and aligns with the documented HTML support in the Layout model (HTML is listed as a supported file type). See the Layout model docs and the JavaScript REST client version map (v1.x ⇒ 2024-11-30): https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0, and SDK: https://free.blessedness.top/javascript/api/overview/azure/ai-document-intelligence-rest-readme?view=azure-node-latest

When invoking, ensure the content type is text/html, and start with the default (JSON/text) output rather than Markdown to avoid known Markdown interactions (e.g., some features are not emitted with Markdown output): https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0

Next, create three tiny HTML fixtures:

(1) a rectangular table that should pass,

(2) the failing ragged‑row case (missing <td>), and

(3) the same ragged case but using colspan to keep the grid explicit. This often sidesteps parser sensitivity to implicit ragged rows (see the docs and known behavior around table extraction variations and version differences: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0 and https://github.com/Azure/azure-sdk-for-python/issues/36834).
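The three fixtures described above could be generated with a small script like the following. This is only a sketch: the file names, the `table_page` helper, and the choice of putting `colspan="2"` on the last cell are illustrative assumptions, and the markup simply mirrors the table from the question.

```python
from pathlib import Path


def table_page(rows):
    """Wrap pre-built <tr> rows in the same minimal page used in the question."""
    body = "\n".join(rows)
    return (
        "<html>\n<head></head>\n<body>\n<table>\n<tbody>\n"
        f"{body}\n"
        "</tbody>\n</table>\n</body>\n</html>\n"
    )


FULL_ROW = "<tr><td>Col 1</td><td>Col 2</td><td>Col 3</td></tr>"

# Hypothetical file names; each value is one complete HTML document.
FIXTURES = {
    # (1) rectangular table: every row has three cells
    "1_rectangular.html": table_page([FULL_ROW, FULL_ROW]),
    # (2) ragged table: second row is missing its third <td>
    "2_ragged.html": table_page(
        [FULL_ROW, "<tr><td>Col 1</td><td>Col 2</td></tr>"]
    ),
    # (3) same content, but colspan keeps the three-column grid explicit
    "3_colspan.html": table_page(
        [FULL_ROW, '<tr><td>Col 1</td><td colspan="2">Col 2</td></tr>']
    ),
}


def write_fixtures(out_dir):
    """Write all fixtures into out_dir and return the sorted file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, html in FIXTURES.items():
        (out / name).write_text(html, encoding="utf-8")
    return sorted(p.name for p in out.glob("*.html"))
```

Calling `write_fixtures("fixtures")` writes the three files, which can then be submitted one at a time so that the only variable between runs is the table shape.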

If the ragged version still fails, pre-normalize your HTML by inserting empty <td> cells (or using colspan) so every row matches the header/longest row. If you can’t adjust the source HTML, render the HTML to PDF server-side and submit the PDF instead; Layout support for PDF is mature and stable: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0
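A minimal sketch of the empty-cell normalization, using only the Python standard library, might look like this. It is regex-based and rests on several assumptions: rows have explicit closing </tr> tags, tables are not nested, and rowspan is not used. For arbitrary real-world HTML a proper HTML parser would be a safer choice. The function names are hypothetical.

```python
import re

TABLE_RE = re.compile(r"<table\b[^>]*>.*?</table>", re.IGNORECASE | re.DOTALL)
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b([^>]*)>", re.IGNORECASE)
COLSPAN_RE = re.compile(r"colspan\s*=\s*[\"']?(\d+)", re.IGNORECASE)


def row_width(row_html):
    """Count the grid columns a row occupies, honouring colspan."""
    width = 0
    for cell in CELL_RE.finditer(row_html):
        span = COLSPAN_RE.search(cell.group(1))
        width += int(span.group(1)) if span else 1
    return width


def pad_table(table_html):
    """Append empty <td> cells so every row is as wide as the widest row."""
    rows = ROW_RE.findall(table_html)
    target = max((row_width(r) for r in rows), default=0)

    def pad(match):
        missing = target - row_width(match.group(1))
        whole = match.group(0)
        # Insert the padding just before the original closing </tr> tag.
        return whole[:-5] + "<td></td>" * missing + whole[-5:]

    return ROW_RE.sub(pad, table_html)


def pad_ragged_tables(html):
    """Normalize every (non-nested) table in an HTML document."""
    return TABLE_RE.sub(lambda m: pad_table(m.group(0)), html)
```

Applied to the failing document from the question, this appends one empty `<td></td>` to the second row and leaves the already rectangular case unchanged.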

Throughout, capture and keep the apim-request-id for each attempt. If the GA pipeline still rejects valid HTML, open an Azure Support case with the three minimal files, the exact API version, region, and request IDs so engineering can replicate the problem and either document a limitation or deliver a fix (SDK maintainers recommend support escalation for service-side issues; see similar guidance in service/SDK threads: https://github.com/Azure/azure-sdk-for-net/issues/43367).
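One lightweight way to keep those IDs is to pull them out of the error text as each attempt fails. The sketch below assumes the message has the same shape as the one quoted in the question ("... apim-request-id: <GUID>"); the helper name is hypothetical.

```python
import re

# Matches the GUID that follows "apim-request-id:" in the error message.
REQUEST_ID_RE = re.compile(r"apim-request-id:\s*([0-9a-fA-F-]{36})")


def extract_request_id(error_message):
    """Return the apim-request-id found in an error message, or None."""
    match = REQUEST_ID_RE.search(error_message)
    return match.group(1) if match else None
```

Logging the returned ID next to the fixture name, API version, and region gives you the details a support case needs.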

You can validate this with a REST call (HTML input, GA service, JSON output) like the one below:

POST https://{your-endpoint}/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-11-30 
Content
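For reference, the request above could be assembled in Python along these lines. This is a sketch under stated assumptions: the endpoint host is a placeholder, authentication uses the `Ocp-Apim-Subscription-Key` header, and the raw HTML is sent as the body with a text/html content type as recommended earlier in this answer. The function only builds the request; it does not send it.

```python
import urllib.request

API_VERSION = "2024-11-30"  # v4.0 GA service version named in this answer


def build_analyze_request(endpoint, key, html_bytes):
    """Build (but do not send) a prebuilt-layout analyze request for HTML."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-layout:analyze?api-version={API_VERSION}"
    )
    headers = {
        "Content-Type": "text/html",       # raw HTML body
        "Ocp-Apim-Subscription-Key": key,  # resource key (assumed auth method)
    }
    return urllib.request.Request(
        url, data=html_bytes, headers=headers, method="POST"
    )
```

Passing the result to `urllib.request.urlopen` would submit the document. Analysis is a long-running operation, so a successful submission is expected to return an `Operation-Location` header to poll for the result rather than the result itself.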

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.
