Hello Marcus Denny,
Welcome to Microsoft Q&A, and thank you for posting your question here.
You asked about the Azure Document Intelligence prebuilt-layout model not working on some HTML documents.
The most reliable path is to pin to the GA service, prove the behavior with minimal repros, and then either apply a pragmatic HTML normalization or fall back to PDF. Concretely: lock your calls to v4.0 GA (service version 2024-11-30) using the current REST API or SDKs. This avoids preview-model variability and aligns with the documented HTML support in the Layout model (HTML is listed as a supported file type). See the Layout model docs: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0 and the JavaScript REST client version map (v1.x ⇒ 2024-11-30): https://free.blessedness.top/javascript/api/overview/azure/ai-document-intelligence-rest-readme?view=azure-node-latest
When invoking, set the content type to text/html and start with the default (JSON/text) output rather than Markdown, to avoid known Markdown interactions (e.g., some features are not emitted with Markdown output): https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0
Next, create three tiny HTML fixtures:
(1) a rectangular table that should pass,
(2) the failing ragged‑row case (missing <td>), and
(3) the same ragged case but using colspan to keep the grid explicit. This often sidesteps parser sensitivity to implicit ragged rows (see the docs and known behavior around table extraction variations and version differences: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0 and https://github.com/Azure/azure-sdk-for-python/issues/36834).
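As a starting point, the three fixtures could be generated with a short Python script like the sketch below. The file names, the three-column header, and the cell values are illustrative assumptions, not taken from your documents; swap in the smallest table that reproduces your failure.

```python
# Minimal sketch: build three tiny HTML fixtures for isolating table behavior.
# File names and cell contents are hypothetical placeholders.
from pathlib import Path


def wrap(rows: str) -> str:
    """Wrap table rows in a minimal, otherwise valid HTML document."""
    return (
        '<!DOCTYPE html>\n'
        '<html><head><meta charset="utf-8"><title>fixture</title></head>\n'
        f'<body><table>\n{rows}\n</table></body></html>\n'
    )


HEADER = "<tr><th>A</th><th>B</th><th>C</th></tr>"

FIXTURES = {
    # (1) Rectangular: every row has three cells.
    "1_rectangular.html": wrap(HEADER + "\n<tr><td>1</td><td>2</td><td>3</td></tr>"),
    # (2) Ragged: the data row is missing its last <td>.
    "2_ragged.html": wrap(HEADER + "\n<tr><td>1</td><td>2</td></tr>"),
    # (3) Same shape, but colspan makes the three-column grid explicit.
    "3_colspan.html": wrap(HEADER + '\n<tr><td>1</td><td colspan="2">2</td></tr>'),
}


def write_fixtures(out_dir: str = "fixtures") -> list:
    """Write each fixture to out_dir and return the created paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, html in FIXTURES.items():
        path = target / name
        path.write_text(html, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling write_fixtures() creates a fixtures folder with the three files, which you can submit one at a time so each result maps to exactly one table shape.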
If the ragged version still fails, pre-normalize your HTML by inserting empty <td> cells (or using colspan) so every row matches the header/longest row. If you can't adjust the source HTML, render the HTML to PDF server-side and submit the PDF instead; Layout support for PDF is mature and stable: https://free.blessedness.top/azure/ai-services/document-intelligence/prebuilt/layout?view=doc-intel-4.0.0
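One possible way to do the padding is sketched below in Python. It is a deliberately simple, regex-based illustration and makes several assumptions: tables are not nested, rows and cells have explicit closing tags, and rowspan is not used. The function names are hypothetical. For real-world HTML, a proper HTML parser would be a safer choice.

```python
# Minimal sketch: pad ragged table rows with empty <td> cells so every row
# in a table is as wide as that table's widest row.
# Assumes no nested tables, no rowspan, and explicit </tr> closing tags.
import re

TABLE_RE = re.compile(r"<table\b[^>]*>.*?</table\s*>", re.IGNORECASE | re.DOTALL)
ROW_RE = re.compile(r"<tr\b[^>]*>.*?</tr\s*>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[dh]\b([^>]*)>", re.IGNORECASE)
COLSPAN_RE = re.compile(r"colspan\s*=\s*[\"']?(\d+)", re.IGNORECASE)
CLOSE_ROW_RE = re.compile(r"</tr\s*>\s*$", re.IGNORECASE)


def row_width(row_html: str) -> int:
    """Count the columns a row occupies, honoring colspan."""
    width = 0
    for attrs in CELL_RE.findall(row_html):
        span = COLSPAN_RE.search(attrs)
        width += int(span.group(1)) if span else 1
    return width


def pad_table(table_html: str) -> str:
    """Append empty cells to every row narrower than the widest row."""
    rows = ROW_RE.findall(table_html)
    if not rows:
        return table_html
    target = max(row_width(row) for row in rows)

    def pad(match: re.Match) -> str:
        row = match.group(0)
        missing = target - row_width(row)
        if missing <= 0:
            return row
        return CLOSE_ROW_RE.sub("<td></td>" * missing + "</tr>", row)

    return ROW_RE.sub(pad, table_html)


def pad_ragged_rows(html: str) -> str:
    """Apply pad_table to each table in the document independently."""
    return TABLE_RE.sub(lambda m: pad_table(m.group(0)), html)
```

Run pad_ragged_rows over the HTML before submitting it. Rows that already span the full width, including rows that use colspan, are left untouched.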
Throughout, capture and keep the apim-request-id for each attempt. If the GA pipeline still rejects valid HTML, open an Azure Support case with the three minimal files, the exact API version, region, and request IDs so engineering can replicate the issue and either document a limitation or deliver a fix (SDK maintainers recommend support escalation for service-side issues; see similar guidance in service/SDK threads: https://github.com/Azure/azure-sdk-for-net/issues/43367).
You can validate this with a REST request (HTML input, GA service, JSON output) like the one below:
POST https://{your-endpoint}/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-11-30
Content
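For reference, here is a hedged Python sketch of the same call using only the standard library. The Ocp-Apim-Subscription-Key and Operation-Location header names, and the notStarted/running status values, are my assumptions about the service's REST conventions and are not stated above, so please verify them against the REST reference. The endpoint and key are placeholders you supply, and the function names are hypothetical.

```python
# Minimal sketch: submit raw HTML to prebuilt-layout (GA, 2024-11-30) and
# record the apim-request-id. Header names other than Content-Type and
# apim-request-id are assumptions to verify against the REST reference.
import json
import time
import urllib.error
import urllib.request

API_VERSION = "2024-11-30"


def build_analyze_request(endpoint: str, key: str, html_bytes: bytes) -> urllib.request.Request:
    """Build the POST request with the HTML document as the raw body."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-layout:analyze?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=html_bytes,
        method="POST",
        headers={
            "Content-Type": "text/html",
            "Ocp-Apim-Subscription-Key": key,  # assumed auth header name
        },
    )


def analyze_html(endpoint: str, key: str, html_bytes: bytes, poll_seconds: float = 2.0):
    """Submit the document, poll until done, return (request_id, result)."""
    try:
        with urllib.request.urlopen(build_analyze_request(endpoint, key, html_bytes)) as resp:
            request_id = resp.headers.get("apim-request-id")
            operation_url = resp.headers["Operation-Location"]  # assumed header
    except urllib.error.HTTPError as err:
        # Rejections carry a request ID too; keep it for the support case.
        print("failed, apim-request-id:", err.headers.get("apim-request-id"))
        raise
    print("accepted, apim-request-id:", request_id)

    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    while True:
        with urllib.request.urlopen(poll) as resp:
            body = json.load(resp)
        # Assumed in-progress status values; anything else is terminal.
        if body.get("status") not in ("notStarted", "running"):
            return request_id, body
        time.sleep(poll_seconds)
```

Calling analyze_html once per fixture file gives you one request ID and one result per table shape, which is what a support case would need.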
I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.
Please don't forget to close out the thread by upvoting and accepting this as the answer if it is helpful.