Hi,
Thanks for reaching out to Microsoft Q&A.
No, this is not the intended behaviour of Document Intelligence, but it seems to be a known limitation in certain layout or prebuilt models. When a completely blank page (page 3 in your PDF?) appears, the service may assume it has reached the end of meaningful content, especially if the blank page occurs early in the document. That would explain the premature truncation during processing.
I think Document Intelligence's page segmentation logic may mistakenly treat the blank page as a signal to stop processing, especially with models that are optimized for structured documents such as invoices, contracts, or layouts.
Workarounds you can try:
Preprocess the PDF to remove blank pages (recommended)
Use a preprocessing step to automatically remove blank pages before uploading:
- With PyMuPDF, pdfplumber, or pdfminer.six in Python, you can detect and drop pages with no text or pixel content. This keeps your pipeline dynamic and avoids hardcoding page ranges.
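As a rough sketch of that preprocessing step, assuming PyMuPDF is installed: a page is treated as blank when it has no extractable text and almost no dark pixels once rendered. The function names and both thresholds are my own illustrative choices, not tuned values, so adjust them for noisy scans.

```python
def is_blank(text, gray_samples, dark_threshold=200, max_dark_fraction=0.001):
    """Return True when a page has no text and (almost) no dark pixels.

    gray_samples is a bytes-like sequence of 8-bit grayscale values.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    if text.strip():
        return False
    if not gray_samples:
        return True
    dark = sum(1 for value in gray_samples if value < dark_threshold)
    return dark / len(gray_samples) <= max_dark_fraction


def drop_blank_pages(src_path, dst_path):
    """Write a copy of src_path without blank pages; return the kept page indices."""
    import fitz  # PyMuPDF, imported lazily so is_blank() works without it

    doc = fitz.open(src_path)
    keep = []
    for page in doc:
        # Render at half scale in grayscale: enough to spot ink, cheap to scan.
        pix = page.get_pixmap(matrix=fitz.Matrix(0.5, 0.5), colorspace=fitz.csGRAY)
        if not is_blank(page.get_text(), pix.samples):
            keep.append(page.number)  # 0-based page index
    if keep:
        doc.select(keep)  # keep only the listed pages, in order
        doc.save(dst_path)
    doc.close()
    return keep
```

Calling drop_blank_pages("input.pdf", "cleaned.pdf") and uploading the cleaned file should avoid the blank page entirely. The returned indices let you map results back to the original page numbers if you need them.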
Switch to the Read or Layout model
If you are not using custom models and only need text, try the prebuilt Read model or the Layout model, which tend to be more tolerant of blank pages and process all pages unless explicitly told to skip.
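A minimal sketch of that call, assuming the azure-ai-formrecognizer Python package (version 3.2 or later) and a key-based resource; the endpoint, key, and helper names are placeholders of mine. Checking which page numbers come back is a quick way to confirm whether anything was dropped.

```python
def pages_returned(result):
    """Sorted 1-based page numbers present in an analysis result."""
    return sorted(page.page_number for page in result.pages)


def analyze_with_read(path, endpoint, key):
    """Run the prebuilt Read model on a local file and return the result."""
    # Lazy imports so pages_returned() can be used without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(path, "rb") as f:
        # Use "prebuilt-layout" here instead to try the Layout model.
        poller = client.begin_analyze_document("prebuilt-read", document=f)
        return poller.result()
```

For example, pages_returned(analyze_with_read("input.pdf", endpoint, key)) gives the list of pages the service actually processed, which you can compare against the page count of the PDF.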
Use OCR fallback logic
If only some pages are missed, run a secondary OCR pass (Azure’s Read API or Tesseract) on the missing pages and stitch the results together manually.
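The stitching part could look roughly like this. It is only a sketch: ocr_page is a hypothetical callable that you supply (for example a wrapper around the Read API or Tesseract for a single page), and primary_text is assumed to be a dict of page number to text from the first pass.

```python
def missing_pages(total_pages, returned_page_numbers):
    """1-based page numbers that the first pass did not return."""
    returned = set(returned_page_numbers)
    return [n for n in range(1, total_pages + 1) if n not in returned]


def stitch_with_fallback(total_pages, primary_text, ocr_page):
    """Fill gaps in primary_text using ocr_page and return text in page order.

    primary_text: dict mapping 1-based page number to text from the first pass.
    ocr_page: hypothetical callable taking a page number and returning text.
    """
    merged = dict(primary_text)
    for n in missing_pages(total_pages, primary_text):
        merged[n] = ocr_page(n)  # secondary OCR only for the missed pages
    return [merged[n] for n in range(1, total_pages + 1)]
```

Because only the missed pages go through the second pass, the extra cost stays small, and a page that really is blank simply comes back as an empty string.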
Please 'Upvote' (thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.