Azure Document Intelligence - Automating Monthly Invoice Extraction from Yearly PDFs

PGS-7643 1 Reputation point
2025-10-20T02:50:48.6366667+00:00

I need to extract invoice data from multi-page yearly PDFs, where each set of pages corresponds to a different month. The PDFs can contain images. Is there a better option than splitting the PDF on fixed page numbers and sending each part to Document Intelligence? The challenge with that option is that it breaks when the invoice template changes.

Azure AI Document Intelligence

1 answer

  1. Sina Salam 25,761 Reputation points Volunteer Moderator
    2025-10-22T10:39:17.5933333+00:00

    Hello PGS-7643,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Having read through your question:

    Document Intelligence does not automatically split repeated instances of the same document type within a single file. You either need to use the classifier's splitting feature correctly, or pre-process the file so that each input contains one invoice.

    Recommended: a Document Intelligence classification + splitting pipeline. The goal is to detect invoice boundaries inside the yearly PDF, split it into individual invoice documents (page ranges), and then run invoice extraction on each invoice.

    Fallback: pre-process with OCR text detection + heuristic splitting. If classification + splitting isn't practical, for example because you can't train a classifier (you lack labeled multi-document inputs), and the invoices have clear repeating headers or keywords (e.g., "Invoice No:" appears at the top of each invoice), a robust heuristic pre-splitter can work.
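
    As a rough illustration of the heuristic fallback, here is a minimal Python sketch. It assumes you already have the OCR text of each page as a list of strings (for example from a layout or read pass), and that every invoice begins with a marker such as "Invoice No:" near the top of its first page. The names `INVOICE_START`, `find_invoice_ranges`, and `to_pages_param` are hypothetical helpers, not part of any SDK, and the regular expression would need adjusting whenever the template changes. The "1-2" style range strings are meant to be passed on as a page selection for the per-invoice analysis call; please check the exact option against the current API reference.

```python
import re
from typing import List, Tuple

# Hypothetical start-of-invoice marker; adapt it to your own templates.
INVOICE_START = re.compile(r"\binvoice\s*(?:no\b|number\b|#)", re.IGNORECASE)


def find_invoice_ranges(page_texts: List[str], header_lines: int = 10) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges, one per invoice.

    A page starts a new invoice when the marker appears within its first
    `header_lines` non-empty lines. Pages before the first marker are
    attached to the first range so that no page is dropped.
    """
    starts = []
    for page_number, text in enumerate(page_texts, start=1):
        top = [line for line in text.splitlines() if line.strip()][:header_lines]
        if any(INVOICE_START.search(line) for line in top):
            starts.append(page_number)

    if not starts:
        # No marker found: treat the whole file as one document.
        return [(1, len(page_texts))] if page_texts else []

    starts[0] = 1  # keep any leading cover pages with the first invoice
    ranges = []
    for i, first in enumerate(starts):
        last = starts[i + 1] - 1 if i + 1 < len(starts) else len(page_texts)
        ranges.append((first, last))
    return ranges


def to_pages_param(page_range: Tuple[int, int]) -> str:
    """Format a range as a page-selection string such as '3' or '4-5'."""
    first, last = page_range
    return str(first) if first == last else f"{first}-{last}"


# Made-up sample text standing in for real OCR output.
sample_pages = [
    "Invoice No: 1001\nJanuary",
    "continued items\nTotal",
    "Invoice No: 1002\nFebruary",
    "INVOICE NUMBER 1003\nMarch",
    "line items",
]
ranges = find_invoice_ranges(sample_pages)
page_params = [to_pages_param(r) for r in ranges]
```

    With the sample text above, `ranges` groups pages 1-2, 3, and 4-5 into three invoices. Because the boundaries come from text on the page and not from fixed page counts, a month with a longer invoice does not shift the following splits, although a changed header wording would still require updating the marker.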

    So, in a nutshell:

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close out the thread by upvoting this answer and accepting it if it is helpful.
