Speed of custom extraction models compared to prebuilt models

Bogdan Pechounov 125 Reputation points
2025-10-17T20:00:04.9+00:00

For a document with 13 pages, the model "prebuilt-layout" takes about 12 seconds to analyze it. However, a custom extraction model can take between 2 and 4 minutes. Is this normal? (Are fewer resources and instances allocated to custom models than to prebuilt models?)

I am assuming that the OCR and layout/table analysis is very similar between the two models, but I am not sure why the extraction step would take minutes. Could the number of fields make a difference? I would think the number of fields/classes only affects the classifier head, which is negligible compared to the base model.

Azure AI Document Intelligence

Answer accepted by question author
  1. Nikhil Jha (Accenture International Limited) 2,220 Reputation points Microsoft External Staff Moderator
    2025-10-22T07:18:27.6033333+00:00

    Hello Bogdan Pechounov,

    I understand you're seeing a significant latency difference between the prebuilt-layout model and your custom extraction model for the same 13-page document.

    The answer from the community volunteer correctly identifies the most likely cause. I'll add a few more details and a clear diagnostic step to help you confirm the root cause.

    Your assumption is correct that the initial OCR and layout analysis is similar for both. However, the prebuilt-layout model's job stops there. A custom extraction model performs one or two additional, computationally expensive steps that prebuilt-layout does not:

    1. Classification (If Composed): This is the most likely source of the added time. Before extracting, the service must run a classifier on all 13 pages to decide which of your sub-models (e.g., "Invoice," "Purchase Order") to use. This is a full model inference pass across the entire document.
    2. Extraction (Full Inference): After classification, the selected custom model (a neural or template model) must run a second full inference pass over all 13 pages to find your specific labeled fields.

    To your other question, it's not that custom models get "fewer" resources, but that they are tenant-specific and run a much more complex, multi-stage pipeline compared to the highly optimized, single-purpose prebuilt-layout model.

    Recommended Steps:

    1: Isolate the Bottleneck (Classification vs. Extraction)

    The best way to diagnose this is to determine if the latency is from the classification step or the extraction step. If you are using a composed model, try analyzing the document again, but this time, call one of your sub-models directly instead of calling the composed model ID.
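    A minimal sketch of that A/B test is below. The model IDs ("my-composed-model", "my-invoice-submodel") are hypothetical placeholders, and the fake_analyze stub stands in for a real call through the Azure SDK or REST API so the sketch runs offline; the point is simply to time the same document against the composed model and against one sub-model directly.

```python
import time

def time_analysis(model_id, analyze_fn):
    """Time one analyze call and return elapsed seconds.

    analyze_fn(model_id) should submit the document to the given
    Document Intelligence model and block until the result is ready
    (in real use, e.g. an SDK poller's .result()).
    """
    start = time.perf_counter()
    analyze_fn(model_id)
    return time.perf_counter() - start

# Stand-in for the real service call so this sketch is runnable offline.
# Replace with your actual Azure SDK or REST call in practice.
def fake_analyze(model_id):
    # Pretend the composed model spends extra time on classification.
    time.sleep(0.2 if model_id == "my-composed-model" else 0.05)

composed = time_analysis("my-composed-model", fake_analyze)
direct = time_analysis("my-invoice-submodel", fake_analyze)
print(f"composed: {composed:.2f}s, direct sub-model: {direct:.2f}s")
```

    If the direct sub-model call is consistently much faster on the same 13-page document, the classification step is your bottleneck.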

    2: Analyze the Test Results

    • If the single sub-model is still 2-4 minutes: This is less common but indicates the extraction model itself is slow. This can be due to the document's complexity (e.g., extremely high-resolution images, dense tables) or a transient regional capacity issue.
    • If the single sub-model is significantly faster (e.g., 30-60 seconds): This confirms the 2–4-minute latency is being caused by the classification step in your composed model. This is normal behavior, as the classifier must run on every page.

    3: Address Your "Number of Fields" Question

    You are correct. The number of fields you are extracting (e.g., 50 vs. 100) has a negligible impact on inference time. The number of models you compose (e.g., 5 models vs. 10) has a very large impact because it makes the classification step more complex.
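    The distinction can be made concrete with a back-of-envelope latency model. All the per-page timings below are illustrative assumptions, not measured service numbers: the shape of the formula is what matters. Note that field count does not appear at all, while the number of composed sub-models multiplies the classification term.

```python
def estimated_latency(pages, n_composed_models, ocr_s_per_page=0.5,
                      classify_s_per_page_per_model=0.5,
                      extract_s_per_page=3.0):
    """Rough latency model (illustrative constants, not measured values).

    OCR runs once per page; classification scales with the number of
    candidate sub-models in a composed model; extraction is one full
    inference pass per page. Field count is deliberately absent: it
    only affects the classifier head, not a full inference pass.
    """
    ocr = pages * ocr_s_per_page
    classify = pages * n_composed_models * classify_s_per_page_per_model
    extract = pages * extract_s_per_page
    return ocr + classify + extract

# 13 pages: direct sub-model (no classification) vs composed of 5 models
print(estimated_latency(13, 0))  # → 45.5 (extraction only)
print(estimated_latency(13, 5))  # → 78.0 (classification term grows linearly)
```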

    Please accept the answer and upvote for visibility to other community members.

    1 person found this answer helpful.

1 additional answer

  1. Divyesh Govaerdhanan 9,355 Reputation points
    2025-10-18T17:44:19.1866667+00:00

    Hello,

    Welcome to Microsoft Q&A,

    Yes, what you're seeing is normal. You're comparing layout-only analysis (OCR + structure) with custom extraction, which adds classification/routing and field-level inference. That extra work (especially with composed models) is where the minutes go.

    Here is why a custom extraction model is slower than prebuilt-layout:

    1. Different pipeline. prebuilt-layout stops after OCR/structure (pages, lines, tables, etc.), while a custom model runs layout plus a learned extractor over the pages, which is more computationally intensive.
    2. Composed/classified custom models add a full classification pass. In v4.0 (GA, 2024-11-30), composed models use an explicit classifier before routing pages to an extractor; classification runs across all pages and is billed separately, so it adds latency proportional to pages and the number of candidate models. If you composed many submodels, expect linear slowdowns.
    3. Document/region factors. Latency varies with page count, file size/DPI, and regional capacity; Microsoft calls this out as expected variability in a multitenant, async service. Establish a per-page baseline; sustained latency above ~15 s/page warrants tuning or a support ticket.
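    The per-page baseline check from point 3 can be sketched in a few lines, using the numbers from the question and the rough ~15 s/page threshold mentioned above (the threshold is a rule of thumb, not a documented SLA):

```python
def per_page_latency(total_seconds, pages):
    """Seconds per page for one analyze run."""
    return total_seconds / pages

def needs_attention(total_seconds, pages, threshold_s_per_page=15.0):
    """Flag sustained latency above ~15 s/page, the rough point at
    which tuning or a support ticket is worth considering."""
    return per_page_latency(total_seconds, pages) > threshold_s_per_page

# The 13-page document from the question:
print(per_page_latency(12, 13))   # prebuilt-layout, ~0.9 s/page
print(per_page_latency(240, 13))  # custom model at 4 min, ~18.5 s/page
print(needs_attention(240, 13))   # → True
```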

    https://free.blessedness.top/en-us/azure/ai-services/document-intelligence/concept/troubleshoot-latency?view=doc-intel-4.0.0

    Please Upvote and accept the answer if it helps!!

    1 person found this answer helpful.
