Speed of custom extraction models compared to prebuilt models

Question

Speed of custom extraction models compared to prebuilt models

Bogdan Pechounov 125

For a document with 13 pages, the model "prebuilt-layout" takes about 12 seconds to analyze it. However, with a custom extraction model, it can take between 2 and 4 minutes. Is this normal? (Are less ressources and instances allocated to custom models compared to prebuilt models)

I am assuming that the OCR and layout/table analysis is very similar between the 2 models, but I am not sure why the extraction part would take minutes. Could the number of fields make a difference? I am thinking that the number of fields/classes only affects the classifier head, which is negligible compared to the base model.

Answer accepted by question author

1 additional answer

Your answer

Answer 1

Hello Bogdan Pechounov,

I understand you're seeing a significant latency difference between the prebuilt-layout model and your custom extraction model for the same 13-page document.

The answer from the community volunteer correctly identifies the most likely cause. I'll add a few more details and a clear diagnostic step to help you confirm the root cause.

Your assumption is correct that the initial OCR and layout analysis is similar for both. However, the prebuilt-layout model's job stops there. A custom extraction model performs one or two additional, computationally expensive steps that prebuilt-layout does not:

Classification (If Composed): This is the most likely source of the added time. Before extracting, the service must run a classifier on all 13 pages to decide which of your sub-models (e.g., "Invoice," "Purchase Order") to use. This is a full model inference pass across the entire document.
Extraction (Full Inference): After classification, the selected custom model (a neural or template model) must run a second full inference pass over all 13 pages to find your specific labeled fields.

To your other question, it's not that custom models get "fewer" resources, but that they are tenant-specific and run a much more complex, multi-stage pipeline compared to the highly optimized, single-purpose prebuilt-layout model.

Recommended Steps:

1: Isolate the Bottleneck (Classification vs. Extraction)

The best way to diagnose this is to determine if the latency is from the classification step or the extraction step. If you are using a composed model, try analyzing the document again, but this time, call one of your sub-models directly instead of calling the composed model ID.

2: Analyze the Test Results

If the single sub-model is still 2-4 minutes: This is less common but indicates the extraction model itself is slow. This can be due to the document's complexity (e.g., extremely high-resolution images, dense tables) or a transient regional capacity issue.
If the single sub-model is significantly faster (e.g., 30-60 seconds): This confirms the 2–4-minute latency is being caused by the classification step in your composed model. This is normal behavior, as the classifier must run on every page.

3: Address Your "Number of Fields" Question

You are correct. The number of fields you are extracting (e.g., 50 vs. 100) has a negligible impact on inference time. The number of models you compose (e.g., 5 models vs. 10) has a very large impact because it makes the classification step more complex.

Documentation for reference:

Troubleshoot latency issues with Document Intelligence
Composed custom models (This explains the classification process that adds latency)

Please accept the answer and upvote for visibility to other community members.

Bogdan Pechounov 125 Reputation points

2025-10-22T12:48:40.8833333+00:00

Thank you for the response. I think that the prebuilt-layout model processes all the 13 pages of the document in parallel. The time is about 10s whether I send the entire document or a single page.

However, with the (single) custom extraction model, it takes about 2-3 minutes for the entire document, but 20-40s for a single page. It contains about 500 fields total, a lot of which are checkboxes.

For a custom extraction model with 200 fields, the time is about 10-20 seconds per page. (The files are about 1-2MB for each model, but maybe the image quality is different).

Answer 2

Hello,

Welcome to Microsoft Q&A,

yes, what you’re seeing is normal. You’re comparing layout-only (OCR + structure) to custom extraction, which adds classification/routing and field-level inference. That extra work (especially with composed models) is where the minutes go.

This is why a custom layout is slower than a prebuilt layout

Different pipeline
1. prebuilt-layout stops after OCR/structure (pages, lines, tables, etc.). A custom model runs layout plus a learned extractor over the pages, which is more computationally intensive.
Composed/classified custom models add a full classification pass. In v4.0 (GA, 2024-11-30), composed models use an explicit classifier before routing pages to an extractor; classification runs across all pages and is billed separately, so it adds latency proportional to pages and the number of candidate models. If you composed many submodels, expect linear slowdowns.
Document/region factors. Latency varies by page count, file size/DPI, and regional capacity—Microsoft calls this out as expected variability in a multitenant, async service. Establish a per-page baseline; sustained >~15s/page warrants tuning or a ticket.

https://free.blessedness.top/en-us/azure/ai-services/document-intelligence/concept/troubleshoot-latency?view=doc-intel-4.0.0

Please Upvote and accept the answer if it helps!!

Bogdan Pechounov 125 Reputation points

2025-10-22T12:47:57.4+00:00

Thank you for the response. I think that the prebuilt-layout model processes all the 13 pages of the document in parallel. The time is about 10s whether I send the entire document or a single page.

However, with the (single) custom extraction model, it takes about 2-3 minutes for the entire document, but 20-40s for a single page. It contains about 500 fields total, a lot of which are checkboxes.

For a custom extraction model with 200 fields, the time is about 10-20 seconds per page. (The files are about 1-2MB for each model, but maybe the image quality is different).

Share via

Speed of custom extraction models compared to prebuilt models

1 additional answer

Your answer