Hello Bogdan Pechounov,
I understand you're seeing a significant latency difference between the prebuilt-layout model and your custom extraction model for the same 13-page document.
The answer from the community volunteer correctly identifies the most likely cause. I'll add a few more details and a clear diagnostic step to help you confirm the root cause.
You are right that the initial OCR and layout analysis are similar for both models. However, the prebuilt-layout model's job stops there. A custom extraction model performs one or two additional, computationally expensive steps that prebuilt-layout does not:
- Classification (If Composed): This is the most likely source of the added time. Before extracting, the service must run a classifier on all 13 pages to decide which of your sub-models (e.g., "Invoice," "Purchase Order") to use. This is a full model inference pass across the entire document.
- Extraction (Full Inference): After classification, the selected custom model (a neural or template model) must run a second full inference pass over all 13 pages to find your specific labeled fields.
As for your other question: custom models do not get "fewer" resources. They are tenant-specific and run a more complex, multi-stage pipeline than the highly optimized, single-purpose prebuilt-layout model.
Recommended Steps:
1: Isolate the Bottleneck (Classification vs. Extraction)
The best way to diagnose this is to determine if the latency is from the classification step or the extraction step. If you are using a composed model, try analyzing the document again, but this time, call one of your sub-models directly instead of calling the composed model ID.
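To make the comparison repeatable, you could time both calls with a small harness like the one below. This is only a sketch: `time_models` and the `analyze` callable are hypothetical names, and `analyze` is assumed to be your own wrapper around whatever SDK or REST call you already use, blocking until the analysis result is returned.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def time_models(analyze: Callable[[str], object],
                model_ids: Iterable[str],
                runs: int = 3) -> Dict[str, float]:
    """Return the median wall-clock latency in seconds for each model ID.

    `analyze` is a hypothetical wrapper (an assumption, not part of any SDK):
    it takes a model ID, submits the same document, and blocks until the
    analysis result is ready.
    """
    results: Dict[str, float] = {}
    for model_id in model_ids:
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            analyze(model_id)  # must wait for the full result, not just submit
            samples.append(time.perf_counter() - start)
        # The median is less sensitive to a single slow run than the mean.
        results[model_id] = statistics.median(samples)
    return results
```

Calling it with your composed model ID and one sub-model ID, for example `time_models(analyze, ["my-composed-model", "my-invoice-model"])` (both IDs are placeholders), gives you the two numbers to compare in the next step. Running each model a few times helps separate a consistent difference from a transient spike.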
2: Analyze the Test Results
- If the single sub-model is still 2-4 minutes: This is less common but indicates the extraction model itself is slow. This can be due to the document's complexity (e.g., extremely high-resolution images, dense tables) or a transient regional capacity issue.
- If the single sub-model is significantly faster (e.g., 30-60 seconds): This confirms that the 2-4 minute latency comes from the classification step in your composed model. This is normal behavior, as the classifier must run on every page.
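The two outcomes above amount to a simple comparison of the two timings. As a rough sketch, with a hypothetical function name and an arbitrary cut-off (the `ratio` of 0.5 is an assumption, not a documented threshold), it could look like this:

```python
def likely_bottleneck(composed_s: float, submodel_s: float,
                      ratio: float = 0.5) -> str:
    """Guess which stage dominates latency from two measured timings.

    composed_s: seconds taken when calling the composed model ID.
    submodel_s: seconds taken when calling one sub-model directly.
    ratio:      assumed cut-off; if the sub-model takes no more than this
                fraction of the composed time, classification is suspected.
    """
    if composed_s <= 0 or submodel_s < 0:
        raise ValueError("timings must be positive")
    if submodel_s <= ratio * composed_s:
        return "classification"
    return "extraction"
```

For instance, a composed call of 180 seconds against a direct sub-model call of 45 seconds points to classification, while 180 against 170 points to the extraction model itself.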
3: Address Your "Number of Fields" Question
You are correct. The number of fields you are extracting (e.g., 50 vs. 100) has a negligible impact on inference time. The number of models you compose (e.g., 5 vs. 10) has a much larger impact, because it makes the classification step more complex.
Documentation for reference:
- Troubleshoot latency issues with Document Intelligence
- Composed custom models (This explains the classification process that adds latency)
Please accept the answer and upvote for visibility to other community members.
