Hello Jagannathan, Chitra,
Welcome to Microsoft Q&A, and thank you for reaching out to us.
I understand that you're trying to work out the savings, in both dollars and percentage, when comparing the Pay-As-You-Go (PAYG) model against Provisioned Throughput Units (PTUs) for the GPT-4o service in the US East region. Here’s a step-by-step guide to the calculations.
- Understand Pricing Structures
Identify the GPT-4o price under each model: PAYG is priced per 1M input tokens and per 1M output tokens, while PTUs are priced per PTU per hour. This information can usually be found on the Azure OpenAI pricing page.
- Calculate Your Costs
Pay-As-You-Go Cost (PAYG): Multiply your expected input and output token volumes by the corresponding PAYG rates (input and output tokens are priced separately).
PTU Cost: Determine how many PTUs you need (based on expected traffic) and multiply that number by the hourly PTU price and by the number of hours in the period. PTU deployments may have minimum size requirements.
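A minimal Python sketch of these two cost calculations, assuming a 30-day (720-hour) billing month. The function names and the workload figures are hypothetical; substitute your own token volumes and the rates from the Azure pricing page.

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, matching the formulas in this answer


def payg_monthly_cost(input_tokens: float, output_tokens: float,
                      p_in: float, p_out: float) -> float:
    """PAYG cost in USD; p_in and p_out are USD per 1M input/output tokens."""
    return input_tokens / 1_000_000 * p_in + output_tokens / 1_000_000 * p_out


def ptu_monthly_cost(num_ptus: int, ptu_hourly: float,
                     hours: float = HOURS_PER_MONTH) -> float:
    """PTU cost in USD: number of PTUs x hourly price per PTU x billed hours."""
    return num_ptus * ptu_hourly * hours


# Hypothetical workload and rates; substitute your own figures.
print(payg_monthly_cost(100_000_000, 20_000_000, p_in=2.50, p_out=10.00))  # 450.0
print(ptu_monthly_cost(15, ptu_hourly=1.00))                               # 10800.0
```

PTU cost depends only on how many PTUs are deployed and for how long, not on how many tokens you actually send, which is why utilization matters so much in the comparison.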
- Determine Savings
Savings in dollars:
$$\text{Savings}_{\$} = \text{Cost}_{\text{PAYG}} - \text{Cost}_{\text{PTU}}$$
Percentage Savings:
$$\text{Percentage\_Savings} = \frac{\text{Savings}_{\$}}{\text{Cost}_{\text{PAYG}}} \times 100$$
Example: If under PAYG, usage costs $1,000 and under PTU it costs $800:
Savings $ = $1,000 − $800 = $200
Percentage Savings = ($200 / $1,000) × 100 = 20%
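The two savings formulas can be wrapped in a small helper; this is a sketch, and the function name is hypothetical.

```python
from typing import Tuple


def savings(cost_payg: float, cost_ptu: float) -> Tuple[float, float]:
    """Return (savings in USD, savings as a percentage of the PAYG cost)."""
    if cost_payg <= 0:
        raise ValueError("cost_payg must be positive to express savings as a %")
    savings_usd = cost_payg - cost_ptu
    savings_pct = savings_usd * 100 / cost_payg
    return savings_usd, savings_pct


print(savings(1_000, 800))  # (200, 20.0)
```

A negative result means the PTU option costs more than PAYG for that workload.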
- Step-by-Step PTU Calculation
Required Inputs
**PAYG Pricing (per 1M tokens)**
Input tokens: `P_in` (USD per 1M input tokens)
Output tokens: `P_out` (USD per 1M output tokens)
**PTU Pricing & Capacity**
Hourly PTU price: `PTU_hourly` (USD / PTU / hour)
Throughput per PTU: `TPM_per_PTU` (input tokens per minute per PTU)
**Usage Options**
Option A: total input & output tokens per month
Option B: expected requests per minute & average tokens per request
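If you only know your request rate (Option B), a minimal sketch like the following converts it into monthly token totals (Option A). It assumes steady traffic over a 30-day month; the function name and traffic figures are hypothetical. Because PTU capacity is measured per minute, sizing from average traffic may under-provision a bursty workload.

```python
from typing import Tuple

MINUTES_PER_MONTH = 60 * 24 * 30  # 30-day month, as used throughout this answer


def monthly_tokens_from_rate(requests_per_minute: float,
                             avg_input_tokens: float,
                             avg_output_tokens: float) -> Tuple[float, float]:
    """Convert request rate + tokens per request into (input, output) tokens
    per month, assuming the rate is sustained all month."""
    input_tokens = requests_per_minute * avg_input_tokens * MINUTES_PER_MONTH
    output_tokens = requests_per_minute * avg_output_tokens * MINUTES_PER_MONTH
    return input_tokens, output_tokens


# Hypothetical traffic: 10 requests/min, 1,000 input + 200 output tokens each
print(monthly_tokens_from_rate(10, 1_000, 200))  # (432000000, 86400000)
```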
Formulas
PTU monthly capacity (input tokens per PTU, 30-day month)
$$\text{tokens\_per\_PTU\_per\_month} = \text{TPM\_per\_PTU} \times 60 \times 24 \times 30$$
PTU monthly cost (per PTU, 720 hours)
$$\text{PTU\_monthly\_cost} = \text{PTU\_hourly} \times 24 \times 30$$
Effective PTU cost per 1M input tokens (assuming the PTU is fully utilized all month)
$$\text{PTU\_cost\_per\_1M\_input} = \frac{\text{PTU\_monthly\_cost}}{\text{tokens\_per\_PTU\_per\_month}} \times 1{,}000{,}000$$
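PAYG cost of 1M input + 1M output tokens
$$\text{PAYG\_cost\_per\_1M\_combined} = \text{P\_in} + \text{P\_out}$$
PTU cost of the same 1M input + 1M output tokens
PTU throughput is measured in input tokens, so output tokens must first be converted to input-token equivalents. Here `W_out` (a symbol introduced for this calculation) is the number of input tokens that one output token counts as; take its value for GPT-4o from the Azure PTU documentation.
$$\text{PTU\_cost\_per\_1M\_combined} = (1 + \text{W\_out}) \times \text{PTU\_cost\_per\_1M\_input}$$
Savings
$$\text{Savings\_per\_1M} = \text{PAYG\_cost\_per\_1M\_combined} - \text{PTU\_cost\_per\_1M\_combined}$$
$$\%\text{Savings} = \frac{\text{Savings\_per\_1M}}{\text{PAYG\_cost\_per\_1M\_combined}} \times 100$$
These per-1M figures assume equal input and output volumes and a fully utilized PTU. If your input/output volumes differ, or your PTUs will not run at full capacity, compute total monthly costs for each model and compare.
- Worked Example
Assumptions:
PAYG: P_in = $2.50, P_out = $10.00 → combined = $12.50 per 1M input + 1M output tokens
PTU hourly = $1.00 / PTU / hour
GPT-4o TPM_per_PTU = 2,500 input tokens/min
W_out = 1 (a best case for PTU: output tokens consume more capacity than input tokens, so the real value is higher)
Calculations:
PTU monthly capacity:
$$2{,}500 \times 60 \times 24 \times 30 = 108{,}000{,}000 \text{ input tokens/month}$$
PTU monthly cost:
$$1 \times 720 = 720 \text{ USD/month}$$
PTU cost per 1M input tokens:
$$\frac{720}{108{,}000{,}000} \times 1{,}000{,}000 \approx 6.667 \text{ USD}$$
PTU cost per 1M input + 1M output tokens:
$$(1 + 1) \times \frac{720}{108{,}000{,}000} \times 1{,}000{,}000 \approx 13.333 \text{ USD}$$
Savings per 1M input + 1M output tokens:
$$12.50 - 13.333 \approx -0.833 \text{ USD}$$
% Savings:
$$\frac{-0.833}{12.50} \times 100 \approx -6.7\%$$
Interpretation: 1 PTU at $1/hr handles up to ~108M input tokens/month at full utilization. Comparing $12.50 (1M input + 1M output tokens) directly with $6.667 (1M input tokens only) would wrongly suggest ~47% savings, because the two figures cover different token volumes. Priced like for like, PTU at this hourly rate costs at least ~6.7% more than PAYG for an even input/output mix, and more as W_out rises or utilization falls, so PTU savings depend on a lower effective rate (such as a reservation price) and sustained high utilization.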
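The PTU formulas and the worked example can be checked with a minimal Python sketch that prices the same 1M input + 1M output tokens under both models. It assumes a 30-day month; `w_out` (input-token equivalents per output token) and `utilization` are hypothetical parameters introduced here, and `w_out=1.0` is only a best-case placeholder, since output tokens consume more PTU capacity than input tokens.

```python
HOURS_PER_MONTH = 24 * 30            # 30-day month, as in the formulas above
MINUTES_PER_MONTH = 60 * HOURS_PER_MONTH


def tokens_per_ptu_per_month(tpm_per_ptu: float) -> float:
    """Input-token capacity of one PTU over a 30-day month."""
    return tpm_per_ptu * MINUTES_PER_MONTH


def ptu_cost_per_1m_input(ptu_hourly: float, tpm_per_ptu: float,
                          utilization: float = 1.0) -> float:
    """Effective USD per 1M input tokens; utilization < 1 spreads the same
    monthly cost over fewer tokens."""
    monthly_cost = ptu_hourly * HOURS_PER_MONTH
    used_tokens = tokens_per_ptu_per_month(tpm_per_ptu) * utilization
    return monthly_cost / used_tokens * 1_000_000


def compare_per_1m_combined(p_in: float, p_out: float, ptu_hourly: float,
                            tpm_per_ptu: float, w_out: float,
                            utilization: float = 1.0) -> dict:
    payg = p_in + p_out  # PAYG price of 1M input + 1M output tokens
    # Same tokens on PTU: 1M input + w_out * 1M input-token equivalents
    ptu = (1 + w_out) * ptu_cost_per_1m_input(ptu_hourly, tpm_per_ptu, utilization)
    savings = payg - ptu
    return {"payg": payg, "ptu": ptu, "savings": savings, "pct": savings * 100 / payg}


print(tokens_per_ptu_per_month(2_500))               # 108000000
print(round(ptu_cost_per_1m_input(1.00, 2_500), 3))  # 6.667
result = compare_per_1m_combined(2.50, 10.00, 1.00, 2_500, w_out=1.0)
print({k: round(v, 2) for k, v in result.items()})
# {'payg': 12.5, 'ptu': 13.33, 'savings': -0.83, 'pct': -6.67}
```

Re-running with your reservation rate as `ptu_hourly`, your model's real `w_out`, and a realistic `utilization` shows directly whether PTU comes out ahead.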
Notes:
PTU reservation discounts reduce the effective hourly cost; recalculate PTU_monthly_cost using your reservation rate.
Output tokens consume more PTU capacity than input tokens; convert them to input-token equivalents for accurate sizing.
PAYG rates vary by deployment type (Global, Data Zone, Regional); use the official Azure pricing for your deployment type and region.
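The sizing and reservation notes can be sketched in Python as follows: convert peak output traffic into input-token equivalents, round up to whole PTUs (respecting a minimum deployment), then price the PTUs at either an hourly rate or a monthly reservation rate. The `w_out=4.0`, `min_ptus=15`, the traffic figures, and the $300/month reservation price are hypothetical placeholders, not quoted Azure values; check the Azure PTU documentation and pricing for your model and deployment type.

```python
import math
from typing import Optional

HOURS_PER_MONTH = 24 * 30  # 30-day month, as used throughout this answer


def input_equivalent_tpm(input_tpm: float, output_tpm: float, w_out: float) -> float:
    """Traffic in input-token equivalents; w_out = input tokens per output token."""
    return input_tpm + w_out * output_tpm


def size_ptus(input_tpm: float, output_tpm: float, w_out: float,
              tpm_per_ptu: float, min_ptus: int = 0) -> int:
    """Whole PTUs needed for peak traffic, never below the minimum deployment."""
    needed = input_equivalent_tpm(input_tpm, output_tpm, w_out) / tpm_per_ptu
    return max(math.ceil(needed), min_ptus)


def ptu_monthly_cost(num_ptus: int, hourly_rate: Optional[float] = None,
                     monthly_rate: Optional[float] = None) -> float:
    """Monthly PTU cost from an hourly rate (x 720 h) or a monthly
    reservation rate per PTU; pass exactly one of the two."""
    if (hourly_rate is None) == (monthly_rate is None):
        raise ValueError("pass exactly one of hourly_rate or monthly_rate")
    if monthly_rate is not None:
        per_ptu = monthly_rate
    else:
        per_ptu = hourly_rate * HOURS_PER_MONTH
    return num_ptus * per_ptu


# Hypothetical peak traffic and placeholder values; replace with your own.
n = size_ptus(input_tpm=50_000, output_tpm=5_000, w_out=4.0,
              tpm_per_ptu=2_500, min_ptus=15)
print(n)                                        # 28
print(ptu_monthly_cost(n, hourly_rate=1.00))    # 20160.0
print(ptu_monthly_cost(n, monthly_rate=300.0))  # 8400.0
```

The resulting monthly PTU cost can then be compared with the PAYG cost of the same monthly token volumes using the savings formulas earlier in this answer.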
To get exact figures for your workload, you can:
Use your actual monthly token volumes: provide your monthly input/output tokens (or requests per minute and average tokens per request) and your PTU sizing, or I can size the PTUs for you.
Use the official Azure PAYG rates for US East: I can look up the current rates and compute the comparison.
Use your PTU reservation price: provide your PTU hourly or monthly reservation price, and I will compute the savings.
Once I have your monthly input/output tokens or PTU rate, I can give you the exact $ and % savings with all assumptions stated.
I hope this helps. Do let me know if you have any further queries.
If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful?".
Thank you!