Calculate the savings $ and % against PTU vs Standard pricing for OpenAI service

Jagannathan, Chitra 20 Reputation points
2025-10-16T18:01:41.7233333+00:00

I am trying to find the right calculation to determine the savings, in dollars and as a percentage, of using Azure OpenAI under PayGo/On-Demand pricing versus PTU for US East, GPT-4o, global deployment type.

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Answer accepted by question author
  1. SRILAKSHMI C 8,275 Reputation points Microsoft External Staff Moderator
    2025-10-17T05:18:02.4766667+00:00

    Hello Jagannathan, Chitra,

    Welcome to Microsoft Q&A, and thank you for reaching out to us.

    I understand that you're trying to figure out the savings both in dollars and percentage when comparing the Pay-As-You-Go (PAYG) model against Provisioned Throughput Units (PTUs) for the GPT-4o service in the US East region. Here’s a detailed guide to walk you through the calculations.

    1. Understand Pricing Structures

    Identify the cost per unit/token for the GPT-4o model under both pricing models (PAYG and PTUs). This information can usually be found on the Azure OpenAI pricing page.

    2. Calculate Your Costs

    Pay-As-You-Go Cost (PAYG): Multiply your expected usage (tokens processed or API calls) by the cost per token for the PAYG model.

    PTU Cost: Determine how many PTUs you need (based on expected traffic) and multiply that by the hourly cost of PTUs. PTUs may have minimum requirements.
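The two cost calculations in this step can be sketched in Python as follows. This is a minimal illustration rather than an official calculator: the function names, the 720-hour month (24 × 30), and every sample figure are assumptions chosen for the example, so substitute the rates from the Azure OpenAI pricing page.

```python
def payg_monthly_cost(input_tokens, output_tokens, p_in, p_out):
    """PAYG cost in USD; p_in and p_out are USD per 1M tokens."""
    return (input_tokens / 1_000_000) * p_in + (output_tokens / 1_000_000) * p_out


def ptu_monthly_cost(num_ptus, ptu_hourly, hours_per_month=24 * 30):
    """PTU cost in USD for capacity that stays provisioned all month."""
    return num_ptus * ptu_hourly * hours_per_month


# Illustrative numbers only (assumed, not official prices).
payg = payg_monthly_cost(input_tokens=80_000_000, output_tokens=20_000_000,
                         p_in=2.50, p_out=10.00)
ptu = ptu_monthly_cost(num_ptus=1, ptu_hourly=1.00)
print(f"PAYG: ${payg:,.2f}  PTU: ${ptu:,.2f}")
```

With these assumed inputs the PAYG bill is lower than the PTU bill, which shows why the comparison depends on your actual volume.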

    3. Determine Savings

    Savings in dollars:

    $$\text{Savings}_{\$} = \text{Cost}_{\text{PAYG}} - \text{Cost}_{\text{PTU}}$$

    Percentage Savings:

    $$\text{Percentage\_Savings} = \frac{\text{Savings}_{\$}}{\text{Cost}_{\text{PAYG}}} \times 100$$

    Example: If under PAYG, usage costs $1,000 and under PTU it costs $800:

    Savings $ = $1,000 − $800 = $200

    Percentage Savings = ($200 / $1,000) × 100 = 20%
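The same two formulas as a small Python helper, shown here as a sketch. The function name `savings` and the input check are assumptions added for illustration.

```python
def savings(cost_payg, cost_ptu):
    """Return (savings in USD, savings as a percentage of the PAYG cost)."""
    if cost_payg <= 0:
        raise ValueError("cost_payg must be positive")
    dollars = cost_payg - cost_ptu
    percent = dollars / cost_payg * 100
    return dollars, percent


# The example above: $1,000 under PAYG versus $800 under PTU.
dollars, percent = savings(1_000, 800)
print(f"Savings: ${dollars:,.2f} ({percent:.1f}%)")
```

A negative result means PTU would cost more than PAYG at that usage level.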

    4. Step-by-Step PTU Calculation

    Required Inputs

    **PAYG Pricing (per 1M tokens)**

    Input tokens: `P_in` (USD per 1M input tokens)

    Output tokens: `P_out` (USD per 1M output tokens)

    **PTU Pricing & Capacity**

    Hourly PTU price: `PTU_hourly` (USD / PTU / hour)

    Throughput per PTU: `TPM_per_PTU` (tokens per minute per PTU)

    **Usage Options**

    Option A: total input & output tokens per month

    Option B: expected requests per minute & average tokens per request
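If you only have Option B figures, they can be converted to the monthly totals of Option A. The sketch below assumes a steady request rate over a 30-day month; the function name and the workload numbers are hypothetical.

```python
MINUTES_PER_MONTH = 60 * 24 * 30  # 43,200 minutes in an assumed 30-day month


def monthly_tokens(requests_per_minute, tokens_per_request):
    """Estimate tokens per month from a steady request rate (Option B -> Option A)."""
    return requests_per_minute * tokens_per_request * MINUTES_PER_MONTH


# Hypothetical workload: 10 requests/min, 1,500 input and 500 output tokens each.
monthly_input = monthly_tokens(10, 1_500)
monthly_output = monthly_tokens(10, 500)
print(monthly_input, monthly_output)
```

Real traffic is rarely steady, so treat the result as an average for costing PAYG. PTU sizing should be based on peak tokens per minute instead.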
                  
    

    Formulas

    PTU monthly capacity (tokens/month)

    $$\text{tokens\_per\_PTU\_per\_month} = \text{TPM\_per\_PTU} \times 60 \times 24 \times 30$$

    PTU monthly cost

    $$\text{PTU\_monthly\_cost} = \text{PTU\_hourly} \times 24 \times 30$$

    Effective PTU cost per 1M input tokens

    $$\text{PTU\_cost\_per\_1M\_input} = \frac{\text{PTU\_monthly\_cost}}{\text{tokens\_per\_PTU\_per\_month}} \times 1{,}000{,}000$$

    PAYG cost per 1M tokens (input + output)

    $$\text{PAYG\_cost\_per\_1M\_combined} = \text{P\_in} + \text{P\_out}$$

    Savings

    $$\text{Savings\_per\_1M} = \text{PAYG\_cost\_per\_1M\_combined} - \text{PTU\_cost\_per\_1M\_input}$$

    $$\%\,\text{Savings} = \frac{\text{Savings\_per\_1M}}{\text{PAYG\_cost\_per\_1M\_combined}} \times 100$$

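    This per-1M comparison is only approximate: `PAYG_cost_per_1M_combined` is the price of 1M input tokens plus 1M output tokens, while `PTU_cost_per_1M_input` assumes the PTU runs at full capacity for the whole month. If your input and output volumes differ, compute the total monthly cost under each model and compare those totals instead.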

    5. Worked Example

    Assumptions:

    PAYG: P_in = $2.50, P_out = $10.00 → combined = $12.50 / 1M tokens

    PTU hourly = $1.00 / PTU / hour

    GPT-4o TPM_per_PTU = 2,500 tokens/min

    Calculations:

    PTU monthly capacity:

    $$2{,}500 \times 60 \times 24 \times 30 = 108{,}000{,}000 \text{ input tokens/month}$$

    PTU monthly cost:

    $$1 \times 720 = 720 \text{ USD/month}$$

    PTU cost per 1M input tokens:

    $$\frac{720}{108{,}000{,}000} \times 1{,}000{,}000 \approx 6.667 \text{ USD}$$

    Savings per 1M tokens:

    $$12.50 - 6.667 \approx 5.83 \text{ USD}$$

    % Savings:

    $$\frac{5.83}{12.50} \times 100 \approx 46.6\%$$

    Interpretation: 1 PTU at $1/hr handles ~108M input tokens/month, yielding ~47% savings versus PAYG.
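The worked example can be reproduced with a few lines of Python. This is a sketch under the same assumptions as above ($1.00 per PTU-hour, 2,500 tokens/min per PTU, a 720-hour month); the function name is hypothetical, and the rates are the example's assumptions rather than quoted prices.

```python
def ptu_cost_per_1m(ptu_hourly, tpm_per_ptu, hours=24 * 30):
    """Effective USD per 1M tokens for one PTU running at full utilization."""
    monthly_cost = ptu_hourly * hours
    monthly_capacity = tpm_per_ptu * 60 * hours  # tokens per month
    return monthly_cost / monthly_capacity * 1_000_000


p_in, p_out = 2.50, 10.00      # assumed PAYG rates from the example
payg_combined = p_in + p_out   # USD for 1M input plus 1M output tokens
ptu_per_1m = ptu_cost_per_1m(ptu_hourly=1.00, tpm_per_ptu=2_500)
saving = payg_combined - ptu_per_1m
pct = saving / payg_combined * 100
print(f"{ptu_per_1m:.3f} {saving:.2f} {pct:.1f}")
```

Carrying full precision gives about 46.7%. The 46.6% figure above comes from rounding the savings to $5.83 before dividing.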

    PTU reservation discounts reduce the effective hourly cost, so recalculate `PTU_monthly_cost` using your reservation rate.

    Output tokens consume more processing than input tokens; convert them to equivalent input tokens for accurate sizing.

    PAYG rates vary by deployment type (Global / Data Zone / Regional) — use official Azure pricing.
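To illustrate the note about output tokens, here is a rough sizing sketch. Everything in it is an assumption made for illustration: the function name, the `output_weight` value (how many input tokens one output token counts as), and the `min_ptus` floor are placeholders. Take the real ratio and minimum deployment size for your model from the official Azure documentation.

```python
import math


def ptus_required(input_tpm, output_tpm, tpm_per_ptu, output_weight, min_ptus=1):
    """Size a deployment from peak tokens per minute.

    output_weight is how many input tokens one output token counts as;
    it is model-specific, so the value passed below is a placeholder.
    """
    equivalent_input_tpm = input_tpm + output_tpm * output_weight
    return max(min_ptus, math.ceil(equivalent_input_tpm / tpm_per_ptu))


# Hypothetical peak load: 15,000 input and 5,000 output tokens per minute.
n = ptus_required(input_tpm=15_000, output_tpm=5_000,
                  tpm_per_ptu=2_500, output_weight=3, min_ptus=1)
print(n)
```

Multiplying the resulting PTU count by the hourly price and 720 hours gives the monthly PTU cost to compare against PAYG.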

    Use your actual monthly token volumes: provide input/output tokens, or requests per minute plus tokens per request, along with your PTU sizing (or I can size it for you).

    Use official Azure PAYG US East rates: I can look up the current rates and compute the comparison.

    Use your PTU reservation price: provide the PTU hourly or monthly reservation price, and I can compute the savings.

    Provide your monthly input/output tokens or PTU rate, and I can give the exact $ and % savings with all assumptions stated.

    I hope this helps. Do let me know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

    Thank you!

    1 person found this answer helpful.