Realtime API doesn’t stream response.function_call_arguments.delta events

Daria Smirnova 0 Reputation points
2025-10-24T15:22:36.25+00:00

Description:

We are using the gpt-4o-mini-realtime-preview and gpt-realtime models via the Azure OpenAI Realtime API (tested with API versions 2024-10-01-preview and 2025-04-01-preview). When streaming responses with a function call, we only receive response.text.delta events — but never any response.function_call_arguments.delta events.

We also tested the newer API version, which only returned error events and no deltas at all. This prevents us from receiving function call arguments in real time.
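For reference, this is roughly how we consume events on the client (a minimal sketch; handle_text_delta and handle_arguments_delta are illustrative stand-ins for our actual handlers, and connection comes from the openai Python SDK's client.beta.realtime.connect):

    # Sketch of our receive loop: text deltas arrive, argument deltas never do on Azure.
    async for event in connection:
        if event.type == "response.text.delta":
            handle_text_delta(event.delta)  # fires as expected
        elif event.type == "response.function_call_arguments.delta":
            handle_arguments_delta(event.delta)  # never fires in our Azure tests
        elif event.type == "response.done":
            break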

Expected behavior:

Both response.text.delta and response.function_call_arguments.delta events should stream during a Realtime session, as described in the OpenAI Realtime API documentation.

What we’ve tried:

• Tested across different API versions (2024-10-01-preview, 2025-04-01-preview).

• Verified session creation and permissions (all succeed).

• Confirmed function call definition format matches the latest documentation.

We suspect this limitation is on Azure's side: we ran the same tests against a direct OpenAI Realtime connection and successfully received all the expected deltas.

One more discovery: we also could not receive conversation.item.input_audio_transcription events containing the user speech transcript. As in the previous test, the direct OpenAI Realtime connection did not have this issue.
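For completeness, the transcription option we enable when updating the session looks roughly like this (a sketch; the whisper-1 model name is illustrative):

    # Sketch of the session update requesting input audio transcription.
    await connection.session.update(
        session={
            "modalities": ["audio", "text"],
            "input_audio_transcription": {"model": "whisper-1"},
        }
    )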

Azure OpenAI Service

2 answers

  1. Jerald Felix 7,910 Reputation points
    2025-10-25T11:28:53.2133333+00:00

    Hello Daria Smirnova,

    It appears you’re facing an issue where the Azure OpenAI Realtime API does not stream response.function_call_arguments.delta events, even though it does stream response.text.delta events when using models such as gpt-4o-mini-realtime-preview and gpt-realtime. You’ve also noticed that this works when connecting to OpenAI’s Realtime API directly but not via Azure, and that you’re unable to receive conversation.item.input_audio_transcription events for user speech transcripts through Azure either.

    Based on your description and troubleshooting steps:

    • This behavior suggests a limitation or gap in the current Azure OpenAI Realtime API implementation regarding streaming of function_call_arguments.delta events, and possibly audio transcription events as well.

    • Since it works as expected against OpenAI’s API directly but not via Azure, the issue is likely in Azure’s API layer or model deployment for the service.

    • You’ve already verified API versions, permissions, and function call definitions, which rules out most client-side misconfiguration.

    Recommended steps:

    • Report the issue via Azure support channels or the feedback form in the Azure portal, referencing your use case and the fact that the same code works directly against OpenAI.

    • Monitor the official Azure OpenAI Service documentation and release notes for updates on Realtime API capabilities.

    • If streaming function call arguments is critical in the meantime, consider using the direct OpenAI API (where your deployment requirements allow it) until Azure adds or fixes this streaming functionality; a rough sketch of that workaround follows below.
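    As a rough illustration of that workaround (a sketch only; the model name is a placeholder and the API key is read from the environment), the same openai SDK flow can point at OpenAI directly:

    import asyncio
    from openai import AsyncOpenAI  # direct OpenAI client instead of AsyncAzureOpenAI

    async def main() -> None:
        client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

        async with client.beta.realtime.connect(
            model="gpt-4o-mini-realtime-preview"  # placeholder model name
        ) as connection:
            # The rest of the flow (session.update with tools, conversation.item.create,
            # response.create, and the event loop) is unchanged from the Azure version.
            ...

    if __name__ == "__main__":
        asyncio.run(main())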

    This looks like a service-side limitation rather than something you can resolve on your end. Kindly accept the answer if you find it helpful.

    Best Regards,

    Jerald Felix


  2. Aryan Parashar 1,850 Reputation points Microsoft External Staff Moderator
    2025-10-27T05:40:29.35+00:00

    Hi Daria,

    You’re observing this behavior because OpenAI and Azure OpenAI use the same models (e.g., gpt-4o-mini-realtime-preview) but have different deployment configurations, which can lead to differences in event behavior.
    Below is example code that returns response.function_call_arguments.delta events using gpt-4o-mini-realtime-preview:

    import asyncio
    from openai import AsyncAzureOpenAI
    from azure.identity.aio import DefaultAzureCredential, get_bearer_token_provider
    
    async def main() -> None:
        # Authenticate using Entra ID
        credential = DefaultAzureCredential()
        token_provider = get_bearer_token_provider(
            credential, "https://cognitiveservices.azure.com/.default"
        )
    
        # Initialize Azure OpenAI Realtime client
        client = AsyncAzureOpenAI(
            azure_endpoint="<AI-FOUNDRY-ENDPOINT>",
            azure_ad_token_provider=token_provider,
            api_version="2024-10-01-preview",
        )
    
        # Connect to Realtime API session
        async with client.beta.realtime.connect(
            model="gpt-4o-mini-realtime-preview"
        ) as connection:
            
            # Configure session with function/tool definition
            await connection.session.update(
                session={
                    "modalities": ["text"],
                    "tools": [
                        {
                            "type": "function",
                            "name": "get_weather",
                            "description": "Get the current weather for a location",
                            "parameters": {
                                "type": "object",
                                "properties": {
                                    "location": {
                                        "type": "string",
                                        "description": "City name"
                                    },
                                    "unit": {
                                        "type": "string",
                                        "enum": ["celsius", "fahrenheit"]
                                    }
                                },
                                "required": ["location"]
                            }
                        }
                    ]
                }
            )
    
            # Send hardcoded user message that should trigger function call
            await connection.conversation.item.create(
                item={
                    "type": "message",
                    "role": "user",
                    "content": [
                        {"type": "input_text", "text": "What's the weather in San Francisco?"}
                    ],
                }
            )
    
            # Request response
            await connection.response.create()
    
            # Stream events - only print function_call_arguments.delta
            async for event in connection:
                if event.type == "response.function_call_arguments.delta":
                    print(event.delta, end="", flush=True)
                elif event.type == "response.done":
                    print("\n--- Done ---")
                    break
    
        await credential.close()
    
    if __name__ == "__main__":
        asyncio.run(main())
    

    Below is an example of the events returned by the server when the above script is executed:

    SessionCreatedEvent(
        event_id='event_001',
        session=Session(
            id='sess_001',
            instructions='[SYSTEM_INSTRUCTIONS]',
            model='gpt-4o-mini-realtime-preview-2024-12-17',
            modalities=['audio', 'text'],
            voice='alloy',
            temperature=0.8,
            tools=[]
        ),
        type='session.created'
    )
    
    SessionUpdatedEvent(
        event_id='event_002',
        session=Session(
            id='sess_001',
            modalities=['text'],
            tools=[
                Tool(
                    name='get_weather',
                    description='Get the current weather for a location',
                    parameters={
                        'type': 'object',
                        'properties': {
                            'location': {'type': 'string', 'description': 'City name'},
                            'unit': {'type': 'string', 'enum': ['celsius', 'fahrenheit']}
                        },
                        'required': ['location']
                    }
                )
            ]
        ),
        type='session.updated'
    )
    
    ConversationItemCreatedEvent(
        event_id='event_003',
        item=ConversationItem(
            id='item_001',
            type='message',
            role='user',
            content=[ConversationItemContent(type='input_text', text="What's the weather in San Francisco?")],
            status='completed'
        ),
        type='conversation.item.created'
    )
    
    ResponseCreatedEvent(
        event_id='event_004',
        response=RealtimeResponse(
            id='resp_001',
            status='in_progress',
            output=[]
        ),
        type='response.created'
    )
    
    ResponseOutputItemAddedEvent(
        event_id='event_005',
        response_id='resp_001',
        output_index=0,
        item=ConversationItem(
            id='item_002',
            type='function_call',
            name='get_weather',
            call_id='call_001',
            arguments='',
            status='in_progress'
        ),
        type='response.output_item.added'
    )
    
    ConversationItemCreatedEvent(
        event_id='event_006',
        previous_item_id='item_001',
        item=ConversationItem(
            id='item_002',
            type='function_call',
            name='get_weather',
            call_id='call_001',
            status='in_progress'
        ),
        type='conversation.item.created'
    )
    
    ResponseFunctionCallArgumentsDeltaEvent(
        event_id='event_007',
        response_id='resp_001',
        item_id='item_002',
        output_index=0,
        call_id='call_001',
        delta='{"',
        type='response.function_call_arguments.delta'
    )
    
    # [Additional delta events omitted; they stream the remaining fragments of the arguments JSON: location, ":", San, Francisco, unit, celsius, "}"]
    
    ResponseFunctionCallArgumentsDoneEvent(
        event_id='event_008',
        response_id='resp_001',
        item_id='item_002',
        output_index=0,
        call_id='call_001',
        name='get_weather',
        arguments='{"location":"San Francisco","unit":"celsius"}',
        type='response.function_call_arguments.done'
    )
    
    ResponseOutputItemDoneEvent(
        event_id='event_009',
        response_id='resp_001',
        output_index=0,
        item=ConversationItem(
            id='item_002',
            type='function_call',
            name='get_weather',
            call_id='call_001',
            arguments='{"location":"San Francisco","unit":"celsius"}',
            status='completed'
        ),
        type='response.output_item.done'
    )
    
    ResponseDoneEvent(
        event_id='event_010',
        response=RealtimeResponse(
            id='resp_001',
            status='completed',
            usage=RealtimeResponseUsage(
                input_tokens=172,
                output_tokens=21,
                total_tokens=193
            )
        ),
        type='response.done'
    )
    

    Also note that:
    The server response.text.delta event is returned when the model-generated text is updated.

    The server response.function_call_arguments.delta event is returned when the model-generated function call arguments are updated.
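    If you need the complete arguments on the client, a minimal sketch (assuming the same connection object as in the script above) is to accumulate the streamed fragments per call_id and parse them when the done event arrives:

    import json

    # Sketch: accumulate streamed argument fragments per call_id, parse on completion.
    pending_args: dict[str, str] = {}

    async for event in connection:
        if event.type == "response.function_call_arguments.delta":
            pending_args[event.call_id] = pending_args.get(event.call_id, "") + event.delta
        elif event.type == "response.function_call_arguments.done":
            arguments = json.loads(event.arguments)
            print(f"\nCall {event.call_id} arguments:", arguments)
        elif event.type == "response.done":
            break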

    Here is relevant documentation:
    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/realtime-audio-reference#server-events

    Feel free to accept this as an answer.
    Thank you for reaching out to the Microsoft Q&A portal.

