Overview

These are events emitted from the Realtime servers and sent via WebRTC datachannel to the client.

`error`

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open, we recommend to implementors to monitor and log error messages by default.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be error.
error: object - Details of the error.

{
    "event_id": "event_890",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_event",
        "message": "The 'type' field is missing.",
        "param": null,
        "event_id": "event_567"
    }
}

`session.created`

Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be session.created.
session: object - Realtime session object configuration.

{
    "event_id": "event_1234",
    "type": "session.created",
    "session": {
        "id": "sess_001",
        "object": "realtime.session",
        "model": "gpt-4o-realtime-preview-2024-12-17",
        "modalities": ["text", "audio"],
        "instructions": "...model instructions here...",
        "voice": "sage",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": null,
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 200
        },
        "tools": [],
        "tool_choice": "auto",
        "temperature": 0.8,
        "max_response_output_tokens": "inf"
    }
}

`session.updated`

Returned when a session is updated with a session.update event, unless there is an error.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be session.updated.
session: object - Realtime session object configuration.

{
    "event_id": "event_5678",
    "type": "session.updated",
    "session": {
        "id": "sess_001",
        "object": "realtime.session",
        "model": "gpt-4o-realtime-preview-2024-12-17",
        "modalities": ["text"],
        "instructions": "New instructions",
        "voice": "sage",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "whisper-1"
        },
        "turn_detection": null,
        "tools": [],
        "tool_choice": "none",
        "temperature": 0.7,
        "max_response_output_tokens": 200
    }
}

`conversation.item.created`

Returned when a conversation item is created. There are several scenarios that produce this event:

The server is generating a Response, which if successful will produce either one or two Items, which will be of type message (role assistant) or type function_call.
The input audio buffer has been committed, either by the client or the server (in server_vad mode). The server will take the content of the input audio buffer and add it to a new user message Item.
The client has sent a conversation.item.create event to add a new Item to the Conversation.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be conversation.item.created.
previous_item_id: string - The ID of the preceding item in the Conversation context, allows the client to understand the order of the conversation.
item: object - The item to add to the conversation.

{
    "event_id": "event_1920",
    "type": "conversation.item.created",
    "previous_item_id": "msg_002",
    "item": {
        "id": "msg_003",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "transcript": "hello how are you",
                "audio": "base64encodedaudio=="
            }
        ]
    }
}

`input_audio_buffer.committed`

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that will be created, thus a conversation.item.created event will also be sent to the client.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be input_audio_buffer.committed.
previous_item_id: string - The ID of the preceding item after which the new item will be inserted.
item_id: string - The ID of the user message item that will be created.

{
    "event_id": "event_1121",
    "type": "input_audio_buffer.committed",
    "previous_item_id": "msg_001",
    "item_id": "msg_002"
}

`input_audio_buffer.speech_started`

Sent by the server when in server_vad mode to indicate that speech has been detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user. The client should expect to receive a input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item that will be created when speech stops and will also be included in the input_audio_buffer.speech_stopped event (unless the client manually commits the audio buffer during VAD activation).

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be input_audio_buffer.speech_started.
audio_start_ms: integer - Milliseconds from the start of all audio written to the buffer during the session when speech was first detected. This will correspond to the beginning of audio sent to the model, and thus includes the prefix_padding_ms configured in the Session.
item_id: string - The ID of the user message item that will be created when speech stops.

{
    "event_id": "event_1516",
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 1000,
    "item_id": "msg_003"
}

`input_audio_buffer.speech_stopped`

Returned in server_vad mode when the server detects the end of speech in the audio buffer. The server will also send an conversation.item.created event with the user message item that is created from the audio buffer.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be input_audio_buffer.speech_stopped.
audio_end_ms: integer - Milliseconds since the session started when speech stopped. This will correspond to the end of audio sent to the model, and thus includes the min_silence_duration_ms configured in the Session.
item_id: string - The ID of the user message item that will be created.

{
    "event_id": "event_1718",
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 2000,
    "item_id": "msg_003"
}

`output_audio_buffer.started`

Returned when the server begins playing audio output to the client. This event is sent at the start of audio playback for a response, allowing clients to synchronize their UI with the audio stream.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be output_audio_buffer.started.
response_id: string - The ID of the response that triggered this audio output to start

{
    "event_id": "event_1718",
    "type": "output_audio_buffer.started",
    "response_id": "resp_3876604f-7a35-4c95-ad80-e553ddb9a166"
}

`output_audio_buffer.stopped`

Returned when the output audio buffer stops playing audio. This event is sent by the server when the audio output for a response has been fully generated and transmitted.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be output_audio_buffer.stopped.
response_id: string - The ID of the response that triggered this audio output.

{
    "event_id": "event_1920",
    "type": "output_audio_buffer.stopped",
    "response_id": "resp_3876604f-7a35-4c95-ad80-e553ddb9a166"
}

`response.done`

Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the response.done event will include all output Items in the Response but will omit the raw audio data.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be response.done.
response: object - The response resource.

{
    "event_id": "event_3132",
    "type": "response.done",
    "response": {
        "id": "resp_001",
        "object": "realtime.response",
        "status": "completed",
        "status_details": null,
        "output": [
            {
                "id": "msg_006",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "text",
                        "text": "Sure, how can I assist you today?"
                    }
                ]
            }
        ],
        "usage": {
            "total_tokens":275,
            "input_tokens":127,
            "output_tokens":148,
            "input_token_details": {
                "cached_tokens":384,
                "text_tokens":119,
                "audio_tokens":8,
                "cached_tokens_details": {
                    "text_tokens": 128,
                    "audio_tokens": 256
                }
            },
            "output_token_details": {
              "text_tokens":36,
              "audio_tokens":112
            }
        }
    }
}

`response.audio_transcript.delta`

Returned when the model-generated transcription of audio output is updated.

Properties

event_id: string - The unique ID of the server event.
type: string - The event type, must be response.audio_transcript.delta.
response_id: string - The ID of the response.
item_id: string - The ID of the item.
output_index: integer - The index of the output item in the response.
content_index: integer - The index of the content part in the item’s content array.
delta: string - The transcript delta.

{
    "event_id": "event_4546",
    "type": "response.audio_transcript.delta",
    "response_id": "resp_001",
    "item_id": "msg_008",
    "output_index": 0,
    "content_index": 0,
    "delta": "Hello, how can I a"
}

API Reference

Compatibility

Server Events

Overview

`error`

Properties

`session.created`

Properties

`session.updated`

Properties

`conversation.item.created`

Properties

`input_audio_buffer.committed`

Properties

`input_audio_buffer.speech_started`

Properties

`input_audio_buffer.speech_stopped`

Properties

`output_audio_buffer.started`

Properties

`output_audio_buffer.stopped`

Properties

`response.done`

Properties

`response.audio_transcript.delta`

Properties

API Reference

Compatibility

​Overview

​error

​Properties

​session.created

​Properties

​session.updated

​Properties

​conversation.item.created

​Properties

​input_audio_buffer.committed

​Properties

​input_audio_buffer.speech_started

​Properties

​input_audio_buffer.speech_stopped

​Properties

​output_audio_buffer.started

​Properties

​output_audio_buffer.stopped

​Properties

​response.done

​Properties

​response.audio_transcript.delta

​Properties

Overview

`error`

Properties

`session.created`

Properties

`session.updated`

Properties

`conversation.item.created`

Properties

`input_audio_buffer.committed`

Properties

`input_audio_buffer.speech_started`

Properties

`input_audio_buffer.speech_stopped`

Properties

`output_audio_buffer.started`

Properties

`output_audio_buffer.stopped`

Properties

`response.done`

Properties

`response.audio_transcript.delta`

Properties