Overview

These are events emitted from the Realtime servers and sent via WebRTC datachannel to the client.

error

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open, we recommend to implementors to monitor and log error messages by default.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be error.
  • error: object - Details of the error.
{
    "event_id": "event_890",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_event",
        "message": "The 'type' field is missing.",
        "param": null,
        "event_id": "event_567"
    }
}

session.created

Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be session.created.
  • session: object - Realtime session object configuration.
{
    "event_id": "event_1234",
    "type": "session.created",
    "session": {
        "id": "sess_001",
        "object": "realtime.session",
        "model": "gpt-4o-realtime-preview-2024-12-17",
        "modalities": ["text", "audio"],
        "instructions": "...model instructions here...",
        "voice": "sage",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": null,
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 200
        },
        "tools": [],
        "tool_choice": "auto",
        "temperature": 0.8,
        "max_response_output_tokens": "inf"
    }
}

session.updated

Returned when a session is updated with a session.update event, unless there is an error.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be session.updated.
  • session: object - Realtime session object configuration.
{
    "event_id": "event_5678",
    "type": "session.updated",
    "session": {
        "id": "sess_001",
        "object": "realtime.session",
        "model": "gpt-4o-realtime-preview-2024-12-17",
        "modalities": ["text"],
        "instructions": "New instructions",
        "voice": "sage",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "whisper-1"
        },
        "turn_detection": null,
        "tools": [],
        "tool_choice": "none",
        "temperature": 0.7,
        "max_response_output_tokens": 200
    }
}

conversation.item.created

Returned when a conversation item is created. There are several scenarios that produce this event:

  • The server is generating a Response, which if successful will produce either one or two Items, which will be of type message (role assistant) or type function_call.
  • The input audio buffer has been committed, either by the client or the server (in server_vad mode). The server will take the content of the input audio buffer and add it to a new user message Item.
  • The client has sent a conversation.item.create event to add a new Item to the Conversation.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be conversation.item.created.
  • previous_item_id: string - The ID of the preceding item in the Conversation context, allows the client to understand the order of the conversation.
  • item: object - The item to add to the conversation.
{
    "event_id": "event_1920",
    "type": "conversation.item.created",
    "previous_item_id": "msg_002",
    "item": {
        "id": "msg_003",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "transcript": "hello how are you",
                "audio": "base64encodedaudio=="
            }
        ]
    }
}

input_audio_buffer.committed

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that will be created, thus a conversation.item.created event will also be sent to the client.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be input_audio_buffer.committed.
  • previous_item_id: string - The ID of the preceding item after which the new item will be inserted.
  • item_id: string - The ID of the user message item that will be created.
{
    "event_id": "event_1121",
    "type": "input_audio_buffer.committed",
    "previous_item_id": "msg_001",
    "item_id": "msg_002"
}

input_audio_buffer.speech_started

Sent by the server when in server_vad mode to indicate that speech has been detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user.

The client should expect to receive a input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item that will be created when speech stops and will also be included in the input_audio_buffer.speech_stopped event (unless the client manually commits the audio buffer during VAD activation).

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be input_audio_buffer.speech_started.
  • audio_start_ms: integer - Milliseconds from the start of all audio written to the buffer during the session when speech was first detected. This will correspond to the beginning of audio sent to the model, and thus includes the prefix_padding_ms configured in the Session.
  • item_id: string - The ID of the user message item that will be created when speech stops.
{
    "event_id": "event_1516",
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 1000,
    "item_id": "msg_003"
}

input_audio_buffer.speech_stopped

Returned in server_vad mode when the server detects the end of speech in the audio buffer. The server will also send an conversation.item.created event with the user message item that is created from the audio buffer.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be input_audio_buffer.speech_stopped.
  • audio_end_ms: integer - Milliseconds since the session started when speech stopped. This will correspond to the end of audio sent to the model, and thus includes the min_silence_duration_ms configured in the Session.
  • item_id: string - The ID of the user message item that will be created.
{
    "event_id": "event_1718",
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 2000,
    "item_id": "msg_003"
}

response.done

Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the response.done event will include all output Items in the Response but will omit the raw audio data.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be response.done.
  • response: object - The response resource.
{
    "event_id": "event_3132",
    "type": "response.done",
    "response": {
        "id": "resp_001",
        "object": "realtime.response",
        "status": "completed",
        "status_details": null,
        "output": [
            {
                "id": "msg_006",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "text",
                        "text": "Sure, how can I assist you today?"
                    }
                ]
            }
        ],
        "usage": {
            "total_tokens":275,
            "input_tokens":127,
            "output_tokens":148,
            "input_token_details": {
                "cached_tokens":384,
                "text_tokens":119,
                "audio_tokens":8,
                "cached_tokens_details": {
                    "text_tokens": 128,
                    "audio_tokens": 256
                }
            },
            "output_token_details": {
              "text_tokens":36,
              "audio_tokens":112
            }
        }
    }
}

response.audio_transcript.delta

Returned when the model-generated transcription of audio output is updated.

Properties

  • event_id: string - The unique ID of the server event.
  • type: string - The event type, must be response.audio_transcript.delta.
  • response_id: string - The ID of the response.
  • item_id: string - The ID of the item.
  • output_index: integer - The index of the output item in the response.
  • content_index: integer - The index of the content part in the item’s content array.
  • delta: string - The transcript delta.
{
    "event_id": "event_4546",
    "type": "response.audio_transcript.delta",
    "response_id": "resp_001",
    "item_id": "msg_008",
    "output_index": 0,
    "content_index": 0,
    "delta": "Hello, how can I a"
}