Server events

These are events emitted from the OpenAI Realtime WebSocket server to the client.source

error

Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open, we recommend to implementors to monitor and log error messages by default.source

The unique ID of the server event.source

The event type, must be error.source

Details of the error.source

OBJECT error
1
2
3
4
5
6
7
8
9
10
11
{
    "event_id": "event_890",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_event",
        "message": "The 'type' field is missing.",
        "param": null,
        "event_id": "event_567"
    }
}

session.created

Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.source

The unique ID of the server event.source

The event type, must be session.created.source

Realtime session object configuration.source

OBJECT session.created
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
{
    "event_id": "event_1234",
    "type": "session.created",
    "session": {
        "id": "sess_001",
        "object": "realtime.session",
        "model": "gpt-4o-realtime-preview-2024-10-01",
        "modalities": ["text", "audio"],
        "instructions": "...model instructions here...",
        "voice": "sage",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": null,
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 200
        },
        "tools": [],
        "tool_choice": "auto",
        "temperature": 0.8,
        "max_response_output_tokens": "inf"
    }
}

session.updated

Returned when a session is updated with a session.update event, unless there is an error.source

The unique ID of the server event.source

The event type, must be session.updated.source

Realtime session object configuration.source

OBJECT session.updated
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
{
    "event_id": "event_5678",
    "type": "session.updated",
    "session": {
        "id": "sess_001",
        "object": "realtime.session",
        "model": "gpt-4o-realtime-preview-2024-10-01",
        "modalities": ["text"],
        "instructions": "New instructions",
        "voice": "sage",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "whisper-1"
        },
        "turn_detection": null,
        "tools": [],
        "tool_choice": "none",
        "temperature": 0.7,
        "max_response_output_tokens": 200
    }
}

conversation.created

Returned when a conversation is created. Emitted right after session creation.source

The unique ID of the server event.source

The event type, must be conversation.created.source

The conversation resource.source

OBJECT conversation.created
1
2
3
4
5
6
7
8
{
    "event_id": "event_9101",
    "type": "conversation.created",
    "conversation": {
        "id": "conv_001",
        "object": "realtime.conversation"
    }
}

conversation.item.created

Returned when a conversation item is created. There are several scenarios that produce this event:source

  • The server is generating a Response, which if successful will produce either one or two Items, which will be of type message (role assistant) or type function_call.
  • The input audio buffer has been committed, either by the client or the server (in server_vad mode). The server will take the content of the input audio buffer and add it to a new user message Item.
  • The client has sent a conversation.item.create event to add a new Item to the Conversation.

The unique ID of the server event.source

The event type, must be conversation.item.created.source

The ID of the preceding item in the Conversation context, allows the client to understand the order of the conversation.source

The item to add to the conversation.source

OBJECT conversation.item.created
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
{
    "event_id": "event_1920",
    "type": "conversation.item.created",
    "previous_item_id": "msg_002",
    "item": {
        "id": "msg_003",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "transcript": "hello how are you",
                "audio": "base64encodedaudio=="
            }
        ]
    }
}

conversation.item.input_audio_transcription.completed

This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in server_vad mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events.source

Realtime API models accept audio natively, and thus input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model, currently always whisper-1. Thus the transcript may diverge somewhat from the model's interpretation, and should be treated as a rough guide.source

The unique ID of the server event.source

The event type, must be conversation.item.input_audio_transcription.completed.source

The ID of the user message item containing the audio.source

The index of the content part containing the audio.source

The transcribed text.source

OBJECT conversation.item.input_audio_transcription.completed
1
2
3
4
5
6
7
{
    "event_id": "event_2122",
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "msg_003",
    "content_index": 0,
    "transcript": "Hello, how are you?"
}

conversation.item.input_audio_transcription.failed

Returned when input audio transcription is configured, and a transcription request for a user message failed. These events are separate from other error events so that the client can identify the related Item.source

The unique ID of the server event.source

The event type, must be conversation.item.input_audio_transcription.failed.source

The ID of the user message item.source

The index of the content part containing the audio.source

Details of the transcription error.source

OBJECT conversation.item.input_audio_transcription.failed
1
2
3
4
5
6
7
8
9
10
11
12
{
    "event_id": "event_2324",
    "type": "conversation.item.input_audio_transcription.failed",
    "item_id": "msg_003",
    "content_index": 0,
    "error": {
        "type": "transcription_error",
        "code": "audio_unintelligible",
        "message": "The audio could not be transcribed.",
        "param": null
    }
}

conversation.item.truncated

Returned when an earlier assistant audio message item is truncated by the client with a conversation.item.truncate event. This event is used to synchronize the server's understanding of the audio with the client's playback.source

This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user.source

The unique ID of the server event.source

The event type, must be conversation.item.truncated.source

The ID of the assistant message item that was truncated.source

The index of the content part that was truncated.source

The duration up to which the audio was truncated, in milliseconds.source

OBJECT conversation.item.truncated
1
2
3
4
5
6
7
{
    "event_id": "event_2526",
    "type": "conversation.item.truncated",
    "item_id": "msg_004",
    "content_index": 0,
    "audio_end_ms": 1500
}

conversation.item.deleted

Returned when an item in the conversation is deleted by the client with a conversation.item.delete event. This event is used to synchronize the server's understanding of the conversation history with the client's view.source

The unique ID of the server event.source

The event type, must be conversation.item.deleted.source

The ID of the item that was deleted.source

OBJECT conversation.item.deleted
1
2
3
4
5
{
    "event_id": "event_2728",
    "type": "conversation.item.deleted",
    "item_id": "msg_005"
}

input_audio_buffer.committed

Returned when an input audio buffer is committed, either by the client or automatically in server VAD mode. The item_id property is the ID of the user message item that will be created, thus a conversation.item.created event will also be sent to the client.source

The unique ID of the server event.source

The event type, must be input_audio_buffer.committed.source

The ID of the preceding item after which the new item will be inserted.source

The ID of the user message item that will be created.source

OBJECT input_audio_buffer.committed
1
2
3
4
5
6
{
    "event_id": "event_1121",
    "type": "input_audio_buffer.committed",
    "previous_item_id": "msg_001",
    "item_id": "msg_002"
}

input_audio_buffer.cleared

Returned when the input audio buffer is cleared by the client with a input_audio_buffer.clear event.source

The unique ID of the server event.source

The event type, must be input_audio_buffer.cleared.source

OBJECT input_audio_buffer.cleared
1
2
3
4
{
    "event_id": "event_1314",
    "type": "input_audio_buffer.cleared"
}

input_audio_buffer.speech_started

Sent by the server when in server_vad mode to indicate that speech has been detected in the audio buffer. This can happen any time audio is added to the buffer (unless speech is already detected). The client may want to use this event to interrupt audio playback or provide visual feedback to the user.source

The client should expect to receive a input_audio_buffer.speech_stopped event when speech stops. The item_id property is the ID of the user message item that will be created when speech stops and will also be included in the input_audio_buffer.speech_stopped event (unless the client manually commits the audio buffer during VAD activation).source

The unique ID of the server event.source

The event type, must be input_audio_buffer.speech_started.source

Milliseconds from the start of all audio written to the buffer during the session when speech was first detected. This will correspond to the beginning of audio sent to the model, and thus includes the prefix_padding_ms configured in the Session.source

The ID of the user message item that will be created when speech stops.source

OBJECT input_audio_buffer.speech_started
1
2
3
4
5
6
{
    "event_id": "event_1516",
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 1000,
    "item_id": "msg_003"
}

input_audio_buffer.speech_stopped

Returned in server_vad mode when the server detects the end of speech in the audio buffer. The server will also send an conversation.item.created event with the user message item that is created from the audio buffer.source

The unique ID of the server event.source

The event type, must be input_audio_buffer.speech_stopped.source

Milliseconds since the session started when speech stopped. This will correspond to the end of audio sent to the model, and thus includes the min_silence_duration_ms configured in the Session.source

The ID of the user message item that will be created.source

OBJECT input_audio_buffer.speech_stopped
1
2
3
4
5
6
{
    "event_id": "event_1718",
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 2000,
    "item_id": "msg_003"
}

response.created

Returned when a new Response is created. The first event of response creation, where the response is in an initial state of in_progress.source

The unique ID of the server event.source

The event type, must be response.created.source

The response resource.source

OBJECT response.created
1
2
3
4
5
6
7
8
9
10
11
12
{
    "event_id": "event_2930",
    "type": "response.created",
    "response": {
        "id": "resp_001",
        "object": "realtime.response",
        "status": "in_progress",
        "status_details": null,
        "output": [],
        "usage": null
    }
}

response.done

Returned when a Response is done streaming. Always emitted, no matter the final state. The Response object included in the response.done event will include all output Items in the Response but will omit the raw audio data.source

The unique ID of the server event.source

The event type, must be response.done.source

The response resource.source

OBJECT response.done
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
{
    "event_id": "event_3132",
    "type": "response.done",
    "response": {
        "id": "resp_001",
        "object": "realtime.response",
        "status": "completed",
        "status_details": null,
        "output": [
            {
                "id": "msg_006",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "text",
                        "text": "Sure, how can I assist you today?"
                    }
                ]
            }
        ],
        "usage": {
            "total_tokens":275,
            "input_tokens":127,
            "output_tokens":148,
            "input_token_details": {
                "cached_tokens":384,
                "text_tokens":119,
                "audio_tokens":8,
                "cached_tokens_details": {
                    "text_tokens": 128,
                    "audio_tokens": 256
                }
            },
            "output_token_details": {
              "text_tokens":36,
              "audio_tokens":112
            }
        }
    }
}

response.output_item.added

Returned when a new Item is created during Response generation.source

The unique ID of the server event.source

The event type, must be response.output_item.added.source

The ID of the Response to which the item belongs.source

The index of the output item in the Response.source

The item to add to the conversation.source

OBJECT response.output_item.added
1
2
3
4
5
6
7
8
9
10
11
12
13
14
{
    "event_id": "event_3334",
    "type": "response.output_item.added",
    "response_id": "resp_001",
    "output_index": 0,
    "item": {
        "id": "msg_007",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": []
    }
}

response.output_item.done

Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.source

The unique ID of the server event.source

The event type, must be response.output_item.done.source

The ID of the Response to which the item belongs.source

The index of the output item in the Response.source

The item to add to the conversation.source

OBJECT response.output_item.done
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
{
    "event_id": "event_3536",
    "type": "response.output_item.done",
    "response_id": "resp_001",
    "output_index": 0,
    "item": {
        "id": "msg_007",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "Sure, I can help with that."
            }
        ]
    }
}

response.content_part.added

Returned when a new content part is added to an assistant message item during response generation.source

The unique ID of the server event.source

The event type, must be response.content_part.added.source

The ID of the response.source

The ID of the item to which the content part was added.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

The content part that was added.source

OBJECT response.content_part.added
1
2
3
4
5
6
7
8
9
10
11
12
{
    "event_id": "event_3738",
    "type": "response.content_part.added",
    "response_id": "resp_001",
    "item_id": "msg_007",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "text",
        "text": ""
    }
}

response.content_part.done

Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.source

The unique ID of the server event.source

The event type, must be response.content_part.done.source

The ID of the response.source

The ID of the item.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

The content part that is done.source

OBJECT response.content_part.done
1
2
3
4
5
6
7
8
9
10
11
12
{
    "event_id": "event_3940",
    "type": "response.content_part.done",
    "response_id": "resp_001",
    "item_id": "msg_007",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "text",
        "text": "Sure, I can help with that."
    }
}

response.text.delta

Returned when the text value of a "text" content part is updated.source

The unique ID of the server event.source

The event type, must be response.text.delta.source

The ID of the response.source

The ID of the item.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

The text delta.source

OBJECT response.text.delta
1
2
3
4
5
6
7
8
9
{
    "event_id": "event_4142",
    "type": "response.text.delta",
    "response_id": "resp_001",
    "item_id": "msg_007",
    "output_index": 0,
    "content_index": 0,
    "delta": "Sure, I can h"
}

response.text.done

Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.source

The unique ID of the server event.source

The event type, must be response.text.done.source

The ID of the response.source

The ID of the item.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

The final text content.source

OBJECT response.text.done
1
2
3
4
5
6
7
8
9
{
    "event_id": "event_4344",
    "type": "response.text.done",
    "response_id": "resp_001",
    "item_id": "msg_007",
    "output_index": 0,
    "content_index": 0,
    "text": "Sure, I can help with that."
}

response.audio_transcript.delta

Returned when the model-generated transcription of audio output is updated.source

The unique ID of the server event.source

The event type, must be response.audio_transcript.delta.source

The ID of the response.source

The ID of the item.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

The transcript delta.source

OBJECT response.audio_transcript.delta
1
2
3
4
5
6
7
8
9
{
    "event_id": "event_4546",
    "type": "response.audio_transcript.delta",
    "response_id": "resp_001",
    "item_id": "msg_008",
    "output_index": 0,
    "content_index": 0,
    "delta": "Hello, how can I a"
}

response.audio_transcript.done

Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.source

The unique ID of the server event.source

The event type, must be response.audio_transcript.done.source

The ID of the response.source

The ID of the item.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

The final transcript of the audio.source

OBJECT response.audio_transcript.done
1
2
3
4
5
6
7
8
9
{
    "event_id": "event_4748",
    "type": "response.audio_transcript.done",
    "response_id": "resp_001",
    "item_id": "msg_008",
    "output_index": 0,
    "content_index": 0,
    "transcript": "Hello, how can I assist you today?"
}

response.audio.delta

Returned when the model-generated audio is updated.source

The unique ID of the server event.source

The event type, must be response.audio.delta.source

The ID of the response.source

The ID of the item.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

Base64-encoded audio data delta.source

OBJECT response.audio.delta
1
2
3
4
5
6
7
8
9
{
    "event_id": "event_4950",
    "type": "response.audio.delta",
    "response_id": "resp_001",
    "item_id": "msg_008",
    "output_index": 0,
    "content_index": 0,
    "delta": "Base64EncodedAudioDelta"
}

response.audio.done

Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.source

The unique ID of the server event.source

The event type, must be response.audio.done.source

The ID of the response.source

The ID of the item.source

The index of the output item in the response.source

The index of the content part in the item's content array.source

OBJECT response.audio.done
1
2
3
4
5
6
7
8
{
    "event_id": "event_5152",
    "type": "response.audio.done",
    "response_id": "resp_001",
    "item_id": "msg_008",
    "output_index": 0,
    "content_index": 0
}

response.function_call_arguments.delta

Returned when the model-generated function call arguments are updated.source

The unique ID of the server event.source

The event type, must be response.function_call_arguments.delta.source

The ID of the response.source

The ID of the function call item.source

The index of the output item in the response.source

The ID of the function call.source

The arguments delta as a JSON string.source

OBJECT response.function_call_arguments.delta
1
2
3
4
5
6
7
8
9
{
    "event_id": "event_5354",
    "type": "response.function_call_arguments.delta",
    "response_id": "resp_002",
    "item_id": "fc_001",
    "output_index": 0,
    "call_id": "call_001",
    "delta": "{\"location\": \"San\""
}

response.function_call_arguments.done

Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.source

The unique ID of the server event.source

The event type, must be response.function_call_arguments.done.source

The ID of the response.source

The ID of the function call item.source

The index of the output item in the response.source

The ID of the function call.source

The final arguments as a JSON string.source

OBJECT response.function_call_arguments.done
1
2
3
4
5
6
7
8
9
{
    "event_id": "event_5556",
    "type": "response.function_call_arguments.done",
    "response_id": "resp_002",
    "item_id": "fc_001",
    "output_index": 0,
    "call_id": "call_001",
    "arguments": "{\"location\": \"San Francisco\"}"
}

rate_limits.updated

Emitted at the beginning of a Response to indicate the updated rate limits. When a Response is created some tokens will be "reserved" for the output tokens, the rate limits shown here reflect that reservation, which is then adjusted accordingly once the Response is completed.source

The unique ID of the server event.source

The event type, must be rate_limits.updated.source

List of rate limit information.source

OBJECT rate_limits.updated
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
{
    "event_id": "event_5758",
    "type": "rate_limits.updated",
    "rate_limits": [
        {
            "name": "requests",
            "limit": 1000,
            "remaining": 999,
            "reset_seconds": 60
        },
        {
            "name": "tokens",
            "limit": 50000,
            "remaining": 49950,
            "reset_seconds": 60
        }
    ]
}