Server events
These are events emitted from the OpenAI Realtime WebSocket server to the client.
error
Returned when an error occurs, which could be a client problem or a server problem. Most errors are recoverable and the session will stay open, we recommend to implementors to monitor and log error messages by default.
1
2
3
4
5
6
7
8
9
10
11
{
"event_id": "event_890",
"type": "error",
"error": {
"type": "invalid_request_error",
"code": "invalid_event",
"message": "The 'type' field is missing.",
"param": null,
"event_id": "event_567"
}
}
session.created
Returned when a Session is created. Emitted automatically when a new connection is established as the first server event. This event will contain the default Session configuration.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
{
"event_id": "event_1234",
"type": "session.created",
"session": {
"id": "sess_001",
"object": "realtime.session",
"model": "gpt-4o-realtime-preview-2024-10-01",
"modalities": ["text", "audio"],
"instructions": "...model instructions here...",
"voice": "sage",
"input_audio_format": "pcm16",
"output_audio_format": "pcm16",
"input_audio_transcription": null,
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 200
},
"tools": [],
"tool_choice": "auto",
"temperature": 0.8,
"max_response_output_tokens": "inf"
}
}
session.updated
Returned when a session is updated with a session.update
event, unless
there is an error.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
{
"event_id": "event_5678",
"type": "session.updated",
"session": {
"id": "sess_001",
"object": "realtime.session",
"model": "gpt-4o-realtime-preview-2024-10-01",
"modalities": ["text"],
"instructions": "New instructions",
"voice": "sage",
"input_audio_format": "pcm16",
"output_audio_format": "pcm16",
"input_audio_transcription": {
"model": "whisper-1"
},
"turn_detection": null,
"tools": [],
"tool_choice": "none",
"temperature": 0.7,
"max_response_output_tokens": 200
}
}
conversation.created
Returned when a conversation is created. Emitted right after session creation.
1
2
3
4
5
6
7
8
{
"event_id": "event_9101",
"type": "conversation.created",
"conversation": {
"id": "conv_001",
"object": "realtime.conversation"
}
}
conversation.item.created
Returned when a conversation item is created. There are several scenarios that produce this event:
- The server is generating a Response, which if successful will produce
either one or two Items, which will be of type
message
(roleassistant
) or typefunction_call
. - The input audio buffer has been committed, either by the client or the
server (in
server_vad
mode). The server will take the content of the input audio buffer and add it to a new user message Item. - The client has sent a
conversation.item.create
event to add a new Item to the Conversation.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
{
"event_id": "event_1920",
"type": "conversation.item.created",
"previous_item_id": "msg_002",
"item": {
"id": "msg_003",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "user",
"content": [
{
"type": "input_audio",
"transcript": "hello how are you",
"audio": "base64encodedaudio=="
}
]
}
}
conversation.item.input_audio_transcription.completed
This event is the output of audio transcription for user audio written to the
user audio buffer. Transcription begins when the input audio buffer is
committed by the client or server (in server_vad
mode). Transcription runs
asynchronously with Response creation, so this event may come before or after
the Response events.
Realtime API models accept audio natively, and thus input transcription is a
separate process run on a separate ASR (Automatic Speech Recognition) model,
currently always whisper-1
. Thus the transcript may diverge somewhat from
the model's interpretation, and should be treated as a rough guide.
1
2
3
4
5
6
7
{
"event_id": "event_2122",
"type": "conversation.item.input_audio_transcription.completed",
"item_id": "msg_003",
"content_index": 0,
"transcript": "Hello, how are you?"
}
conversation.item.input_audio_transcription.failed
Returned when input audio transcription is configured, and a transcription
request for a user message failed. These events are separate from other
error
events so that the client can identify the related Item.
1
2
3
4
5
6
7
8
9
10
11
12
{
"event_id": "event_2324",
"type": "conversation.item.input_audio_transcription.failed",
"item_id": "msg_003",
"content_index": 0,
"error": {
"type": "transcription_error",
"code": "audio_unintelligible",
"message": "The audio could not be transcribed.",
"param": null
}
}
conversation.item.truncated
Returned when an earlier assistant audio message item is truncated by the
client with a conversation.item.truncate
event. This event is used to
synchronize the server's understanding of the audio with the client's playback.
This action will truncate the audio and remove the server-side text transcript to ensure there is no text in the context that hasn't been heard by the user.
1
2
3
4
5
6
7
{
"event_id": "event_2526",
"type": "conversation.item.truncated",
"item_id": "msg_004",
"content_index": 0,
"audio_end_ms": 1500
}
conversation.item.deleted
Returned when an item in the conversation is deleted by the client with a
conversation.item.delete
event. This event is used to synchronize the
server's understanding of the conversation history with the client's view.
1
2
3
4
5
{
"event_id": "event_2728",
"type": "conversation.item.deleted",
"item_id": "msg_005"
}
input_audio_buffer.committed
Returned when an input audio buffer is committed, either by the client or
automatically in server VAD mode. The item_id
property is the ID of the user
message item that will be created, thus a conversation.item.created
event
will also be sent to the client.
1
2
3
4
5
6
{
"event_id": "event_1121",
"type": "input_audio_buffer.committed",
"previous_item_id": "msg_001",
"item_id": "msg_002"
}
input_audio_buffer.cleared
Returned when the input audio buffer is cleared by the client with a
input_audio_buffer.clear
event.
1
2
3
4
{
"event_id": "event_1314",
"type": "input_audio_buffer.cleared"
}
input_audio_buffer.speech_started
Sent by the server when in server_vad
mode to indicate that speech has been
detected in the audio buffer. This can happen any time audio is added to the
buffer (unless speech is already detected). The client may want to use this
event to interrupt audio playback or provide visual feedback to the user.
The client should expect to receive a input_audio_buffer.speech_stopped
event
when speech stops. The item_id
property is the ID of the user message item
that will be created when speech stops and will also be included in the
input_audio_buffer.speech_stopped
event (unless the client manually commits
the audio buffer during VAD activation).
Milliseconds from the start of all audio written to the buffer during the
session when speech was first detected. This will correspond to the
beginning of audio sent to the model, and thus includes the
prefix_padding_ms
configured in the Session.
1
2
3
4
5
6
{
"event_id": "event_1516",
"type": "input_audio_buffer.speech_started",
"audio_start_ms": 1000,
"item_id": "msg_003"
}
input_audio_buffer.speech_stopped
Returned in server_vad
mode when the server detects the end of speech in
the audio buffer. The server will also send an conversation.item.created
event with the user message item that is created from the audio buffer.
Milliseconds since the session started when speech stopped. This will
correspond to the end of audio sent to the model, and thus includes the
min_silence_duration_ms
configured in the Session.
1
2
3
4
5
6
{
"event_id": "event_1718",
"type": "input_audio_buffer.speech_stopped",
"audio_end_ms": 2000,
"item_id": "msg_003"
}
response.created
Returned when a new Response is created. The first event of response creation,
where the response is in an initial state of in_progress
.
1
2
3
4
5
6
7
8
9
10
11
12
{
"event_id": "event_2930",
"type": "response.created",
"response": {
"id": "resp_001",
"object": "realtime.response",
"status": "in_progress",
"status_details": null,
"output": [],
"usage": null
}
}
response.done
Returned when a Response is done streaming. Always emitted, no matter the
final state. The Response object included in the response.done
event will
include all output Items in the Response but will omit the raw audio data.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
{
"event_id": "event_3132",
"type": "response.done",
"response": {
"id": "resp_001",
"object": "realtime.response",
"status": "completed",
"status_details": null,
"output": [
{
"id": "msg_006",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Sure, how can I assist you today?"
}
]
}
],
"usage": {
"total_tokens":275,
"input_tokens":127,
"output_tokens":148,
"input_token_details": {
"cached_tokens":384,
"text_tokens":119,
"audio_tokens":8,
"cached_tokens_details": {
"text_tokens": 128,
"audio_tokens": 256
}
},
"output_token_details": {
"text_tokens":36,
"audio_tokens":112
}
}
}
}
response.output_item.added
Returned when a new Item is created during Response generation.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
{
"event_id": "event_3334",
"type": "response.output_item.added",
"response_id": "resp_001",
"output_index": 0,
"item": {
"id": "msg_007",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": []
}
}
response.output_item.done
Returned when an Item is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
{
"event_id": "event_3536",
"type": "response.output_item.done",
"response_id": "resp_001",
"output_index": 0,
"item": {
"id": "msg_007",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "text",
"text": "Sure, I can help with that."
}
]
}
}
response.content_part.added
Returned when a new content part is added to an assistant message item during response generation.
1
2
3
4
5
6
7
8
9
10
11
12
{
"event_id": "event_3738",
"type": "response.content_part.added",
"response_id": "resp_001",
"item_id": "msg_007",
"output_index": 0,
"content_index": 0,
"part": {
"type": "text",
"text": ""
}
}
response.content_part.done
Returned when a content part is done streaming in an assistant message item. Also emitted when a Response is interrupted, incomplete, or cancelled.
1
2
3
4
5
6
7
8
9
10
11
12
{
"event_id": "event_3940",
"type": "response.content_part.done",
"response_id": "resp_001",
"item_id": "msg_007",
"output_index": 0,
"content_index": 0,
"part": {
"type": "text",
"text": "Sure, I can help with that."
}
}
response.text.delta
Returned when the text value of a "text" content part is updated.
1
2
3
4
5
6
7
8
9
{
"event_id": "event_4142",
"type": "response.text.delta",
"response_id": "resp_001",
"item_id": "msg_007",
"output_index": 0,
"content_index": 0,
"delta": "Sure, I can h"
}
response.text.done
Returned when the text value of a "text" content part is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
1
2
3
4
5
6
7
8
9
{
"event_id": "event_4344",
"type": "response.text.done",
"response_id": "resp_001",
"item_id": "msg_007",
"output_index": 0,
"content_index": 0,
"text": "Sure, I can help with that."
}
response.audio_transcript.delta
Returned when the model-generated transcription of audio output is updated.
1
2
3
4
5
6
7
8
9
{
"event_id": "event_4546",
"type": "response.audio_transcript.delta",
"response_id": "resp_001",
"item_id": "msg_008",
"output_index": 0,
"content_index": 0,
"delta": "Hello, how can I a"
}
response.audio_transcript.done
Returned when the model-generated transcription of audio output is done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
1
2
3
4
5
6
7
8
9
{
"event_id": "event_4748",
"type": "response.audio_transcript.done",
"response_id": "resp_001",
"item_id": "msg_008",
"output_index": 0,
"content_index": 0,
"transcript": "Hello, how can I assist you today?"
}
response.audio.delta
Returned when the model-generated audio is updated.
1
2
3
4
5
6
7
8
9
{
"event_id": "event_4950",
"type": "response.audio.delta",
"response_id": "resp_001",
"item_id": "msg_008",
"output_index": 0,
"content_index": 0,
"delta": "Base64EncodedAudioDelta"
}
response.audio.done
Returned when the model-generated audio is done. Also emitted when a Response is interrupted, incomplete, or cancelled.
1
2
3
4
5
6
7
8
{
"event_id": "event_5152",
"type": "response.audio.done",
"response_id": "resp_001",
"item_id": "msg_008",
"output_index": 0,
"content_index": 0
}
response.function_call_arguments.delta
Returned when the model-generated function call arguments are updated.
1
2
3
4
5
6
7
8
9
{
"event_id": "event_5354",
"type": "response.function_call_arguments.delta",
"response_id": "resp_002",
"item_id": "fc_001",
"output_index": 0,
"call_id": "call_001",
"delta": "{\"location\": \"San\""
}
response.function_call_arguments.done
Returned when the model-generated function call arguments are done streaming. Also emitted when a Response is interrupted, incomplete, or cancelled.
1
2
3
4
5
6
7
8
9
{
"event_id": "event_5556",
"type": "response.function_call_arguments.done",
"response_id": "resp_002",
"item_id": "fc_001",
"output_index": 0,
"call_id": "call_001",
"arguments": "{\"location\": \"San Francisco\"}"
}
rate_limits.updated
Emitted at the beginning of a Response to indicate the updated rate limits. When a Response is created some tokens will be "reserved" for the output tokens, the rate limits shown here reflect that reservation, which is then adjusted accordingly once the Response is completed.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
{
"event_id": "event_5758",
"type": "rate_limits.updated",
"rate_limits": [
{
"name": "requests",
"limit": 1000,
"remaining": 999,
"reset_seconds": 60
},
{
"name": "tokens",
"limit": 50000,
"remaining": 49950,
"reset_seconds": 60
}
]
}