OpenAI API Reference

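Client events

These are the events that the OpenAI Realtime WebSocket server accepts from the client.

session.update

Send this event to update the session's default configuration. The client may send this event at any time, and any field may be updated at any time, except for "voice". The server responds with a session.updated event that shows the full effective configuration. Only the fields that are present are updated, so the correct way to clear a field such as "instructions" is to pass an empty string.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be session.update.

session

object

Realtime session object configuration.

OBJECT session.update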

{
    "event_id": "event_123",
    "type": "session.update",
    "session": {
        "modalities": ["text", "audio"],
        "instructions": "You are a helpful assistant.",
        "voice": "sage",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "whisper-1"
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 500
        },
        "tools": [
            {
                "type": "function",
                "name": "get_weather",
                "description": "Get the current weather...",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": { "type": "string" }
                    },
                    "required": ["location"]
                }
            }
        ],
        "tool_choice": "auto",
        "temperature": 0.8,
        "max_response_output_tokens": "inf"
    }
}
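
The partial-update rule described above can be sketched in Python. The helper name `build_session_update` is hypothetical and not part of any SDK; it only builds the JSON payload and leaves sending it over the WebSocket to the caller.

```python
import json


def build_session_update(event_id=None, **fields):
    """Build a session.update event (hypothetical helper, not an SDK call)."""
    # Only the fields present under "session" are updated by the server,
    # so fields the caller did not pass are left out entirely.
    event = {"type": "session.update", "session": dict(fields)}
    if event_id is not None:
        event["event_id"] = event_id  # optional client-generated ID
    return event


# Change only the temperature; every other setting keeps its current value.
update_temperature = build_session_update(temperature=0.6)

# Clear the instructions by sending an empty string, not by omitting the field.
clear_instructions = build_session_update(event_id="event_124", instructions="")

print(json.dumps(clear_instructions))
```

Omitting a field and sending it as an empty string are different requests: the first leaves the setting untouched, the second clears it.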

input_audio_buffer.append

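Send this event to append audio bytes to the input audio buffer. The audio buffer is temporary storage that you can write to and commit later. In Server VAD mode, the audio buffer is used to detect speech, and the server decides when to commit. When Server VAD is disabled, you must commit the audio buffer manually.

The client may choose how much audio to place in each event, up to a maximum of 15 MiB. For example, streaming smaller chunks from the client may allow the VAD to be more responsive. Unlike most other client events, the server does not send a confirmation response to this event.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be input_audio_buffer.append.

audio

string

Base64-encoded audio bytes. These must be in the format specified by the input_audio_format field in the session configuration.

OBJECT input_audio_buffer.append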

{
    "event_id": "event_456",
    "type": "input_audio_buffer.append",
    "audio": "Base64EncodedAudioData"
}
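
As a rough illustration of streaming smaller chunks, the sketch below splits raw audio bytes into Base64-encoded append events. The function name `audio_append_events` and the default chunk size are illustrative assumptions, and the sketch applies the 15 MiB ceiling to the raw bytes before encoding, which is also an assumption.

```python
import base64

MAX_EVENT_AUDIO_BYTES = 15 * 1024 * 1024  # 15 MiB ceiling mentioned above


def audio_append_events(audio_bytes, chunk_size=4800):
    """Yield one input_audio_buffer.append event per chunk of raw audio."""
    # Assumption: the limit is checked against the raw (pre-Base64) chunk size.
    if not 0 < chunk_size <= MAX_EVENT_AUDIO_BYTES:
        raise ValueError("chunk_size must be between 1 byte and 15 MiB")
    for start in range(0, len(audio_bytes), chunk_size):
        chunk = audio_bytes[start:start + chunk_size]
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }


# 12,000 bytes of placeholder audio split into 4,800-byte chunks.
raw_audio = b"\x00\x01" * 6000
events = list(audio_append_events(raw_audio, chunk_size=4800))
```

Because the server sends no acknowledgment for this event, the client should not wait for a reply between chunks.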

input_audio_buffer.commit

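Send this event to commit the user input audio buffer, which creates a new user message item in the conversation. This event produces an error if the input audio buffer is empty. In Server VAD mode, the client does not need to send this event; the server commits the audio buffer automatically.

Committing the input audio buffer triggers input audio transcription (if enabled in the session configuration), but it does not create a response from the model. The server responds with an input_audio_buffer.committed event.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be input_audio_buffer.commit.

OBJECT input_audio_buffer.commit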

{
    "event_id": "event_789",
    "type": "input_audio_buffer.commit"
}
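
Since a commit does not itself trigger a model response, a manual turn (Server VAD disabled) needs three steps: append, commit, then request a response. The sketch below orders those events; `manual_turn_events` is a hypothetical helper, and sending `response.create` with no overrides is assumed to be acceptable.

```python
def manual_turn_events(base64_chunks):
    """Return the client events for one manually committed user turn."""
    chunks = list(base64_chunks)
    if not chunks:
        # Committing an empty input audio buffer produces a server error,
        # so refuse to build the sequence at all.
        raise ValueError("no audio to commit")
    events = [{"type": "input_audio_buffer.append", "audio": c} for c in chunks]
    events.append({"type": "input_audio_buffer.commit"})
    # The commit only creates the user message item; inference is requested separately.
    events.append({"type": "response.create"})
    return events


turn = manual_turn_events(["AAAA", "BBBB"])
turn_types = [event["type"] for event in turn]
```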

input_audio_buffer.clear

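Send this event to clear the audio bytes in the buffer. The server responds with an input_audio_buffer.cleared event.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be input_audio_buffer.clear.

OBJECT input_audio_buffer.clear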

{
    "event_id": "event_012",
    "type": "input_audio_buffer.clear"
}

conversation.item.create

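Add a new Item to the Conversation's context, including messages, function calls, and function call responses. This event can be used both to populate a "history" of the conversation and to add new items mid-stream, with the current limitation that it cannot populate assistant audio messages.

If successful, the server responds with a conversation.item.created event; otherwise it sends an error event.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be conversation.item.create.

previous_item_id

string

The ID of the preceding item, after which the new item will be inserted. If not set, the new item is appended to the end of the conversation. If set, it allows an item to be inserted mid-conversation. If the ID cannot be found, an error is returned and the item is not added.

item

object

The item to add to the conversation.

OBJECT conversation.item.create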

{
    "event_id": "event_345",
    "type": "conversation.item.create",
    "previous_item_id": null,
    "item": {
        "id": "msg_001",
        "type": "message",
        "role": "user",
        "content": [
            {
                "type": "input_text",
                "text": "Hello, how are you?"
            }
        ]
    }
}
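
The effect of `previous_item_id` can be sketched with a small builder for user text messages. The helper name `create_user_text_item` is hypothetical; it mirrors only the user message shape shown in the example above and does not cover other item types.

```python
def create_user_text_item(text, item_id=None, previous_item_id=None):
    """Build a conversation.item.create event for a user text message."""
    item = {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": text}],
    }
    if item_id is not None:
        item["id"] = item_id
    event = {"type": "conversation.item.create", "item": item}
    if previous_item_id is not None:
        # Insert after a specific existing item instead of appending at the end.
        event["previous_item_id"] = previous_item_id
    return event


# Appended to the end of the conversation (no previous_item_id).
appended = create_user_text_item("Hello, how are you?", item_id="msg_001")

# Inserted mid-conversation, directly after msg_001.
inserted = create_user_text_item("One more detail.", previous_item_id="msg_001")
```

If the ID passed as `previous_item_id` does not exist, the server returns an error and adds nothing, so a client should track item IDs from the conversation.item.created events it has received.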

conversation.item.truncate

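Send this event to truncate a previous assistant message's audio. The server produces audio faster than realtime, so this event is useful when the user interrupts, to truncate audio that has already been sent to the client but not yet played. This synchronizes the server's understanding of the audio with the client's playback.

Truncating audio deletes the server-side text transcript, to ensure that the context contains no text that the user has not heard.

If successful, the server responds with a conversation.item.truncated event.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be conversation.item.truncate.

item_id

string

The ID of the assistant message item to truncate. Only assistant message items can be truncated.

content_index

integer

The index of the content part to truncate. Set this to 0.

audio_end_ms

integer

Inclusive duration up to which the audio is truncated, in milliseconds. If audio_end_ms is greater than the actual audio duration, the server responds with an error.

OBJECT conversation.item.truncate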

{
    "event_id": "event_678",
    "type": "conversation.item.truncate",
    "item_id": "msg_002",
    "content_index": 0,
    "audio_end_ms": 1500
}
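
One way to derive `audio_end_ms` on an interruption is from the number of audio bytes the client has actually played. The sketch below assumes 16-bit mono PCM at 24 kHz; the sample rate and the helper name `truncate_event` are assumptions, so adjust them to the audio format your session actually uses.

```python
SAMPLE_RATE_HZ = 24_000  # assumed sample rate for pcm16 output
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM


def truncate_event(item_id, played_bytes):
    """Build a conversation.item.truncate event from the playback position."""
    samples_played = played_bytes // BYTES_PER_SAMPLE
    # Round down so the value never exceeds the audio actually played.
    audio_end_ms = samples_played * 1000 // SAMPLE_RATE_HZ
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": 0,  # the text above says to set this to 0
        "audio_end_ms": audio_end_ms,
    }


# 72,000 bytes played = 36,000 samples = 1,500 ms at 24 kHz.
event = truncate_event("msg_002", played_bytes=72_000)
```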

conversation.item.delete

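Send this event when you want to remove any item from the conversation history. The server responds with a conversation.item.deleted event, unless the item does not exist in the conversation history, in which case the server responds with an error.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be conversation.item.delete.

item_id

string

The ID of the item to delete.

OBJECT conversation.item.delete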

{
    "event_id": "event_901",
    "type": "conversation.item.delete",
    "item_id": "msg_003"
}

response.create

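This event instructs the server to create a Response, which means triggering model inference. In Server VAD mode, the server creates Responses automatically.

A Response includes at least one Item and may include two, in which case the second is a function call. These Items are appended to the conversation history.

The server responds with a response.created event, events for the Items and content created, and finally a response.done event to indicate that the Response is complete.

The response.create event includes inference configuration such as instructions and temperature. These fields override the Session's configuration.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be response.create.

response

object

Realtime session object configuration.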

OBJECT response.create

{
    "event_id": "event_234",
    "type": "response.create",
    "response": {
        "modalities": ["text", "audio"],
        "instructions": "Please assist the user.",
        "voice": "sage",
        "output_audio_format": "pcm16",
        "tools": [
            {
                "type": "function",
                "name": "calculate_sum",
                "description": "Calculates the sum of two numbers.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "a": { "type": "number" },
                        "b": { "type": "number" }
                    },
                    "required": ["a", "b"]
                }
            }
        ],
        "tool_choice": "auto",
        "temperature": 0.7,
        "max_output_tokens": 150
    }
}
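
The override behavior can be modeled locally as a dictionary merge. This is only a client-side mental model under that assumption, not the server's implementation, and both helper names are hypothetical.

```python
session_config = {
    "instructions": "You are a helpful assistant.",
    "temperature": 0.8,
    "voice": "sage",
}


def build_response_create(**overrides):
    """Build a response.create event carrying per-response configuration."""
    return {"type": "response.create", "response": dict(overrides)}


def effective_config(session, event):
    """Model the override: response fields win, the rest come from the session."""
    return {**session, **event["response"]}


event = build_response_create(
    instructions="Please assist the user.",
    temperature=0.7,
)
config = effective_config(session_config, event)
```

Here `voice` is not overridden, so the merged view keeps the session's value while `instructions` and `temperature` come from the event.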

response.cancel

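Send this event to cancel an in-progress response. The server responds with a response.cancelled event, or with an error if there is no response to cancel.

event_id

string

Optional client-generated ID used to identify this event.

type

string

The event type; must be response.cancel.

OBJECT response.cancel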

{
    "event_id": "event_567",
    "type": "response.cancel"
}
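
The success acknowledgments named throughout this section can be collected into one lookup table, which a client might use to match server events to the requests it sent. The table reflects only what this section states; the names `EXPECTED_ACK` and `is_ack_for` are hypothetical.

```python
# Client event type -> server event named in this section as the success reply.
# None means the server sends no confirmation for that event.
EXPECTED_ACK = {
    "session.update": "session.updated",
    "input_audio_buffer.append": None,
    "input_audio_buffer.commit": "input_audio_buffer.committed",
    "input_audio_buffer.clear": "input_audio_buffer.cleared",
    "conversation.item.create": "conversation.item.created",
    "conversation.item.truncate": "conversation.item.truncated",
    "conversation.item.delete": "conversation.item.deleted",
    "response.create": "response.created",
    "response.cancel": "response.cancelled",
}


def is_ack_for(client_event, server_event):
    """Return True if server_event is the success reply listed for client_event."""
    expected = EXPECTED_ACK.get(client_event["type"])
    return expected is not None and server_event.get("type") == expected


cancel = {"event_id": "event_567", "type": "response.cancel"}
acknowledged = is_ack_for(cancel, {"type": "response.cancelled"})
```

An `error` event from the server is the failure path for every entry in the table and should be handled separately.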