# Real-time AI Voice Conversation

[![Python](https://img.shields.io/badge/Python-3.8+-3776AB?style=flat-square&logo=python&logoColor=white)](https://www.python.org/)
[![FastAPI](https://img.shields.io/badge/FastAPI-009688?style=flat-square&logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com/)
[![WebSocket](https://img.shields.io/badge/WebSocket-Real--time-010101?style=flat-square&logo=websocket&logoColor=white)](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API)
[![Azure Speech](https://img.shields.io/badge/Azure-Speech%20Services-0078D4?style=flat-square&logo=microsoftazure&logoColor=white)](https://azure.microsoft.com/en-us/products/ai-services/speech-services)
[![OpenAI](https://img.shields.io/badge/OpenAI-Compatible-412991?style=flat-square&logo=openai&logoColor=white)](https://openai.com/)
[![License](https://img.shields.io/badge/License-MIT-green?style=flat-square)](LICENSE)
[![Tests](https://img.shields.io/badge/Tests-114%20passed-success?style=flat-square&logo=pytest&logoColor=white)](tests/)
[![Coverage](https://img.shields.io/badge/Coverage-63%25-yellow?style=flat-square&logo=codecov&logoColor=white)](tests/)
[![Code Style](https://img.shields.io/badge/Code%20Style-Ruff-D7FF64?style=flat-square&logo=ruff&logoColor=black)](https://github.com/astral-sh/ruff)
[![Pre-commit](https://img.shields.io/badge/Pre--commit-enabled-brightgreen?style=flat-square&logo=pre-commit&logoColor=white)](https://pre-commit.com/)

[English](README.en.md) | Chinese

A low-latency, high-quality real-time voice conversation platform that lets users talk naturally with an AI through their microphone. The system uses a streaming architecture and supports a dynamic conversation flow, including real-time interruption and intelligent turn detection.

## Interface Preview

![Web interface](docs/images/web-interface.png)

## System Architecture

```mermaid
graph TB
    subgraph Client["🌐 Client (Web Browser)"]
        MIC[🎤 Microphone]
        SPK[🔊 Speaker]
        UI[Web UI]
    end

    subgraph Server["⚙️ Server (FastAPI)"]
        WS[WebSocket Handler]

        subgraph Pipeline["Voice Processing Pipeline"]
            STT[🗣️ STT<br/>Azure Speech]
            LLM[🧠 LLM<br/>OpenAI/Local]
            TTS[🔈 TTS<br/>Azure/MiniMax]
        end

        SM[Session Manager]
        VAD[Voice Activity<br/>Detection]
    end

    MIC -->|PCM audio| WS
    WS -->|Audio stream| STT
    STT -->|Text| LLM
    LLM -->|Response text| TTS
    TTS -->|PCM audio| WS
    WS -->|Audio stream| SPK

    WS <-->|State sync| SM
    WS -->|Interruption detection| VAD

    UI <-->|Control commands| WS
```
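The pipeline in the diagram above is a chain of streaming stages, each consuming the previous stage's output as it arrives. The following is a minimal sketch of that shape, assuming nothing about the project's real `websocket/pipeline.py`: it links the stages with `asyncio.Queue`s. `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins for Azure Speech, an OpenAI-compatible LLM, and Azure/MiniMax TTS, and `None` is used here as an end-of-stream marker.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, List

# Hypothetical stand-ins for the real services. Each takes one input item
# and streams zero or more outputs, like a streaming STT/LLM/TTS client.
async def fake_stt(audio: bytes) -> AsyncIterator[str]:
    yield f"<transcript of {len(audio)} bytes>"

async def fake_llm(text: str) -> AsyncIterator[str]:
    # Yield the reply in pieces, as a streaming LLM would.
    yield "Hello."
    yield f"You said: {text}"

async def fake_tts(text: str) -> AsyncIterator[bytes]:
    yield text.encode("utf-8")  # placeholder for synthesized PCM audio

async def run_stage(
    fn: Callable[[Any], AsyncIterator[Any]],
    inbox: asyncio.Queue,
    outbox: asyncio.Queue,
) -> None:
    """Feed each inbox item through fn and forward every output to outbox."""
    while True:
        item = await inbox.get()
        if item is None:  # end of stream: pass it downstream and stop
            await outbox.put(None)
            return
        async for out in fn(item):
            await outbox.put(out)

async def run_pipeline(chunks: List[bytes]) -> List[bytes]:
    """Push audio chunks through STT -> LLM -> TTS and collect the output audio."""
    stages = [fake_stt, fake_llm, fake_tts]
    queues: List[asyncio.Queue] = [asyncio.Queue() for _ in range(len(stages) + 1)]
    tasks = [
        asyncio.create_task(run_stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for chunk in chunks:  # the "microphone" side
        await queues[0].put(chunk)
    await queues[0].put(None)

    played: List[bytes] = []  # the "speaker" side
    while (audio := await queues[-1].get()) is not None:
        played.append(audio)
    await asyncio.gather(*tasks)
    return played

if __name__ == "__main__":
    print(asyncio.run(run_pipeline([b"\x00" * 4096])))
```

Because each stage runs as its own task, handling an `interrupt` could come down to cancelling the tasks and emptying the queues. The `queues_cleared` field in `stop_acknowledged` hints at something similar, but how the project actually does it is not described in this README.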
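### Data Flow

```mermaid
graph LR
    A[🎤 Microphone] -->|PCM capture| B[WebSocket]
    B -->|Audio stream| C[STT]
    C -->|Text| D[LLM]
    D -->|Response| E[TTS]
    E -->|Audio stream| F[WebSocket]
    F -->|PCM playback| G[🔊 Speaker]

    style A fill:#e1f5fe
    style G fill:#e1f5fe
    style D fill:#fff3e0
```

### WebSocket Protocol

The system uses WebSocket for real-time, bidirectional communication and supports the following message types:

#### Client-to-server messages

| Message type | Format                  | Purpose                                               |
|--------------|-------------------------|-------------------------------------------------------|
| `start`      | `{"type": "start"}`     | Start the conversation                                |
| `stop`       | `{"type": "stop"}`      | Stop the conversation and all processing              |
| `reset`      | `{"type": "reset"}`     | Reset the conversation state                          |
| `interrupt`  | `{"type": "interrupt"}` | Client requests interruption of the current response |

#### Server-to-client messages

| Message type | Format | Purpose |
|--------------|--------|---------|
| `partial_transcript` | `{"type": "partial_transcript", "content": "text", "session_id": "session ID"}` | Live (interim) transcription captions |
| `final_transcript` | `{"type": "final_transcript", "content": "text", "session_id": "session ID"}` | Final transcription result |
| `llm_status` | `{"type": "llm_status", "status": "processing", "session_id": "session ID"}` | LLM processing status |
| `llm_response` | `{"type": "llm_response", "content": "text", "is_complete": true/false, "session_id": "session ID"}` | AI text response |
| `tts_start` | `{"type": "tts_start", "format": "format", "is_first": true/false, "text": "text", "session_id": "session ID"}` | Start of TTS audio |
| `tts_end` | `{"type": "tts_end", "session_id": "session ID"}` | End of TTS audio |
| `tts_stop` | `{"type": "tts_stop", "session_id": "session ID"}` | Tells the client to stop TTS audio playback |
| `status` | `{"type": "status", "status": "listening/stopped", "session_id": "session ID"}` | System status update |
| `error` | `{"type": "error", "message": "error message", "session_id": "session ID"}` | Error message |
| `stop_acknowledged` | `{"type": "stop_acknowledged", "message": "All processing stopped", "queues_cleared": true, "session_id": "session ID"}` | Acknowledges a `stop` command |
| `interrupt_acknowledged` | `{"type": "interrupt_acknowledged", "session_id": "session ID"}` | Acknowledges an `interrupt` request |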
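The JSON half of the protocol is simple enough to drive from a script. The sketch below uses only the Python standard library to serialize the four control commands and route incoming server messages by their `type` field. `control_message` and `ServerMessageRouter` are hypothetical helpers, not part of the project. The WebSocket connection itself is left out, since this README does not state the endpoint path, and only text frames would go through the router; binary frames carry audio.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]

# The four control commands from the client-to-server table.
CONTROL_TYPES = frozenset({"start", "stop", "reset", "interrupt"})

def control_message(kind: str) -> str:
    """Serialize a client-to-server control command, e.g. {"type": "start"}."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control command: {kind!r}")
    return json.dumps({"type": kind})

class ServerMessageRouter:
    """Route server-to-client JSON messages to handlers keyed by their "type"."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, msg_type: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one message type."""
        def register(fn: Handler) -> Handler:
            self._handlers[msg_type] = fn
            return fn
        return register

    def dispatch(self, raw: str) -> bool:
        """Parse one text frame; return False if no handler matches its type."""
        msg = json.loads(raw)
        handler = self._handlers.get(msg.get("type"))
        if handler is None:
            return False
        handler(msg)
        return True

if __name__ == "__main__":
    router = ServerMessageRouter()

    @router.on("final_transcript")
    def show_transcript(msg: Dict[str, Any]) -> None:
        print("You:", msg["content"])

    @router.on("tts_stop")
    def stop_playback(msg: Dict[str, Any]) -> None:
        print("Stop playback for session", msg["session_id"])

    print(control_message("start"))
    router.dispatch('{"type": "final_transcript", "content": "hello", "session_id": "abc"}')
```

Unregistered types are ignored rather than treated as errors, so a client written this way keeps working if the server adds new message types.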
#### Binary audio data

In addition to JSON messages, the system transmits binary audio data over the WebSocket:

**Client to server**:
- Format: `[8-byte header][PCM audio data]`
- Header: `[4-byte timestamp][4-byte status flags]`
- The status flags carry information such as audio energy and microphone state

**Server to client**:
- Format: raw PCM audio data, sent directly
- The `tts_start` and `tts_end` messages mark the start and end of each audio stream

### Audio Transmission Specification

#### Client to server (user speech)
- **Audio format**: 16-bit PCM
- **Sample rate**: 24 kHz
- **Channels**: mono
- **Transport**: WebSocket binary frames
- **Chunk size**: 2048 samples per chunk

#### Server to client (AI speech)
- **Audio format**: 16-bit PCM
- **Sample rate**: 24 kHz
- **Channels**: mono
- **Transport**: WebSocket binary frames
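The binary frame layout and the audio parameters above together fix the size of each client frame. The sketch below packs and unpacks one frame with Python's `struct` module. The README does not specify byte order, timestamp units, or the bit layout of the status flags, so this sketch *assumes* two unsigned 32-bit little-endian header fields and treats the flags as an opaque integer. `pack_frame` and `unpack_frame` are hypothetical helpers.

```python
import struct
from typing import Tuple

SAMPLE_RATE_HZ = 24_000  # 24 kHz, per the spec above
BYTES_PER_SAMPLE = 2     # 16-bit PCM, mono
CHUNK_SAMPLES = 2048     # samples per client chunk
# ASSUMPTION: [timestamp][status flags] as two little-endian uint32 fields.
HEADER = struct.Struct("<II")

CHUNK_BYTES = CHUNK_SAMPLES * BYTES_PER_SAMPLE    # 4096 bytes of PCM
FRAME_BYTES = HEADER.size + CHUNK_BYTES           # 8 + 4096 = 4104 bytes
CHUNK_MS = 1000 * CHUNK_SAMPLES / SAMPLE_RATE_HZ  # about 85.3 ms per chunk

def pack_frame(timestamp: int, flags: int, pcm: bytes) -> bytes:
    """Prefix raw mono 16-bit PCM with the 8-byte [timestamp][flags] header."""
    if len(pcm) % BYTES_PER_SAMPLE:
        raise ValueError("PCM payload must contain whole 16-bit samples")
    # Mask to 32 bits so a large clock value wraps instead of raising struct.error.
    return HEADER.pack(timestamp & 0xFFFFFFFF, flags & 0xFFFFFFFF) + pcm

def unpack_frame(frame: bytes) -> Tuple[int, int, bytes]:
    """Split a client frame back into (timestamp, flags, pcm)."""
    if len(frame) < HEADER.size:
        raise ValueError("frame is shorter than the 8-byte header")
    timestamp, flags = HEADER.unpack_from(frame)
    return timestamp, flags, frame[HEADER.size:]
```

With these numbers, each client chunk carries 4,096 bytes of PCM (4,104 bytes including the header) and spans roughly 85 ms of audio. The client therefore buffers about that long before it can send each frame.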
### Speech Processing

#### Speech recognition (STT)
- **Engine**: Azure Speech Services

#### Text generation (LLM)
- **Supported backends**:
  - OpenAI API
  - Local services that expose an OpenAI-compatible API

#### Speech synthesis (TTS)
- **Supported engines**:
  - Azure TTS
  - MiniMax TTS

## Installation and Setup

1. Clone the repository
```bash
git clone https://github.com/chicogong/realtime-ai.git
cd realtime-ai
```

2. Install the dependencies
```bash
pip install -r requirements.txt
```

3. Configure the environment variables
```bash
cp .env.example .env
# Edit .env and fill in your API keys
```

4. Run the application
```bash
python app.py
```

5. Open `http://localhost:8000` in your browser

## Project Structure

```
├── app.py              # Application entry point
├── config.py           # Configuration settings
├── session.py          # Session management
├── services/           # Service modules
│   ├── asr/            # Speech recognition services
│   ├── llm/            # Language model services
│   └── tts/            # Text-to-speech services
├── websocket/          # WebSocket handling
│   ├── handler.py      # Connection handling
│   └── pipeline.py     # Processing pipeline
├── static/             # Front-end assets
│   ├── css/            # Stylesheets
│   ├── js/             # JavaScript files
│   └── index.html      # Main interface
└── utils/              # Utility functions
```

## Features

- Real-time speech-to-text recognition
- Streaming LLM responses
- High-quality text-to-speech synthesis
- Interruption detection
- Natural conversation flow

## Contributing

Contributions are welcome! See the [contributing guide](CONTRIBUTING.md).

## License

[MIT](LICENSE)