{"id":50100593,"url":"https://github.com/justplus/turn-detection","last_synced_at":"2026-05-23T07:12:40.898Z","repository":{"id":315215222,"uuid":"1050831396","full_name":"justplus/turn-detection","owner":"justplus","description":"人机对话轮次检测模型，有效解决声学VAD等待过长的问题","archived":false,"fork":false,"pushed_at":"2025-09-17T09:00:44.000Z","size":433,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-17T11:13:28.697Z","etag":null,"topics":["turn","turn-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/justplus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-05T02:33:09.000Z","updated_at":"2025-09-17T09:00:48.000Z","dependencies_parsed_at":"2025-09-17T11:13:38.826Z","dependency_job_id":null,"html_url":"https://github.com/justplus/turn-detection","commit_stats":null,"previous_names":["justplus/turn-detection"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/justplus/turn-detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justplus%2Fturn-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justplus%2Fturn-detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justplus%2Fturn-detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/justplus%2Fturn-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/justplus","download_url":"https://codeload.github.com/justplus/turn-detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/justplus%2Fturn-detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33386197,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T04:15:53.637Z","status":"ssl_error","status_checked_at":"2026-05-23T04:15:53.242Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["turn","turn-detection"],"created_at":"2026-05-23T07:12:40.188Z","updated_at":"2026-05-23T07:12:40.885Z","avatar_url":"https://github.com/justplus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Turn Detection - Conversational Turn Detection Model\n\n## 1. Introduction\n\nTurn Detection is a key technique in human-machine dialogue systems. Its main uses are:\n- **Utterance boundary detection**: accurately determine when the user has finished the current utterance, so the dialogue system responds neither too early nor too late\n- **Multi-turn dialogue management**: identify where each turn starts and ends in a continuous conversation, improving the conversational experience\n- **Real-time interaction**: precise turn detection enables more natural, fluid human-machine interaction\n- **Voice assistant enhancement**: smarter dialogue control for voice assistants, customer-service bots, and similar applications\n\nThe model is fine-tuned from the gemma3 270M model, and the complete dataset and fine-tuning scripts are provided.\nIts accuracy is comparable to that of 7B models.\n\n\n## 2. 
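Key Features\n\n### 🔄 Multi-turn dialogue support\n- Handles complex multi-turn dialogue scenarios\n- Accurately distinguishes pauses and thinking from a genuine end of turn\n- Makes context-aware turn decisions\n\n  Why multi-turn support matters:\n  ```\n  user: 我们来个成语接龙吧？\n  assistant: 那我先来，杞人忧天。该你了\n  user: 天天向上\n  ```\n  Here the user proposes an idiom-chain game (成语接龙), the assistant opens with \"杞人忧天\", and the user answers \"天天向上\". Taken on its own, without the dialogue history, \"天天向上\" looks incomplete, but in context it is a complete turn.\n\n### 🚀 Lightweight inference\n- Only 270M parameters, with a low resource footprint\n- Supports CPU inference; can be deployed without a GPU\n- Fast inference that meets real-time dialogue requirements\n- Suitable for edge devices and resource-constrained environments\n\n### 🌍 Multilingual support\n- Native support for Chinese and English turn detection\n- The model architecture supports fine-tuning to extend to other languages\n- Strong cross-lingual generalization\n\n### 🛠️ Customizable\n- Provides a complete fine-tuning framework\n- Supports custom training for specific domains and languages\n- Flexible data processing and training pipeline\n\n### 🙅‍♂️ Wait state support\n- **0 (incomplete)**: the user's utterance is unfinished; wait for further input\n- **1 (complete)**: the user's utterance is complete; the system can reply\n- **2 (wait requested)**: the user asks the AI to pause or interrupts its reply\n\n## 3. Fine-tuning\n\n### Dataset construction\nChinese single-turn and multi-turn data: synthesized with an LLM\nEnglish single-turn and multi-turn data: the [turns-2k](https://huggingface.co/datasets/latishab/turns-2k/) dataset, expanded into multi-turn dialogues with an LLM\n\n### Fine-tuning\nUse `finetune.py` to fine-tune the model:\n```bash\npip install -r requirements.txt\npython finetune.py\n```\n\nIf the error below appears during fine-tuning, the triton version pulled in by unsloth is too new. Uninstall triton and install triton 3.2.0 instead:\n```bash\npip uninstall triton\npip install triton==3.2.0\n```\n```plain text\ntorch._inductor.exc.InductorError: AttributeError: type object 'CompiledKernel' has no attribute 'launch_enter_hook'\n\nSet TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS=\"+dynamo\"\n```\n\n\n## 4. Results\n\n### Accuracy\nChinese accuracy: 0.9591 (258/269)\u003cbr/\u003e\nEnglish accuracy: 0.9654 (223/231)\u003cbr/\u003e\nOverall accuracy: 0.9620 (481/500)\n\n### Performance\nSingle Nvidia T4 GPU inference latency: \u003c100ms (P95\u003c80ms)\n\n## 5. 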
Deployment and Inference\n### Inference notes\n- Use the Chinese [system prompt](system_prompt_cn.txt) for Chinese scenarios and the English [system prompt](system_prompt_en.txt) for English scenarios\n- The wait state only applies in multi-turn dialogue; reflecting real usage, all wait examples in the training set are multi-turn dialogues.\n- No general-purpose dataset was mixed into the training data, so general capabilities may degrade. If you need general capabilities, add a general-purpose dataset to the current dataset and retrain; a 1:1 ratio is usually sufficient.\n\n### Model weights\n[justpluso/turn-detection](https://huggingface.co/justpluso/turn-detection)\n\nIf you have network problems reaching Hugging Face from mainland China, you can set\n```\n# For Linux or MacOS\nexport HF_ENDPOINT=https://hf-mirror.com\n```\nor\n```\n# For Windows PowerShell\n$env:HF_ENDPOINT = \"https://hf-mirror.com\"\n```\n\n### Deploying with vLLM\n```bash\n# Start the HTTP API service\nvllm serve gemma3-270m-full-turn-detection --served-model-name=gemma3 --port 8000 --enable-prefix-caching --gpu-memory-utilization 0.8\n\n# Call the API through the OpenAI-compatible chat endpoint (the model takes text, not audio).\n# Send the system prompt as the first message, as in the full request under \"API service deployment\".\ncurl -X POST http://localhost:8000/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"gemma3\", \"messages\": [{\"role\": \"user\", \"content\": \"请分析以下对话中用户最后说的话：\\nuser: 我想要...\"}]}'\n```\nThe service is also compatible with the OpenAI client library.\n\n### Usage\n#### Calling via the transformers library\n```python\nfrom inference import TurnDetector\n\n# Initialize the detector\ndetector = TurnDetector(\n    model_path=\"gemma3-270m-full-turn-detection\",  # model path\n    device=\"auto\"  # pick the device automatically; \"cpu\" or \"cuda\" can also be specified\n)\n\n# Option 1: string input\nconversation_str = \"\"\"user: 我们来成语接龙吧\nassistant: 杞人忧天\nuser: 天天向上\"\"\"\n\nresult = detector.detect(conversation_str)\nprint(f\"Detection result: {result}\")  # 0 = incomplete, 1 = complete, 2 = wait requested\n\n# Option 2: message-list input\nconversation_msgs = [\n    {\"role\": \"user\", \"content\": \"我们来成语接龙吧\"},\n    {\"role\": \"assistant\", \"content\": \"杞人忧天\"},\n    {\"role\": \"user\", \"content\": \"天天向上\"}\n]\n\nresult = detector.detect(conversation_msgs)\nprint(f\"Detection result: {result}\")\n\n# Option 3: detailed result\ndetailed_result = detector.detect_with_explanation(conversation_str)\nprint(f\"Status code: {detailed_result['status_code']}\")\nprint(f\"Status name: {detailed_result['status_name']}\")\nprint(f\"Description: {detailed_result['description']}\")\n\n# Option 4: batch detection\nconversations = [\n    \"user: 我想要...\",\n    \"user: 停\",\n    \"user: 今天天气很好\"\n]\n\nresults = detector.detect_batch(conversations)\nprint(f\"Batch results: {results}\")  # [0, 2, 
1]\n```\n\n#### Command-line usage\n```bash\n# Interactive mode\npython inference.py --interactive\n\n# Single detection\npython inference.py --input \"user: 我想要...\"\n\n# Batch detection\npython inference.py --input_file conversations.json --output_file results.json\n\n# Specify device and parameters\npython inference.py --device cpu --temperature 0.1 --interactive\n\n# Run the built-in demo\npython inference.py\n```\n\n\n#### API service deployment\n```bash\n# Start the HTTP API service\nvllm serve gemma3-270m-full-turn-detection --gpu-memory-utilization 0.8 --enable-prefix-caching --served-model-name=gemma3-turn-detection --port 8080\n\n# Call the API (double quotes inside the system prompt are escaped so the payload is valid JSON)\ncurl -X POST \"http://localhost:8080/v1/chat/completions\" \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer sk-xx\" \\\n  -d '{\n    \"model\": \"gemma3-turn-detection\",\n    \"temperature\": 1.0,\n    \"top_p\": 0.95,\n    \"top_k\": 64,\n    \"messages\": [\n      {\n        \"role\": \"system\",\n        \"content\": \"你是一个专门分析对话状态的AI助手。请根据对话历史，判断用户最后说的话属于以下哪种状态：\\n\\n**状态定义：**\\n- 0 (不完整)：用户的话语未说完，存在停顿、犹豫或明显的未完成表达\\n- 1 (完整)：用户的话语表达完整，意思清晰明确，不需要继续补充\\n- 2 (要求等待)：用户明确表示要打断或暂停AI的回复，要求AI停止说话或等待\\n\\n**判断标准：**\\n\\n**不完整(0)的特征：**\\n- 句子突然中断，没有完整表达意思\\n- 包含停顿词：如\\\"嗯\\\"、\\\"那个\\\"、\\\"就是\\\"、\\\"呃\\\"等\\n- 语句结构不完整，明显还有后续内容\\n- 例如：\\\"我想要...\\\"、\\\"关于这个问题，嗯...\\\"、\\\"山字怎么\\\"\\n\\n**完整(1)的特征：**\\n- 句子结构完整，语法正确\\n- 表达了清晰的意图或完整的信息\\n- 没有明显的停顿词或未完成标记\\n- 例如：\\\"我想去北京旅游\\\"、\\\"今天天气很好\\\"、\\\"谢谢你的帮助\\\"\\n\\n**要求等待(2)的特征：**\\n- 明确的打断指令：如\\\"停\\\"、\\\"等等\\\"、\\\"暂停\\\"、\\\"闭嘴\\\"\\n- 礼貌的暂停请求：如\\\"稍等\\\"、\\\"等一下\\\"、\\\"先别说\\\"\\n- 表达需要时间思考：如\\\"让我想想\\\"、\\\"我需要安静\\\"\\n- 表达不耐烦：如\\\"够了\\\"、\\\"太多了\\\"、\\\"别说了\\\"\\n- 英文打断：如\\\"Stop\\\"、\\\"Wait\\\"、\\\"Hold on\\\"、\\\"Shut up\\\"、\\\"Enough\\\"\\n\\n\\n**输出格式：**\\n你只能输出[0、1、2]中的其中一个数字，不要输出其他的。\"\n      },\n      {\n        \"role\": \"user\",\n        \"content\": \"请分析以下对话中用户最后说的话：\\nuser: 我们来成语接龙吧\\nassistant: 杞人忧天\\nuser: 停\"\n      }\n    ]\n  }'\n```\n\n## More\n- You can continue fine-tuning on corpora in other languages using the provided training script. About 200 examples per language are enough for reasonably good results.\n- The model can be quantized to further reduce resource usage and improve inference efficiency.\n\n\n## Acknowledgements\n- [Unsloth](https://unsloth.ai/): an excellent fine-tuning framework\n- [Gemma3](https://deepmind.google/models/gemma/gemma-3/): 
excellent open-source model weights\n- [ten-turn-detection](https://github.com/TEN-framework/ten-turn-detection): its wait dataset served as a reference, and its model was used as a comparison baseline\n\n---\n\n## License\n[This project is Apache 2.0 licensed with certain conditions.](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustplus%2Fturn-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjustplus%2Fturn-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjustplus%2Fturn-detection/lists"}