{"id":15628097,"url":"https://github.com/shibing624/dialogbot","last_synced_at":"2025-04-04T14:05:55.084Z","repository":{"id":44726133,"uuid":"125800164","full_name":"shibing624/dialogbot","owner":"shibing624","description":"dialogbot, provide search-based dialogue, task-based dialogue and generative dialogue model. 对话机器人，基于问答型对话、任务型对话、聊天型对话等模型实现，支持网络检索问答，领域知识问答，任务引导问答，闲聊问答，开箱即用。","archived":false,"fork":false,"pushed_at":"2024-04-23T02:51:22.000Z","size":40172,"stargazers_count":331,"open_issues_count":4,"forks_count":61,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-28T13:07:06.493Z","etag":null,"topics":["chatbot","deep-learning","dialog","dialogbot","nlp","qa","question-answering"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shibing624.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-19T04:10:41.000Z","updated_at":"2025-03-17T16:06:46.000Z","dependencies_parsed_at":"2024-01-02T13:58:40.800Z","dependency_job_id":"dfa01ccc-d7b0-4577-9f53-c4369ab6ef28","html_url":"https://github.com/shibing624/dialogbot","commit_stats":{"total_commits":130,"total_committers":4,"mean_commits":32.5,"dds":0.523076923076923,"last_synced_commit":"049938aa30c318c0377581537e13ecfecd4fab96"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2Fdialogbot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shi
bing624%2Fdialogbot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2Fdialogbot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2Fdialogbot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shibing624","download_url":"https://codeload.github.com/shibing624/dialogbot/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247190231,"owners_count":20898700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","deep-learning","dialog","dialogbot","nlp","qa","question-answering"],"created_at":"2024-10-03T10:20:50.849Z","updated_at":"2025-04-04T14:05:55.067Z","avatar_url":"https://github.com/shibing624.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![alt text](docs/public/dialogbot.jpg)\n\n[![PyPI version](https://badge.fury.io/py/dialogbot.svg)](https://badge.fury.io/py/dialogbot)\n[![Downloads](https://static.pepy.tech/badge/dialogbot)](https://pepy.tech/project/dialogbot)\n[![Contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)\n[![GitHub contributors](https://img.shields.io/github/contributors/shibing624/dialogbot.svg)](https://github.com/shibing624/dialogbot/graphs/contributors)\n[![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![python_vesion](https://img.shields.io/badge/Python-3.7%2B-green.svg)](requirements.txt)\n[![GitHub 
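issues](https://img.shields.io/github/issues/shibing624/dialogbot.svg)](https://github.com/shibing624/dialogbot/issues)\n[![Wechat Group](https://img.shields.io/badge/wechat-group-green.svg?logo=wechat)](#Contact)\n\n# DialogBot\nDialogbot provides a complete set of dialogue models. It combines a **search-based dialogue model**, a **task-based dialogue model** and a **generative dialogue model** to output the best dialogue response.\n\n**dialogbot** implements several kinds of dialogue bots, including question-answering, task-oriented and chitchat dialogue. It supports web-search QA, domain-knowledge QA, task-guided QA and chitchat QA, and works out of the box.\n\n\n\n**Guide**\n\n- [Question](#Question)\n- [Solution](#Solution)\n- [Feature](#Feature)\n- [Install](#install)\n- [Usage](#usage)\n- [Dataset](#Dataset)\n- [Contact](#Contact)\n- [Reference](#reference)\n\n# Question\n\nHuman-machine dialogue systems have long been an important direction in AI; the Turing test uses conversation to judge whether a machine is highly intelligent.\n\nHow do we build a human-machine dialogue system, or a dialogue bot?\n\n\n# Solution\n\nDialogue systems have evolved through three generations:\n\n1. Rule-based dialogue systems: in a vertical domain, template matching can be used to match a question to its answer. The advantages are transparent internal logic and easy analysis and debugging; the drawbacks are heavy dependence on expert intervention\nand a lack of flexibility and extensibility.\n2. Statistical dialogue systems: built on partially observable Markov decision processes (POMDPs), they first run Bayesian inference on the question, maintain the dialogue state at every turn, then select a dialogue policy based on that state\nand generate a natural-language reply. This largely established the modern dialogue system framework and avoids heavy dependence on experts; the drawbacks are that the models are hard to maintain and extensibility is limited.\n3. 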
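Deep dialogue systems: they largely keep the framework of statistical dialogue systems, but each component is a deep network model. Thanks to the strong representation power of deep models, language classification and generation improve substantially;\nthe drawback is that a large amount of labeled data is needed to train the models effectively.\n\nDialogue systems fall into three categories:\n\n- Question-answering dialogue: mostly single-turn question and answer. The user asks a question, and the system parses it and looks up a knowledge base to return the correct answer, as in search.\n- Task-oriented dialogue: multi-turn dialogue driven by a task. The machine determines the user's goal through understanding, proactive questions and clarification, then looks up the knowledge base and returns results to fulfil the user's need.\nExample: a bot that sells movie tickets.\n- Chitchat dialogue: the goal is to produce interesting, informative, natural replies that keep the human-machine conversation going, as with the Xiaodu smart speaker.\n\n\n# Feature\n\n### Search Dialogue Bot\n\n#### Local retrieval QA\n\nCompute the similarity between the user's question and the questions in the QA corpus, pick the most similar question, and return its answer.\n\nSentence similarity can be computed with the following methods:\n\n- TFIDF\n- BM25\n- OneHot\n- Query Vector\n\n#### Web retrieval QA\n\nRetrieve answers from the result snippets of Baidu and Bing searches:\n- Baidu search, including the Baidu knowledge graph, Baidu poetry, Baidu perpetual calendar, Baidu calculator and Baidu Zhidao (百度知道)\n- Microsoft Bing search, including the Bing knowledge graph and Bing 网典\n\n\n### Task Oriented Dialogue Bot\n\n- End to End Memory Networks(memn2n)\n- bAbI dataset\n\n### Generative Dialogue Bot\n\n- GPT2 Model\n- Sequence To Sequence Model(seq2seq)\n- Taobao dataset\n\n\n# Demo\nOfficial Demo: https://www.mulanai.com/product/dialogbot/\n\n\u003cimg src=\"docs/public/jietu.png\" width=\"400\" /\u003e\n\n# Install\n\nThe project is based on transformers 4.4.2+, torch 1.6.0+ and Python 3.6+.\nTo install, run:\n\n```\npip3 install torch # conda install pytorch\npip3 install -U dialogbot\n```\n\nor\n\n```\npip3 install torch # conda install pytorch\ngit clone https://github.com/shibing624/dialogbot.git\ncd dialogbot\npython3 setup.py install\n```\n\n# Usage\n## Search Bot\n\nexample: [examples/bot_demo.py](examples/bot_demo.py)\n\n```python\nfrom dialogbot import Bot\n\nbot = Bot()\n# The query is Chinese for: How tall is Yao Ming?\nresponse = bot.answer('姚明多高呀？')\nprint(response)\n```\n\noutput:\n\n```\nquery: \"姚明多高呀？\"\nanswer: \"226cm\"\n```\n\n## Task Bot\n\nexample: [examples/taskbot_demo.py](examples/taskbot_demo.py)\n\n\n\n\n## Generative Bot\n\n### Using the GPT2 model\nA chitchat dialogue model trained on top of the GPT2 generative model.\n\nThe model has been released on Hugging Face models: [shibing624/gpt2-dialogbot-base-chinese](https://huggingface.co/shibing624/gpt2-dialogbot-base-chinese)\n\nexample: [examples/genbot_demo.py](examples/genbot_demo.py)\n\n\n```python\nfrom dialogbot import GPTBot\nbot = GPTBot()\nr = bot.answer('亲 你吃了吗？', use_history=False)\nprint('gpt2', 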
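r)\n```\n\noutput:\n\n```\nquery: \"亲 你吃了吗？\"\nanswer: \"吃了\"\n```\n\n\n### Fine-tuning the GPT2 model\n\n#### Data preprocessing\nCreate a data folder in the project root, name the raw training corpus train.txt and put it in that folder. In train.txt, each conversation is separated from the next by a blank line, as follows:\n```\n真想找你一起去看电影\n突然很想你\n我也很想你\n\n想看你的美照\n亲我一口就给你看\n我亲两口\n讨厌人家拿小拳拳捶你胸口\n\n今天好点了吗？\n一天比一天严重\n吃药不管用，去打一针。别拖着\n```\nRun preprocess.py to tokenize the dialogue corpus in data/train.txt, then serialize it to data/train.pkl. The object serialized in train.pkl has type List[List]: a list of conversations, each holding the tokens of that conversation.\n```\ncd dialogbot/gpt/\npython preprocess.py --train_path data/train.txt --save_path data/train.pkl\n```\n\n\n#### Training the model\nRun train.py to train the model autoregressively on the preprocessed data. The model is saved in the model folder under the root directory.\n\nDuring training, early stopping can be enabled through the patience parameter. With patience=n, training stops early if the loss on the validation set has not decreased for n consecutive epochs. With patience=0, early stopping is disabled.\n\nEarly stopping is disabled by default in the code, because in practice a model obtained with early stopping does not necessarily generate better responses.\n```\npython train.py --epochs 40 --batch_size 8 --device 0,1 --train_path data/train.pkl\n```\nFor more training parameters, see the parameter descriptions in the set_args() function in train.py.\n\n#### Inference (human-machine interaction)\nRun interact.py to chat with the trained model. After you enter q to end the conversation, the chat log is saved to sample.txt.\n```\npython interact.py --no_cuda --model_dir path_to_your_model\n```\nWhen running interact.py, you can tune the generation quality by adjusting parameters such as topk, topp, repetition_penalty and max_history_len. For more parameters, see the parameter descriptions in the set_args() function in interact.py.\nTo generate on a GPU, omit the --no_cuda flag and use --device gpu_id to choose which GPU to use.\n\n\n##### Sample conversations\n- sample1\n```\n  user :在干嘛\nchatbot:睡觉\n  user :这么早就睡觉啦\nchatbot:嗯啊你也早点睡啊\n  user :我再玩一会儿手机\nchatbot:早点休息晚安\n  user :拜拜啦\nchatbot:晚安晚安\n  user :多喝热水\nchatbot:哈哈哈\n  user :多喝热水\nchatbot:好的咯\n```\n\n- sample2\n```\n  user :在吗\nchatbot:在\n  user :我不开心\nchatbot:怎么啦\n  user :最近好累啊\nchatbot:是挺累的\n  user :作业好多啊\nchatbot:还好还好\n  user :我觉得我快要挂科了\nchatbot:我也是啊，但是我觉得我快熬出头了\n  user :不说了，我还有好多作业没写\nchatbot:我也是啊好多啊\n  user :那先这样，下次再聊，再见了\nchatbot:好哒\n```\n\n# Dataset\n\n### Chitchat corpora\n|Chinese chitchat corpus | Dataset link |Description|\n|---------|--------|--------|\n|Common Chinese chitchat|[chinese_chatbot_corpus](https://github.com/codemayq/chinese_chatbot_corpus)|Includes the Xiaohuangji, Douban, TV drama dialogue, Tieba forum reply, Weibo, PTT gossip and Qingyun corpora, among others|\n|500k Chinese chitchat corpus | 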
[Baidu Netdisk (extraction code: 4g5e)](https://pan.baidu.com/s/1M87Zf9e8iBqqmfTkKBWBWA) or [GoogleDrive](https://drive.google.com/drive/folders/1QFRsftLNTR_D3T55mS_FocPEZI7khdST?usp=sharing) |Raw corpus of 500k multi-turn dialogues, plus preprocessed data|\n|1M Chinese chitchat corpus | [Baidu Netdisk (extraction code: s908)](https://pan.baidu.com/s/1TvCQgJWuOoK2f5D95nH3xg) or [GoogleDrive](https://drive.google.com/drive/folders/1NU4KLDRxdOGINwxoHGWfVOfP0wL05gyj?usp=sharing)|Raw corpus of 1M multi-turn dialogues, plus preprocessed data|\n\n\nA sample of the Chinese chitchat corpus:\n```\n谢谢你所做的一切\n你开心就好\n开心\n嗯因为你的心里只有学习\n某某某，还有你\n这个某某某用的好\n\n你们宿舍都是这么厉害的人吗\n眼睛特别搞笑这土也不好捏但就是觉得挺可爱\n特别可爱啊\n\n今天好点了吗？\n一天比一天严重\n吃药不管用，去打一针。别拖着\n```\n\n### Released models\n\n|Model | Download link |Description|\n|---------|--------|--------|\n|model_epoch40_50w | [shibing624/gpt2-dialogbot-base-chinese](https://huggingface.co/shibing624/gpt2-dialogbot-base-chinese) or [Baidu Netdisk (extraction code: taqh)](https://pan.baidu.com/s/1Ptuzq-4b_Mqxci464YHnRg?pwd=taqh) or [GoogleDrive](https://drive.google.com/drive/folders/18TG2sKkHOZz8YlP5t1Qo_NqnGx9ogNay?usp=sharing) |Trained for 40 epochs on the 500k multi-turn dialogue corpus; the loss dropped to about 2.0.|\n\n\n# Contact\n\n- Issues (suggestions): [![GitHub issues](https://img.shields.io/github/issues/shibing624/dialogbot.svg)](https://github.com/shibing624/dialogbot/issues)\n- Email: xuming (xuming624@qq.com)\n- WeChat: add me (*WeChat ID: xuming624*) to join the Python-NLP discussion group, with the note *name-company-NLP*\n\n\u003cimg src=\"docs/public/wechat.jpeg\" width=\"200\" /\u003e\n\n\n# Citation\n\nIf you use dialogbot in your research, please cite it as follows:\n\n```latex\n@misc{dialogbot,\n  title={dialogbot: Dialogue Model Technology Tool},\n  author={Xu Ming},\n  year={2021},\n  howpublished={\\url{https://github.com/shibing624/dialogbot}},\n}\n```\n\n# License\n\n\nLicensed under [The Apache License 2.0](/LICENSE), free for commercial use. Please include a link to dialogbot and the license in your product documentation.\n\n\n# Contribute\nThe project code is still rough. If you improve it, you are welcome to contribute back to this project. Before submitting, please note the following two points:\n\n - Add corresponding unit tests under `tests`\n - Run all unit tests with `python -m pytest` and make sure they all pass\n\nThen you can submit a PR.\n\n\n# Reference\n\n- Wen T H, Vandyke D, Mrksic N, et al. A Network-based End-to-End Trainable Task-oriented Dialogue System[J]. 
2016.\n- How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation\n- A. Bordes, Y. Boureau, J. Weston. Learning End-to-End Goal-Oriented Dialog 2016\n- Zhao T, Eskenazi M. Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning [J]. arXiv preprint arXiv:1606.02560, 2016.\n- Kulkarni T D, Narasimhan K R, Saeedi A, et al. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation [J]. arXiv preprint arXiv:1604.06057, 2016.\n- BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems\n- Deep Reinforcement Learning with Double Q-Learning\n- Deep Attention Recurrent Q-Network\n- SimpleDS: A Simple Deep Reinforcement Learning Dialogue System\n- Deep Reinforcement Learning with a Natural Language Action Space\n- Integrating User and Agent Models: A Deep Task-Oriented Dialogue System\n- [The Curious Case of Neural Text Degeneration](https://arxiv.xilesou.top/pdf/1904.09751.pdf)\n- [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.xilesou.top/pdf/1911.00536.pdf)\n- [vyraun/chatbot-MemN2N-tensorflow](https://github.com/vyraun/chatbot-MemN2N-tensorflow)\n- [huggingface/transformers](https://github.com/huggingface/transformers)\n- [Morizeyao/GPT2-Chinese](https://github.com/Morizeyao/GPT2-Chinese)\n- [yangjianxin1/GPT2-chitchat](https://github.com/yangjianxin1/GPT2-chitchat)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibing624%2Fdialogbot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshibing624%2Fdialogbot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibing624%2Fdialogbot/lists"}