{"id":20298109,"url":"https://github.com/FreedomIntelligence/PlatoLM","last_synced_at":"2025-05-07T20:34:19.510Z","repository":{"id":190094119,"uuid":"681032134","full_name":"FreedomIntelligence/PlatoLM","owner":"FreedomIntelligence","description":"A trainable user simulator","archived":false,"fork":false,"pushed_at":"2024-09-14T07:56:07.000Z","size":868,"stargazers_count":34,"open_issues_count":0,"forks_count":0,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-30T19:49:28.170Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FreedomIntelligence.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-21T05:52:38.000Z","updated_at":"2025-02-17T09:35:03.000Z","dependencies_parsed_at":"2023-08-23T06:56:44.810Z","dependency_job_id":"7467e7e9-2322-40c3-83d5-004503790a1d","html_url":"https://github.com/FreedomIntelligence/PlatoLM","commit_stats":null,"previous_names":["freedomintelligence/realm"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FPlatoLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FPlatoLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FPlatoLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FreedomIntelligence%2FPlatoLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FreedomIntelligence","download_url":"https://codeload.github.com/FreedomIntelligence/PlatoLM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252953717,"owners_count":21830890,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T16:02:11.001Z","updated_at":"2025-05-07T20:34:19.463Z","avatar_url":"https://github.com/FreedomIntelligence.png","language":"Python","readme":"# PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator\n# ✨ Latest News\n- [09/08/2024]: All the experimental data is public on [Google Drive](https://drive.google.com/file/d/1wqRqJlx_J4I17Xy8gQwpAG7aILfiyTw2/view?usp=sharing). \n- [05/15/2024]: We are accepted by [ACL-2024](https://2024.aclweb.org/program/main_conference_papers/), you can find our final version in [arxiv](https://arxiv.org/abs/2308.11534v6).\n- [01/16/2024]: We are rejected by [ICLR-2024](https://openreview.net/forum?id=9nddtu94uX) with scores 8666（ranked top 13%-16%）.\n- [10/12/2023]: Upload the dataset `SocraticChat` in [hugging face](https://huggingface.co/datasets/FreedomIntelligence/SocraticChat).\n- [10/10/2023]: Update the [tech report v4](https://arxiv.org/abs/2308.11534v4).\n- [10/08/2023]: The user simulator `UserGPT`, dataset `RealChat` and the respondent model `ReaLM` are renamed to `Socratic`, `SocraticChat`, and `PlatoLM` by Benyou Wang, the provider of 4 x A100s.\n- [08/21/2023]: PlatoLM-7b Rank #1 on [AlpacaEval benchmark](https://tatsu-lab.github.io/alpaca_eval/) among 7B scale, achieving 81.94% win rates against text-davinci-003 (has entered into the official benchmark).\n- [08/21/2023]: PlatoLM-7b Rank #1 on [MT-Bench benchmark](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) among 7B scale (hasn't entered into the official benchmark yet).\n- [08/21/2023]: Release the [model weights](https://huggingface.co/FreedomIntelligence/PlatoLM-7b/tree/main).\n- [08/21/2023]: Release the [tech report v1](https://arxiv.org/abs/2308.11534).\n\n# ⚡ Introduction\n\nWelcome to our realm🤗\n\nWe propose a new paradigm for training a user simulator.\n\nAfter applying this paradigm to ShareGPT and LLaMA-7B, a novel user simulator, `Socratic`, emerged. Through iterative interactions between Socratic and gpt-3.5-turbo, a multi-round conversation dataset named `SocraticChat` was generated. Leveraging this dataset for fine-tuning LLAMA-7B-2 resulted in the `PlatoLM` model, which exhibits superior performance. \n\nWith fewer samples(50.7K) distilled from gpt-3.5, shorter context length(2048), and smaller model scale(7B), we even beat GPT 3.5 in Alpaca-Eval benchmark.\n\n\u003cimg src=\"https://github.com/FreedomIntelligence/PlatoLM/assets/73695787/b314a609-dfc6-4d6a-9795-3bf492f84c0c.png\" width=\"400\" height=\"150\" alt=\"cool\"\u003e\n\n\n\u003cimg src=\"https://github.com/FreedomIntelligence/PlatoLM/assets/73695787/51141cbc-046a-4a55-b937-254e1155c06b.png\" width=\"300\" height=\"250\" alt=\"cool\"\u003e\n\n\n# 📖 Methodology\n\nThe key to our idea is to `flip the chessboard`.\n\nWe just `mask the questions of real users` and accordingly, only `calculate their loss` for the purpose of `modifying the learning objective`.\nIn addition, we use `a dyadic prompt template` to instruct our backbone.\n\nThe main difference between us and other research is shown below.\n![pipeline](https://github.com/FreedomIntelligence/PlatoLM/assets/73695787/ecd6156e-4125-4e3b-93a3-b9955cb740ce)\n\nThe pipeline can be analogous to `Socratic teaching`, which means teaching students via questioning. We argue that after learning the real human's high-quality instructions based on the knowledgeable llama backbone, more human-like LLMs will master the sophisticated teaching ability.\nTherefore, we named the query model `Socratic`, which means the follower of Socrates.  Likewise, we labeled the dataset as `SocraticChat`, and the resulting model was dubbed `PlatoLM`.\n\n\u003cimg src=\"https://github.com/FreedomIntelligence/PlatoLM/assets/73695787/5c60df0a-93a3-44bd-a6b3-fa4e2e73ad96.png\" width=\"400\" height=\"266\" alt=\"analogy\"\u003e\n\nExperiments show that a more human-like questioning pattern in dynamic multi-round conversations can teach the response model better compared to static role-playing, which can be attributed to `the natural and rich topic structures of the questioning pattern from humans` in human-machine dialogue where they `hold topic dominance`. \n\n# 📄 Case Study\n\n`The typical samples` for Socratic Dialogues and our dataset SocraticChat are shown below.\n![sample2](https://github.com/FreedomIntelligence/PlatoLM/assets/73695787/22e3754d-a28c-4cf3-a7fb-517afa6ec41a)\n\n\n\n# 🚀 Training\n\n```shell\n# To fine-tune Socratic\ncd model/sft_socratic\nbash scripts/sft_7b.sh \n\n# To fine-tune PlatoLM\ncd model/sft_platolm\nbash scripts/sft_7b.sh \n```\n\n# 🧐 Inferencing\n\n```shell\n# To infer PlatoLM\npython -m model.sft_platolm.source.deploy.cli --model FreedomIntelligence/PlatoLM-7b\n\n# To infer Socratic\n# The model's weights of Socratic has not been published yet. \npython -m model.sft_socratic.source.deploy.cli --model balabala\n```\n\n# 🎉 Acknowledgement\n\nWe are aware that our works are inspired by the following works, including but not limited to\n\n- LLaMA: https://huggingface.co/meta-llama\n- Self-instruct: https://github.com/yizhongw/self-instruct\n- LLMZoo: https://github.com/FreedomIntelligence/LLMZoo\n\nWithout these, nothing could happen in this repository.\n\n# 💭 Citation\n\n```\n@inproceedings{kong2024platolm,\n  title={PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator},\n  author={Kong, Chuyi and Fan, Yaxin and Wan, Xiang and Jiang, Feng and Wang, Benyou},\n  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},\n  pages={7841--7863},\n  year={2024}\n}\n```\n\nWe are from the School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).\n","funding_links":[],"categories":["A01_文本生成_文本对话"],"sub_categories":["大语言对话模型及数据"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFreedomIntelligence%2FPlatoLM","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFreedomIntelligence%2FPlatoLM","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFreedomIntelligence%2FPlatoLM/lists"}