https://github.com/FreedomIntelligence/PlatoLM
A trainable user simulator
- Host: GitHub
- URL: https://github.com/FreedomIntelligence/PlatoLM
- Owner: FreedomIntelligence
- License: apache-2.0
- Created: 2023-08-21T05:52:38.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-14T07:56:07.000Z (over 1 year ago)
- Last Synced: 2025-04-30T19:49:28.170Z (about 1 year ago)
- Language: Python
- Size: 848 KB
- Stars: 34
- Watchers: 8
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - FreedomIntelligence/PlatoLM - Applying this method to ShareGPT and LLaMA-7B produced a novel user simulator named Socratic. Iterative interaction between Socratic and gpt-3.5-turbo generated a multi-round dialogue dataset named SocraticChat. Fine-tuning LLaMA-2-7B on this dataset yielded the PlatoLM model, which exhibits excellent performance. Using only a small number of samples distilled from gpt-3.5 (50.7K), a shorter context length (2048), and a smaller model scale (7B), PlatoLM even surpasses GPT-3.5 on the AlpacaEval benchmark. The project's main innovation is applying the idea of "flipping the chessboard" to user-simulator training: the real users' questions are masked and only their loss is computed, thereby modifying the learning objective. A dyadic prompt template is also used to instruct the model. Experiments show that in dynamic multi-round dialogue, a more human-like questioning pattern trains the response model more effectively than static role-playing. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Datasets)
README
# PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
# ✨ Latest News
- [09/08/2024]: All the experimental data is now public on [Google Drive](https://drive.google.com/file/d/1wqRqJlx_J4I17Xy8gQwpAG7aILfiyTw2/view?usp=sharing).
- [05/15/2024]: Our paper was accepted to [ACL-2024](https://2024.aclweb.org/program/main_conference_papers/); the final version is on [arXiv](https://arxiv.org/abs/2308.11534v6).
- [01/16/2024]: Our submission was rejected by [ICLR-2024](https://openreview.net/forum?id=9nddtu94uX) with scores of 8/6/6/6 (ranked top 13%-16%).
- [10/12/2023]: Uploaded the dataset `SocraticChat` to [Hugging Face](https://huggingface.co/datasets/FreedomIntelligence/SocraticChat).
- [10/10/2023]: Updated the [tech report to v4](https://arxiv.org/abs/2308.11534v4).
- [10/08/2023]: The user simulator `UserGPT`, the dataset `RealChat`, and the respondent model `ReaLM` were renamed to `Socratic`, `SocraticChat`, and `PlatoLM` by Benyou Wang, who provided the 4 x A100 GPUs.
- [08/21/2023]: PlatoLM-7B ranked #1 among 7B-scale models on the [AlpacaEval benchmark](https://tatsu-lab.github.io/alpaca_eval/), achieving an 81.94% win rate against text-davinci-003 (entered in the official benchmark).
- [08/21/2023]: PlatoLM-7B ranked #1 among 7B-scale models on the [MT-Bench benchmark](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) (not yet entered in the official benchmark).
- [08/21/2023]: Released the [model weights](https://huggingface.co/FreedomIntelligence/PlatoLM-7b/tree/main).
- [08/21/2023]: Released the [tech report v1](https://arxiv.org/abs/2308.11534).
# ⚡ Introduction
Welcome to our realm🤗
We propose a new paradigm for training a user simulator.
Applying this paradigm to ShareGPT and LLaMA-7B yielded a novel user simulator, `Socratic`. Through iterative interactions between Socratic and gpt-3.5-turbo, we generated a multi-round conversation dataset named `SocraticChat`. Fine-tuning LLaMA-2-7B on this dataset produced the `PlatoLM` model, which exhibits superior performance.
With fewer samples (50.7K) distilled from gpt-3.5, a shorter context length (2048), and a smaller model scale (7B), we even beat GPT-3.5 on the AlpacaEval benchmark.
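The iterative interaction described above can be sketched as a simple alternation between a user simulator and a responder. The function names and the toy stand-ins below are illustrative only, not the repo's actual pipeline (which calls real models such as Socratic and gpt-3.5-turbo):

```python
def generate_dialogue(simulate_user, respond, n_rounds=3):
    """Alternate a user simulator with a responder model.

    simulate_user(history) -> next user question (e.g. a trained simulator)
    respond(history) -> assistant answer (e.g. gpt-3.5-turbo)
    Both callables are stand-ins; in the real pipeline they wrap model calls.
    """
    history = []
    for _ in range(n_rounds):
        history.append({"role": "user", "content": simulate_user(history)})
        history.append({"role": "assistant", "content": respond(history)})
    return history

# Toy stand-ins so the loop runs without any model:
demo = generate_dialogue(
    simulate_user=lambda h: f"question {len(h) // 2 + 1}",
    respond=lambda h: f"answer {(len(h) + 1) // 2}",
)
print(len(demo))  # → 6 (three user turns, three assistant turns)
```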


# 📖 Methodology
The key to our idea is to `flip the chessboard`.
We simply `mask the questions of real users` and, accordingly, `calculate the loss only on them`, thereby `modifying the learning objective`.
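A minimal sketch of this flipped objective, assuming the common causal-LM convention that labels set to -100 are ignored by the loss; `build_labels` and the per-token role tags are our illustration, not the repo's actual preprocessing:

```python
IGNORE_INDEX = -100  # conventional "ignore" label for cross-entropy loss


def build_labels(token_ids, roles):
    """Flip the usual SFT objective: supervise only the *user* tokens.

    token_ids: token ids for the whole dialogue.
    roles: parallel list tagging each token as "user" or "assistant".
    Assistant tokens get IGNORE_INDEX, so no loss is computed on them.
    """
    return [tid if role == "user" else IGNORE_INDEX
            for tid, role in zip(token_ids, roles)]


# Toy dialogue: assistant tokens are masked, user tokens are supervised.
ids = [11, 12, 13, 21, 22]
roles = ["user", "user", "assistant", "assistant", "user"]
print(build_labels(ids, roles))  # → [11, 12, -100, -100, 22]
```

This is the mirror image of ordinary instruction tuning, where the user tokens would be the ones masked out.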
In addition, we use `a dyadic prompt template` to instruct our backbone.
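One way such a dyadic (two-party) template could be rendered is shown below; the wording of the system line and the `### Human:`/`### Assistant:` markers are hypothetical, since the repo defines its own template:

```python
# Hypothetical dyadic template; the repo's actual template may differ.
TEMPLATE = (
    "A chat between a curious human and an AI assistant. "
    "The human asks questions and the assistant answers.\n"
    "### Human: {question}\n"
    "### Assistant: {answer}"
)


def render_turn(question, answer):
    """Render one question/answer pair under the dyadic template."""
    return TEMPLATE.format(question=question, answer=answer)


print(render_turn("What is SFT?", "Supervised fine-tuning."))
```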
The main difference between our approach and other research is shown below.

The pipeline is analogous to `Socratic teaching`, i.e., teaching students by questioning. We argue that after learning real humans' high-quality instructions on top of the knowledgeable LLaMA backbone, a more human-like LLM will master this sophisticated teaching ability.
Therefore, we named the query model `Socratic` after Socrates; likewise, we labeled the dataset `SocraticChat`, and the resulting response model was dubbed `PlatoLM`, after Socrates' follower Plato.

Experiments show that, in dynamic multi-round conversations, a more human-like questioning pattern teaches the response model better than static role-playing does. We attribute this to `the natural and rich topic structures of human questioning` in human-machine dialogue, where humans `hold topic dominance`.
# 📄 Case Study
Typical samples of Socratic dialogues and of our dataset SocraticChat are shown below.

# 🚀 Training
```shell
# To fine-tune Socratic (run from the repository root)
cd model/sft_socratic
bash scripts/sft_7b.sh

# To fine-tune PlatoLM (also run from the repository root)
cd model/sft_platolm
bash scripts/sft_7b.sh
```
# 🧐 Inferencing
```shell
# To infer PlatoLM
python -m model.sft_platolm.source.deploy.cli --model FreedomIntelligence/PlatoLM-7b
# To infer Socratic
# Socratic's model weights have not been published yet; `balabala` below is a placeholder.
python -m model.sft_socratic.source.deploy.cli --model balabala
```
# 🎉 Acknowledgement
Our work is inspired by the following projects, including but not limited to:
- LLaMA: https://huggingface.co/meta-llama
- Self-instruct: https://github.com/yizhongw/self-instruct
- LLMZoo: https://github.com/FreedomIntelligence/LLMZoo
Without them, this repository would not exist.
# 💭 Citation
```
@inproceedings{kong2024platolm,
  title={PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator},
  author={Kong, Chuyi and Fan, Yaxin and Wan, Xiang and Jiang, Feng and Wang, Benyou},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={7841--7863},
  year={2024}
}
```
We are from the School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).