https://github.com/FreedomIntelligence/PlatoLM
A trainable user simulator
- Host: GitHub
- URL: https://github.com/FreedomIntelligence/PlatoLM
- Owner: FreedomIntelligence
- License: apache-2.0
- Created: 2023-08-21T05:52:38.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-09-14T07:56:07.000Z (over 1 year ago)
- Last Synced: 2025-04-30T19:49:28.170Z (about 1 year ago)
- Language: Python
- Size: 848 KB
- Stars: 34
- Watchers: 8
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - FreedomIntelligence/PlatoLM - Applying this method to ShareGPT and LLaMA-7B produced a novel user simulator named Socratic. Iterative interaction between Socratic and gpt-3.5-turbo generated a multi-round dialogue dataset named SocraticChat. Fine-tuning LLaMA-2-7B on this dataset yielded the PlatoLM model, which exhibits excellent performance. Using only a small number of samples distilled from gpt-3.5 (50.7K), a shorter context length (2048), and a smaller model scale (7B), PlatoLM even surpasses GPT-3.5 on the AlpacaEval benchmark. The project's main innovation is applying the idea of "flipping the chessboard" to user-simulator training: the real users' questions are masked and only their loss is computed, thereby modifying the learning objective. A dyadic prompt template is also used to instruct the model. Experiments show that in dynamic multi-round dialogue, a more human-like questioning pattern trains the response model more effectively than static role-playing. (A01_Text Generation_Text Dialogue / Large Language Dialogue Models and Datasets)
README
# PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
# ✨ Latest News
- [09/08/2024]: All the experimental data is now public on [Google Drive](https://drive.google.com/file/d/1wqRqJlx_J4I17Xy8gQwpAG7aILfiyTw2/view?usp=sharing).
- [05/15/2024]: Our paper was accepted to [ACL-2024](https://2024.aclweb.org/program/main_conference_papers/); the final version is on [arXiv](https://arxiv.org/abs/2308.11534v6).
- [01/16/2024]: Our submission was rejected by [ICLR-2024](https://openreview.net/forum?id=9nddtu94uX) with scores of 8/6/6/6 (ranked top 13%-16%).
- [10/12/2023]: Uploaded the dataset `SocraticChat` to [Hugging Face](https://huggingface.co/datasets/FreedomIntelligence/SocraticChat).
- [10/10/2023]: Updated the [tech report to v4](https://arxiv.org/abs/2308.11534v4).
- [10/08/2023]: The user simulator `UserGPT`, the dataset `RealChat`, and the respondent model `ReaLM` were renamed to `Socratic`, `SocraticChat`, and `PlatoLM` by Benyou Wang, who provided the 4 x A100 GPUs.
- [08/21/2023]: PlatoLM-7B ranked #1 among 7B-scale models on the [AlpacaEval benchmark](https://tatsu-lab.github.io/alpaca_eval/), achieving an 81.94% win rate against text-davinci-003 (entered in the official benchmark).
- [08/21/2023]: PlatoLM-7B ranked #1 among 7B-scale models on the [MT-Bench benchmark](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard) (not yet entered in the official benchmark).
- [08/21/2023]: Released the [model weights](https://huggingface.co/FreedomIntelligence/PlatoLM-7b/tree/main).
- [08/21/2023]: Released the [tech report v1](https://arxiv.org/abs/2308.11534).
# ⚡ Introduction
Welcome to our realm🤗
We propose a new paradigm for training a user simulator.
Applying this paradigm to ShareGPT and LLaMA-7B yielded a novel user simulator, `Socratic`. Through iterative interactions between Socratic and gpt-3.5-turbo, we generated a multi-round conversation dataset named `SocraticChat`. Fine-tuning LLaMA-2-7B on this dataset produced the `PlatoLM` model, which exhibits superior performance.
With fewer samples (50.7K) distilled from gpt-3.5, a shorter context length (2048), and a smaller model scale (7B), we even beat GPT-3.5 on the AlpacaEval benchmark.
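The iterative interaction described above can be sketched as a simple alternation between a user simulator and a responder. The function names and the toy stand-ins below are illustrative only, not the repo's actual pipeline (which calls real models such as Socratic and gpt-3.5-turbo):

```python
def generate_dialogue(simulate_user, respond, n_rounds=3):
    """Alternate a user simulator with a responder model.

    simulate_user(history) -> next user question (e.g. a trained simulator)
    respond(history) -> assistant answer (e.g. gpt-3.5-turbo)
    Both callables are stand-ins; in the real pipeline they wrap model calls.
    """
    history = []
    for _ in range(n_rounds):
        history.append({"role": "user", "content": simulate_user(history)})
        history.append({"role": "assistant", "content": respond(history)})
    return history

# Toy stand-ins so the loop runs without any model:
demo = generate_dialogue(
    simulate_user=lambda h: f"question {len(h) // 2 + 1}",
    respond=lambda h: f"answer {(len(h) + 1) // 2}",
)
print(len(demo))  # → 6 (three user turns, three assistant turns)
```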


# 📖 Methodology
The key to our idea is to `flip the chessboard`.
We simply `mask the questions of real users` and, accordingly, `calculate the loss only on them`, thereby `modifying the learning objective`.
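A minimal sketch of this flipped objective, assuming the common causal-LM convention that labels set to -100 are ignored by the loss; `build_labels` and the per-token role tags are our illustration, not the repo's actual preprocessing:

```python
IGNORE_INDEX = -100  # conventional "ignore" label for cross-entropy loss


def build_labels(token_ids, roles):
    """Flip the usual SFT objective: supervise only the *user* tokens.

    token_ids: token ids for the whole dialogue.
    roles: parallel list tagging each token as "user" or "assistant".
    Assistant tokens get IGNORE_INDEX, so no loss is computed on them.
    """
    return [tid if role == "user" else IGNORE_INDEX
            for tid, role in zip(token_ids, roles)]


# Toy dialogue: assistant tokens are masked, user tokens are supervised.
ids = [11, 12, 13, 21, 22]
roles = ["user", "user", "assistant", "assistant", "user"]
print(build_labels(ids, roles))  # → [11, 12, -100, -100, 22]
```

This is the mirror image of ordinary instruction tuning, where the user tokens would be the ones masked out.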
In addition, we use `a dyadic prompt template` to instruct our backbone.
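One way such a dyadic (two-party) template could be rendered is shown below; the wording of the system line and the `### Human:`/`### Assistant:` markers are hypothetical, since the repo defines its own template:

```python
# Hypothetical dyadic template; the repo's actual template may differ.
TEMPLATE = (
    "A chat between a curious human and an AI assistant. "
    "The human asks questions and the assistant answers.\n"
    "### Human: {question}\n"
    "### Assistant: {answer}"
)


def render_turn(question, answer):
    """Render one question/answer pair under the dyadic template."""
    return TEMPLATE.format(question=question, answer=answer)


print(render_turn("What is SFT?", "Supervised fine-tuning."))
```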
The main difference between our approach and other research is shown below.

The pipeline is analogous to `Socratic teaching`, i.e., teaching students by questioning. We argue that after learning real humans' high-quality instructions on top of the knowledgeable LLaMA backbone, a more human-like LLM will master this sophisticated teaching ability.
Therefore, we named the query model `Socratic` after Socrates; likewise, we labeled the dataset `SocraticChat`, and the resulting response model was dubbed `PlatoLM`, after Socrates' follower Plato.

Experiments show that, in dynamic multi-round conversations, a more human-like questioning pattern teaches the response model better than static role-playing does. We attribute this to `the natural and rich topic structures of human questioning` in human-machine dialogue, where humans `hold topic dominance`.
# 📄 Case Study
Typical samples of Socratic dialogues and of our dataset SocraticChat are shown below.

# 🚀 Training
```shell
# To fine-tune Socratic (run from the repository root)
cd model/sft_socratic
bash scripts/sft_7b.sh

# To fine-tune PlatoLM (also run from the repository root)
cd model/sft_platolm
bash scripts/sft_7b.sh
```
# 🧐 Inferencing
```shell
# To infer PlatoLM
python -m model.sft_platolm.source.deploy.cli --model FreedomIntelligence/PlatoLM-7b
# To infer Socratic
# Socratic's model weights have not been published yet; `balabala` below is a placeholder.
python -m model.sft_socratic.source.deploy.cli --model balabala
```
# 🎉 Acknowledgement
Our work is inspired by the following projects, including but not limited to:
- LLaMA: https://huggingface.co/meta-llama
- Self-instruct: https://github.com/yizhongw/self-instruct
- LLMZoo: https://github.com/FreedomIntelligence/LLMZoo
Without them, this repository would not exist.
# 💭 Citation
```
@inproceedings{kong2024platolm,
  title={PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator},
  author={Kong, Chuyi and Fan, Yaxin and Wan, Xiang and Jiang, Feng and Wang, Benyou},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={7841--7863},
  year={2024}
}
```
We are from the School of Data Science, the Chinese University of Hong Kong, Shenzhen (CUHKSZ), and the Shenzhen Research Institute of Big Data (SRIBD).