# RGB

- An implementation of [Benchmarking Large Language Models in Retrieval-Augmented Generation](https://arxiv.org/abs/2309.01431)

## News

- \[2024/03\] We refined the retrieved documents and some answers in `en.json` and `zh.json`, and released the new data files as `en_refine.json` and `zh_refine.json`.

## Quick links

* [Environment](#environment)
* [Retrieval-Augmented Generation Benchmark](#retrieval-augmented-generation-benchmark)
* [Evaluation](#evaluation)
* [License](#license)

### Environment

```bash
conda create -n rgb python=3.10.0
conda activate rgb
bash env.sh
```

### Retrieval-Augmented Generation Benchmark

The data is located in `data/`:

```text
data/
├── en.json
├── en_refine.json
├── en_int.json
├── en_fact.json
├── zh.json
├── zh_refine.json
├── zh_int.json
└── zh_fact.json
```
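
Each data file is expected to hold one JSON record per line (JSON Lines); this is an assumption about the format, not documented here. If it holds, you can pretty-print the first record with standard tooling:

```bash
# Assumes JSON Lines (one record per line); if a file is instead a single
# JSON array, use: python -m json.tool data/en.json
head -n 1 data/en.json | python -m json.tool
```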

To evaluate Information Integration, use `zh_int` (Chinese questions) or `en_int` (English questions).

To evaluate Counterfactual Robustness, use `zh_fact` (Chinese questions) or `en_fact` (English questions).
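
For example, a minimal Information Integration run on the English data might look like this, reusing the flags of the evaluation commands in the next section (the flag values are illustrative):

```bash
python evalue.py \
--dataset en_int \
--modelname chatgpt \
--temp 0.2 \
--noise_rate 0.6 \
--api_key YourAPIKEY \
--passage_num 5
```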

#### The refined data

We refined the retrieved documents and some answers in `en.json` and `zh.json`, and released the new data files as `en_refine.json` and `zh_refine.json`. The changes include:

+ Removing incorrect positive and negative documents.

+ Adding some positive documents.

+ Correcting some inaccurate answers.

### Evaluation

To evaluate ChatGPT, run:

```bash
python evalue.py \
--dataset en \
--modelname chatgpt \
--temp 0.2 \
--noise_rate 0.6 \
--api_key YourAPIKEY \
--passage_num 5
```

To evaluate other models, run:

```bash
python evalue.py \
--dataset en \
--modelname chatglm2-6b \
--temp 0.2 \
--noise_rate 0.6 \
--plm THUDM/chatglm2-6b \
--passage_num 5
```

Change `modelname` and `plm` to evaluate different models, where `plm` is the local or Hugging Face path of the model. The other flags:

+ `temp`: the sampling temperature of the model.

+ `noise_rate`: the proportion of noisy documents in the input (see the sweep example below).

+ `passage_num`: the number of documents provided to the LLM (default: 5).
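
As referenced above, a simple way to study noise robustness is to sweep `noise_rate` and compare the reported `all_rate` across runs. This loop is a sketch, not part of the repository:

```bash
# Sweep the noise ratio and compare the reported all_rate across runs.
for nr in 0.0 0.2 0.4 0.6 0.8; do
  python evalue.py \
    --dataset en \
    --modelname chatglm2-6b \
    --temp 0.2 \
    --noise_rate $nr \
    --plm THUDM/chatglm2-6b \
    --passage_num 5
done
```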

The outputs are:

+ `all_rate`: the accuracy (when `noise_rate < 1`) or the rejection rate (when `noise_rate = 1`).
+ `fact_check_rate`: the error detection rate (ED).

---

To evaluate rejection using ChatGPT, first run `evalue.py` with `noise_rate=1` to obtain the generation results. For example, reusing the flags from the runs above:
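
```bash
python evalue.py \
--dataset en \
--modelname chatglm2-6b \
--temp 0.2 \
--noise_rate 1 \
--plm THUDM/chatglm2-6b \
--passage_num 5
```

Then run: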

```bash
python reject_evalue.py \
--dataset en \
--modelname chatglm2-6b \
--api_key YourAPIKEY
```

The "reject_rate" in the outputs are the reject rate (Rej\*).

---

To evaluate counterfactual robustness using ChatGPT, first run `evalue.py` with `dataset=en_fact` (or `zh_fact`) to obtain the generation results. For example, reusing the flags from the runs above (the flag values are illustrative):
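
```bash
python evalue.py \
--dataset en_fact \
--modelname chatglm2-6b \
--temp 0.2 \
--noise_rate 0.6 \
--plm THUDM/chatglm2-6b \
--passage_num 5
```

Then run: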

```bash
python fact_evalue.py \
--dataset en_fact \
--modelname chatglm2-6b \
--api_key YourAPIKEY
```

The "reject_rate" in the outputs are the error detection rates (ED\*). The `correct_rate` in the outputs are the error correction rate (CR)

## License

The code and data are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for noncommercial use only. Any commercial use requires formal permission in advance.

Shield: [![CC BY-NC-SA 4.0][cc-by-nc-sa-shield]][cc-by-nc-sa]

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png
[cc-by-nc-sa-shield]: https://img.shields.io/badge/License-CC%20BY--NC--SA%204.0-lightgrey.svg