Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/qinyiwei/InfoBench
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/qinyiwei/InfoBench
- Owner: qinyiwei
- License: mit
- Created: 2023-06-16T19:25:00.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-22T03:45:21.000Z (4 months ago)
- Last Synced: 2024-08-22T04:39:20.028Z (4 months ago)
- Language: Python
- Size: 31.3 KB
- Stars: 43
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-llm-if - InfoBench
README
# InfoBench
- **Paper:** [InFoBench: Evaluating Instruction Following Ability in Large Language Models](https://arxiv.org/pdf/2401.03601.pdf)
- **Dataset:** [InFoBench Dataset](https://huggingface.co/datasets/kqsong/InFoBench)
- **Generation and Annotation:** [InFoBench Generation and Annotation](https://drive.google.com/drive/folders/1Bj7u196p2fxBP03dQgd5lvddFoeSdPFO?usp=drive_link)
## Citation
```
@article{qin2024infobench,
  title={InFoBench: Evaluating Instruction Following Ability in Large Language Models},
  author={Yiwei Qin and Kaiqiang Song and Yebowen Hu and Wenlin Yao and Sangwoo Cho and Xiaoyang Wang and Xuansheng Wu and Fei Liu and Pengfei Liu and Dong Yu},
  year={2024},
  eprint={2401.03601},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

## Evaluation with InFoBench
### Step 1: Dataset Usage
You can download the dataset directly with the Hugging Face `datasets` library.
```python
from datasets import load_dataset

dataset = load_dataset("kqsong/InFoBench")
```
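To verify the download, you can inspect a single entry. This is a minimal sketch; the `train` split and field names such as `instruction` and `decomposed_questions` are assumptions based on the paper's description of the dataset, so check `dataset.column_names` to confirm:

```python
from datasets import load_dataset

dataset = load_dataset("kqsong/InFoBench")

# Assumed split and field names; inspect dataset.column_names to confirm.
example = dataset["train"][0]
print(example["instruction"])
print(example["decomposed_questions"])
```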
### Step 2: Generating the response
Provide an output file at `model/output.json`.
Each data entry should be a JSON object on its own line (JSON Lines format), containing all the fields of the input format.
The generated response should be added to each JSON object in a new field named `output`. We suggest using greedy decoding to avoid randomness in the decoding process.
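A minimal sketch of producing `model/output.json` under these conventions; `generate_response` is a hypothetical stand-in for your model's greedy-decoding call, and the `train` split and input field names are assumptions:

```python
import json

from datasets import load_dataset

dataset = load_dataset("kqsong/InFoBench")

def generate_response(instruction: str, context: str) -> str:
    # Hypothetical helper: call your model here with greedy decoding
    # (e.g. temperature 0) and return the generated text.
    raise NotImplementedError

with open("model/output.json", "w") as f:
    for entry in dataset["train"]:          # assumed split name
        record = dict(entry)                # keep all input fields
        record["output"] = generate_response(entry["instruction"], entry["input"])
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```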
### Step 3: Evaluation
Evaluate the LLM's outputs on the decomposed questions. This research uses GPT-4-0314 as the evaluation model by default.
```bash
python evaluation.py \
  --api_key <your_api_key> \
  --eval_model gpt-4-0314 \
  --input model/output.json \
  --output_dir evaluation/ \
  --temperature 0
```

Each data entry in the output will include an `eval` key in the format of `List[bool]`, representing the "Yes" or "No" answers to each decomposed question.
The final evaluation file will be saved in JSON format under the directory passed to `--output_dir`.
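To turn the per-question booleans into a single score, you can compute the fraction of "Yes" answers across all entries, which is the Decomposed Requirements Following Ratio (DRFR) described in the paper. A minimal sketch, assuming the evaluation file is a JSON list of entries each carrying an `eval` field; the file name below is a hypothetical placeholder, so adjust it to the file produced in your `--output_dir` (and adapt the loading if the file is JSON Lines):

```python
import json

# Hypothetical path: replace with the actual file in your --output_dir.
with open("evaluation/output_eval.json") as f:
    entries = json.load(f)

# DRFR: fraction of decomposed questions answered "Yes" (True) overall.
total = sum(len(e["eval"]) for e in entries)
passed = sum(sum(e["eval"]) for e in entries)
print(f"DRFR: {passed / total:.4f}")
```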