# CELLO
CELLO is a benchmark for systematically evaluating the **C**ompl**E**x instruction understanding ability of **L**arge **L**anguage M**O**dels (AAAI 2024).
- We design **eight features** for complex instructions and construct **a comprehensive evaluation dataset** from real-world scenarios.
- We establish **four criteria** and develop **corresponding metrics**, as existing ones are inadequate, biased, or too strict and coarse-grained.
- We compare the performance of representative **Chinese-oriented and English-oriented models** in following complex instructions through extensive experiments.
## Install Dependencies
```
conda create -n cello python=3.10.9
conda activate cello
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
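# Optional sanity check (assumes a CUDA-capable GPU; adjust if running on CPU):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"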
```

## Evaluate Models
You can evaluate any desired model via the script `eval.sh`:
```
cd CELLO/
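# Assumption based on the folder layout described below: --model_name picks an
# evaluator from code/evaluators, and --save_name names the output under results/.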
CUDA_VISIBLE_DEVICES=0 python code/eval.py --model_name chatglm --save_name chatglm
```

All the models are implemented in the folder [code/evaluators](code/evaluators/).
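To plug in a model of your own, it is enough to mirror the pattern the existing evaluators follow: load the model once, then generate one response per instruction. The sketch below is illustrative only; the class and method names are assumptions, not the repo's actual interface.

```
# Hypothetical evaluator sketch -- names are illustrative, not CELLO's actual API.
from transformers import AutoModelForCausalLM, AutoTokenizer

class MyModelEvaluator:
    """Wraps a Hugging Face causal LM so eval-style code can query it."""

    def __init__(self, model_path: str, device: str = "cuda"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(device)
        self.device = device

    def generate(self, instruction: str, max_new_tokens: int = 512) -> str:
        """Return the model's response to a single complex instruction."""
        inputs = self.tokenizer(instruction, return_tensors="pt").to(self.device)
        output_ids = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Keep only the newly generated tokens, not the echoed prompt.
        new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True)
```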
All the model results are in the folder [results/](results/).

## Scoring System
The metrics for the four criteria we designed can be calculated using the script `score.sh`:
```
cd CELLO/
python code/score.py
```

All the scorers are implemented in the folder [code/scorers](code/scorers/).
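As a flavor of what a rule-based scorer can look like, here is a toy count-limit style check; the function names and thresholds are illustrative, not the repo's actual metric code.

```
# Toy scorer sketch -- illustrative only, not CELLO's actual metric implementation.
def word_count_score(response: str, min_words: int, max_words: int) -> float:
    """Return 1.0 if the response length satisfies the count limit, else 0.0."""
    n = len(response.split())
    return 1.0 if min_words <= n <= max_words else 0.0

def average_score(responses: list[str], min_words: int = 50, max_words: int = 200) -> float:
    """Aggregate the per-sample criterion into one benchmark-style score."""
    if not responses:
        return 0.0
    return sum(word_count_score(r, min_words, max_words) for r in responses) / len(responses)
```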
All the scoring results are in the folder [scores/](scores/).

## Data
The collected data can be found in the [data/](data/) folder. All samples have been anonymized.
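To inspect the samples programmatically, something along these lines works for a JSON file; the file name below is a placeholder, so substitute whichever file actually sits under data/.

```
# Minimal inspection sketch -- "data/cello.json" is a placeholder path.
import json

with open("data/cello.json", encoding="utf-8") as f:
    samples = json.load(f)

print(len(samples))   # number of anonymized samples
print(samples[0])     # fields of one sample
```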
## Citation
```
@inproceedings{he2024can,
  title={Can Large Language Models Understand Real-World Complex Instructions?},
  author={He, Qianyu and Zeng, Jie and Huang, Wenhao and Chen, Lina and Xiao, Jin and He, Qianxi and Zhou, Xunzhe and Liang, Jiaqing and Xiao, Yanghua},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={16},
  pages={18188--18196},
  year={2024}
}
```