# FoFo: A Benchmark to Evaluate LLMs' Format-Following Capability

https://github.com/SalesforceAIResearch/FoFo

This repo is for research only.

## Setup Environment
```
# create a local conda environment with Python 3.10
conda create --prefix ./envs python=3.10
conda init
conda activate ./envs

# install the inference dependencies
cd scripts
pip install -r requirements.txt
cd ..

# install the bundled alpaca_eval package and the pinned OpenAI client
cd alpaca_eval
pip install -e ".[all]"
pip install openai==0.27.0
```
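
A quick, optional sanity check that the editable install and the pinned client resolved correctly (a minimal sketch, not part of the original setup steps):
```
# optional: confirm both packages import and the client version is pinned
python -c "import alpaca_eval, openai; print('openai', openai.__version__)"
```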

## Setup OpenAI Account
```
export OPENAI_API_KEY=
export OPENAI_ORGANIZATION_IDS= # Optional; if not set, your default org ID is used.
```
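
Before launching paid GPT-4 annotation runs, it can help to confirm the key actually works; a minimal sketch against the 0.27-style client installed above (the key is read from the environment variable automatically):
```
# optional: verify the key with the legacy (0.27.x) OpenAI client
python -c "import openai; print(len(openai.Model.list()['data']), 'models visible')"
```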

## Model Evaluation
```
# shared paths used by the steps below
data_path='data/fofo_test_prompts.json'
output_path='results'
```
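
The steps below write into per-model subdirectories of `$output_path`; whether the scripts create these directories themselves is an assumption worth checking, so creating them up front is a safe precaution:
```
# hypothetical precaution: the inference script may create this on its own
mkdir -p $output_path/wizardlm-13b-v1.2
```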

### 1. Get model outputs
```
CUDA_VISIBLE_DEVICES='0' python scripts/inference_anymodel_anydata.py \
    --input_file_path $data_path \
    --output_file_path $output_path/wizardlm-13b-v1.2/model_outputs.json \
    --model_name_or_path WizardLM/WizardLM-13B-V1.2 \
    --prompt_style wizardlm \
    --max_seq_length 5120
```
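
After inference completes, a quick record count can catch truncated runs; this sketch assumes the output file is a JSON array, which may not match the script's actual layout:
```
# hypothetical sanity check; assumes model_outputs.json is a JSON array
python -c "import json; print(len(json.load(open('$output_path/wizardlm-13b-v1.2/model_outputs.json'))))"
```
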
### 2. Evaluate models' performance based on the outputs
```
alpaca_eval --annotators_config gpt4_format_correctness \
    --model_outputs $output_path/chatgpt/reference_outputs.json \
    --output_path $output_path/wizardlm-13b-v1.2/
```
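
To read the scores afterwards, the sketch below assumes alpaca_eval writes a `leaderboard.csv` under the given `--output_path`; check the filenames it actually produces:
```
# assumed filename; verify what alpaca_eval writes under --output_path
cat $output_path/wizardlm-13b-v1.2/leaderboard.csv
```
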
### 3. Domain/Format analysis
To draw radar charts of your model's performance and run a domain/format analysis, see
```
scripts/draw_analysis.py
```
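
The script's command-line interface is not documented here, so any specific flags would be an assumption; inspecting its help output first is the safe route:
```
# flags are undocumented here; start from the script's own help text
python scripts/draw_analysis.py --help
```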

## Citation
Please consider using the following citation when using our code:
```
@article{xia2024fofo,
  title={FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability},
  author={Xia, Congying and Xing, Chen and Du, Jiangshu and Yang, Xinyi and Feng, Yihao and Xu, Ran and Yin, Wenpeng and Xiong, Caiming},
  journal={arXiv preprint arXiv:2402.18667},
  year={2024}
}
```