# FoFo: A Benchmark to Evaluate LLMs' Format-Following Capability

https://github.com/SalesforceAIResearch/FoFo

This repo is for research only.

## Setup Environment
```
# create a local conda environment with Python 3.10
conda create --prefix ./envs python=3.10
conda init
conda activate ./envs

# install the inference dependencies
cd scripts
pip install -r requirements.txt
cd ..

# install the bundled alpaca_eval package and the pinned OpenAI client
cd alpaca_eval
pip install -e ".[all]"
pip install openai==0.27.0
```
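
A quick, optional sanity check that the editable install and the pinned client resolved correctly (a minimal sketch, not part of the original setup steps):
```
# optional: confirm both packages import and the client version is pinned
python -c "import alpaca_eval, openai; print('openai', openai.__version__)"
```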

## Setup OpenAI Account
```
export OPENAI_API_KEY=
export OPENAI_ORGANIZATION_IDS= # Optional; if not set, your default org ID is used.
```
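
Before launching paid GPT-4 annotation runs, it can help to confirm the key actually works; a minimal sketch against the 0.27-style client installed above (the key is read from the environment variable automatically):
```
# optional: verify the key with the legacy (0.27.x) OpenAI client
python -c "import openai; print(len(openai.Model.list()['data']), 'models visible')"
```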

## Model Evaluation
```
# shared paths used by the steps below
data_path='data/fofo_test_prompts.json'
output_path='results'
```
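
The steps below write into per-model subdirectories of `$output_path`; whether the scripts create these directories themselves is an assumption worth checking, so creating them up front is a safe precaution:
```
# hypothetical precaution: the inference script may create this on its own
mkdir -p $output_path/wizardlm-13b-v1.2
```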

### 1. Get model outputs
```
CUDA_VISIBLE_DEVICES='0' python scripts/inference_anymodel_anydata.py \
    --input_file_path $data_path \
    --output_file_path $output_path/wizardlm-13b-v1.2/model_outputs.json \
    --model_name_or_path WizardLM/WizardLM-13B-V1.2 \
    --prompt_style wizardlm \
    --max_seq_length 5120
```
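
After inference completes, a quick record count can catch truncated runs; this sketch assumes the output file is a JSON array, which may not match the script's actual layout:
```
# hypothetical sanity check; assumes model_outputs.json is a JSON array
python -c "import json; print(len(json.load(open('$output_path/wizardlm-13b-v1.2/model_outputs.json'))))"
```
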
### 2. Evaluate models' performance based on the outputs
```
alpaca_eval --annotators_config gpt4_format_correctness \
    --model_outputs $output_path/chatgpt/reference_outputs.json \
    --output_path $output_path/wizardlm-13b-v1.2/
```
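
To read the scores afterwards, the sketch below assumes alpaca_eval writes a `leaderboard.csv` under the given `--output_path`; check the filenames it actually produces:
```
# assumed filename; verify what alpaca_eval writes under --output_path
cat $output_path/wizardlm-13b-v1.2/leaderboard.csv
```
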
### 3. Domain/Format analysis
To draw radar charts of your model's performance and run a domain/format analysis, see
```
scripts/draw_analysis.py
```
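
The script's command-line interface is not documented here, so any specific flags would be an assumption; inspecting its help output first is the safe route:
```
# flags are undocumented here; start from the script's own help text
python scripts/draw_analysis.py --help
```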

## Citation
Please consider using the following citation when using our code:
```
@article{xia2024fofo,
  title={FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability},
  author={Xia, Congying and Xing, Chen and Du, Jiangshu and Yang, Xinyi and Feng, Yihao and Xu, Ran and Yin, Wenpeng and Xiong, Caiming},
  journal={arXiv preprint arXiv:2402.18667},
  year={2024}
}
```