https://github.com/open-compass/compassbench

Demo data of CompassBench

Host: GitHub
URL: https://github.com/open-compass/compassbench
Owner: open-compass
Created: 2024-07-19T12:55:02.000Z (4 months ago)
Default Branch: main
Last Pushed: 2024-08-07T12:39:22.000Z (3 months ago)
Last Synced: 2024-08-07T15:17:51.204Z (3 months ago)
Size: 608 KB
Stars: 2
Watchers: 2
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md

# CompassBench

CompassBench is the in-house benchmark behind the [CompassRank LLM Leaderboard](https://rank.opencompass.org.cn/leaderboard-llm/). This repository provides example data for the benchmark.

## 202407 V1.3

See [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more information about the benchmark.

```
v1_3_data
├── code
│   ├── compass_bench_coding_cn_val.json
│   └── compass_bench_coding_en_val.json
├── instruct
│   ├── compass_bench_instruct_cn_val.json
│   └── compass_bench_instruct_en_val.json
├── knowledge
│   └── single_choice_cn.jsonl
├── language
│   ├── compass_bench_language_cn_val.json
│   └── compass_bench_language_en_val.json
├── math
│   └── single_choice_cn.jsonl
└── reasoning
    ├── compass_bench_reasoning_cn_val.json
    └── compass_bench_reasoning_en_val.json
```
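
After downloading, a quick sanity check of the layout above can be useful. The following is a minimal sketch, not part of the repository: `summarize_data` is a hypothetical helper name, and it assumes only that `.jsonl` files follow the JSON Lines convention of one record per line. It makes no assumption about the fields inside each record.

```bash
# Hypothetical helper: list every data file under a directory, with a line
# count for .jsonl files and a byte size for .json files.
summarize_data() {
  local dir="$1" f
  find "$dir" -type f \( -name '*.json' -o -name '*.jsonl' \) | sort |
  while IFS= read -r f; do
    case "$f" in
      *.jsonl) printf '%s\t%s lines\n' "$f" "$(wc -l < "$f" | tr -d '[:space:]')" ;;
      *.json)  printf '%s\t%s bytes\n' "$f" "$(wc -c < "$f" | tr -d '[:space:]')" ;;
    esac
  done
}

# Assumes the current directory contains v1_3_data; does nothing otherwise.
if [ -d v1_3_data ]; then
  summarize_data v1_3_data
fi
```

If all six subdirectories are present, the output should list ten files, matching the tree above.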

- For subjective evaluation, please refer to [CompassBench Subjective Config](https://github.com/open-compass/opencompass/blob/main/configs/eval_compassbench_v1_3_subjective.py)
- For objective evaluation, please refer to [CompassBench Objective Config](https://github.com/open-compass/opencompass/blob/main/configs/datasets/compassbench_v1_3/compassbench_v1_3_objective_gen_068af0.py)

Performance results on the example data will be added soon.

### Evaluation

- Link the `v1_3_data` directory to `data/compassbench_v1_3` inside the opencompass directory, for example with a symbolic link.
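
The linking step can be sketched as follows. This is an illustration under assumptions, not the project's official setup: `COMPASSBENCH_DIR` and `OPENCOMPASS_DIR` are hypothetical variable names, their default paths are placeholders for wherever the two repositories are checked out, and `v1_3_data` is assumed to sit at the root of the CompassBench checkout.

```bash
# Hypothetical checkout locations; override both to match your setup.
COMPASSBENCH_DIR="${COMPASSBENCH_DIR:-$PWD/compassbench}"
OPENCOMPASS_DIR="${OPENCOMPASS_DIR:-$PWD/opencompass}"

# Create the data directory if it is missing, then create (or replace) the link.
mkdir -p "$OPENCOMPASS_DIR/data"
ln -sfn "$COMPASSBENCH_DIR/v1_3_data" "$OPENCOMPASS_DIR/data/compassbench_v1_3"

# Print where the link points, to confirm it was created.
readlink "$OPENCOMPASS_DIR/data/compassbench_v1_3"
```

A symbolic link avoids copying the data, so later updates to the CompassBench checkout are picked up without a second step.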

```bash
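# Point the Hugging Face Hub cache at a local directory
export HUGGINGFACE_HUB_CACHE=/path-to-hf_hub/
export HF_HUB_CACHE=/path-to-hf_hub/
# Run evaluate, datasets, and transformers in offline mode
export HF_EVALUATE_OFFLINE=1
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1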
# Objective Evaluation
# We use `perf_4` as the final metric
python run.py --models hf_internlm2_chat_1_8b --datasets compassbench_v1_3_objective_gen

# Subjective Evaluation
python run.py configs/eval_compassbench_v1_3_subjective.py
```