Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/open-compass/compassbench
Demo data of CompassBench
- Host: GitHub
- URL: https://github.com/open-compass/compassbench
- Owner: open-compass
- Created: 2024-07-19T12:55:02.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-08-07T12:39:22.000Z (3 months ago)
- Last Synced: 2024-08-07T15:17:51.204Z (3 months ago)
- Size: 608 KB
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
# CompassBench
CompassBench is the in-house benchmark behind the [CompassRank LLM Leaderboard](https://rank.opencompass.org.cn/leaderboard-llm/). This repository provides example data for the benchmark.
## 202407 V1.3
Please check [CompassBench](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/compassbench_intro.html) for more information about the benchmark.
```
v1_3_data
├── code
│   ├── compass_bench_coding_cn_val.json
│   └── compass_bench_coding_en_val.json
├── instruct
│   ├── compass_bench_instruct_cn_val.json
│   └── compass_bench_instruct_en_val.json
├── knowledge
│   └── single_choice_cn.jsonl
├── language
│   ├── compass_bench_language_cn_val.json
│   └── compass_bench_language_en_val.json
├── math
│   └── single_choice_cn.jsonl
└── reasoning
    ├── compass_bench_reasoning_cn_val.json
    └── compass_bench_reasoning_en_val.json
```

- For subjective evaluation, please refer to [CompassBench Subjective Config](https://github.com/open-compass/opencompass/blob/main/configs/eval_compassbench_v1_3_subjective.py)
- For objective evaluation, please refer to [CompassBench Objective Config](https://github.com/open-compass/opencompass/blob/main/configs/datasets/compassbench_v1_3/compassbench_v1_3_objective_gen_068af0.py)

Performance of the example data will be updated soon.
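To check that a local copy of `v1_3_data` matches the layout above before evaluating, the following Bash sketch may help. `EXPECTED_FILES` and `check_layout` are hypothetical helpers written for this illustration, with the file list copied by hand from the tree; they are not part of CompassBench or OpenCompass.

```bash
#!/usr/bin/env bash
# Hypothetical list of files expected under v1_3_data,
# transcribed by hand from the directory tree in this README.
EXPECTED_FILES=(
  code/compass_bench_coding_cn_val.json
  code/compass_bench_coding_en_val.json
  instruct/compass_bench_instruct_cn_val.json
  instruct/compass_bench_instruct_en_val.json
  knowledge/single_choice_cn.jsonl
  language/compass_bench_language_cn_val.json
  language/compass_bench_language_en_val.json
  math/single_choice_cn.jsonl
  reasoning/compass_bench_reasoning_cn_val.json
  reasoning/compass_bench_reasoning_en_val.json
)

# check_layout ROOT: report each expected file missing under ROOT on stderr,
# and return a non-zero status if at least one is missing.
check_layout() {
  local root="$1" missing=0 f
  for f in "${EXPECTED_FILES[@]}"; do
    if [[ ! -f "$root/$f" ]]; then
      echo "missing: $root/$f" >&2
      missing=$((missing + 1))
    fi
  done
  [[ "$missing" -eq 0 ]]
}
```

For example, after sourcing the snippet, `check_layout v1_3_data && echo ok` lists any missing files and prints `ok` only when all ten are present.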
### Evaluation
- Link the `v1_3_data` directory to `data/compassbench_v1_3` within the OpenCompass directory.
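A minimal sketch of the linking step, assuming the CompassBench and OpenCompass repositories are both cloned into the current directory. `COMPASSBENCH_DIR` and `OPENCOMPASS_DIR` are hypothetical variable names used only for this illustration; neither project reads them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical clone locations; override them to match your machine.
# Absolute paths are used so the symlink resolves from any working directory.
COMPASSBENCH_DIR="${COMPASSBENCH_DIR:-$PWD/compassbench}"
OPENCOMPASS_DIR="${OPENCOMPASS_DIR:-$PWD/opencompass}"

if [[ ! -d "$COMPASSBENCH_DIR/v1_3_data" ]]; then
  echo "warning: $COMPASSBENCH_DIR/v1_3_data not found; the link will dangle" >&2
fi

mkdir -p "$OPENCOMPASS_DIR/data"
# -s: create a symbolic link; -f: replace an existing link;
# -n: do not descend into an existing link that points to a directory.
ln -sfn "$COMPASSBENCH_DIR/v1_3_data" "$OPENCOMPASS_DIR/data/compassbench_v1_3"
```

Using a symlink rather than a copy keeps a single copy of the data, so updates pulled into the CompassBench clone are picked up by OpenCompass automatically.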
```bash
# Use a local Hugging Face cache and keep transformers, datasets and evaluate offline
export HUGGINGFACE_HUB_CACHE=/path-to-hf_hub/
export HF_HUB_CACHE=/path-to-hf_hub/
export HF_EVALUATE_OFFLINE=1
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
# Objective Evaluation
# We use `perf_4` as the final metric
python run.py --models hf_internlm2_chat_1_8b --datasets compassbench_v1_3_objective_gen

# Subjective Evaluation
python run.py configs/eval_compassbench_v1_3_subjective.py
```