Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ruixiangcui/AGIEval
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/ruixiangcui/AGIEval
- Owner: ruixiangcui
- License: mit
- Created: 2023-03-26T10:37:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-13T14:20:51.000Z (5 months ago)
- Last Synced: 2024-11-01T00:04:22.055Z (8 days ago)
- Language: Python
- Homepage:
- Size: 6.87 MB
- Stars: 704
- Watchers: 9
- Forks: 47
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
- Awesome-Domain-LLM - AGIEval
README
# AGIEval
This repository contains information about AGIEval, data, code and output of baseline systems for the benchmark.
# Introduction
AGIEval is a human-centric benchmark specifically designed to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving.
This benchmark is derived from 20 official, public, and high-standard admission and qualification exams intended for general human test-takers, such as general college admission tests (e.g., Chinese College Entrance Exam (Gaokao) and American SAT), law school admission tests, math competitions, lawyer qualification tests, and national civil service exams.
For a full description of the benchmark, please refer to our paper: [AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models](https://arxiv.org/pdf/2304.06364.pdf).
# Tasks and Data
[We have updated the dataset to version 1.1.](data/v1_1) The new version updates the Chinese Gaokao (chemistry, biology, physics) datasets with questions from 2023 and addresses annotation issues. To simplify evaluation, every multiple-choice question (MCQ) task now has exactly one answer (Gaokao-Physics and JEC-QA previously had multi-label answers). The AGIEval-en datasets remain the same as in version 1.0.
AGIEval v1.1 contains 20 tasks: 18 MCQ tasks and two cloze tasks (Gaokao-Math-Cloze and MATH). The full list of tasks and the new version's statistics are shown in the table below.
![The datasets used in AGIEval](AGIEval_tasks.png)
You can download all post-processed data from the [data/v1_1](data/v1_1) folder. All usage of the data should follow the licenses of the original datasets.
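As a rough illustration of how downloaded records might be consumed, here is a minimal sketch. It assumes each data file stores one JSON object per line (JSON Lines), which this README does not state. The `passage` and `question` fields come from the record format shown in this section; the `options` key, the helper names `parse_jsonl` and `build_prompt`, and the sample record are hypothetical.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line (assumes JSON Lines input)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def build_prompt(record: dict) -> str:
    """Join the passage (may be null), the question, and any options."""
    # "passage" and "question" appear in the documented record format;
    # "options" is an assumed key name for the answer choices.
    parts = [record.get("passage"), record.get("question")]
    parts.extend(record.get("options") or [])
    return "\n".join(part for part in parts if part)


# Made-up record for illustration only; it is not taken from the dataset.
sample_lines = [
    '{"passage": null, "question": "1 + 1 = ?", "options": ["(A)1", "(B)2"]}',
    "",
]
records = list(parse_jsonl(sample_lines))
print(build_prompt(records[0]))
```

To read a real file, pass an open file handle instead of `sample_lines`, for example `parse_jsonl(open(path, encoding="utf-8"))`, where `path` points to a file you downloaded from the data folder.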
The data format for all datasets is as follows:
```
{
"passage": null,
"question": "设集合 $A=\\{x \\mid x \\geq 1\\}, B=\\{x \\mid-1-1\\}$",
"(B)$\\{x \\mid x \\geq 1\\}$",
"(C)$\\{x \\mid-1