https://github.com/ruixiangcui/AGIEval

Host: GitHub
URL: https://github.com/ruixiangcui/AGIEval
Owner: ruixiangcui
License: mit
Created: 2023-03-26T10:37:46.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-06-13T14:20:51.000Z (11 months ago)
Last Synced: 2025-03-29T15:34:32.407Z (about 1 month ago)
Language: Python
Homepage:
Size: 6.87 MB
Stars: 743
Watchers: 8
Forks: 50
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md

Awesome Lists containing this project

Awesome-Domain-LLM - AGIEval

README

# AGIEval
This repository contains information about AGIEval, along with the data, code, and baseline system outputs for the benchmark.

# Introduction
AGIEval is a human-centric benchmark specifically designed to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving.
This benchmark is derived from 20 official, public, and high-standard admission and qualification exams intended for general human test-takers, such as general college admission tests (e.g., Chinese College Entrance Exam (Gaokao) and American SAT), law school admission tests, math competitions, lawyer qualification tests, and national civil service exams.
For a full description of the benchmark, please refer to our paper: [AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models](https://arxiv.org/pdf/2304.06364.pdf).

# Tasks and Data
[We have updated the dataset to version 1.1.](data/v1_1) The new version updates the Chinese Gaokao (chemistry, biology, physics) datasets with questions from 2023 and addresses annotation issues. To facilitate evaluation, all multi-choice question (MCQ) tasks now have exactly one answer (Gaokao-Physics and JEC-QA used to have multi-label answers). The AGIEval-en datasets remain the same as in Version 1.0. The new version's statistics are as follows:

AGIEval v1.1 contains 20 tasks, including 18 MCQ tasks and two cloze tasks (Gaokao-Math-Cloze and MATH). You can find the full list of tasks in the table below.
![The datasets used in AGIEval](AGIEval_tasks.png)

You can download all post-processed data in the [data/v1_1](data/v1_1) folder. All usage of the data should follow the license of the original datasets.
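
As a rough illustration of working with the downloaded files, here is a minimal sketch for reading one task file and assembling a prompt from a record. It assumes each file is in JSON Lines format (one JSON object per line) and that each record carries `passage` and `question` fields plus a list of answer choices under a key named `options`; the key name `options`, the helper names `iter_records` and `build_prompt`, and the file name in the usage note are assumptions made for this example, not something this README confirms.

```python
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file (assumed format)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def build_prompt(record: dict) -> str:
    """Join the passage (if present), the question, and any options into one string."""
    parts = []
    if record.get("passage"):  # "passage" may be null for stand-alone questions
        parts.append(record["passage"])
    parts.append(record["question"])
    # "options" is an assumed key name; cloze tasks may have no options at all
    parts.extend(record.get("options") or [])
    return "\n".join(parts)
```

For example, `for record in iter_records("data/v1_1/some-task.jsonl"): print(build_prompt(record))` would print one prompt per question, where `some-task.jsonl` is a placeholder file name.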

The data format for all datasets is as follows:
```
{
"passage": null,
"question": "设集合 $A=\\{x \\mid x \\geq 1\\}, B=\\{x \\mid-1-1\\}$",
"(B)$\\{x \\mid x \\geq 1\\}$",
"(C)$\\{x \\mid-1
```
