https://github.com/ruixiangcui/AGIEval?tab=readme-ov-file

# AGIEval
This repository contains information about AGIEval, along with the data, code, and outputs of baseline systems for the benchmark.

# Introduction
AGIEval is a human-centric benchmark specifically designed to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving.
This benchmark is derived from 20 official, public, and high-standard admission and qualification exams intended for general human test-takers, such as general college admission tests (e.g., the Chinese College Entrance Exam (Gaokao) and the American SAT), law school admission tests, math competitions, lawyer qualification tests, and national civil service exams.
For a full description of the benchmark, please refer to our paper: [AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models](https://arxiv.org/pdf/2304.06364.pdf).

# Tasks and Data
[We have updated the dataset to version 1.1.](data/v1_1) The new version updates the Chinese Gaokao (chemistry, biology, physics) datasets with questions from 2023 and fixes annotation issues. To simplify evaluation, every multiple-choice question (MCQ) task now has exactly one answer (Gaokao-Physics and JEC-QA previously had multi-label answers). The AGIEval-en datasets are unchanged from version 1.0. The statistics of the new version are as follows:

AGIEval v1.1 contains 20 tasks: 18 MCQ tasks and two cloze tasks (Gaokao-Math-Cloze and MATH). The full list of tasks is shown in the table below.
![The datasets used in AGIEval](AGIEval_tasks.png)
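Because every MCQ task in v1.1 has exactly one correct option, MCQ accuracy reduces to exact matching of a single predicted option letter against the gold letter. The sketch below assumes predictions and gold answers have already been extracted as option letters (e.g. `"D"`) for MCQ tasks and as final-answer strings for the two cloze tasks. The function names `normalize` and `score` are hypothetical, the task-name strings follow the display names above (adapt them to whatever names your pipeline uses), and the plain string comparison for cloze tasks is a simplification, not the repository's own answer-matching logic.

```python
from typing import Sequence

# Cloze tasks named in the task list above; every other AGIEval v1.1 task is MCQ.
CLOZE_TASKS = {"Gaokao-Math-Cloze", "MATH"}


def normalize(answer: str) -> str:
    """Hypothetical normalizer: trim whitespace, drop surrounding parentheses, upper-case."""
    return answer.strip().strip("()").strip().upper()


def score(task: str, predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return accuracy for one task (a simplified sketch, not the official scorer)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    if task in CLOZE_TASKS:
        # Simplification: exact string match after trimming whitespace.
        hits = sum(p.strip() == g.strip() for p, g in zip(predictions, gold))
    else:
        # Single-answer MCQ: compare exactly one option letter per question.
        hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return hits / len(gold)
```

For example, `score("Gaokao-Physics", ["D", "(b)"], ["D", "C"])` returns `0.5`: `"(b)"` normalizes to `"B"`, which does not match `"C"`. Under the old multi-label format this single-letter comparison would not have been sufficient, which is why the single-answer change matters for scoring.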

You can download all post-processed data in the [data/v1_1](data/v1_1) folder. Any use of the data should comply with the licenses of the original datasets.
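A minimal loading sketch, assuming each task file in `data/v1_1` is UTF-8 JSON Lines (one JSON object per line) with fields such as `passage` and `question`. The helper names are hypothetical, and `options` is an assumed field name for the MCQ choices; cloze records may not have it. Reading as UTF-8 matters because the Gaokao questions are in Chinese.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def parse_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSON Lines text into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_task(path: str) -> List[Dict[str, Any]]:
    """Load one task file, assumed to be UTF-8 JSON Lines."""
    return parse_jsonl(Path(path).read_text(encoding="utf-8"))


def build_prompt(record: Dict[str, Any]) -> str:
    """Hypothetical prompt builder: passage (if any), then question, then options (if any)."""
    parts = []
    # "passage" can be JSON null, which json.loads turns into None, so it is skipped.
    if record.get("passage"):
        parts.append(record["passage"])
    parts.append(record["question"])
    # "options" is an assumed field name; missing or null options add nothing.
    parts.extend(record.get("options") or [])
    return "\n".join(parts)
```

Typical use would be `records = load_task("data/v1_1/<task>.jsonl")` followed by `build_prompt(r)` for each record, where `<task>` stands for the actual file name in the folder.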

The data format for all datasets is as follows:
```
{
"passage": null,
"question": "设集合 $A=\\{x \\mid x \\geq 1\\}, B=\\{x \\mid-1-1\\}$",
"(B)$\\{x \\mid x \\geq 1\\}$",
"(C)$\\{x \\mid-1