Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ruixiangcui/AGIEval?tab=readme-ov-file
Last synced: 13 days ago
- Host: GitHub
- URL: https://github.com/ruixiangcui/AGIEval?tab=readme-ov-file
- Owner: ruixiangcui
- License: mit
- Created: 2023-03-26T10:37:46.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-13T14:20:51.000Z (5 months ago)
- Last Synced: 2024-08-02T06:19:35.876Z (3 months ago)
- Language: Python
- Homepage:
- Size: 6.87 MB
- Stars: 674
- Watchers: 9
- Forks: 45
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project
- awesome-foundation-model-leaderboards - AGIEval - A human-centric benchmark to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving. | (Model Ranking / Text)
README
# AGIEval
This repository contains the data, code, and baseline-system outputs for the AGIEval benchmark.

# Introduction
AGIEval is a human-centric benchmark specifically designed to evaluate the general abilities of foundation models in tasks pertinent to human cognition and problem-solving.
This benchmark is derived from 20 official, public, and high-standard admission and qualification exams intended for general human test-takers, such as general college admission tests (e.g., Chinese College Entrance Exam (Gaokao) and American SAT), law school admission tests, math competitions, lawyer qualification tests, and national civil service exams.
For a full description of the benchmark, please refer to our paper: [AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models](https://arxiv.org/pdf/2304.06364.pdf).

# Tasks and Data
[We have updated the dataset to version 1.1.](data/v1_1) The new version updates the Chinese Gaokao (chemistry, biology, physics) datasets with questions from 2023 and fixes annotation issues. To simplify evaluation, all multiple-choice question (MCQ) tasks now have exactly one correct answer (Gaokao-Physics and JEC-QA previously had multi-label answers). The AGIEval-en datasets are unchanged from version 1.0. The new version's statistics are as follows:

AGIEval v1.1 contains 20 tasks: 18 MCQ tasks and two cloze tasks (Gaokao-Math-Cloze and MATH). You can find the full list of tasks in the table below.
![The datasets used in AGIEval](AGIEval_tasks.png)

You can download all post-processed data from the [data/v1_1](data/v1_1) folder. All use of the data must follow the licenses of the original datasets.
The data format for all datasets is as follows:
```
{
    "passage": null,
    "question": "设集合 $A=\\{x \\mid x \\geq 1\\}, B=\\{x \\mid-1 ...",
    "options": [
        "(A) ...",
        "(B)$\\{x \\mid x \\geq 1\\}$",
        "(C)$\\{x \\mid-1
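
Assuming the post-processed files store one such JSON object per line (JSON Lines), a minimal loader might look like the sketch below; the file layout and helper name are assumptions, not part of the official codebase:

```python
import json

def load_agieval_jsonl(path):
    """Load one AGIEval task file, assuming JSON Lines format
    (one example object per line, as in the schema above)."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples
```

Each returned dict then exposes the fields shown above (`passage`, `question`, `options`, ...), which can be fed directly into a prompt template for evaluation.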