https://github.com/TIGER-AI-Lab/MMLU-Pro

The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]

Host: GitHub
URL: https://github.com/TIGER-AI-Lab/MMLU-Pro
Owner: TIGER-AI-Lab
License: apache-2.0
Created: 2024-05-16T23:59:51.000Z
Default Branch: main
Last Pushed: 2025-02-28T17:39:35.000Z
Last Synced: 2025-02-28T21:57:39.707Z
Topics: evaluation, llm
Language: Python
Homepage:
Size: 325 MB
Stars: 193
Watchers: 6
Forks: 33
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-golang-ai - MMLU-Pro - A more robust and challenging multi-task language understanding benchmark. (Benchmark / English)
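awesome-llm-eval - MMLU-Pro - An improved version of the MMLU dataset. MMLU has long been the reference multiple-choice knowledge dataset. However, recent research has shown that it is both noisy (some questions are unanswerable) and too easy (owing to advances in model capability and increasing contamination). MMLU-Pro gives models ten answer choices instead of four, requires reasoning on more questions, and has been reviewed by experts to reduce noise. It is higher in quality and harder than the original. MMLU-Pro also reduces the effect of prompt variation on model performance, a common problem with its predecessor MMLU. Research found that models using "Chain of Thought" reasoning perform better on this new benchmark, which suggests that MMLU-Pro is better suited to evaluating the subtle reasoning abilities of AI. (2024-05-20) | (Datasets-or-Benchmark / General)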
