Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
- Host: GitHub
- URL: https://github.com/TIGER-AI-Lab/MMLU-Pro
- Owner: TIGER-AI-Lab
- License: apache-2.0
- Created: 2024-05-16T23:59:51.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-10-24T03:56:29.000Z (3 months ago)
- Last Synced: 2024-10-25T06:42:07.270Z (3 months ago)
- Topics: evaluation, llm
- Language: Python
- Homepage:
- Size: 317 MB
- Stars: 117
- Watchers: 2
- Forks: 21
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
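- awesome-llm-eval - MMLU-Pro - MMLU-Pro is an improved version of the MMLU dataset. MMLU has long been the reference multiple-choice knowledge dataset. However, recent studies show that it is both noisy (some questions are unanswerable) and too easy (as model capabilities have advanced and contamination has increased). MMLU-Pro gives models ten choices instead of four, requires reasoning on more questions, and has been expert-reviewed to reduce noise. It is higher quality and harder than the original. MMLU-Pro also reduces the effect of prompt variation on model performance, a common problem with its predecessor MMLU. The research found that models using "Chain of Thought" reasoning perform better on this new benchmark, which suggests that MMLU-Pro is better suited to evaluating the subtle reasoning abilities of AI. (2024-05-20) | (Datasets-or-Benchmark / General)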
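
To make the "ten choices instead of four" point concrete, here is a minimal Python sketch of how widening the option set lowers the random-guess baseline. The helper names `format_question` and `chance_accuracy` are hypothetical and chosen for illustration; this is not the repository's evaluation code, and the sample question is made up.

```python
import string


def format_question(question, options):
    """Render a multiple-choice question with lettered options (A, B, C, ...).

    Hypothetical helper for illustration; not taken from the MMLU-Pro repo.
    """
    letters = string.ascii_uppercase[: len(options)]
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip(letters, options)]
    return "\n".join(lines)


def chance_accuracy(num_options):
    """Expected accuracy of uniform random guessing over num_options choices."""
    return 1.0 / num_options


if __name__ == "__main__":
    # Made-up example question with ten options (labeled A through J).
    sample = format_question(
        "Which value is the smallest prime number?",
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
    )
    print(sample)
    print("4-option chance accuracy:", chance_accuracy(4))
    print("10-option chance accuracy:", chance_accuracy(10))
```

With four options a uniformly random guesser is right a quarter of the time; with ten options that drops to one in ten, so lucky guesses contribute less to a model's score.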