https://github.com/TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark" [NeurIPS 2024]
- Host: GitHub
- URL: https://github.com/TIGER-AI-Lab/MMLU-Pro
- Owner: TIGER-AI-Lab
- License: apache-2.0
- Created: 2024-05-16T23:59:51.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-28T17:39:35.000Z (about 1 year ago)
- Last Synced: 2025-02-28T21:57:39.707Z (about 1 year ago)
- Topics: evaluation, llm
- Language: Python
- Homepage:
- Size: 325 MB
- Stars: 193
- Watchers: 6
- Forks: 33
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-llm-eval - link
- Awesome-AI-Evaluation-Guide - MMLU-Pro - Enhanced version with 10 choices per question, emphasizing reasoning over memorization (Benchmarks & Datasets / General Language Understanding)
- awesome-ai-eval - **MMLU-Pro** - Harder 10-choice extension focused on reasoning-rich, low-leakage questions. (Benchmarks / General)
- Awesome-Prompt-Engineering - GitHub