Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.

Awesome-LLMs-Datasets
Summarizes existing representative LLM text datasets.
https://github.com/lmmlzn/Awesome-LLMs-Datasets
Last synced: 6 days ago
Pre-training Corpora

General Pre-training Corpora
- **MADLAD-400: [Dataset](https://huggingface.co/datasets/allenai/MADLAD-400)**
- **SlimPajama: [Website](https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama)**
- **RedPajama-V2: [Website](https://together.ai/blog/redpajama-data-v2)**
- **GPT-2 Output Dataset: [Dataset](https://github.com/openai/gpt-2-output-dataset)**
- **The Pile: [Dataset](https://pile.eleuther.ai/)**
- **OSCAR: [Website](https://oscar-project.org/)**
- **BookCorpusOpen: [Dataset](https://huggingface.co/datasets/bookcorpusopen)**
- **PG-19: [Dataset](https://huggingface.co/datasets/pg19)**
Domain-specific Pre-training Corpora
- **OpenWebMath: [Paper](https://arxiv.org/abs/2310.06786)**
- **FinCorpus (XuanYuan): [Dataset](https://huggingface.co/datasets/Duxiaoman-DI/FinCorpus)**
- **Proof-Pile-2 (Llemma): [Dataset](https://huggingface.co/datasets/EleutherAI/proof-pile-2) | [Website](https://blog.eleuther.ai/llemma/)**
- **BBT-FinCUGE: [Website](https://bbt.ssymmetry.com/index.html)**
Changelog
- NAH (Needle-in-a-Haystack)
- **IEPile** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | CI); **[InstructIE](https://arxiv.org/abs/2305.11527)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG).
- **SlimPajama** (Pre-training Corpora | General Pre-training Corpora | Multi-category); **[MassiveText](https://arxiv.org/abs/2112.11446)** (Pre-training Corpora | General Pre-training Corpora | Multi-category); **[MADLAD-400](https://arxiv.org/abs/2309.04662)** (Pre-training Corpora | General Pre-training Corpora | Webpages); **[Minerva](https://arxiv.org/abs/2206.14858)** (Pre-training Corpora | General Pre-training Corpora | Multi-category); **[CCAligned](https://aclanthology.org/2020.emnlp-main.480/)** (Pre-training Corpora | General Pre-training Corpora | Parallel Corpus); **[WikiMatrix](https://aclanthology.org/2021.eacl-main.115/)** (Pre-training Corpora | General Pre-training Corpora | Parallel Corpus); **[OpenWebMath](https://arxiv.org/abs/2310.06786)** (Pre-training Corpora | Domain-specific Pre-training Corpora | Math).
- ALCE
- **CLUE Benchmark Series** (Evaluation Datasets | Evaluation Platform); **[C-MTEB Leaderboard](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)** (Evaluation Datasets | Evaluation Platform).
- **MathPile** (Pre-training Corpora | Domain-specific Pre-training Corpora | Math); **[WanJuan-CC](https://arxiv.org/abs/2402.19282)** (Pre-training Corpora | General Pre-training Corpora | Webpages).
- WebQuestions
- **CRUD-RAG** (RAG Datasets); **[RAG-Instruct-Benchmark-Tester](https://huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)** (RAG Datasets); **[ARES](https://arxiv.org/abs/2311.09476)** (RAG Datasets).
- **MMRS-1M** (MLLMs Datasets | Instruction Fine-tuning Datasets); **[VideoChat2-IT](https://arxiv.org/abs/2311.17005)** (MLLMs Datasets | Instruction Fine-tuning Datasets); **[InstructDoc](https://arxiv.org/abs/2401.13313)** (MLLMs Datasets | Instruction Fine-tuning Datasets); **[ALLaVA-4V Data](https://arxiv.org/abs/2402.11684)** (MLLMs Datasets | Instruction Fine-tuning Datasets); **[MVBench](https://arxiv.org/abs/2311.17005)** (MLLMs Datasets | Evaluation Datasets); **[OlympiadBench](https://arxiv.org/abs/2402.14008)** (MLLMs Datasets | Evaluation Datasets); **[MMMU](https://arxiv.org/abs/2311.16502)** (MLLMs Datasets | Evaluation Datasets).
- **GameBench** (Evaluation Datasets); **[SarcasmBench](https://arxiv.org/abs/2408.11319)** (Evaluation Datasets | NLU); **[C<sup>3</sup> Bench](https://arxiv.org/abs/2405.17732)** (Evaluation Datasets | Subject); **[TableBench](https://www.arxiv.org/abs/2408.09174)** (Evaluation Datasets | Reasoning); **[ArabLegalEval](https://www.arxiv.org/abs/2408.07983)** (Evaluation Datasets | Law).
- **MultiTrust** (MLLMs Datasets | Evaluation Datasets); **[MultiMed](https://www.arxiv.org/abs/2408.12682)** (MLLMs Datasets | Evaluation Datasets).
- **Lithuanian-QA-v1** (Instruction Fine-tuning Datasets | CI & MC); **[REInstruct](https://www.arxiv.org/abs/2408.10663)** (Instruction Fine-tuning Datasets | HG & CI & MC); **[KoLLM-Converations](https://huggingface.co/datasets/davidkim205/kollm-converations)** (Instruction Fine-tuning Datasets | CI).
- **OpenMathInstruct-1** (Instruction Fine-tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Math); **[FinBen](https://arxiv.org/abs/2402.12659)** (Evaluation Datasets | Financial).
- **Dolma** (Pre-training Corpora | General Pre-training Corpora | Multi-category).
- **LongWriter-6K** (Instruction Fine-tuning Datasets | CI & MC).
- MedTrinity-25M
- **DebateQA** (Evaluation Datasets); **[PersianMMLU](https://arxiv.org/abs/2404.06644)** (Evaluation Datasets | Subject); **[TMMLU+](https://arxiv.org/abs/2403.01858)** (Evaluation Datasets | Subject).
- **Expository-Prose-V1** (Pre-training Corpora | General Pre-training Corpora | Multi-category).
- **RAGEval** (RAG Datasets); **[MultiHop-RAG](https://arxiv.org/abs/2401.15391)** (RAG Datasets).
- **Aya Collection** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG & CI & MC); **[Aya Dataset](https://arxiv.org/abs/2402.06619)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG).
- **AlphaFin** (Instruction Fine-tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Other); **[COIG-CQIA](https://arxiv.org/abs/2403.18058)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG & CI).
- **MAP-CC** (Pre-training Corpora | General Pre-training Corpora | Multi-category); **[FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)** (Pre-training Corpora | General Pre-training Corpora | Webpages); **[CCI 2.0](https://huggingface.co/datasets/BAAI/CCI2-Data)** (Pre-training Corpora | General Pre-training Corpora | Webpages).
- **CLUE** (Evaluation Datasets); **[CIF-Bench](https://arxiv.org/abs/2402.13109)** (Evaluation Datasets | General); **[ACLUE](https://aclanthology.org/2023.alp-1.9/)** (Evaluation Datasets | Subject); **[LeSC](https://arxiv.org/abs/2405.05741)** (Evaluation Datasets | NLU); **[AlignBench](https://arxiv.org/abs/2311.18743)** (Evaluation Datasets | Multitask); **[SciKnowEval](https://arxiv.org/abs/2406.09098)** (Evaluation Datasets | Subject).
- **WildChat** (Instruction Fine-tuning Datasets | MC).
- **OpenHermesPreferences** (Preference Datasets); **huozi_rlhf_data** (Preference Datasets | Vote); **[HelpSteer](https://arxiv.org/abs/2311.09528)** (Preference Datasets | Score); **[HelpSteer2](https://arxiv.org/abs/2406.08673)** (Preference Datasets | Score).
- **MMT-Bench** (MLLMs Datasets | Evaluation Datasets); **[MM-NIAH](https://arxiv.org/abs/2406.07230)** (MLLMs Datasets | Evaluation Datasets).
- CRAG
- **GPQA** (Evaluation Datasets); **[HaluEval-Wild](https://arxiv.org/abs/2403.04307)** (Evaluation Datasets | Factuality); **[CMATH](https://arxiv.org/abs/2306.16636)** (Evaluation Datasets | Subject); **[FineMath](https://arxiv.org/abs/2403.07747)** (Evaluation Datasets | Subject); **[RealTime QA](https://arxiv.org/abs/2207.13332)** (Evaluation Datasets | Factuality); **[WYWEB](https://aclanthology.org/2023.findings-acl.204/)** (Evaluation Datasets | Subject); **[ChineseFactEval](https://gair-nlp.github.io/ChineseFactEval/)** (Evaluation Datasets | Factuality); **[Counting-Stars](https://arxiv.org/abs/2403.11802)** (Evaluation Datasets | Long Text).
Instruction Fine-tuning Datasets

General Instruction Fine-tuning Datasets
- **CAMEL: [Dataset](https://huggingface.co/camel-ai) | [Website](https://www.camel-ai.org/)**
- **Baize: [Dataset](https://github.com/project-baize/baize-chatbot/tree/main/data)**
- **Flan-mini: [Dataset](https://huggingface.co/datasets/declare-lab/flan-mini)**
- **Flan v2: [Dataset](https://huggingface.co/datasets/SirNeural/flan_v2)**
- **Natural Instructions: [Dataset](https://instructions.apps.allenai.org/)**
- **Open-Platypus: [Website](https://platypus-llm.github.io/)**
- **Bactrian-X: [Dataset](https://huggingface.co/datasets/MBZUAI/Bactrian-X)**
- **GPT4All: [Dataset](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/GPT4all)**
- **LaMini-instruction: [Dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction)**
- **HC3: [Dataset1](https://huggingface.co/datasets/Hello-SimpleAI/HC3) | [Dataset2](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese)**
Domain-specific Instruction Fine-tuning Datasets
- **DISC-Med-SFT: [Dataset](https://huggingface.co/datasets/Flmc/DISC-Med-SFT) | [Website](https://med.fudan-disc.com)**
- **DISC-LawLLM: [Website](https://law.fudan-disc.com)**
- **Lawyer LLaMA: [Dataset](https://github.com/AndrewZhe/lawyer-llama/tree/main/data)**
- **MWP-Instruct: [Dataset](https://huggingface.co/datasets/Macropodus/MWP-Instruct)**
- **ChatDoctor: [Dataset](https://github.com/Kent0n-Li/ChatDoctor)**
- **EduChat: [Dataset](https://huggingface.co/datasets/ecnu-icalk/educhat-sft-002-data-osm)**
- **DISC-FinLLM: [Website](https://fin.fudan-disc.com)**
- **Mol-Instructions: [Dataset](https://huggingface.co/datasets/zjunlp/Mol-Instructions)**
Preference Datasets

Preference Evaluation Methods
- **PKU-SafeRLHF: [Dataset](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF)**
- **MT-Bench: [Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments) | [Website](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)**
- **CValues: [Dataset](https://www.modelscope.cn/datasets/damo/CValues-Comparison/summary)**
- **HH-RLHF: [Dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf)**
- **HelpSteer2: [Dataset](https://huggingface.co/datasets/nvidia/HelpSteer2)**
Evaluation Datasets

General
- **Chatbot Arena: [Website](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)**
- **Vicuna Evaluation: [Website](https://lmsys.org/blog/2023-03-30-vicuna/)**
- **AlpacaEval: [Dataset](https://huggingface.co/datasets/tatsu-lab/alpaca_eval) | [Website](https://tatsu-lab.github.io/alpaca_eval/)**
- **BayLing-80: [Dataset](https://github.com/ictnlp/BayLing/tree/main/data/BayLing-80)**
- **SysBench: [Dataset](https://github.com/PKU-Baichuan-MLSystemLab/SysBench)**
- **CIF-Bench: [Website](https://yizhilll.github.io/CIF-Bench/)**
Subject
- **M3KE: [Dataset](https://huggingface.co/datasets/TJUNLP/M3KE)**
- **C-Eval: [Dataset](https://huggingface.co/datasets/ceval/ceval-exam) | [Website](https://cevalbenchmark.com/)**
- **CMMLU: [Dataset](https://huggingface.co/datasets/haonan-li/cmmlu)**
- **CG-Eval: [Dataset](https://huggingface.co/datasets/Besteasy/CG-Eval) | [Website](http://cgeval.besteasy.com/)**
- **ArabicMMLU: [Dataset](https://huggingface.co/datasets/MBZUAI/ArabicMMLU)**
- **ACLUE: [Dataset](https://huggingface.co/datasets/tyouisen/aclue)**
- **SciKnowEval: [Dataset](https://huggingface.co/datasets/hicai-zju/SciKnowEval)**
Long Text
- **ZeroSCROLLS: [Dataset](https://huggingface.co/datasets/tau/zero_scrolls) | [Website](https://www.zero.scrolls-benchmark.com/)**
- **Counting-Stars: [Dataset](https://github.com/nick7nlp/counting-stars)**
- **LooGLE: [Dataset](https://huggingface.co/datasets/bigainlco/LooGLE)**
Evaluation Platform

Medical <a id="medical03"></a>

Social Norms
- **SafetyBench: [Dataset](https://huggingface.co/datasets/thu-coai/SafetyBench) | [Website](https://llmbench.ai/safety)**
- **Safety-Prompts: [Dataset](https://github.com/thu-coai/Safety-Prompts) | [Website](http://115.182.62.166:18000/)**
Factuality

Multitask <a id="multitask01"></a>
- **CLEVA: [Website](http://www.lavicleva.com/#/homepage/overview)**
- **HELM: [Website](https://crfm.stanford.edu/helm/latest/)**
Tool
- **ToolEyes: [Dataset](https://github.com/Junjie-Ye/ToolEyes)**
- **APIBench (Gorilla): [Website](https://gorilla.cs.berkeley.edu/)**
Multilingual

Reasoning

Knowledge

Code <a id="code03"></a>
- **HumanEvalPack: [Dataset](https://huggingface.co/datasets/bigcode/humanevalpack)**
- **DS-1000: [Dataset](https://github.com/xlang-ai/DS-1000/tree/main/ds1000_example) | [Website](https://ds1000-code-gen.github.io/)**
- **BIRD: [Dataset](https://bird-bench.github.io/) | [Website](https://bird-bench.github.io/)**
OOD

NLU

Law
- **LawBench: [Dataset](https://github.com/open-compass/LawBench/tree/main/data)**
Other <a id="other04"></a>
- **MINT: [Dataset](https://github.com/xingyaoww/mint-bench/blob/main/docs/DATA.md) | [Website](https://xingyaoww.github.io/mint-bench/)**
Exam

Financial <a id="financial02"></a>
- **BBT-FinCUGE: [Website](https://bbt.ssymmetry.com/index.html)**
- **FinEval: [Dataset](https://huggingface.co/datasets/SUFE-AIFLM-Lab/FinEval) | [Website](https://fineval.readthedocs.io/en/latest/index.html)**
Agent

Evaluation
- **FairEval: [Dataset](https://github.com/i-Eval/FairEval)**
- **WideDeep: [Dataset](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/WideDeep)**
Retrieval Augmented Generation (RAG) Datasets <a id="retrieval-augmented-generation-rag-datasets"></a>

Evaluation Datasets <a id="evaluation02"></a>
- **RAGAS** (RAGAS: Automated Evaluation of Retrieval Augmented Generation): [Github](https://github.com/explodinggradients/ragas) | [Dataset](https://huggingface.co/datasets/explodinggradients/WikiEval)
- **RGB** (Benchmarking Large Language Models in Retrieval-Augmented Generation): [Github](https://github.com/chen700564/RGB)
- **CRUD-RAG: [Github](https://github.com/IAAR-Shanghai/CRUD_RAG)**
- **ALCE: [Github](https://github.com/princeton-nlp/ALCE) | [Dataset](https://huggingface.co/datasets/princeton-nlp/ALCE-data)**
- **RAG-Instruct-Benchmark-Tester: [Dataset](https://huggingface.co/datasets/llmware/rag_instruct_benchmark_tester) | [Website](https://medium.com/@darrenoberst/how-accurate-is-rag-8f0706281fd9)**
- **ARES** (ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems): [Github](https://github.com/stanford-futuredata/ARES)
- **RAG-QA Arena** (RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering): [Github](https://github.com/awslabs/rag-qa-arena)
- **RAGEval: [Github](https://github.com/OpenBMB/RAGEval)**
- **MultiHop-RAG** (MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries): [Github](https://github.com/yixuantt/MultiHop-RAG/) | [Dataset](https://huggingface.co/datasets/yixuantt/MultiHopRAG)
- **CRAG: [Website](https://www.aicrowd.com/challenges/meta-comprehensive-rag-benchmark-kdd-cup-2024)**
Traditional NLP Datasets

Question Answering
- **CommonsenseQA: [Website](https://www.tau-nlp.sites.tau.ac.il/commonsenseqa)**
- **Natural Questions: [Dataset](https://huggingface.co/datasets/natural_questions)**
- **TyDi QA: [Dataset](https://huggingface.co/datasets/tydiqa)**
- **ChID: [Dataset](https://huggingface.co/datasets/thu-coai/chid/tree/main/original)**
- **MS MARCO: [Dataset](https://huggingface.co/datasets/ms_marco)**
- **PIQA: [Dataset](https://huggingface.co/datasets/piqa)**
- **JEC-QA: [Dataset](https://jecqa.thunlp.org/) | [Website](https://jecqa.thunlp.org/)**
- **HEAD-QA: [Dataset](https://huggingface.co/datasets/head_qa) | [Website](https://aghie.github.io/head-qa/)**
- **PROST: [Dataset](https://huggingface.co/datasets/corypaik/prost)**
Recognizing Textual Entailment
- **RTE: [Paper3](https://dl.acm.org/doi/pdf/10.5555/1654536.1654538) | [Paper4](https://tac.nist.gov/publications/2009/additional.papers/RTE5_overview.proceedings.pdf) | [Dataset](https://huggingface.co/datasets/glue/viewer/rte/train)**
Math <a id="math02"></a>
- **GSM8K: [Dataset](https://github.com/openai/grade-school-math)**
- **ASDiv: [Dataset](https://huggingface.co/datasets/EleutherAI/asdiv)**
- **AQuA: [Dataset](https://huggingface.co/datasets/aqua_rat)**
Text Classification

Coreference Resolution

Sentiment Analysis

Semantic Matching
- **PAWS: [Dataset](https://huggingface.co/datasets/paws)**
- **STS-B: [Dataset](https://huggingface.co/datasets/stsb_multi_mt) | [Website](https://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark)**
Text Generation
- **CommonGen: [Dataset](https://huggingface.co/datasets/common_gen)**
- **DART: [Dataset](https://huggingface.co/datasets/dart)**
- **E2E: [Dataset](https://huggingface.co/datasets/e2e_nlg?row=0)**
- **WebNLG: [Dataset](https://huggingface.co/datasets/web_nlg)**
Text Translation

Text Summarization

Named Entity Recognition

Text Quality Evaluation
- **SIGHAN CSC: [Paper3](https://aclanthology.org/W15-3106.pdf) | [Dataset1](http://ir.itc.ntnu.edu.tw/lre/sighan7csc.html) | [Dataset2](http://ir.itc.ntnu.edu.tw/lre/clp14csc.html) | [Dataset3](http://ir.itc.ntnu.edu.tw/lre/sighan8csc.html)**
Text-to-Code

Relation Extraction

Multitask <a id="multitask02"></a>

Multi-modal Large Language Models (MLLMs) Datasets <a id="multi-modal-large-language-models-mllms-datasets"></a>

Instruction Fine-tuning Datasets <a id="instruction02"></a>
- **EarthGPT: [Github](https://github.com/wivizhang/EarthGPT)**
- **InstructDoc** (InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions): [Github](https://github.com/nttmdlab-nlp/InstructDoc)
- **ALLaVA-4V** (ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model): [Github](https://github.com/FreedomIntelligence/ALLaVA) | [Dataset](https://huggingface.co/datasets/FreedomIntelligence/ALLaVA-4V)
- **VideoChat2-IT: [Dataset](https://huggingface.co/datasets/OpenGVLab/VideoChat2-IT)**

Evaluation Datasets <a id="evaluation02"></a>
- **OlympiadBench** (OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems): [Github](https://github.com/OpenBMB/OlympiadBench)
- **MMMU** (MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): [Github](https://github.com/MMMU-Benchmark/MMMU) | [Dataset](https://huggingface.co/datasets/MMMU/MMMU)
- **MVBench** (MVBench: A Comprehensive Multi-modal Video Understanding Benchmark): [Github](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2) | [Dataset](https://huggingface.co/datasets/OpenGVLab/MVBench)
- **MMT-Bench: [Github](https://github.com/OpenGVLab/MMT-Bench) | [Dataset](https://huggingface.co/datasets/Kaining/MMT-Bench)**
- **MM-NIAH** (Needle In A Multimodal Haystack): [Github](https://github.com/OpenGVLab/MM-NIAH)
- **MultiTrust: [Github](https://github.com/thu-ml/MMTrustEval) | [Website](https://multi-trust.github.io/#leaderboard)**
- **MultiMed** (MultiMed: Massively Multimodal and Multitask Medical Understanding)
- **MedTrinity-25M: [Github](https://github.com/UCSC-VLAA/MedTrinity-25M) | [Dataset](https://huggingface.co/datasets/UCSC-VLAA/MedTrinity-25M) | [Website](https://yunfeixie233.github.io/MedTrinity-25M/)**
- **MMIU** (MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models): [Github](https://github.com/OpenGVLab/MMIU) | [Dataset](https://huggingface.co/datasets/FanqingM/MMIU-Benchmark) | [Website](https://mmiu-bench.github.io/)

Pre-training Corpora <a id="mllmpre"></a>
- **OBELICS** (OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents): [Github](https://github.com/huggingface/OBELICS) | [Dataset](https://huggingface.co/datasets/HuggingFaceM4/OBELICS)
- **mOSCAR** (mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus): [Website](https://oscar-project.github.io/documentation/versions/mOSCAR/)
Categories
- Evaluation Datasets (224)
- Traditional NLP Datasets (188)
- Instruction Fine-tuning Datasets (127)
- Pre-training Corpora (74)
- Multi-modal Large Language Models (MLLMs) Datasets (34)
- Changelog (28)
- Preference Datasets (25)
- Retrieval Augmented Generation (RAG) Datasets (19)
- Paper (1)
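As a quick sanity check, the per-category entry counts listed above can be totalled with a few lines of Python. This is only a sketch over the numbers shown on this page (the dict below copies them verbatim); the total is their sum, not an official figure reported by ecosyste.ms.

```python
# Per-category entry counts, copied from the Categories list above.
category_counts = {
    "Evaluation Datasets": 224,
    "Traditional NLP Datasets": 188,
    "Instruction Fine-tuning Datasets": 127,
    "Pre-training Corpora": 74,
    "Multi-modal Large Language Models (MLLMs) Datasets": 34,
    "Changelog": 28,
    "Preference Datasets": 25,
    "Retrieval Augmented Generation (RAG) Datasets": 19,
    "Paper": 1,
}

# Total number of indexed entries across all categories.
total = sum(category_counts.values())
print(total)  # 720
```

The same pattern works for the Sub Categories and Keywords lists if you want to cross-check their counts.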
Sub Categories
- Question Answering (88)
- General Instruction Fine-tuning Datasets (85)
- General Pre-training Corpora (67)
- Domain-specific Instruction Fine-tuning Datasets (42)
- Evaluation Datasets (41)
- Subject (31)
- Preference Evaluation Methods (25)
- Factuality (17)
- General (15)
- Math (15)
- Code (15)
- Evaluation Platform (14)
- Recognizing Textual Entailment (14)
- Long Text (13)
- Text Summarization (13)
- Law (13)
- Reasoning (13)
- Multitask (12)
- NLU (12)
- Other (12)
- Semantic Matching (11)
- Named Entity Recognition (9)
- Knowledge (8)
- Financial (7)
- Social Norms (7)
- Medical (7)
- Instruction Fine-tuning Datasets (7)
- Tool (7)
- Domain-specific Pre-training Corpora (7)
- Text Classification (6)
- Coreference Resolution (6)
- Exam (6)
- Evaluation (6)
- Text Generation (5)
- Pre-training Corpora (5)
- OOD (4)
- Text-to-Code (4)
- Text Quality Evaluation (4)
- Multilingual (4)
- Relation Extraction (4)
- Text Translation (3)
- Multitask (3)
- Sentiment Analysis (3)
- Agent (1)
Keywords
- llm (7)
- large-language-models (5)
- benchmark (5)
- llama (5)
- chatgpt (4)
- gpt (4)
- nlp (3)
- alpaca (3)
- machine-learning (2)
- multimodal (2)
- evaluation (2)
- gpt-4 (2)
- language-model (2)
- long-context (2)
- deep-learning (2)
- dataset (2)
- chatglm (2)
- natural-language-processing (2)
- chinese-nlp (2)
- chinese (2)
- instruction-tuning (1)
- fact-checking (1)
- generative-ai (1)
- fine-tuning (1)
- zephyr (1)
- qwen2 (1)
- qwen (1)
- python (1)
- qlora (1)
- chinese-language (1)
- chinese-simplified (1)
- corpus-data (1)
- nlp-machine-learning (1)
- finacial (1)
- instruction-following (1)
- aquila (1)
- baichuan (1)
- gemma (1)
- internlm (1)
- llama2 (1)
- llama3 (1)
- lora (1)
- minicpm (1)
- mistral (1)
- mixtral (1)
- peft (1)
- visual-question-answering (1)
- multimodal-large-language-models (1)
- vision-language-model (1)
- retrieval-augmented-generation (1)