Awesome-LLMs-Datasets

A summary of representative text datasets for LLMs.
https://github.com/lmmlzn/Awesome-LLMs-Datasets

  • Pre-training Corpora

    • General Pre-training Corpora

    • Domain-specific Pre-training Corpora

      • Github
      • Paper - web-math/open-web-math)**
      • Paper - DI/XuanYuan) | [Dataset](https://huggingface.co/datasets/Duxiaoman-DI/FinCorpus)**
      • Paper - lm) | [Dataset](https://huggingface.co/datasets/EleutherAI/proof-pile-2) | [Website](https://blog.eleuther.ai/llemma/)**
      • Paper - FinCUGE-Applications) | [Website](https://bbt.ssymmetry.com/index.html)**
  • Changelog

    • NAH (Needle-in-a-Haystack)
    • IEPile - tuning Datasets | General Instruction Fine-tuning Datasets | CI); **[InstructIE](https://arxiv.org/abs/2305.11527)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG).
    • SlimPajama - training Corpora | General Pre-training Corpora | Multi-category); **[MassiveText](https://arxiv.org/abs/2112.11446)** (Pre-training Corpora | General Pre-training Corpora | Multi-category); **[MADLAD-400](https://arxiv.org/abs/2309.04662)** (Pre-training Corpora | General Pre-training Corpora | Webpages); **[Minerva](https://arxiv.org/abs/2206.14858)** (Pre-training Corpora | General Pre-training Corpora | Multi-category); **[CCAligned](https://aclanthology.org/2020.emnlp-main.480/)** (Pre-training Corpora | General Pre-training Corpora | Parallel Corpus); **[WikiMatrix](https://aclanthology.org/2021.eacl-main.115/)** (Pre-training Corpora | General Pre-training Corpora | Parallel Corpus); **[OpenWebMath](https://arxiv.org/abs/2310.06786)** (Pre-training Corpora | Domain-specific Pre-training Corpora | Math).
    • ALCE
    • CLUE Benchmark Series - MTEB Leaderboard](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)** (Evaluation Datasets | Evaluation Platform).
    • MathPile - training Corpora | Domain-specific Pre-training Corpora | Math); **[WanJuan-CC](https://arxiv.org/abs/2402.19282)** (Pre-training Corpora | General Pre-training Corpora | Webpages).
    • WebQuestions
    • CRUD-RAG - Instruct-Benchmark-Tester](https://huggingface.co/datasets/llmware/rag_instruct_benchmark_tester)** (RAG Datasets); **[ARES](https://arxiv.org/abs/2311.09476)** (RAG Datasets).
    • MMRS-1M - tuning Datasets); **[VideoChat2-IT](https://arxiv.org/abs/2311.17005)** (MLLMs Datasets | Instruction Fine-tuning Datasets); **[InstructDoc](https://arxiv.org/abs/2401.13313)** (MLLMs Datasets | Instruction Fine-tuning Datasets); **[ALLaVA-4V Data](https://arxiv.org/abs/2402.11684)** (MLLMs Datasets | Instruction Fine-tuning Datasets); **[MVBench](https://arxiv.org/abs/2311.17005)** (MLLMs Datasets | Evaluation Datasets); **[OlympiadBench](https://arxiv.org/abs/2402.14008)** (MLLMs Datasets | Evaluation Datasets); **[MMMU](https://arxiv.org/abs/2311.16502)** (MLLMs Datasets | Evaluation Datasets).
    • GameBench - 1.325/)** (Evaluation Datasets | NLU); **[SarcasmBench](https://arxiv.org/abs/2408.11319)** (Evaluation Datasets | NLU); **[C<sup>3</sup> Bench](https://arxiv.org/abs/2405.17732)** (Evaluation Datasets | Subject); **[TableBench](https://www.arxiv.org/abs/2408.09174)** (Evaluation Datasets | Reasoning); **[ArabLegalEval](https://www.arxiv.org/abs/2408.07983)** (Evaluation Datasets | Law).
    • MultiTrust - training Corpora); **[MultiMed](https://www.arxiv.org/abs/2408.12682)** (MLLMs Datasets | Evaluation Datasets).
    • Lithuanian-QA-v1 - tuning Datasets | General Instruction Fine-tuning Datasets | CI & MC); **[REInstruct](https://www.arxiv.org/abs/2408.10663)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG & CI & MC); **[KoLLM-Converations](https://huggingface.co/datasets/davidkim205/kollm-converations)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | CI).
    • OpenMathInstruct-1 - tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Math); **[FinBen](https://arxiv.org/abs/2402.12659)** (Evaluation Datasets | Financial).
    • Dolma - training Corpora | General Pre-training Corpora | Multi-category).
    • LongWriter-6K - tuning Datasets | General Instruction Fine-tuning Datasets | CI & MC).
    • MedTrinity-25M
    • DebateQA - acl.334/)** (Evaluation Datasets | Subject); **[PersianMMLU](https://arxiv.org/abs/2404.06644)** (Evaluation Datasets | Subject); **[TMMLU+](https://arxiv.org/abs/2403.01858)** (Evaluation Datasets | Subject).
    • Expository-Prose-V1 - training Corpora | General Pre-training Corpora | Multi-category).
    • RAGEval - RAG](https://arxiv.org/abs/2401.15391)** (RAG Datasets).
    • Aya Collection - tuning Datasets | General Instruction Fine-tuning Datasets | HG & CI & MC); **[Aya Dataset](https://arxiv.org/abs/2402.06619)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG).
    • AlphaFin - tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Other); **[COIG-CQIA](https://arxiv.org/abs/2403.18058)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | HG & CI).
    • MAP-CC - training Corpora | General Pre-training Corpora | Multi-category); **[FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)** (Pre-training Corpora | General Pre-training Corpora | Webpages); **[CCI 2.0](https://huggingface.co/datasets/BAAI/CCI2-Data)** (Pre-training Corpora | General Pre-training Corpora | Webpages).
    • CLUE - Bench](https://arxiv.org/abs/2404.04167)** (Evaluation Datasets | General); **[CIF-Bench](https://arxiv.org/abs/2402.13109)** (Evaluation Datasets | General); **[ACLUE](https://aclanthology.org/2023.alp-1.9/)** (Evaluation Datasets | Subject); **[LeSC](https://arxiv.org/abs/2405.05741)** (Evaluation Datasets | NLU); **[AlignBench](https://arxiv.org/abs/2311.18743)** (Evaluation Datasets | Multitask); **[SciKnowEval](https://arxiv.org/abs/2406.09098)** (Evaluation Datasets | Subject).
    • WildChat - tuning Datasets | General Instruction Fine-tuning Datasets | MC).
    • OpenHermesPreferences - SCIR/huozi/blob/main/data/huozi-rlhf/huozi_rlhf_data.csv)** (Preference Datasets | Vote); **[HelpSteer](https://arxiv.org/abs/2311.09528)** (Preference Datasets | Score); **[HelpSteer2](https://arxiv.org/abs/2406.08673)** (Preference Datasets | Score).
    • MMT-Bench - training Corpora); **[MM-NIAH](https://arxiv.org/abs/2406.07230)** (MLLMs Datasets | Evaluation Datasets).
    • CRAG
    • Future-Idea-Generation
    • MME-RealWorld - Bench](https://arxiv.org/abs/2406.05862)** (MLLMs Datasets | Evaluation Datasets); **[CII-Bench](https://arxiv.org/abs/2410.13854)** (MLLMs Datasets | Evaluation Datasets); **[ALM-Bench](https://arxiv.org/abs/2411.16508)** (MLLMs Datasets | Evaluation Datasets).
    • MaLA - training Corpora | General Pre-training Corpora | Multi-category); **[CCI3.0-HQ](https://arxiv.org/abs/2410.18505)** (Pre-training Corpora | General Pre-training Corpora | Multi-category); **[GlotCC](https://arxiv.org/abs/2410.23825)** (Pre-training Corpora | General Pre-training Corpora | Webpages); **[ChineseWebText 2.0](https://arxiv.org/abs/2411.19668)** (Pre-training Corpora | General Pre-training Corpora | Webpages); **[ChineseWebText 1.0](https://arxiv.org/abs/2311.01149)** (Pre-training Corpora | General Pre-training Corpora | Webpages); **[SkyPile](https://arxiv.org/abs/2310.19341)** (Pre-training Corpora | General Pre-training Corpora | Webpages).
    • ViDoRe - BEIR](https://arxiv.org/abs/2311.17136)** (RAG Datasets); **[MRAG-Bench](https://arxiv.org/abs/2410.08182)** (RAG Datasets).
    • SlimOrca - tuning Datasets | General Instruction Fine-tuning Datasets | CI & MC); **[GPTeacher](https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct)** (Instruction Fine-tuning Datasets | General Instruction Fine-tuning Datasets | MC); **[OrcaMathWordProblems](https://arxiv.org/abs/2402.14830)** (Instruction Fine-tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Math); **[MathInstruct](https://arxiv.org/abs/2309.05653)** (Instruction Fine-tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Math); **[MetaMathQA](https://arxiv.org/abs/2309.12284)** (Instruction Fine-tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Math); **[Magicoder-OSS-Instruct-75K](https://arxiv.org/abs/2312.02120)** (Instruction Fine-tuning Datasets | Domain-specific Instruction Fine-tuning Datasets | Code).
    • UltraInteract
    • GPQA - Wild](https://arxiv.org/abs/2403.04307)** (Evaluation Datasets | Factuality); **[CMATH](https://arxiv.org/abs/2306.16636)** (Evaluation Datasets | Subject); **[FineMath](https://arxiv.org/abs/2403.07747)** (Evaluation Datasets | Subject); **[RealTime QA](https://arxiv.org/abs/2207.13332)** (Evaluation Datasets | Factuality); **[WYWEB](https://aclanthology.org/2023.findings-acl.204/)** (Evaluation Datasets | Subject); **[ChineseFactEval](https://gair-nlp.github.io/ChineseFactEval/)** (Evaluation Datasets | Factuality); **[Counting-Stars](https://arxiv.org/abs/2403.11802)** (Evaluation Datasets | Long Text).
  • Instruction Fine-tuning Datasets

    • General Instruction Fine-tuning Datasets

      • Github
      • Github - KOL)**
      • Github
      • Github
      • Github - Chinese-English-90k)**
      • Paper
      • Github - train-1.1M)**
      • Github - QA-B)**
      • Github - road/Wizard-LM-Chinese-instruct-evol)**
      • Paper
      • Paper - a-p/COIG-CQIA)**
      • Paper - ai/camel) | [Dataset](https://huggingface.co/camel-ai) | [Website](https://www.camel-ai.org/)**
      • Paper
      • Paper - baize/baize-chatbot) | [Dataset](https://github.com/project-baize/baize-chatbot/tree/main/data)**
      • Dataset - first-open-commercially-viable-instruction-tuned-llm)**
      • Paper - coai/CDial-GPT)**
      • Paper
      • Dataset
      • Github
      • Github
      • Paper - chat-1m)**
      • Paper - instruct)**
      • Website
      • Paper
      • Paper - USC/CrossFit)**
      • Paper
      • Paper - it.github.io/)**
      • Paper - lab/flacuna) | [Dataset](https://huggingface.co/datasets/declare-lab/flan-mini)**
      • Paper - research/flan)**
      • Paper - research/FLAN/tree/main/flan/v2) | [Dataset](https://huggingface.co/datasets/SirNeural/flan_v2)**
      • Dataset
      • Paper
      • Paper - instructions)**
      • Paper - qa)**
      • Paper
      • Paper - instructions) | [Dataset](https://instructions.apps.allenai.org/)**
      • Dataset
      • Paper - bAInd/Open-Platypus) | [Website](https://platypus-llm.github.io/)**
      • Paper
      • Paper - workshop/promptsource)**
      • Paper - instructions)**
      • Paper
      • Paper - ai/UnifiedSKG)**
      • Paper - workshop/xmtf)**
      • Paper
      • Paper
      • Paper - Tuning-with-GPT-4/GPT-4-LLM#data-release)**
      • Paper - nlp/bactrian-x) | [Dataset](https://huggingface.co/datasets/MBZUAI/Bactrian-X)**
      • Paper - ai/gpt4all) | [Dataset](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/GPT4all)**
      • Dataset - model.github.io/)**
      • Paper - nlp/LaMini-LM) | [Dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction)**
      • Paper
      • Paper
      • Paper - Orca/OpenOrca)**
      • Paper
      • Paper - SimpleAI/chatgpt-comparison-detection) | [Dataset1](https://huggingface.co/datasets/Hello-SimpleAI/HC3) | [Dataset2](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese)**
      • Paper - sft-data-v1)**
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Dataset
      • Paper
      • Paper
      • Paper
    • Domain-specific Instruction Fine-tuning Datasets

      • Github
      • Github
      • Github
      • Github - nlp/HanFei)**
      • Github - GPT#数据集构建)**
      • Github
      • Github
      • Paper - dialog)**
      • Github - Lab/TransGPT-sft)**
      • Paper - LawLLM) | [Website](https://law.fudan-disc.com)**
      • Paper - llama) | [Dataset](https://github.com/AndrewZhe/lawyer-llama/tree/main/data)**
      • Paper - YF/MWPToolkit) | [Dataset](https://huggingface.co/datasets/Macropodus/MWP-Instruct)**
      • Github
      • Paper - Li/ChatDoctor) | [Dataset](https://github.com/Kent0n-Li/ChatDoctor)**
      • Paper
      • Paper - MedLLM) | [Dataset](https://huggingface.co/datasets/Flmc/DISC-Med-SFT) | [Website](https://med.fudan-disc.com)**
      • Paper - sft-data-v1)**
      • Paper - AI4H/Medical-Dialogue-System)**
      • Paper
      • Paper - deepmind/code_contests)**
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper - nlp/EduChat) | [Dataset](https://huggingface.co/datasets/ecnu-icalk/educhat-sft-002-data-osm)**
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper - FinLLM) | [Website](https://fin.fudan-disc.com)**
      • Paper
      • Paper - Instructions) | [Dataset](https://huggingface.co/datasets/zjunlp/Mol-Instructions)**
  • Preference Datasets

    • Preference Evaluation Methods

      • Paper
      • Github - SCIR/huozi/blob/main/data/huozi-rlhf/huozi_rlhf_data.csv)**
      • Github
      • Paper - Alignment/safe-rlhf) | [Dataset](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF)**
      • Github
      • Paper - sys/FastChat/tree/main/fastchat/llm_judge) | [Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments) | [Website](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)**
      • Paper - PLUG/CValues) | [Dataset](https://www.modelscope.cn/datasets/damo/CValues-Comparison/summary)**
      • Paper - templar/Stable-Alignment)**
      • Paper
      • Paper1 - rlhf) | [Dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf)**
      • Paper
      • Dataset
      • Paper
      • Dataset
      • Paper - exchange-preferences)**
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper - Aligner) | [Dataset](https://huggingface.co/datasets/nvidia/HelpSteer2)**
  • Evaluation Datasets

    • General

      • Paper - sys/FastChat/tree/main/fastchat/llm_judge) | [Website](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)**
      • Github - sys/vicuna-blog-eval/tree/main/eval) | [Website](https://lmsys.org/blog/2023-03-30-vicuna/)**
      • Paper - lab/alpaca_eval) | [Dataset](https://huggingface.co/datasets/tatsu-lab/alpaca_eval) | [Website](https://tatsu-lab.github.io/alpaca_eval/)**
      • Paper - 80) | [Dataset](https://github.com/ictnlp/BayLing/tree/main/data/BayLing-80)**
      • Paper
      • Paper
      • Paper
      • Paper - Baichuan-MLSystemLab/SysBench) | [Dataset](https://github.com/PKU-Baichuan-MLSystemLab/SysBench)**
      • Paper
      • Paper
      • Paper
      • Paper - Bench) | [Website](https://yizhilll.github.io/CIF-Bench/)**
    • Subject

      • Github - clue)**
      • Paper - lab/M3KE) | [Dataset](https://huggingface.co/datasets/TJUNLP/M3KE)**
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper - LIT/ceval) | [Dataset](https://huggingface.co/datasets/ceval/ceval-exam) | [Website](https://cevalbenchmark.com/)**
      • Paper - li/CMMLU) | [Dataset](https://huggingface.co/datasets/haonan-li/cmmlu)**
      • Paper
      • Paper - Eval) | [Dataset](https://huggingface.co/datasets/Besteasy/CG-Eval) | [Website](http://cgeval.besteasy.com/)**
      • Github
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper - nlp/ArabicMMLU) | [Dataset](https://huggingface.co/datasets/MBZUAI/ArabicMMLU)**
      • Paper - center)**
      • Paper - zhang/aclue) | [Dataset](https://huggingface.co/datasets/tyouisen/aclue)**
      • Paper - zju/sciknoweval) | [Dataset](https://huggingface.co/datasets/hicai-zju/SciKnowEval)**
      • Paper
    • Long Text

      • Github - 06-29-longchat/)**
      • Github
      • Paper - nlp/zero_scrolls) | [Dataset](https://huggingface.co/datasets/tau/zero_scrolls) | [Website](https://www.zero.scrolls-benchmark.com/)**
      • Paper
      • Paper - stars) | [Dataset](https://github.com/nick7nlp/counting-stars)**
      • Paper
      • Paper - nlco/LooGLE) | [Dataset](https://huggingface.co/datasets/bigainlco/LooGLE)**
      • Paper
      • Paper
      • Paper - compass/opencompass)**
    • Evaluation Platform

    • Medical

    • Social Norms

      • Github
      • Paper - mll/crows-pairs)**
      • Paper - coai/SafetyBench) | [Dataset](https://huggingface.co/datasets/thu-coai/SafetyBench) | [Website](https://llmbench.ai/safety)**
      • Paper - coai/Safety-Prompts) | [Dataset](https://github.com/thu-coai/Safety-Prompts) | [Website](http://115.182.62.166:18000/)**
      • Paper
      • Paper
    • Factuality

    • Multitask

      • Github
      • Paper - Lab/CLEVA) | [Website](http://www.lavicleva.com/#/homepage/overview)**
      • Paper - Bench-Hard)**
      • Paper - bench)**
      • Paper
      • Paper - crfm/helm) | [Website](https://crfm.stanford.edu/helm/latest/)**
      • Paper
      • Paper
      • Paper
    • Tool

      • Paper - Ye/ToolEyes) | [Datasets](https://github.com/Junjie-Ye/ToolEyes)**
      • Paper - ConvAI/tree/main/api-bank)**
      • Paper
      • Paper - llm/APIBench) | [Website](https://gorilla.cs.berkeley.edu/)**
      • Paper
    • Multilingual

      • Paper - research/url-nlp) | [Dataset](https://huggingface.co/datasets/juletxara/mgsm)**
      • Paper
      • Paper - research/xtreme) | [Website](https://sites.research.google/xtreme)**
    • Reasoning

    • Knowledge

      • Paper - KEG/KoLA) | [Website](http://103.238.162.37:31622/)**
      • Paper - pku/ALCUNA) | [Dataset](https://github.com/Arvid-pku/ALCUNA)**
      • Paper
      • Paper
      • Paper
      • Paper
    • Code

      • Paper
      • Paper
      • Paper - project/octopack) | [Dataset](https://huggingface.co/datasets/bigcode/humanevalpack)**
      • Paper
      • Paper - ai/DS-1000) | [Dataset](https://github.com/xlang-ai/DS-1000/tree/main/ds1000_example) | [Website](https://ds1000-code-gen.github.io/)**
      • Paper - ConvAI/tree/main/bird) | [Dataset](https://bird-bench.github.io/) | [Website](https://bird-bench.github.io/)**
      • Paper
      • Paper - eval)**
      • Paper
      • Paper
      • Paper
      • Paper
    • OOD

      • Paper - yuan/OOD_NLP)**
      • Paper - X) | [Dataset](https://github.com/YangLinyi/GLUE-X)**
    • NLU

    • Law

    • Other

      • Paper
      • Paper
      • Paper - Guo/Owl)**
      • Paper - NLP/EcomGPT)**
      • Paper - bench) | [Dataset](https://github.com/xingyaoww/mint-bench/blob/main/docs/DATA.md) | [Website](https://xingyaoww.github.io/mint-bench/)**
      • Paper
      • Paper - ARISE/EmotionBench)**
      • Paper
    • Exam

    • Financial

      • Paper - FinCUGE-Applications) | [Website](https://bbt.ssymmetry.com/index.html)**
      • Paper - FinAI)**
      • Github
      • Paper - AIFLM-Lab/FinEval) | [Dataset](https://huggingface.co/datasets/SUFE-AIFLM-Lab/FinEval) | [Website](https://fineval.readthedocs.io/en/latest/index.html)**
      • Paper - nlp.github.io/FLANG/)**
    • Agent

    • Evaluation

      • Paper - Eval/FairEval) | [Dataset](https://github.com/i-Eval/FairEval)**
      • Paper - ConvAI/tree/main/WideDeep) | [Dataset](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/WideDeep)**
      • Paper
      • Paper
  • Retrieval Augmented Generation (RAG) Datasets

  • Traditional NLP Datasets

    • Question Answering

    • Recognizing Textual Entailment

      • Github
      • Paper
      • Paper1 - Second-PASCAL-Recognising-Textual-Entailment-Bar-Haim-Dagan/136326377c122560768db674e35f5bcd6de3bc40) | [Paper3](https://dl.acm.org/doi/pdf/10.5555/1654536.1654538) | [Paper4](https://tac.nist.gov/publications/2009/additional.papers/RTE5_overview.proceedings.pdf) | [Dataset](https://huggingface.co/datasets/glue/viewer/rte/train)**
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
      • Dataset
      • Paper
      • Paper
      • Paper
      • Paper
    • Math

      • Paper - school-math) | [Dataset](https://github.com/openai/grade-school-math)**
      • Paper
      • Paper - asdiv-dataset) | [Dataset](https://huggingface.co/datasets/EleutherAI/asdiv)**
      • Paper
      • Paper
      • Paper
      • Paper - qa.github.io/math-QA/)**
      • Paper - deepmind/AQuA) | [Dataset](https://huggingface.co/datasets/aqua_rat)**
      • Paper
      • Paper
      • Paper
      • Paper
      • Paper
    • Text Classification

    • Coreference Resolution

    • Sentiment Analysis

    • Semantic Matching

      • Paper
      • Paper - research-datasets/paws) | [Dataset](https://huggingface.co/datasets/paws)**
      • Paper - multi-mt) | [Dataset](https://huggingface.co/datasets/stsb_multi_mt) | [Website](https://ixa2.si.ehu.eus/stswiki/index.php/STSbenchmark)**
      • Paper
      • Paper
      • Paper - research-datasets/paws/tree/master/pawsx)**
      • Paper
      • Paper
      • Paper
    • Text Generation

      • Paper - USC/CommonGen) | [Dataset](https://huggingface.co/datasets/common_gen)**
      • Paper - LILY/dart) | [Dataset](https://huggingface.co/datasets/dart)**
      • Paper - dataset) | [Dataset](https://huggingface.co/datasets/e2e_nlg?row=0)**
      • Paper - dataset) | [Dataset](https://huggingface.co/datasets/web_nlg)**
    • Text Translation

    • Text Summarization

    • Named Entity Recognition

      • Paper
      • Paper
      • Paper
      • Paper - NERD) | [Dataset](https://tianchi.aliyun.com/dataset/102048) | [Website](https://ningding97.github.io/fewnerd/)**
      • Paper
      • Paper
      • Paper - horse) | [Dataset](https://github.com/hltcoe/golden-horse)**
      • Paper
      • Paper
    • Text Quality Evaluation

      • Paper - mll.github.io/CoLA/)**
      • Paper1 - 6820.pdf) | [Paper3](https://aclanthology.org/W15-3106.pdf) | [Dataset1](http://ir.itc.ntnu.edu.tw/lre/sighan7csc.html) | [Dataset2](http://ir.itc.ntnu.edu.tw/lre/clp14csc.html) | [Dataset3](http://ir.itc.ntnu.edu.tw/lre/sighan8csc.html)**
      • Paper
      • Paper - ime)**
    • Text-to-Code

      • Paper - research/google-research/tree/master/mbpp)**
      • Paper
      • Paper - explorer/)**
      • Paper - lily.github.io/spider)**
    • Relation Extraction

      • Paper
      • Paper - SLT/tacred) | [Website](https://catalog.ldc.upenn.edu/LDC2018T24)**
      • Paper
      • Paper1 - 1649.pdf) | [Github](https://github.com/thunlp/fewrel) | [Website](https://thunlp.github.io/fewrel)**
    • Multitask

      • Paper - ai/CSL)**
      • Paper - research-datasets/QED)**
      • Paper - Open/METS-CoV)**
  • Multi-modal Large Language Models (MLLMs) Datasets

  • Paper