Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-instruction-dataset
A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
https://github.com/yaodongC/awesome-instruction-dataset
Last synced: 5 days ago
Uncategorized
- nichtdax/awesome-totally-open-chatgpt
- (gururise/Cleaned Alpaca)|52K|EN|MT|SI
- (XueFuzhao/InstructionWild)|52K|EN|CN|MT|SI
- (allenai/natural-instructions)|1.6K|ML|MT|HG
- (nomic-ai/gpt4all)|437k|EN|MT|COL
- (thunlp/UltraChat)|280k|EN|TS|MIX
- (cascip/ChatAlpaca)|10k|EN|MT|MIX
- (Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|MT|MIX
- (thu-coai/Safety-Prompts)|100k|CN|MT|MIX
- (Vision-CAIR/MiniGPT-4)|5K|EN|MT|MIX
- (haotian-liu/LLaVA)|150K|EN|MT|MIX
- (JosephusCheung/GuanacoDataset)|534K|ML|MT|SI
- (allenai/prosocial-dialog)|58K|EN|MT|MIX
- (bigscience/xP3)|N/A|ML|MT|MIX
- (PhoebusSi/Alpaca-CoT)|500k|ML|MT|COL
- (google-research/FLAN)|N/A|EN|MT|MIX
- (YeungNLP/firefly-train-1.1M)|1100k|CN|MT|COL
- (databrickslabs/dolly)|15K|EN|MT|HG
- (OpenAssistant/oasst1)|161K|ML|MT|HG
- (RyokoAI/ShareGPT52K)|90K|ML|MT|SI
- (zjunlp/Mol-Instructions)|2043K|ML|MT|MIX
- (Anthropic/hh-rlhf)|22k|EN|MT|MIX
- (HuggingFaceH4/stack-exchange-preferences)|10741k|EN|TS|HG
- (stanfordnlp/SHP)|385k|EN|MT|HG
- [(Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|CN|MT|SI](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
- [(Vision-CAIR/MiniGPT-4)|5K|EN|MT|MIX](https://minigpt-4.github.io/)
- [(PhoebusSi/Alpaca-CoT)|500k|ML|MT|COL](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT)
- [(tatsu-lab/Alpaca)|52K|EN|MT|SI](https://github.com/tatsu-lab/stanford_alpaca)
- [(haotian-liu/LLaVA)|150K|EN|MT|MIX](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K)
- [(allenai/natural-instructions)|1.6K|ML|MT|HG](https://github.com/allenai/natural-instructions)
- [(sunrainyg/InstructCV)|EN|MT|MIX](https://github.com/AlaaLab/InstructCV)
- [(JosephusCheung/GuanacoDataset)|534K|ML|MT|SI](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset)
- [(allenai/prosocial-dialog)|58K|EN|MT|MIX](https://huggingface.co/datasets/allenai/prosocial-dialog)
- [(bigscience/xP3)|N/A|ML|MT|MIX](https://huggingface.co/datasets/bigscience/xP3)
- [(nomic-ai/gpt4all)|437k|EN|MT|COL](https://github.com/nomic-ai/gpt4all)
- (1) laion/OIG, (2) [pacovaldez/stackoverflow-questions](https://huggingface.co/datasets/pacovaldez/stackoverflow-questions), (3) subset of [bigscience/bloomz-p3](https://huggingface.co/bigscience/bloomz-p3)
- GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
- [(google-research/FLAN)|N/A|EN|MT|MIX](https://github.com/google-research/FLAN/tree/main/flan/v2)
- [(orhonovich/unnatural-instructions)|240K|EN|MT|MIX](https://github.com/orhonovich/unnatural-instructions)
- [(databrickslabs/dolly)|15K|EN|MT|HG](https://github.com/databrickslabs/dolly/tree/master/data)
- [(OpenAssistant/oasst1)|161K|ML|MT|HG](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [(RyokoAI/ShareGPT52K)|90K|ML|MT|SI](https://huggingface.co/datasets/RyokoAI/ShareGPT52K)
- [(zjunlp/Mol-Instructions)|2043K|ML|MT|MIX](https://huggingface.co/datasets/zjunlp/Mol-Instructions)
- [(Anthropic/hh-rlhf)|22k|EN|MT|MIX](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [(thu-coai/Safety-Prompts)|100k|CN|MT|MIX](https://github.com/thu-coai/Safety-Prompts)
- [(HuggingFaceH4/stack-exchange-preferences)|10741k|EN|TS|HG](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences)
- [(Reddit/eli5)|500k|EN|MT|HG](https://huggingface.co/datasets/eli5)
- r/explainlikeimfive
- eli5 dataset, [stack-exchange-paired](https://huggingface.co/datasets/lvwerra/stack-exchange-paired)
- [(Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|MT|MIX](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM)
- [(Hello-SimpleAI/HC3-Chinese)|13K|CN|MT|MIX](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese)
Categories
- Uncategorized (24)
- [(Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|MT|MIX](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) (4)
- [(Anthropic/hh-rlhf)|22k|EN|MT|MIX](https://huggingface.co/datasets/Anthropic/hh-rlhf) (3)
- [(zjunlp/Mol-Instructions)|2043K|ML|MT|MIX](https://huggingface.co/datasets/zjunlp/Mol-Instructions) (3)
- [(Vision-CAIR/MiniGPT-4)|5K|EN|MT|MIX](https://minigpt-4.github.io/) (3)
- [(HuggingFaceH4/stack-exchange-preferences)|10741k|EN|TS|HG](https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences) (3)
- [(Reddit/eli5)|500k|EN|MT|HG](https://huggingface.co/datasets/eli5) (2)
- [(thu-coai/Safety-Prompts)|100k|CN|MT|MIX](https://github.com/thu-coai/Safety-Prompts) (2)
- [(sunrainyg/InstructCV)|EN|MT|MIX](https://github.com/AlaaLab/InstructCV) (2)
- [(databrickslabs/dolly)|15K|EN|MT|HG](https://github.com/databrickslabs/dolly/tree/master/data) (2)
- [(nomic-ai/gpt4all)|437k|EN|MT|COL](https://github.com/nomic-ai/gpt4all) (2)
- [(allenai/prosocial-dialog)|58K|EN|MT|MIX](https://huggingface.co/datasets/allenai/prosocial-dialog) (2)
- [(allenai/natural-instructions)|1.6K|ML|MT|HG](https://github.com/allenai/natural-instructions) (1)
- [(bigscience/xP3)|N/A|ML|MT|MIX](https://huggingface.co/datasets/bigscience/xP3) (1)
- [(OpenAssistant/oasst1)|161K|ML|MT|HG](https://huggingface.co/datasets/OpenAssistant/oasst1) (1)
- [(JosephusCheung/GuanacoDataset)|534K|ML|MT|SI](https://huggingface.co/datasets/JosephusCheung/GuanacoDataset) (1)
- [(Hello-SimpleAI/HC3-Chinese)|13K|CN|MT|MIX](https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese) (1)
- [(tatsu-lab/Alpaca)|52K|EN|MT|SI](https://github.com/tatsu-lab/stanford_alpaca) (1)
- [(PhoebusSi/Alpaca-CoT)|500k|ML|MT|COL](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT) (1)
- [(Instruction-Tuning-with-GPT-4/GPT-4-LLM)|52K|EN|CN|MT|SI](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) (1)
- [(google-research/FLAN)|N/A|EN|MT|MIX](https://github.com/google-research/FLAN/tree/main/flan/v2) (1)
- [(haotian-liu/LLaVA)|150K|EN|MT|MIX](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) (1)
- [(RyokoAI/ShareGPT52K)|90K|ML|MT|SI](https://huggingface.co/datasets/RyokoAI/ShareGPT52K) (1)
- [(orhonovich/unnatural-instructions)|240K|EN|MT|MIX](https://github.com/orhonovich/unnatural-instructions) (1)
Keywords
- chatgpt (5)
- llm (2)
- llama (2)
- instruction-tuning (2)
- alpaca (2)
- deep-learning (2)
- chinese-language (1)
- attack-defense (1)
- gpt-4 (1)
- large-language-models (1)
- chatbot (1)
- llm-inference (1)
- open-source (1)
- awesome-lists (1)
- awesome-list (1)
- awesome (1)
- alternative (1)
- instruction (1)
- prompt (1)
- prompt-engineering (1)
- safety (1)
- chatglm (1)
- cot (1)
- lora (1)
- moss (1)
- p-tuning (1)
- parameter-efficient (1)
- pytorch (1)
- tabul (1)
- tabular-data (1)
- tabular-model (1)
- instruction-following (1)
- language-model (1)