Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/orhonovich/unnatural-instructions

Host: GitHub
URL: https://github.com/orhonovich/unnatural-instructions
Owner: orhonovich
License: mit
Created: 2022-12-19T14:01:06.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-02-23T06:26:18.000Z (over 1 year ago)
Last Synced: 2024-01-14T05:59:31.773Z (6 months ago)
Size: 27.6 MB
Stars: 162
Watchers: 7
Forks: 9
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Lists

awesome-chatgpt-dataset - Unnatural Instructions - Creative and diverse instructions, collected with virtually no human labor. | MIT | (Dataset Detail)
awesome-prompt-engineering - Unnatural Instruction
Awesome-instruction-tuning - Unnatural Inst. - LM-Unnat. Inst. | T5-LM | 11B | (Datasets and Models / Modified from Traditional NLP)
awesome-stars - orhonovich/unnatural-instructions - (Others)
awesome-rlhf - unnatural-instructions
awesome-instruction-dataset - (orhonovich/unnatural-instructions)|240K|EN|MT|MIX
awesome-instruction-datasets - Dataset Link
Awesome-LLM - https://github.com/orhonovich/unnatural-instructions

README

# Unnatural Instructions

This repository contains the Unnatural Instructions dataset, a collection of instructions automatically generated by a large language model.
See full details in the paper: "[Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor](https://arxiv.org/abs/2212.09689)"

## 🗃️ Content
The `data` folder contains two files: `core_data.jsonl`, containing the Unnatural Instructions core dataset of 68,478 instruction-input-output triplets, and `full_data.jsonl`, containing the full 240,670 Unnatural Instructions examples. The full data was constructed by expanding the core data with automatically generated instruction paraphrases.

## 📄 Format
### Core data
Each line in `core_data.jsonl` is a JSON object with two fields: `instruction`, a natural language instruction describing a task, and `instances`, an array of JSON objects, each containing:
- `input`: An input for the task described by the `instruction`
- `instruction_with_input`: The instruction concatenated with the `input`
- `constraints`: The task's output space constraints
- `output`: The output of executing `instruction` with the given `input`
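
As a rough illustration of the layout described above, the following Python sketch flattens each line into one record per instance. The helper name `iter_core_examples` and the sample record are hypothetical, made up for this example and not taken from the dataset; only the field names come from the description above.

```python
import json
from typing import Iterable, Iterator


def iter_core_examples(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one flat record per instance from core_data.jsonl-style lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        for instance in record["instances"]:
            yield {
                "instruction": record["instruction"],
                "input": instance["input"],
                "instruction_with_input": instance["instruction_with_input"],
                "constraints": instance["constraints"],
                "output": instance["output"],
            }


# Hypothetical record for illustration only; real records may format
# `instruction_with_input` and `constraints` differently.
sample_record = {
    "instruction": "Classify the sentiment of the given sentence.",
    "instances": [
        {
            "input": "I loved this movie.",
            "instruction_with_input": (
                "Classify the sentiment of the given sentence.\n"
                "I loved this movie."
            ),
            "constraints": "The output should be 'positive' or 'negative'.",
            "output": "positive",
        }
    ],
}

examples = list(iter_core_examples([json.dumps(sample_record)]))
print(examples[0]["instruction_with_input"])
print(examples[0]["output"])
```

To read the real file, an open file handle can be passed in place of the list, for example `iter_core_examples(open("data/core_data.jsonl", encoding="utf-8"))`, assuming the repository has been cloned and the script runs from its root.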

### Full data
`full_data.jsonl` has the same structure as `core_data.jsonl`, with one additional field: `reformulations`, an array of JSON objects, each corresponding to an automatically generated paraphrase of the given instruction. Each reformulation contains the fields:
- `instruction`: A paraphrase of the original instruction
- `input`: An input for the task described by the `instruction`
- `instruction_with_input`: The paraphrased instruction concatenated with the `input`
- `output`: The output of executing `instruction` with the given `input`
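
A minimal sketch of how the full data might be flattened into training examples, with paraphrases optionally included. The helper name `iter_full_examples`, the `is_paraphrase` flag, and the sample record are hypothetical additions for this example; the code also assumes, without confirmation from the dataset, that a record may lack `reformulations`, and treats that case as an empty list.

```python
import json
from typing import Iterable, Iterator


def iter_full_examples(
    lines: Iterable[str], include_reformulations: bool = True
) -> Iterator[dict]:
    """Yield flat examples from full_data.jsonl-style lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Original instances share the record-level instruction.
        for instance in record["instances"]:
            yield {
                "instruction": record["instruction"],
                "input": instance["input"],
                "output": instance["output"],
                "is_paraphrase": False,
            }
        if not include_reformulations:
            continue
        # Each reformulation carries its own paraphrased instruction.
        # Assumption: a missing `reformulations` field means no paraphrases.
        for reformulation in record.get("reformulations", []):
            yield {
                "instruction": reformulation["instruction"],
                "input": reformulation["input"],
                "output": reformulation["output"],
                "is_paraphrase": True,
            }


# Hypothetical record for illustration only (not taken from the dataset).
sample_full_record = {
    "instruction": "Classify the sentiment of the given sentence.",
    "instances": [
        {
            "input": "I loved this movie.",
            "instruction_with_input": "Classify the sentiment of the given sentence.\nI loved this movie.",
            "constraints": "The output should be 'positive' or 'negative'.",
            "output": "positive",
        }
    ],
    "reformulations": [
        {
            "instruction": "Is the following sentence positive or negative?",
            "input": "I loved this movie.",
            "instruction_with_input": "Is the following sentence positive or negative?\nI loved this movie.",
            "output": "positive",
        }
    ],
}

full_lines = [json.dumps(sample_full_record)]
all_rows = list(iter_full_examples(full_lines))
core_only_rows = list(iter_full_examples(full_lines, include_reformulations=False))
print(len(all_rows), len(core_only_rows))
```

Keeping the `is_paraphrase` flag makes it possible to compare models trained with and without the paraphrased instructions using the same loading code.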

## 📘 Citation
If you make use of Unnatural Instructions, please cite the following paper:
```
@misc{honovich2022unnatural,
title = {Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor},
author = {Honovich, Or and Scialom, Thomas and Levy, Omer and Schick, Timo},
url = {https://arxiv.org/abs/2212.09689},
publisher = {arXiv},
year = {2022}
}
```