https://github.com/RenzeLou/awesome-instruction-learning

Papers and Datasets on Instruction Tuning and Following. ✨✨✨

List: awesome-instruction-learning

awesome-list datasets in-context-learning instruction instruction-learning instruction-tuning large-language-models paper-list pretrained-language-model prompt survey


Awesome Instruction Learning




🔥🔥🔥 An awesome reading list of Instruction Tuning and Following, including papers and datasets.


👉 Explore our latest survey update! Feel free to dive in and discover the improvements we've made 👀 🤗 : Latest Survey

---

## ❤️ Contribution

This repository is currently maintained by [Renze Lou](https://renzelou.github.io/) @ PennState and [Kai Zhang](https://drogozhang.github.io/) @ OhioState. **We appreciate any contributions** ❤️.

If you have any suggestions or find any missed papers, feel free to [reach out](https://outlook.office.com/mail/deeplink/compose?mailtouri=mailto%3Amarionojump0722%40gmail.com) or submit a [pull request](https://github.com/RenzeLou/awesome-instruction-learning/pulls):

1. Use the following markdown format.

```markdown
**Paper Title.** *Author 1, Author 2, and Author 3.* Conference/Journal/Preprint Year. [[pdf](link)]; [[other resources](link)].
```

2. If one preprint paper has multiple versions, please use **the earliest submitted year**.

3. List the papers in **descending order by year** (latest first).
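
The three rules above can be sketched as a small formatter. This is only an illustration: the `Paper` type and the `format_entry` and `render_list` helpers are hypothetical names introduced here, not tooling that ships with the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """One reading-list entry (hypothetical helper type, not part of the repo)."""
    title: str
    authors: list[str]
    venue: str
    year: int  # rule 2: for multi-version preprints, the earliest submitted year
    links: list[tuple[str, str]] = field(default_factory=list)  # (label, url) pairs


def join_authors(authors: list[str]) -> str:
    """Join names as 'A', 'A and B', or 'A, B, and C'."""
    if len(authors) <= 2:
        return " and ".join(authors)
    return ", ".join(authors[:-1]) + ", and " + authors[-1]


def format_entry(paper: Paper) -> str:
    """Render one entry in the markdown format from rule 1."""
    # Titles that already end in '?' or '!' do not get an extra period.
    title = paper.title if paper.title.endswith(("?", "!")) else paper.title + "."
    entry = f"**{title}** *{join_authors(paper.authors)}.* {paper.venue} {paper.year}."
    links = "; ".join(f"[[{label}]({url})]" for label, url in paper.links)
    return f"{entry} {links}." if links else entry


def render_list(papers: list[Paper]) -> str:
    """Number the entries with the latest year first (rule 3)."""
    ordered = sorted(papers, key=lambda p: p.year, reverse=True)
    return "\n\n".join(f"{i}. {format_entry(p)}" for i, p in enumerate(ordered, start=1))
```

Python's `sorted` is stable, so entries from the same year keep the order in which they were supplied.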

## 🥳 Citation

Find this repository helpful? 😊😊😊

Please consider citing our paper. 👇👇👇

```
@article{lou2023instruction,
title={A Comprehensive Survey on Instruction Following},
author={Lou, Renze and Zhang, Kai and Yin, Wenpeng},
journal={arXiv preprint arXiv:2303.10475},
year={2023}
}
```

---

## 🔍 Table of Contents

- [1. 💁🏽‍♀️ Introduction](#1-️-introduction)
- [2. 🎓 Surveys and Tutorials](#2--surveys-and-tutorials)
- [3. 📚 Corpora](#3--corpora)
- [4. 🗂️ Taxonomy](#4-️-taxonomy)
  - [4.1 Entailment-oriented Instruction](#41-entailment-oriented-instruction)
  - [4.2 PLM-oriented Instruction](#42-plm-oriented-instruction)
  - [4.3 Human-oriented Instruction](#43-human-oriented-instruction)
- [5. 📊 Analyses](#5--analyses)
  - [5.1 Scale](#51-scale)
  - [5.2 Explanability](#52-explanability)
  - [5.3 Robustness and Safety](#53-robustness-and-safety)
  - [5.4 Evaluation](#54-evaluation)
  - [5.5 Negation](#55-negation)
  - [5.6 Complexity](#56-complexity)
  - [5.7 Other Papers](#57-other-papers)
- [6. 🤖 Applications](#6--applications)
  - [6.1 Human-Computer Interaction](#61-human-computer-interaction)
  - [6.2 Data and Feature Augmentation](#62-data-and-feature-augmentation)
  - [6.3 General-purpose Language Models](#63-general-purpose-language-models)
  - [6.4 Other Papers](#64-other-papers)
- [7. 📖 Extended Reading](#7--extended-reading)
  - [7.1 Instruction Induction](#71-instruction-induction)
  - [7.2 ChatGPT-related Papers](#72-chatgpt-related-papers)
  - [7.3 Human Feedback vs. Model Feedback](#73-human-feedback-vs-model-feedback)
  - [7.4 Scalable Oversight and Alignment](#74-scalable-oversight-and-alignment)
  - [7.5 Other Papers](#75-other-papers)

---

## 1. 💁🏽‍♀️ Introduction



Why *instruction-driven* learning instead of *example-driven* learning?

- 👉 **Affordable.** In conventional example-driven supervised learning, each *downstream* task usually requires extensive labeled examples 💰. In instruction learning, by contrast, each *downstream* task may require only one instruction and a few examples 🤩.
- 👉 **One model, all tasks.** An ideal AI system should be able to quickly understand and handle various new tasks 💫.
- 👉 **A promising research direction.** Traditional example-driven supervised learning uses labeled instances to represent the task semantics, i.e., models are trained on numerous examples to recover the original task meaning. Therefore, **why not directly use the task instruction**, **which already carries the essential task semantics**?

## 2. 🎓 Surveys and Tutorials

We use the label ![comprehensive](https://img.shields.io/badge/comprehensive-FFA07A) to denote papers with a more comprehensive perspective. Other papers focus on a specific kind of in-context instruction, including ![prompt](https://img.shields.io/badge/prompt-90EE90), few-shot ![in-context demonstrations](https://img.shields.io/badge/demonstrations-FFB6C1), and CoT ![reasoning](https://img.shields.io/badge/reasoning-9cf).

1. **A Comprehensive Survey on Instruction Following.** *Renze Lou, Kai Zhang, and Wenpeng Yin.* Preprint 2023. [[pdf](https://arxiv.org/abs/2303.10475)]; [[paper list](https://github.com/RenzeLou/awesome-instruction-learning)]. ![comprehensive](https://img.shields.io/badge/comprehensive-FFA07A)

2. **Learning from Task Instructions.** *Wenpeng Yin, Qinyuan Ye, Pengfei Liu, Xiang Ren, and Hinrich Schütze.* EMNLP Tutorial 2023. [[pdf](https://aclanthology.org/2023.emnlp-tutorial.4.pdf)]. ![comprehensive](https://img.shields.io/badge/comprehensive-FFA07A)

3. **Natural Language Reasoning, A Survey.** *Fei Yu, Hongbo Zhang, and Benyou Wang.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2303.14725.pdf)]; [[paper list](https://github.com/FreedomIntelligence/ReasoningNLP)]. ![reasoning](https://img.shields.io/badge/reasoning-9cf)

4. **Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing.** *Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig.* ACM Computing Surveys 2023. [[pdf](https://dl.acm.org/doi/pdf/10.1145/3560815)]; [[website](http://pretrain.nlpedia.ai/)]. ![prompt](https://img.shields.io/badge/prompt-90EE90)

5. **A Survey on In-context Learning**. *Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui*. Preprint 2022. [[pdf](https://arxiv.org/pdf/2301.00234.pdf)]. ![in-context demonstrations](https://img.shields.io/badge/demonstrations-FFB6C1)

6. **Towards Reasoning in Large Language Models: A Survey.** *Jie Huang and Kevin Chen-Chuan Chang.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.10403.pdf)]; [[paper list](https://github.com/jeffhj/LM-reasoning)]. ![reasoning](https://img.shields.io/badge/reasoning-9cf)

7. **Reasoning with Language Model Prompting: A Survey.** *Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, and Huajun Chen.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.09597.pdf)]; [[paper list](https://github.com/zjunlp/Prompt4ReasoningPapers)]. ![reasoning](https://img.shields.io/badge/reasoning-9cf)

## 3. 📚 Corpora

**A high-quality dataset is the key factor for successful instruction tuning**. Therefore, we place the "Corpora" section here to emphasize its importance.

We carefully designed the following table to be easy to reference, and we keep it up to date. We hope it contributes to future research on instruction tuning. 🤗

*(Some rows come from [Longpre et al.](https://arxiv.org/pdf/2301.13688.pdf), thanks for their great work ❤️.)*

| Name | Release | Data/Code | Scale: #Tasks | Scale: #Ins. (K) | Language | Annotator |
| --- | --- | --- | --- | --- | --- | --- |
| UnifiedQA | 05/2020 | Link | 46 | 750 | | ✍ Human |
| CrossFit | 04/2021 | Link | 159 | 71,000 | | ✍ Human |
| Natural Inst. v1 | 04/2021 | Link | 61 | 620 | | ✍ Human |
| Flan 2021 | 09/2021 | Link | 62 | 4,400 | | ✍ Human |
| P3 | 10/2021 | Link | 62 | 12,000 | | ✍ Human |
| MetaICL | 10/2021 | Link | 142 | 3,500 | | ✍ Human |
| ExMix | 11/2021 | Link | 107 | 500 | | ✍ Human |
| SuperNI (Natural Inst. v2) | 04/2022 | Link | 1,613 | 5,000 | | ✍ Human |
| GLM | 10/2022 | Link | 77 | 12,000 | | ✍ Human |
| Flan 2022 | 10/2022 | Link | 1,836 | 15,000 | | ✍ Human |
| xP3 | 11/2022 | Link | 71 | 81,000 | | ✍ Human |
| Unnatural Inst. | 12/2022 | Link | 117 | 64 | | 🤖 InstructGPT002 (`text-davinci-002`) |
| Self-Instruct | 12/2022 | Link | / | 82 | | 🤖 GPT-3 (`davinci`) |
| OPT-IML | 12/2022 | / | 2,207 | 18,000 | | ✍ Human |
| Alpaca | 03/2023 | Link | / | 52 | | 🤖 InstructGPT003 (`text-davinci-003`) |
| Baize | 04/2023 | Link | / | 100 | | 🤖 ChatGPT |
| Koala | 04/2023 | / | / | / | | ✍ Human, 🤖 ChatGPT |
| GPT4All | 04/2023 | Link | / | 808 | | ✍ Human, 🤖 ChatGPT |
| Alpaca-gpt4 | 04/2023 | Link | / | 113 | | 🤖 GPT-4 (`gpt-4`) |
| Vicuna | 04/2023 | / | / | 76 | | ✍ Human, 🤖 ChatGPT |
| Dolly | 04/2023 | Link | / | 15 | | ✍ Human |
| Oasst | 04/2023 | Link | / | 84 | | ✍ Human |
| LongForm | 04/2023 | Link | / | 27 | | ✍ Human, 🤖 InstructGPT003 (`text-davinci-003`) |
| Symbolic-Instruct | 04/2023 | Link | / | 796 | | ✍ Human, Synthetic Examples |
| LaMini | 04/2023 | Link | / | 2,580 | | 🤖 ChatGPT |
| WizardLM | 04/2023 | Link | / | 196 | | 🤖 ChatGPT |
| COEDIT | 05/2023 | Link | / | 82 | | ✍ Human |
| UltraChat | 05/2023 | Link | / | 1,500 | | 🤖 ChatGPT |
| CoT Collection | 05/2023 | Link | 1,060 | 1,880 | | 🤖 Codex |
| Dynosaur | 05/2023 | Link | 5,740 | 801 | | 🤖 ChatGPT |
| MUFFIN | 10/2023 | Link | / | 68 | | 🤖 ChatGPT, 🤖 GPT-4, ✍ Human |
| Dynamics-of-Instruction | 10/2023 | Link | / | 40 | | ✍ Human |
| CoachLM | 11/2023 | Link | / | 2 | | ✍ Human |
| DEITA | 12/2023 | Link | / | 10 | | 🤖 ChatGPT |
| WaveCoder | 12/2023 | Link | 4 code-related tasks | 20 | | 🤖 ChatGPT, 🤖 GPT-4 |
| Conifer | 04/2024 | Link | / | 13 | | 🤖 GPT-4 |

## 4. 🗂️ Taxonomy

In our paper, we divide the textual instructions into three categories.

### 4.1 Entailment-oriented Instruction

![entailment_oriented](./resources/entailment_oriented.png)

Entailment-oriented instruction regards the task **input** as the **premise**, and constructs the task **output** into the **hypothesis**. It unifies the conventional classification problems into a textual entailment paradigm.
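
A minimal sketch of this reformulation is shown below. The hypothesis template, the helper names, and the word-overlap scorer are all assumptions made for illustration; a real system would score each (premise, hypothesis) pair with a trained entailment model rather than `toy_entail_score`.

```python
from typing import Callable


def to_entailment_pairs(
    task_input: str,
    labels: list[str],
    template: str = "This text is about {label}.",  # hypothetical template
) -> list[tuple[str, str]]:
    """Use the task input as the premise and build one hypothesis per candidate label."""
    return [(task_input, template.format(label=label)) for label in labels]


def classify(
    task_input: str,
    labels: list[str],
    entail_score: Callable[[str, str], float],
) -> str:
    """Return the label whose hypothesis is scored as most entailed by the premise."""
    pairs = to_entailment_pairs(task_input, labels)
    scores = [entail_score(premise, hypothesis) for premise, hypothesis in pairs]
    best_index = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best_index]


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in scorer: fraction of hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().rstrip(".").split()
    return sum(word in premise_words for word in hypothesis_words) / len(hypothesis_words)
```

Because the labels only appear inside the hypothesis text, the same scorer can be reused for a new label set without retraining a task-specific classification head.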

1. **A Universal Discriminator for Zero-Shot Generalization.** *Haike Xu, Zongyu Lin, Jing Zhou, Yanan Zheng, and Zhilin Yang.* ACL 2023. [[pdf](https://arxiv.org/pdf/2211.08099.pdf)]; [[code](https://github.com/Rafa-zy/UD)].

2. **ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining.** *Ranran Haoran Zhang, Aysa Xuemo Fan, and Rui Zhang.* EACL 2023. [[pdf](https://arxiv.org/pdf/2210.07587.pdf)]; [[code](https://github.com/psunlpgroup/ConEntail)].

3. **OpenStance: Real-world Zero-shot Stance Detection.** *Hanzi Xu, Slobodan Vucetic, and Wenpeng Yin.* CoNLL 2022. [[pdf](https://arxiv.org/pdf/2210.14299.pdf)]; [[code](https://github.com/xhz0809/OpenStance)].

4. **Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference.** *Bangzheng Li, Wenpeng Yin, and Muhao Chen.* TACL 2022. [[pdf](https://aclanthology.org/2022.tacl-1.35.pdf)]; [[code](https://github.com/luka-group/lite)].

5. **Textual Entailment for Event Argument Extraction: Zero- and Few-Shot with Multi-Source Learning.** *Oscar Sainz, Itziar Gonzalez-Dios, Oier Lopez de Lacalle, Bonan Min, and Eneko Agirre.* Findings of NAACL 2022. [[pdf](https://aclanthology.org/2022.findings-naacl.187.pdf)]; [[code](https://github.com/luka-group/lite)].

6. **Label Verbalization and Entailment for Effective Zero and Few-Shot Relation Extraction.** *Oscar Sainz, Oier Lopez de Lacalle, Gorka Labaka, Ander Barrena, and Eneko Agirre.* EMNLP 2021. [[pdf](https://aclanthology.org/2021.emnlp-main.92.pdf)]; [[code](https://github.com/osainz59/Ask2Transformers)].

7. **Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections.** *Ruiqi Zhong, Kristy Lee, Zheng Zhang, and Dan Klein.* Findings of EMNLP 2021. [[pdf](https://aclanthology.org/2021.findings-emnlp.244.pdf)]; [[code](https://github.com/ruiqi-zhong/Meta-tuning)].

8. **Incremental Few-shot Text Classification with Multi-round New Classes: Formulation, Dataset and System.** *Congying Xia, Wenpeng Yin, Yihao Feng, and Philip Yu.* NAACL 2021. [[pdf](https://aclanthology.org/2021.naacl-main.106.pdf)]; [[code](https://github.com/congyingxia/IncrementalFSTC)].

9. **ExpBERT: Representation Engineering with Natural Language Explanations.** *Shikhar Murty, Pang Wei Koh, and Percy Liang.* ACL 2020. [[pdf](https://aclanthology.org/2020.acl-main.190.pdf)]; [[code](https://github.com/MurtyShikhar/ExpBERT)].

10. **Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach.** *Wenpeng Yin, Jamaal Hay, and Dan Roth.* EMNLP 2019. [[pdf](https://arxiv.org/pdf/1909.00161.pdf)]; [[website](https://cogcomp.seas.upenn.edu/page/publication_view/883)].

### 4.2 PLM-oriented Instruction

![plm_oriented](./resources/PLM_oriented.png)

PLM-oriented instruction (i.e., prompt) aims to construct a cloze-style input to steer pre-trained language models (PLMs) for responses. Here, we display several representative works of PLM-oriented instruction learning. For more works, please refer to [this repository](https://github.com/thunlp/PromptPapers) and [this survey](https://dl.acm.org/doi/pdf/10.1145/3560815).
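
As a rough sketch of a cloze-style input, the snippet below wraps a text in a template containing a mask slot and maps the filled-in word back to a task label. The template, the `[MASK]` token, the verbalizer words, and `toy_fill_mask_scores` are illustrative assumptions; in practice the scores would come from a masked language model.

```python
from typing import Callable

# Hypothetical verbalizer: words the model may fill in, mapped to task labels.
VERBALIZER = {"great": "positive", "terrible": "negative"}


def build_cloze(text: str, template: str = "{text} It was {mask}.", mask_token: str = "[MASK]") -> str:
    """Wrap the input in a cloze template with a single mask slot."""
    return template.format(text=text, mask=mask_token)


def predict_label(
    text: str,
    fill_mask_scores: Callable[[str, list[str]], dict[str, float]],
    verbalizer: dict[str, str] = VERBALIZER,
) -> str:
    """Score each verbalizer word for the mask slot and return the mapped label."""
    prompt = build_cloze(text)
    scores = fill_mask_scores(prompt, list(verbalizer))
    best_word = max(scores, key=lambda word: scores[word])
    return verbalizer[best_word]


def toy_fill_mask_scores(prompt: str, candidates: list[str]) -> dict[str, float]:
    """Stand-in for a masked LM: prefers 'great' only when the prompt contains 'loved'."""
    positive_cue = "loved" in prompt.lower()
    return {word: 1.0 if (word == "great") == positive_cue else 0.0 for word in candidates}
```

The task is thereby phrased as the word-prediction problem the PLM was pre-trained on, which is the motivation behind the cloze format.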

1. **How Does In-Context Learning Help Prompt Tuning?** *Simeng Sun, Yang Liu, Dan Iter, Chenguang Zhu, and Mohit Iyyer.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.11521.pdf)].

2. **Demystifying Prompts in Language Models via Perplexity Estimation.** *Hila Gonen, Srini Iyer, Terra Blevins, Noah A. Smith, and Luke Zettlemoyer.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.04037.pdf)].

3. **RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning.** *Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, and et al.* EMNLP 2022. [[pdf](https://arxiv.org/pdf/2205.12548.pdf)]; [[code](https://github.com/mingkaid/rl-prompt)].

4. **PPT: Pre-trained Prompt Tuning for Few-shot Learning.** *Yuxian Gu, Xu Han, Zhiyuan Liu, and Minlie Huang.* ACL 2022. [[pdf](https://arxiv.org/pdf/2109.04332.pdf)]; [[code](https://github.com/thu-coai/PPT)].

5. **P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks.** *Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang.* ACL 2022. [[pdf](https://arxiv.org/pdf/2110.07602.pdf)]; [[code](https://github.com/THUDM/P-tuning-v2)].

6. **KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction.** *Xiang Chen, Ningyu Zhang, Xin Xie, and et al.* WWW 2022. [[pdf](http://128.84.21.203/pdf/2104.07650)]; [[code](https://github.com/zjunlp/KnowPrompt)].

7. **GPT Understands, Too.** *Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang.* Preprint 2021. [[pdf](https://arxiv.org/pdf/2103.10385.pdf)]; [[code](https://github.com/THUDM/P-tuning)].

8. **Few-Shot Text Generation with Natural Language Instructions.** *Timo Schick and Hinrich Schütze.* EMNLP 2021. [[pdf](https://aclanthology.org/2021.emnlp-main.32.pdf)]; [[code](https://github.com/timoschick/pet)].

9. **It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners.** *Timo Schick and Hinrich Schütze.* NAACL 2021. [[pdf](https://aclanthology.org/2021.naacl-main.185.pdf)]; [[code](https://github.com/timoschick/pet)].

10. **Learning How to Ask: Querying LMs with Mixtures of Soft Prompts.** *Guanghui Qin and Jason Eisner.* NAACL 2021. [[pdf](https://aclanthology.org/2021.naacl-main.410.pdf)]; [[code](https://github.com/hiaoxui/soft-prompts)].

11. **Prefix-Tuning: Optimizing Continuous Prompts for Generation.** *Xiang Lisa Li and Percy Liang.* ACL 2021. [[pdf](https://aclanthology.org/2021.acl-long.353.pdf)]; [[code](https://github.com/XiangLi1999/PrefixTuning)].

12. **Making Pre-trained Language Models Better Few-shot Learners.** *Tianyu Gao, Adam Fisch, and Danqi Chen.* ACL 2021. [[pdf](https://aclanthology.org/2021.acl-long.295.pdf)]; [[code](https://github.com/princeton-nlp/LM-BFF)].

13. **Template-Based Named Entity Recognition Using BART.** *Leyang Cui, Yu Wu, Jian Liu, Sen Yang, and Yue Zhang.* Findings of ACL 2021. [[pdf](https://aclanthology.org/2021.findings-acl.161.pdf)]; [[code](https://github.com/Nealcly/templateNER)].

14. **Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference.** *Timo Schick and Hinrich Schütze.* EACL 2021. [[pdf](https://aclanthology.org/2021.eacl-main.20.pdf)]; [[code](https://github.com/timoschick/pet)].

15. **Language Models are Unsupervised Multitask Learners.** *Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.* Preprint 2019. [[pdf](https://life-extension.github.io/2020/05/27/GPT%E6%8A%80%E6%9C%AF%E5%88%9D%E6%8E%A2/language-models.pdf)].

### 4.3 Human-oriented Instruction

![Human-oriented Instruction](./resources/human_oriented.png)

Human-oriented instructions were originally designed for humans to understand the task and annotate the data, such as the [Amazon MTurk](https://www.mturk.com/) instructions, which provide sufficient information about the task (e.g., a detailed definition).
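
One way to picture such an instruction as model input is sketched below: a task definition, optional demonstrations, and the instance to solve are concatenated into a single prompt. The field labels (`Definition`, `Example`, `Input`, `Output`) and the function name are assumptions for illustration, not a format prescribed by any of the papers listed here.

```python
from typing import Sequence


def build_instruction_prompt(
    definition: str,
    instance: str,
    examples: Sequence[tuple[str, str]] = (),
) -> str:
    """Concatenate a task definition, optional (input, output) examples, and the new instance."""
    parts = [f"Definition: {definition}"]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts.append(f"Example {i}\nInput: {example_input}\nOutput: {example_output}")
    # The prompt ends with an open 'Output:' slot for the model to complete.
    parts.append(f"Input: {instance}\nOutput:")
    return "\n\n".join(parts)
```

For example, `build_instruction_prompt("Classify the sentiment.", "I loved it.", [("Bad film.", "negative")])` yields a prompt with one demonstration followed by the unanswered instance.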

1. **Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors.** *Kai Zhang, Bernal Jiménez Gutiérrez, and Yu Su.* Findings of ACL 2023. [[pdf](https://arxiv.org/pdf/2305.11159.pdf)]; [[code](https://github.com/OSU-NLP-Group/QA4RE)].

2. **Symbol tuning improves in-context learning in language models.** *Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2305.08298.pdf)].

3. **Small Models are Valuable Plug-ins for Large Language Models.** *Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, and Julian McAuley.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2305.08848.pdf)]; [[code](https://github.com/JetRunner/SuperICL)].

4. **How Many Data Samples is an Additional Instruction Worth?** *Ravsehaj Singh Puri, Swaroop Mishra, Mihir Parmar, and Chitta Baral.* Findings of EACL 2023. [[pdf](https://arxiv.org/pdf/2203.09161.pdf)]; [[code](https://github.com/Ravsehajsinghpuri/Multi-Variant-Instructions)].

5. **In-Context Instruction Learning.** *Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, and Minjoon Seo.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.14691.pdf)]; [[code](https://github.com/seonghyeonye/ICIL)].

6. **InstructABSA: Instruction Learning for Aspect Based Sentiment Analysis.** *Kevin Scaria, Himanshu Gupta, Saurabh Arjun Sawant, Swaroop Mishra, and Chitta Baral.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.08624.pdf)]; [[code](https://github.com/kevinscaria/InstructABSA)].

7. **HINT: Hypernetwork Instruction Tuning for Efficient Zero-Shot Generalisation.** *Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, and Matthew Peters.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.10315.pdf)].

8. **Boosting Natural Language Generation from Instructions with Meta-Learning.** *Budhaditya Deb, Guoqing Zheng, and Ahmed Hassan Awadallah.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2210.11617.pdf)].

9. **GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models.** *Archiki Prasad, Peter Hase, Xiang Zhou, and Mohit Bansal.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2203.07281.pdf)]; [[code](https://github.com/archiki/GrIPS)].

10. **ConTinTin: Continual Learning from Task Instructions.** *Wenpeng Yin, Jia Li, and Caiming Xiong.* ACL 2022. [[pdf](https://aclanthology.org/2022.acl-long.218.pdf)].

11. **InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning.** *Prakhar Gupta, Cathy Jiao, Yi-Ting Yeh, Shikib Mehri, Maxine Eskenazi, and Jeffrey P. Bigham.* EMNLP 2022. [[pdf](http://128.84.21.203/pdf/2205.12673)]; [[code](https://github.com/prakharguptaz/Instructdial)].

12. **Learning to Generate Task-Specific Adapters from Task Description.** *Qinyuan Ye and Xiang Ren.* ACL 2021. [[pdf](https://aclanthology.org/2021.acl-short.82.pdf)]; [[code](https://github.com/INK-USC/hypter)].

13. **The Turking Test: Can Language Models Understand Instructions?** *Avia Efrat and Omer Levy.* Preprint 2020. [[pdf](https://arxiv.org/pdf/2010.11982.pdf)].

## 5. 📊 Analyses

### 5.1 Scale

Both model scale and task scale have been found to matter for instruction-based fine-tuning. In general, a larger model generalizes better, and so does training on a larger number of tasks. However, some works raise objections (e.g., [Jang et al.](https://arxiv.org/pdf/2302.03202.pdf) and [Wang et al.](https://arxiv.org/pdf/2210.00185.pdf)).


1. **Exploring the Benefits of Training Expert Language Models over Instruction Tuning.** *Joel Jang, Seungone Kim, Seonghyeon Ye, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.03202.pdf)]; [[code](https://github.com/joeljang/ELM)].

2. **The Flan Collection: Designing Data and Methods for Effective Instruction Tuning.** *Shayne Longpre, Le Hou, Tu Vu, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2301.13688.pdf)]; [[code](https://github.com/google-research/FLAN/tree/main/flan/v2)]; [[corpus](https://huggingface.co/datasets/SirNeural/flan_v2)].

3. **UL2: Unifying Language Learning Paradigms.** *Yi Tay, Mostafa Dehghani, Vinh Q. Tran, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2205.05131.pdf)]; [[checkpoint](https://huggingface.co/google/flan-ul2)].

4. **OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization.** *Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.12017.pdf)].

5. **Scaling Instruction-Finetuned Language Models.** *Hyung Won Chung, Le Hou, Shayne Longpre, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2210.11416.pdf)]; [[checkpoint](https://huggingface.co/docs/transformers/model_doc/flan-t5)].

6. **Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization.** *Yuxian Gu, Pei Ke, Xiaoyan Zhu, and Minlie Huang.* EMNLP 2022. [[pdf](https://arxiv.org/pdf/2210.09175.pdf)]; [[code](https://github.com/thu-coai/UDIT)].

7. **Emergent Abilities of Large Language Models.** *Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, and et al.* TMLR 2022. [[pdf](https://openreview.net/pdf?id=yzkSU5zdwD)].

8. **Multitask Prompted Training Enables Zero-Shot Task Generalization.** *Victor Sanh, Albert Webson, Colin Raffel, and et al.* ICLR 2022. [[pdf](https://openreview.net/pdf?id=9Vrb9D0WI4)]; [[checkpoint](https://github.com/bigscience-workshop/t-zero)]; [[corpus](https://github.com/bigscience-workshop/promptsource)].

9. **Finetuned Language Models are Zero-Shot Learners.** *Jason Wei, Maarten Bosma, Vincent Zhao, and et al.* ICLR 2022. [[pdf](https://openreview.net/pdf?id=gEZrGCozdqR)]; [[code](https://github.com/google-research/flan)].

10. **Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks.** *Zhenhailong Wang, Xiaoman Pan, Dian Yu, Dong Yu, Jianshu Chen, and Heng Ji.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2210.00185.pdf)]; [[code](https://github.com/MikeWangWZHL/Zemi)].

11. **ZeroPrompt: Scaling Prompt-Based Pretraining to 1,000 Tasks Improves Zero-Shot Generalization.** *Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, and Zhilin Yang.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2201.06910.pdf)].

12. **The Power of Scale for Parameter-Efficient Prompt Tuning.** *Brian Lester, Rami Al-Rfou, and Noah Constant.* EMNLP 2021. [[pdf](https://aclanthology.org/2021.emnlp-main.243.pdf)]; [[code](https://github.com/google-research/prompt-tuning)].

### 5.2 Explanability

We list works that focus on the interpretability and reliability of instruction learning, i.e., explaining *when* and *why* instructions take effect.


1. **What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning.** *Jane Pan, Tianyu Gao, Howard Chen, and Danqi Chen.* Findings of ACL 2023. [[pdf](https://arxiv.org/pdf/2305.09731.pdf)]; [[code](https://github.com/princeton-nlp/WhatICLLearns)].

2. **REV: Information-Theoretic Evaluation of Free-Text Rationales.** *Hanjie Chen, Faeze Brahman, Xiang Ren, and et al.* ACL 2023. [[pdf](https://arxiv.org/pdf/2210.04982.pdf)]; [[code](https://github.com/HanjieChen/REV)].

3. **Interpretability at Scale: Identifying Causal Mechanisms in Alpaca.** *Zhengxuan Wu, Atticus Geiger, Christopher Potts, and Noah D. Goodman.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2305.08809.pdf)]; [[code](https://github.com/frankaging/align-transformers)].

4. **Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning.** *Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, and William Yang Wang.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2301.11916.pdf)]; [[code](https://github.com/WANGXinyiLinda/concept-based-demonstration-selection)].

5. **The Learnability of In-Context Learning.** *Noam Wies, Yoav Levine, and Amnon Shashua.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2303.07895.pdf)].

6. **Why think step-by-step? Reasoning emerges from the locality of experience.** *Ben Prystawski and Noah D. Goodman.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2304.03843.pdf)].

7. **Larger language models do in-context learning differently.** *Jerry Wei, Jason Wei, Yi Tay, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2303.03846.pdf)].

8. **What learning algorithm is in-context learning? Investigations with linear models.** *Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou.* ICLR 2023. [[pdf](https://openreview.net/pdf?id=0g0X4H8yN4I)]; [[code](https://github.com/ekinakyurek/google-research/tree/master/incontext)].

9. **Can language models learn from explanations in context?** *Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, and et al.* Findings of EMNLP 2022. [[pdf](https://arxiv.org/pdf/2204.02329.pdf)].

10. **Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?** *Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer.* EMNLP 2022. [[pdf](https://arxiv.org/pdf/2202.12837.pdf)]; [[code](https://github.com/Alrope123/rethinking-demonstrations)].

11. **Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts.** *Daniel Khashabi, Xinxi Lyu, Sewon Min, and et al.* NAACL 2022. [[pdf](https://aclanthology.org/2022.naacl-main.266.pdf)]; [[code](https://github.com/Alrope123/prompt-waywardness)].

12. **Do Prompt-Based Models Really Understand the Meaning of Their Prompts?** *Albert Webson and Ellie Pavlick.* NAACL 2022. [[pdf](https://aclanthology.org/2022.naacl-main.167.pdf)]; [[code](https://github.com/awebson/prompt_semantics)].

13. **Reframing Instructional Prompts to GPTk’s Language.** *Swaroop Mishra, Daniel Khashabi, Chitta Baral, Yejin Choi, and Hannaneh Hajishirzi.* Findings of ACL 2022. [[pdf](https://aclanthology.org/2022.findings-acl.50.pdf)]; [[code](https://github.com/allenai/reframing/)].

14. **What Makes Good In-Context Examples for GPT-3?** *Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen.* ACL Workshop 2022. [[pdf](https://aclanthology.org/2022.deelio-1.10.pdf)]; [[code](https://github.com/jiachangliu/KATEGPT3)].

15. **Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity.** *Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp.* ACL 2022. [[pdf](https://aclanthology.org/2022.acl-long.556.pdf)].

16. **Calibrate Before Use: Improving Few-shot Performance of Language Models.** *Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh.* ICML 2021. [[pdf](https://arxiv.org/pdf/2102.09690.pdf)]; [[code](https://github.com/tonyzhaozh/few-shot-learning)].

### 5.3 Robustness and Safety


1. **Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection.** *Jun Yan, Vikas Yadav, Shiyang Li, and et al.* Workshop @ NeurIPS 2023. [[pdf](https://arxiv.org/abs/2307.16888)].

2. **Evaluating the Zero-shot Robustness of Instruction-tuned Language Models.** *Jiuding Sun, Chantal Shaib, and Byron C. Wallace.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2306.11270.pdf)].

3. **Poisoning Language Models During Instruction Tuning.** *Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein.* ICML 2023. [[pdf](https://arxiv.org/pdf/2305.00944.pdf)]; [[code](https://github.com/AlexWan0/Poisoning-Instruction-Tuned-Models)].

4. **Multi-step Jailbreaking Privacy Attacks on ChatGPT.** *Haoran Li, Dadi Guo, Wei Fan, Mingshi Xu, Jie Huang, Fanpu Meng, and Yangqiu Song.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2304.05197.pdf)].

5. **More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models.** *Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.12173.pdf)]; [[code](https://github.com/greshake/llm-security)].

6. **Robustness of Learning from Task Instructions.** *Jiasheng Gu, Hanzi Xu, Liangyu Nie, and Wenpeng Yin.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.03813.pdf)].

7. **Learning from Task Descriptions.** *Orion Weller, Nicholas Lourie, Matt Gardner, and Matthew E. Peters.* EMNLP 2020. [[pdf](https://aclanthology.org/2020.emnlp-main.105.pdf)]; [[code](https://github.com/allenai/zest)]; [[corpus](https://allenai.org/data/zest)].

### 5.4 Evaluation
Stop using old-school automatic metrics to evaluate your instruction-tuned system; try more advanced methods to do it comprehensively!

1. **Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2.** *Hamish Ivison, Yizhong Wang, Valentina Pyatkin, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2311.10702.pdf)]; [[model&data](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101)]

2. **How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources.** *Yizhong Wang, Hamish Ivison, Pradeep Dasigi, and et al.* NeurIPS Datasets and Benchmarks 2023. [[pdf](https://arxiv.org/pdf/2306.04751.pdf)]; [[code](https://github.com/allenai/open-instruct)].

3. **Instruction-following Evaluation through Verbalizer Manipulation.** *Shiyang Li, Jun Yan, Hai Wang, Zheng Tang, Xiang Ren, Vijay Srinivasan, and Hongxia Jin.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2307.10558.pdf)].

4. **INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models.** *Yew Ken Chia, Pengfei Hong, Lidong Bing, and Soujanya Poria.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2306.04757.pdf)]; [[code](https://github.com/declare-lab/instruct-eval)]; [[leaderboard](https://declare-lab.net/instruct-eval/)].

### 5.5 Negation

Negation expressions, such as `do not` and `avoid doing`, are difficult for models to correctly understand and follow.

1. **Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts.** *Joel Jang, Seonghyeon Ye, and Minjoon Seo.* ICML Workshop 2023. [[pdf](https://proceedings.mlr.press/v203/jang23a/jang23a.pdf)].

2. **Understanding by Understanding Not: Modeling Negation in Language Models.** *Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, and et al.* NAACL 2021. [[pdf](https://aclanthology.org/2021.naacl-main.102.pdf)]; [[code](https://github.com/arianhosseini/negation-learning)].

### 5.6 Complexity

These papers focus on increasing the complexity of instructions to improve model competence. The general finding is that the more complex data the instruction mix contains, the more competent the resulting model can be.

1. **WizardLM: Empowering large language models to follow complex instructions.** *Can Xu, Qingfeng Sun, Kai Zheng, Xiubo Geng, Pu Zhao, Jiazhan Feng, Chongyang Tao, and Daxin Jiang.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2304.12244.pdf)]; [[code](https://github.com/nlpxucan/WizardLM)].

2. **Orca: Progressive learning from complex explanation traces of GPT-4.** *Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, and Ahmed Awadallah.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2306.02707.pdf)].

3. **A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment.** *Yingxiu Zhao, Bowen Yu, Binyuan Hui, Haiyang Yu, Fei Huang, Yongbin Li, and Nevin L. Zhang.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2308.05696.pdf)]; [[code](https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/tree-instruct)].

### 5.7 Other Papers


1. **Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions.** *Mihir Parmar, Swaroop Mishra, Mor Geva, and Chitta Baral.* EACL 2023. [[pdf](https://arxiv.org/pdf/2205.00415.pdf)]; [[code](https://github.com/Mihir3009/instruction-bias)].
2. **Instruction Tuned Models are Quick Learners.** *Himanshu Gupta, Saurabh Arjun Sawant, Swaroop Mishra, et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2306.05539.pdf)]; [[code](https://github.com/srsawant34/efficient_instruction_learning)].
3. **Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning.** *Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin Raffel.* NeurIPS 2022. [[pdf](https://openreview.net/pdf?id=rBCvMG-JsPd)]; [[code](https://github.com/r-three/t-few)].
4. **A Survey of NLP-Related Crowdsourcing HITs: what works and what does not.** *Jessica Huynh, Jeffrey Bigham, and Maxine Eskenazi.* Preprint 2021. [[pdf](https://arxiv.org/pdf/2111.05241.pdf)].

## 6. 🤖 Applications

### 6.1 Human-Computer Interaction

Instructions are used in various human-computer interaction (HCI) tasks, such as virtual assistants and chatbots.

1. **Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing.** *Tuhin Chakrabarty, Vishakh Padmakumar, and He He.* EMNLP 2022. [[pdf](https://arxiv.org/pdf/2210.13669.pdf)]; [[code](https://github.com/vishakhpk/creative-instructions)].

2. **HELP ME THINK: A Simple Prompting Strategy for Non-experts to Create Customized Content with Models.** *Swaroop Mishra, and Elnaz Nouri.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2208.08232.pdf)].

3. **EditEval: An Instruction-Based Benchmark for Text Improvements.** *Jane Dwivedi-Yu, Timo Schick, Zhengbao Jiang, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2209.13331.pdf)]; [[code](https://github.com/facebookresearch/EditEval)]; [[website](https://eval.ai/web/challenges/challenge-page/1866/overview)].

4. **Communicating Natural Programs to Humans and Machines.** *Sam Acquaviva, Yewen Pu, Marta Kryven, and et al.* NeurIPS Workshop 2022. [[pdf](https://openreview.net/pdf?id=OxFoLTKDcNm)]; [[code](https://github.com/samacqua/LARC)].

5. **Interactive Task Learning from GUI-Grounded Natural Language Instructions and Demonstrations.** *Toby Jia-Jun Li, Tom Mitchell, and Brad Myers.* ACL Demo 2020. [[pdf](https://aclanthology.org/2020.acl-demos.25.pdf)]; [[code](https://github.com/tobyli/Sugilite_development)]; [[video](https://www.youtube.com/watch?v=tdHEk-GeaqE)].

6. **Multi-Modal Interactive Task Learning from Demonstrations and Natural Language Instructions.** *Toby Jia-Jun Li.* UIST 2020. [[pdf](https://dl.acm.org/doi/pdf/10.1145/3379350.3415803)]; [[code](https://github.com/tobyli/Sugilite_development)].

7. **Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following.** *David Gaddy, and Dan Klein.* ACL 2019. [[pdf](https://aclanthology.org/P19-1188.pdf)].

8. **VirtualHome: Simulating Household Activities via Programs.** *Xavier Puig, Kevin Ra, Marko Boben, and et al.* CVPR 2018. [[pdf](https://openaccess.thecvf.com/content_cvpr_2018/papers/Puig_VirtualHome_Simulating_Household_CVPR_2018_paper.pdf)]; [[website](http://virtual-home.org/)].

9. **Natural Language Communication with Robots.** *Yonatan Bisk, Deniz Yuret, and Daniel Marcu.* NAACL 2016. [[pdf](https://aclanthology.org/N16-1089.pdf)]; [[website](https://groundedlanguage.github.io/)].

10. **Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World.** *Jayant Krishnamurthy, and Thomas Kollar.* TACL 2013. [[pdf](http://rtw.ml.cmu.edu/tacl2013_lsp/tacl2013-krishnamurthy-kollar.pdf)]; [[code](http://rtw.ml.cmu.edu/tacl2013_lsp/)].

11. **Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions.** *Yoav Artzi, and Luke Zettlemoyer.* TACL 2013. [[pdf](https://aclanthology.org/Q13-1005.pdf)].

12. **Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision.** *Joohyun Kim, and Raymond Mooney.* EMNLP 2012. [[pdf](https://aclanthology.org/D12-1040.pdf)].

13. **A joint model of language and perception for grounded attribute learning.** *Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox.* ICML 2012. [[pdf](https://arxiv.org/pdf/1206.6423.pdf)].

14. **Learning to Interpret Natural Language Instructions.** *Monica Babeş-Vroman, James MacGlashan, Ruoyuan Gao, and et al.* ACL Workshop 2012. [[pdf](https://aclanthology.org/W12-2801.pdf)].

15. **Fast Online Lexicon Learning for Grounded Language Acquisition.** *David Chen.* ACL 2012. [[pdf](https://aclanthology.org/P12-1045.pdf)].

16. **Learning to Win by Reading Manuals in a Monte-Carlo Framework.** *S.R.K. Branavan, David Silver, and Regina Barzilay.* ACL 2011. [[pdf](https://aclanthology.org/P11-1028.pdf)]; [[website](http://groups.csail.mit.edu/rbg/code/civ/)].

17. **Learning from natural instructions.** *Dan Goldwasser, and Dan Roth.* IJCAI 2011. [[pdf](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=2aba84801935041774c1e2b749e0331efa322ed8)].

18. **Learning to Interpret Natural Language Navigation Instructions from Observations.** *David L. Chen and Raymond J. Mooney.* AAAI 2011. [[pdf](https://www.cs.utexas.edu/users/ml/papers/chen.aaai11.pdf)].

19. **Approaching the Symbol Grounding Problem with Probabilistic Graphical Models.** *Stefanie Tellex, Thomas Kollar, Steven Dickerson, and et al.* AAAI 2011. [[pdf](https://cs.brown.edu/people/stellex/publications/tellex11a.pdf)].

20. **Driving Semantic Parsing from the World’s Response.** *James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth.* CoNLL 2010. [[pdf](https://aclanthology.org/W10-2903.pdf)].

21. **Learning to Follow Navigational Directions.** *Adam Vogel, and Daniel Jurafsky.* ACL 2010. [[pdf](https://aclanthology.org/P10-1083.pdf)].

22. **Reading between the Lines: Learning to Map High-Level Instructions to Commands.** *S.R.K. Branavan, Luke Zettlemoyer, and Regina Barzilay.* ACL 2010. [[pdf](https://aclanthology.org/P10-1129.pdf)]; [[website](http://groups.csail.mit.edu/rbg/code/rl-hli/)].

23. **Reading to Learn: Constructing Features from Semantic Abstracts.** *Jacob Eisenstein, James Clarke, Dan Goldwasser, and Dan Roth.* EMNLP 2009. [[pdf](https://aclanthology.org/D09-1100.pdf)]; [[website](http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph/)].

24. **Learning Semantic Correspondences with Less Supervision.** *Percy Liang, Michael Jordan, and Dan Klein.* ACL 2009. [[pdf](https://aclanthology.org/P09-1011.pdf)].

25. **Reinforcement Learning for Mapping Instructions to Actions.** *S.R.K. Branavan, Harr Chen, Luke Zettlemoyer, and Regina Barzilay.* ACL 2009. [[pdf](https://aclanthology.org/P09-1010.pdf)]; [[website](http://groups.csail.mit.edu/rbg/code/rl/)].

26. **Learning to sportscast: a test of grounded language acquisition.** *David L. Chen and Raymond J. Mooney.* ICML 2008. [[pdf](https://dl.acm.org/doi/pdf/10.1145/1390156.1390173)].

27. **Guiding a Reinforcement Learner with Natural Language Advice: Initial Results in RoboCup Soccer.** *Gregory Kuhlmann, Peter Stone, Raymond Mooney, and Jude Shavlik.* AAAI Workshop 2004. [[pdf](https://ftp.cs.wisc.edu/machine-learning/shavlik-group/kuhlmann-aaai04.pdf)]; [[website](http://www.cs.utexas.edu/AustinVilla/sim/keepaway/)].

### 6.2 Data and Feature Augmentation

Some instructions (e.g., label explanations) are also used for automatic annotation (i.e., data augmentation) or for enriching features.

1. **One Embedder, Any Task: Instruction-Finetuned Text Embeddings.** *Hongjin Su, Weijia Shi, Jungo Kasai, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.09741.pdf)]; [[website](https://instructor-embedding.github.io/)].

2. **Prompt Consistency for Zero-Shot Task Generalization.** *Chunting Zhou, Junxian He, Xuezhe Ma, Taylor Berg-Kirkpatrick, and Graham Neubig.* Findings of EMNLP 2022. [[pdf](https://arxiv.org/pdf/2205.00049.pdf)]; [[code](https://github.com/violet-zct/swarm-distillation-zero-shot)].

3. **Teaching Machine Comprehension with Compositional Explanations.** *Qinyuan Ye, Xiao Huang, Elizabeth Boschee, and Xiang Ren.* Findings of EMNLP 2020. [[pdf](https://aclanthology.org/2020.findings-emnlp.145.pdf)]; [[code](https://github.com/INK-USC/mrc-explanation)].

4. **Learning from Explanations with Neural Execution Tree.** *Ziqi Wang, Yujia Qin, Wenxuan Zhou, Jun Yan, Qinyuan Ye, Leonardo Neves, Zhiyuan Liu, and Xiang Ren.* ICLR 2020. [[pdf](https://openreview.net/pdf?id=rJlUt0EYwS)]; [[website](http://inklab.usc.edu/project-NExT/)].

5. **Training Classifiers with Natural Language Explanations.** *Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, and Christopher Ré.* ACL 2018. [[pdf](https://aclanthology.org/P18-1175.pdf)]; [[code](https://github.com/HazyResearch/babble)].

6. **Zero-shot Learning of Classifiers from Natural Language Quantification.** *Shashank Srivastava, Igor Labutov, and Tom Mitchell.* ACL 2018. [[pdf](https://aclanthology.org/P18-1029.pdf)].

7. **Joint Concept Learning and Semantic Parsing from Natural Language Explanations.** *Shashank Srivastava, Igor Labutov, and Tom Mitchell.* EMNLP 2017. [[pdf](https://aclanthology.org/D17-1161.pdf)].

### 6.3 General-purpose Language Models

General-purpose language models, e.g., [ChatGPT](https://chat.openai.com/chat), are among the most attractive applications of instruction learning, since instruction tuning helps align them with human values.

1. **Sparks of Artificial General Intelligence: Early experiments with GPT-4.** *Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2303.12712.pdf)].

2. **GPT-4 Technical Report.** *OpenAI.* Preprint 2023. [[pdf](https://cdn.openai.com/papers/gpt-4.pdf)]; [[blog](https://openai.com/research/gpt-4)].

3. **The Wisdom of Hindsight Makes Language Models Better Instruction Followers.** *Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, and Joseph E. Gonzalez.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.05206.pdf)]; [[code](https://github.com/tianjunz/HIR)].

4. **Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models.** *Shrimai Prabhumoye, Mostofa Patwary, Mohammad Shoeybi, and Bryan Catanzaro.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.07388.pdf)].

5. **Training language models to follow instructions with human feedback.** *Long Ouyang, Jeffrey Wu, Xu Jiang, and et al.* NeurIPS 2022. [[pdf](https://openreview.net/pdf?id=TG8KACxEON)].

### 6.4 Other Papers

1. **GPTScore: Evaluate as You Desire.** *Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, and Pengfei Liu.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.04166.pdf)]; [[code](https://github.com/jinlanfu/GPTScore)].

2. **MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning.** *Zhiyang Xu, Ying Shen, and Lifu Huang.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.10773.pdf)].

3. **Task-aware Retrieval with Instructions.** *Akari Asai, Timo Schick, Patrick Lewis, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2211.09260.pdf)]; [[code](https://github.com/facebookresearch/tart)].

4. **UnifiedABSA: A Unified ABSA Framework Based on Multi-task Instruction Tuning.** *Zengzhi Wang, Rui Xia, and Jianfei Yu.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2211.10986.pdf)].

5. **In-Context Learning for Few-Shot Dialogue State Tracking.** *Yushi Hu, Chia-Hsuan Lee, Tianbao Xie, Tao Yu, Noah A. Smith, and Mari Ostendorf.* Findings of EMNLP 2022. [[pdf](https://arxiv.org/pdf/2203.08568.pdf)]; [[code](https://github.com/Yushi-Hu/IC-DST)].

6. **Few-shot Learning with Multilingual Language Models.** *Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, and et al.* EMNLP 2022. [[pdf](https://arxiv.org/pdf/2112.10668.pdf)]; [[code](https://github.com/facebookresearch/fairseq/tree/main/examples/xglm)].

7. **UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models.** *Tianbao Xie, Chen Henry Wu, Peng Shi, and et al.* EMNLP 2022. [[pdf](https://arxiv.org/pdf/2201.05966.pdf)]; [[code](https://github.com/HKUNLP/UnifiedSKG)]; [[website](https://unifiedskg.com/)].

8. **In-BoXBART: Get Instructions into Biomedical Multi-Task Learning.** *Mihir Parmar, Swaroop Mishra, Mirali Purohit, Man Luo, M. Hassan Murad, and Chitta Baral.* Findings of NAACL 2022. [[pdf](https://arxiv.org/pdf/2204.07600.pdf)]; [[code](https://github.com/Mihir3009/In-BoXBART)].

## 7. 📖 Extended Reading

We also share some other awesome papers that might inspire future work.

### 7.1 Instruction Induction


1. **Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners.** *Seonghyeon Ye, Doyoung Kim, Joel Jang, Joongbo Shin, and Minjoon Seo.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2210.02969.pdf)]; [[code](https://github.com/seonghyeonye/Flipped-Learning)].

2. **Instruction Induction: From Few Examples to Natural Language Task Descriptions.** *Or Honovich, Uri Shaham, Samuel R. Bowman, and Omer Levy.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2205.10782.pdf)]; [[code](https://github.com/orhonovich/instruction-induction)].

3. **Learning to Decompose and Organize Complex Tasks.** *Yi Zhang, Sujay Kumar Jauhar, Julia Kiseleva, Ryen White, and Dan Roth.* NAACL 2021. [[pdf](https://aclanthology.org/2021.naacl-main.217.pdf)]; [[corpus](https://github.com/microsoft/MSComplexTasks)].

4. **Analogous Process Structure Induction for Sub-event Sequence Prediction.** *Hongming Zhang, Muhao Chen, Haoyu Wang, Yangqiu Song, and Dan Roth.* EMNLP 2020. [[pdf](https://aclanthology.org/2020.emnlp-main.119.pdf)]; [[code](https://cogcomp.github.io/APSI/)].

### 7.2 ChatGPT-related Papers

Nowadays, ChatGPT is a superstar 🌟 in the NLP community. Since there is no official paper for ChatGPT, we share some frontier works that can provide deep insights into it.

1. **When do you need Chain-of-Thought Prompting for ChatGPT?** *Jiuhai Chen, Lichang Chen, Heng Huang, and Tianyi Zhou.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2304.03262.pdf)].

2. **Toxicity in ChatGPT: Analyzing Persona-assigned Language Models.** *Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2304.05335.pdf)].

3. **Is ChatGPT a General-Purpose Natural Language Processing Task Solver?** *Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, and Diyi Yang.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.06476.pdf)].

4. **How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection.** *Biyang Guo, Xin Zhang, Ziyuan Wang, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2301.07597.pdf)]; [[corpus](https://github.com/Hello-SimpleAI/chatgpt-comparison-detection)].

5. **ChatGPT: Jack of all trades, master of none.** *Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.10724.pdf)].

6. **On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective.** *Jindong Wang, Xixu Hu, Wenxin Hou, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.12095.pdf)]; [[code](https://github.com/microsoft/robustlearn)].

### 7.3 Human Feedback vs. Model Feedback


1. **Aligning Large Language Models through Synthetic Feedback.** *Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, and Minjoon Seo.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2305.13735.pdf)].

2. **LIMA: Less Is More for Alignment.** *Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2305.11206.pdf)].

3. **Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision.** *Zhiqing Sun, Yikang Shen, Qinhong Zhou, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2305.03047.pdf)]; [[code](https://github.com/IBM/Dromedary)].

4. **Chain of Hindsight Aligns Language Models with Feedback.** *Hao Liu, Carmelo Sferrazza, and Pieter Abbeel.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.02676.pdf)]; [[code](https://github.com/lhao499/CoH)].

5. **Pretraining Language Models with Human Preferences.** *Tomasz Korbak, Kejian Shi, Angelica Chen, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.08582.pdf)].

6. **Constitutional AI: Harmlessness from AI Feedback.** *Yuntao Bai, Saurav Kadavath, Sandipan Kundu, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2212.08073.pdf)]; [[corpus](https://github.com/anthropics/ConstitutionalHarmlessnessPaper)].

7. **Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.** *Yuntao Bai, Andy Jones, Kamal Ndousse, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2204.05862.pdf)]; [[corpus](https://github.com/anthropics/hh-rlhf)].

### 7.4 Scalable Oversight and Alignment

1. **Measuring Progress on Scalable Oversight for Large Language Models.** *Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2211.03540.pdf)].

2. **Aligning AI With Shared Human Values.** *Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jerry Li, Dawn Song, and Jacob Steinhardt.* ICLR 2021. [[pdf](https://openreview.net/pdf?id=dNy_RKzJacY)].

### 7.5 Other Papers

1. **Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models.** *Kaitlyn Zhou, Dan Jurafsky, and Tatsunori Hashimoto.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.13439.pdf)].

2. **The Capacity for Moral Self-Correction in Large Language Models.** *Deep Ganguli, Amanda Askell, Nicholas Schiefer, and et al.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.07459.pdf)].

3. **Large Language Models Can Be Easily Distracted by Irrelevant Context.** *Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, and Denny Zhou.* Preprint 2023. [[pdf](https://arxiv.org/pdf/2302.00093.pdf)]; [[corpus](https://github.com/google-research-datasets/GSM-IC)].

4. **Language Models (Mostly) Know What They Know.** *Saurav Kadavath, Tom Conerly, Amanda Askell, and et al.* Preprint 2022. [[pdf](https://arxiv.org/pdf/2207.05221.pdf)].

---

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=RenzeLou/awesome-instruction-learning&type=Date)](https://star-history.com/#RenzeLou/awesome-instruction-learning&Date)