
# Awesome Fine-tuning [![Awesome](](
A curated list of resources on fine-tuning language models, inspired by [awesome-implicit-representations](

## Disclaimer

This list does __not aim to be exhaustive__. Feel free to open a pull request to suggest papers that should be added to the list.

Disclosure: I'm an author of the following papers:

- [On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines](
- [On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers](

## Table of contents

- [Papers](#papers)
  - [Fine-tuning before transformers](#fine-tuning-before-transformers)
  - [Fine-tuning transformers](#fine-tuning-transformers)
    - [Intermediate task fine-tuning](#intermediate-task-fine-tuning)
    - [Parameter-efficient fine-tuning](#parameter-efficient-fine-tuning)
    - [Prompt-based fine-tuning](#prompt-based-fine-tuning)
    - [Evaluating few-shot fine-tuning](#evaluating-few-shot-fine-tuning)
    - [Fine-tuning analysis](#fine-tuning-analysis)
      - [Fine-tuning stability](#fine-tuning-stability)
  - [Theoretical work](#theoretical-work)
  - [Surveys](#surveys)
  - [Misc.](#misc)

# Papers

## Fine-tuning before transformers

- [Semi-supervised Sequence Learning]( Dai & Le (2015) ![](

- [How Transferable are Neural Networks in NLP Applications?]( Mou et al. (2016) ![](

- [Improving Neural Machine Translation Models with Monolingual Data]( Sennrich et al. (2016) ![](

- [Question Answering through Transfer Learning from Large Fine-grained Supervision Data]( Min et al. (2017) ![](

- [Universal Language Model Fine-tuning for Text Classification]( Howard & Ruder (2018) ![](

- [An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models]( Chronopoulou et al. (2019) ![](

- ...

## Fine-tuning transformers

- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]( Devlin et al. (2019) ![](

- [Better Fine-Tuning by Reducing Representational Collapse]( Aghajanyan et al. (2020) ![](

- [FreeLB: Enhanced Adversarial Training for Natural Language Understanding]( Zhu et al. (2020) ![](

- [SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization]( Jiang et al. (2020) ![](

- [Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning]( Gunel et al. (2021) ![](

- ...

### Intermediate task fine-tuning

- [Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks]( Phang et al. (2018) ![](

- [Transfer Fine-Tuning: A BERT Case Study]( Arase & Tsujii (2019) ![](

- [Learning and Evaluating General Linguistic Intelligence]( Yogatama et al. (2019) ![](

- [Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?]( Pruksachatkun et al. (2020) ![](

- [English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too]( Phang et al. (2020) ![](

- [What to Pre-Train on? Efficient Intermediate Task Selection]( Poth et al. (2021) ![](

- [Is Supervised Syntactic Parsing Beneficial for Language Understanding Tasks? An Empirical Investigation]( Glavaš & Vulić (2021) ![](

- [Muppet: Massive Multi-task Representations with Pre-Finetuning]() Aghajanyan et al. (2021) ![](

- ...

#### Intermediate (masked) language modeling

- [Unsupervised Domain Adaptation of Contextualized Embeddings for Sequence Labeling]( Han & Eisenstein (2019) ![](

- [Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks]( Gururangan et al. (2020) ![](

- [Mining Knowledge for Natural Language Inference from Wikipedia Categories]( Chen et al. (2020) ![](

- [Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank]( Chau et al. (2020) ![](

- [Train No Evil: Selective Masking for Task-Guided Pre-Training]( Gu et al. (2020) ![](

- ...

##### Injecting "skills"

- [Injecting Numerical Reasoning Skills into Language Models]( Geva et al. (2020) ![](

- [Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers]( Lauscher et al. (2020) ![](

- [Analyzing Commonsense Emergence in Few-shot Knowledge Models]( Da et al. (2021) ![](

- ...

### Parameter-efficient fine-tuning

- [Parameter-Efficient Transfer Learning for NLP]( Houlsby et al. (2019) ![](

- [BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning]( Stickland & Murray (2019) ![](

- [Simple, Scalable Adaptation for Neural Machine Translation]( Bapna & Firat (2019) ![](

- [Masking as an Efficient Alternative to Finetuning for Pretrained Language Models]( Zhao et al. (2020) ![](

- [Movement Pruning: Adaptive Sparsity by Fine-Tuning]( Sanh et al. (2020) ![](

- [AdapterFusion: Non-Destructive Task Composition for Transfer Learning]( Pfeiffer et al. (2021) ![](

- [MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer]( Pfeiffer et al. (2020) ![](

- [AdapterDrop: On the Efficiency of Adapters in Transformers]( Rücklé et al. (2021) ![](

- [Parameter-Efficient Transfer Learning with Diff Pruning]( Guo et al. (2021) ![](

- [Compacter: Efficient Low-Rank Hypercomplex Adapter Layers]( Mahabadi et al. (2021) ![](

- [LoRA: Low-Rank Adaptation of Large Language Models]( Hu et al. (2021) ![](

- [BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models]( Ben Zaken et al. (2022) ![](

- [Training Neural Networks with Fixed Sparse Masks]( Sung et al. (2021) ![](

- [Towards a Unified View of Parameter-Efficient Transfer Learning]( He et al. (2021) ![](

- [Composable Sparse Fine-Tuning for Cross-Lingual Transfer]( Ansell et al. (2022) ![](

- [Revisiting Parameter-Efficient Tuning: Are We Really There Yet?]( Chen et al. (2022) ![](

- [Prompt-free and Efficient Few-shot Learning with Language Models]( Mahabadi et al. (2022) ![](

- [Adaptable Adapters]( Moosavi et al. (2022) ![](

- [Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning]( Liu et al. (2022) ![](

- ...

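Some continuous prompt-based methods, such as prompt tuning and prefix-tuning, keep the pre-trained model frozen and train only the prompt parameters, so they can also be seen as parameter-efficient fine-tuning methods. For a list of papers, see [below](#continuous-prompts).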
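To make "parameter-efficient" concrete, here is a minimal sketch that counts trainable parameters for a single `d_in × d_out` linear layer under full fine-tuning, a LoRA-style low-rank update, and BitFit-style bias-only tuning, plus the soft-prompt parameters of prompt tuning. The formulas follow the common descriptions of these methods; the 768-dimensional layer, rank 8, and 20 prompt tokens are illustrative assumptions, and real totals depend on which layers each method is applied to.

```python
def full_finetuning_params(d_in: int, d_out: int) -> int:
    """Full fine-tuning updates the whole weight matrix plus the bias."""
    return d_in * d_out + d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA-style update W + B @ A, with B: d_out x rank and A: rank x d_in (W stays frozen)."""
    return rank * (d_in + d_out)


def bitfit_params(d_out: int) -> int:
    """BitFit-style tuning updates only this layer's bias vector."""
    return d_out


def prompt_tuning_params(num_prompt_tokens: int, d_model: int) -> int:
    """Prompt tuning learns one d_model-dimensional vector per soft-prompt token (once per model)."""
    return num_prompt_tokens * d_model


if __name__ == "__main__":
    d = 768  # illustrative hidden size (assumption)
    counts = {
        "full fine-tuning (per layer)": full_finetuning_params(d, d),
        "LoRA, rank 8 (per layer)": lora_params(d, d, rank=8),
        "BitFit (per layer)": bitfit_params(d),
        "prompt tuning, 20 tokens (per model)": prompt_tuning_params(20, d),
    }
    for name, count in counts.items():
        print(f"{name}: {count}")
```

For the 768 × 768 layer assumed here, full fine-tuning trains 590,592 parameters versus 12,288 for a rank-8 low-rank update and 768 for bias-only tuning. The prompt-tuning count applies once per model rather than per layer, so it is not directly comparable to the per-layer numbers.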

### Prompt-based fine-tuning

#### Discrete prompts

- [Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference]( Schick & Schütze (2021a) ![](

- [It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners]( Schick & Schütze (2021b) ![](

- [Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification]( Schick et al. (2020) ![](

- [Few-Shot Text Generation with Natural Language Instructions]( Schick & Schütze (2021c) ![](

- [Making Pre-trained Language Models Better Few-shot Learners]( Gao et al. (2021) ![](

- [AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts]( Shin et al. (2020) ![](

- [How Many Data Points is a Prompt Worth?]( Le Scao & Rush (2021) ![](

- [Improving and Simplifying Pattern Exploiting Training]( Tam et al. (2021) ![](

- [Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections]( Zhong et al. (2021) ![](

- [Calibrate Before Use: Improving Few-Shot Performance of Language Models]( Zhao et al. (2021) ![](

- [PTR: Prompt Tuning with Rules for Text Classification]( Han et al. (2021) ![](

- [Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models]( Logan IV et al. (2021) ![](

- [Knowledgeable Prompt-tuning: Incorporating Knowledge into Prompt Verbalizer for Text Classification]( Hu et al. (2021) ![](

- [Prompt-Learning for Fine-Grained Entity Typing]( Ding et al. (2021) ![](

- [Do Prompt-Based Models Really Understand the Meaning of their Prompts?]( Webson & Pavlick (2022) ![](

- [Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning]( Utama et al. (2021) ![](

- [Prototypical Verbalizer for Prompt-based Few-shot Tuning]( Cui et al. (2022) ![](

- ...

#### Multi-task fine-tuning using discrete prompts

- [Cross-Task Generalization via Natural Language Crowdsourcing Instructions]( Mishra et al. (2021) ![](

- [Discrete and Soft Prompting for Multilingual Models]( Zhao & Schütze (2021) ![](

- [Finetuned Language Models Are Zero-Shot Learners]( Wei et al. (2021) ![](

- [Multitask Prompted Training Enables Zero-Shot Task Generalization]( Sanh et al. (2021) ![](

- [Prompt Consistency for Zero-Shot Task Generalization]( Zhou et al. (2022) ![](

- [Few-shot Adaptation Works with UnpredicTable Data]( Chan et al. (2022) ![](

- [Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks]( Wang et al. (2022) ![](

- ...

#### Continuous prompts

- [Prefix-Tuning: Optimizing Continuous Prompts for Generation]( Li & Liang (2021) ![](

- [WARP: Word-level Adversarial ReProgramming]( Hambardzumyan et al. (2021) ![](

- [Learning How to Ask: Querying LMs with Mixtures of Soft Prompts]( Qin & Eisner (2021) ![](

- [Factual Probing Is [MASK]: Learning vs. Learning to Recall]( Zhong et al. (2021) ![](

- [The Power of Scale for Parameter-Efficient Prompt Tuning]( Lester et al. (2021) ![](

- [Multimodal Few-Shot Learning with Frozen Language Models]( Tsimpoukelli et al. (2021) ![](

- [Noisy Channel Language Model Prompting for Few-Shot Text Classification]( Min et al. (2021) ![](

- [Continuous Entailment Patterns for Lexical Inference in Context]( Schmitt & Schütze (2021) ![](

- [Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners]( Zhang et al. (2022) ![](

- [SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer]( Vu et al. (2022) ![](

- [P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks]( Liu et al. (2022) ![](

- ...
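Many of the continuous-prompt methods above share one mechanism: trainable vectors are prepended to the input of a frozen model. Below is a minimal, framework-free sketch of input-level prompt tuning in the style of Lester et al. (2021); the function names and toy sizes are hypothetical, and prefix-tuning (Li & Liang, 2021), which prepends trainable vectors at every layer, is not shown.

```python
import random


def embed(token_ids, embedding_table):
    """Look up the (frozen) input embeddings for a sequence of token ids."""
    return [list(embedding_table[t]) for t in token_ids]


def prepend_soft_prompt(soft_prompt, token_ids, embedding_table):
    """Prepend trainable soft-prompt vectors to the frozen token embeddings.

    The returned sequence is what the frozen model would receive instead of the
    plain input embeddings; during training only `soft_prompt` would be updated.
    """
    d_model = len(embedding_table[0])
    if any(len(v) != d_model for v in soft_prompt):
        raise ValueError("soft-prompt vectors must match the embedding size")
    return [list(v) for v in soft_prompt] + embed(token_ids, embedding_table)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size, d_model, num_prompt_tokens = 10, 4, 3  # toy sizes (assumption)
    embedding_table = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(vocab_size)]
    soft_prompt = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(num_prompt_tokens)]
    inputs = prepend_soft_prompt(soft_prompt, [1, 5, 7], embedding_table)
    print(len(inputs), len(inputs[0]))  # 6 4
```

The sequence grows by the number of prompt tokens, while the embedding table and the rest of the model stay untouched, which is why such methods are also listed under parameter-efficient fine-tuning.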

### Evaluating few-shot fine-tuning

- [True Few-Shot Learning with Language Models]( Perez et al. (2021) ![](

- [FLEX: Unifying Evaluation for Few-Shot NLP]( Bragg et al. (2021) ![](

- [FewNLU: Benchmarking State-of-the-Art Methods for Few-Shot Natural Language Understanding]( Zheng et al. (2022) ![](

- [True Few-Shot Learning with Prompts—A Real-World Perspective]( Schick & Schütze (2022) ![](

- ...

### Fine-tuning analysis

- [Visualizing and Understanding the Effectiveness of BERT]( Hao et al. (2019) ![](

- [oLMpics-On What Language Model Pre-training Captures]( Talmor et al. (2020) ![](

- [Pretrained Transformers Improve Out-of-Distribution Robustness]( Hendrycks et al. (2020) ![](

- [What Happens To BERT Embeddings During Fine-tuning?]( Merchant et al. (2020) ![](

- [Investigating Learning Dynamics of BERT Fine-Tuning]( Hao et al. (2020) ![](

- [Investigating Transferability in Pretrained Language Models]( Tamkin et al. (2020) ![](

- [Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning]( Aghajanyan et al. (2021) ![](

- [Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers]( Phang et al. (2021) ![](

- [Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution]( Kumar et al. (2022) ![](

- [A Closer Look at How Fine-tuning Changes BERT]() Zhou & Srikumar (2022) ![](

- [When Do You Need Billions of Words of Pretraining Data?]( Zhang et al. (2021) ![](

- [On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation]( He et al. (2021) ![](

- [Pretrained Transformers as Universal Computation Engines]( Lu et al. (2021) ![](

- [Predicting Inductive Biases of Pre-Trained Models]( Lovering et al. (2021) ![](

- ...

#### Fine-tuning stability

- [Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping]( Dodge et al. (2020) ![](

- [Revisiting Few-sample BERT Fine-tuning]() Zhang et al. (2021) ![](

- [On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines]( Mosbach et al. (2021) ![](

- ...
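Stability studies typically report results over many random seeds rather than a single run. Here is a minimal sketch of such a summary. It assumes one final score per seed and, following the convention used in Mosbach et al. (2021) as we understand it, counts a run as failed if it does not beat a majority-class baseline. The function name and the scores in the example are hypothetical placeholders, not results from any paper.

```python
import statistics


def summarize_runs(scores, majority_baseline):
    """Summarize final scores of repeated fine-tuning runs (one score per random seed)."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variability")
    # A run that does no better than always predicting the majority class is treated as failed.
    failed = [s for s in scores if s <= majority_baseline]
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample standard deviation across seeds
        "max": max(scores),
        "num_failed": len(failed),
    }


if __name__ == "__main__":
    # Hypothetical per-seed accuracies and baseline, for illustration only.
    summary = summarize_runs([0.70, 0.72, 0.53, 0.71, 0.69], majority_baseline=0.53)
    print(summary["num_failed"])  # 1
```

Reporting the spread and the number of failed runs alongside the mean makes it harder for a single lucky seed to stand in for a method's typical behavior.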

#### Fine-tuning and probing

- [What Happens To BERT Embeddings During Fine-tuning?]( Merchant et al. (2020) ![](

- [On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers]( Mosbach et al. (2020) ![]( ![](

- [On the Importance of Data Size in Probing Fine-tuned Models]( Mehrafarin et al. (2022) ![](

- ...

#### Fine-tuning and generalization

- [BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance]( McCoy et al. (2020) ![](

- [Generalization in NLI: Ways (Not) To Go Beyond Simple Heuristics]( Bhargava et al. (2021) ![](

- [Linear Connectivity Reveals Generalization Strategies]() Juneja et al. (2022) ![](

- ...

##### Fine-tuning and spurious features

- [An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models]( Tu et al. (2020) ![](

- [Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)]( Warstadt et al. (2020) ![](

- [Predicting Inductive Biases of Pre-Trained Models]( Lovering et al. (2021) ![](

- ...

## Theoretical work

- [A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks]( Saunshi et al. (2021) ![](

- [Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning]( Wei et al. (2021) ![](

- ...

## Surveys

- [Recent Advances in Language Model Fine-tuning]() Ruder (2021) ![](

- [On the Opportunities and Risks of Foundation Models]( *(Adaptation chapter)* Bommasani et al. (2021) ![](

- [Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing]( Liu et al. (2021) ![](

- [Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models]( Ding et al. (2022) ![](

- ...

## Misc.

- [What is being transferred in transfer learning?]( Neyshabur et al. (2020) ![](

- [Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge]( Talmor et al. (2020) ![](

- [Exploring and Predicting Transferability across NLP Tasks]( Vu et al. (2020) ![](

- ...