Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/YerevaNN/warp
Code for ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforming `GPT-3` on SuperGLUE Few-Shot text classification. https://aclanthology.org/2021.acl-long.381/
- Host: GitHub
- URL: https://github.com/YerevaNN/warp
- Owner: YerevaNN
- License: mit
- Created: 2020-12-10T20:47:59.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-10-03T19:31:03.000Z (about 3 years ago)
- Last Synced: 2024-08-04T09:05:07.596Z (5 months ago)
- Topics: adversarial, few-shot-learning, natural-language-processing, pretrained-models
- Language: Python
- Homepage: https://mahnerak.com/WARP
- Size: 85 KB
- Stars: 83
- Watchers: 8
- Forks: 16
- Open Issues: 3
- Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - YerevaNN/warp - Code for the ACL'2021 paper WARP 🌀 Word-level Adversarial ReProgramming. Outperforms GPT-3 on SuperGLUE few-shot text classification. Proposes an alternative approach based on adversarial reprogramming, extending earlier work on automatic prompt generation, while using orders of magnitude fewer trainable parameters. (Text classification)
README
# 🌀 WARP: Word-level Adversarial ReProgramming
This repository contains code for ACL'2021 Paper [WARP: Word-level Adversarial ReProgramming](https://aclanthology.org/2021.acl-long.381/).
WARP adds a few trainable embeddings around the input, which cause the masked language model to predict the sentiment of the sentence in the SST-2 task.

Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximizes parameter sharing trains one or more task-specific layers on top of the language model.
In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task.
Using up to 25K trainable parameters per task, this approach outperforms all existing methods that use up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks after training on just 32 samples.
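The core idea can be illustrated with a short, self-contained sketch (not the repository's actual AllenNLP implementation): a frozen masked language model, a handful of trainable prompt embeddings concatenated in front of the input embeddings, and class scores read off the `[MASK]` position via verbalizer tokens. The model name, prompt length, and verbalizer tokens below are illustrative assumptions.

```python
# Minimal sketch of WARP-style prompt tuning with Hugging Face transformers.
# NOT the repository's AllenNLP implementation; model name, prompt length and
# verbalizer tokens are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer


class WarpSketch(nn.Module):
    def __init__(self, model_name="roberta-large", n_prompts=8,
                 verbalizers=(" great", " terrible")):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.mlm = AutoModelForMaskedLM.from_pretrained(model_name)
        for p in self.mlm.parameters():        # the language model stays frozen
            p.requires_grad = False
        hidden = self.mlm.config.hidden_size
        # The only trainable parameters: a few prompt embeddings.
        self.prompts = nn.Parameter(0.02 * torch.randn(n_prompts, hidden))
        # One vocabulary id per class ("verbalizer" tokens).
        self.class_ids = [self.tokenizer.encode(v, add_special_tokens=False)[0]
                          for v in verbalizers]

    def forward(self, sentences):
        # Append a [MASK] token whose prediction acts as the classifier.
        enc = self.tokenizer(
            [s + " " + self.tokenizer.mask_token for s in sentences],
            return_tensors="pt", padding=True)
        embeds = self.mlm.get_input_embeddings()(enc["input_ids"])
        batch = embeds.size(0)
        prompt = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Concatenate the trainable prompt embeddings in front of the input.
        inputs_embeds = torch.cat([prompt, embeds], dim=1)
        attention = torch.cat(
            [torch.ones(batch, prompt.size(1), dtype=enc["attention_mask"].dtype),
             enc["attention_mask"]], dim=1)
        logits = self.mlm(inputs_embeds=inputs_embeds,
                          attention_mask=attention).logits
        # [MASK] positions shift right by the number of prepended prompt tokens.
        mask_pos = (enc["input_ids"] == self.tokenizer.mask_token_id).nonzero()[:, 1]
        mask_logits = logits[torch.arange(batch), mask_pos + prompt.size(1)]
        # Class scores are the MLM logits of the verbalizer tokens at [MASK].
        return mask_logits[:, self.class_ids]


# Usage: only `model.prompts` receives gradients; everything else is frozen.
# model = WarpSketch()
# scores = model(["a delightful, touching film"])   # shape: (1, 2)
```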
# Few-Shot Results
| Set  | Model              | CB F1 | CB Acc. | RTE Acc. |
|------|--------------------|-------|---------|----------|
| dev  | GPT-3 Small        | 26.1  | 42.9    | 52.3     |
| dev  | GPT-3 Med          | 40.4  | 58.9    | 48.4     |
| dev  | GPT-3              | 57.2  | 82.1    | 72.9     |
| dev  | PET (ALBERT)       | 59.4  | 85.1    | 69.8     |
| dev  | iPET (ALBERT)      | 92.4  | 92.9    | 74.0     |
| dev  | WARP_init (ALBERT) | 84.0  | 87.5    | 71.8     |
| test | GPT-3              | 52.0  | 75.6    | 69.0     |
| test | PET (ALBERT)       | 60.2  | 87.2    | 67.2     |
| test | iPET (ALBERT)      | 79.9  | 88.8    | 70.8     |
| test | WARP_init (ALBERT) | 70.2  | 82.4    | 69.1     |

Results on the SuperGLUE benchmark. The results for the test set are obtained from the SuperGLUE evaluation server. We only show systems trained in a similar few-shot setup using 32 examples.

# Setup
The code requires YerevaNN's internal version of `allennlp`:
```sh
git clone https://github.com/YerevaNN/allennlp
cd allennlp
git checkout warp
pip install .
```

# Training
### Linear Probing
```sh
for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
export HPARAMS='{
"dataset": "'$DATASET'",
"lr": 0.0001,
"num_epochs": 20,
"prompts": [],
"reorder_optimized": false,
"max_batch_size": 8,
"max_tokens_sq": 262144, "on_logits": false, "pooling_index": null, "seed": 1}'
python -m allennlp train \
-s .aim/baseline-linear-${DATASET} configs/warp.jsonnet
done
```

### WARP_0
```sh
for DATASET in 'cola' 'sst2' 'mrpc' 'qqp' 'stsb' 'mnli' 'rte' 'wnli' 'qnli'
do
export HPARAMS='{
"dataset": "'$DATASET'",
"lr": 0.0001,
"num_epochs": 20,
"prompts": [null, ""],
"reorder_optimized": true,
"max_batch_size": 8,
"max_tokens_sq": 262144,
"on_logits": "pre_decoder_layer_norm",
"pooling_index": 1,
"seed": 1
}'
python -m allennlp train \
-s .aim/baseline-warp_0-${DATASET} configs/warp.jsonnet
done
```

## Training WARP
```sh
export DATASET="rte"
export HPARAMS='{
"benchmark":"super_glue",
"classifier_init":null,
"dataset":"'$DATASET'",
"ensure_whitespace_between":false,
"lr":0.001,
"max_batch_size":8,
"max_tokens_sq":262144,
"num_epochs":30,
"prompt_better_init":"",
"prompts":[-10,-11,-12,-13,-14,null,-15,-16,-17,-18,-19,"",-20,-21,-22,-23,-24,null,-25,-26,-27,-28,-29],
"seed":1,
"transformer_model":"roberta-large"
}'
python -m allennlp train \
-s .aim/t-${DATASET} configs/warp.jsonnet
```

## WARP_init
## Few-Shot Experiments
```sh
export HPARAMS='{
"benchmark":"super_glue",
"classifier_init": {
"entailment": " yes",
"not_entailment": " instead"
},
"dataset":"few_rte",
"eval_mode":false,
"lr":0.001,
"max_batch_size":2,
"max_tokens_sq":131072,
"num_epochs":100,
"num_gradient_accumulation_steps":2,
"prompt_better_init": "[PAD]",
"prompts":[-10,-11,[-14,"\""],null,[-15,"\""], [-16, "?"], "", [-20, ","], null, [-29, "!"],-30,-31],
"seed":3,
"str_cut_frac":0,
"transformer_model":"albert-xxlarge-v2",
"validation_metric": null
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
```

```sh
export HPARAMS='{
"benchmark":"super_glue",
"classifier_init":{
"entailment":" yes",
"not_entailment":" instead"
},
"dataset":"few_rte",
"grad_norm":1,
"lr":0.001,
"max_batch_size":2,
"max_tokens_sq":131072,
"num_epochs":30,
"num_gradient_accumulation_steps":2,
"prompt_better_init":"[PAD]",
"prompts":[-10,-11,[-14,"\""],null,[-15,"\""],[-16,"?"],"",[-20,","],null,[-29,"!"],-30,-31],
"seed":1,
"str_cut_frac":0.06,
"transformer_model":"albert-xxlarge-v2",
"validation_metric":"+training_val_metric"
}'
python -m allennlp train \
-s .aim/t-${DATASET}-`date +%s` configs/warp.jsonnet
```

## Evaluation
```sh
python -m allennlp predict \
--silent --use-dataset-reader --cuda-device 0 \
--batch-size 50 \
--predictor glue --output-file v0.1/AX.tsv /data/arp/.aim/H-93ae5ae9 ax/test
```

```sh
python -m allennlp predict \
--silent --use-dataset-reader --cuda-device 0 \
--batch-size 50 \
--predictor glue --output-file v0.1/MNLI-m.tsv /data/arp/.aim/H-93ae5ae9 test_matched
```

## Citation
If you want to refer to our work, use this BibTeX entry:
```
@inproceedings{hambardzumyan-etal-2021-warp,
title = "{WARP}: {W}ord-level {A}dversarial {R}e{P}rogramming",
author = "Hambardzumyan, Karen and
Khachatrian, Hrant and
May, Jonathan",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.381",
doi = "10.18653/v1/2021.acl-long.381",
pages = "4921--4933"
}
```