https://github.com/vita-group/bert-tickets
[NeurIPS 2020] "The Lottery Ticket Hypothesis for Pre-trained BERT Networks", Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin
- Host: GitHub
- URL: https://github.com/vita-group/bert-tickets
- Owner: VITA-Group
- License: mit
- Created: 2020-07-20T19:09:03.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-12-30T13:39:49.000Z (almost 4 years ago)
- Last Synced: 2025-03-29T09:41:58.260Z (7 months ago)
- Topics: bert, lottery-ticket-hypothesis, lottery-tickets, pre-training, universal-embeddings
- Language: Python
- Homepage:
- Size: 3.29 MB
- Stars: 140
- Watchers: 11
- Forks: 19
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# The Lottery Ticket Hypothesis for Pre-trained BERT Networks
[License: MIT](https://opensource.org/licenses/MIT)
Code for the NeurIPS 2020 paper [The Lottery Ticket Hypothesis for Pre-trained BERT Networks](https://arxiv.org/abs/2007.12223).
Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin.
Our implementation is based on the [Hugging Face Transformers repo](https://github.com/huggingface/transformers); see the README [here](https://github.com/TAMU-VITA/BERT-Tickets/blob/master/transformers-master/README.md) for details. Pre-trained subnetworks are coming soon.
## Overview
### The Existence of Matching Subnetworks in BERT
### Transfer Learning for BERT Winning Tickets
## Method
## Reproduction Details
### Prerequisites and Installation
Prerequisites and installation steps are described in the README [here](https://github.com/TAMU-VITA/BERT-Tickets/blob/master/transformers-master/README.md).
### Iterative Magnitude Pruning (IMP)
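For orientation: IMP repeatedly trains the network, prunes a fraction of the lowest-magnitude weights globally, and rewinds the surviving weights to the pre-trained BERT values before the next round. The per-task commands below drive this loop through the repository's scripts; the sketch that follows is only a minimal illustration of one pruning-and-rewinding step (the function name, mask layout, and 10% per-round rate are illustrative, not this repo's API).

```python
import torch

def imp_round(model, pretrained_state, mask, prune_frac=0.1):
    """One illustrative IMP round: extend `mask` by pruning the smallest
    surviving weights globally, then rewind survivors to pre-trained values."""
    # Magnitudes of weights that are still unpruned, pooled across layers.
    scores = torch.cat([p.detach().abs()[mask[name]]
                        for name, p in model.named_parameters() if name in mask])
    k = max(int(prune_frac * scores.numel()), 1)
    threshold = torch.kthvalue(scores, k).values  # prune the lowest prune_frac

    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in mask:
                mask[name] &= p.abs() > threshold             # drop newly pruned weights
                p.copy_(pretrained_state[name] * mask[name])  # rewind + re-mask
    return mask
```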
#### MLM task:
```shell
python -u LT_pretrain.py \
    --output_dir LT_pretrain_model \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --train_data_file pretrain_data/en.train \
    --do_train \
    --eval_data_file pretrain_data/en.valid \
    --do_eval \
    --per_gpu_train_batch_size 16 \
    --per_gpu_eval_batch_size 16 \
    --evaluate_during_training \
    --num_train_epochs 1 \
    --logging_steps 10000 \
    --save_steps 10000 \
    --mlm \
    --overwrite_output_dir \
    --seed 57
```

#### GLUE task:
```shell
python -u LT_glue.py \
    --output_dir tmp/mnli \
    --logging_steps 36813 \
    --task_name MNLI \
    --data_dir glue_data/MNLI \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --per_gpu_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 30 \
    --overwrite_output_dir \
    --evaluate_during_training \
    --save_steps 36813 \
    --eval_all_checkpoints \
    --seed 57
```

#### SQuAD task:
```shell
python -u squad_trans.py \
    --output_dir tmp/530/squad \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_eval \
    --do_lower_case \
    --train_file SQuAD/train-v1.1.json \
    --predict_file SQuAD/dev-v1.1.json \
    --per_gpu_train_batch_size 16 \
    --learning_rate 3e-5 \
    --num_train_epochs 40 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --evaluate_during_training \
    --eval_all_checkpoints \
    --overwrite_output_dir \
    --logging_steps 22000 \
    --save_steps 22000 \
    --seed 57
```

### One-shot Magnitude Pruning (OMP)
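One-shot magnitude pruning skips the iterative loop: a single global magnitude threshold at the target sparsity is computed once and turned into a mask. Below is a rough sketch of that idea (not the internals of `oneshot.py`; the function name and `keyword` filter are made up for illustration, and `rate=0.5` mirrors `--rate 0.5`). The repository's script builds such a mask from either pre-trained (`pre`) or randomly initialized (`rand`) weights:

```python
import torch

def one_shot_magnitude_mask(state_dict, rate=0.5, keyword="encoder"):
    """Keep the largest (1 - rate) fraction of weights, measured globally
    over the selected weight matrices; return a dict of boolean masks."""
    prunable = {k: v for k, v in state_dict.items()
                if keyword in k and v.dim() > 1}
    scores = torch.cat([v.abs().flatten() for v in prunable.values()])
    kth = max(int(rate * scores.numel()), 1)
    threshold = torch.kthvalue(scores, kth).values  # prune the lowest `rate`
    return {k: v.abs() > threshold for k, v in prunable.items()}
```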
```shell
python oneshot.py --weight [pre or rand] --model [glue or squad or pretrain] --rate 0.5
```

### Fine-tuning
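Each fine-tuning command takes a pruning mask via `--mask_dir` and an optional weight file via `--weight_pertub`, applying them to BERT before training on the downstream task. A minimal sketch of applying such a mask, assuming it is saved as a dict of binary tensors keyed like the model's `state_dict` (exact key names in the repo's mask files may differ):

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
mask = torch.load("tmp/dif_mask/mnli_mask.pt")  # {param_name: 0/1 tensor}

with torch.no_grad():
    for name, param in model.named_parameters():
        if name in mask:
            param.mul_(mask[name].to(param.device, param.dtype))

# During fine-tuning, the mask is typically re-applied after every optimizer
# step so that pruned weights stay at zero.
```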
#### MLM task:
```shell
# --dir: pre (official pre-trained weights) or rand (random weights)
# --weight_pertub: optional perturbed/shuffled weight file for BERT
# --mask_dir: pruning-mask file to apply
python -u pretrain_trans.py \
    --dir pre \
    --weight_pertub tmp/shuffle_weight.pt \
    --mask_dir tmp/dif_mask/pretrain_mask.pt \
    --output_dir tmp/530/pre \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --train_data_file pretrain_data/en.train \
    --do_train \
    --eval_data_file pretrain_data/en.valid \
    --do_eval \
    --per_gpu_train_batch_size 8 \
    --per_gpu_eval_batch_size 8 \
    --evaluate_during_training \
    --num_train_epochs 1 \
    --logging_steps 2000 \
    --save_steps 0 \
    --max_steps 20000 \
    --mlm \
    --overwrite_output_dir \
    --seed 57
```

#### GLUE task:
```shell
# --dir: pre (official pre-trained weights) or rand (random weights)
# --weight_pertub: optional perturbed/shuffled weight file for BERT
# --mask_dir: pruning-mask file to apply
python -u glue_trans.py \
    --dir pre \
    --weight_pertub tmp/shuffle_weight.pt \
    --mask_dir tmp/dif_mask/mnli_mask.pt \
    --output_dir tmp/530/mnli \
    --logging_steps 12271 \
    --task_name MNLI \
    --data_dir glue_data/MNLI \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --per_gpu_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 3 \
    --overwrite_output_dir \
    --evaluate_during_training \
    --save_steps 0 \
    --eval_all_checkpoints \
    --seed 5
```

#### SQuAD task:
```shell
# --dir: pre (official pre-trained weights) or rand (random weights)
# --weight_pertub: optional perturbed/shuffled weight file for BERT
# --mask_dir: pruning-mask file to apply
python -u squad_trans.py \
    --dir pre \
    --weight_pertub tmp/shuffle_weight.pt \
    --mask_dir tmp/dif_mask/squad_mask.pt \
    --output_dir tmp/530/squad \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --do_train \
    --do_eval \
    --do_lower_case \
    --train_file SQuAD/train-v1.1.json \
    --predict_file SQuAD/dev-v1.1.json \
    --per_gpu_train_batch_size 16 \
    --learning_rate 3e-5 \
    --num_train_epochs 4 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --evaluate_during_training \
    --eval_all_checkpoints \
    --overwrite_output_dir \
    --logging_steps 5500 \
    --save_steps 0 \
    --seed 57
```

### Subnetwork with Randomly Shuffled Pre-trained Weights
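This control keeps every weight matrix's values but destroys their arrangement, testing whether the pre-trained values themselves (rather than their positions) are what matters. A rough sketch of per-matrix shuffling is below (the idea only, not necessarily what `pertub_weight.py` does internally); the result could be saved, e.g. to `tmp/shuffle_weight.pt`, and passed via `--weight_pertub`:

```python
import torch

def shuffle_weights(state_dict, seed=57):
    """Randomly permute the entries of each weight matrix, preserving
    per-matrix value statistics but not their positions."""
    g = torch.Generator().manual_seed(seed)
    shuffled = {}
    for name, w in state_dict.items():
        if w.dim() > 1:  # shuffle weight matrices; leave biases/LayerNorm alone
            perm = torch.randperm(w.numel(), generator=g)
            shuffled[name] = w.flatten()[perm].view_as(w)
        else:
            shuffled[name] = w.clone()
    return shuffled
```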
```shell
python pertub_weight.py
```

## Citation
If you use this code for your research, please cite our paper:
```
@misc{chen2020lottery,
  title={The Lottery Ticket Hypothesis for Pre-trained BERT Networks},
  author={Tianlong Chen and Jonathan Frankle and Shiyu Chang and Sijia Liu and Yang Zhang and Zhangyang Wang and Michael Carbin},
  year={2020},
  eprint={2007.12223},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## Acknowledgement
We would like to express our deepest gratitude to the MIT-IBM Watson AI Lab. In particular, we would like to thank John Cohn for his generous help in providing us with the computing resources necessary to conduct this research.