Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yitu-opensource/ConvBert
- Host: GitHub
- URL: https://github.com/yitu-opensource/ConvBert
- Owner: yitu-opensource
- License: other
- Created: 2020-07-14T06:45:24.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-10-04T01:08:18.000Z (about 2 years ago)
- Last Synced: 2024-08-03T09:04:49.387Z (5 months ago)
- Language: Python
- Size: 198 KB
- Stars: 244
- Watchers: 8
- Forks: 54
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - yitu-opensource/ConvBert
README
# ConvBERT
## Introduction
In this repo, we introduce **ConvBERT**, a new architecture for pre-trained language models. The code is tested on a V100 GPU. For a detailed description and experimental results, please refer to our NeurIPS 2020 paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496).
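
The core building block behind ConvBERT is span-based dynamic convolution, which generates a convolution kernel for each position from a local span of the input rather than from a single token. The snippet below is a minimal, single-head NumPy sketch of that idea under simplifying assumptions (a window-weighted sum stands in for the depthwise separable convolution that produces the span-aware key, there is no multi-head mixed attention, and all weight shapes are illustrative); it is not the repo's TensorFlow implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def span_dynamic_conv(X, W_q, W_v, W_span, W_kernel, k=9):
    """Single-head sketch of span-based dynamic convolution.

    X:        (seq_len, d) token representations.
    W_q, W_v: (d, d) query / value projections.
    W_span:   (k, d) window weights producing a span-aware key
              (simplification of the depthwise separable conv in the paper).
    W_kernel: (d, k) projection that generates a per-position kernel.
    """
    seq_len, d = X.shape
    pad = k // 2
    X_pad = np.pad(X, ((pad, pad), (0, 0)), mode="constant")

    Q = X @ W_q                      # point-wise query
    V = X @ W_v                      # values to be convolved

    # Span-aware key: aggregate the local window of width k around each token.
    K_span = np.stack([(X_pad[i:i + k] * W_span).sum(axis=0)
                       for i in range(seq_len)])           # (seq_len, d)

    # Generate a normalized convolution kernel per position from Q * K_span.
    kernels = softmax((Q * K_span) @ W_kernel, axis=-1)     # (seq_len, k)

    # Apply each position's kernel to its local window of values.
    V_pad = np.pad(V, ((pad, pad), (0, 0)), mode="constant")
    out = np.stack([kernels[i] @ V_pad[i:i + k] for i in range(seq_len)])
    return out                                              # (seq_len, d)

# Toy usage: 16 tokens, hidden size 32, kernel width 9.
rng = np.random.RandomState(0)
X = rng.normal(size=(16, 32))
out = span_dynamic_conv(X,
                        rng.normal(size=(32, 32)),
                        rng.normal(size=(32, 32)),
                        rng.normal(size=(9, 32)),
                        rng.normal(size=(32, 9)))
print(out.shape)  # (16, 32)
```

Because the kernel at each position is normalized over its local window, the operation behaves like a local attention whose receptive field is fixed to the span width, which is what lets it replace part of the global self-attention at lower cost.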
## Requirements
* Python 3
* tensorflow 1.15
* numpy
* scikit-learn

## Experiments
### Pre-training
These instructions pre-train a medium-small sized ConvBERT model (17M parameters) using the [OpenWebText](https://skylion007.github.io/OpenWebTextCorpus/) corpus.
To build the tf-records and pre-train the model, download the [OpenWebText](https://skylion007.github.io/OpenWebTextCorpus/) corpus (12 GB) and **set up your data directory** in `build_data.sh` and `pretrain.sh`. Then run
```bash
bash build_data.sh
```
The processed data require roughly 30 GB of disk space. Then, to pre-train the model, run
```bash
bash pretrain.sh
```
See `configure_pretraining.py` for details of the supported hyperparameters.
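
For orientation, the end-to-end pre-training workflow looks roughly like the sketch below; the `DATA_DIR` path and the exact placement of the extracted OpenWebText corpus are assumptions and must match whatever you configure inside `build_data.sh` and `pretrain.sh`.

```bash
# Rough end-to-end pre-training workflow (paths are illustrative).
DATA_DIR=/path/to/convbert_data   # assumed data directory; point build_data.sh
                                  # and pretrain.sh at the same location

# 1. Download and extract the OpenWebText corpus (~12 GB) under the data directory.
# 2. Build the tf-records (produces roughly 30 GB of processed data).
bash build_data.sh

# 3. Pre-train the medium-small (17M-parameter) ConvBERT model.
bash pretrain.sh
```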
### Fine-tuning
We give instructions for fine-tuning a pre-trained medium-small sized ConvBERT model (17M parameters) on GLUE. You can refer to the Google Colab notebook for a [quick example](https://colab.research.google.com/drive/1WIu2Cc1C8E7ayZBzEmpfd5sXOhe7Ehhz?usp=sharing). See our paper for more details on model performance. The pre-trained model can be found [here](https://drive.google.com/drive/folders/1pSsPcQrGXyt1FB45clALUQf-WTNAbUQa?usp=sharing). (You can also download it from [baidu cloud](https://pan.baidu.com/s/1jPo0e94p2dB8UBz33QuMrQ) with extraction code m9d2.)
To evaluate the performance on GLUE, you can download the GLUE data by running
```bash
python3 download_glue_data.py
```
Set up the data by running `mv CoLA cola && mv MNLI mnli && mv MRPC mrpc && mv QNLI qnli && mv QQP qqp && mv RTE rte && mv SST-2 sst && mv STS-B sts && mv diagnostic/diagnostic.tsv mnli && mkdir -p $DATA_DIR/finetuning_data && mv * $DATA_DIR/finetuning_data` (the same commands are spelled out in the script after the fine-tuning command below). After preparing the GLUE data, **set up your data directory** in `finetune.sh` and run
```bash
bash finetune.sh
```
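
For readability, here is the same GLUE data setup spelled out as a commented script; it assumes you run it from the directory created by `download_glue_data.py` and that `$DATA_DIR` is the data directory you configured in `finetune.sh`.

```bash
# Same GLUE setup as the one-liner above, one command per line.
# Run inside the folder produced by download_glue_data.py.
mv CoLA cola
mv MNLI mnli
mv MRPC mrpc
mv QNLI qnli
mv QQP qqp
mv RTE rte
mv SST-2 sst
mv STS-B sts
mv diagnostic/diagnostic.tsv mnli   # keep the diagnostic set with MNLI
mkdir -p $DATA_DIR/finetuning_data
mv * $DATA_DIR/finetuning_data      # move all task folders under finetuning_data
```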
You can test different tasks by changing the configs in `finetune.sh`.

If you find this repo helpful, please consider citing
```bibtex
@inproceedings{NEURIPS2020_96da2f59,
author = {Jiang, Zi-Hang and Yu, Weihao and Zhou, Daquan and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},
booktitle = {Advances in Neural Information Processing Systems},
editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
pages = {12837--12848},
publisher = {Curran Associates, Inc.},
title = {ConvBERT: Improving BERT with Span-based Dynamic Convolution},
url = {https://proceedings.neurips.cc/paper/2020/file/96da2f590cd7246bbde0051047b0d6f7-Paper.pdf},
volume = {33},
year = {2020}
}
```
# References
Here are some great resources we benefited from:
- Codebase: our codebase is based on [ELECTRA](https://github.com/google-research/electra).
- Dynamic convolution: [implementation](https://github.com/pytorch/fairseq/blob/265791b727b664d4d7da3abd918a3f6fb70d7337/fairseq/modules/lightconv_layer/lightconv_layer.py#L75) from [Pay Less Attention with Lightweight and Dynamic Convolutions](https://openreview.net/pdf?id=SkVhlh09tX)
- Dataset: [OpenWebText](https://skylion007.github.io/OpenWebTextCorpus/) from [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf)