{"id":13753019,"url":"https://github.com/yitu-opensource/ConvBert","last_synced_at":"2025-05-09T20:34:43.387Z","repository":{"id":46298882,"uuid":"279505044","full_name":"yitu-opensource/ConvBert","owner":"yitu-opensource","description":null,"archived":false,"fork":false,"pushed_at":"2022-10-04T01:08:18.000Z","size":203,"stargazers_count":246,"open_issues_count":5,"forks_count":54,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-11-16T05:32:30.841Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yitu-opensource.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-14T06:45:24.000Z","updated_at":"2024-10-23T06:50:40.000Z","dependencies_parsed_at":"2022-08-03T17:31:09.652Z","dependency_job_id":null,"html_url":"https://github.com/yitu-opensource/ConvBert","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yitu-opensource%2FConvBert","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yitu-opensource%2FConvBert/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yitu-opensource%2FConvBert/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yitu-opensource%2FConvBert/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yitu-opensource","download_url":"https://codeload.github.com/yitu-opensource/ConvBert/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"githu
b","repositories_count":253321816,"owners_count":21890471,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:14.605Z","updated_at":"2025-05-09T20:34:41.784Z","avatar_url":"https://github.com/yitu-opensource.png","language":"Python","funding_links":[],"categories":["BERT优化"],"sub_categories":["大语言对话模型及数据"],"readme":"# ConvBERT\n\n## Introduction\n\nIn this repo, we introduce **ConvBERT**, a new architecture for pre-training-based language models. The code is tested on a V100 GPU. For a detailed description and experimental results, please refer to our NeurIPS 2020 paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496).\n\n## Requirements\n* Python 3\n* tensorflow 1.15\n* numpy\n* scikit-learn\n\n## Experiments\n\n\n### Pre-training\n\nThese instructions pre-train a medium-small sized ConvBERT model (17M parameters) using the [OpenWebText](https://skylion007.github.io/OpenWebTextCorpus/) corpus.\n\nTo build the tf-record and pre-train the model, download the [OpenWebText](https://skylion007.github.io/OpenWebTextCorpus/) corpus (12G) and **set up your data directory** in `build_data.sh` and `pretrain.sh`. Then run\n\n```bash\nbash build_data.sh\n```\n\nThe processed data require roughly 30G of disk space. Then, to pre-train the model, run\n\n```bash\nbash pretrain.sh\n```\n\nSee `configure_pretraining.py` for the details of the supported hyperparameters.\n\n### Fine-tuning\n\nWe give instructions to fine-tune a pre-trained medium-small sized ConvBERT model (17M parameters) on GLUE. 
You can refer to the Google Colab notebook for a [quick example](https://colab.research.google.com/drive/1WIu2Cc1C8E7ayZBzEmpfd5sXOhe7Ehhz?usp=sharing). See our paper for more details on model performance. The pre-trained model can be found [here](https://drive.google.com/drive/folders/1pSsPcQrGXyt1FB45clALUQf-WTNAbUQa?usp=sharing). (You can also download it from [baidu cloud](https://pan.baidu.com/s/1jPo0e94p2dB8UBz33QuMrQ) with extraction code m9d2.)\n\nTo evaluate the performance on GLUE, you can download the GLUE data by running\n```bash\npython3 download_glue_data.py\n```\nSet up the data by running `mv CoLA cola \u0026\u0026 mv MNLI mnli \u0026\u0026 mv MRPC mrpc \u0026\u0026 mv QNLI qnli \u0026\u0026 mv QQP qqp \u0026\u0026 mv RTE rte \u0026\u0026 mv SST-2 sst \u0026\u0026 mv STS-B sts \u0026\u0026 mv diagnostic/diagnostic.tsv mnli \u0026\u0026 mkdir -p $DATA_DIR/finetuning_data \u0026\u0026 mv * $DATA_DIR/finetuning_data`. After preparing the GLUE data, **set up your data directory** in `finetune.sh` and run\n```bash\nbash finetune.sh\n```\nYou can test different tasks by changing the configs in `finetune.sh`.\n\nIf you find this repo helpful, please consider citing\n```bibtex\n@inproceedings{NEURIPS2020_96da2f59,\n author = {Jiang, Zi-Hang and Yu, Weihao and Zhou, Daquan and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. 
Lin},\n pages = {12837--12848},\n publisher = {Curran Associates, Inc.},\n title = {ConvBERT: Improving BERT with Span-based Dynamic Convolution},\n url = {https://proceedings.neurips.cc/paper/2020/file/96da2f590cd7246bbde0051047b0d6f7-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n```\n## References\n\nHere are some great resources we benefited from:\n\nCodebase: Our codebase is based on [ELECTRA](https://github.com/google-research/electra).\n\nDynamic convolution: [Implementation](https://github.com/pytorch/fairseq/blob/265791b727b664d4d7da3abd918a3f6fb70d7337/fairseq/modules/lightconv_layer/lightconv_layer.py#L75) from [Pay Less Attention with Lightweight and Dynamic Convolutions](https://openreview.net/pdf?id=SkVhlh09tX)\n\nDataset: [OpenWebText](https://skylion007.github.io/OpenWebTextCorpus/) from [Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyitu-opensource%2FConvBert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyitu-opensource%2FConvBert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyitu-opensource%2FConvBert/lists"}