{"id":13546873,"url":"https://github.com/JetRunner/BERT-of-Theseus","last_synced_at":"2025-04-02T19:31:57.600Z","repository":{"id":37485485,"uuid":"238963165","full_name":"JetRunner/BERT-of-Theseus","owner":"JetRunner","description":"⛵️The official PyTorch implementation for \"BERT-of-Theseus: Compressing BERT by Progressive Module Replacing\" (EMNLP 2020).","archived":false,"fork":false,"pushed_at":"2023-06-12T21:27:41.000Z","size":1091,"stargazers_count":312,"open_issues_count":4,"forks_count":38,"subscribers_count":14,"default_branch":"master","last_synced_at":"2024-11-03T15:38:13.667Z","etag":null,"topics":["bert","glue","model-compression","nlp","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JetRunner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-02-07T15:52:06.000Z","updated_at":"2024-10-24T01:21:39.000Z","dependencies_parsed_at":"2024-01-14T21:28:27.837Z","dependency_job_id":null,"html_url":"https://github.com/JetRunner/BERT-of-Theseus","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetRunner%2FBERT-of-Theseus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetRunner%2FBERT-of-Theseus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetRunner%2FBERT-of-Theseus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JetRunner%2FBERT-of-Theseus/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JetRunner","download_url":"https://codeload.github.com/JetRunner/BERT-of-Theseus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246880078,"owners_count":20848807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","glue","model-compression","nlp","transformers"],"created_at":"2024-08-01T12:00:46.819Z","updated_at":"2025-04-02T19:31:56.990Z","avatar_url":"https://github.com/JetRunner.png","language":"Python","funding_links":[],"categories":["🏎️ Model Compression/Acceleration"],"sub_categories":[],"readme":"# BERT-of-Theseus\nCode for paper [\"BERT-of-Theseus: Compressing BERT by Progressive Module Replacing\"](http://arxiv.org/abs/2002.02925).\n\n BERT-of-Theseus is a new compressed BERT by progressively replacing the components of the original BERT.\n\n![BERT of Theseus](https://github.com/JetRunner/BERT-of-Theseus/blob/master/bert-of-theseus.png?raw=true)\n\n## Citation\nIf you use this code in your research, please cite our paper:\n```bibtex\n@inproceedings{xu-etal-2020-bert,\n    title = \"{BERT}-of-Theseus: Compressing {BERT} by Progressive Module Replacing\",\n    author = \"Xu, Canwen  and\n      Zhou, Wangchunshu  and\n      Ge, Tao  and\n      Wei, Furu  and\n      Zhou, Ming\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)\",\n    month = nov,\n    year = \"2020\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n    url = 
\"https://www.aclweb.org/anthology/2020.emnlp-main.633\",\n    pages = \"7859--7869\"\n}\n```\n\n**NEW:** We have uploaded a script for making predictions on GLUE tasks and preparing for leaderboard submission. Check out [here](https://github.com/JetRunner/BERT-of-Theseus/tree/master/glue_script)!\n\n## How to run BERT-of-Theseus\n\n### Requirement\nOur code is built on [huggingface/transformers](https://github.com/huggingface/transformers). To use our code, you must clone and install [huggingface/transformers](https://github.com/huggingface/transformers).\n\n### Compress a BERT\n1. You should fine-tune a predecessor model following the [instruction from huggingface](https://github.com/huggingface/transformers/tree/master/examples#glue) and then save it to a directory if you haven't done so.\n2. Run compression following the examples below:\n```bash\n# For compression with a replacement scheduler\nexport GLUE_DIR=/path/to/glue_data\nexport TASK_NAME=MRPC\n\npython ./run_glue.py \\\n  --model_name_or_path /path/to/saved_predecessor \\\n  --task_name $TASK_NAME \\\n  --do_train \\\n  --do_eval \\\n  --do_lower_case \\\n  --data_dir \"$GLUE_DIR/$TASK_NAME\" \\\n  --max_seq_length 128 \\\n  --per_gpu_train_batch_size 32 \\\n  --per_gpu_eval_batch_size 32 \\\n  --learning_rate 2e-5 \\\n  --save_steps 50 \\\n  --num_train_epochs 15 \\\n  --output_dir /path/to/save_successor/ \\\n  --evaluate_during_training \\\n  --replacing_rate 0.3 \\\n  --scheduler_type linear \\\n  --scheduler_linear_k 0.0006\n```\n\n```bash\n# For compression with a constant replacing rate\nexport GLUE_DIR=/path/to/glue_data\nexport TASK_NAME=MRPC\n\npython ./run_glue.py \\\n  --model_name_or_path /path/to/saved_predecessor \\\n  --task_name $TASK_NAME \\\n  --do_train \\\n  --do_eval \\\n  --do_lower_case \\\n  --data_dir \"$GLUE_DIR/$TASK_NAME\" \\\n  --max_seq_length 128 \\\n  --per_gpu_train_batch_size 32 \\\n  --per_gpu_eval_batch_size 32 \\\n  --learning_rate 2e-5 \\\n  --save_steps 50 \\\n  
--num_train_epochs 15 \\\n  --output_dir /path/to/save_successor/ \\\n  --evaluate_during_training \\\n  --replacing_rate 0.5 \\\n  --steps_for_replacing 2500\n```\nFor a detailed description of the arguments, please refer to the source code.\n\n## Load Pretrained Model on MNLI\n\nWe provide a 6-layer pretrained model on MNLI as a general-purpose model, which can transfer to other sentence classification tasks, outperforming DistilBERT (with the same 6-layer structure) on six tasks of GLUE (dev set).\n\n| Method          | MNLI | MRPC | QNLI | QQP  | RTE  | SST-2 | STS-B |\n|-----------------|------|------|------|------|------|-------|-------|\n| BERT-base       | 83.5 | 89.5 | 91.2 | 89.8 | 71.1 | 91.5  | 88.9  |\n| DistilBERT      | 79.0 | 87.5 | 85.3 | 84.9 | 59.9 | 90.7  | 81.2  |\n| BERT-of-Theseus | 82.1 | 87.5 | 88.8 | 88.8 | 70.1 | 91.8  | 87.8  |\n\nYou can easily load our general-purpose model using [huggingface/transformers](https://github.com/huggingface/transformers).\n\n```python\nfrom transformers import AutoTokenizer, AutoModel\n\ntokenizer = AutoTokenizer.from_pretrained(\"canwenxu/BERT-of-Theseus-MNLI\")\n\nmodel = AutoModel.from_pretrained(\"canwenxu/BERT-of-Theseus-MNLI\")\n\n```\n\n## Bug Report and Contribution\nIf you'd like to contribute and add more tasks (only GLUE is available at this moment), please submit a pull request and contact me. Also, if you find any problem or bug, please report it by opening an issue. Thanks!\n\n## Third-Party Implementations\nWe list some third-party implementations from the community here. 
Please feel free to add your implementation to this list:\n\n- `TensorFlow Implementation (tested on NER)`: https://github.com/qiufengyuyi/bert-of-theseus-tf\n- `Keras Implementation (tested on text classification)`: https://github.com/bojone/bert-of-theseus\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJetRunner%2FBERT-of-Theseus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJetRunner%2FBERT-of-Theseus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJetRunner%2FBERT-of-Theseus/lists"}