{"id":22410121,"url":"https://github.com/kssteven418/i-bert","last_synced_at":"2025-04-06T05:17:56.085Z","repository":{"id":79693580,"uuid":"297697041","full_name":"kssteven418/I-BERT","owner":"kssteven418","description":"[ICML'21 Oral] I-BERT: Integer-only BERT Quantization","archived":false,"fork":false,"pushed_at":"2023-01-29T05:15:03.000Z","size":6692,"stargazers_count":241,"open_issues_count":28,"forks_count":34,"subscribers_count":3,"default_branch":"ibert","last_synced_at":"2025-03-30T04:11:08.695Z","etag":null,"topics":["bert","efficient-model","efficient-neural-networks","model-compression","natural-language-processing","quantization","transformer"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2101.01321","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kssteven418.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-09-22T15:47:40.000Z","updated_at":"2025-03-17T03:17:45.000Z","dependencies_parsed_at":"2023-05-14T18:00:35.016Z","dependency_job_id":null,"html_url":"https://github.com/kssteven418/I-BERT","commit_stats":{"total_commits":1518,"total_committers":248,"mean_commits":6.120967741935484,"dds":0.5981554677206851,"last_synced_commit":"1b09c759d6aeb71312df9c6ef74fa268a87c934e"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FI-BERT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FI-BERT/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FI-BERT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kssteven418%2FI-BERT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kssteven418","download_url":"https://codeload.github.com/kssteven418/I-BERT/tar.gz/refs/heads/ibert","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247436320,"owners_count":20938539,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","efficient-model","efficient-neural-networks","model-compression","natural-language-processing","quantization","transformer"],"created_at":"2024-12-05T12:12:14.631Z","updated_at":"2025-04-06T05:17:56.051Z","avatar_url":"https://github.com/kssteven418.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg width=\"900\" alt=\"Screen Shot 2020-12-19 at 9 51 50 PM\" src=\"https://user-images.githubusercontent.com/50283958/102689854-d5604e80-4244-11eb-83cd-5d75e76c8d04.png\"\u003e\n\n# I-BERT: Integer-only BERT Quantization\n\n## HuggingFace Implementation\nI-BERT is also available in the master branch of HuggingFace!\nVisit the following links for the HuggingFace implementation.\n\nGithub Link: https://github.com/huggingface/transformers/tree/master/src/transformers/models/ibert\n\nModel Links: \n* [ibert-roberta-base](https://huggingface.co/kssteven/ibert-roberta-base) \n* [ibert-roberta-large](https://huggingface.co/kssteven/ibert-roberta-large)\n* 
[ibert-roberta-large-mnli](https://huggingface.co/kssteven/ibert-roberta-large-mnli)\n\n## Installation \u0026 Requirements\nYou can find more detailed installation guides in the Fairseq repo: https://github.com/pytorch/fairseq\n\n**1. Fairseq Installation**\n\nReference: [Fairseq](https://github.com/pytorch/fairseq)\n* [PyTorch](http://pytorch.org/) version \u003e= 1.4.0\n* Python version \u003e= 3.6\n* Currently, I-BERT only supports training on GPU\n\n```bash\ngit clone https://github.com/kssteven418/I-BERT.git\ncd I-BERT\npip install --editable ./\n```\n\n**2. Download pre-trained RoBERTa models**\n\nReference: [Fairseq RoBERTa](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.md)\n\nDownload the pre-trained RoBERTa models from the links below and extract them.\n* RoBERTa-Base: [roberta.base.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta.base.tar.gz)\n* RoBERTa-Large: [roberta.large.tar.gz](https://dl.fbaipublicfiles.com/fairseq/models/roberta.large.tar.gz)\n```bash\n# In I-BERT (root) directory\nmkdir models \u0026\u0026 cd models\nwget {link}\ntar -xvf roberta.{base|large}.tar.gz\n```\n\n\n**3. Download GLUE datasets**\n\nReference: [Fairseq Finetuning on GLUE](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md)\n\nFirst, download the data from the [GLUE website](https://gluebenchmark.com/tasks). Make sure to download the dataset into the I-BERT (root) directory.\n```bash\n# In I-BERT (root) directory\nwget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py\npython download_glue_data.py --data_dir glue_data --tasks all\n```\n\nThen, preprocess the data. 
\n\n```bash\n# In I-BERT (root) directory\n./examples/roberta/preprocess_GLUE_tasks.sh glue_data {task_name}\n```\n`task_name` can be one of the following: `{ALL, QQP, MNLI, QNLI, MRPC, RTE, STS-B, SST-2, CoLA}`.\n`ALL` will preprocess all the tasks.\nIf the command runs properly, the preprocessed datasets will be stored in `I-BERT/{task_name}-bin`.\n\nWith the models and datasets in place, you are ready to run I-BERT!\n\n\n## Task-specific Model Finetuning\n\nBefore quantizing the model, you first have to finetune the pre-trained models on a specific downstream task. \nAlthough you can finetune the model from the original Fairseq repo, we provide the `ibert-base` branch where you can train non-quantized models without having to install the original Fairseq. \nThis branch is identical to the master branch of the original Fairseq repo, except for some logging and run scripts that do not affect functionality.\nIf you already have finetuned models, you can skip this part.\n\nRun the following commands to fetch and switch to the `ibert-base` branch:\n```bash\n# In I-BERT (root) directory\ngit fetch\ngit checkout -t origin/ibert-base\n```\n\nThen, run the script:\n```bash\n# In I-BERT (root) directory\n# CUDA_VISIBLE_DEVICES={device} python run.py --arch {roberta_base|roberta_large} --task {task_name}\nCUDA_VISIBLE_DEVICES=0 python run.py --arch roberta_base --task MRPC\n```\nCheckpoints and validation logs will be stored in the `./outputs` directory. You can change this output location by adding the option `--output-dir OUTPUT_DIR`. The exact output location will look something like: `./outputs/none/MRPC-base/wd0.1_ad0.1_d0.1_lr2e-5/1219-101427_ckpt/checkpoint_best.pt`.\nBy default, models are trained according to the task-specific hyperparameters specified in [Fairseq Finetuning on GLUE](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.glue.md). 
However, you can also specify the hyperparameters with the options (use the option `-h` for more details). \n\n\n## Quantization \u0026 Quantization-Aware-Finetuning\n\nNow, switch back to the `ibert` branch for quantization. \n```bash\ngit checkout ibert\n```\n\nThen run the script. This first quantizes the model and then performs quantization-aware finetuning with the learning rate that you specify with the option `--lr {lr}`.\n```bash\n# In I-BERT (root) directory\n# CUDA_VISIBLE_DEVICES={device} python run.py --arch {roberta_base|roberta_large} --task {task_name} \\\n# --restore-file {ckpt_path} --lr {lr}\nCUDA_VISIBLE_DEVICES=0 python run.py --arch roberta_base --task MRPC --restore-file ckpt-best.pt --lr 1e-6\n```\n\n**NOTE:** Our work is still in progress. Currently, all integer operations are executed with floating point.\n\n\n## Citation\nI-BERT was developed as part of the following paper. If you find the library useful for your work, please cite it:\n\n```text\n@article{kim2021bert,\n  title={I-BERT: Integer-only BERT Quantization},\n  author={Kim, Sehoon and Gholami, Amir and Yao, Zhewei and Mahoney, Michael W and Keutzer, Kurt},\n  journal={International Conference on Machine Learning (Accepted)},\n  year={2021}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkssteven418%2Fi-bert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkssteven418%2Fi-bert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkssteven418%2Fi-bert/lists"}