{"id":15521390,"url":"https://github.com/julesbelveze/bert-squeeze","last_synced_at":"2025-04-07T13:08:29.353Z","repository":{"id":42991557,"uuid":"419077262","full_name":"JulesBelveze/bert-squeeze","owner":"JulesBelveze","description":"🛠️  Tools for Transformers compression using PyTorch Lightning ⚡","archived":false,"fork":false,"pushed_at":"2024-11-10T14:38:39.000Z","size":2560,"stargazers_count":82,"open_issues_count":4,"forks_count":10,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-31T12:04:59.720Z","etag":null,"topics":["bert","deebert","distillation","fastbert","lstm","nlp","pruning","pytorch-lightning","quantization","theseus","transformers"],"latest_commit_sha":null,"homepage":"https://julesbelveze.github.io/bert-squeeze/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JulesBelveze.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-19T20:13:08.000Z","updated_at":"2025-03-11T23:45:47.000Z","dependencies_parsed_at":"2024-01-16T09:49:07.336Z","dependency_job_id":"75cde90c-d33f-4038-9c6a-91940d5049d6","html_url":"https://github.com/JulesBelveze/bert-squeeze","commit_stats":{"total_commits":127,"total_committers":3,"mean_commits":"42.333333333333336","dds":0.2992125984251969,"last_synced_commit":"ee2fa6704a2fae5cbdf5d2e7ce3ec31743c1978b"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JulesBelveze%2Fbert-squeeze","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposit
ories/JulesBelveze%2Fbert-squeeze/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JulesBelveze%2Fbert-squeeze/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JulesBelveze%2Fbert-squeeze/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JulesBelveze","download_url":"https://codeload.github.com/JulesBelveze/bert-squeeze/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247657281,"owners_count":20974345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","deebert","distillation","fastbert","lstm","nlp","pruning","pytorch-lightning","quantization","theseus","transformers"],"created_at":"2024-10-02T10:34:23.933Z","updated_at":"2025-04-07T13:08:29.333Z","avatar_url":"https://github.com/JulesBelveze.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cimg src=\"./images/bert-squeeze.png\" height=\"25%\" align=\"right\"/\u003e\n\n[![PyPI version](https://badge.fury.io/py/bert-squeeze.svg)](https://pypi.org/project/bert-squeeze/)\n[![github actions docs](https://github.com/JulesBelveze/bert-squeeze/actions/workflows/documentation.yaml/badge.svg)](https://julesbelveze.github.io/bert-squeeze/)\n\n\n# Bert-squeeze\n\n**Bert-squeeze** is a repository aiming to provide code to reduce the size of Transformer-based models or decrease their\nlatency at inference time.\n\nIt gathers a non-exhaustive list of techniques such as distillation, pruning, quantization, early-exiting. 
The repo is\nwritten using [PyTorch Lightning](https://www.pytorchlightning.ai)\nand [Transformers](https://huggingface.co/transformers/).\n\n# About the project\n\nAs a heavy user of transformer-based models (which are truly amazing from my point of view), I have always struggled to put\nthose heavy models in production while keeping a decent inference speed. There are of course a bunch of existing\nlibraries to optimize and compress transformer-based models ([ONNX](https://github.com/onnx/onnx)\n, [distiller](https://github.com/IntelLabs/distiller), [compressors](https://github.com/elephantmipt/compressors)\n, [KD_Lib](https://github.com/SforAiDl/KD_Lib), ... ). \\\nI started this project because of the need to reduce the latency of models integrating transformers as subcomponents.\nFor this reason, this project aims at providing implementations to train various transformer-based models (and others)\nusing PyTorch Lightning but also to distill, prune, and quantize models. \\\nI chose to write this repo with Lightning because of its growing popularity, its flexibility, and because very few repositories\nuse it. 
It currently only handles sequence classification models, but support for other tasks and custom architectures\nis [planned](https://github.com/JulesBelveze/bert-squeeze/projects/10).\n\n# Installation\n\nFirst, clone the repository:\n\n```commandline\ngit clone https://github.com/JulesBelveze/bert-squeeze.git\n```\n\nand then install dependencies using [uv](https://docs.astral.sh/uv/):\n\n```commandline\nuv venv\nsource .venv/bin/activate\nuv sync\n```\n\nYou are all set!\n\n# Quickstarts\n\nYou can find a bunch of examples on how to use the library to simply train models or perform optimization techniques\n(distillation, pruning, quantization) in the [docs](https://julesbelveze.github.io/bert-squeeze/index.html).\n\nDisclaimer: I have not extensively tested all procedures and thus do not guarantee the performance of every implemented\nmethod.\n\n# Concepts\n\n### Transformers\n\nIf you have never heard of Transformers, I recommend reading this\namazing [blog post](https://jalammar.github.io/illustrated-transformer), and if you want to dig deeper there is this\nawesome lecture given by Stanford, available [here](https://www.youtube.com/watch?v=ptuGllU5SQQ).\n\n### Distillation\n\nThe idea of distillation is to train a small network to mimic a big network by trying to replicate its outputs. The\nrepository provides the ability to transfer knowledge from any model to any other (if you need a model that is not\nwithin the `models` folder, just write your own).\n\nThe repository also provides the possibility to perform soft-distillation or hard-distillation on an unlabeled dataset.\nIn the soft case, we use the probabilities of the teacher as a target. In the hard one, we assume that the teacher's\npredictions are the actual labels.\n\nYou can find these implementations under the `distillation/` folder.\n\n### Quantization\n\nNeural network quantization is the process of reducing the precision of the weights in a neural network. 
The repo has two\ncallbacks, one for dynamic quantization and one for quantization-aware training (using\nthe [Lightning callback](https://pytorch-lightning.readthedocs.io/en/latest/extensions/generated/pytorch_lightning.callbacks.QuantizationAwareTraining.html#pytorch_lightning.callbacks.QuantizationAwareTraining)).\n\nYou can find those implementations under the `utils/callbacks/` folder.\n\n### Pruning\n\nPruning a neural network consists of removing weights from a trained model to compress it. This repo features various\npruning implementations and methods such as head pruning, layer dropping, and weight dropping.\n\nYou can find those implementations under the `utils/callbacks/` folder.\n\n# Contributions and questions\n\nIf you are missing a feature that could be relevant to this repo, or you noticed a bug, feel free to open a PR or\nan issue. As you can see in the [roadmap](https://github.com/JulesBelveze/bert-squeeze/projects/1), there are a\nbunch more features to come :smiley:\n\nAlso, if you have any questions or suggestions, feel free to ask!\n\n# References\n\n1. Alammar, J (2018). _The Illustrated Transformer_ [Blog post]. Retrieved\n   from https://jalammar.github.io/illustrated-transformer/\n2. stanfordonline (2021). _Stanford CS224N NLP with Deep Learning | Winter 2021 | Lecture 9 - Self-Attention and\n   Transformers_. [online video] Available at: https://www.youtube.com/watch?v=ptuGllU5SQQ\n3. Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric\n   Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Jamie Brew (2019). [_HuggingFace's Transformers:\n   State-of-the-art Natural Language Processing_](http://arxiv.org/abs/1910.03771)\n4. Hassan Sajjad and Fahim Dalvi and Nadir Durrani and Preslav Nakov (2020). [_Poor Man's BERT: Smaller and Faster\n   Transformer Models_](https://arxiv.org/abs/2004.03844)\n5. Angela Fan and Edouard Grave and Armand Joulin (2019). 
[_Reducing Transformer Depth on Demand with Structured\n   Dropout_](http://arxiv.org/abs/1909.11556)\n6. Paul Michel and Omer Levy and Graham Neubig (2019). [_Are Sixteen Heads Really Better than\n   One?_](http://arxiv.org/abs/1905.10650)\n7. Fangxiaoyu Feng and Yinfei Yang and Daniel Cer and Naveen Arivazhagan and Wei Wang (2020). [_Language-agnostic BERT\n   Sentence Embedding_](https://arxiv.org/abs/2007.01852)\n8. Weijie Liu and Peng Zhou and Zhe Zhao and Zhiruo Wang and Haotang Deng and Qi Ju (2020). [_FastBERT: a\n   Self-distilling BERT with Adaptive Inference Time_](https://arxiv.org/abs/2004.02178). \\\n   Repository: https://github.com/BitVoyage/FastBERT\n9. Canwen Xu and Wangchunshu Zhou and Tao Ge and Furu Wei and Ming Zhou (2020). [_BERT-of-Theseus: Compressing\n   BERT by Progressive Module Replacing_](https://www.aclweb.org/anthology/2020.emnlp-main.633)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjulesbelveze%2Fbert-squeeze","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjulesbelveze%2Fbert-squeeze","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjulesbelveze%2Fbert-squeeze/lists"}