{"id":19984532,"url":"https://github.com/intellabs/model-compression-research-package","last_synced_at":"2025-07-13T11:40:59.142Z","repository":{"id":39988482,"uuid":"402110243","full_name":"IntelLabs/Model-Compression-Research-Package","owner":"IntelLabs","description":"A library for researching neural networks compression and acceleration methods.","archived":false,"fork":false,"pushed_at":"2024-08-30T03:20:00.000Z","size":128,"stargazers_count":140,"open_issues_count":1,"forks_count":25,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-03-02T19:54:08.794Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IntelLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-09-01T15:26:12.000Z","updated_at":"2025-02-06T07:28:24.000Z","dependencies_parsed_at":"2023-12-21T01:47:35.519Z","dependency_job_id":"4280aab2-703c-4592-ad56-3bbbd1dfa802","html_url":"https://github.com/IntelLabs/Model-Compression-Research-Package","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FModel-Compression-Research-Package","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FModel-Compression-Research-Package/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FModel-Compression-Research-Package/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/IntelLabs%2FModel-Compression-Research-Package/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IntelLabs","download_url":"https://codeload.github.com/IntelLabs/Model-Compression-Research-Package/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244047645,"owners_count":20389206,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T04:19:25.643Z","updated_at":"2025-03-17T14:15:46.304Z","avatar_url":"https://github.com/IntelLabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- \nApache v2 license\nCopyright (C) 2021 Intel Corporation\nSPDX-License-Identifier: Apache-2.0\n --\u003e\n# Model Compression Research Package\nThis package was developed to enable scalable, reusable and reproducible research of weight pruning, quantization and distillation methods with ease.\n\n## Installation\nTo install the library, clone the repository and install it using `pip`:\n``` bash\ngit clone https://github.com/IntelLabs/Model-Compression-Research-Package\ncd Model-Compression-Research-Package\npip install [-e] .\n```\nAdd the `-e` flag to install an editable version of the library.\n\n## Quick Tour\nThis package contains implementations of several weight pruning methods, knowledge distillation and quantization-aware training.\nHere we will show how to easily use those implementations with your existing model implementation and training loop.\nIt is also possible to combine several methods together in the 
same training process.\nPlease refer to the package's [examples](examples).\n\n### Weight Pruning\nWeight pruning is a method to induce zeros in a model's weights during training.\nThere are several methods to prune a model and it is a widely explored research field.\n\nTo list the existing weight pruning implementations in the package, use `model_compression_research.list_methods()`.\nFor example, applying unstructured magnitude pruning while training your model can be done with a few lines of code:\n\n```python\nfrom model_compression_research import IterativePruningConfig, IterativePruningScheduler\n\ntraining_args = get_training_args()\nmodel = get_model()\ndataloader = get_dataloader()\ncriterion = get_criterion()\n\n# Initialize a pruning configuration and a scheduler and apply it on the model\npruning_config = IterativePruningConfig(\n    pruning_fn=\"unstructured_magnitude\",\n    pruning_fn_default_kwargs={\"target_sparsity\": 0.9}\n)\npruning_scheduler = IterativePruningScheduler(model, pruning_config)\n\n# Initialize optimizer after initializing the pruning scheduler\noptimizer = get_optimizer()\n\n# Training loop\nfor e in range(training_args.epochs):\n    for batch in dataloader:\n        inputs, labels = batch\n        model.train()\n        outputs = model(inputs)\n        loss = criterion(outputs, labels)\n        loss.backward()\n        optimizer.step()\n        # Call pruning scheduler step\n        pruning_scheduler.step()\n        optimizer.zero_grad()\n\n# At the end of training, remove the pruning parts and get the resulting pruned model\npruning_scheduler.remove_pruning()\n```\n\nTo use weight pruning with the [`HuggingFace/transformers`](https://github.com/huggingface/transformers) dedicated [`Trainer`](https://huggingface.co/transformers/main_classes/trainer.html), see the implementation of `HFTrainerPruningCallback` in [`api_utils.py`](model_compression_research/api_utils.py).\n\n### Knowledge Distillation\nModel distillation is a 
method to distill the knowledge learned by a teacher to a smaller student model.\nA method to do that is to compute the difference between the student's and teacher's output distribution using KL divergence.\nIn this package you can find a simple implementation that does just that.\n\nAssuming that your teacher and student models' outputs are of the same dimension, you can use the implementation in this package as follows:\n```python\nfrom model_compression_research import TeacherWrapper, DistillationModelWrapper\n\ntraining_args = get_training_args()\nteacher = get_teacher_trained_model()\nstudent = get_student_model()\ndataloader = get_dataloader()\ncriterion = get_criterion()\n\n# Wrap teacher model with TeacherWrapper and set loss scaling factor and temperature\nteacher = TeacherWrapper(teacher, ce_alpha=0.5, ce_temperature=2.0)\n# Initialize the distillation model with the student and teacher\ndistillation_model = DistillationModelWrapper(student, teacher, alpha_student=0.5)\n\noptimizer = get_optimizer()\n\n# Training loop\nfor e in range(training_args.epochs):\n    for batch in dataloader:\n        inputs, labels = batch\n        distillation_model.train()\n        # Calculate student loss w.r.t labels as you usually do\n        student_outputs = distillation_model(inputs)\n        loss_wrt_labels = criterion(student_outputs, labels)\n        # Add knowledge distillation term\n        loss = distillation_model.compute_loss(loss_wrt_labels, student_outputs)\n        loss.backward()\n        optimizer.step()\n        optimizer.zero_grad()\n```\n\nFor using knowledge distillation with [`HuggingFace/transformers`](https://github.com/huggingface/transformers) see the implementation of `HFTeacherWrapper` and `hf_add_teacher_to_student` in [`api_utils.py`](model_compression_research/api_utils.py).\n\n### Quantization-Aware Training\nQuantization-Aware Training is a method for training models that will be later quantized at the inference stage, as opposed to other 
post-training quantization methods where models are trained without any adaptation to the error caused by model quantization.\n\nA quantization-aware training method similar to the one introduced in [Q8BERT: Quantized 8Bit BERT](https://arxiv.org/abs/1910.06188), generalized to custom models, is implemented in this package:\n\n```python\nfrom model_compression_research import QuantizerConfig, convert_model_for_qat\n\ntraining_args = get_training_args()\nmodel = get_model()\ndataloader = get_dataloader()\ncriterion = get_criterion()\n\n# Initialize quantizer configuration\nqat_config = QuantizerConfig()\n# Convert model to quantization-aware training model\nqat_model = convert_model_for_qat(model, qat_config)\n\noptimizer = get_optimizer()\n\n# Training loop\nfor e in range(training_args.epochs):\n    for batch in dataloader:\n        inputs, labels = batch\n        qat_model.train()\n        outputs = qat_model(inputs)\n        loss = criterion(outputs, labels)\n        loss.backward()\n        optimizer.step()\n        optimizer.zero_grad()\n```\n\n## Papers Implemented in Model Compression Research Package\nMethods from the following papers were implemented in this package and are ready for use:\n* [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878)\n* [Discovering Neural Wirings](https://arxiv.org/abs/1906.00586)\n* [Q8BERT: Quantized 8Bit BERT](https://arxiv.org/abs/1910.06188)\n* [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)\n* [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754)\n* [Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length](https://arxiv.org/abs/2111.09645)\n\n## Citation\nIf you want to cite our paper and library, you can use the following:\n```bibtex\n@article{zafrir2021prune,\n  title={Prune Once for All: Sparse Pre-Trained Language Models},\n  author={Zafrir, Ofir and Larey, Ariel and 
Boudoukh, Guy and Shen, Haihao and Wasserblat, Moshe},\n  journal={arXiv preprint arXiv:2111.05754},\n  year={2021}\n}\n```\n```bibtex\n@software{zafrir_ofir_2021_5721732,\n  author       = {Zafrir, Ofir},\n  title        = {Model-Compression-Research-Package by Intel Labs},\n  month        = nov,\n  year         = 2021,\n  publisher    = {Zenodo},\n  version      = {v0.1.0},\n  doi          = {10.5281/zenodo.5721732},\n  url          = {https://doi.org/10.5281/zenodo.5721732}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fmodel-compression-research-package","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintellabs%2Fmodel-compression-research-package","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fmodel-compression-research-package/lists"}