{"id":13562959,"url":"https://github.com/quic/aimet","last_synced_at":"2026-04-02T10:50:52.301Z","repository":{"id":37004086,"uuid":"257688216","full_name":"quic/aimet","owner":"quic","description":"AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.","archived":false,"fork":false,"pushed_at":"2025-04-24T18:47:46.000Z","size":20414,"stargazers_count":2286,"open_issues_count":230,"forks_count":401,"subscribers_count":49,"default_branch":"develop","last_synced_at":"2025-04-24T19:42:05.416Z","etag":null,"topics":["auto-ml","compression","deep-learning","deep-neural-networks","machine-learning","network-compression","network-quantization","open-source","opensource","pruning","quantization"],"latest_commit_sha":null,"homepage":"https://quic.github.io/aimet-pages/index.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/quic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE-OF-CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-04-21T18:57:10.000Z","updated_at":"2025-04-24T18:47:50.000Z","dependencies_parsed_at":"2023-11-18T03:30:18.338Z","dependency_job_id":"f04a8d95-4ada-4c5d-9257-79d91ea6c9b0","html_url":"https://github.com/quic/aimet","commit_stats":{"total_commits":1690,"total_committers":66,"mean_commits":"25.606060606060606","dds":0.8988165680473372,"last_synced_commit":"76b8ad568e2ad80be77b6458f833f7d7ea329d5e"},"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/quic%2Faimet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quic%2Faimet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quic%2Faimet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/quic%2Faimet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/quic","download_url":"https://codeload.github.com/quic/aimet/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251311298,"owners_count":21569006,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-ml","compression","deep-learning","deep-neural-networks","machine-learning","network-compression","network-quantization","open-source","opensource","pruning","quantization"],"created_at":"2024-08-01T13:01:13.794Z","updated_at":"2026-04-02T10:50:52.288Z","avatar_url":"https://github.com/quic.png","language":"Python","funding_links":[],"categories":["Python","Tools","A01_文本生成_文本对话","Networks"],"sub_categories":["Approximations Frameworks","大语言对话模型及数据"],"readme":"\n![Qualcomm Innovation Center, Inc.](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/logo-quic-on@h68.png)\n\n[\u003cimg src=\"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/readme/button-overview.png\" width=\"90\" height=\"40\"\u003e](https://quic.github.io/aimet-pages/releases/latest/index.html)\n[\u003cimg 
src=\"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/readme/button-docs.png\" width=\"90\" height=\"40\"\u003e](https://quic.github.io/aimet-pages/releases/latest/index.html)\n[\u003cimg src=\"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/readme/button-install.png\" width=\"90\" height=\"40\"\u003e](https://quic.github.io/aimet-pages/releases/latest/overview/install/quick-start.html)\n[\u003cimg src=\"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/readme/button-forum.png\" width=\"90\" height=\"40\"\u003e](https://github.com/quic/aimet/discussions)\n[\u003cimg src=\"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/readme/button-slack.png\" width=\"90\" height=\"40\"\u003e](https://qualcomm-ai-hub.slack.com/archives/C08JKBE0UHY)\n[\u003cimg src=\"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/readme/button-new.png\" width=\"90\" height=\"40\"\u003e](https://quic.github.io/aimet-pages/releases/latest/release_notes.html)\n\n# AI Model Efficiency Toolkit (AIMET)\n\n\u003ca href=\"https://quic.github.io/aimet-pages/index.html\"\u003eAIMET\u003c/a\u003e is a software toolkit for quantizing trained ML models.\n\nAIMET improves the runtime performance of deep learning models by reducing compute load and memory footprint.\nQuantizing a model with AIMET reduces its memory footprint, which makes it easier to deploy on edge devices such as mobile phones and laptops.\n\nAIMET employs post-training and fine-tuning techniques to minimize accuracy loss during quantization and compression.\nAIMET supports models from the ONNX and PyTorch frameworks.\n\n![How AIMET works](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/how-it-works.png)\n\nAIMET is designed to work with [PyTorch](https://pytorch.org) and [ONNX](https://onnx.ai) models.\n\nYou can find models quantized with AIMET on [Qualcomm AI 
Hub Models](https://github.com/quic/ai-hub-models), a collection of optimized and quantized models.\n\n## Why AIMET?\n\n![Benefits of AIMET](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/AImodelEfficency.png)\n\n* **Advanced quantization techniques**: Inference using integer runtimes is significantly faster than using floating-point runtimes. For example, models run 5x-15x faster on the Qualcomm Hexagon DSP than on the Qualcomm Kryo CPU. In addition, 8-bit precision models have a 4x smaller footprint than 32-bit precision models. However, maintaining model accuracy when quantizing ML models is often challenging. AIMET addresses this with novel techniques like Data-Free Quantization that provide state-of-the-art INT8 results on several popular models.\n* **Supports advanced model compression techniques** that enable models to run faster at inference time and require less memory\n* **AIMET is designed to automate optimization** of neural networks, avoiding time-consuming and tedious manual tweaking. 
AIMET also provides user-friendly APIs that allow users to make calls directly from their [PyTorch](https://pytorch.org) pipelines.\n\nPlease visit [AIMET on GitHub Pages](https://quic.github.io/aimet-pages/index.html) for more details.\n\n## Quick Start\n\n[aimet-onnx](https://pypi.org/project/aimet-onnx/) and [aimet-torch](https://pypi.org/project/aimet-torch/) are available on PyPI.\n\nSee our [Quick Start](https://quic.github.io/aimet-pages/releases/latest/overview/install/quick-start.html) to get started with the latest AIMET package.\n\n### Build from source\n\nTo build the latest AIMET code from source, see [Build, install and run AIMET from source in *Docker* environment](https://quic.github.io/aimet-pages/releases/latest/overview/install/build_from_source.html#build-from-source).\n\n\n## Supported Features\n\n### Post-Training Quantization(PTQ)\n\n[Check out the guide](https://quic.github.io/aimet-pages/releases/latest/techniques/ptq.html) to get started with PTQ techniques.\n\nThe following table summarizes the techniques you can use with AIMET, from basic ones such as `Calibration` to advanced ones such as `SeqMSE` and `Adaptive Rounding (AdaRound)`.\n\n| Technique | ONNX | PyTorch | What does it do? 
|\n| -- | -- | -- | -- |\n| [Calibration](https://quic.github.io/aimet-pages/releases/latest/techniques/ptq.html) | ✅ | ✅ | Computes quantization parameters |\n| [AdaRound](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/adaround.html) | ✅ | ✅ | Rounds quantized weights |\n| [SeqMSE](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/seq_mse.html) | ✅ | ✅ | Optimizes encodings for each layer |\n| [BatchNorm Folding](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/bnf.html) | ✅ | ✅ | Folds batchnorm to bridge the gap between simulation and on-target behavior |\n| [Cross Layer Equalization](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/cle.html) | ✅ | ✅ | Rescales the weights to reduce range imbalance |\n| [BatchNorm re-estimation](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/bn.html) | ✅ | ✅ | Re-estimates batchnorm statistics |\n| [AdaScale](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/adascale.html) | ✅ | ✅ | Optimizes quantized weights |\n| [OmniQuant](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/omniquant.html) | ❌ | ✅ | Optimizes quantized weights |\n| [SpinQuant](https://quic.github.io/aimet-pages/releases/latest/ptq_techniques/spinquant.html) | ❌ | ✅ | Optimizes quantized weights |\n\n### Quantization Aware Training(QAT)\n\nAIMET supports Quantization Aware Training (QAT) via [aimet-torch](https://pypi.org/project/aimet-torch/).\n\nIf you want to use both QAT and some of the advanced [PTQ techniques from AIMET](#post-training-quantizationptq), we recommend the following workflow:\n\n![QAT workflow](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/aimet/docs/assets/images/workflow/qat.png)\n\nSee the detailed [QAT guide](https://quic.github.io/aimet-pages/releases/latest/techniques/qat.html).\n\n### Model Compression\n\n* *Spatial SVD*: Tensor decomposition technique to split a large layer into two smaller ones\n* *Channel 
Pruning*: Removes redundant input channels from a layer and reconstructs layer weights\n* *Per-layer compression-ratio selection*: Automatically selects how much to compress each layer in the model\n\n### Visualization\n\n* *Weight ranges*: Inspect visually whether a model is a candidate for the Cross Layer Equalization technique, and see the effect after applying it\n* *Per-layer compression sensitivity*: Visually get feedback about the sensitivity of any given layer in the model to compression\n\n## Results\nAIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.\n\n\u003ch4\u003eDFQ\u003c/h4\u003e\n\nThe Data-Free Quantization (DFQ) method, applied to several popular networks such as MobileNet-v2 and ResNet-50, results in less than 0.9% loss in accuracy all the way down to 8-bit quantization, in an automated way and without any training data.\n\n\u003ctable style=\"width:50%\"\u003e\n  \u003ctr\u003e\n    \u003cth style=\"width:80px\"\u003eModels\u003c/th\u003e\n    \u003cth\u003eFP32\u003c/th\u003e\n    \u003cth\u003eINT8 Simulation \u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eMobileNet v2 (top1)\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e71.72%\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e71.08%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eResNet 50 (top1)\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e76.05%\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e75.45%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eDeepLab v3 (mIOU)\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e72.65%\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e71.91%\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\u003cbr\u003e\n\n\u003ch4\u003eAdaRound (Adaptive Rounding)\u003c/h4\u003e\n\u003ch5\u003eADAS Object Detect\u003c/h5\u003e\n\u003cp\u003eFor this example ADAS object detection model, which was 
challenging to quantize to 8-bit precision, AdaRound can recover the accuracy to within 1% of the FP32 accuracy.\u003c/p\u003e\n\u003ctable style=\"width:50%\"\u003e\n  \u003ctr\u003e\n    \u003cth style=\"width:80px\" colspan=\"15\"\u003eConfiguration\u003c/th\u003e\n    \u003cth\u003emAP - Mean Average Precision\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"15\"\u003eFP32\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e82.20%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"15\"\u003eNearest Rounding (INT8 weights, INT8 acts)\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e49.85%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"15\"\u003eAdaRound (INT8 weights, INT8 acts)\u003c/td\u003e\n    \u003ctd align=\"center\" bgcolor=\"#add8e6\"\u003e81.21%\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003ch5\u003eDeepLabv3 Semantic Segmentation\u003c/h5\u003e\n\u003cp\u003eFor some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to 4-bit precision without a significant drop in accuracy.\u003c/p\u003e\n\u003ctable style=\"width:50%\"\u003e\n  \u003ctr\u003e\n    \u003cth style=\"width:80px\" colspan=\"15\"\u003eConfiguration\u003c/th\u003e\n    \u003cth\u003emIOU - Mean intersection over union\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"15\"\u003eFP32\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e72.94%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"15\"\u003eNearest Rounding (INT4 weights, INT8 acts)\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e6.09%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd colspan=\"15\"\u003eAdaRound (INT4 weights, INT8 acts)\u003c/td\u003e\n    \u003ctd align=\"center\" bgcolor=\"#add8e6\"\u003e70.86%\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\u003cbr\u003e\n\n\u003ch4\u003eQuantization for 
Recurrent Models\u003c/h4\u003e\n\u003cp\u003eAIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU). Using the QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with only a 0.30-point increase in word error rate (9.92% to 10.22%).\u003c/p\u003e\n\n\u003ctable style=\"width:50%\"\u003e\n  \u003ctr\u003e\n    \u003cth\u003eDeepSpeech2 \u003cbr\u003e(using bi-directional LSTMs)\u003c/th\u003e\n    \u003cth\u003eWord Error Rate\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eFP32\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e9.92%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eINT8\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e10.22%\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003cbr\u003e\n\n\u003ch4\u003eModel Compression\u003c/h4\u003e\n\u003cp\u003eAIMET can also significantly compress models. For popular models, such as ResNet-50 and ResNet-18, compression with spatial SVD plus channel pruning achieves a 50% MAC (multiply-accumulate) reduction while retaining accuracy within approx. 
1% of the original uncompressed model.\u003c/p\u003e\n\n\u003ctable style=\"width:50%\"\u003e\n  \u003ctr\u003e\n    \u003cth\u003eModels\u003c/th\u003e\n    \u003cth\u003eUncompressed model\u003c/th\u003e\n    \u003cth\u003e50% Compressed model\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eResNet18 (top1)\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e69.76%\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e68.56%\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eResNet 50 (top1)\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e76.05%\u003c/td\u003e\n    \u003ctd align=\"center\"\u003e75.75%\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003cbr\u003e\n\n## Resources\n* [Documentation Main Page](https://quic.github.io/aimet-pages/releases/latest/index.html)\n* [API Reference](https://quic.github.io/aimet-pages/releases/latest/apiref/index.html)\n* [Discussion Forums](https://github.com/quic/aimet/discussions)\n* [Slack](https://qualcomm-ai-hub.slack.com/archives/C08JKBE0UHY)\n* [Tutorial Videos](https://quic.github.io/aimet-pages/index.html#video)\n* [Example Code](Examples/README.md)\n\n## Contributions\nThanks for your interest in contributing to AIMET! Please read our [Contributions Page](CONTRIBUTING.md) for more information on contributing features or bug fixes. We look forward to your participation!\n\n## Team\nAIMET aims to be a community-driven project maintained by Qualcomm Innovation Center, Inc.\n\n## License\nAIMET is licensed under the BSD 3-clause \"New\" or \"Revised\" License. Check out the [LICENSE](LICENSE) for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquic%2Faimet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fquic%2Faimet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fquic%2Faimet/lists"}