Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
JSON representation
- Host: GitHub
- URL: https://github.com/quic/aimet
- Owner: quic
- License: other
- Created: 2020-04-21T18:57:10.000Z (over 4 years ago)
- Default Branch: develop
- Last Pushed: 2024-05-22T18:22:38.000Z (6 months ago)
- Last Synced: 2024-05-22T19:43:17.956Z (6 months ago)
- Topics: auto-ml, compression, deep-learning, deep-neural-networks, machine-learning, network-compression, network-quantization, open-source, opensource, pruning, quantization
- Language: Python
- Homepage: https://quic.github.io/aimet-pages/index.html
- Size: 13 MB
- Stars: 1,937
- Watchers: 46
- Forks: 353
- Open Issues: 182
- Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE-OF-CONDUCT.md
Awesome Lists containing this project
- awesome-approximate-dnn - Qualcomm AIMET - open-source lib for trained neural network quantization and compression + Model Zoo | TensorFlow, PyTorch | Pruning (Channel), Spatial SVD, per-layer compression ratio selection, Quantization (PTQ, QAT, Simulation, Rounding, Bias correction, Cross-layer equalization, Mixed precision) | (Tools / Approximations Frameworks)
README
![Qualcomm Innovation Center, Inc.](Docs/images/logo-quic-on@h68.png)
[![AIMET on GitHub Pages](Docs/images/button-overview.png)](https://quic.github.io/aimet-pages/index.html)
[![Documentation](Docs/images/button-docs.png)](https://quic.github.io/aimet-pages/releases/latest/user_guide/index.html)
[![Install instructions](Docs/images/button-install.png)](#installation-instructions)
[![Discussion Forums](Docs/images/button-forums.png)](https://forums.quicinc.com)
[![What's New](Docs/images/button-whats-new.png)](#whats-new)

# AI Model Efficiency Toolkit (AIMET)
AIMET is a library that provides advanced model quantization
and compression techniques for trained neural network models.
It provides features that have been proven to improve the run-time performance of deep learning neural network models with
lower compute and memory requirements and minimal impact on task accuracy.

![How AIMET works](Docs/images/how-it-works.png)
AIMET is designed to work with [PyTorch](https://pytorch.org), [TensorFlow](https://tensorflow.org) and [ONNX](https://onnx.ai) models.
We also host the [AIMET Model Zoo](https://github.com/quic/aimet-model-zoo) - a collection of popular neural network models optimized for 8-bit inference.
We also provide recipes for users to quantize floating-point models using AIMET.

## Table of Contents
- [Why AIMET?](#why-aimet)
- [Quick Installation](#quick-install)
- [Supported features](#supported-features)
- [What's New](#whats-new)
- [Results](#results)
- [Installation](#installation-instructions)
- [Resources](#resources)
- [Contributions](#contributions)
- [Team](#team)
- [License](#license)

## Quick Installation
The AIMET PyTorch GPU PyPI packages are available for environments that meet the following requirements:
* 64-bit Intel x86-compatible processor
* Linux Ubuntu 22.04 LTS [Python 3.10] or Linux Ubuntu 20.04 LTS [Python 3.8]
* Torch 1.13+cu117

#### Installation
```
apt-get install liblapacke
python3 -m pip install aimet-torch
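# Optional sanity check (our suggestion, not from the install docs):
# confirm the installed package imports cleanly
python3 -c "import aimet_torch"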
```

To install other AIMET variants and versions, please follow one of the links below for instructions:
- [Install and run AIMET in *Ubuntu* environment](https://quic.github.io/aimet-pages/releases/latest/install)
- [Build, install and run AIMET from source in *Docker* environment](./packaging/docker_install.md)

## Why AIMET?
![Benefits of AIMET](Docs/images/AImodelEfficency.png)
* **Supports advanced quantization techniques**: Inference using integer runtimes is significantly faster than using floating-point runtimes. For example, models run
5x-15x faster on the Qualcomm Hexagon DSP than on the Qualcomm Kryo CPU. In addition, 8-bit precision models have a 4x
smaller footprint than 32-bit precision models. However, maintaining model accuracy when quantizing ML models is often
challenging. AIMET solves this using novel techniques like Data-Free Quantization that provide state-of-the-art INT8 results on
several popular models.
* **Supports advanced model compression techniques** that enable models to run faster at inference-time and require less memory
* **AIMET is designed to automate optimization** of neural networks, avoiding time-consuming and tedious manual tweaking.
AIMET also provides user-friendly APIs that allow users to make calls directly from their [TensorFlow](https://tensorflow.org)
or [PyTorch](https://pytorch.org) pipelines.

Please visit [AIMET on GitHub Pages](https://quic.github.io/aimet-pages/index.html) for more details.
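To give a feel for those APIs, here is a minimal, hedged sketch of quantization simulation with `aimet_torch` (class and method names follow the AIMET user guide; the untrained torchvision model and random calibration batches are stand-ins for a real trained model and dataset):

```python
# A minimal QuantSim sketch (PyTorch variant). The torchvision model and the
# random calibration batches are stand-ins for a real model and dataset.
import torch
from torchvision.models import resnet18
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18().eval()                   # stand-in for a trained FP32 model
dummy_input = torch.randn(1, 3, 224, 224)   # used by AIMET to trace the graph

# Wrap the model with simulated-quantization ops (8-bit weights/activations)
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def pass_calibration_data(sim_model, _):
    # Feed a few unlabeled batches so AIMET can observe activation ranges
    # and compute quantization encodings
    with torch.no_grad():
        for _ in range(4):
            sim_model(torch.randn(8, 3, 224, 224))  # replace with real data

sim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)

# sim.model now simulates on-target INT8 inference and can be evaluated
# like any torch.nn.Module
with torch.no_grad():
    output = sim.model(dummy_input)
```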
## Supported Features
#### Quantization
* *Cross-Layer Equalization*: Equalize weight tensors to reduce amplitude variation across channels
* *Bias Correction*: Corrects shift in layer outputs introduced due to quantization
* *Adaptive Rounding*: Learn the optimal rounding given unlabelled data
* *Quantization Simulation*: Simulate on-target quantized inference accuracy
* *Quantization-aware Training*: Use quantization simulation to train the model further to improve accuracy

#### Model Compression
* *Spatial SVD*: Tensor decomposition technique to split a large layer into two smaller ones
* *Channel Pruning*: Removes redundant input channels from a layer and reconstructs layer weights
* *Per-layer compression-ratio selection*: Automatically selects how much to compress each layer in the model

#### Visualization
* *Weight ranges*: Visually inspect whether a model is a candidate for applying the Cross-Layer Equalization technique, and view the effect after it is applied
* *Per-layer compression sensitivity*: Visually get feedback about the sensitivity of any given layer in the model to compression

## What's New
Some recently added features include:
* Adaptive Rounding (AdaRound): Learn the optimal rounding given unlabelled data
* Quantization-aware Training (QAT) for recurrent models (including RNNs, LSTMs and GRUs)

## Results
AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.
#### DFQ

The DFQ method, applied to several popular networks such as MobileNet-v2 and ResNet-50, results in less than 0.9%
loss in accuracy all the way down to 8-bit quantization, in an automated way without any training data.
| Models | FP32 | INT8 Simulation |
|---|---|---|
| MobileNet v2 (top1) | 71.72% | 71.08% |
| ResNet 50 (top1) | 76.05% | 75.45% |
| DeepLab v3 (mIOU) | 72.65% | 71.91% |
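Cross-Layer Equalization is one of the building blocks behind DFQ. A minimal sketch, assuming the `aimet_torch` API as documented (the untrained MobileNet-v2 is a stand-in for a real trained model):

```python
# A minimal Cross-Layer Equalization sketch (equalize_model per the AIMET
# docs; the untrained MobileNet-v2 is a stand-in for a real trained model).
import torch
from torchvision.models import mobilenet_v2
from aimet_torch.cross_layer_equalization import equalize_model

model = mobilenet_v2().eval()
input_shape = (1, 3, 224, 224)

# In one call: fold batch norms, equalize weight ranges across consecutive
# layers, and absorb high biases -- no training data required
equalize_model(model, input_shape)
```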
#### AdaRound (Adaptive Rounding)

##### ADAS Object Detect

For this example ADAS object detection model, which was challenging to quantize to 8-bit precision,
AdaRound can recover the accuracy to within 1% of the FP32 accuracy.
| Configuration | mAP (Mean Average Precision) |
|---|---|
| FP32 | 82.20% |
| Nearest Rounding (INT8 weights, INT8 acts) | 49.85% |
| AdaRound (INT8 weights, INT8 acts) | 81.21% |
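A hedged sketch of how AdaRound might be applied with `aimet_torch` (class and parameter names follow the AIMET docs; the model and the random data loader are illustrative stand-ins):

```python
# A hedged AdaRound sketch; the model and random data loader are stand-ins.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18
from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Unlabeled calibration data; AdaRound learns rounding from activations only
dataset = TensorDataset(torch.randn(32, 3, 224, 224))
data_loader = DataLoader(dataset, batch_size=8)

params = AdaroundParameters(data_loader=data_loader, num_batches=4)
ada_model = Adaround.apply_adaround(model, dummy_input, params,
                                    path='./adaround_out',
                                    filename_prefix='resnet18',
                                    default_param_bw=8)
# ada_model has optimally-rounded weights; the saved encodings are meant to
# be loaded into a QuantizationSimModel before evaluation or export
```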
##### DeepLabv3 Semantic Segmentation

For some models, like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to
4-bit precision without a significant drop in accuracy.
| Configuration | mIOU (Mean Intersection over Union) |
|---|---|
| FP32 | 72.94% |
| Nearest Rounding (INT4 weights, INT8 acts) | 6.09% |
| AdaRound (INT4 weights, INT8 acts) | 70.86% |
#### Quantization for Recurrent Models

AIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU).
Using the QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with
a minimal drop in accuracy.
| DeepSpeech2 (using bi-directional LSTMs) | Word Error Rate |
|---|---|
| FP32 | 9.92% |
| INT8 | 10.22% |
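Conceptually, QAT continues training on the quantization-simulation model: gradients flow through the simulated-quantization ops, so the weights adapt to quantization noise. A hedged continuation of the QuantSim sketch above (`sim` is the QuantizationSimModel after `compute_encodings`; the toy training data is a placeholder):

```python
# A hedged QAT sketch, continuing from the QuantSim example above.
import torch

# Toy stand-in for a real labeled training data loader
train_loader = [(torch.randn(8, 3, 224, 224),
                 torch.randint(0, 1000, (8,)))] * 4

optimizer = torch.optim.SGD(sim.model.parameters(), lr=1e-5)
loss_fn = torch.nn.CrossEntropyLoss()

sim.model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = loss_fn(sim.model(images), labels)
    loss.backward()   # gradients flow through the simulated-quantization ops
    optimizer.step()  # so the weights adapt to quantization noise

# sim.export(...) can then write the fine-tuned model plus its encodings
```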
#### Model Compression

AIMET can also significantly compress models. For popular models such as ResNet-50 and ResNet-18,
compression with spatial SVD plus channel pruning achieves a 50% MAC (multiply-accumulate) reduction while retaining
accuracy within approximately 1% of the original uncompressed model.
| Models | Uncompressed model | 50% Compressed model |
|---|---|---|
| ResNet-18 (top1) | 69.76% | 68.56% |
| ResNet-50 (top1) | 76.05% | 75.75% |
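A hedged sketch of driving such a compression run through AIMET's `ModelCompressor` (class and parameter names per the AIMET compression docs; the model and evaluation callback are stand-ins):

```python
# A hedged spatial-SVD compression sketch; model and eval callback are
# stand-ins for a real trained model and validation routine.
from decimal import Decimal
import torch
from torchvision.models import resnet18
from aimet_common.defs import (CompressionScheme, CostMetric,
                               GreedySelectionParameters)
from aimet_torch.defs import SpatialSvdParameters
from aimet_torch.compress import ModelCompressor

model = resnet18().eval()

def eval_callback(model, iterations, use_cuda):
    # Stand-in evaluator: should return validation accuracy for `model`
    return 0.0

# Greedily pick per-layer compression ratios targeting ~50% of original MACs
greedy_params = GreedySelectionParameters(target_comp_ratio=Decimal(0.5))
auto_params = SpatialSvdParameters.AutoModeParams(greedy_params)
params = SpatialSvdParameters(SpatialSvdParameters.Mode.auto, auto_params)

compressed_model, stats = ModelCompressor.compress_model(
    model, eval_callback=eval_callback, eval_iterations=10,
    input_shape=(1, 3, 224, 224),
    compress_scheme=CompressionScheme.spatial_svd,
    cost_metric=CostMetric.mac, parameters=params)
```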
## Resources
* [User Guide](https://quic.github.io/aimet-pages/releases/latest/user_guide/index.html)
* [API Docs](https://quic.github.io/aimet-pages/releases/latest/api_docs/index.html)
* [Discussion Forums](https://forums.quicinc.com/)
* [Tutorial Videos](https://quic.github.io/aimet-pages/index.html#video)
* [Example Code](Examples/README.md)

## Contributions
Thanks for your interest in contributing to AIMET! Please read our [Contributions Page](CONTRIBUTING.md) for more information on contributing features or bug fixes. We look forward to your participation!

## Team
AIMET aims to be a community-driven project maintained by Qualcomm Innovation Center, Inc.

## License
AIMET is licensed under the BSD 3-clause "New" or "Revised" License. Check out the [LICENSE](LICENSE) for more details.