https://github.com/mit-han-lab/apq
[CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
- Host: GitHub
- URL: https://github.com/mit-han-lab/apq
- Owner: mit-han-lab
- License: other
- Created: 2019-11-28T05:18:32.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-06-16T03:37:21.000Z (over 5 years ago)
- Last Synced: 2023-11-07T19:59:01.257Z (almost 2 years ago)
- Topics: compression, joint-optimization, nas, quantization
- Language: Python
- Homepage: https://hanlab.mit.edu
- Size: 601 KB
- Stars: 148
- Watchers: 11
- Forks: 32
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
```
@inproceedings{Wang2020APQ,
title={APQ: Joint Search for Network Architecture, Pruning and Quantization Policy},
author={Tianzhe Wang and Kuan Wang and Han Cai and Ji Lin and Zhijian Liu and Song Han},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2020}
}
```
## Overview
We release the PyTorch code for APQ. [[Paper](https://arxiv.org/pdf/2006.08509.pdf)|[Video](https://www.youtube.com/watch?v=s5v23hTe60s)|[Competition](https://github.com/mit-han-lab/lpcvc)]
### Jointly Search for Optimal Model
### Save Orders of Magnitude Searching Cost
### Better Performance than Sequential Design
## How to Use
### Prerequisites
- PyTorch >= 1.0
- Python >= 3.6
- progress >= 1.5
- For getting new models, you'll need an NVIDIA GPU.
### Dataset and Model Preparation
- Download the [ImageNet dataset](http://www.image-net.org/) and put it into **dataset/imagenet** (a quick layout check is sketched right after this list).
- Download checkpoints for the [quantization-aware predictor](https://drive.google.com/file/d/1onIxkfLF-QCxi9YxzwQt6SpAaYNJBUDs/view?usp=sharing) and the [once-for-all network](https://drive.google.com/file/d/1k9tv1ISsB-QDENspiuR82rDvaIYGIKD5/view?usp=sharing), and put them into the **models** folder.
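If you are unsure whether your local ImageNet copy matches the expected layout, the following minimal sketch (an illustration, not part of the repo; it assumes the standard `train/<class>/*.JPEG` and `val/<class>/*.JPEG` layout used by torchvision's `ImageFolder`) can serve as a quick sanity check:
``` python
# Hypothetical sanity check for dataset/imagenet (not part of the APQ codebase).
# Assumes the standard ImageFolder layout: dataset/imagenet/val/<class>/*.JPEG
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
val_set = datasets.ImageFolder("dataset/imagenet/val", transform=transform)
print(f"{len(val_set)} validation images across {len(val_set.classes)} classes")
```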
### Codebase Structure
```
apq
- dataset (imagenet data path)
- elastic_nn (super network builder, w/ or w/o quantization)
- modules (define the layers, w/ or w/o quantization)
- networks (define the networks, w/ or w/o quantization)
utils.py (some utility functions for elastic_nn folder)
- models (quantization-aware predictor and once-for-all network checkpoint path)
- imagenet_codebase (training codebase for imagenet)
- lut (latency lookup table path)
- methods (methods to find the mixed-precision network)
- evolution (evolution search code)
- utils (some utility functions, including converter)
accuracy_predictor.py (construction of accuracy predictor)
latency_predictor.py (construction of latency predictor)
converter.py (encode a subnetwork into a one-hot vector)
quant_aware.py (code for quantization-aware training)
main.py
README.md
```
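As a rough illustration of what `converter.py` and the quantization-aware accuracy predictor work with, the sketch below encodes one block's architecture and bitwidth choices as a concatenated one-hot vector. The choice lists and function names are illustrative assumptions, not the repo's exact encoding:
``` python
# Illustrative one-hot encoding of a single block's choices (hypothetical choice
# lists; the real converter.py may use a different encoding and feature order).
import numpy as np

KERNEL_SIZES = [3, 5, 7]
EXPAND_RATIOS = [3, 4, 6]
BITWIDTHS = [4, 6, 8]

def one_hot(value, choices):
    vec = np.zeros(len(choices), dtype=np.float32)
    vec[choices.index(value)] = 1.0
    return vec

def encode_block(kernel, expand, w_bits, a_bits):
    # Concatenate one-hot segments so an MLP predictor can consume the vector.
    return np.concatenate([
        one_hot(kernel, KERNEL_SIZES),
        one_hot(expand, EXPAND_RATIOS),
        one_hot(w_bits, BITWIDTHS),
        one_hot(a_bits, BITWIDTHS),
    ])

# Kernel 5, expand ratio 4, 6-bit weights, 8-bit activations -> 12-dim vector.
print(encode_block(5, 4, 6, 8))
```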
### Testing
For instance, if you want to test the model under the *exps/test* folder, run the following command:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py \
--exp_dir=exps/test
```
You will get the exact latency/energy numbers on the BitFusion platform, along with the ImageNet Top-1 accuracy.
### Example
#### Evolution search
For instance, if you want to search for a model under a *12.80 ms* latency constraint, run the following command:
``` bash
CUDA_VISIBLE_DEVICES=0 python search.py \
--mode=evolution \
--acc_predictor_dir=models \
--exp_name=test \
--constraint=12.80 \
--type=latency
```
You will get a candidate network that satisfies the resource constraint (latency or energy), stored in the *exps/test* folder.
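For intuition, here is a minimal, self-contained sketch of a predictor-driven evolutionary search loop. It is not the repo's `search.py`; the encoding, mutation rule, and the stand-in predictor/latency functions are assumptions made only to show the overall control flow:
``` python
# Toy evolutionary search driven by (stand-in) accuracy and latency predictors.
# Everything here is hypothetical scaffolding, not the APQ implementation.
import random

CHOICES = {"kernel": [3, 5, 7], "expand": [3, 4, 6], "bits": [4, 6, 8]}

def random_subnet(num_blocks=20):
    return [{k: random.choice(v) for k, v in CHOICES.items()} for _ in range(num_blocks)]

def mutate(subnet, prob=0.1):
    child = [dict(block) for block in subnet]
    for block in child:
        for key, options in CHOICES.items():
            if random.random() < prob:
                block[key] = random.choice(options)
    return child

def predicted_accuracy(subnet):  # stand-in for the quantization-aware accuracy predictor
    return sum(b["expand"] * b["bits"] for b in subnet)

def predicted_latency(subnet):   # stand-in for the latency lookup table
    return sum(0.005 * b["kernel"] * b["expand"] * b["bits"] for b in subnet)

def evolution_search(constraint=12.80, population=100, generations=50, parents=25):
    pop = [random_subnet() for _ in range(population)]
    best = None
    for _ in range(generations):
        feasible = sorted(
            (s for s in pop if predicted_latency(s) <= constraint),
            key=predicted_accuracy, reverse=True,
        )
        if feasible:
            if best is None or predicted_accuracy(feasible[0]) > predicted_accuracy(best):
                best = feasible[0]
            parents_pool = feasible[:parents]
        else:
            parents_pool = pop[:parents]  # nothing feasible yet: keep exploring
        pop = parents_pool + [mutate(random.choice(parents_pool))
                              for _ in range(population - len(parents_pool))]
    return best

best = evolution_search()
if best is not None:
    print(f"best predicted latency: {predicted_latency(best):.2f} ms")
```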
#### Quantization-aware finetune on imagenet
For instance, if you want to run quantization-aware finetuning for the model under the *exps/test* folder, run the following command:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python quant_aware.py \
--exp_name=test
```
You will get a mixed-precision model that meets the resource constraint (latency or energy) with competitive accuracy.
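To illustrate the core mechanism behind quantization-aware finetuning, here is a minimal PyTorch sketch of fake quantization with a straight-through estimator. The layer, bitwidths, and training snippet are illustrative assumptions and do not reproduce the repo's mixed-precision scheme:
``` python
# Minimal fake-quantization sketch with a straight-through estimator (STE).
# Bitwidths and the toy model are placeholders, not APQ's searched configuration.
import torch
import torch.nn as nn

class FakeQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):  # STE: pass the gradient straight through
        return grad_output, None

class QuantLinear(nn.Linear):
    def __init__(self, in_features, out_features, bits=4):
        super().__init__(in_features, out_features)
        self.bits = bits

    def forward(self, x):
        w_q = FakeQuantize.apply(self.weight, self.bits)  # quantize weights on the fly
        return nn.functional.linear(x, w_q, self.bias)

model = nn.Sequential(QuantLinear(128, 64, bits=4), nn.ReLU(), QuantLinear(64, 10, bits=8))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # gradients flow through the STE to the full-precision weights
optimizer.step()
```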
## Models
We provide checkpoints for the APQ models reported in the paper:
| Latency | Energy | BitOps | Accuracy | Model
| :--:|:--:|:--:|:--:|:--:|
|6.11ms|**9.14mJ**|12.7G|72.8%|[download](https://drive.google.com/drive/folders/1qcdtJVXMl1eo12MkNUFWcqNAJjknHrQq?usp=sharing)
|8.45ms|**11.81mJ**|14.6G|73.8%|[download](https://drive.google.com/drive/folders/1Dnm8Id7ANVe3uoqfbIw6NqJFmx97pHHq?usp=sharing)
|**8.40ms**| 12.18mJ | 16.5G|74.1%|[download](https://drive.google.com/drive/folders/1N1UBOcNWQQc4cPOchfgUu518OBXy94LP?usp=sharing)
|**12.17ms**|14.14mJ|23.6G|75.1%|[download](https://drive.google.com/drive/folders/1--H3JbV50elbjRlwix1-cMAQvRwxLHDy?usp=sharing)

You can download the models and put them into the **exps** folder to test their performance.
Note that a **bold** entry marks the constraint under which that model was searched.
## Related work on automated model compression and acceleration:
[Once for All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791) (ICLR'20, [code](https://github.com/mit-han-lab/once-for-all))
[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) (ICLR’19)
[AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf) (ECCV’18)
[HAQ: Hardware-Aware Automated Quantization](https://arxiv.org/pdf/1811.08886.pdf) (CVPR’19, oral)
[Defensive Quantization: When Efficiency Meets Robustness](https://openreview.net/pdf?id=ryetZ20ctX) (ICLR'19)