https://github.com/mit-han-lab/apq
[CVPR 2020] APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
- Host: GitHub
- URL: https://github.com/mit-han-lab/apq
- Owner: mit-han-lab
- License: other
- Created: 2019-11-28T05:18:32.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-06-16T03:37:21.000Z (over 5 years ago)
- Last Synced: 2023-11-07T19:59:01.257Z (almost 2 years ago)
- Topics: compression, joint-optimization, nas, quantization
- Language: Python
- Homepage: https://hanlab.mit.edu
- Size: 601 KB
- Stars: 148
- Watchers: 11
- Forks: 32
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
```
@inproceedings{Wang2020APQ,
title={APQ: Joint Search for Network Architecture, Pruning and Quantization Policy},
author={Tianzhe Wang and Kuan Wang and Han Cai and Ji Lin and Zhijian Liu and Song Han},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2020}
}
```
## Overview
We release the PyTorch code for APQ. [[Paper](https://arxiv.org/pdf/2006.08509.pdf)|[Video](https://www.youtube.com/watch?v=s5v23hTe60s)|[Competition](https://github.com/mit-han-lab/lpcvc)]
### Jointly Search for Optimal Model
### Save Orders of Magnitude Searching Cost
### Better Performance than Sequential Design
## How to Use
### Prerequisites
- PyTorch >= 1.0
- Python >= 3.6
- progress >= 1.5
- For getting new models, you'll need an NVIDIA GPU.
### Dataset and Model Preparation
- Download the [ImageNet dataset](http://www.image-net.org/) and put it into **dataset/imagenet** (a quick layout check is sketched right after this list).
- Download checkpoints for the [quantization-aware predictor](https://drive.google.com/file/d/1onIxkfLF-QCxi9YxzwQt6SpAaYNJBUDs/view?usp=sharing) and the [once-for-all network](https://drive.google.com/file/d/1k9tv1ISsB-QDENspiuR82rDvaIYGIKD5/view?usp=sharing), and put them into the **models** folder.
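If you are unsure whether your local ImageNet copy matches the expected layout, the following minimal sketch (an illustration, not part of the repo; it assumes the standard `train/<class>/*.JPEG` and `val/<class>/*.JPEG` layout used by torchvision's `ImageFolder`) can serve as a quick sanity check:
``` python
# Hypothetical sanity check for dataset/imagenet (not part of the APQ codebase).
# Assumes the standard ImageFolder layout: dataset/imagenet/val/<class>/*.JPEG
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
val_set = datasets.ImageFolder("dataset/imagenet/val", transform=transform)
print(f"{len(val_set)} validation images across {len(val_set.classes)} classes")
```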
### Codebase Structure
```
apq
- dataset (imagenet data path)
- elastic_nn (super network builder, w/ or w/o quantization)
- modules (define the layers, w/ or w/o quantization)
- networks (define the networks, w/ or w/o quantization)
utils.py (some utility functions for elastic_nn folder)
- models (quantization-aware predictor and once-for-all network checkpoint path)
- imagenet_codebase (training codebase for imagenet)
- lut (latency lookup table path)
- methods (methods to find the mixed-precision network)
- evolution (evolution search code)
- utils (some utility functions, including converter)
accuracy_predictor.py (construction of accuracy predictor)
latency_predictor.py (construction of latency predictor)
converter.py (encode a subnetwork into a one-hot vector)
quant_aware.py (code for quantization-aware training)
main.py
README.md
```
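As a rough illustration of what `converter.py` and the quantization-aware accuracy predictor work with, the sketch below encodes one block's architecture and bitwidth choices as a concatenated one-hot vector. The choice lists and function names are illustrative assumptions, not the repo's exact encoding:
``` python
# Illustrative one-hot encoding of a single block's choices (hypothetical choice
# lists; the real converter.py may use a different encoding and feature order).
import numpy as np

KERNEL_SIZES = [3, 5, 7]
EXPAND_RATIOS = [3, 4, 6]
BITWIDTHS = [4, 6, 8]

def one_hot(value, choices):
    vec = np.zeros(len(choices), dtype=np.float32)
    vec[choices.index(value)] = 1.0
    return vec

def encode_block(kernel, expand, w_bits, a_bits):
    # Concatenate one-hot segments so an MLP predictor can consume the vector.
    return np.concatenate([
        one_hot(kernel, KERNEL_SIZES),
        one_hot(expand, EXPAND_RATIOS),
        one_hot(w_bits, BITWIDTHS),
        one_hot(a_bits, BITWIDTHS),
    ])

# Kernel 5, expand ratio 4, 6-bit weights, 8-bit activations -> 12-dim vector.
print(encode_block(5, 4, 6, 8))
```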
### Testing
For instance, if you want to test the model under the *exps/test* folder, run the following command:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py \
--exp_dir=exps/test
```
You will get the exact latency/energy numbers on the BitFusion platform, along with the ImageNet Top-1 accuracy.
### Example
#### Evolution search
For instance, if you want to search for a model under a *12.80 ms* latency constraint, run the following command:
``` bash
CUDA_VISIBLE_DEVICES=0 python search.py \
--mode=evolution \
--acc_predictor_dir=models \
--exp_name=test \
--constraint=12.80 \
--type=latency
```
You will get a candidate network that satisfies the resource constraint (latency or energy), stored in the *exps/test* folder.
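For intuition, here is a minimal, self-contained sketch of a predictor-driven evolutionary search loop. It is not the repo's `search.py`; the encoding, mutation rule, and the stand-in predictor/latency functions are assumptions made only to show the overall control flow:
``` python
# Toy evolutionary search driven by (stand-in) accuracy and latency predictors.
# Everything here is hypothetical scaffolding, not the APQ implementation.
import random

CHOICES = {"kernel": [3, 5, 7], "expand": [3, 4, 6], "bits": [4, 6, 8]}

def random_subnet(num_blocks=20):
    return [{k: random.choice(v) for k, v in CHOICES.items()} for _ in range(num_blocks)]

def mutate(subnet, prob=0.1):
    child = [dict(block) for block in subnet]
    for block in child:
        for key, options in CHOICES.items():
            if random.random() < prob:
                block[key] = random.choice(options)
    return child

def predicted_accuracy(subnet):  # stand-in for the quantization-aware accuracy predictor
    return sum(b["expand"] * b["bits"] for b in subnet)

def predicted_latency(subnet):   # stand-in for the latency lookup table
    return sum(0.005 * b["kernel"] * b["expand"] * b["bits"] for b in subnet)

def evolution_search(constraint=12.80, population=100, generations=50, parents=25):
    pop = [random_subnet() for _ in range(population)]
    best = None
    for _ in range(generations):
        feasible = sorted(
            (s for s in pop if predicted_latency(s) <= constraint),
            key=predicted_accuracy, reverse=True,
        )
        if feasible:
            if best is None or predicted_accuracy(feasible[0]) > predicted_accuracy(best):
                best = feasible[0]
            parents_pool = feasible[:parents]
        else:
            parents_pool = pop[:parents]  # nothing feasible yet: keep exploring
        pop = parents_pool + [mutate(random.choice(parents_pool))
                              for _ in range(population - len(parents_pool))]
    return best

best = evolution_search()
if best is not None:
    print(f"best predicted latency: {predicted_latency(best):.2f} ms")
```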
#### Quantization-aware finetune on imagenet
For instance, if you want to run quantization-aware finetuning for the model under the *exps/test* folder, run the following command:
``` bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python quant_aware.py \
--exp_name=test
```
You will get a mixed-precision model that meets the resource constraint (latency or energy) with competitive accuracy.
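To illustrate the core mechanism behind quantization-aware finetuning, here is a minimal PyTorch sketch of fake quantization with a straight-through estimator. The layer, bitwidths, and training snippet are illustrative assumptions and do not reproduce the repo's mixed-precision scheme:
``` python
# Minimal fake-quantization sketch with a straight-through estimator (STE).
# Bitwidths and the toy model are placeholders, not APQ's searched configuration.
import torch
import torch.nn as nn

class FakeQuantize(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, bits):
        qmax = 2 ** (bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):  # STE: pass the gradient straight through
        return grad_output, None

class QuantLinear(nn.Linear):
    def __init__(self, in_features, out_features, bits=4):
        super().__init__(in_features, out_features)
        self.bits = bits

    def forward(self, x):
        w_q = FakeQuantize.apply(self.weight, self.bits)  # quantize weights on the fly
        return nn.functional.linear(x, w_q, self.bias)

model = nn.Sequential(QuantLinear(128, 64, bits=4), nn.ReLU(), QuantLinear(64, 10, bits=8))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()   # gradients flow through the STE to the full-precision weights
optimizer.step()
```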
## Models
We provide checkpoints for the APQ models reported in the paper:
| Latency | Energy | BitOps | Accuracy | Model
| :--:|:--:|:--:|:--:|:--:|
|6.11ms|**9.14mJ**|12.7G|72.8%|[download](https://drive.google.com/drive/folders/1qcdtJVXMl1eo12MkNUFWcqNAJjknHrQq?usp=sharing)
|8.45ms|**11.81mJ**|14.6G|73.8%|[download](https://drive.google.com/drive/folders/1Dnm8Id7ANVe3uoqfbIw6NqJFmx97pHHq?usp=sharing)
|**8.40ms**| 12.18mJ | 16.5G|74.1%|[download](https://drive.google.com/drive/folders/1N1UBOcNWQQc4cPOchfgUu518OBXy94LP?usp=sharing)
|**12.17ms**|14.14mJ|23.6G|75.1%|[download](https://drive.google.com/drive/folders/1--H3JbV50elbjRlwix1-cMAQvRwxLHDy?usp=sharing)

You can download the models and put them into the **exps** folder to test their performance.
Note that a **bold** entry marks the constraint under which that model was searched.
## Related work on automated model compression and acceleration:
[Once for All: Train One Network and Specialize it for Efficient Deployment](https://arxiv.org/abs/1908.09791) (ICLR'20, [code](https://github.com/mit-han-lab/once-for-all))
[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) (ICLR’19)
[AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf) (ECCV’18)
[HAQ: Hardware-Aware Automated Quantization](https://arxiv.org/pdf/1811.08886.pdf) (CVPR’19, oral)
[Defensive Quantization: When Efficiency Meets Robustness](https://openreview.net/pdf?id=ryetZ20ctX) (ICLR'19)