https://github.com/facebookresearch/kill-the-bits
Code for: "And the bit goes down: Revisiting the quantization of neural networks"
- Host: GitHub
- URL: https://github.com/facebookresearch/kill-the-bits
- Owner: facebookresearch
- License: other
- Archived: true
- Created: 2019-05-21T11:52:40.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-11-09T09:57:19.000Z (about 5 years ago)
- Last Synced: 2024-12-17T01:37:37.559Z (11 months ago)
- Language: Python
- Size: 25.6 MB
- Stars: 633
- Watchers: 25
- Forks: 123
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-ml-model-compression - facebookresearch/kill-the-bits - code and compressed models for the paper, "And the bit goes down: Revisiting the quantization of neural networks" by Facebook AI Research. (Tools / Paper Implementations)
README
# And the bit goes down
This repository contains the implementation of our paper: [And the bit goes down: Revisiting the quantization of neural networks](https://arxiv.org/abs/1907.05686) (ICLR 2020) as well as the compressed models we obtain (ResNets and Mask R-CNN).
Our compression method is based on vector quantization. It takes as input an already trained neural network and, through a distillation procedure at all layers and a fine-tuning stage, optimizes the accuracy of the network.
This approach outperforms the state-of-the-art w.r.t. compression/accuracy trade-off for standard networks like ResNet-18 and ResNet-50 (see [Compressed models](#Compressed-Models)).
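To make the idea concrete, here is a toy sketch (not the repository's implementation; the paper clusters so as to minimize the reconstruction error of the layer's *activations*, whereas this sketch runs plain k-means on the weights themselves): a weight matrix is cut into small blocks, a codebook of centroids is learned, and only the centroids plus per-block assignments are stored.
```python
# Toy sketch of vector quantization of a weight matrix (illustration only).
import torch

def quantize_blocks(weight, block_size=4, n_centroids=256, n_iter=20):
    """Plain k-means over the blocks of a 2D weight matrix."""
    blocks = weight.reshape(-1, block_size)            # (n_blocks, block_size)
    # initialize the codebook from randomly chosen blocks
    init = torch.randperm(blocks.size(0))[:n_centroids]
    centroids = blocks[init].clone()
    for _ in range(n_iter):
        # E-step: assign every block to its nearest centroid
        assignments = torch.cdist(blocks, centroids).argmin(dim=1)
        # M-step: move each centroid to the mean of its assigned blocks
        for c in range(n_centroids):
            members = assignments == c
            if members.any():
                centroids[c] = blocks[members].mean(dim=0)
    return centroids, assignments

w = torch.randn(512, 512)                              # a toy weight matrix
centroids, assignments = quantize_blocks(w)
w_hat = centroids[assignments].reshape(w.shape)        # quantized reconstruction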
## Installation
Our code works with Python 3.6 and newer. To run the code, you must have the following packages installed:
- [NumPy](http://www.numpy.org/)
- [PyTorch](http://pytorch.org/) (version 1.0.1.post2)
These dependencies can be installed with:
```bash
pip install -r requirements.txt
```
## Compressed Models
The compressed models (centroids + assignments) are available in the `models/compressed` folder. We provide code to evaluate those models on their standard benchmarks (ImageNet/COCO). Note that inference can be performed either on GPU or on CPU. Note also that we did not optimize this part of the code for speed; it should rather be regarded as a proof of concept: based on the centroids and the assignments, we recover the accuracies reported in the tables below by instantiating the full, non-compressed model.
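For intuition, a hypothetical sketch of that decompression step (the names and storage format are illustrative, not the actual layout of the `.pth` files): each layer's weights are rebuilt by looking up each block's centroid and reshaping.
```python
# Hypothetical decompression of one quantized layer (illustration only).
import torch

def decompress_layer(centroids, assignments, weight_shape):
    """centroids: (n_centroids, block_size); assignments: (n_blocks,)."""
    blocks = centroids[assignments]       # look up each block's centroid
    return blocks.reshape(weight_shape)   # back to the original weight layout

# e.g. a 3x3 conv of shape (64, 64, 3, 3) quantized with block size 9
centroids = torch.randn(256, 9)
assignments = torch.randint(0, 256, (64 * 64,))
weight = decompress_layer(centroids, assignments, (64, 64, 3, 3))
```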
### Vanilla ResNets
We provide the vanilla compressed ResNet-18 and ResNet-50 models for 256 centroids in the low and high compression regimes. As mentioned in the paper, the low compression regime corresponds to a block size of 9 for standard 3x3 convolutions and to a block size of 4 for 1x1 pointwise convolutions. Similarly, the high compression regime corresponds to a block size of 18 for standard 3x3 convolutions and to a block size of 8 for 1x1 pointwise convolutions.
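For intuition, here is how those block sizes carve up a convolution's weight tensor (an illustrative reshape; the exact bucketing in the code may differ):
```python
import torch

w = torch.randn(128, 64, 3, 3)      # a standard 3x3 convolution
print(w.reshape(-1, 9).shape)       # small blocks: (8192, 9), one 3x3 slice each
print(w.reshape(-1, 18).shape)      # large blocks: (4096, 18), two slices each

pw = torch.randn(256, 128, 1, 1)    # a 1x1 pointwise convolution
print(pw.reshape(-1, 4).shape)      # small blocks: (8192, 4)
print(pw.reshape(-1, 8).shape)      # large blocks: (4096, 8)
```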
|Model (non-compressed top-1) | Compression | Size ratio | Model size | Top-1 (%)|
|:-:|:-:|:-:|:-:|:--:|
|ResNet-18 (69.76%) | Small blocks | 29x | 1.54 MB | **65.81**|
|ResNet-18 (69.76%) | Large blocks | 43x | 1.03 MB | **61.18**|
|ResNet-50 (76.15%) | Small blocks | 19x | 5.09 MB | **73.79**|
|ResNet-50 (76.15%) | Large blocks | 31x | 3.19 MB | **68.21**|
To evaluate on the standard test set of ImageNet: clone the repo, `cd` into `src/` and run:
```bash
python inference.py --model resnet18 --state-dict-compressed models/compressed/resnet18_small_blocks.pth --device cuda --data-path YOUR_IMAGENET_PATH
```
### Semi-supervised ResNet50
We provide the compressed [semi-supervised ResNet50](https://arxiv.org/abs/1905.00546) trained and open-sourced by Yalniz *et al.* We use 256 centroids and the small blocks compression regime.
|Model (non-compressed top-1) | Compression | Size ratio | Model size | Top-1 (%)|
|:-:|:-:|:-:|:-:|:--:|
|Semi-Supervised ResNet-50 (79.30%) | Small blocks | 19x | 5.20 MB | **76.12**|
To evaluate on the standard test set of ImageNet: clone the repo, `cd` into `src/` and run:
```bash
python inference.py --model resnet50_semisup --state-dict-compressed models/compressed/resnet50_semisup_small_blocks.pth --device cuda --data-path YOUR_IMAGENET_PATH
```
### Mask R-CNN
We provide the compressed Mask R-CNN (ResNet50-FPN backbone) available in the [PyTorch Model Zoo](https://pytorch.org/docs/stable/torchvision/models.html). As mentioned in the paper, we use 256 centroids and various block sizes to reach a favorable size/accuracy trade-off (an overall 26x compression factor). Note that you need [torchvision 0.3](https://pytorch.org/blog/torchvision03/) in order to run this part of the code.
|Model | Size | Box AP| Mask AP |
|:-:|:-:|:-:|:-:|
|Non-compressed | 170 MB | 37.9 | 34.6|
|Compressed | 6.65 MB | 33.9 | 30.8 |
To evaluate on COCO: clone the repo, run `git checkout mask_r_cnn`, `cd` into `src/` and run:
```bash
python inference.py --model maskrcnn_resnet50_fpn --state-dict-compressed models/compressed/mask_r_cnn.pth --device cuda --data-path YOUR_COCO_PATH
```
## Results
You can also compress the vanilla ResNet models and reproduce the results of our paper by `cd`-ing into `src/` and running the following commands (a quick rate sanity check follows the commands):
- For the *small blocks* compression regime:
```bash
python quantize.py --model resnet18 --block-size-cv 9 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 2048 --data-path YOUR_IMAGENET_PATH
python quantize.py --model resnet50 --block-size-cv 9 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 1024 --data-path YOUR_IMAGENET_PATH
```
- For the *large blocks* compression regime:
```bash
python quantize.py --model resnet18 --block-size-cv 18 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 2048 --data-path YOUR_IMAGENET_PATH
python quantize.py --model resnet50 --block-size-cv 18 --block-size-pw 8 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 1024 --data-path YOUR_IMAGENET_PATH
```
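As a sanity check on the flags above: with 256 centroids, each block assignment costs log2(256) = 8 bits, so the indexing cost is 8/block_size bits per weight. A back-of-the-envelope calculation (it ignores the codebooks, the fully-connected layer, and any non-quantized parameters):
```python
import math

def bits_per_weight(block_size, n_centroids=256):
    # one assignment of log2(n_centroids) bits is shared by block_size weights
    return math.log2(n_centroids) / block_size

print(bits_per_weight(9))    # small-block 3x3 convs: ~0.89 bits per weight
print(bits_per_weight(4))    # small-block 1x1 convs:  2.00 bits per weight
print(bits_per_weight(18))   # large-block 3x3 convs: ~0.44 bits per weight
print(bits_per_weight(8))    # large-block 1x1 convs:  1.00 bits per weight
```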
Note that the vanilla ResNet-18 and ResNet-50 teacher (non-compressed) models are taken from the PyTorch model zoo. Note also that we run our code on a single 16GB Volta V100 GPU.
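For reference, those teacher models can be fetched directly from the model zoo, e.g.:
```python
import torchvision.models as models

# non-compressed teachers used as starting points (PyTorch model zoo)
resnet18_teacher = models.resnet18(pretrained=True)   # 69.76% top-1
resnet50_teacher = models.resnet50(pretrained=True)   # 76.15% top-1
```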
## License
This repository is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, as found in the LICENSE file.
## Bibliography
Please consider citing [1] if you find the resources in this repository useful.
[1] Stock, Pierre and Joulin, Armand and Gribonval, Rémi and Graham, Benjamin and Jégou, Hervé. [And the bit goes down: Revisiting the quantization of neural networks](https://arxiv.org/abs/1907.05686).
```
@inproceedings{stock2019killthebits,
  title = {And the bit goes down: Revisiting the quantization of neural networks},
  author = {Stock, Pierre and Joulin, Armand and Gribonval, R{\'e}mi and Graham, Benjamin and J{\'e}gou, Herv{\'e}},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2020}
}
```