# Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization
This repository contains the source code for the paper: [https://arxiv.org/abs/2111.07046](https://arxiv.org/abs/2111.07046).
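The core operation the paper studies is replacing a layer's floating-point weights with binary values. As a rough illustration (a sketch of deterministic binarization with a mean-absolute-value scale, a common choice in binary-weight networks, not necessarily the exact scheme this repository implements):

```python
def binarize(weights):
    """Map floating-point weights to {-scale, +scale}.

    The scale is the mean absolute weight, a common per-layer scaling
    choice for binary-weight networks. Illustrative only; see the
    paper and source for the scheme actually used here.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]

print(binarize([0.5, -1.5, 1.0]))  # scale is 1.0, so this prints [1.0, -1.0, 1.0]
```

In iterative training, layers are binarized one at a time in a chosen order; the `forward`, `reverse`, and numeric order arguments in the scripts below select that order.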
## Requirements
* GPU
* Python 3
* PyTorch 1.9
  * Earlier versions may work, but are untested.
* `pip install -r requirements.txt`
* If running ResNet-21 or ImageNet experiments, first download and prepare the ImageNet 2012 dataset with the [bin/imagenet_prep.sh](bin/imagenet_prep.sh) script.
## Running
For non-ImageNet experiments, the main Python file is [main.py](main.py). To see its arguments:
```sh
python main.py --help
```
The first run can take a little longer because the MNIST and CIFAR-10 datasets are downloaded automatically.
For ImageNet experiments, the main Python files are [main_imagenet_float.py](main_imagenet_float.py) and [main_imagenet_binary.py](main_imagenet_binary.py).
To see their arguments:
```sh
python main_imagenet_float.py --help
```
and
```sh
python main_imagenet_binary.py --help
```
The ImageNet dataset must already be downloaded and prepared; see the Requirements section for details.
## Scripts
The main Python file has many options. The following scripts run training with the hyper-parameters given in the paper.
Output includes a run-log text file and TensorBoard files.
These files are saved to `./logs` and reused for subsequent runs.
### 300-100-10
#### Sensitivity Pre-training
```sh
# Layer 1. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity forward 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity 231 0.1 0
# Layer 3. Learning rate 0.1.
./scripts/mnist/300/sensitivity/layer.sh sensitivity reverse 0.1 0
```
Output files and run-log are written to `./logs/mnist/val/sensitivity/`.
#### Hyperparam search
For floating-point training:
```sh
# Learning rate 0.1.
./scripts/mnist/300/val/float.sh hyperparam 0.1 0
```
For full binary training:
```sh
# Learning rate 0.1.
./scripts/mnist/300/val/binary.sh hyperparam 0.1 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam forward 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam reverse 0.1 0
# 1, 3, 2 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 132 0.1 0
# 2, 1, 3 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 213 0.1 0
# 2, 3, 1 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 231 0.1 0
# 3, 1, 2 order. Learning rate 0.1.
./scripts/mnist/300/val/layer.sh hyperparam 312 0.1 0
```
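The six invocations above differ only in the order argument, so they can be driven by a single loop. The `echo` below makes this a dry run that only prints each command; remove it to actually execute the scripts:

```shell
# Dry run: print one hyperparam-search command per layer order (drop `echo` to execute).
for order in forward reverse 132 213 231 312; do
  echo ./scripts/mnist/300/val/layer.sh hyperparam "$order" 0.1 0
done
```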
Output files and run-log are written to `./logs/mnist/val/hyperparam/`.
#### Full Training
For floating-point training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/float.sh full 0.1 316 0
```
For full binary training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/binary.sh full 0.1 316 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full forward 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full reverse 0.1 316 0
# 1, 3, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 132 0.1 316 0
# 2, 1, 3 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 213 0.1 316 0
# 2, 3, 1 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 231 0.1 316 0
# 3, 1, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/300/run/layer.sh full 312 0.1 316 0
```
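The full-training commands above use seed 316. To repeat a run under several seeds, a loop like the following works; the seeds other than 316 are illustrative, not values from the paper, and the `echo` makes this a dry run that only prints the commands:

```shell
# Dry run over seeds (316 as above; 317 and 318 are illustrative extras).
for seed in 316 317 318; do
  echo ./scripts/mnist/300/run/layer.sh full forward 0.1 "$seed" 0
done
```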
Output files and run-log are written to `./logs/mnist/run/full/`.
### 784-100-10
#### Sensitivity Pre-training
```sh
# Layer 1. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity forward 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity 231 0.1 0
# Layer 3. Learning rate 0.1.
./scripts/mnist/784/sensitivity/layer.sh sensitivity reverse 0.1 0
```
Output files and run-log are written to `./logs/mnist/val/sensitivity/`.
#### Hyperparam search
For floating-point training:
```sh
# Learning rate 0.1.
./scripts/mnist/784/val/float.sh hyperparam 0.1 0
```
For full binary training:
```sh
# Learning rate 0.1.
./scripts/mnist/784/val/binary.sh hyperparam 0.1 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam forward 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam reverse 0.1 0
# 1, 3, 2 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 132 0.1 0
# 2, 1, 3 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 213 0.1 0
# 2, 3, 1 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 231 0.1 0
# 3, 1, 2 order. Learning rate 0.1.
./scripts/mnist/784/val/layer.sh hyperparam 312 0.1 0
```
Output files and run-log are written to `./logs/mnist/val/hyperparam/`.
#### Full Training
For floating-point training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/float.sh full 0.1 316 0
```
For full binary training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/binary.sh full 0.1 316 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full forward 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full reverse 0.1 316 0
# 1, 3, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 132 0.1 316 0
# 2, 1, 3 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 213 0.1 316 0
# 2, 3, 1 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 231 0.1 316 0
# 3, 1, 2 order. Learning rate 0.1. Seed 316.
./scripts/mnist/784/run/layer.sh full 312 0.1 316 0
```
Output files and run-log are written to `./logs/mnist/run/full/`.
### VGG-5
#### Sensitivity Pre-training
```sh
# Layer 1. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 2 0.1 0
# Layer 5. Learning rate 0.1.
./scripts/cifar10/vgg5/sensitivity/layer.sh sensitivity 5 0.1 0
```
Output files and run-log are written to `./logs/cifar10/val/sensitivity/`.
#### Hyperparam Search
For floating-point training:
```sh
# Learning rate 0.1.
./scripts/cifar10/vgg5/val/float.sh hyperparam 0.1 0
```
For full binary training:
```sh
# Learning rate 0.1.
./scripts/cifar10/vgg5/val/binary.sh hyperparam 0.1 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1.
./scripts/cifar10/vgg5/val/layer.sh hyperparam random 0.1 0
```
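As with the MNIST experiments, the five order variants above can be run from one loop. The `echo` keeps this a dry run that only prints the commands; remove it to execute:

```shell
# Dry run: print one hyperparam-search command per binarization order.
for order in forward ascend reverse descend random; do
  echo ./scripts/cifar10/vgg5/val/layer.sh hyperparam "$order" 0.1 0
done
```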
Output files and run-log are written to `./logs/cifar10/val/hyperparam/`.
#### Full Training
For floating-point training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/float.sh full 0.1 316 0
```
For full binary training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/binary.sh full 0.1 316 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg5/run/layer.sh full random 0.1 316 0
```
Output files and run-log are written to `./logs/cifar10/run/full/`.
### VGG-9
#### Sensitivity Pre-training
```sh
# Layer 1. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 2 0.1 0
# Layer 5. Learning rate 0.1.
./scripts/cifar10/vgg9/sensitivity/layer.sh sensitivity 5 0.1 0
```
Output files and run-log are written to `./logs/cifar10/val/sensitivity/`.
#### Hyperparam Search
For floating-point training:
```sh
# Learning rate 0.1.
./scripts/cifar10/vgg9/val/float.sh hyperparam 0.1 0
```
For full binary training:
```sh
# Learning rate 0.1.
./scripts/cifar10/vgg9/val/binary.sh hyperparam 0.1 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1.
./scripts/cifar10/vgg9/val/layer.sh hyperparam random 0.1 0
```
Output files and run-log are written to `./logs/cifar10/val/hyperparam/`.
#### Full Training
For floating-point training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/float.sh full 0.1 316 0
```
For full binary training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/binary.sh full 0.1 316 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/vgg9/run/layer.sh full random 0.1 316 0
```
Output files and run-log are written to `./logs/cifar10/run/full/`.
### ResNet-20
#### Sensitivity Pre-training
```sh
# Layer 1. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 1 0.1 0
# Layer 2. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 2 0.1 0
# ...
# Layer 20. Learning rate 0.1.
./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity 20 0.1 0
```
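The elided layers 3 through 19 follow the same pattern as layers 1, 2, and 20, so all twenty sensitivity runs can be expressed with `seq`. The `echo` below makes this a dry run that only prints the commands; remove it to execute:

```shell
# Dry run: one sensitivity command per ResNet-20 layer (drop `echo` to execute).
for layer in $(seq 1 20); do
  echo ./scripts/cifar10/resnet20/sensitivity/layer.sh sensitivity "$layer" 0.1 0
done
```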
Output files and run-log are written to `./logs/cifar10/val/sensitivity/`.
#### Hyperparam Search
For floating-point training:
```sh
# Learning rate 0.1
./scripts/cifar10/resnet20/val/float.sh hyperparam 0.1 0
```
For full binary training:
```sh
# Learning rate 0.1
./scripts/cifar10/resnet20/val/binary.sh hyperparam 0.1 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam forward 0.1 0
# Ascend order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam ascend 0.1 0
# Reverse order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam reverse 0.1 0
# Descend order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam descend 0.1 0
# Random order. Learning rate 0.1
./scripts/cifar10/resnet20/val/layer.sh hyperparam random 0.1 0
```
Output files and run-log are written to `./logs/cifar10/val/hyperparam/`.
#### Full Training
For floating-point training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/float.sh full 0.1 316 0
```
For full binary training:
```sh
# Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/binary.sh full 0.1 316 0
```
For iterative training:
```sh
# Forward order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full forward 0.1 316 0
# Ascend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full ascend 0.1 316 0
# Reverse order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full reverse 0.1 316 0
# Descend order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full descend 0.1 316 0
# Random order. Learning rate 0.1. Seed 316.
./scripts/cifar10/resnet20/run/layer.sh full random 0.1 316 0
```
Output files and run-log are written to `./logs/cifar10/run/full/`.
### ResNet-21
To run the ResNet-21 experiments, first download and prepare the ImageNet dataset;
see the Requirements section at the beginning of this README.
The commands below assume the prepared dataset is at `./imagenet`.
#### Sensitivity Pre-training
```sh
# Layer 1. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 1 0.01
# Layer 2. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 2 0.01
# Layer 21. Learning rate 0.01.
./scripts/imagenet/layer.sh sensitivity ./imagenet 20 "[20]" 20 21 0.01
```
Output files and run-log are written to `./logs/imagenet/sensitivity/`.
#### Full Training
For floating-point training:
```sh
# Learning rate 0.01.
./scripts/imagenet/float.sh full ./imagenet 67 "[42,57]" 0.01
```
For full binary training:
```sh
# Learning rate 0.01.
./scripts/imagenet/binary.sh full ./imagenet 67 "[42,57]" 0.01
```
For layer-by-layer training:
```sh
# Forward order
./scripts/imagenet/layer.sh full ./imagenet 67 "[42,57]" 2 forward 0.01
# Ascending order
./scripts/imagenet/layer.sh full ./imagenet 67 "[42,57]" 2 ascend 0.01
```
For all scripts, output files and run-log are written to `./logs/imagenet/full/`.
## License
See [LICENSE](LICENSE)
## Contributing
See the [contributing guide](CONTRIBUTING.md) for details on how to contribute to this project.