Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gpleiss/efficient_densenet_pytorch
A memory-efficient implementation of DenseNets
deep-learning densenet pytorch
- Host: GitHub
- URL: https://github.com/gpleiss/efficient_densenet_pytorch
- Owner: gpleiss
- License: MIT
- Created: 2017-05-31T17:19:25.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-06-01T13:11:51.000Z (over 1 year ago)
- Last Synced: 2024-10-01T13:01:18.177Z (about 1 month ago)
- Topics: deep-learning, densenet, pytorch
- Language: Python
- Homepage:
- Size: 1.09 MB
- Stars: 1,516
- Watchers: 44
- Forks: 327
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# efficient_densenet_pytorch
A PyTorch >=1.0 implementation of DenseNets, optimized to save GPU memory.

## Recent updates
1. **Now works on PyTorch 1.0!** It uses the checkpointing feature, which makes this code WAY more efficient!

## Motivation
While DenseNets are fairly easy to implement in deep learning frameworks, most
implementations (such as the [original](https://github.com/liuzhuang13/DenseNet)) tend to be memory-hungry.
In particular, the number of intermediate feature maps generated by batch normalization and concatenation operations
grows quadratically with network depth.
*It is worth emphasizing that this is not a property inherent to DenseNets, but rather to the implementation.*

This implementation uses a new strategy to reduce the memory consumption of DenseNets.
We use [checkpointing](https://pytorch.org/docs/stable/checkpoint.html?highlight=checkpointing) to compute the Batch Norm and concatenation feature maps.
These intermediate feature maps are discarded during the forward pass and recomputed for the backward pass.
This adds a 15-20% time overhead to training, but **reduces feature map consumption from quadratic to linear.**

This implementation is inspired by this [technical report](https://arxiv.org/pdf/1707.06990.pdf), which outlines a strategy for efficient DenseNets via memory sharing.
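
To make the recompute-on-backward idea concrete, here is a minimal, self-contained sketch of a checkpointed bottleneck layer built on `torch.utils.checkpoint`. It is not the repository's actual `models/densenet.py` code; the class and helper names are made up for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class BottleneckLayer(nn.Module):
    """Hypothetical DenseNet bottleneck: concat -> BN -> ReLU -> 1x1 conv.

    The concatenated tensor and the BN output are the intermediates that grow
    quadratically with depth when every layer stores its own copies.
    """

    def __init__(self, num_input_features, growth_rate, bn_size=4, efficient=True):
        super().__init__()
        self.norm = nn.BatchNorm2d(num_input_features)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(num_input_features, bn_size * growth_rate,
                              kernel_size=1, bias=False)
        self.efficient = efficient

    def _bn_function(self, *prev_features):
        # Everything created inside this function is cheap to recompute.
        concatenated = torch.cat(prev_features, dim=1)
        return self.conv(self.relu(self.norm(concatenated)))

    def forward(self, *prev_features):
        if self.efficient and any(f.requires_grad for f in prev_features):
            # Run _bn_function without storing its intermediates; they are
            # recomputed during the backward pass (the 15-20% time overhead).
            return checkpoint(self._bn_function, *prev_features)
        return self._bn_function(*prev_features)


# Example: the layer takes the feature maps of all preceding layers as inputs.
features = [torch.randn(2, 24, 32, 32, requires_grad=True),
            torch.randn(2, 12, 32, 32, requires_grad=True)]
layer = BottleneckLayer(num_input_features=36, growth_rate=12)
out = layer(*features)  # shape: (2, 48, 32, 32)
```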
## Requirements
- PyTorch >=1.0.0
- CUDA

## Usage
**In your existing project:**
There is one file in the `models` folder.
- `models/densenet.py` is an implementation based off the [torchvision](https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py) and
[project killer](https://github.com/felixgwu/img_classification_pk_pytorch/blob/master/models/densenet.py) implementations.

If you care about speed, and memory is not a concern, pass the `efficient=False` argument to the `DenseNet` constructor.
Otherwise, pass in `efficient=True`. A minimal construction example follows the options list below.

**Options:**
- All options are described in [the docstrings of the model files](https://github.com/gpleiss/efficient_densenet_pytorch/blob/master/models/densenet_efficient.py#L189)
- The depth is controlled by `block_config` option
- `efficient=True` uses the memory-efficient version
- If you want to use the model for ImageNet, set `small_inputs=False`. For CIFAR or SVHN, set `small_inputs=True`.
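
As a rough sketch of how these options fit together (only the options documented above are shown; any other constructor arguments are left at their defaults, so consult the docstrings for the full signature):

```python
import torch
from models.densenet import DenseNet  # module path as described above

# Roughly a DenseNet-BC-100 for 32x32 inputs (CIFAR/SVHN).
model = DenseNet(
    growth_rate=12,
    block_config=(16, 16, 16),  # depth is controlled by block_config
    small_inputs=True,          # True for CIFAR/SVHN, False for ImageNet
    efficient=True,             # memory-efficient (checkpointed) version
)

x = torch.randn(4, 3, 32, 32)
logits = model(x)
```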
**Running the demo:**

The only extra package you need to install is [python-fire](https://github.com/google/python-fire):
```sh
pip install fire
```

- Single GPU:
```sh
CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --data <path_to_data_dir> --save <path_to_save_dir>
```

- Multiple GPUs:
```sh
CUDA_VISIBLE_DEVICES=0,1,2 python demo.py --efficient True --data <path_to_data_dir> --save <path_to_save_dir>
```

Options (an example combining them follows this list):
- `--depth` (int) - depth of the network (number of convolution layers) (default 40)
- `--growth_rate` (int) - number of features added per DenseNet layer (default 12)
- `--n_epochs` (int) - number of epochs for training (default 300)
- `--batch_size` (int) - size of minibatch (default 256)
- `--seed` (int) - manually set the random seed (default None)
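
For instance, a run roughly matching the configuration in the performance comparison below (a 100-layer DenseNet-BC with batch size 64) could be launched along these lines; the paths are placeholders:

```sh
CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --depth 100 --growth_rate 12 \
    --batch_size 64 --data <path_to_data_dir> --save <path_to_save_dir>
```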
## Performance

A comparison of the two implementations (each is a DenseNet-BC with 100 layers, batch size 64, tested on an NVIDIA Pascal Titan X):
| Implementation | Memory consumption (GB/GPU) | Speed (sec/mini-batch) |
|----------------|------------------------|------------------------|
| Naive | 2.863 | 0.165 |
| Efficient | 1.605 | 0.207 |
| Efficient (multi-GPU) | 0.985 | - |

## Other efficient implementations
- [LuaTorch](https://github.com/liuzhuang13/DenseNet/tree/master/models) (by Gao Huang)
- [Tensorflow](https://github.com/joeyearsley/efficient_densenet_tensorflow) (by Joe Yearsley)
- [Caffe](https://github.com/Tongcheng/DN_CaffeScript) (by Tongcheng Li)

## Reference
```
@article{pleiss2017memory,
  title={Memory-Efficient Implementation of DenseNets},
  author={Pleiss, Geoff and Chen, Danlu and Huang, Gao and Li, Tongcheng and van der Maaten, Laurens and Weinberger, Kilian Q},
  journal={arXiv preprint arXiv:1707.06990},
  year={2017}
}
```