https://github.com/frgfm/torch-scan

Seamless analysis of your PyTorch models (RAM usage, FLOPs, MACs, receptive field, etc.)
- Host: GitHub
- URL: https://github.com/frgfm/torch-scan
- Owner: frgfm
- License: apache-2.0
- Created: 2020-03-16T21:57:33.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-21T05:03:20.000Z (21 days ago)
- Last Synced: 2024-10-23T04:20:43.986Z (19 days ago)
- Topics: benchmark, deep-learning, deep-neural-networks, flops, flops-counter, keras, python, pytorch, pytorch-utils, receptive-field, summary
- Language: Python
- Homepage: https://frgfm.github.io/torch-scan/latest
- Size: 7.2 MB
- Stars: 208
- Watchers: 7
- Forks: 22
- Open Issues: 13
- Metadata Files:
  - Readme: README.md
  - Contributing: CONTRIBUTING.md
  - Funding: .github/FUNDING.yml
  - License: LICENSE
  - Code of conduct: CODE_OF_CONDUCT.md
## README
The very useful [summary](https://www.tensorflow.org/api_docs/python/tf/keras/Model#summary) method of `tf.keras.Model`, but for PyTorch, and with more detailed information.
## Quick Tour
### Inspecting your PyTorch architecture
Similar to the `torchsummary` implementation, `torchscan` presents useful module information in a readable format. For complex nested architectures, you can cap the display depth as follows:
```python
from torchvision.models import densenet121
from torchscan import summary

model = densenet121().eval().cuda()
summary(model, (3, 224, 224), max_depth=2)
```

which would yield:
```shell
__________________________________________________________________________________________
Layer Type Output Shape Param #
==========================================================================================
densenet DenseNet (-1, 1000) 0
├─features Sequential (-1, 1024, 7, 7) 0
| └─conv0 Conv2d (-1, 64, 112, 112) 9,408
| └─norm0 BatchNorm2d (-1, 64, 112, 112) 257
| └─relu0 ReLU (-1, 64, 112, 112) 0
| └─pool0 MaxPool2d (-1, 64, 56, 56) 0
| └─denseblock1 _DenseBlock (-1, 256, 56, 56) 338,316
| └─transition1 _Transition (-1, 128, 28, 28) 33,793
| └─denseblock2 _DenseBlock (-1, 512, 28, 28) 930,072
| └─transition2 _Transition (-1, 256, 14, 14) 133,121
| └─denseblock3 _DenseBlock (-1, 1024, 14, 14) 2,873,904
| └─transition3 _Transition (-1, 512, 7, 7) 528,385
| └─denseblock4 _DenseBlock (-1, 1024, 7, 7) 2,186,272
| └─norm5 BatchNorm2d (-1, 1024, 7, 7) 4,097
├─classifier Linear (-1, 1000) 1,025,000
==========================================================================================
Trainable params: 7,978,856
Non-trainable params: 0
Total params: 7,978,856
------------------------------------------------------------------------------------------
Model size (params + buffers): 30.76 Mb
Framework & CUDA overhead: 423.57 Mb
Total RAM usage: 454.32 Mb
------------------------------------------------------------------------------------------
Floating Point Operations on forward: 5.74 GFLOPs
Multiply-Accumulations on forward: 2.87 GMACs
Direct memory accesses on forward: 2.90 GDMAs
__________________________________________________________________________________________
```

Results are aggregated to the selected depth for improved readability.
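For instance, with the same `model` and input shape, varying `max_depth` trades detail for readability (a quick sketch reusing the snippet above):

```python
# Coarser view: aggregate everything into the model's direct children
summary(model, (3, 224, 224), max_depth=1)

# Deeper view: expand one level further than the table above
summary(model, (3, 224, 224), max_depth=3)
```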
For reference, here are explanations of a few acronyms (a short worked example follows the list):
- **FLOPs**: floating-point operations (not to be confused with FLOPS which is FLOPs per second)
- **MACs**: multiply-accumulate operations (cf. [wikipedia](https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation))
- **DMAs**: direct memory accesses (many argue these are more relevant than FLOPs or MACs for comparing model inference speeds, cf. [wikipedia](https://en.wikipedia.org/wiki/Direct_memory_access))
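To make the FLOPs/MACs distinction concrete, here is a back-of-the-envelope sketch for a single `Conv2d` layer. This is not torchscan's implementation, just the standard counting convention in which one MAC counts as two FLOPs (one multiply plus one add):

```python
def conv2d_macs(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """MACs for a dense 2D convolution (no groups, bias ignored)."""
    # Each output element requires c_in * k * k multiply-accumulates
    return c_out * h_out * w_out * c_in * k * k

# First conv of densenet121: 3 -> 64 channels, 7x7 kernel, 112x112 output
macs = conv2d_macs(3, 64, 7, 112, 112)
flops = 2 * macs  # one MAC = one multiply + one add
print(f"{macs / 1e6:.1f} MMACs, {flops / 1e6:.1f} MFLOPs")  # 118.0 MMACs, 236.0 MFLOPs
```

Additionally, for highway nets (models without multiple branches / skip connections), `torchscan` supports receptive field estimation.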
```python
from torchvision.models import vgg16
from torchscan import summary

model = vgg16().eval().cuda()
summary(model, (3, 224, 224), receptive_field=True, max_depth=0)
```

which will add each layer's receptive field (relative to the last convolutional layer) to the summary.
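As background, the receptive field of a plain sequential stack follows a textbook recurrence: each layer widens the field by `(k - 1)` times the cumulative stride of the layers before it. A minimal sketch of that recurrence (not necessarily the exact code torchscan uses):

```python
def receptive_field(layers):
    """Receptive field of a sequential stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1  # start from a single input pixel
    for k, s in layers:
        rf += (k - 1) * jump  # widen by (k - 1) * cumulative stride
        jump *= s
    return rf

# vgg16 feature extractor: five blocks of 3x3 convs, each followed by a 2x2 max-pool
vgg16_layers = (
    [(3, 1)] * 2 + [(2, 2)]   # block 1
    + [(3, 1)] * 2 + [(2, 2)]  # block 2
    + [(3, 1)] * 3 + [(2, 2)]  # block 3
    + [(3, 1)] * 3 + [(2, 2)]  # block 4
    + [(3, 1)] * 3 + [(2, 2)]  # block 5
)
print(receptive_field(vgg16_layers))  # 212, matching the benchmark table below
```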
## Setup
Python 3.8 (or newer) and [pip](https://pip.pypa.io/en/stable/)/[conda](https://docs.conda.io/en/latest/miniconda.html) are required to install Torchscan.
### Stable release
You can install the latest stable release of the package from [PyPI](https://pypi.org/project/torch-scan/) as follows:
```shell
pip install torchscan
```

or using [conda](https://anaconda.org/frgfm/torchscan):
```shell
conda install -c frgfm torchscan
```

### Developer installation
Alternatively, if you wish to use the latest features of the project that haven't made their way to a release yet, you can install the package from source:
```shell
git clone https://github.com/frgfm/torch-scan.git
pip install -e torch-scan/.
```

## Benchmark
Below are the results for classification models supported by `torchvision` for a single image with 3 color channels of size `224x224` (apart from `inception_v3` which uses `299x299`).
| Model | Params (M) | FLOPs (G) | MACs (G) | DMAs (G) | RF |
| ------------------ | ---------- | --------- | -------- | -------- | ---- |
| alexnet | 61.1 | 1.43 | 0.71 | 0.72 | 195 |
| googlenet | 6.62 | 3.01 | 1.51 | 1.53 | -- |
| vgg11 | 132.86 | 15.23 | 7.61 | 7.64 | 150 |
| vgg11_bn | 132.87 | 15.26 | 7.63 | 7.66 | 150 |
| vgg13 | 133.05 | 22.63 | 11.31 | 11.35 | 156 |
| vgg13_bn | 133.05 | 22.68 | 11.33 | 11.37 | 156 |
| vgg16 | 138.36 | 30.96 | 15.47 | 15.52 | 212 |
| vgg16_bn | 138.37 | 31.01 | 15.5 | 15.55 | 212 |
| vgg19 | 143.67 | 39.28 | 19.63 | 19.69 | 268 |
| vgg19_bn | 143.68 | 39.34 | 19.66 | 19.72 | 268 |
| resnet18 | 11.69 | 3.64 | 1.82 | 1.84 | -- |
| resnet34 | 21.8 | 7.34 | 3.67 | 3.7 | -- |
| resnet50 | 25.56 | 8.21 | 4.11 | 4.15 | -- |
| resnet101 | 44.55 | 15.66 | 7.83 | 7.9 | -- |
| resnet152 | 60.19 | 23.1 | 11.56 | 11.65 | -- |
| inception_v3 | 27.16 | 11.45 | 5.73 | 5.76 | -- |
| squeezenet1_0 | 1.25 | 1.64 | 0.82 | 0.83 | -- |
| squeezenet1_1 | 1.24 | 0.7 | 0.35 | 0.36 | -- |
| wide_resnet50_2 | 68.88 | 22.84 | 11.43 | 11.51 | -- |
| wide_resnet101_2 | 126.89 | 45.58 | 22.8 | 22.95 | -- |
| densenet121 | 7.98 | 5.74 | 2.87 | 2.9 | -- |
| densenet161 | 28.68 | 15.59 | 7.79 | 7.86 | -- |
| densenet169 | 14.15 | 6.81 | 3.4 | 3.44 | -- |
| densenet201 | 20.01 | 8.7 | 4.34 | 4.39 | -- |
| resnext50_32x4d | 25.03 | 8.51 | 4.26 | 4.3 | -- |
| resnext101_32x8d | 88.79 | 32.93 | 16.48 | 16.61 | -- |
| mobilenet_v2 | 3.5 | 0.63 | 0.31 | 0.32 | -- |
| shufflenet_v2_x0_5 | 1.37 | 0.09 | 0.04 | 0.05 | -- |
| shufflenet_v2_x1_0 | 2.28 | 0.3 | 0.15 | 0.15 | -- |
| shufflenet_v2_x1_5 | 3.5 | 0.6 | 0.3 | 0.31 | -- |
| shufflenet_v2_x2_0 | 7.39 | 1.18 | 0.59 | 0.6 | -- |
| mnasnet0_5 | 2.22 | 0.22 | 0.11 | 0.12 | -- |
| mnasnet0_75 | 3.17 | 0.45 | 0.23 | 0.24 | -- |
| mnasnet1_0 | 4.38 | 0.65 | 0.33 | 0.34 | -- |
| mnasnet1_3 | 6.28 | 1.08 | 0.54 | 0.56 | -- |

The above results were produced using the `scripts/benchmark.py` script.
*Note: receptive field computation is currently only valid for highway nets.*
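A minimal sketch of the kind of loop that produces these rows (the actual logic lives in `scripts/benchmark.py`; this version simply prints a full summary per model rather than collecting the table's columns):

```python
import torch
from torchvision import models
from torchscan import summary

# A few of the architectures from the table above
for name in ["alexnet", "vgg16", "resnet18", "densenet121"]:
    model = models.__dict__[name]().eval()
    if torch.cuda.is_available():
        model = model.cuda()
    print(f"===== {name} =====")
    summary(model, (3, 224, 224))
```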
## What else
### Documentation
The full package documentation is available [here](https://frgfm.github.io/torch-scan/) for detailed specifications.
### Example script
An example script is provided for you to benchmark torchvision models using the library:
```shell
python scripts/benchmark.py
```

## Credits
This project is developed and maintained by the repo owner, but the implementation drew inspiration and help from the following contributions:
- [Pytorch summary](https://github.com/sksq96/pytorch-summary): an existing PyTorch port of `tf.keras.Model.summary`
- [Torchstat](https://github.com/Swall0w/torchstat): another module inspection tool
- [Flops counter Pytorch](https://github.com/sovrasov/flops-counter.pytorch): operation counter tool
- [THOP](https://github.com/Lyken17/pytorch-OpCounter): PyTorch Op counter
- Number of operations and memory estimation articles by [Matthijs Hollemans](https://machinethink.net/blog/how-fast-is-my-model/), and [Sicara](https://www.sicara.ai/blog/2019-28-10-deep-learning-memory-usage-and-pytorch-optimization-tricks)
- [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440)

## Citation
If you wish to cite this project, feel free to use this [BibTeX](http://www.bibtex.org/) reference:
```bibtex
@misc{torchscan2020,
title={Torchscan: meaningful module insights},
author={François-Guillaume Fernandez},
year={2020},
month={March},
publisher = {GitHub},
howpublished = {\url{https://github.com/frgfm/torch-scan}}
}
```

## Contributing
Any sort of contribution is greatly appreciated!
You can find a short guide in [`CONTRIBUTING`](CONTRIBUTING.md) to help grow this project!
## License
Distributed under the Apache 2.0 License. See [`LICENSE`](LICENSE) for more information.