Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/roatienza/benchmark
Utilities to perform deep learning models benchmarking (number of parameters, FLOPS and inference latency)
- Host: GitHub
- URL: https://github.com/roatienza/benchmark
- Owner: roatienza
- License: apache-2.0
- Created: 2022-08-03T01:05:47.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-08-07T05:40:29.000Z (over 2 years ago)
- Last Synced: 2023-03-08T14:41:08.267Z (over 1 year ago)
- Topics: deep-learning, flops, latency, model, parameters
- Language: Python
- Homepage:
- Size: 90.8 KB
- Stars: 3
- Watchers: 1
- Forks: 3
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# `benchmark`
Utilities to benchmark deep learning models.

Model inference efficiency is a major concern when deploying deep learning models. Efficiency is quantified as the Pareto-optimality of the target metric (e.g. accuracy) against the number of model parameters, computational complexity such as FLOPS, and latency. `benchmark` is a tool to compute parameters, FLOPS, and latency. The sample usage below shows how to determine the number of parameters and FLOPS, along with the latency improvements obtained with different accelerators and model formats; the fastest configuration uses both ONNX and TensorRT.
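For reference, both quantities can also be obtained directly in Python: the parameter count comes straight from PyTorch, and FLOPS can be estimated with an external counter such as `fvcore` (an assumption here; this is only a sketch, not the code `benchmark` uses internally):

```python
# Minimal sketch (not the repository's own code): count parameters and
# estimate FLOPS for resnet18 with a 1x3x224x224 input.
import torch
from torchvision import models
from fvcore.nn import FlopCountAnalysis  # assumed counter: pip install fvcore

model = models.resnet18(weights=None).eval()
params = sum(p.numel() for p in model.parameters())

x = torch.randn(1, 3, 224, 224)
flops = FlopCountAnalysis(model, x).total()

print(f"Parameters: {params:,}")  # expect roughly 11.7 M for resnet18
print(f"FLOPS: {flops:,}")        # expect roughly 1.8 G for a 224x224x3 input
```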
## FLOPS, Parameters and Latency of ResNet18
Experiments were performed on a Quadro RTX 6000 (24 GB) GPU and an AMD Ryzen Threadripper 3970X 32-core CPU, assuming 1,000 classes, a `224x224x3` input image, and a batch size of `1`.
```
FLOPS: 1,819,065,856
Parameters: 11,689,512
```

| **Accelerator** | **Latency (usec)** | **Speedup (x)** |
| :--- | ---: | --: |
| CPU | 8,550 | 1 |
| CPU + ONNX | 3,830 | 2.7 |
| GPU | 1,982 | 5.4 |
| GPU + ONNX | 1,218 | 8.8 |
| GPU + ONNX + TensorRT | 917 | 11.7 |
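
The latency numbers above come from timing repeated forward passes. A minimal sketch of such a measurement (warmup and iteration counts are assumptions; this is not the repository's exact procedure):

```python
# Minimal latency-measurement sketch: average forward-pass time in usec.
import time
import torch
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(weights=None).eval().to(device)
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):              # warmup iterations (assumed count)
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()     # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(100):             # timed iterations (assumed count)
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"Latency: {elapsed / 100 * 1e6:,.0f} usec")
```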
## Install requirements

```
pip3 install -r requirements.txt
```

Additional packages:
- CUDA:
Remove the old CUDA toolkit:
```
conda uninstall cudatoolkit
```
Install the updated cuDNN:
```
conda install cudnn
```
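
A quick way to confirm that PyTorch picks up the CUDA runtime and cuDNN after the switch (a sanity check only, not part of this repository):

```python
# Sanity check: verify PyTorch sees CUDA and cuDNN.
import torch

print("CUDA available:", torch.cuda.is_available())
print("cuDNN version:", torch.backends.cudnn.version())
```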
- [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-pip)
```
python3 -m pip install --upgrade setuptools pip
python3 -m pip install nvidia-pyindex
python3 -m pip install --upgrade nvidia-tensorrt
```

- (Optional) Torch-TensorRT
```
pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases
```
Warning: requires superuser access.
```
sudo apt install python3-libnvinfer-dev python3-libnvinfer
```
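
Before running the TensorRT configurations below, it can help to confirm the Python bindings import cleanly (again just a sanity check, not part of the repository):

```python
# Sanity check: verify the TensorRT (and optional Torch-TensorRT) bindings import.
import tensorrt
print("TensorRT:", tensorrt.__version__)

try:
    import torch_tensorrt            # optional; only if Torch-TensorRT was installed
    print("Torch-TensorRT:", torch_tensorrt.__version__)
except ImportError:
    print("Torch-TensorRT not installed (optional)")
```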
## Sample benchmarking of `resnet18`

- GPU + ONNX + TensorRT
```
python3 benchmark.py --model resnet18 --onnx --tensorrt
```

- GPU + ONNX
```
python3 benchmark.py --model resnet18 --onnx
```

- GPU
```
python3 benchmark.py --model resnet18
```

- CPU
```
python3 benchmark.py --model resnet18 --device cpu
```

- CPU + ONNX
```
python3 benchmark.py --model resnet18 --device cpu --onnx
```
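
The ONNX configurations above correspond to exporting the PyTorch model to ONNX and running it with ONNX Runtime, where the chosen execution provider selects CPU, CUDA, or TensorRT. A rough sketch of that path (an assumed flow, not the repository's exact code; the TensorRT provider requires an ONNX Runtime build with TensorRT support):

```python
# Minimal export-and-run sketch with ONNX Runtime (assumed flow, not the repo's code).
import numpy as np
import torch
import onnxruntime as ort
from torchvision import models

model = models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet18.onnx",
                  input_names=["input"], output_names=["output"])

# Provider order decides the accelerator: TensorRT > CUDA > CPU fallback.
providers = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
session = ort.InferenceSession("resnet18.onnx", providers=providers)

x = np.random.randn(1, 3, 224, 224).astype(np.float32)
logits = session.run(None, {"input": x})[0]
print(logits.shape)  # (1, 1000)
```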
## Compute model accuracy on ImageNet1k

Assuming the ImageNet dataset folder is `/data/imagenet`; otherwise, set the location with the `--imagenet` option.

```
python3 benchmark.py --model resnet18 --compute-accuracy
```
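
Top-1/top-5 accuracy is just an evaluation loop over the ImageNet validation split. A minimal sketch of such a loop (assumed preprocessing and batch size; not the repository's implementation):

```python
# Minimal top-1/top-5 evaluation sketch over an ImageFolder-style validation set.
import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val = datasets.ImageFolder("/data/imagenet/val", transform=preprocess)
loader = torch.utils.data.DataLoader(val, batch_size=64, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(weights="DEFAULT").eval().to(device)

top1 = top5 = total = 0
with torch.no_grad():
    for images, targets in loader:
        images, targets = images.to(device), targets.to(device)
        _, pred = model(images).topk(5, dim=1)        # top-5 class indices
        correct = pred.eq(targets.view(-1, 1))
        top1 += correct[:, 0].sum().item()
        top5 += correct.any(dim=1).sum().item()
        total += targets.size(0)

print(f"Top1: {100 * top1 / total:.2f}%  Top5: {100 * top5 / total:.2f}%")
```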
## List all supported models

All `torchvision.models` and `timm` models will be listed:

```
python3 benchmark.py --list-models
```

## Find a specific model
```
python3 benchmark.py --find-model xcit_tiny_24_p16_224
```
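
Both listing and searching ultimately enumerate model names from the two backing libraries. The same can be done directly in Python (an illustration, not the repository's CLI; `torchvision.models.list_models` requires torchvision >= 0.14):

```python
# Enumerate and filter model names directly from timm and torchvision
# (illustration only; the benchmark.py CLI wraps functionality like this).
import timm
import torchvision

print(len(timm.list_models()), "timm models")
print(len(torchvision.models.list_models()), "torchvision models")

# Wildcard search, e.g. for xcit_tiny variants:
print(timm.list_models("xcit_tiny_24_p16*"))
```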
## Other models

- Latency in usec

| **Accelerator** | **R50** | **MV2** | **MV3** | **SV2** | **Sq** | **SwV2** | **De** | **Ef0** | **CNext** | **RN4X** | **RN64X** |
| :--- | ---: | --: | ---: | --: | ---: | --: | --: | --: | --: | --: | --: |
| CPU | 29,840 | 11,870 | 6,498 | 6,607 | 8,717 | 52,120 | 14,952 | 14,089 | 33,182 | 11,068 | 41,301 |
| CPU + ONNX | 10,666 | 2,564 | 4,484 | 2,479 | 3,136 | 50,094 | 10,484 | 8,356 | 28,055 | 1,990 | 14,358 |
| GPU | 1,982 | 4,781 | 3,689 | 4,135 | 1,741 | 6,963 | 3,526 | 5,817 | 3,588 | 5,886 | 6,050 |
| GPU + ONNX | 2,715 | 1,107 | 1,128 | 1,392 | 851 | 3,731 | 1,650 | 2,175 | 2,789 | 1,525 | 3,280 |
| GPU + ONNX + TensorRT | 1,881 | 670 | 570 | 404 | 443 | 3,327 | 1,170 | 1,250 | 2,630 | 1,137 | 2,283 |

R50 - `resnet50`, MV2 - `mobilenet_v2`, MV3 - `mobilenet_v3_small`, SV2 - `shufflenet_v2_x0_5`, Sq - `squeezenet1_0`, SwV2 - `swinv2_cr_tiny_ns_224`, De - `deit_tiny_patch16_224`, Ef0 - `efficientnet_b0`, CNext - `convnext_tiny`, RN4X - `regnetx_004`, RN64X - `regnetx_064`
- Parameters and FLOPS (a reproduction sketch follows the table)

| **Model** | **Parameters (M)** | **GFLOPS** | **Top1 (%)** | **Top5 (%)** |
| :--- | ---: | --: | --: | --: |
| `resnet18` | 11.7 | 1.8 | 69.76 | 89.08 |
| `resnet50` | 25.6 | 4.1 | 80.11 | 94.49 |
| `mobilenet_v2` | 3.5 | 0.3 | 71.87 | 90.29 |
| `mobilenet_v3_small` | 2.5 | 0.06 | 67.67 | 87.41 |
| `shufflenet_v2_x0_5` | 1.4 | 0.04 | 60.55 | 81.74 |
| `squeezenet1_0` | 1.2 | 0.8 | 58.10 | 80.42 |
| `swinv2_cr_tiny_ns_224` | 28.3 | 4.7 | 81.54 | 95.77 |
| `deit_tiny_patch16_224` | 5.7 | 1.3 | 72.02 | 91.10 |
| `efficientnet_b0` | 5.3 | 0.4 | 77.67 | 93.58 |
| `convnext_tiny` | 28.6 | 4.5 | 82.13 | 95.95 |
| `regnetx_004` | 5.2 | 0.4 | 72.30 | 90.59 |
| `regnetx_064` | 26.2 | 6.5 | 78.90 | 94.44 |
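
The parameter and GFLOPS columns can be approximated independently with the same kind of counter sketched earlier; here is a sketch for a few of the `timm` models in the table (fvcore is an assumed counter, and transformer attention ops may be only partially covered, so the numbers are approximate):

```python
# Rough reproduction of the Parameters/GFLOPS columns for a few timm models
# (illustration only; fvcore is an assumed counter and results may differ slightly).
import timm
import torch
from fvcore.nn import FlopCountAnalysis

x = torch.randn(1, 3, 224, 224)
for name in ["swinv2_cr_tiny_ns_224", "deit_tiny_patch16_224", "regnetx_004"]:
    model = timm.create_model(name, pretrained=False).eval()
    params = sum(p.numel() for p in model.parameters()) / 1e6
    gflops = FlopCountAnalysis(model, x).total() / 1e9
    print(f"{name:25s} {params:6.1f} M params  {gflops:5.2f} GFLOPS")
```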