{"id":18653306,"url":"https://github.com/roatienza/benchmark","last_synced_at":"2025-04-11T16:32:24.732Z","repository":{"id":52752757,"uuid":"520695221","full_name":"roatienza/benchmark","owner":"roatienza","description":"Utilities to perform deep learning models benchmarking (number of parameters, FLOPS and inference latency)","archived":false,"fork":false,"pushed_at":"2022-08-07T05:40:29.000Z","size":93,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2023-03-08T14:41:08.267Z","etag":null,"topics":["deep-learning","flops","latency","model","parameters"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/roatienza.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-08-03T01:05:47.000Z","updated_at":"2023-03-07T11:53:12.000Z","dependencies_parsed_at":"2022-08-22T01:40:17.970Z","dependency_job_id":null,"html_url":"https://github.com/roatienza/benchmark","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roatienza%2Fbenchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roatienza%2Fbenchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roatienza%2Fbenchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/roatienza%2Fbenchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/roatienza","download_url":"https://codeload.github.com/roatienza/b
enchmark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223472762,"owners_count":17150745,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","flops","latency","model","parameters"],"created_at":"2024-11-07T07:11:05.267Z","updated_at":"2024-11-07T07:11:06.147Z","avatar_url":"https://github.com/roatienza.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `benchmark`\nUtilities for benchmarking deep learning models.\n\nInference efficiency is a major concern when deploying deep learning models. Efficiency is quantified as the Pareto-optimal trade-off between the target metric (e.g., accuracy) and the model's cost: number of parameters, computational complexity (FLOPS), and latency. `benchmark` is a tool to compute parameters, FLOPS, and latency. The sample usage below shows how to determine the number of parameters and FLOPS, and how latency improves with the accelerator and model format. The lowest latency is obtained when both ONNX and TensorRT are used.\n\n## FLOPS, Parameters and Latency of ResNet18\n\nThe experiment was performed on GPU: Quadro RTX 6000 24GB, CPU: AMD Ryzen Threadripper 3970X 32-Core Processor. 
Assuming 1k classes, a `224x224x3` image, and a batch size of `1`.\n```\nFLOPS: 1,819,065,856\nParameters: 11,689,512\n```\n\n| **Accelerator** | **Latency (usec)** | **Speedup (x)** |\n| :--- | ---: | --: |\n| CPU | 8,550 | 1 |\n| CPU + ONNX | 3,830 | 2.7 |\n| GPU | 1,982 | 5.4 |\n| GPU + ONNX | 1,218 | 8.8 |\n| GPU + ONNX + TensorRT | 917 | 11.7 |\n\n\n## Install requirements\n```\npip3 install -r requirements.txt\n```\n\nAdditional packages.\n\n- CUDA:\nRemove the old CUDA toolkit.\n```\nconda uninstall cudatoolkit\n```\nInstall the new cuDNN.\n```\nconda install cudnn\n```\n\n- [TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-pip)\n```\npython3 -m pip install --upgrade setuptools pip\npython3 -m pip install nvidia-pyindex\npython3 -m pip install --upgrade nvidia-tensorrt\n```\n\n- (Optional) Torch-TensorRT\n```\npip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases\n```\nWarning: this step requires superuser access.\n```\nsudo apt install python3-libnvinfer-dev python3-libnvinfer\n```\n\n## Sample benchmarking of `resnet18`\n\n- GPU + ONNX + TensorRT\n```\npython3 benchmark.py --model resnet18 --onnx --tensorrt\n```\n\n- GPU + ONNX\n```\npython3 benchmark.py --model resnet18 --onnx\n```\n\n- GPU\n```\npython3 benchmark.py --model resnet18\n```\n\n- CPU\n```\npython3 benchmark.py --model resnet18 --device cpu\n```\n\n- CPU + ONNX\n```\npython3 benchmark.py --model resnet18 --device cpu --onnx\n```\n\n## Compute model accuracy on ImageNet1k\nThe ImageNet dataset folder is assumed to be `/data/imagenet`. 
Otherwise, set the location with the `--imagenet` option.\n\n```\npython3 benchmark.py --model resnet18 --compute-accuracy\n```\n\n## List all supported models\nAll `torchvision.models` and `timm` models will be listed:\n\n```\npython3 benchmark.py --list-models\n```\n\n## Find a specific model\n\n```\npython3 benchmark.py --find-model xcit_tiny_24_p16_224\n```\n\n## Other models\n- Latency in usec\n\n| **Accelerator** | **R50** | **MV2** | **MV3** | **SV2** | **Sq** | **SwV2** | **De** | **Ef0** | **CNext** | **RN4X** | **RN64X** |\n| :--- | ---: | --: | ---: | --: | ---: | --: | --: | --: | --: | --: | --: |\n| CPU | 29,840 | 11,870 | 6,498 | 6,607 | 8,717 | 52,120 | 14,952 | 14,089 | 33,182 | 11,068 | 41,301 |\n| CPU + ONNX | 10,666 | 2,564 | 4,484 | 2,479 | 3,136 | 50,094 | 10,484 | 8,356 | 28,055 | 1,990 | 14,358 |\n| GPU | 1,982 | 4,781 | 3,689 | 4,135 | 1,741 | 6,963 | 3,526 | 5,817 | 3,588 | 5,886 | 6,050 |\n| GPU + ONNX | 2,715 | 1,107 | 1,128 | 1,392 | 851 | 3,731 | 1,650 | 2,175 | 2,789 | 1,525 | 3,280 |\n| GPU + ONNX + TensorRT | 1,881 | 670 | 570 | 404 | 443 | 3,327 | 1,170 | 1,250 | 2,630 | 1,137 | 2,283 |\n\nR50 - `resnet50`, MV2 - `mobilenet_v2`, MV3 - `mobilenet_v3_small`, SV2 - `shufflenet_v2_x0_5`, Sq - `squeezenet1_0`, SwV2 - `swinv2_cr_tiny_ns_224`, De - `deit_tiny_patch16_224`, Ef0 - `efficientnet_b0`, CNext - `convnext_tiny`, RN4X - `regnetx_004`, RN64X - `regnetx_064`\n\n- Parameters and FLOPS\n\n| **Model** | **Parameters (M)** | **GFLOPS** | **Top1 (%)** | **Top5 (%)** |\n| :--- | ---: | --: | --: | --: |\n| `resnet18` | 11.7 | 1.8 | 69.76 | 89.08 |\n| `resnet50` | 25.6 | 4.1 | 80.11 | 94.49 |\n| `mobilenet_v2` | 3.5 | 0.3 | 71.87 | 90.29 |\n| `mobilenet_v3_small` | 2.5 | 0.06 | 67.67 | 87.41 |\n| `shufflenet_v2_x0_5` | 1.4 | 0.04 | 60.55 | 81.74 |\n| `squeezenet1_0` | 1.2 | 0.8 | 58.10 | 80.42 |\n| `swinv2_cr_tiny_ns_224` | 28.3 | 4.7 | 81.54 | 95.77 |\n| `deit_tiny_patch16_224` | 5.7 | 1.3 | 72.02 | 91.10 |\n| `efficientnet_b0` | 5.3 
| 0.4 | 77.67 |  93.58 |\n| `convnext_tiny` | 28.6 | 4.5 | 82.13 | 95.95 |\n| `regnetx_004` | 5.2 | 0.4 | 72.30 | 90.59 |\n| `regnetx_064` | 26.2 | 6.5 | 78.90 | 94.44 |\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froatienza%2Fbenchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froatienza%2Fbenchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froatienza%2Fbenchmark/lists"}