# Deployment of BEV 3D Detection on TensorRT

This repository is a deployment project for BEV 3D detection (including [BEVFormer](https://github.com/fundamentalvision/BEVFormer) and [BEVDet](https://github.com/HuangJunJie2017/BEVDet)) on [TensorRT](https://developer.nvidia.com/tensorrt), supporting **FP32/FP16/INT8** inference. To improve the inference speed of BEVFormer on TensorRT, this project implements several TensorRT ops that support [**nv_half**](https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH____HALF__ARITHMETIC.html#group__CUDA__MATH____HALF__ARITHMETIC), [**nv_half2**](https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH____HALF2__ARITHMETIC.html#group__CUDA__MATH____HALF2__ARITHMETIC) and **INT8**. With accuracy almost unaffected, the inference speed of **BEVFormer base** can be increased by more than **four times**, the engine size reduced by more than **90%**, and GPU memory usage cut by more than **80%**. In addition, the project supports common 2D object detection models from [MMDetection](https://github.com/open-mmlab/mmdetection), which can be quantized to **INT8** and deployed on **TensorRT** with only a small number of code changes.

## Benchmarks

### BEVFormer

#### BEVFormer PyTorch

| Model | Data | Batch Size | NDS/mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BEVFormer tiny<br />[download](https://drive.google.com/file/d/1VJ_WgA9UZZ-DJr5kUFYAFfIC0UF1oCeQ/view?usp=share_link) | NuScenes | 1 | NDS: 0.354<br/>mAP: 0.252 | 15.9 | 383 | 2167 | RTX 3090 |
| BEVFormer small<br />[download](https://drive.google.com/file/d/1n5_Ca2bqkCY3Q19M5U7futKUT1pEyQZ7/view?usp=sharing) | NuScenes | 1 | NDS: 0.478<br/>mAP: 0.370 | 5.1 | 680 | 3147 | RTX 3090 |
| BEVFormer base<br />[download](https://drive.google.com/file/d/1UMN35jPHeJbVK-8P4HhxIZl341KS1bDU/view?usp=share_link) | NuScenes | 1 | NDS: 0.517<br/>mAP: 0.416 | 2.4 | 265 | 5435 | RTX 3090 |

#### BEVFormer TensorRT with MMDeploy Plugins (Only Support FP32)

| Model | Data | Batch Size | Float/Int | Quantization Method | NDS/mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BEVFormer tiny | NuScenes | 1 | FP32 | - | NDS: 0.354<br/>mAP: 0.252 | 37.9 (x1) | 136 (x1) | 2159 (x1) | RTX 3090 |
| BEVFormer tiny | NuScenes | 1 | FP16 | - | NDS: 0.354<br/>mAP: 0.252 | 69.2 (x1.83) | 74 (x0.54) | 1729 (x0.80) | RTX 3090 |
| BEVFormer tiny | NuScenes | 1 | FP32/INT8 | PTQ entropy<br />per-tensor | NDS: 0.353<br/>mAP: 0.249 | 65.1 (x1.72) | 58 (x0.43) | 1737 (x0.80) | RTX 3090 |
| BEVFormer tiny | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.353<br/>mAP: 0.249 | 70.7 (x1.87) | 54 (x0.40) | 1665 (x0.77) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP32 | - | NDS: 0.478<br/>mAP: 0.370 | 6.6 (x1) | 245 (x1) | 4663 (x1) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP16 | - | NDS: 0.478<br/>mAP: 0.370 | 12.8 (x1.94) | 126 (x0.51) | 3719 (x0.80) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP32/INT8 | PTQ entropy<br />per-tensor | NDS: 0.476<br/>mAP: 0.367 | 8.7 (x1.32) | 158 (x0.64) | 4079 (x0.87) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.477<br/>mAP: 0.368 | 13.3 (x2.02) | 106 (x0.43) | 3441 (x0.74) | RTX 3090 |
| BEVFormer base **\*** | NuScenes | 1 | FP32 | - | NDS: 0.517<br/>mAP: 0.416 | 1.5 (x1) | 1689 (x1) | 13893 (x1) | RTX 3090 |
| BEVFormer base | NuScenes | 1 | FP16 | - | NDS: 0.517<br/>mAP: 0.416 | 1.8 (x1.20) | 849 (x0.50) | 11865 (x0.85) | RTX 3090 |
| BEVFormer base **\*** | NuScenes | 1 | FP32/INT8 | PTQ entropy<br />per-tensor | NDS: 0.516<br/>mAP: 0.414 | 1.8 (x1.20) | 426 (x0.25) | 12429 (x0.89) | RTX 3090 |
| BEVFormer base **\*** | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.515<br/>mAP: 0.414 | 2.2 (x1.47) | 244 (x0.14) | 11011 (x0.79) | RTX 3090 |

**\*** These engines ran `Out of Memory` during onnx2trt with TensorRT-8.5.1.7 but converted successfully with TensorRT-8.4.3.1, so they were built with TensorRT-8.4.3.1.

#### BEVFormer TensorRT with Custom Plugins (Support nv_half, nv_half2 and int8)

**FP16 Plugins with nv_half**

| Model | Data | Batch Size | Float/Int | Quantization Method | NDS/mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BEVFormer tiny | NuScenes | 1 | FP32 | - | NDS: 0.354<br/>mAP: 0.252 | 40.0 (x1.06) | 135 (x0.99) | 1693 (x0.78) | RTX 3090 |
| BEVFormer tiny | NuScenes | 1 | FP16 | - | NDS: 0.355<br/>mAP: 0.252 | 81.2 (x2.14) | 70 (x0.51) | 1203 (x0.56) | RTX 3090 |
| BEVFormer tiny | NuScenes | 1 | FP32/INT8 | PTQ entropy<br />per-tensor | NDS: 0.351<br/>mAP: 0.249 | 90.1 (x2.38) | 58 (x0.43) | 1105 (x0.51) | RTX 3090 |
| BEVFormer tiny | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.351<br/>mAP: 0.249 | 107.4 (x2.83) | 52 (x0.38) | 1095 (x0.51) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP32 | - | NDS: 0.478<br/>mAP: 0.37 | 7.4 (x1.12) | 250 (x1.02) | 2585 (x0.55) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP16 | - | NDS: 0.479<br/>mAP: 0.37 | 15.8 (x2.40) | 127 (x0.52) | 1729 (x0.37) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP32/INT8 | PTQ entropy<br />per-tensor | NDS: 0.477<br/>mAP: 0.367 | 17.9 (x2.71) | 166 (x0.68) | 1637 (x0.35) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.476<br/>mAP: 0.366 | 20.4 (x3.10) | 108 (x0.44) | 1467 (x0.31) | RTX 3090 |
| BEVFormer base | NuScenes | 1 | FP32 | - | NDS: 0.517<br/>mAP: 0.416 | 3.0 (x2.00) | 292 (x0.17) | 5715 (x0.41) | RTX 3090 |
| BEVFormer base | NuScenes | 1 | FP16 | - | NDS: 0.517<br/>mAP: 0.416 | 4.9 (x3.27) | 148 (x0.09) | 3417 (x0.25) | RTX 3090 |
| BEVFormer base | NuScenes | 1 | FP32/INT8 | PTQ entropy<br />per-tensor | NDS: 0.515<br/>mAP: 0.414 | 6.9 (x4.60) | 202 (x0.12) | 3307 (x0.24) | RTX 3090 |
| BEVFormer base | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.514<br/>mAP: 0.413 | 8.0 (x5.33) | 131 (x0.08) | 2429 (x0.17) | RTX 3090 |

**FP16 Plugins with nv_half2**

| Model | Data | Batch Size | Float/Int | Quantization Method | NDS/mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BEVFormer tiny | NuScenes | 1 | FP16 | - | NDS: 0.355<br/>mAP: 0.251 | 84.2 (x2.22) | 72 (x0.53) | 1205 (x0.56) | RTX 3090 |
| BEVFormer tiny | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.354<br/>mAP: 0.250 | 108.3 (x2.86) | 52 (x0.38) | 1093 (x0.51) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP16 | - | NDS: 0.479<br/>mAP: 0.371 | 18.6 (x2.82) | 124 (x0.51) | 1725 (x0.37) | RTX 3090 |
| BEVFormer small | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.477<br/>mAP: 0.368 | 22.9 (x3.47) | 110 (x0.45) | 1487 (x0.32) | RTX 3090 |
| BEVFormer base | NuScenes | 1 | FP16 | - | NDS: 0.517<br/>mAP: 0.416 | 6.6 (x4.40) | 146 (x0.09) | 3415 (x0.25) | RTX 3090 |
| BEVFormer base | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.516<br/>mAP: 0.415 | 8.6 (x5.73) | 159 (x0.09) | 2479 (x0.18) | RTX 3090 |

### BEVDet

#### BEVDet PyTorch

| Model | Data | Batch Size | NDS/mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BEVDet R50 CBGS | NuScenes | 1 | NDS: 0.38<br/>mAP: 0.298 | 29.0 | 170 | 1858 | RTX 2080Ti |

#### BEVDet TensorRT

**with Custom Plugin bev_pool_v2 (Support nv_half, nv_half2 and int8), modified from [Official BEVDet](https://github.com/HuangJunJie2017/BEVDet)**

| Model | Data | Batch Size | Float/Int | Quantization Method | NDS/mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| BEVDet R50 CBGS | NuScenes | 1 | FP32 | - | NDS: 0.38<br/>mAP: 0.298 | 44.6 | 245 | 1032 | RTX 2080Ti |
| BEVDet R50 CBGS | NuScenes | 1 | FP16 | - | NDS: 0.38<br/>mAP: 0.298 | 135.1 | 86 | 790 | RTX 2080Ti |
| BEVDet R50 CBGS | NuScenes | 1 | FP32/INT8 | PTQ entropy<br />per-tensor | NDS: 0.355<br/>mAP: 0.274 | 234.7 | 44 | 706 | RTX 2080Ti |
| BEVDet R50 CBGS | NuScenes | 1 | FP16/INT8 | PTQ entropy<br />per-tensor | NDS: 0.357<br/>mAP: 0.277 | 236.4 | 44 | 706 | RTX 2080Ti |

### 2D Detection Models

This project also supports common 2D object detection models from MMDetection with only minor modifications. The following are deployment examples of YOLOx and CenterNet.

#### YOLOx

| Model | Data | Framework | Batch Size | Float/Int | Quantization Method | mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| YOLOx<br />[download](https://drive.google.com/file/d/10_mRoiLfK1JEIVq2uqtkBLHES9QgoFgm/view?usp=share_link) | COCO | PyTorch | 32 | FP32 | - | mAP: 0.506 | 63.1 | 379 | 7617 | RTX 3090 |
| YOLOx | COCO | TensorRT | 32 | FP32 | - | mAP: 0.506 | 71.3 (x1) | 546 (x1) | 9943 (x1) | RTX 3090 |
| YOLOx | COCO | TensorRT | 32 | FP16 | - | mAP: 0.506 | 296.8 (x4.16) | 192 (x0.35) | 4567 (x0.46) | RTX 3090 |
| YOLOx | COCO | TensorRT | 32 | FP32/INT8 | PTQ entropy<br />per-tensor | mAP: 0.488 | 556.4 (x7.80) | 99 (x0.18) | 5225 (x0.53) | RTX 3090 |
| YOLOx | COCO | TensorRT | 32 | FP16/INT8 | PTQ entropy<br />per-tensor | mAP: 0.479 | 550.6 (x7.72) | 99 (x0.18) | 5119 (x0.51) | RTX 3090 |

#### CenterNet

| Model | Data | Framework | Batch Size | Float/Int | Quantization Method | mAP | FPS | Size (MB) | Memory (MB) | Device |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| CenterNet<br />[download](https://drive.google.com/file/d/1uZVIpDQEWIgY5-1IivQte-9Kue6k71A9/view?usp=share_link) | COCO | PyTorch | 32 | FP32 | - | mAP: 0.299 | 337.4 | 56 | 5171 | RTX 3090 |
| CenterNet | COCO | TensorRT | 32 | FP32 | - | mAP: 0.299 | 475.6 (x1) | 58 (x1) | 8241 (x1) | RTX 3090 |
| CenterNet | COCO | TensorRT | 32 | FP16 | - | mAP: 0.297 | 1247.1 (x2.62) | 29 (x0.50) | 5183 (x0.63) | RTX 3090 |
| CenterNet | COCO | TensorRT | 32 | FP32/INT8 | PTQ entropy<br />per-tensor | mAP: 0.27 | 1534.0 (x3.22) | 20 (x0.34) | 6549 (x0.79) | RTX 3090 |
| CenterNet | COCO | TensorRT | 32 | FP16/INT8 | PTQ entropy<br />per-tensor | mAP: 0.285 | 1889.0 (x3.97) | 17 (x0.29) | 6453 (x0.78) | RTX 3090 |

## Clone

```shell
git clone git@github.com:DerryHub/BEVFormer_tensorrt.git
cd BEVFormer_tensorrt
PROJECT_DIR=$(pwd)
```

## Data Preparation

### MS COCO (For 2D Detection)

Download the [COCO 2017](https://cocodataset.org/#download) datasets to `/path/to/coco` and unzip them.

```shell
cd ${PROJECT_DIR}/data
ln -s /path/to/coco coco
```

### NuScenes and CAN bus (For BEVFormer)

Download the nuScenes V1.0 full dataset and the CAN bus expansion data [HERE](https://www.nuscenes.org/download) as `/path/to/nuscenes` and `/path/to/can_bus`.

Prepare the nuScenes data following [BEVFormer](https://github.com/fundamentalvision/BEVFormer/blob/master/docs/prepare_dataset.md).

```shell
cd ${PROJECT_DIR}/data
ln -s /path/to/nuscenes nuscenes
ln -s /path/to/can_bus can_bus

cd ${PROJECT_DIR}
sh samples/bevformer/create_data.sh
```

### Tree

```shell
${PROJECT_DIR}/data/.
├── can_bus
│   ├── scene-0001_meta.json
│   ├── scene-0001_ms_imu.json
│   ├── scene-0001_pose.json
│   └── ...
├── coco
│   ├── annotations
│   ├── test2017
│   ├── train2017
│   └── val2017
└── nuscenes
    ├── maps
    ├── samples
    ├── sweeps
    └── v1.0-trainval
```

## Install

### With Docker

```shell
cd ${PROJECT_DIR}
docker build -t trt85 -f docker/Dockerfile .
docker run -it --gpus all -v ${PROJECT_DIR}:/workspace/BEVFormer_tensorrt/ \
-v /path/to/can_bus:/workspace/BEVFormer_tensorrt/data/can_bus \
-v /path/to/coco:/workspace/BEVFormer_tensorrt/data/coco \
-v /path/to/nuscenes:/workspace/BEVFormer_tensorrt/data/nuscenes \
--shm-size 8G trt85 /bin/bash

# in container
cd /workspace/BEVFormer_tensorrt/TensorRT/build
cmake .. -DCMAKE_TENSORRT_PATH=/usr
make -j$(nproc)
make install
cd /workspace/BEVFormer_tensorrt/third_party/bev_mmdet3d
python setup.py build develop --user
```

**NOTE:** You can download the **Docker Image** [HERE](https://pan.baidu.com/s/1dPR6kvgpUoKow51870KNug?pwd=6xkq).

### From Source

#### CUDA/cuDNN/TensorRT

Download and install `CUDA-11.6/cuDNN-8.6.0/TensorRT-8.5.1.7` following the [NVIDIA](https://www.nvidia.com/en-us/) instructions.

#### PyTorch

Install PyTorch and TorchVision following the [official instructions](https://pytorch.org/get-started/locally/).

```shell
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
```

#### MMCV-full

```shell
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.5.0
pip install -r requirements/optional.txt
MMCV_WITH_OPS=1 pip install -e .
```

#### MMDetection

```shell
git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git checkout v2.25.1
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation.
```

#### MMDeploy

```shell
git clone git@github.com:open-mmlab/mmdeploy.git
cd mmdeploy
git checkout v0.10.0

git clone git@github.com:NVIDIA/cub.git third_party/cub
cd third_party/cub
git checkout c3cceac115

# go back to third_party directory and git clone pybind11
cd ..
git clone git@github.com:pybind/pybind11.git pybind11
cd pybind11
git checkout 70a58c5
```

##### Build TensorRT Plugins of MMDeploy

**Make sure the cmake version is >= 3.14.0 and the gcc version is >= 7.**

```shell
export MMDEPLOY_DIR=/the/root/path/of/MMDeploy
export TENSORRT_DIR=/the/path/of/tensorrt
export CUDNN_DIR=/the/path/of/cuda

export LD_LIBRARY_PATH=$TENSORRT_DIR/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$CUDNN_DIR/lib64:$LD_LIBRARY_PATH

cd ${MMDEPLOY_DIR}
mkdir -p build
cd build
cmake -DCMAKE_CXX_COMPILER=g++-7 -DMMDEPLOY_TARGET_BACKENDS=trt -DTENSORRT_DIR=${TENSORRT_DIR} -DCUDNN_DIR=${CUDNN_DIR} ..
make -j$(nproc)
make install
```

##### Install MMDeploy

```shell
cd ${MMDEPLOY_DIR}
pip install -v -e .
# "-v" means verbose, or more output
# "-e" means installing a project in editable mode,
# thus any local modifications made to the code will take effect without reinstallation.
```

#### Install this Project

```shell
cd ${PROJECT_DIR}
pip install -r requirements.txt
```

##### Build and Install Custom TensorRT Plugins

**NOTE: CUDA >= 11.4, SM version >= 7.5**

```shell
cd ${PROJECT_DIR}/TensorRT/build
cmake .. -DCMAKE_TENSORRT_PATH=/path/to/TensorRT
make -j$(nproc)
make install
```

**Run Unit Tests of Custom TensorRT Plugins**

```shell
cd ${PROJECT_DIR}
sh samples/test_trt_ops.sh
```

##### Build and Install Part of the Ops in MMDetection3D

```shell
cd ${PROJECT_DIR}/third_party/bev_mmdet3d
python setup.py build develop
```

#### Prepare the Checkpoints

Download the PyTorch checkpoints listed above to `${PROJECT_DIR}/checkpoints/pytorch/`. The ONNX files and TensorRT engines will be saved in `${PROJECT_DIR}/checkpoints/onnx/` and `${PROJECT_DIR}/checkpoints/tensorrt/`.

## Custom TensorRT Plugins

The following common TensorRT ops in BEVFormer are supported:

* Grid Sampler
* Multi-scale Deformable Attention
* Modulated Deformable Conv2d
* Rotate
* Inverse
* BEV Pool V2
* Flash Multi-Head Attention

Each operation is implemented in two versions: **FP32/FP16 (nv_half)/INT8** and **FP32/FP16 (nv_half2)/INT8**.

For a detailed speed comparison, see [**Custom TensorRT Plugins**](./TensorRT/).

## Run

The following tutorial uses `BEVFormer base` as an example.

* Evaluate with PyTorch

```shell
cd ${PROJECT_DIR}
# default gpu_id is 0
sh samples/bevformer/base/pth_evaluate.sh -d ${gpu_id}
```

* Evaluate with TensorRT and MMDeploy Plugins

```shell
# convert .pth to .onnx
sh samples/bevformer/base/pth2onnx.sh -d ${gpu_id}
# convert .onnx to TensorRT engine (FP32)
sh samples/bevformer/base/onnx2trt.sh -d ${gpu_id}
# convert .onnx to TensorRT engine (FP16)
sh samples/bevformer/base/onnx2trt_fp16.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP32)
sh samples/bevformer/base/trt_evaluate.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP16)
sh samples/bevformer/base/trt_evaluate_fp16.sh -d ${gpu_id}

# Quantization
# calibration and convert .onnx to TensorRT engine (FP32/INT8)
sh samples/bevformer/base/onnx2trt_int8.sh -d ${gpu_id}
# calibration and convert .onnx to TensorRT engine (FP16/INT8)
sh samples/bevformer/base/onnx2trt_int8_fp16.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP32/INT8)
sh samples/bevformer/base/trt_evaluate_int8.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP16/INT8)
sh samples/bevformer/base/trt_evaluate_int8_fp16.sh -d ${gpu_id}

# quantization-aware training
# default gpu_ids is 0,1,2,3,4,5,6,7
sh samples/bevformer/base/quant_aware_train.sh -d ${gpu_ids}
# then follow the post-training quantization process above
```

* Evaluate with TensorRT and Custom Plugins

```shell
# nv_half
# convert .pth to .onnx
sh samples/bevformer/plugin/base/pth2onnx.sh -d ${gpu_id}
# convert .onnx to TensorRT engine (FP32)
sh samples/bevformer/plugin/base/onnx2trt.sh -d ${gpu_id}
# convert .onnx to TensorRT engine (FP16-nv_half)
sh samples/bevformer/plugin/base/onnx2trt_fp16.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP32)
sh samples/bevformer/plugin/base/trt_evaluate.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP16-nv_half)
sh samples/bevformer/plugin/base/trt_evaluate_fp16.sh -d ${gpu_id}

# nv_half2
# convert .pth to .onnx
sh samples/bevformer/plugin/base/pth2onnx_2.sh -d ${gpu_id}
# convert .onnx to TensorRT engine (FP16-nv_half2)
sh samples/bevformer/plugin/base/onnx2trt_fp16_2.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP16-nv_half2)
sh samples/bevformer/plugin/base/trt_evaluate_fp16_2.sh -d ${gpu_id}

# Quantization
# nv_half
# calibration and convert .onnx to TensorRT engine (FP32/INT8)
sh samples/bevformer/plugin/base/onnx2trt_int8.sh -d ${gpu_id}
# calibration and convert .onnx to TensorRT engine (FP16-nv_half/INT8)
sh samples/bevformer/plugin/base/onnx2trt_int8_fp16.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP32/INT8)
sh samples/bevformer/plugin/base/trt_evaluate_int8.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP16-nv_half/INT8)
sh samples/bevformer/plugin/base/trt_evaluate_int8_fp16.sh -d ${gpu_id}

# nv_half2
# calibration and convert .onnx to TensorRT engine (FP16-nv_half2/INT8)
sh samples/bevformer/plugin/base/onnx2trt_int8_fp16_2.sh -d ${gpu_id}
# evaluate with TensorRT engine (FP16-nv_half2/INT8)
sh samples/bevformer/plugin/base/trt_evaluate_int8_fp16_2.sh -d ${gpu_id}
```

## Acknowledgement

This project is mainly based on these excellent open source projects:

* [BEVFormer](https://github.com/fundamentalvision/BEVFormer)
* [BEVDet](https://github.com/HuangJunJie2017/BEVDet)
* [PyTorch](https://github.com/pytorch/pytorch)
* [MMCV](https://github.com/open-mmlab/mmcv)
* [MMDetection](https://github.com/open-mmlab/mmdetection)
* [MMDeploy](https://github.com/open-mmlab/mmdeploy)
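
## Appendix: What "PTQ entropy per-tensor" Means

The quantized rows in the benchmark tables all use entropy-based, per-tensor post-training quantization. Conceptually, the calibrator collects a histogram of each tensor's FP32 values, then searches for a clipping threshold whose quantized distribution stays closest (in KL divergence) to the original; the INT8 scale is `threshold / 127`. The sketch below is a simplified NumPy illustration of that idea only, not TensorRT's actual calibrator, and the function names (`kl_divergence`, `entropy_calibrate`) are hypothetical:

```python
# Simplified sketch of entropy (KL-divergence) calibration for symmetric
# per-tensor INT8 quantization. Illustrative only -- TensorRT's built-in
# IInt8EntropyCalibrator2 implements a more refined version of this search.
import numpy as np


def kl_divergence(p, q):
    """KL divergence between two histograms treated as distributions."""
    p = p / p.sum()
    q = q / q.sum()
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def entropy_calibrate(samples, num_bins=2048, num_quant_levels=128):
    """Pick the clipping threshold minimizing KL divergence between the
    FP32 activation histogram and its INT8-collapsed version."""
    hist, edges = np.histogram(np.abs(samples), bins=num_bins)
    best_kl, best_threshold = np.inf, edges[-1]
    for i in range(num_quant_levels, num_bins + 1):
        # Reference distribution: fold everything beyond bin i into the last kept bin.
        ref = hist[:i].astype(np.float64)
        ref[-1] += hist[i:].sum()
        # Quantized distribution: collapse i bins into num_quant_levels levels,
        # then expand back to i bins (uniform over non-empty bins) for comparison.
        chunks = np.array_split(ref, num_quant_levels)
        quant = np.concatenate([
            np.where(c > 0, c.sum() / max(int((c > 0).sum()), 1), 0.0)
            for c in chunks
        ])
        kl = kl_divergence(ref, quant)
        if kl < best_kl:
            best_kl, best_threshold = kl, edges[i]
    return best_threshold / 127.0  # INT8 mapping: q = round(x / scale)


rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, 100_000).astype(np.float32)
scale = entropy_calibrate(acts)
```

Because the threshold search can clip rare outliers, entropy calibration usually yields a smaller scale (finer resolution near zero) than naive max-value calibration, which is why the tables above show only small NDS/mAP drops at INT8.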