{"id":13438374,"url":"https://github.com/wang-xinyu/tensorrtx","last_synced_at":"2025-05-14T00:05:03.589Z","repository":{"id":37305291,"uuid":"223904726","full_name":"wang-xinyu/tensorrtx","owner":"wang-xinyu","description":"Implementation of popular deep learning networks with TensorRT network definition API","archived":false,"fork":false,"pushed_at":"2025-05-03T04:21:57.000Z","size":2225,"stargazers_count":7326,"open_issues_count":5,"forks_count":1814,"subscribers_count":102,"default_branch":"master","last_synced_at":"2025-05-03T05:25:17.834Z","etag":null,"topics":["arcface","crnn","detr","mnasnet","mobilenetv2","mobilenetv3","resnet","retinaface","squeezenet","swin-transformer","tensorrt","yolo11","yolov3","yolov3-spp","yolov4","yolov5","yolov7","yolov8","yolov9"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wang-xinyu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-11-25T09:01:36.000Z","updated_at":"2025-05-02T11:48:56.000Z","dependencies_parsed_at":"2024-01-14T12:17:57.440Z","dependency_job_id":"36bab039-81f5-4289-a48f-b7e94bfd56de","html_url":"https://github.com/wang-xinyu/tensorrtx","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-xinyu%2Ftensorrtx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-xinyu%2Ftensorrtx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositorie
s/wang-xinyu%2Ftensorrtx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-xinyu%2Ftensorrtx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wang-xinyu","download_url":"https://codeload.github.com/wang-xinyu/tensorrtx/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254043326,"owners_count":22004926,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arcface","crnn","detr","mnasnet","mobilenetv2","mobilenetv3","resnet","retinaface","squeezenet","swin-transformer","tensorrt","yolo11","yolov3","yolov3-spp","yolov4","yolov5","yolov7","yolov8","yolov9"],"created_at":"2024-07-31T03:01:04.970Z","updated_at":"2025-05-14T00:05:03.582Z","avatar_url":"https://github.com/wang-xinyu.png","language":"C++","readme":"# TensorRTx\n\nTensorRTx aims to implement popular deep learning networks with the TensorRT network definition API.\n\nWhy do we build networks from scratch with the complex layer APIs, rather than using a parser (ONNX parser, UFF parser, Caffe parser, etc.)? 
I have summarized the advantages as follows.\n- **Flexible**, easy to modify the network, add/delete a layer or input/output tensor, replace a layer, merge layers, integrate preprocessing and postprocessing into the network, etc.\n- **Debuggable**, construct the entire network in an incremental development manner, easy to get intermediate layer results.\n- **Educational**, learn about the network structure during this development, rather than treating everything as a black box.\n\nThe basic workflow of TensorRTx is:\n1. Get the trained models from pytorch, mxnet or tensorflow, etc. Some pytorch models can be found in my repo [pytorchx](https://github.com/wang-xinyu/pytorchx); the rest are from popular open-source repos.\n2. Export the weights to a plain text file -- [.wts file](./tutorials/getting_started.md#the-wts-content-format).\n3. Load the weights in TensorRT, define the network, and build a TensorRT engine.\n4. Load the TensorRT engine and run inference.\n\n## News\n\n- `10 May 2025`. [pranavm-nvidia](https://github.com/pranavm-nvidia): [YOLO11](./yolo11_tripy) written in [Tripy](https://github.com/NVIDIA/TensorRT-Incubator/tree/main/tripy).\n- `2 May 2025`. [fazligorkembal](https://github.com/fazligorkembal): YOLO12\n- `12 Apr 2025`. [pranavm-nvidia](https://github.com/pranavm-nvidia): First [Lenet](https://github.com/wang-xinyu/tensorrtx/tree/master/lenet#tripy-new-tensorrt-python-programming-model) example written in [Tripy](https://github.com/NVIDIA/TensorRT-Incubator/tree/main/tripy).\n- `11 Apr 2025`. [mpj1234](https://github.com/mpj1234): [YOLO11-obb](https://github.com/wang-xinyu/tensorrtx/tree/master/yolo11)\n- `22 Oct 2024`. [lindsayshuo](https://github.com/lindsayshuo): YOLOv8-obb\n- `18 Oct 2024`. [zgjja](https://github.com/zgjja): Refactor docker image.\n- `11 Oct 2024`. [mpj1234](https://github.com/mpj1234): YOLO11\n- `9 Oct 2024`. [Phoenix8215](https://github.com/Phoenix8215): GhostNet V1 and V2.\n- `21 Aug 2024`. 
[Lemonononon](https://github.com/Lemonononon): real-esrgan-general-x4v3\n- `29 Jul 2024`. [mpj1234](https://github.com/mpj1234): Check the YOLOv5, YOLOv8 \u0026 YOLOv10 in TensorRT 10.x API, branch → [trt10](https://github.com/wang-xinyu/tensorrtx/tree/trt10)\n- `29 Jul 2024`. [mpj1234](https://github.com/mpj1234): YOLOv10\n- `21 Jun 2024`. [WuxinrongY](https://github.com/WuxinrongY): YOLOv9-T, YOLOv9-S, YOLOv9-M\n- `28 Apr 2024`. [lindsayshuo](https://github.com/lindsayshuo): YOLOv8-pose\n- `22 Apr 2024`. [B1SH0PP](https://github.com/B1SH0PP): EfficientAd: Accurate Visual Anomaly Detection at Millisecond-Level Latencies.\n- `18 Apr 2024`. [lindsayshuo](https://github.com/lindsayshuo): YOLOv8-p2\n\n## Tutorials\n\n- [How to make a contribution](./tutorials/contribution.md)\n- [Install the dependencies.](./tutorials/install.md)\n- [A guide for quickly getting started, taking lenet5 as a demo.](./tutorials/getting_started.md)\n- [The .wts file content format](./tutorials/getting_started.md#the-wts-content-format)\n- [Frequently Asked Questions (FAQ)](./tutorials/faq.md)\n- [Migrating from TensorRT 4 to 7](./tutorials/migrating_from_tensorrt_4_to_7.md)\n- [How to implement multi-GPU processing, taking YOLOv4 as an example](./tutorials/multi_GPU_processing.md)\n- [Check if your GPU supports FP16/INT8](./tutorials/check_fp16_int8_support.md)\n- [How to Compile and Run on Windows](./tutorials/run_on_windows.md)\n- [Deploy YOLOv4 with Triton Inference Server](https://github.com/isarsoft/yolov4-triton-tensorrt)\n- [From pytorch to trt step by step, hrnet as an example (Chinese)](./tutorials/from_pytorch_to_trt_stepbystep_hrnet.md)\n\n## Test Environment\n\n1. TensorRT 7.x\n2. 
TensorRT 8.x (some of the models support 8.x)\n\n## How to run\n\nEach folder contains a readme that explains how to run the models in it.\n\n## Models\n\nThe following models are implemented.\n\n|Name | Description |\n|-|-|\n|[mlp](./mlp) | the very basic model for starters, properly documented |\n|[lenet](./lenet) | the simplest, as a \"hello world\" of this project |\n|[alexnet](./alexnet)| easy to implement, all layers are supported in tensorrt |\n|[googlenet](./googlenet)| GoogLeNet (Inception v1) |\n|[inception](./inception)| Inception v3, v4 |\n|[mnasnet](./mnasnet)| MNASNet with depth multiplier of 0.5 from the paper |\n|[mobilenet](./mobilenet)| MobileNet v2, v3-small, v3-large |\n|[resnet](./resnet)| resnet-18, resnet-50 and resnext50-32x4d are implemented |\n|[senet](./senet)| se-resnet50 |\n|[shufflenet](./shufflenetv2)| ShuffleNet v2 with 0.5x output channels |\n|[squeezenet](./squeezenet)| SqueezeNet 1.1 model |\n|[vgg](./vgg)| VGG 11-layer model |\n|[yolov3-tiny](./yolov3-tiny)| weights and pytorch implementation from [ultralytics/yolov3](https://github.com/ultralytics/yolov3) |\n|[yolov3](./yolov3)| darknet-53, weights and pytorch implementation from [ultralytics/yolov3](https://github.com/ultralytics/yolov3) |\n|[yolov3-spp](./yolov3-spp)| darknet-53, weights and pytorch implementation from [ultralytics/yolov3](https://github.com/ultralytics/yolov3) |\n|[yolov4](./yolov4)| CSPDarknet53, weights from [AlexeyAB/darknet](https://github.com/AlexeyAB/darknet#pre-trained-models), pytorch implementation from [ultralytics/yolov3](https://github.com/ultralytics/yolov3) |\n|[yolov5](./yolov5)| yolov5 v1.0-v7.0 of [ultralytics/yolov5](https://github.com/ultralytics/yolov5), detection, classification and instance segmentation |\n|[yolov7](./yolov7)| yolov7 v0.1, pytorch implementation from [WongKinYiu/yolov7](https://github.com/WongKinYiu/yolov7) |\n|[yolov8](./yolov8)| yolov8, pytorch implementation from 
[ultralytics](https://github.com/ultralytics/ultralytics) |\n|[yolov9](./yolov9)| The Pytorch implementation is [WongKinYiu/yolov9](https://github.com/WongKinYiu/yolov9). |\n|[yolov10](./yolov10)| The Pytorch implementation is [THU-MIG/yolov10](https://github.com/THU-MIG/yolov10). |\n|[yolo11](./yolo11)| The Pytorch implementation is [ultralytics](https://github.com/ultralytics/ultralytics). |\n|[yolo12](./yolov12)| The Pytorch implementation is [ultralytics](https://github.com/ultralytics/ultralytics). |\n|[yolop](./yolop)| yolop, pytorch implementation from [hustvl/YOLOP](https://github.com/hustvl/YOLOP) |\n|[retinaface](./retinaface)| resnet50 and mobilenet0.25, weights from [biubug6/Pytorch_Retinaface](https://github.com/biubug6/Pytorch_Retinaface) |\n|[arcface](./arcface)| LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from [deepinsight/insightface](https://github.com/deepinsight/insightface) |\n|[retinafaceAntiCov](./retinafaceAntiCov)| mobilenet0.25, weights from [deepinsight/insightface](https://github.com/deepinsight/insightface), retinaface anti-COVID-19, detects face and mask attributes |\n|[dbnet](./dbnet)| Scene Text Detection, weights from [BaofengZan/DBNet.pytorch](https://github.com/BaofengZan/DBNet.pytorch) |\n|[crnn](./crnn)| pytorch implementation from [meijieru/crnn.pytorch](https://github.com/meijieru/crnn.pytorch) |\n|[ufld](./ufld)| pytorch implementation from [Ultra-Fast-Lane-Detection](https://github.com/cfzd/Ultra-Fast-Lane-Detection), ECCV2020 |\n|[hrnet](./hrnet)| hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from [HRNet-Image-Classification](https://github.com/HRNet/HRNet-Image-Classification) and [HRNet-Semantic-Segmentation](https://github.com/HRNet/HRNet-Semantic-Segmentation) |\n|[psenet](./psenet)| PSENet Text Detection, tensorflow implementation from [liuheng92/tensorflow_PSENet](https://github.com/liuheng92/tensorflow_PSENet) |\n|[ibnnet](./ibnnet)| IBN-Net, pytorch implementation from 
[XingangPan/IBN-Net](https://github.com/XingangPan/IBN-Net), ECCV2018 |\n|[unet](./unet)| U-Net, pytorch implementation from [milesial/Pytorch-UNet](https://github.com/milesial/Pytorch-UNet) |\n|[repvgg](./repvgg)| RepVGG, pytorch implementation from [DingXiaoH/RepVGG](https://github.com/DingXiaoH/RepVGG) |\n|[lprnet](./lprnet)| LPRNet, pytorch implementation from [xuexingyu24/License_Plate_Detection_Pytorch](https://github.com/xuexingyu24/License_Plate_Detection_Pytorch) |\n|[refinedet](./refinedet)| RefineDet, pytorch implementation from [luuuyi/RefineDet.PyTorch](https://github.com/luuuyi/RefineDet.PyTorch) |\n|[densenet](./densenet)| DenseNet-121, from torchvision.models |\n|[rcnn](./rcnn)| FasterRCNN and MaskRCNN, model from [detectron2](https://github.com/facebookresearch/detectron2) |\n|[tsm](./tsm)| TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019 |\n|[scaled-yolov4](./scaled-yolov4)| yolov4-csp, pytorch from [WongKinYiu/ScaledYOLOv4](https://github.com/WongKinYiu/ScaledYOLOv4) |\n|[centernet](./centernet)| CenterNet DLA-34, pytorch from [xingyizhou/CenterNet](https://github.com/xingyizhou/CenterNet) |\n|[efficientnet](./efficientnet)| EfficientNet b0-b8 and l2, pytorch from [lukemelas/EfficientNet-PyTorch](https://github.com/lukemelas/EfficientNet-PyTorch) |\n|[detr](./detr)| DE⫶TR, pytorch from [facebookresearch/detr](https://github.com/facebookresearch/detr) |\n|[swin-transformer](./swin-transformer)| Swin Transformer - Semantic Segmentation, only supports Swin-T. The Pytorch implementation is [microsoft/Swin-Transformer](https://github.com/microsoft/Swin-Transformer.git) |\n|[real-esrgan](./real-esrgan)| Real-ESRGAN. The Pytorch implementation is [real-esrgan](https://github.com/xinntao/Real-ESRGAN) |\n|[superpoint](./superpoint)| SuperPoint. The Pytorch model is from [magicleap/SuperPointPretrainedNetwork](https://github.com/magicleap/SuperPointPretrainedNetwork) |\n|[csrnet](./csrnet)| CSRNet. 
The Pytorch implementation is [leeyeehoo/CSRNet-pytorch](https://github.com/leeyeehoo/CSRNet-pytorch) |\n|[EfficientAd](./efficient_ad)| EfficientAd: Accurate Visual Anomaly Detection at Millisecond-Level Latencies. From [anomalib](https://github.com/openvinotoolkit/anomalib) |\n\n## Model Zoo\n\nThe .wts files can be downloaded from the model zoo for quick evaluation, but it is recommended to convert the .wts from a pytorch/mxnet/tensorflow model yourself, so that you can retrain your own model.\n\n[GoogleDrive](https://drive.google.com/drive/folders/1Ri0IDa5OChtcA3zjqRTW57uG6TnfN4Do?usp=sharing) | [BaiduPan](https://pan.baidu.com/s/19s6hO8esU7-TtZEXN7G3OA) pwd: uvv2\n\n## Tricky Operations\n\nSome tricky operations encountered in these models have already been solved, but there might be better solutions.\n\n|Name | Description |\n|-|-|\n|BatchNorm| Implemented by a scale layer, used in resnet, googlenet, mobilenet, etc. |\n|MaxPool2d(ceil_mode=True)| use a padding layer before maxpool to handle ceil_mode=True, see googlenet. |\n|average pool with padding| use setAverageCountExcludesPadding() when necessary, see inception. |\n|relu6| use `Relu6(x) = Relu(x) - Relu(x-6)`, see mobilenet. |\n|torch.chunk()| implement the 'chunk(2, dim=C)' by a tensorrt plugin, see shufflenet. |\n|channel shuffle| use two shuffle layers to implement `channel_shuffle`, see shufflenet. |\n|adaptive pool| use a fixed input dimension, and use regular average pooling, see shufflenet. |\n|leaky relu| I wrote a leaky relu plugin, but PRelu in `NvInferPlugin.h` can be used, see yolov3 in branch `trt4`. |\n|yolo layer v1| the yolo layer is implemented as a plugin, see yolov3 in branch `trt4`. |\n|yolo layer v2| three yolo layers implemented in one plugin, see yolov3-spp. |\n|upsample| replaced by a deconvolution layer, see yolov3. |\n|hsigmoid| hard sigmoid is implemented as a plugin; hsigmoid and hswish are used in mobilenetv3 |\n|retinaface output decode| implement a plugin to decode bbox, confidence and landmarks, see retinaface. 
|\n|mish| mish activation is implemented as a plugin, mish is used in yolov4 |\n|prelu| mxnet's prelu activation with trainable gamma is implemented as a plugin, used in arcface |\n|HardSwish| hard_swish = x * hard_sigmoid, used in yolov5 v3.0 |\n|LSTM| Implemented pytorch nn.LSTM() with tensorrt api |\n\n## Speed Benchmark\n\n| Models | Device | BatchSize | Mode | Input Shape(HxW) | FPS |\n|-|-|:-:|:-:|:-:|:-:|\n| YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333 |\n| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2 |\n| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4 |\n| YOLOv3-spp(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5 |\n| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7 |\n| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9 |\n| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3 |\n| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |\n| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173 |\n| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190 |\n| YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |\n| YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43 |\n| YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29 |\n| YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |\n| YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |\n| YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40 |\n| YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27 |\n| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90 |\n| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204 |\n| RetinaFace(mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417 |\n| ArcFace(LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333 |\n| CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000 
|\n\nHelp wanted: if you have speed results, please open an issue or PR.\n\n## Acknowledgments \u0026 Contact\n\nAny contributions, questions and discussions are welcome; contact me via the info below.\n\nE-mail: wangxinyu_es@163.com\n\nWeChat ID: wangxinyu0375 (you can add me on WeChat to join the tensorrtx discussion group, **note: tensorrtx**)\n","funding_links":[],"categories":["C++","Lighter and Deployment Frameworks","Applications","其他_机器学习与深度学习"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwang-xinyu%2Ftensorrtx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwang-xinyu%2Ftensorrtx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwang-xinyu%2Ftensorrtx/lists"}