{"id":14113540,"url":"https://github.com/emptysoal/cuda-image-preprocess","last_synced_at":"2025-08-01T18:31:21.438Z","repository":{"id":170578738,"uuid":"646742828","full_name":"emptysoal/cuda-image-preprocess","owner":"emptysoal","description":"Speed up image preprocess with cuda when handle image or tensorrt inference","archived":false,"fork":false,"pushed_at":"2024-11-13T02:05:16.000Z","size":91,"stargazers_count":52,"open_issues_count":0,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-13T03:18:27.484Z","etag":null,"topics":["cnn","cuda","cuda-demo","cuda-kernels","cuda-programming","deep-learning","image-processing","tensorrt"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emptysoal.png","metadata":{"files":{"readme":"README-en.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-29T08:35:57.000Z","updated_at":"2024-11-13T02:05:20.000Z","dependencies_parsed_at":"2024-07-26T09:31:10.761Z","dependency_job_id":"2c2a3d67-0345-4f4d-9c83-9185c632b1f4","html_url":"https://github.com/emptysoal/cuda-image-preprocess","commit_stats":null,"previous_names":["emptysoal/cuda-image-preprocess"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptysoal%2Fcuda-image-preprocess","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptysoal%2Fcuda-image-preprocess/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptysoal%2Fcuda-image-preprocess/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emptysoal%2Fcuda-image-preprocess/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/emptysoal","download_url":"https://codeload.github.com/emptysoal/cuda-image-preprocess/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228397690,"owners_count":17913538,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","cuda","cuda-demo","cuda-kernels","cuda-programming","deep-learning","image-processing","tensorrt"],"created_at":"2024-08-14T11:00:38.338Z","updated_at":"2024-12-06T01:31:33.305Z","avatar_url":"https://github.com/emptysoal.png","language":"Cuda","readme":"# CUDA programming to speed up image preprocessing\n\n## Introduction\n\n- Based on `cuda` and `opencv`\n\n- Targets:\n  - Can be used on its own to speed up image processing operations;\n  - Can be combined with TensorRT to further accelerate inference.\n\n## Speed\n\n- Here we compare `Deeplabv3+` TensorRT inference speed with and without `cuda` image preprocessing\n- For the version without CUDA image preprocessing, refer to my other [tensorrt](https://github.com/emptysoal/tensorrt-experiment) project\n\nFP32:\n\n| C++ image preprocess | CUDA image preprocess |\n| :------------------: | :-------------------: |\n|        25 ms         |         19 ms         |\n\nInt8 quantization:\n\n| C++ image preprocess | CUDA image preprocess |\n| :------------------: | :-------------------: |\n|        10 ms         |       **3 ms**        |\n\n## File 
description\n\n```bash\nproject dir\n    ├── bgr2rgb  # CUDA code for BGR to RGB conversion\n    |   ├── Makefile\n    |   └── bgr2rgb.cu\n    ├── bilinear  # CUDA code for bilinear resize\n    |   ├── Makefile\n    |   └── resize.cu\n    ├── hwc2chw  # CUDA code for HWC to CHW layout change, like np.transpose((2, 0, 1))\n    |   ├── Makefile\n    |   └── transpose.cu\n    ├── normalize  # CUDA code for image data normalization\n    |   ├── Makefile\n    |   └── normal.cu\n    ├── preprocess  # combines the above (fused, not simply chained) into a common image preprocessing pipeline\n    |   ├── Makefile\n    |   └── preprocess.cu\n    ├── union_tensorrt  # example of combining with TensorRT to speed up Deeplabv3+ inference\n    |   ├── Makefile\n    |   ├── preprocess.cu\n    |   ├── preprocess.h\n    |   └── trt_infer.cpp\n    └── lena.jpg  # picture for testing\n```\n\n## Usage\n\n### Speeding up a single image processing operation\n\n- For the directories bgr2rgb, bilinear, hwc2chw, and normalize:\n\n```bash\ncd \u003cdir name\u003e\nmake\n./\u003cbin file\u003e \u003cimage path\u003e\n\n# For example:\ncd bgr2rgb\nmake\n./bgr2rgb ../lena.jpg\n# The result of lena.jpg with its R and B channels swapped is saved in the current directory\n```\n\nNote: If your CUDA or OpenCV installation directory differs from the one in the Makefile, update the Makefile paths accordingly.\n\n### General image preprocessing\n\n- Before model inference, images usually need to be resized, converted from BGR to RGB, transposed from HWC to CHW, and normalized\n- You can run this pipeline with the following steps:\n\n```bash\ncd preprocess\nmake\n./preprocess ../lena.jpg\n```\n\n### Used in combination with TensorRT\n\nMethod:\n\n1) Following my other [tensorrt](https://github.com/emptysoal/tensorrt-experiment) project, build the environment, download the datasets, and train the Deeplabv3+ network\n\n2) Enter the directory `Deeplabv3+/TensorRT/C++/api_model/`\n\n3) Place the files from this 
project's `union_tensorrt` directory into the above directory (replacing the original files where they exist)\n\n4) Execute the following commands in sequence to run TensorRT inference\n\n```bash\npython pth2wts.py\nmake\n./trt_infer\n```\n\n5) Output like the following indicates success; the segmentation result image is generated in the same directory\n\n```bash\nLoading weights: ./para.wts\nSucceeded building backbone!\nSucceeded building aspp!\nSucceeded building decoder!\nSucceeded building total network!\nSucceeded building serialized engine!\nSucceeded building engine!\nSucceeded saving .plan file!\nTotal image num is: 8 inference total cost is: 105ms average cost is: 19ms\n```\n\n","funding_links":[],"categories":["Applications"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femptysoal%2Fcuda-image-preprocess","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femptysoal%2Fcuda-image-preprocess","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femptysoal%2Fcuda-image-preprocess/lists"}