Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/emptysoal/cuda-image-preprocess
- Owner: emptysoal
- License: mit
- Created: 2023-05-29T08:35:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-13T02:05:16.000Z (3 months ago)
- Last Synced: 2024-11-13T03:18:27.484Z (3 months ago)
- Topics: cnn, cuda, cuda-demo, cuda-kernels, cuda-programming, deep-learning, image-processing, tensorrt
- Language: Cuda
- Homepage:
- Size: 88.9 KB
- Stars: 52
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README-en.md
- License: LICENSE
Awesome Lists containing this project
- awesome-cuda-triton-hpc - emptysoal/cuda-image-preprocess : Speed up image preprocessing with CUDA when handling images or running TensorRT inference. (CUDA programming to accelerate image preprocessing.) (Applications)
README
# CUDA programming to speed up image preprocessing
## Introduction
- Based on `cuda` and `opencv`
- Targets:
  - Can be used on its own to speed up image preprocessing operations
  - Combined with TensorRT, further accelerates inference

## Speed
- Here we compare `Deeplabv3+` TensorRT inference speed with and without CUDA image preprocessing
- For the version without CUDA image preprocessing, refer to my other [tensorrt](https://github.com/emptysoal/tensorrt-experiment) project

FP32:
| C++ image preprocess | CUDA image preprocess |
| :------------------: | :-------------------: |
|        25 ms         |         19 ms         |

Int8 quantization:
| C++ image preprocess | CUDA image preprocess |
| :------------------: | :-------------------: |
|        10 ms         |       **3 ms**        |

## File description
```bash
project dir
├── bgr2rgb           # CUDA code implementing BGR to RGB conversion
|   ├── Makefile
|   └── bgr2rgb.cu
├── bilinear          # CUDA code implementing bilinear resizing
|   ├── Makefile
|   └── resize.cu
├── hwc2chw           # CUDA code implementing the HWC to CHW transpose, like np.transpose((2, 0, 1))
|   ├── Makefile
|   └── transpose.cu
├── normalize         # CUDA code implementing image data normalization
|   ├── Makefile
|   └── normal.cu
├── preprocess        # combines the above (not simple concatenation) into a common image preprocessing pipeline
|   ├── Makefile
|   └── preprocess.cu
├── union_tensorrt    # an example of integration with TensorRT, speeding up Deeplabv3+ inference
|   ├── Makefile
|   ├── preprocess.cu
|   ├── preprocess.h
|   └── trt_infer.cpp
└── lena.jpg          # test image
```

## Usage
### Speeding up a single image processing operation
- Applies to the directories `bgr2rgb`, `bilinear`, `hwc2chw`, and `normalize`
```bash
cd <dir>
make
./<dir>

# For example:
cd bgr2rgb
make
./bgr2rgb ../lena.jpg
# This swaps the R and B channels of lena.jpg and saves the result in the current directory
```

Note: if your CUDA or OpenCV installation directory differs from the one in the Makefile, remember to change it to your own path.
### General image preprocessing
- Before model inference, images usually need resizing, BGR-to-RGB conversion, HWC-to-CHW transposition, and normalization
- You can run this combined preprocessing with:

```bash
cd preprocess
make
./preprocess ../lena.jpg
```

### Used in combination with TensorRT
Method:
1) Following my other [tensorrt](https://github.com/emptysoal/tensorrt-experiment) project, set up the environment, download the dataset, and train the Deeplabv3+ network
2) Enter the directory `Deeplabv3+/TensorRT/C++/api_model/`
3) Copy the files from this project's `union_tensorrt` directory into the directory above (replacing the originals where they exist)
4) Run the following commands in sequence to perform TensorRT inference:
```bash
python pth2wts.py
make
./trt_infer
```

5) Output like the following indicates success; the segmentation result images are generated in the same directory:
```bash
Loading weights: ./para.wts
Succeeded building backbone!
Succeeded building aspp!
Succeeded building decoder!
Succeeded building total network!
Succeeded building serialized engine!
Succeeded building engine!
Succeeded saving .plan file!
Total image num is: 8 inference total cost is: 105ms average cost is: 19ms
```