Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/emptysoal/cuda-image-preprocess
- Owner: emptysoal
- License: mit
- Created: 2023-05-29T08:35:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-13T02:05:16.000Z (3 months ago)
- Last Synced: 2024-11-13T03:18:27.484Z (3 months ago)
- Topics: cnn, cuda, cuda-demo, cuda-kernels, cuda-programming, deep-learning, image-processing, tensorrt
- Language: Cuda
- Homepage:
- Size: 88.9 KB
- Stars: 52
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README-en.md
- License: LICENSE
Awesome Lists containing this project
- awesome-cuda-triton-hpc - emptysoal/cuda-image-preprocess : Speed up image preprocessing with CUDA when handling images or running TensorRT inference. (CUDA programming to accelerate image preprocessing.) (Applications)
README
# CUDA programming to speed up image preprocessing
## Introduction
- Based on `cuda` and `opencv`
- Targets:
  - Can be used on its own to speed up image preprocessing operations
  - Combined with TensorRT, further accelerates inference

## Speed
- Here we compare `Deeplabv3+` TensorRT inference speed with and without CUDA image preprocessing
- For the version without CUDA image preprocessing, refer to my other [tensorrt](https://github.com/emptysoal/tensorrt-experiment) project

FP32:
| C++ image preprocess | CUDA image preprocess |
| :------------------: | :-------------------: |
|        25 ms         |         19 ms         |

Int8 quantization:
| C++ image preprocess | CUDA image preprocess |
| :------------------: | :-------------------: |
|        10 ms         |       **3 ms**        |

## File description
```bash
project dir
├── bgr2rgb           # CUDA code implementing BGR to RGB conversion
|   ├── Makefile
|   └── bgr2rgb.cu
├── bilinear          # CUDA code implementing bilinear resizing
|   ├── Makefile
|   └── resize.cu
├── hwc2chw           # CUDA code implementing the HWC to CHW transpose, like np.transpose((2, 0, 1))
|   ├── Makefile
|   └── transpose.cu
├── normalize         # CUDA code implementing image data normalization
|   ├── Makefile
|   └── normal.cu
├── preprocess        # combines the above (not simple concatenation) into a common image preprocessing pipeline
|   ├── Makefile
|   └── preprocess.cu
├── union_tensorrt    # an example of integration with TensorRT, speeding up Deeplabv3+ inference
|   ├── Makefile
|   ├── preprocess.cu
|   ├── preprocess.h
|   └── trt_infer.cpp
└── lena.jpg          # test image
```

## Usage
### Speeding up a single image processing operation
- Applies to the directories `bgr2rgb`, `bilinear`, `hwc2chw`, and `normalize`
```bash
cd <dir>
make
./<dir>

# For example:
cd bgr2rgb
make
./bgr2rgb ../lena.jpg
# This swaps the R and B channels of lena.jpg and saves the result in the current directory
```

Note: if your CUDA or OpenCV installation directory differs from the one in the Makefile, remember to change it to your own path.
### General image preprocessing
- Before model inference, images usually need resizing, BGR-to-RGB conversion, HWC-to-CHW transposition, and normalization
- You can run this combined preprocessing with:

```bash
cd preprocess
make
./preprocess ../lena.jpg
```

### Used in combination with TensorRT
Method:
1) Following my other [tensorrt](https://github.com/emptysoal/tensorrt-experiment) project, set up the environment, download the dataset, and train the Deeplabv3+ network
2) Enter the directory `Deeplabv3+/TensorRT/C++/api_model/`
3) Copy the files from this project's `union_tensorrt` directory into the directory above (replacing the originals where they exist)
4) Run the following commands in sequence to perform TensorRT inference:
```bash
python pth2wts.py
make
./trt_infer
```

5) Output like the following indicates success; the segmentation result images are generated in the same directory:
```bash
Loading weights: ./para.wts
Succeeded building backbone!
Succeeded building aspp!
Succeeded building decoder!
Succeeded building total network!
Succeeded building serialized engine!
Succeeded building engine!
Succeeded saving .plan file!
Total image num is: 8 inference total cost is: 105ms average cost is: 19ms
```