https://github.com/tgautam03/xfilters
GPU (CUDA) accelerated filters using 2D convolution for high resolution images.
2d-convolution c cpp cuda cuda-programming gpu-acceleration gpu-computing gpu-programming image-filters image-processing
- Host: GitHub
- URL: https://github.com/tgautam03/xfilters
- Owner: tgautam03
- License: mit
- Created: 2025-01-12T04:08:02.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2025-02-01T02:54:03.000Z (over 1 year ago)
- Last Synced: 2025-08-20T22:54:50.350Z (10 months ago)
- Topics: 2d-convolution, c, cpp, cuda, cuda-programming, gpu-acceleration, gpu-computing, gpu-programming, image-filters, image-processing
- Language: C++
- Homepage:
- Size: 58.2 MB
- Stars: 9
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# xFilters
**Convolution** is a widely used array operation in signal processing, digital recording, image/video processing, and computer vision. This repository provides a **2D convolution algorithm** written from scratch in **C++ (for CPU)** and **CUDA C++ (for GPU)**, which can be used to apply **filters** to **high resolution** images.
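As a rough illustration of the operation, the sketch below applies a square filter to a row-major grayscale image in plain C++. It is not the repository's code: the function name `conv2d_naive`, the `float` pixel type, and zero padding at the borders are assumptions made for this example. Like most image-processing code it does not flip the filter, so strictly speaking it computes a cross-correlation; for symmetric filters the result is identical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive 2D convolution sketch on a row-major grayscale image.
// Assumptions (not taken from the repository): float pixels, an odd-sized
// square filter, and zero padding outside the image borders.
std::vector<float> conv2d_naive(const std::vector<float>& image,
                                int width, int height,
                                const std::vector<float>& filter,
                                int filter_size)
{
    assert(filter_size % 2 == 1);
    const int radius = filter_size / 2;
    std::vector<float> output(static_cast<std::size_t>(width) * height, 0.0f);

    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            for (int fr = 0; fr < filter_size; ++fr) {
                for (int fc = 0; fc < filter_size; ++fc) {
                    const int in_row = row + fr - radius;
                    const int in_col = col + fc - radius;
                    // Pixels outside the image contribute zero.
                    if (in_row >= 0 && in_row < height &&
                        in_col >= 0 && in_col < width) {
                        sum += filter[fr * filter_size + fc] *
                               image[static_cast<std::size_t>(in_row) * width + in_col];
                    }
                }
            }
            output[static_cast<std::size_t>(row) * width + col] = sum;
        }
    }
    return output;
}
```

Calling `conv2d_naive(image, width, height, filter, 3)` with a 3 x 3 filter whose only non-zero entry is a 1 in the center returns the input unchanged, which is a convenient sanity check.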
**Tested on NVIDIA RTX 3090 using Ubuntu 24.04.1 LTS with nvidia-driver-560 and CUDA 12.6.**
> Images are first converted to grayscale, and then the filter is applied.
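
The grayscale step could look roughly like the following. This is a sketch under stated assumptions: interleaved 8-bit RGB input (the layout stb_image returns when three channels are requested) and the common Rec. 601 luma weights. The helper name `rgb_to_gray` is hypothetical, and the repository may use a different formula.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Convert interleaved 8-bit RGB pixels to one float luminance value per pixel.
// Assumption: Rec. 601 luma weights; the repository may weight channels differently.
std::vector<float> rgb_to_gray(const std::vector<unsigned char>& rgb,
                               int width, int height)
{
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    assert(rgb.size() == pixels * 3);

    std::vector<float> gray(pixels);
    for (std::size_t i = 0; i < pixels; ++i) {
        gray[i] = 0.299f * rgb[3 * i]        // red
                + 0.587f * rgb[3 * i + 1]    // green
                + 0.114f * rgb[3 * i + 2];   // blue
    }
    return gray;
}
```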
**Table of contents**
0. Naive 2D convolution on a CPU.
1. Naive 2D convolution on a GPU.
2. 2D convolution on a GPU using constant memory for the filter matrix.
3. 2D convolution on a GPU using constant memory for the filter matrix and tiling for shared memory usage.
4. Naive 2D convolution on a GPU (using pinned memory).
5. 2D convolution on a GPU using constant memory for the filter matrix (using pinned memory).
6. 2D convolution on a GPU using constant memory for the filter matrix and tiling for shared memory usage (using pinned memory).
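The tiling idea behind items 3 and 6 can be emulated on the CPU. The sketch below is plain C++, not CUDA, and is not taken from the repository: each block of output pixels first stages its input region, including a halo of `radius` pixels, in a small buffer, then computes all of its outputs from that buffer. On a GPU the buffer would sit in shared memory and one thread block would handle one tile. The tile width `TILE` and the name `conv2d_tiled` are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int TILE = 16;  // assumed tile width, not taken from the repository

// CPU emulation of tiled convolution with zero padding at the image borders.
std::vector<float> conv2d_tiled(const std::vector<float>& image,
                                int width, int height,
                                const std::vector<float>& filter,
                                int filter_size)
{
    assert(filter_size % 2 == 1);
    const int radius = filter_size / 2;
    const int padded = TILE + 2 * radius;  // tile width including the halo
    std::vector<float> output(static_cast<std::size_t>(width) * height, 0.0f);
    std::vector<float> tile(static_cast<std::size_t>(padded) * padded);

    for (int tile_row = 0; tile_row < height; tile_row += TILE) {
        for (int tile_col = 0; tile_col < width; tile_col += TILE) {
            // Stage 1: load the tile plus halo, zero-filling outside the image.
            for (int r = 0; r < padded; ++r) {
                for (int c = 0; c < padded; ++c) {
                    const int in_row = tile_row + r - radius;
                    const int in_col = tile_col + c - radius;
                    const bool inside = in_row >= 0 && in_row < height &&
                                        in_col >= 0 && in_col < width;
                    tile[r * padded + c] = inside
                        ? image[static_cast<std::size_t>(in_row) * width + in_col]
                        : 0.0f;
                }
            }
            // Stage 2: compute every output pixel of this tile from the buffer.
            const int rows = std::min(TILE, height - tile_row);
            const int cols = std::min(TILE, width - tile_col);
            for (int r = 0; r < rows; ++r) {
                for (int c = 0; c < cols; ++c) {
                    float sum = 0.0f;
                    for (int fr = 0; fr < filter_size; ++fr)
                        for (int fc = 0; fc < filter_size; ++fc)
                            sum += filter[fr * filter_size + fc] *
                                   tile[(r + fr) * padded + (c + fc)];
                    output[static_cast<std::size_t>(tile_row + r) * width
                           + tile_col + c] = sum;
                }
            }
        }
    }
    return output;
}
```

The benefit on a GPU comes from each input pixel being read from global memory once per tile rather than once per filter tap; on a CPU this version mainly serves to show the indexing.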
## Example Run
**CPU/GPU Filter**
1. In the terminal, run `make filters_cpu` or `make filters_gpu`.
2. When prompted, enter the path to the image, for example `data/8k.jpg`.
3. When prompted, type the filter name. The supported filters are as follows:
### Supported Filters
#### Sharpen

#### High-pass (edge detection)

#### Low-pass

#### Gaussian (image blurring)
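A Gaussian blur filter is usually built by sampling a 2D Gaussian on a small grid and normalizing the weights so they sum to 1. The following is a minimal sketch of that construction; the function name `gaussian_kernel` and its `size` and `sigma` parameters are illustrative, and the kernel size and coefficients actually used by xFilters are not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build a normalized size x size Gaussian kernel in row-major order.
// `size` and `sigma` are illustrative parameters, not values from the repository.
std::vector<float> gaussian_kernel(int size, float sigma)
{
    assert(size % 2 == 1 && sigma > 0.0f);
    const int radius = size / 2;
    std::vector<float> kernel(static_cast<std::size_t>(size) * size);

    float total = 0.0f;
    for (int y = -radius; y <= radius; ++y) {
        for (int x = -radius; x <= radius; ++x) {
            const float weight =
                std::exp(-static_cast<float>(x * x + y * y) / (2.0f * sigma * sigma));
            kernel[(y + radius) * size + (x + radius)] = weight;
            total += weight;
        }
    }
    // Normalize so the weights sum to 1 and overall brightness is preserved.
    for (float& weight : kernel) weight /= total;
    return kernel;
}
```

A larger `sigma` spreads weight away from the center and blurs more strongly; the kernel should be wide enough (a common rule of thumb is about three sigma on each side) that the truncated tails are negligible.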

#### Derivative of Gaussian (edge detection)

## Benchmarks
### Runtime Overview (time in seconds)
|Step|CPU|GPU (Naive)|GPU (Constant Memory)|GPU (Constant Memory + Tiling)|GPU (Naive + Pinned Memory)|GPU (Constant + Pinned Memory)|GPU (Constant + Pinned Memory + Tiling)|
|-|-|-|-|-|-|-|-|
|Allocating GPU memory| --- | 0.00044032 | 0.000191488 | 0.000313344 | 0.000217088 | 0.000176064 | 0.000154464 |
|Moving input to GPU memory| --- | 0.0028009 | 0.00271984 | 0.00283443 | 0.00265677 | 0.00267555 | 0.0026567 |
|Moving filter to GPU memory| --- | 8.736e-06 | 0.000128704 | 0.0002504 | 9.632e-06 | 0.000199776 | 0.000105152 |
|Kernel execution| 0.0607285 | 5.20294e-05 | 5.16403e-05 | 5.53062e-05 | 4.50765e-05 | 4.3735e-05 | 5.37395e-05 |
|Moving output to CPU memory| --- | 0.00601299 | 0.00601722 | 0.0065999 | 0.00249299 | 0.00250381 | 0.0024945 |
|Total| 0.0607285 | 0.00931497 | 0.00910889 | 0.0100534 | 0.00542156 | 0.00559894 | 0.00546456 |
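Each total in the table is the sum of the stages above it, and the FPS values reported in the logs are the reciprocal of a time. The helper names in this short sketch are hypothetical; it only reproduces that arithmetic so the columns can be compared.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sum per-stage timings (in seconds) into an end-to-end time.
double total_seconds(const std::vector<double>& stages)
{
    return std::accumulate(stages.begin(), stages.end(), 0.0);
}

// Frames per second for a given per-image time.
double frames_per_second(double seconds)
{
    assert(seconds > 0.0);
    return 1.0 / seconds;
}

// How many times faster `candidate` is than `baseline`.
double speedup(double baseline_seconds, double candidate_seconds)
{
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```

For the naive GPU column, `total_seconds({0.00044032, 0.0028009, 8.736e-06, 5.20294e-05, 0.00601299})` gives about 0.009315 s, or roughly 107 FPS. `speedup(0.0607285, 0.00542156)` puts the naive pinned-memory variant at roughly 11x the CPU end to end, and `speedup(0.00601299, 0.00249299)` shows the output transfer is about 2.4x faster with pinned memory. Kernel execution is a small fraction of every GPU total, so the transfers dominate.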
### Naive CPU
```bash
make 00_cpu_conv2d_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328
Applying filter...
Time for kernel execution (seconds): 0.0607285
---------------------
Benchmarking details:
---------------------
FPS (total): 16.4667
GFLOPS (kernel): 1.2432
------------------------------------
```
### Naive GPU
```bash
make 01_gpu_conv2d_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328
Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.00044032
Moving input to GPU memory...
Time for input data transfer (seconds): 0.0028009
Moving filter to GPU memory...
Time for filter data transfer (seconds): 8.736e-06
Applying filter...
Time for kernel execution (seconds): 5.20294e-05
Moving result to CPU memory...
Time for output data transfer (seconds): 0.00601299
---------------------
Benchmarking details:
---------------------
Time (total): 0.00931497
FPS (total): 107.354
Time (kernel): 5.20294e-05
FPS (kernel): 19219.9
GFLOPS (kernel): 1451.05
------------------------------------
```
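The logs do not say how the GFLOPS figure is derived. One count that reproduces the printed values is one multiply and one add per filter tap per pixel, with a 3 x 3 filter over 2048 x 2048 pixels. That is an inference from the numbers, not something the source states, and the loaded image is reported as 2048 x 1328, so treat the dimensions below as an assumption about the benchmark code. The function names are hypothetical.

```cpp
#include <cassert>

// Assumed operation count: one multiply and one add per filter tap per pixel.
double conv2d_flops(int width, int height, int filter_size)
{
    assert(width > 0 && height > 0 && filter_size > 0);
    return 2.0 * filter_size * filter_size * static_cast<double>(width) * height;
}

// Throughput in GFLOPS for a kernel that ran for `seconds`.
double gflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}
```

Under that assumption, `gflops(conv2d_flops(2048, 2048, 3), 5.20294e-05)` comes to about 1451 and `gflops(conv2d_flops(2048, 2048, 3), 0.0607285)` to about 1.243, in line with the naive GPU and CPU logs.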
### GPU using constant memory
```bash
make 02_gpu_conv2d_constMem_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328
Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000191488
Moving input to GPU memory...
Time for input data transfer (seconds): 0.00271984
Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.000128704
Applying filter...
Time for kernel execution (seconds): 5.16403e-05
Moving result to CPU memory...
Time for output data transfer (seconds): 0.00601722
---------------------
Benchmarking details:
---------------------
Time (total): 0.00910889
FPS (total): 109.783
Time (kernel): 5.16403e-05
FPS (kernel): 19364.7
GFLOPS (kernel): 1461.99
------------------------------------
```
### GPU using constant memory and tiling
```bash
make 03_gpu_conv2d_tiled_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328
Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000313344
Moving input to GPU memory...
Time for input data transfer (seconds): 0.00283443
Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.0002504
Applying filter...
Time for kernel execution (seconds): 5.53062e-05
Moving result to CPU memory...
Time for output data transfer (seconds): 0.0065999
---------------------
Benchmarking details:
---------------------
Time (total): 0.0100534
FPS (total): 99.469
Time (kernel): 5.53062e-05
FPS (kernel): 18081.1
GFLOPS (kernel): 1365.08
------------------------------------
```
### Naive GPU (pinned memory)
```bash
make 04_gpu_conv2d_pinnedMem_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328
Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000217088
Moving input to GPU memory...
Time for input data transfer (seconds): 0.00265677
Moving filter to GPU memory...
Time for filter data transfer (seconds): 9.632e-06
Applying filter...
Time for kernel execution (seconds): 4.50765e-05
Moving result to CPU memory...
Time for output data transfer (seconds): 0.00249299
---------------------
Benchmarking details:
---------------------
Time (total): 0.00542156
FPS (total): 184.449
Time (kernel): 4.50765e-05
FPS (kernel): 22184.5
GFLOPS (kernel): 1674.88
------------------------------------
```
### GPU using constant memory (pinned memory)
```bash
make 05_gpu_conv2d_pinnedConstMem_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328
Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000176064
Moving input to GPU memory...
Time for input data transfer (seconds): 0.00267555
Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.000199776
Applying filter...
Time for kernel execution (seconds): 4.3735e-05
Moving result to CPU memory...
Time for output data transfer (seconds): 0.00250381
---------------------
Benchmarking details:
---------------------
Time (total): 0.00559894
FPS (total): 178.605
Time (kernel): 4.3735e-05
FPS (kernel): 22865
GFLOPS (kernel): 1726.25
------------------------------------
```
### GPU using constant memory and tiling (pinned memory)
```bash
make 06_gpu_conv2d_pinnedTiled_benchmark.out
```
```
Loaded image with Width: 2048 and Height: 1328
Allocating GPU memory...
Time for GPU memory allocation (seconds): 0.000154464
Moving input to GPU memory...
Time for input data transfer (seconds): 0.0026567
Moving filter to GPU memory...
Time for filter data transfer (seconds): 0.000105152
Applying filter...
Time for kernel execution (seconds): 5.37395e-05
Moving result to CPU memory...
Time for output data transfer (seconds): 0.0024945
---------------------
Benchmarking details:
---------------------
Time (total): 0.00546456
FPS (total): 182.997
Time (kernel): 5.37395e-05
FPS (kernel): 18608.3
GFLOPS (kernel): 1404.88
------------------------------------
```
## References
- Image loading and saving use the [stb single-file public domain libraries for C/C++](https://github.com/nothings/stb). See [lib](https://github.com/tgautam03/xFilters/tree/master/lib) for the specific source code.
- Example images in [data](https://github.com/tgautam03/xFilters/tree/master/data):
  - [Image by Eberhard Grossgasteiger](https://www.pexels.com/photo/mountain-at-night-under-a-starry-sky-1624496/)
  - [Image by Pok Rie](https://www.pexels.com/photo/seawaves-on-sands-982263/)