Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/grindelfp/cuda-streams-pinned-memory
This program compares the execution time of a specific mathematical computation both on CPU and GPU using CUDA Streams and pinned memory.
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/grindelfp/cuda-streams-pinned-memory
- Owner: GrindelfP
- License: mit
- Created: 2024-11-25T16:15:20.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-25T16:43:32.000Z (about 2 months ago)
- Last Synced: 2024-11-25T17:40:16.581Z (about 2 months ago)
- Language: Cuda
- Size: 2.93 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.adoc
- License: LICENSE
README
= CUDA streams and pinned memory exercise =
== Description ==
This program compares the execution time of a specific mathematical computation on the CPU and on the GPU, where the GPU version uses CUDA Streams and pinned memory. The computation involves multiplying elements of two arrays and applying a summation of trigonometric functions over a fixed range.
The program uses pinned (page-locked) host memory, allocated with `cudaMallocHost`, to speed up data transfer between host and device, and CUDA Streams to issue data transfers and kernel launches asynchronously. Together these let the GPU overlap data transfer with processing, which can reduce the total GPU execution time.
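The pattern described above can be sketched as follows. This is a minimal illustration, not the repository's code: the kernel body, the constant `kTerms`, the names `demo_kernel` and `run_streamed`, the array initialization, and the chunking scheme are all assumptions made for the example. Only `cudaMallocHost`, streams, and asynchronous copies come from the description.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <vector>

constexpr int kTerms = 100;  // hypothetical length of the "fixed range"

// Hypothetical stand-in for the computation: c = a * b * sum_k sin(0.01 * k).
__global__ void demo_kernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float s = 0.0f;
    for (int k = 0; k < kTerms; ++k) s += sinf(0.01f * k);
    c[i] = a[i] * b[i] * s;
}

// Early return on failure; a real program would also release resources here.
#define CUDA_TRY(call) do { if ((call) != cudaSuccess) return false; } while (0)

// Processes n elements in up to num_streams chunks and checks the result
// against a host recomputation. Returns true when everything matches.
bool run_streamed(int n, int num_streams) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *hA, *hB, *hC, *dA, *dB, *dC;

    // Pinned host buffers let cudaMemcpyAsync run asynchronously to the host.
    CUDA_TRY(cudaMallocHost(&hA, bytes));
    CUDA_TRY(cudaMallocHost(&hB, bytes));
    CUDA_TRY(cudaMallocHost(&hC, bytes));
    CUDA_TRY(cudaMalloc(&dA, bytes));
    CUDA_TRY(cudaMalloc(&dB, bytes));
    CUDA_TRY(cudaMalloc(&dC, bytes));

    for (int i = 0; i < n; ++i) {  // assumed trigonometric initialization
        hA[i] = std::sin(static_cast<float>(i));
        hB[i] = std::cos(static_cast<float>(i));
    }

    std::vector<cudaStream_t> streams(num_streams);
    for (auto& st : streams) CUDA_TRY(cudaStreamCreate(&st));

    // Each stream copies its chunk in, runs the kernel, and copies the result
    // out. Work in different streams may overlap if the device supports it.
    const int chunk = (n + num_streams - 1) / num_streams;
    for (int s = 0; s < num_streams; ++s) {
        const int off = s * chunk;
        const int cnt = std::min(chunk, n - off);
        if (cnt <= 0) break;
        const size_t cb = static_cast<size_t>(cnt) * sizeof(float);
        CUDA_TRY(cudaMemcpyAsync(dA + off, hA + off, cb, cudaMemcpyHostToDevice, streams[s]));
        CUDA_TRY(cudaMemcpyAsync(dB + off, hB + off, cb, cudaMemcpyHostToDevice, streams[s]));
        const int block = 256, grid = (cnt + block - 1) / block;
        demo_kernel<<<grid, block, 0, streams[s]>>>(dA + off, dB + off, dC + off, cnt);
        CUDA_TRY(cudaGetLastError());
        CUDA_TRY(cudaMemcpyAsync(hC + off, dC + off, cb, cudaMemcpyDeviceToHost, streams[s]));
    }
    CUDA_TRY(cudaDeviceSynchronize());  // wait for all streams to finish

    // Recompute on the host and compare with a small relative tolerance.
    float sum = 0.0f;
    for (int k = 0; k < kTerms; ++k) sum += std::sin(0.01f * k);
    bool match = true;
    for (int i = 0; i < n; ++i) {
        const float expect = hA[i] * hB[i] * sum;
        if (std::fabs(hC[i] - expect) > 1e-3f * (1.0f + std::fabs(expect))) match = false;
    }

    for (auto& st : streams) cudaStreamDestroy(st);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    cudaFreeHost(hA); cudaFreeHost(hB); cudaFreeHost(hC);
    return match;
}
```

Calling `run_streamed(1 << 20, 4)` from `main` would exercise the sketch with four streams. How much the copies and kernels actually overlap depends on the device and on the chunk size, so any gain should be measured rather than assumed.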
The arrays `A`, `B`, and `C` contain floating-point values initialized with trigonometric functions. The computation is performed on:
1. CPU, using a sequential approach.
2. GPU, using multiple CUDA Streams for parallel processing.

The program outputs the execution time for CPU and GPU, and calculates the acceleration coefficient achieved by using the GPU.
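The CPU side of the comparison and the acceleration coefficient might look roughly like the host-only sketch below, which compiles with `nvcc` as well as with a plain C++ compiler. The formula inside `cpu_compute`, the constant `kTerms`, and the helpers `time_ms` and `acceleration` are assumptions for illustration. In particular, the coefficient is assumed here to be the CPU time divided by the GPU time, which the README does not state explicitly.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int kTerms = 100;  // hypothetical length of the "fixed range"

// Sequential reference: c[i] = a[i] * b[i] * sum_k sin(0.01 * k).
// The actual formula in kernel.cu may differ; this is a stand-in.
void cpu_compute(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        float s = 0.0f;
        for (int k = 0; k < kTerms; ++k) s += std::sin(0.01f * k);
        c[i] = a[i] * b[i] * s;
    }
}

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Assumed definition: how many times faster the GPU run was than the CPU run.
double acceleration(double cpu_ms, double gpu_ms) {
    return cpu_ms / gpu_ms;
}
```

A typical use would be `double cpu_ms = time_ms([&] { cpu_compute(a, b, c); });`, followed by the same measurement around the GPU path (including its final synchronization) and a call to `acceleration(cpu_ms, gpu_ms)`.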
This program is written in C++ using the CUDA runtime library and is provided as a single source file, `kernel.cu`. The executable can be built with the NVIDIA CUDA Toolkit, either directly with its `nvcc` compiler or through an IDE integration such as Visual Studio.
== Program structure ==
The program source code is stored in the file `kernel.cu`, which includes the following functions:
* **main**: Executes the program, initializes data, performs computations on both CPU and GPU, measures execution time and prints results.
* **kernel**: The CUDA kernel function executed on the GPU for performing element-wise computations and summation.
* **cpu_compute**: Performs the computation sequentially on the CPU for comparison.