https://github.com/rmeli/cuda-pg
CUDA C++ Playground
cpp cuda gpu
- Host: GitHub
- URL: https://github.com/rmeli/cuda-pg
- Owner: RMeli
- Created: 2019-05-24T15:33:22.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2022-07-22T09:10:00.000Z (almost 4 years ago)
- Last Synced: 2025-10-05T08:39:43.248Z (9 months ago)
- Topics: cpp, cuda, gpu
- Language: Cuda
- Homepage:
- Size: 318 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSES/README.md
README
# CUDA Playground
Playground for experimenting with CUDA C++. Most of the code in this repository is adapted from the code accompanying [CUDA by Example - An Introduction to General Purpose GPU Programming](https://developer.nvidia.com/cuda-example) (see `LICENSES`).
## CUDA Concepts
* `01_info`: [device information](01_info/README.md#cuda-concepts)
* `02_axpy`: [kernel functions](02_axpy/README.md#kernel-function), [kernel calls](02_axpy/README.md#kernel-call), [blocks](02_axpy/README.md#blocks), [host/device memory](02_axpy/README.md#device-memory)
* `03_mandelbrot`: [device functions](03_mandelbrot/README.md#device-functions), [grid of blocks](03_mandelbrot/README.md#grid-of-blocks)
* `04_axpy`: [threads](04_axpy/README.md#threads), [blocks and threads](04_axpy/README.md#blocks-and-threads)
* `05_dot`: [shared memory](05_dot/README.md#shared-memory), [thread synchronization](05_dot/README.md#thread-synchronization)
* `06_raytracer`: [constant memory](06_raytracer#constant-memory), [warps](06_raytracer#warps), [host/device functions](06_raytracer#hostdevice-functions), [GPU timing and events](06_raytracer#gpu-timing)
* `07_hist`: [atomic operations](07_hist#atomic-operations), [set device memory](07_hist#set-device-memory)
* `08_mem`: [page-locked memory](08_mem#page-locked-memory)
* `09_memasync`: [CUDA streams](09_memasync#cuda-streams), [asynchronous memory copy](09_memasync#asyncronous-memory-copy)
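
The "blocks and threads" idea listed above comes down to index arithmetic: each (block, thread) pair is mapped to one global element index. The following is a minimal sketch of that mapping in plain C++, emulating a 1D launch with two nested loops on the host. It is not code from this repository; `num_blocks` and `axpy_emulated` are hypothetical names chosen for illustration, and a real kernel would use `blockIdx.x`, `blockDim.x`, and `threadIdx.x` instead of loop counters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from this repository): number of blocks needed to
// cover n elements, rounding up so a partially filled last block covers the tail.
std::size_t num_blocks(std::size_t n, std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical host-side emulation of a 1D launch computing y = a * x + y.
// The loop counters stand in for blockIdx.x and threadIdx.x; on a GPU the
// iterations of both loops would run as parallel threads.
void axpy_emulated(float a, const std::vector<float>& x, std::vector<float>& y,
                   std::size_t threads_per_block) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const std::size_t blocks = num_blocks(n, threads_per_block);
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < threads_per_block; ++thread) {
            // Same formula as blockIdx.x * blockDim.x + threadIdx.x in a kernel.
            const std::size_t i = block * threads_per_block + thread;
            if (i < n) {  // guard: surplus threads in the last block do nothing
                y[i] = a * x[i] + y[i];
            }
        }
    }
}
```

With 5 elements and 2 threads per block, `num_blocks(5, 2)` gives 3 blocks, and the sixth thread is masked out by the `i < n` guard. The same guard is needed in a real kernel whenever the problem size is not a multiple of the block size.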
### Timings
CPU timings cover the kernel execution only. GPU timings cover device memory allocation, the host-to-device copy, kernel execution, and the device-to-host copy, unless the individual timings are reported separately.
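
As an illustration of the CPU side of this convention, the sketch below times only the computation and leaves allocation and initialisation outside the measured region. It assumes an axpy-style workload and uses `std::chrono::steady_clock`; `axpy_cpu` and `time_axpy_cpu_ms` are hypothetical names, not functions from this repository. A GPU measurement following the convention above would instead bracket allocation, both copies, and the kernel launch, for example with CUDA events.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for y = a * x + y (not code from this repository).
void axpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Run axpy_cpu once and return the elapsed wall-clock time in milliseconds.
// Only the computation sits between the two clock reads; the vectors are
// allocated and filled by the caller, outside the timed region.
double time_axpy_cpu_ms(float a, const std::vector<float>& x, std::vector<float>& y) {
    const auto start = std::chrono::steady_clock::now();
    axpy_cpu(a, x, y);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

`steady_clock` is used because it is monotonic, so the measured interval cannot go negative if the system clock is adjusted during the run. A single run of a small problem is noisy; averaging over several repetitions gives a more stable figure.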
## Build Experiments
```bash
mkdir build # create build directory
cd build # change to build directory
cmake .. # generate Makefile
make # compile project
```
### Singularity
Run the [Singularity](https://singularity.hpcng.org/) container, which provides CUDA and [CMake](https://cmake.org/), interactively:
```bash
singularity shell --nv singularity/.sif
```
## References
* [CSCS HPC Summer School 2018](https://github.com/eth-cscs/SummerSchool2018)
* [CUDA by Example - An Introduction to General Purpose GPU Programming](https://developer.nvidia.com/cuda-example)