https://github.com/abaksy/cuda-examples
A repository of examples coded in CUDA C/C++
- Host: GitHub
- URL: https://github.com/abaksy/cuda-examples
- Owner: abaksy
- Created: 2020-09-10T12:26:51.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-05-15T05:31:02.000Z (almost 4 years ago)
- Last Synced: 2024-11-16T16:35:08.358Z (3 months ago)
- Topics: cuda
- Language: Cuda
- Homepage:
- Size: 15.6 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# cuda-examples
A repository of examples coded in CUDA C++
All examples were compiled using NVCC version 10.1 on Linux v5.4.

## Setup on Linux
1) Install the Nvidia drivers for the installed Nvidia GPU. On Ubuntu-based distributions this can be done from the Software & Updates application, in the tab listed as "Additional Drivers" (make sure to install the **recommended** version of the Nvidia drivers).
2) After installing and restarting, verify that the drivers were installed by running
```
nvidia-smi
```
in a terminal window. The output should list the name of the installed card, along with some usage statistics.
3) Install the `nvcc` compiler using the package manager
```
sudo apt install nvidia-cuda-toolkit
```
4) Verify the installation using
```
nvcc -V
```
or
```
nvcc --version
```

## vecadd
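As a rough illustration of the launch configurations covered in this section, the sketch below adds two vectors with each of the three grid/block layouts. It is not the repository's code: the kernel names, the helper `runVecAdd`, and the sizes N = 512 and M = 128 are assumptions chosen for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Variant 1: N blocks of one thread each, so the block index selects the element.
__global__ void addOneThreadPerBlock(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Variant 2: a single block of N threads, so the thread index selects the element.
// Only valid while N fits in one block (at most 1024 threads on current GPUs).
__global__ void addSingleBlock(const float* a, const float* b, float* c, int n) {
    int i = threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Variant 3: blocks of M threads; block and thread index combine into a global index.
__global__ void addBlocksOfThreads(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard covers the case where M does not divide N
}

// Hypothetical helper: runs one variant (1, 2 or 3) and verifies the sum on the host.
bool runVecAdd(int variant, int n, int m) {
    std::vector<float> a(n), b(n), c(n, 0.0f);
    for (int i = 0; i < n; ++i) { a[i] = float(i); b[i] = 2.0f * float(i); }

    size_t bytes = n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    if (variant == 1)      addOneThreadPerBlock<<<n, 1>>>(dA, dB, dC, n);
    else if (variant == 2) addSingleBlock<<<1, n>>>(dA, dB, dC, n);
    else                   addBlocksOfThreads<<<(n + m - 1) / m, m>>>(dA, dB, dC, n);

    // Catch both launch-time and execution-time errors before reading results back.
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    for (int i = 0; i < n; ++i) ok = ok && (c[i] == a[i] + b[i]);
    return ok;
}

int main() {
    const int n = 512, m = 128;  // assumed sizes; n stays within one block for variant 2
    for (int variant = 1; variant <= 3; ++variant)
        std::printf("variant %d: %s\n", variant, runVecAdd(variant, n, m) ? "ok" : "FAILED");
    return 0;
}
```

Assuming the file is saved as `vecadd_sketch.cu` (a hypothetical name), it should build with `nvcc vecadd_sketch.cu -o vecadd_sketch`. The third layout is the only one of the three that scales past the per-block thread limit while still running many threads per block.
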
CUDA implementation of vector-vector addition, adding vectors of length N, using:
* One thread per block, N blocks
* N threads in one block, grid contains only one block
* M threads per block, N/M blocks

## matadd
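The two layouts covered in this section can be sketched as follows. This is an illustrative example, not the repository's code: the kernel names, the helper `runMatAdd`, the row-major storage, the choice of mapping the x dimension to columns, and the 16 x 32 size are all assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Variant 1: a 2D grid with one single-thread block per matrix element.
__global__ void matAddOneThreadPerBlock(const float* a, const float* b, float* c,
                                        int rows, int cols) {
    int row = blockIdx.y, col = blockIdx.x;
    if (row < rows && col < cols) c[row * cols + col] = a[row * cols + col] + b[row * cols + col];
}

// Variant 2: a single 2D block with one thread per matrix element.
// Only valid while rows * cols fits in one block (at most 1024 threads on current GPUs).
__global__ void matAddSingleBlock(const float* a, const float* b, float* c,
                                  int rows, int cols) {
    int row = threadIdx.y, col = threadIdx.x;
    if (row < rows && col < cols) c[row * cols + col] = a[row * cols + col] + b[row * cols + col];
}

// Hypothetical helper: runs one variant (1 or 2) and verifies the sum on the host.
bool runMatAdd(int variant, int rows, int cols) {
    int count = rows * cols;
    std::vector<float> a(count), b(count), c(count, 0.0f);
    for (int i = 0; i < count; ++i) { a[i] = float(i); b[i] = float(count - i); }

    size_t bytes = count * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

    dim3 shape(cols, rows);  // x indexes columns, y indexes rows
    if (variant == 1) matAddOneThreadPerBlock<<<shape, 1>>>(dA, dB, dC, rows, cols);
    else              matAddSingleBlock<<<1, shape>>>(dA, dB, dC, rows, cols);

    // Catch both launch-time and execution-time errors before reading results back.
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    for (int i = 0; i < count; ++i) ok = ok && (c[i] == a[i] + b[i]);
    return ok;
}

int main() {
    const int rows = 16, cols = 32;  // assumed sizes; 512 threads fit in one block
    for (int variant = 1; variant <= 2; ++variant)
        std::printf("variant %d: %s\n", variant, runMatAdd(variant, rows, cols) ? "ok" : "FAILED");
    return 0;
}
```

The same `dim3` value serves as the grid shape in the first launch and as the block shape in the second, which is the only difference between the two layouts.
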
CUDA implementation of matrix-matrix addition, adding matrices of size M x N, using:
* One thread per block, 2D grid of MxN blocks
* MxN threads in one block, grid contains only one block

## matmul
CUDA implementation of matrix-matrix multiplication, with matrices of size MxN and PxQ (where N = P), using:
* 2D grid of size MxQ blocks, with 1 thread in each block
* 2D grid of size MxQ blocks, with N threads in each block and shared memory (block level)
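
The two variants above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the kernel names, the helper `runMatMul`, the row-major storage, the in-block tree reduction, and the 4 x 8 by 8 x 5 sizes are choices made for the example. In the shared-memory variant, each of the N threads in a block computes one product `A[row][k] * B[k][col]`, and the block then sums those products into a single output element.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Variant 1: one single-thread block per output element C[row][col].
__global__ void matmulOneThreadPerBlock(const float* a, const float* b, float* c,
                                        int n, int q) {
    int row = blockIdx.y, col = blockIdx.x;
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) sum += a[row * n + k] * b[k * q + col];
    c[row * q + col] = sum;
}

// Variant 2: one block of n threads per output element, partial products in shared memory.
__global__ void matmulSharedMem(const float* a, const float* b, float* c, int n, int q) {
    extern __shared__ float partial[];  // n floats, sized at launch
    int row = blockIdx.y, col = blockIdx.x, k = threadIdx.x;
    partial[k] = a[row * n + k] * b[k * q + col];
    __syncthreads();

    // Tree reduction that also handles n values that are not powers of two.
    for (int active = n; active > 1; active = (active + 1) / 2) {
        int half = (active + 1) / 2;
        if (k < active / 2) partial[k] += partial[k + half];
        __syncthreads();
    }
    if (k == 0) c[row * q + col] = partial[0];
}

// Hypothetical helper: multiplies an m x n matrix by an n x q matrix and checks the result.
bool runMatMul(bool useShared, int m, int n, int q) {
    std::vector<float> a(m * n), b(n * q), c(m * q, 0.0f);
    for (int i = 0; i < m; ++i) for (int k = 0; k < n; ++k) a[i * n + k] = float(i + k);
    for (int k = 0; k < n; ++k) for (int j = 0; j < q; ++j) b[k * q + j] = float(k - j);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, a.size() * sizeof(float));
    cudaMalloc(&dB, b.size() * sizeof(float));
    cudaMalloc(&dC, c.size() * sizeof(float));
    cudaMemcpy(dA, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), b.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 grid(q, m);  // x indexes output columns, y indexes output rows
    if (useShared) matmulSharedMem<<<grid, n, n * sizeof(float)>>>(dA, dB, dC, n, q);
    else           matmulOneThreadPerBlock<<<grid, 1>>>(dA, dB, dC, n, q);

    // Catch both launch-time and execution-time errors before reading results back.
    bool ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    cudaMemcpy(c.data(), dC, c.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);

    // Host reference; the inputs are small integers, so float sums are exact.
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < q; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += a[i * n + k] * b[k * q + j];
            ok = ok && (c[i * q + j] == ref);
        }
    return ok;
}

int main() {
    const int m = 4, n = 8, q = 5;  // assumed sizes; n must fit in one block for variant 2
    std::printf("one thread per block: %s\n", runMatMul(false, m, n, q) ? "ok" : "FAILED");
    std::printf("shared memory:        %s\n", runMatMul(true, m, n, q) ? "ok" : "FAILED");
    return 0;
}
```

The shared-memory variant trades the serial loop over N for a reduction of about log2(N) steps, at the cost of limiting N to the maximum number of threads per block.
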