https://github.com/david-palma/cuda-programming

Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.

c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads

# CUDA C/C++ programming

This repository provides open-source educational resources on CUDA C/C++ programming, the C/C++ interface to NVIDIA's CUDA parallel computing platform.
In CUDA, the host refers to the CPU and its memory, while the device refers to the GPU and its memory.
Code running on the host manages memory on both the host and the device, and launches kernels: functions that are executed on the device by many GPU threads in parallel.
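The host/device workflow described above can be sketched as follows. This is a minimal example, not code from the repository. The kernel `scale` and the host function `scale_on_gpu` are hypothetical names, and the block size of 256 threads is an arbitrary choice.

The host function does four things in order:

1. It allocates device memory with `cudaMalloc`.
2. It copies the input to the device with `cudaMemcpy`.
3. It launches the kernel with the `<<<blocks, threads>>>` syntax.
4. It copies the result back to the host.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: runs on the device, each GPU thread scales one element.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // the last block may be partially filled
        data[i] *= factor;
}

// Hypothetical host function (assumes n > 0): allocates device memory, copies
// the data in, launches the kernel and copies the result back. Returns 0 on success.
int scale_on_gpu(float *h_data, float factor, int n)
{
    float *d_data = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess)     // device memory
        return -1;
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // host -> device

    int threads = 256;                                          // threads per block
    int blocks = (n + threads - 1) / threads;                   // enough blocks to cover n
    scale<<<blocks, threads>>>(d_data, factor, n);              // kernel launch
    cudaError_t err = cudaGetLastError();                       // launch configuration errors

    // Device -> host; on the default stream this copy waits for the kernel to finish.
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : -1;
}
```

Because a blocking `cudaMemcpy` on the default stream runs only after previously launched work completes, the final copy also serves as the synchronization point with the kernel.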

**NOTE**: it is assumed that you have access to a computer with a CUDA-enabled NVIDIA GPU.

## List of exercises

This section collects solutions to a series of simple GPU programming exercises in CUDA C/C++.
The source code is thoroughly commented and easy to follow, although a basic knowledge of parallel architectures is recommended.

- [exercise 00](./exercises/ex00.cu): hello, world!
- [exercise 01](./exercises/ex01.cu): print device properties
- [exercise 02](./exercises/ex02.cu): addition
- [exercise 03](./exercises/ex03.cu): vector addition using parallel blocks
- [exercise 04](./exercises/ex04.cu): vector addition using parallel threads
- [exercise 05](./exercises/ex05.cu): vector addition combining blocks and threads
- [exercise 06](./exercises/ex06.cu): single-precision A\*X Plus Y (SAXPY)
- [exercise 07](./exercises/ex07.cu): time, bandwidth, and throughput computation (single-precision A\*X Plus Y)
- [exercise 08](./exercises/ex08.cu): multiplication of square matrices
- [exercise 09](./exercises/ex09.cu): transpose of a square matrix
- [exercise 10](./exercises/ex10.cu): dot product using shared memory
- [exercise 11](./exercises/ex11.cu): prefix sum (exclusive scan) using shared memory
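The dot-product exercise combines several of the ideas in the list above. The sketch below shows one common way to compute a dot product with shared memory. It illustrates the technique under stated assumptions and is not the repository's solution.

The following are all hypothetical choices:

- the names `dot_partial` and `dot_on_gpu`;
- the fixed grid of 32 blocks;
- the 256-thread blocks, which must be a power of two for this tree reduction.

Each block reduces its threads' partial sums in `__shared__` memory, and the host adds up the per-block results.

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // power of two, required by the tree reduction
#define BLOCKS 32              // hypothetical fixed grid size

// Each block writes the partial dot product of its threads to partial[blockIdx.x].
__global__ void dot_partial(const float *a, const float *b, float *partial, int n)
{
    __shared__ float cache[THREADS_PER_BLOCK];  // one slot per thread in the block

    // Grid-stride loop: a thread may accumulate several products.
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x)
        sum += a[i] * b[i];
    cache[threadIdx.x] = sum;
    __syncthreads();                            // all slots written before reducing

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();                        // reached by every thread in the block
    }

    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];
}

// Hypothetical host wrapper: stores a . b (n elements) in *result; returns 0 on success.
int dot_on_gpu(const float *h_a, const float *h_b, int n, float *result)
{
    float *d_a = NULL, *d_b = NULL, *d_partial = NULL;
    float h_partial[BLOCKS];
    size_t bytes = (size_t)n * sizeof(float);
    cudaError_t err;

    if (cudaMalloc((void **)&d_a, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_b, bytes) != cudaSuccess ||
        cudaMalloc((void **)&d_partial, BLOCKS * sizeof(float)) != cudaSuccess) {
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_partial);  // cudaFree(NULL) is a no-op
        return -1;
    }
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    dot_partial<<<BLOCKS, THREADS_PER_BLOCK>>>(d_a, d_b, d_partial, n);
    err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_partial, d_partial, sizeof(h_partial), cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_partial);
    if (err != cudaSuccess)
        return -1;

    // Final step on the host: add up the BLOCKS partial sums.
    float sum = 0.0f;
    for (int i = 0; i < BLOCKS; ++i)
        sum += h_partial[i];
    *result = sum;
    return 0;
}
```

Summing the few per-block results on the host keeps the kernel simple. Alternatives include a second kernel launch or an `atomicAdd` on the device.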

## Compiling and running the code

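The CUDA C/C++ compiler `nvcc` is part of the NVIDIA CUDA Toolkit. It separates the source code into host and device components: `nvcc` compiles the device code itself and forwards the host code to the system's C/C++ compiler (e.g. `gcc` on Linux). To build an exercise, compile its `.cu` file with `nvcc`, for example `nvcc ./exercises/ex00.cu -o ex00`, and then run the resulting executable.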
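To make the host/device split concrete, here is a hypothetical single-file example. The file name `qualifiers.cu` and all function names are illustrative. The function execution space qualifiers tell `nvcc` which side each function is compiled for:

- `__global__` marks a kernel, launched from the host and run on the device.
- `__device__` marks a function callable only from device code.
- `__host__ __device__` compiles a function for both the host and the device.

```cuda
// Hypothetical file; compile with: nvcc qualifiers.cu -o qualifiers
#include <cuda_runtime.h>

// __host__ __device__: compiled for both the CPU and the GPU.
__host__ __device__ int square(int x) { return x * x; }

// __device__: callable only from device code (kernels or other device functions).
__device__ int square_plus_one(int x) { return square(x) + 1; }

// __global__: a kernel, launched from the host and executed on the device.
__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = square_plus_one(i);
}

// No qualifier: an ordinary host function, compiled by the host C/C++ compiler.
// Fills h_out[i] = i * i + 1 on the GPU; returns 0 on success (assumes n > 0).
int run_fill(int *h_out, int n)
{
    int *d_out = NULL;
    size_t bytes = (size_t)n * sizeof(int);
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess)
        return -1;

    fill<<<(n + 127) / 128, 128>>>(d_out, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : -1;
}
```

Here `square` can be called directly from host code as well as from the kernel. `square_plus_one`, in contrast, is only visible on the device.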

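**NOTE**: to find out how long a kernel takes to run, or to check the code for memory access errors, run the compiled executable under `nvprof` or `cuda-memcheck`, respectively (i.e. `nvprof ./` or `cuda-memcheck ./` followed by the executable name). On recent toolkits and GPUs these tools have been superseded: `nvprof` does not support devices of compute capability 8.0 and higher, where NVIDIA Nsight Systems and Nsight Compute are used instead, and `cuda-memcheck` was removed in CUDA 12.0 in favor of `compute-sanitizer`.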
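As a complement to the profilers, kernel time can also be measured from inside the program with CUDA events. This follows the spirit of exercise 07, but the sketch below is not taken from it. The kernel `saxpy` and the host function `time_saxpy` are hypothetical names.

The effective bandwidth calculation assumes that SAXPY reads `x` and `y` and writes `y`. That means `3 * n * sizeof(float)` bytes are moved per launch.

```cuda
#include <cuda_runtime.h>

// Single-precision A*X Plus Y: y = a * x + y (hypothetical kernel name).
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Hypothetical helper: times one SAXPY launch on n elements (n > 0) with CUDA events,
// storing the kernel time in ms and the effective bandwidth in GB/s. Returns 0 on success.
int time_saxpy(int n, float *ms_out, float *gbps_out)
{
    float *d_x = NULL, *d_y = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    if (cudaMalloc((void **)&d_x, bytes) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&d_y, bytes) != cudaSuccess) {
        cudaFree(d_x);
        return -1;
    }
    cudaMemset(d_x, 0, bytes);  // the values do not matter for timing
    cudaMemset(d_y, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    cudaEventRecord(start);                   // recorded on the default stream
    saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);               // wait until the kernel and the stop event complete

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds
    cudaError_t err = cudaGetLastError();

    // x and y are read, y is written: 3 * bytes moved. bytes / (ms * 1e6) gives GB/s.
    *ms_out = ms;
    *gbps_out = (float)(3.0 * (double)bytes / ((double)ms * 1.0e6));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    cudaFree(d_y);
    return err == cudaSuccess ? 0 : -1;
}
```

A single timed launch may include first-launch overheads. In practice you would usually run the kernel once to warm up and average the timing over several launches.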

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.