Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/RussWong/CUDATutorial
A CUDA tutorial to make people learn CUDA program from 0
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/RussWong/CUDATutorial
- Owner: RussWong
- Created: 2023-08-05T00:50:17.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-09T12:06:15.000Z (7 months ago)
- Last Synced: 2024-08-04T02:06:35.789Z (6 months ago)
- Language: Cuda
- Size: 159 KB
- Stars: 162
- Watchers: 2
- Forks: 38
- Open Issues: 2
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-cuda-triton-hpc - RussWong/CUDATutorial
README
# CUDATutorial
A CUDA tutorial that teaches CUDA programming from scratch.
## test environment
Turing T4 GPU
## compile command
1. compile by hand
`nvcc xxx.cu -o xxx`
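If that does not work, try specifying the target architecture explicitly:
`nvcc xxx.cu --gpu-architecture=compute_yy -o xxx`
Here xxx is the file name and yy is the GPU's compute capability without the dot, e.g. the A100's compute capability is 8.0 (yy = 80) and the T4's is 7.5 (yy = 75).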
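If you are not sure which yy to pass, the compute capability can be read at run time with `cudaGetDeviceProperties`. The sketch below is illustrative only; `compute_capability` is a hypothetical helper, not a function from this repo.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the repo): returns the compute capability of
// `device` as major * 10 + minor, i.e. the `yy` in --gpu-architecture=compute_yy.
// Returns -1 if the device cannot be queried (e.g. no GPU or driver present).
int compute_capability(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }
    std::printf("%s: compute capability %d.%d, try --gpu-architecture=compute_%d%d\n",
                prop.name, prop.major, prop.minor, prop.major, prop.minor);
    return prop.major * 10 + prop.minor;
}
```

Calling `compute_capability(0)` from `main` on a T4 should report 7.5 and return 75, which corresponds to `--gpu-architecture=compute_75`.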
2. one-click compile and run
Please make sure that:
* your CMake version is >= 3.8;
* the CUDA Toolkit is installed in the system root directory (download link: https://developer.nvidia.com/cuda-downloads).
```
mkdir build
cd build
cmake .. && make -j8
cd bin
./xxx
```
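To measure kernel timings on your own GPU, one common approach is to bracket the launches with CUDA events. This is a minimal sketch that assumes a trivial hypothetical kernel `scale_kernel` and a hypothetical helper `time_scale_kernel`; the code files in this repo may measure time differently.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only as something to time: data[i] *= alpha.
__global__ void scale_kernel(float* data, float alpha, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= alpha;
}

// Hypothetical helper: times `iters` launches of scale_kernel on n floats with
// CUDA events and returns the average time per launch in milliseconds.
float time_scale_kernel(int n, int iters) {
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    const int block = 256;
    const int grid = (n + block - 1) / block;

    // Untimed warm-up launch so one-off costs (e.g. lazy module loading) are excluded.
    scale_kernel<<<grid, block>>>(d_data, 1.0f, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                 // recorded on the default stream
    for (int it = 0; it < iters; ++it)
        scale_kernel<<<grid, block>>>(d_data, 1.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);             // wait until all timed launches finish

    float total_ms = 0.0f;
    cudaEventElapsedTime(&total_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return total_ms / iters;
}
```

The warm-up launch and averaging over several iterations reduce noise, but the numbers will still vary with the GPU, driver and NVCC version.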
## remark
* the related performance data is attached at the top of each code file.
* performance varies across GPU platforms and NVCC compiler versions, so some counter-intuitive results are to be expected; the goal is to explore and debug them.
* all comments and pull requests are welcome.
## update notes
### v2.0
* add CUDA streams
* add quantization
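The quantization scheme used in the repo is not described here. As an assumption, the sketch below shows symmetric per-tensor int8 quantization with scale = max|x| / 127; `quantize_int8` and `quantize_int8_kernel` are hypothetical names.

```cuda
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Symmetric per-tensor int8 quantization (an assumed scheme, not necessarily the
// repo's): q = clamp(round(x / scale), -127, 127) with scale = max|x| / 127.
__global__ void quantize_int8_kernel(const float* x, int8_t* q, float inv_scale, int n) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = rintf(x[i] * inv_scale);       // round to nearest, ties to even
        v = fminf(fmaxf(v, -127.0f), 127.0f);    // clamp to the symmetric int8 range
        q[i] = static_cast<int8_t>(v);
    }
}

// Hypothetical host wrapper: computes the scale on the CPU, quantizes on the GPU,
// writes the int8 values to `out`, and returns the scale needed to dequantize.
float quantize_int8(const std::vector<float>& in, std::vector<int8_t>& out) {
    const int n = static_cast<int>(in.size());
    out.assign(static_cast<std::size_t>(n), int8_t{0});
    if (n == 0) return 1.0f;

    float amax = 0.0f;
    for (float v : in) amax = std::fmax(amax, std::fabs(v));
    const float scale = amax > 0.0f ? amax / 127.0f : 1.0f;

    float* d_x = nullptr;
    int8_t* d_q = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_q, n * sizeof(int8_t));
    cudaMemcpy(d_x, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    quantize_int8_kernel<<<grid, block>>>(d_x, d_q, 1.0f / scale, n);

    cudaMemcpy(out.data(), d_q, n * sizeof(int8_t), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_q);
    return scale;
}
```

Dequantization is then x ≈ q * scale. Computing the scale on the host keeps the sketch short; a GPU version would normally find max|x| with a reduction kernel.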
### v2.1
* add fp32/fp16 gemv (vec * mat, mat is column-major)
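To illustrate what "mat is column-major" means for vec * mat, here is a minimal fp32 sketch (the fp16 variant is not shown). It assumes x has length K, M is K x N, and one warp computes one output element; the function names are hypothetical and the repo's kernels may be organised differently.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// y = x * M, where x has length K and M is a K x N fp32 matrix stored
// column-major: M(k, j) lives at mat[j * K + k]. Column j is contiguous, so one
// warp computes y[j]: lanes read consecutive k (coalesced loads), then the
// 32 partial sums are combined with warp shuffles.
__global__ void gemv_col_major_kernel(const float* x, const float* mat, float* y,
                                      int K, int N) {
    const int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    const int lane = threadIdx.x % 32;
    // The wrapper launches 256-thread blocks, so all lanes of a warp exit together.
    if (warp_id >= N) return;

    const float* col = mat + static_cast<std::size_t>(warp_id) * K;
    float sum = 0.0f;
    for (int k = lane; k < K; k += 32) sum += x[k] * col[k];

    // Tree reduction inside the warp; lane 0 ends up with the full dot product.
    for (int offset = 16; offset > 0; offset /= 2)
        sum += __shfl_down_sync(0xffffffffu, sum, offset);
    if (lane == 0) y[warp_id] = sum;
}

// Hypothetical host wrapper: copies x and M to the device, launches one warp per
// output column, and returns y.
std::vector<float> gemv_col_major(const std::vector<float>& x,
                                  const std::vector<float>& mat, int K, int N) {
    float *d_x = nullptr, *d_m = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, K * sizeof(float));
    cudaMalloc(&d_m, static_cast<std::size_t>(K) * N * sizeof(float));
    cudaMalloc(&d_y, N * sizeof(float));
    cudaMemcpy(d_x, x.data(), K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_m, mat.data(), static_cast<std::size_t>(K) * N * sizeof(float),
               cudaMemcpyHostToDevice);

    const int block = 256;                       // 8 warps per block
    const int warps_per_block = block / 32;
    const int grid = (N + warps_per_block - 1) / warps_per_block;
    gemv_col_major_kernel<<<grid, block>>>(d_x, d_m, d_y, K, N);

    std::vector<float> y(N);
    cudaMemcpy(y.data(), d_y, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_m);
    cudaFree(d_y);
    return y;
}
```

For example, with K = 3, N = 2, `mat = {1, 2, 3, 4, 5, 6}` (columns `{1, 2, 3}` and `{4, 5, 6}`) and `x = {1, 2, 3}`, `gemv_col_major` returns `{14, 32}`.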
### v2.2
* add fp32/fp16 gemv (vec * mat, mat is row-major)
* add some code explanations (WIP)
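For the row-major case the layout changes which threads read adjacent addresses. The sketch below is again fp32 only and uses hypothetical names; it assigns one thread per output element so that the threads of a warp read consecutive elements of each row.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// y = x * M, where M is a K x N fp32 matrix stored row-major: M(k, j) lives at
// mat[k * N + j]. One thread computes one y[j]; for each k, neighbouring threads
// read neighbouring j, so the loads from row k are coalesced.
__global__ void gemv_row_major_kernel(const float* x, const float* mat, float* y,
                                      int K, int N) {
    const int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= N) return;
    float sum = 0.0f;
    for (int k = 0; k < K; ++k)
        sum += x[k] * mat[static_cast<std::size_t>(k) * N + j];
    y[j] = sum;
}

// Hypothetical host wrapper: copies x and M to the device, launches one thread
// per output element, and returns y.
std::vector<float> gemv_row_major(const std::vector<float>& x,
                                  const std::vector<float>& mat, int K, int N) {
    float *d_x = nullptr, *d_m = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, K * sizeof(float));
    cudaMalloc(&d_m, static_cast<std::size_t>(K) * N * sizeof(float));
    cudaMalloc(&d_y, N * sizeof(float));
    cudaMemcpy(d_x, x.data(), K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_m, mat.data(), static_cast<std::size_t>(K) * N * sizeof(float),
               cudaMemcpyHostToDevice);

    const int block = 256;
    const int grid = (N + block - 1) / block;
    gemv_row_major_kernel<<<grid, block>>>(d_x, d_m, d_y, K, N);

    std::vector<float> y(N);
    cudaMemcpy(y.data(), d_y, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_m);
    cudaFree(d_y);
    return y;
}
```

With the same matrix stored row-major, `mat = {1, 4, 2, 5, 3, 6}`, and `x = {1, 2, 3}`, `gemv_row_major` returns `{14, 32}`. This simple mapping leaves most of the GPU idle when N is small; splitting K across several threads per output is a common refinement.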
### v2.6
* add fp32 dropout
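A minimal fp32 dropout sketch follows. It assumes inverted dropout (kept elements are scaled by 1 / (1 - p)) and uses cuRAND's Philox generator from `curand_kernel.h`; the repo's kernel may use a different RNG or scaling convention, and `dropout` / `dropout_kernel` are hypothetical names.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Inverted dropout (fp32): each element is zeroed with probability p, and kept
// elements are scaled by 1 / (1 - p) so the expected output equals the input.
// Each thread initialises its own Philox state with (seed, subsequence = i).
__global__ void dropout_kernel(const float* x, float* y, int n, float p,
                               unsigned long long seed) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    curandStatePhilox4_32_10_t state;
    curand_init(seed, /*subsequence=*/i, /*offset=*/0, &state);
    const float r = curand_uniform(&state);        // uniform in (0, 1]
    y[i] = (r > p) ? x[i] / (1.0f - p) : 0.0f;     // kept with probability 1 - p
}

// Hypothetical host wrapper; expects 0 <= p < 1.
std::vector<float> dropout(const std::vector<float>& x, float p,
                           unsigned long long seed) {
    const int n = static_cast<int>(x.size());
    std::vector<float> y(n);
    if (n == 0) return y;

    float *d_x = nullptr, *d_y = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int block = 256;
    dropout_kernel<<<(n + block - 1) / block, block>>>(d_x, d_y, n, p, seed);

    cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}
```

Scaling at training time means inference can skip dropout entirely. Because the random stream depends only on (seed, i), the same seed reproduces the same mask, which is handy when debugging.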