https://github.com/rurumimic/cuda
compute unified device architecture
cuda deep-learning gpu nvidia
- Host: GitHub
- URL: https://github.com/rurumimic/cuda
- Owner: rurumimic
- Created: 2023-11-23T10:02:08.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2026-02-18T09:58:41.000Z (4 months ago)
- Last Synced: 2026-02-18T14:25:48.739Z (4 months ago)
- Topics: cuda, deep-learning, gpu, nvidia
- Language: Cuda
- Homepage:
- Size: 5.54 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
README
# CUDA
- nvidia developer
  - [cuda-toolkit](https://developer.nvidia.com/cuda-toolkit)
  - [gpu compute capability](https://developer.nvidia.com/cuda-gpus)
- docs
  - [quick start](https://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html)
  - [supported host compilers](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#host-compiler-support-policy)
  - [best practices guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/)
  - [cuda c++ programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/)
- source
  - [samples](https://developer.nvidia.com/cuda-code-samples)
    - github: [nvidia/cuda-samples](https://github.com/nvidia/cuda-samples)
- repos
  - [cutlass](https://github.com/NVIDIA/cutlass)
---
## GPU Compute Capability
- [gpu compute capability](https://developer.nvidia.com/cuda-gpus)
```bash
nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
8.6
```
---
## Code
### Samples
```bash
git clone https://github.com/NVIDIA/cuda-samples.git
```
#### c++11_cuda
- Introduction: [c++11_cuda](https://github.com/NVIDIA/cuda-samples/tree/master/Samples/0_Introduction/c++11_cuda)
```bash
cd cuda-samples/Samples/0_Introduction/c++11_cuda
```
##### Compile
```bash
# Pick one host compiler; SMS="86" targets compute capability 8.6.
make HOST_COMPILER=clang++ SMS="86" dbg=1
make HOST_COMPILER=g++ SMS="86" dbg=1
make HOST_COMPILER=g++-13 SMS="86" dbg=1
```
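The `SMS` value passed to `make` is the compute capability with the dot removed (8.6 becomes 86). A minimal sketch of deriving it from `nvidia-smi`, assuming the `compute_cap` query shown earlier is available; the `cap_to_sm` helper name is hypothetical, and the 8.6 fallback is only the value from this README:

```bash
# Hypothetical helper: turn a compute capability such as "8.6" into the
# dot-free form used for SMS ("86").
cap_to_sm() {
  printf '%s\n' "$1" | tr -d '.'
}

# Query the first GPU if nvidia-smi is installed and working.
cap=""
if command -v nvidia-smi >/dev/null 2>&1; then
  cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1 || true)"
fi

# Assumption for illustration: fall back to the 8.6 value shown in this README.
cap="${cap:-8.6}"

echo "SMS=$(cap_to_sm "$cap")"
```

The printed value can then be passed along, for example `make HOST_COMPILER=g++ SMS="86" dbg=1`.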
##### Run
```bash
./c++11_cuda
GPU Device 0: "Ampere" with compute capability 8.6
Read 3223503 byte corpus from ./warandpeace.txt
counted 107310 instances of 'x', 'y', 'z', or 'w' in "./warandpeace.txt"
```
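The reported count can be roughly cross-checked on the CPU. This is a sketch, not the sample's implementation: it assumes the sample counts lowercase `x`, `y`, `z`, and `w` bytes, and the `count_xyzw` helper name is hypothetical.

```bash
# Hypothetical helper: count bytes equal to x, y, z, or w in a file.
# tr -cd deletes every byte that is NOT in the set; wc -c counts what is left.
count_xyzw() {
  tr -cd 'xyzw' < "$1" | wc -c | tr -d ' '
}

# Only run against the corpus when it is present in the current directory.
corpus="./warandpeace.txt"
if [ -f "$corpus" ]; then
  echo "CPU count: $(count_xyzw "$corpus")"
fi
```

If the two numbers differ, the assumption about case sensitivity or byte-wise counting is the first thing to revisit.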
---
## Docs
- [install](docs/install.md)
- [clang](docs/clang.md): format
- [api](docs/api.md): driver, runtime
- [huggingface](docs/huggingface.md)
- [text embeddings inference](docs/text.embeddings.inference.md)
- [docker](docs/docker.md)
- nvidia
  - [triton](docs/triton.md)
  - [libnvidia-container](docs/libnvidia.container.md)
  - [dynamo](docs/dynamo.md)
  - [TensorRT](docs/tensorrt.md), src/[tensorrt](src/tensorrt/README.md)
- [leetgpu](docs/leetgpu.md)
---
## Code
- Hello CUDA: [hello_cuda](src/hello_cuda/README.md), [hello_cuda with C++](src/hello_cuda_cpp/README.md)
- Thread: [thread_layout](src/thread_layout/README.md)
- Device: [device_query](src/device_query/README.md)
- Vector: [vector_add](src/vector_add/README.md)
- Matrix
  - add: [matrix_add](src/matrix_add/README.md), [matrix_add_large](src/matrix_add_large/README.md)
  - mul: [matrix_mul](src/matrix_mul/README.md), [matrix_mul_shared_memory](src/matrix_mul_shared_memory/README.md), [matrix_mul_shared_memory_large](src/matrix_mul_shared_memory_large/README.md)
- TensorRT: [tensorrt](src/tensorrt/README.md)
- Sync: [sync](src/sync/README.md), [streams + event](src/streams/README.md)
---
## Ref
- [CUDA Books archive](https://developer.nvidia.com/cuda-books-archive)
- book: [Programming Massively Parallel Processors](https://www.oreilly.com/library/view/programming-massively-parallel/9780323984638)
- book: [CUDA Programming](https://github.com/bluekds/CUDA_Programming)
- book: [The Art of HPC](https://theartofhpc.com/)
- youtube: [CUDA Programming Course – High-Performance Computing with GPUs](https://www.youtube.com/watch?v=86FAWCzIe_4)
- youtube: [GPU MODE](https://www.youtube.com/@GPUMODE)
- [GPU Glossary](https://modal.com/gpu-glossary)
- UIUC: [Introduction to Parallel Programming with CUDA](https://newfrontiers.illinois.edu/news-and-events/introduction-to-parallel-programming-with-cuda/)