https://github.com/gama1903/cuda_programming
Practice of cuda programming
cuda parallel-computing
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/gama1903/cuda_programming
- Owner: Gama1903
- Created: 2024-12-25T14:29:14.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-01-01T06:15:49.000Z (6 months ago)
- Last Synced: 2025-02-17T06:13:08.067Z (5 months ago)
- Topics: cuda, parallel-computing
- Language: Jupyter Notebook
- Homepage:
- Size: 34.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# CUDA Programming
## Introduction
Practice of CUDA programming.

## Reference
1. Book: [Programming Massively Parallel Processors, 4th ed. (PMPP)](/Programming%20Massively%20Parallel%20Processors-%20A%20Hands-on%20--%20Wen-mei%20W_%20Hwu,%20David%20B_%20Kirk,%20Izzat%20El%20Hajj,%20Ph_D_%20--%204th,%202023%20--%20Morgan%20Kaufmann.pdf)
2. Lecture: [CUDA MODE](https://github.com/cuda-mode/lectures)
3. https://github.com/heyuhhh/Programming-Massively-Parallel-Processors-4th

## Requirement
1. python==3.8
2. torch==2.0.0+cu118
3. torchaudio==2.0.1+cu118
4. torchvision==0.15.1+cu118
5. ninja==1.11.1.3
6. setuptools==60.2.0
7. ipykernel==6.29.5

## Index
1. [L001_How_to_profile_cuda_kernels_in_pytorch](L001_How_to_profile_cuda_kernels_in_pytorch/index.md)
2. [Ch02_Heterogeneous_data_parallel_computing](Ch02_Heterogeneous_data_parallel_computing/index.md)
3. [L002_Ch1-3_PMPP_book](L002_Ch1-3_PMPP_book/index.md)
4. [Ch03_Multidimensional_grids_and_data](Ch03_Multidimensional_grids_and_data/index.md)
5. [L003_Get_started_with_cuda_for_python_programmer](L003_Get_started_with_cuda_for_python_programmer/index.md)
6. [L004_Compute_and_memory_basics](L004_Compute_and_memory_basics/index.md)
7. [L005_Going_futher_with_cuda_for_python_programmer](L005_Going_futher_with_cuda_for_python_programmer/index.md)
8. [Ch04_Compute_architecture_and_scheduling](Ch04_Compute_architecture_and_scheduling/index.md)
9. [Ch05_Memory_architecture_and_data_locality](Ch05_Memory_architecture_and_data_locality/index.md)
10. [L006_Optimizing_opitimizers](L006_Optimizing_opitimizers/index.md)
11. [L007_Advanced_quantization](L007_Advanced_quantization/index.md)
12. [L008_Cuda_performance_checklist](L008_Cuda_performance_checklist/index.md)
13. [Ch06_Performance_considerations](Ch06_Performance_considerations/index.md)
14. [Ch07_Convolution](Ch07_Convolution/index.md)
15. [Ch08_Stencil](Ch08_Stencil/index.md)
16. [L009_Reductions](L009_Reductions/index.md)
17. [Ch10_Reduction](Ch10_Reduction/index.md)
18. [L011_Sparsity](L011_Sparsity/index.md)
19. [Ch14_Sparse_matrix_computation](Ch14_Sparse_matrix_computation/index.md)
20. [L012_Flash_attention](L012_Flash_attention/index.md)
21. [L013_Ring_attention](L013_Ring_attention/index.md)
22. [L014_Practitioners_guide_to_triton](L014_Practitioners_guide_to_triton/index.md)
23. [L015_CUTLASS](L015_CUTLASS/index.md)
24. [L016_On_hands_profiling](L016_On_hands_profiling/index.md)
25. [L018_Fusing_kernels](L018_Fusing_kernels/index.md)
26. [L020_Scan_algorithm](L020_Scan_algorithm/index.md)
27. [L021_Scan_algorithm_part2](L021_Scan_algorithm_part2/index.md)
28. [Ch11_Scan](Ch11_Scan/index.md)