https://github.com/gama1903/cuda_programming

Practice of cuda programming

Host: GitHub
URL: https://github.com/gama1903/cuda_programming
Owner: Gama1903
Created: 2024-12-25T14:29:14.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-01-01T06:15:49.000Z (6 months ago)
Last Synced: 2025-02-17T06:13:08.067Z (5 months ago)
Topics: cuda, parallel-computing
Language: Jupyter Notebook
Homepage:
Size: 34.3 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# CUDA Programming

## Introduction

Practice of CUDA programming, following the PMPP book chapters and the CUDA MODE lectures listed below.

## Reference

1. Book: [Programming Massively Parallel Processors, 4th edition (PMPP)](/Programming%20Massively%20Parallel%20Processors-%20A%20Hands-on%20--%20Wen-mei%20W_%20Hwu,%20David%20B_%20Kirk,%20Izzat%20El%20Hajj,%20Ph_D_%20--%204th,%202023%20--%20Morgan%20Kaufmann.pdf)

2. Lecture: [CUDA MODE](https://github.com/cuda-mode/lectures)

3. https://github.com/heyuhhh/Programming-Massively-Parallel-Processors-4th

## Requirement

1. python==3.8

2. torch==2.0.0+cu118

3. torchaudio==2.0.1+cu118

4. torchvision==0.15.1+cu118

5. ninja==1.11.1.3

6. setuptools==60.2.0

7. ipykernel==6.29.5
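
The pins above can be checked against the active environment before opening the notebooks. The following is a minimal sketch, not part of the repository: the helper names `installed_version`, `find_mismatches`, and `python_matches` are hypothetical, and it assumes the packages were installed with pip so that their metadata reports the exact version strings listed above (including the `+cu118` suffix).

```python
import sys
from importlib import metadata  # available in the standard library since Python 3.8

# Pins copied from the Requirement list above.
PINNED = {
    "torch": "2.0.0+cu118",
    "torchaudio": "2.0.1+cu118",
    "torchvision": "0.15.1+cu118",
    "ninja": "1.11.1.3",
    "setuptools": "60.2.0",
    "ipykernel": "6.29.5",
}


def installed_version(name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that is not satisfied exactly."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


def python_matches(required=(3, 8), version_info=None):
    """Check that the interpreter's major.minor version equals the required one."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(required)


if __name__ == "__main__":
    if not python_matches():
        print("Python 3.8 expected, found %d.%d" % tuple(sys.version_info[:2]))
    for name, (expected, found) in find_mismatches(PINNED).items():
        print("%s: expected %s, found %s" % (name, expected, found))
```

Running the script prints one line per package whose installed version differs from the pin, and nothing when the environment matches. The comparison is an exact string match, which is deliberately strict; a looser check would need a version parser.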

## Index

1. [L001_How_to_profile_cuda_kernels_in_pytorch](L001_How_to_profile_cuda_kernels_in_pytorch/index.md)

2. [Ch02_Heterogeneous_data_parallel_computing](Ch02_Heterogeneous_data_parallel_computing/index.md)

3. [L002_Ch1-3_PMPP_book](L002_Ch1-3_PMPP_book/index.md)

4. [Ch03_Multidimensional_grids_and_data](Ch03_Multidimensional_grids_and_data/index.md)

5. [L003_Get_started_with_cuda_for_python_programmer](L003_Get_started_with_cuda_for_python_programmer/index.md)

6. [L004_Compute_and_memory_basics](L004_Compute_and_memory_basics/index.md)

7. [L005_Going_further_with_cuda_for_python_programmer](L005_Going_futher_with_cuda_for_python_programmer/index.md)

8. [Ch04_Compute_architecture_and_scheduling](Ch04_Compute_architecture_and_scheduling/index.md)

9. [Ch05_Memory_architecture_and_data_locality](Ch05_Memory_architecture_and_data_locality/index.md)

10. [L006_Optimizing_optimizers](L006_Optimizing_opitimizers/index.md)

11. [L007_Advanced_quantization](L007_Advanced_quantization/index.md)

12. [L008_Cuda_performance_checklist](L008_Cuda_performance_checklist/index.md)

13. [Ch06_Performance_considerations](Ch06_Performance_considerations/index.md)

14. [Ch07_Convolution](Ch07_Convolution/index.md)

15. [Ch08_Stencil](Ch08_Stencil/index.md)

16. [L009_Reductions](L009_Reductions/index.md)

17. [Ch10_Reduction](Ch10_Reduction/index.md)

18. [L011_Sparsity](L011_Sparsity/index.md)

19. [Ch14_Sparse_matrix_computation](Ch14_Sparse_matrix_computation/index.md)

20. [L012_Flash_attention](L012_Flash_attention/index.md)

21. [L013_Ring_attention](L013_Ring_attention/index.md)

22. [L014_Practitioners_guide_to_triton](L014_Practitioners_guide_to_triton/index.md)

23. [L015_CUTLASS](L015_CUTLASS/index.md)

24. [L016_On_hands_profiling](L016_On_hands_profiling/index.md)

25. [L018_Fusing_kernels](L018_Fusing_kernels/index.md)

26. [L020_Scan_algorithm](L020_Scan_algorithm/index.md)

27. [L021_Scan_algorithm_part2](L021_Scan_algorithm_part2/index.md)

28. [Ch11_Scan](Ch11_Scan/index.md)
