{"id":13809599,"url":"https://github.com/PaddleJitLab/CUDATutorial","last_synced_at":"2025-05-14T08:33:19.752Z","repository":{"id":112060099,"uuid":"549562784","full_name":"PaddleJitLab/CUDATutorial","owner":"PaddleJitLab","description":"A self-learning tutorial for CUDA High Performance Programming.","archived":false,"fork":false,"pushed_at":"2025-04-06T05:20:43.000Z","size":113236,"stargazers_count":542,"open_issues_count":1,"forks_count":57,"subscribers_count":7,"default_branch":"develop","last_synced_at":"2025-04-06T06:18:31.174Z","etag":null,"topics":["cuda-programming","deep-learning"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PaddleJitLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-10-11T11:31:15.000Z","updated_at":"2025-04-06T05:20:32.000Z","dependencies_parsed_at":"2024-01-04T13:46:26.244Z","dependency_job_id":"b4fbb0be-675c-43ee-9b90-d8b653c91ff7","html_url":"https://github.com/PaddleJitLab/CUDATutorial","commit_stats":null,"previous_names":["paddlejitlab/cudatutorial"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaddleJitLab%2FCUDATutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaddleJitLab%2FCUDATutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaddleJitLab%2FCUDATutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PaddleJitLab%2FCUDATutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PaddleJitLab","download_url":"https://codeload.github.com/PaddleJitLab/CUDATutorial/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254104929,"owners_count":22015567,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda-programming","deep-learning"],"created_at":"2024-08-04T02:00:32.104Z","updated_at":"2025-05-14T08:33:14.744Z","avatar_url":"https://github.com/PaddleJitLab.png","language":"JavaScript","readme":"# CUDATutorial\n![](https://img.shields.io/badge/version-v0.1-brightgreen) ![](https://img.shields.io/badge/docs-latest-brightgreen) ![](https://img.shields.io/badge/PRs-welcome-orange) ![](https://img.shields.io/badge/pre--commit-Yes-brightgreen)\n\nLearn high-performance CUDA programming from scratch, from beginner to giving up. Oh no, wait! Let's learn together, take notes as we go, and make a little progress every day!\n\n\u003e [!NOTE]\n\u003e You can read the web version of this repository at https://cuda.keter.top/\n\n\u003cp align=\"center\"\u003e\n\u003cimg align=\"center\" src=\"./img/kernel-execution-on-gpu.png\" width=75%\u003e\n\u003c/p\u003e\n\n\n## Learning Path\n\n### Newbie Village Series 🐸\n\n+ [Setting Up a CUDA Programming Environment](./docs/01_build_dev_env/)\n+ [Writing Your First Kernel by Hand](./docs/02_first_kernel/)\n+ [Performance Profiling with nvprof](./docs/03_nvprof_usage/)\n+ [A First Attempt at Optimizing a Kernel](./docs/04_first_refine_kernel/)\n+ [Understanding How CUDA Threads Are Organized](./docs/10_what_my_id/)\n+ [The CUDA Programming Model](./docs/00_prev_concept/)\n\n### Elementary Series ⚔\n\n+ [Introduction to Multithreaded Parallel Computing](./docs/05_intro_parallel/)\n+ [Implementing Matrix Multiplication (Matmul) by Hand](./docs/06_impl_matmul/)\n+ [Matmul Performance Optimization in Practice](./docs/07_optimize_matmul/)\n\n### Intermediate Series 🚀\n\n+ [Implementing Reduce by Hand](./docs/08_impl_reduce/)\n+ [Reduce Optimization in Practice: Interleaved Addressing](./docs/09_optimize_reduce/01_interleaved_addressing/README.md)\n+ [Reduce Optimization in Practice: Resolving Bank Conflicts](./docs/09_optimize_reduce/02_bank_conflict/README.md)\n+ [Reduce Optimization in Practice: Eliminating Idle Threads](./docs/09_optimize_reduce/03_idle_threads_free/README.md)\n+ [Reduce Optimization in Practice: Unrolling the Last Warp](./docs/09_optimize_reduce/04_unroll/README.md)\n+ [GEMM Optimization: 2D Thread Tiling](./docs/11_gemm_optimize/01_tiled2d/README.md)\n+ [GEMM Optimization: Vectorized Shared and Global Memory Access](./docs/11_gemm_optimize/02_vectorize_smem_and_gmem_accesses/README.md)\n+ [GEMM Optimization: Warp Tiling](./docs/11_gemm_optimize/03_warptiling/README.md)\n+ [GEMM Optimization: Double Buffering](./docs/11_gemm_optimize/04_double_buffer/README.md)\n+ [GEMM Optimization: Resolving Bank Conflicts](./docs/11_gemm_optimize/05_bank_conflicts/README.md)\n+ [Convolution Operator Optimization: A Naive Convolution Implementation](./docs/12_convolution/01_naive_conv/README.md)\n+ [Convolution Operator Optimization: Overview of Optimization Approaches](./docs/12_convolution/02_intro_conv_optimize/README.md)\n+ [Convolution Operator Optimization: Convolution via im2col + GEMM](./docs/12_convolution/03_im2col_conv/README.md)\n+ [Convolution Operator Optimization: Convolution via Implicit GEMM](./docs/12_convolution/04_implicit_gemm/README.md)\n+ [Convolution Operator Optimization: Convolution Optimization Strategies in CUTLASS](./docs/12_convolution/05_cutlass_conv/README.md)\n\n\n### Advanced Series ✈️\n\n+ Page-Locked and Host Memory\n+ CUDA Streams and Using Multiple Streams\n+ Computing with Multiple GPUs\n+ ... (more to come)\n\n### Master Series 💡\n\nI don't know what to write here yet; after all, I'm still a newbie~~\n\n### LLM Inference Techniques 🤖\n\n+ [Continuous Batching](./docs/13_continuous_batch/README.md)\n+ [Page Attention - Principles](./docs/14_page_attention/README.md)\n+ [Page Attention - Source Code Walkthrough](./docs/15_vllm_page_attention/README.md)\n+ [vLLM Source Code Series - Overview of the vLLM Code Architecture](./docs/16_vllm_source_code/01_vllm_arch.md)\n+ [vLLM Source Code Series - Preprocessing Before Scheduling](./docs/16_vllm_source_code/02_preprocess_before_scheduler.md)\n+ [vLLM Source Code Series - Scheduler Policies](./docs/16_vllm_source_code/03_scheduler.md)\n+ [vLLM Source Code Series - vLLM BlockManager - NaiveBlockAllocator](./docs/16_vllm_source_code/04_block_manager_part1.md)\n+ [vLLM Source Code Series - vLLM BlockManager - PrefixCachingBlockAllocator](./docs/16_vllm_source_code/05_block_manager_part2.md)\n\n[![Star History Chart](https://api.star-history.com/svg?repos=PaddleJitLab/CUDATutorial\u0026type=Date)](https://star-history.com/#PaddleJitLab/CUDATutorial\u0026Date)","funding_links":[],"categories":["JavaScript","Learning Resources","Learning Resources 📚"],"sub_categories":["University Courses \u0026 Tutorials 🎓"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPaddleJitLab%2FCUDATutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPaddleJitLab%2FCUDATutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPaddleJitLab%2FCUDATutorial/lists"}