{"id":13809595,"url":"https://github.com/QINZHAOYU/CudaSteps","last_synced_at":"2025-05-14T08:32:51.951Z","repository":{"id":38400592,"uuid":"425864667","full_name":"QINZHAOYU/CudaSteps","owner":"QINZHAOYU","description":"A CUDA learning path based on 《cuda编程-基础与实践》 by 樊哲勇.","archived":true,"fork":false,"pushed_at":"2024-01-15T01:46:13.000Z","size":213,"stargazers_count":201,"open_issues_count":0,"forks_count":46,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-08-04T02:06:35.978Z","etag":null,"topics":["cuda","gpu","nvidia"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QINZHAOYU.png","metadata":{"files":{"readme":"ReadMe.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-11-08T14:14:13.000Z","updated_at":"2024-07-31T15:55:32.000Z","dependencies_parsed_at":"2024-08-04T02:01:31.864Z","dependency_job_id":null,"html_url":"https://github.com/QINZHAOYU/CudaSteps","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QINZHAOYU%2FCudaSteps","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QINZHAOYU%2FCudaSteps/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QINZHAOYU%2FCudaSteps/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QINZHAOYU%2FCudaSteps/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QINZHAOYU","download_url":"https://codeload.github.com/QINZHAOYU/CudaSteps/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225282501,"owners_count":17449524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","gpu","nvidia"],"created_at":"2024-08-04T02:00:32.012Z","updated_at":"2024-11-19T02:30:57.731Z","avatar_url":"https://github.com/QINZHAOYU.png","language":"Cuda","funding_links":[],"categories":["Learning Resources"],"sub_categories":[],"readme":"# CUDA Study Steps\n\n**NOTE**\n\n```\nThese are my notes from learning CUDA as a beginner. I am surprised and glad that this small project has been helpful to others; it is a first for me. (Usually I am the one being helped :D)\n\nBecause of limits on my time, ability, and interests, this project is no longer maintained. I hope it can serve as a stepping stone for getting started with GPU programming.\n\nOne more note:\n\nIn 2012 NVIDIA proposed the TEAMS construct for accelerators in OpenMP 4.0, and support for accelerators was released in 2013. Compared with native CUDA development, the OpenMP framework offers a more portable GPU acceleration approach (at some cost in performance and flexibility, of course).\n```\n\n\n\nNotes on CUDA GPU programming, based on the book 《CUDA 编程——基础与实践》 by 樊哲勇.\n\nChapters:\n\n1. [GPU hardware and CUDA development tools](./capter1/ReadMe.md)\n2. [Thread organization in CUDA](./capter2/ReadMe.md)\n3. [Basic framework of a simple CUDA program](./capter3/ReadMe.md)\n4. [Error checking in CUDA programs](./capter4/ReadMe.md)\n5. [Keys to GPU acceleration](./capter5/ReadMe.md)\n6. [CUDA memory organization](./capter6/ReadMe.md)\n7. [Proper use of global memory](./capter7/ReadMe.md)\n8. [Proper use of shared memory](./capter8/ReadMe.md)\n9. [Proper use of atomic functions](./capter9/ReadMe.md)\n10. [Warp-level primitives and cooperative groups](./capter10/ReadMe.md)\n11. [CUDA streams](./capter11/ReadMe.md)\n12. [Programming with unified memory]()\n13. [Molecular dynamics simulation](./capter13/ReadMe.md)\n14. 
[CUDA standard libraries](./capter14/ReadMe.md)\n\n\n## Official CUDA documentation\n\n[CUDA C++ Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html)  \n[CUDA C++ Best Practices Guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)  \n[CUDA Runtime API reference](https://docs.nvidia.com/cuda/cuda-runtime-api/index.html)  \n[CUDA Math API reference](https://docs.nvidia.com/cuda/cuda-math-api/index.html)  \n\n\n## CUDA programming examples\n\n[CUDA Samples](https://github.com/NVIDIA/cuda-samples)\n+ Simple Reference\nBasic CUDA samples for beginners, illustrating key concepts of using CUDA and the CUDA runtime APIs.\n+ Utilities Reference\nSample programs that show how to query device capabilities and measure GPU/CPU bandwidth.\n+ Graphics Reference\nGraphics samples that demonstrate interoperability between CUDA, OpenGL, and DirectX.\n+ Imaging Reference\nImage processing, compression, and data analysis.\n+ Finance Reference\nParallel processing for financial computations.\n+ Simulations Reference\nSimulation algorithms implemented with CUDA.\n+ Advanced Reference\nAdvanced algorithms implemented with CUDA.\n+ Cudalibraries Reference\nThese samples show how to use the CUDA libraries (NPP, CUBLAS, CUFFT, CUSPARSE, and CURAND).\n\n## CUDA benchmarks\n\n[CUDA Benchmarks](https://github.com/ekondis/mixbench)\n\n+ Four types of experiments are executed combined with global memory accesses:\nSingle precision Flops (multiply-additions)\nDouble precision Flops (multiply-additions)\nHalf precision Flops (multiply-additions)\nInteger multiply-addition operations\n\n+ Building is now based on CMake files. 
Each implementation resides in a separate folder:\nCUDA implementation: mixbench-cuda\nOpenCL implementation: mixbench-opencl\nHIP implementation: mixbench-hip\nSYCL implementation: mixbench-sycl\n\nThe generated results look like this:\n```\nmixbench/read-only (v0.03-2-gbccfd71)\n------------------------ Device specifications ------------------------\nDevice:              GeForce RTX 2070\nCUDA driver version: 10.20\nGPU clock rate:      1620 MHz\nMemory clock rate:   3500 MHz\nMemory bus width:    256 bits\nWarpSize:            32\nL2 cache size:       4096 KB\nTotal global mem:    7979 MB\nECC enabled:         No\nCompute Capability:  7.5\nTotal SPs:           2304 (36 MPs x 64 SPs/MP)\nCompute throughput:  7464.96 GFlops (theoretical single precision FMAs)\nMemory bandwidth:    448.06 GB/sec\n-----------------------------------------------------------------------\nTotal GPU memory 8366784512, free 7941521408\nBuffer size:          256MB\nTrade-off type:       compute with global memory (block strided)\nElements per thread:  8\nThread fusion degree: 4\n----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------\nExperiment ID, Single Precision ops,,,,              Double precision ops,,,,              Half precision ops,,,,                Integer operations,,, \nCompute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec\n            0,      0.250,    0.32,  104.42, 417.68,      0.125,    0.63,   53.04, 424.35,      0.500,    0.32,  211.41, 422.81,     0.250,    0.32,  105.58, 422.30\n            1,      0.750,    0.32,  316.34, 421.79,      0.375,    0.63,  158.69, 423.18,      1.500,    0.32,  634.22, 422.81,     0.750,    0.32,  317.30, 423.07\n            2,      1.250,    0.32,  528.46, 422.77,      0.625,    0.78,  215.91, 345.45,      2.500,    0.32, 1055.97, 422.39,     
1.250,    0.32,  528.57, 422.86\n            3,      1.750,    0.32,  738.81, 422.17,      0.875,    1.08,  218.17, 249.34,      3.500,    0.32, 1478.95, 422.56,     1.750,    0.32,  740.59, 423.20\n            4,      2.250,    0.32,  951.33, 422.81,      1.125,    1.38,  219.57, 195.17,      4.500,    0.32, 1902.66, 422.81,     2.250,    0.32,  950.66, 422.51\n            5,      2.750,    0.32, 1162.74, 422.81,      1.375,    1.67,  220.38, 160.28,      5.500,    0.32, 2328.52, 423.37,     2.750,    0.32, 1162.74, 422.81\n            6,      3.250,    0.32, 1374.56, 422.94,      1.625,    1.97,  220.99, 135.99,      6.500,    0.32, 2756.62, 424.10,     3.250,    0.32, 1375.81, 423.32\n            7,      3.750,    0.32, 1592.45, 424.65,      1.875,    2.27,  221.38, 118.07,      7.500,    0.32, 3169.50, 422.60,     3.750,    0.32, 1585.55, 422.81\n            8,      4.250,    0.32, 1796.95, 422.81,      2.125,    2.57,  221.71, 104.33,      8.500,    0.32, 3587.76, 422.09,     4.250,    0.37, 1545.63, 363.68\n            9,      4.750,    0.32, 2006.34, 422.39,      2.375,    2.87,  221.85,  93.41,      9.500,    0.32, 3995.38, 420.57,     4.750,    0.32, 1998.29, 420.69\n           10,      5.250,    0.32, 2209.52, 420.86,      2.625,    3.17,  222.02,  84.58,     10.500,    0.32, 4439.54, 422.81,     5.250,    0.32, 2220.44, 422.94\n           11,      5.750,    0.32, 2434.12, 423.32,      2.875,    3.47,  222.17,  77.28,     11.500,    0.32, 4855.01, 422.17,     5.750,    0.32, 2426.77, 422.05\n           12,      6.250,    0.32, 2638.06, 422.09,      3.125,    3.78,  222.18,  71.10,     12.500,    0.32, 5227.20, 418.18,     6.250,    0.38, 2202.15, 352.34\n           13,      6.750,    0.32, 2841.95, 421.03,      3.375,    4.08,  222.30,  65.87,     13.500,    0.32, 5712.58, 423.15,     6.750,    0.32, 2850.54, 422.30\n           14,      7.250,    0.32, 3065.39, 422.81,      3.625,    4.37,  222.45,  61.36,     14.500,    0.32, 6135.74, 423.15,     7.250,  
  0.32, 3065.08, 422.77\n           15,      7.750,    0.33, 3143.40, 405.60,      3.875,    4.67,  222.57,  57.44,     15.500,    0.32, 6546.34, 422.34,     7.750,    0.32, 3268.89, 421.79\n           16,      8.250,    0.32, 3482.59, 422.13,      4.125,    4.98,  222.57,  53.96,     16.500,    0.32, 6957.48, 421.67,     8.250,    0.39, 2803.68, 339.84\n           17,      8.750,    0.32, 3693.66, 422.13,      4.375,    5.28,  222.53,  50.86,     17.500,    0.32, 7396.24, 422.64,     8.750,    0.32, 3694.77, 422.26\n           18,      9.250,    0.32, 3901.58, 421.79,      4.625,    5.58,  222.58,  48.12,     18.500,    0.32, 7786.72, 420.90,     9.250,    0.32, 3897.66, 421.37\n           20,     10.250,    0.32, 4312.53, 420.73,      5.125,    6.18,  222.66,  43.45,     20.500,    0.32, 8640.66, 421.50,    10.250,    0.41, 3374.54, 329.22\n           22,     11.250,    0.32, 4729.94, 420.44,      5.625,    6.78,  222.74,  39.60,     22.500,    0.32, 9452.31, 420.10,    11.250,    0.32, 4734.21, 420.82\n           24,     12.250,    0.32, 5148.83, 420.31,      6.125,    7.36,  223.51,  36.49,     24.500,    0.32,10346.40, 422.30,    12.250,    0.42, 3900.12, 318.38\n           28,     14.250,    0.32, 6009.94, 421.75,      7.125,    8.53,  224.23,  31.47,     28.500,    0.32,11975.32, 420.19,    14.250,    0.44, 4368.11, 306.53\n           32,     16.250,    0.32, 6795.36, 418.18,      8.125,    9.72,  224.31,  27.61,     32.500,    0.32,13605.64, 418.64,    16.250,    0.45, 4797.12, 295.21\n           40,     20.250,    0.34, 7899.43, 390.10,     10.125,   12.11,  224.50,  22.17,     40.500,    0.33,16371.37, 404.23,    20.250,    0.50, 5464.85, 269.87\n           48,     24.250,    0.41, 8029.04, 331.09,     12.125,   14.49,  224.58,  18.52,     48.500,    0.40,16468.89, 339.56,    24.250,    0.54, 5986.22, 246.85\n           56,     28.250,    0.47, 8114.58, 287.24,     14.125,   16.88,  224.65,  15.90,     56.500,    0.46,16443.12, 291.03,    28.250,    0.60, 
6342.42, 224.51\n           64,     32.250,    0.53, 8154.47, 252.85,     16.125,   19.26,  224.72,  13.94,     64.500,    0.52,16536.22, 256.38,    32.250,    0.66, 6591.93, 204.40\n           80,     40.250,    0.66, 8242.80, 204.79,     20.125,   24.03,  224.79,  11.17,     80.500,    0.65,16644.88, 206.77,    40.250,    0.78, 6909.54, 171.67\n           96,     48.250,    0.78, 8321.35, 172.46,     24.125,   28.80,  224.85,   9.32,     96.500,    0.78,16685.23, 172.90,    48.250,    0.91, 7108.62, 147.33\n          128,     64.250,    1.03, 8337.22, 129.76,     32.125,   38.34,  224.91,   7.00,    128.500,    1.03,16775.65, 130.55,    64.250,    1.18, 7295.18, 113.54\n          192,     96.250,    1.54, 8414.49,  87.42,     48.125,   57.42,  224.97,   4.67,    192.500,    1.53,16847.93,  87.52,    96.250,    1.74, 7431.64,  77.21\n          256,    128.250,    2.06, 8362.01,  65.20,     64.125,   76.50,  225.02,   3.51,    256.500,    2.06,16693.65,  65.08,   128.250,    2.30, 7477.75,  58.31\n--------------------------------------------------------------------------------------------------------------------------------------------------------------------\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FQINZHAOYU%2FCudaSteps","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FQINZHAOYU%2FCudaSteps","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FQINZHAOYU%2FCudaSteps/lists"}