{"id":28719158,"url":"https://github.com/eunomia-bpf/basic-cuda-tutorial","last_synced_at":"2025-06-15T06:00:22.776Z","repository":{"id":294778107,"uuid":"988039176","full_name":"eunomia-bpf/basic-cuda-tutorial","owner":"eunomia-bpf","description":"A collection of CUDA programming examples to learn GPU programming","archived":false,"fork":false,"pushed_at":"2025-06-05T09:08:40.000Z","size":2120,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-05T10:23:52.120Z","etag":null,"topics":["cuda","tutorial"],"latest_commit_sha":null,"homepage":"https://eunomia.dev/en/others/cuda-tutorial/","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eunomia-bpf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["yunwei37","Officeyutong"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2025-05-22T01:00:40.000Z","updated_at":"2025-06-05T09:08:41.000Z","dependencies_parsed_at":"2025-06-07T18:30:17.396Z","dependency_job_id":null,"html_url":"https://github.com/eunomia-bpf/basic-cuda-tutorial","commit_stats":null,"previous_names":["eunomia-bpf/basic-cuda-tutorial"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eunomia-bpf/basic-cuda-tutorial","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eunomia-bpf%2Fbasic-cuda-tutorial","tags_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eunomia-bpf%2Fbasic-cuda-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eunomia-bpf%2Fbasic-cuda-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eunomia-bpf%2Fbasic-cuda-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eunomia-bpf","download_url":"https://codeload.github.com/eunomia-bpf/basic-cuda-tutorial/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eunomia-bpf%2Fbasic-cuda-tutorial/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259929946,"owners_count":22933527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","tutorial"],"created_at":"2025-06-15T06:00:18.618Z","updated_at":"2025-06-15T06:00:22.719Z","avatar_url":"https://github.com/eunomia-bpf.png","language":"Cuda","readme":"# basic-cuda-tutorial\n\nYou can find the code in \u003chttps://github.com/eunomia-bpf/basic-cuda-tutorial\u003e\n\nA collection of CUDA programming examples to learn GPU programming with NVIDIA CUDA.\n\nMake sure to change the GPU architecture `sm_61` in the Makefile to match your own GPU's architecture.\n\n## Examples and tutorials\n\n- **01-vector-addition.cu** and [01-vector-addition.md](01-vector-addition.md): Introduction to CUDA programming with a vector addition example\n- **02-ptx-assembly.cu** and [02-ptx-assembly.md](02-ptx-assembly.md): Demonstration of CUDA PTX inline assembly with a vector multiplication example\n- 
**03-gpu-programming-methods.cu** and [03-gpu-programming-methods.md](03-gpu-programming-methods.md): Comprehensive comparison of GPU programming methods including CUDA, PTX, Thrust, Unified Memory, Shared Memory, CUDA Streams, and Dynamic Parallelism using matrix multiplication\n- **04-gpu-architecture.cu** and [04-gpu-architecture.md](04-gpu-architecture.md): Detailed exploration of GPU organization hierarchy including hardware architecture, thread/block/grid structure, memory hierarchy, and execution model\n- **05-neural-network.cu** and [05-neural-network.md](05-neural-network.md): Implementing a basic neural network forward pass on GPU with CUDA\n- **06-cnn-convolution.cu** and [06-cnn-convolution.md](06-cnn-convolution.md): GPU-accelerated convolution operations for CNN with shared memory optimization\n- **07-attention-mechanism.cu** and [07-attention-mechanism.md](07-attention-mechanism.md): CUDA implementation of attention mechanism for transformer models\n- **08-profiling-tracing.cu** and [08-profiling-tracing.md](08-profiling-tracing.md): Profiling and tracing CUDA applications with CUDA Events, NVTX, and CUPTI for performance optimization\n- **09-gpu-extension.cu** and [09-gpu-extension.md](09-gpu-extension.md): GPU application extension mechanisms for modifying behavior without source code changes, including API interception, memory management, kernel optimization, and error resilience\n- **10-cpu-gpu-profiling-boundaries.cu** and [10-cpu-gpu-profiling-boundaries.md](10-cpu-gpu-profiling-boundaries.md): Advanced GPU kernel instrumentation techniques demonstrating fine-grained internal timing, divergent path analysis, dynamic workload profiling, and adaptive algorithm selection within CUDA kernels\n- **11-fine-grained-gpu-modifications.cu** and [11-fine-grained-gpu-modifications.md](11-fine-grained-gpu-modifications.md): Fine-grained GPU code customizations including data structure layout optimization, warp-level primitives, memory access patterns, 
kernel fusion, and dynamic execution path selection\n- **12-advanced-gpu-customizations.cu** and [12-advanced-gpu-customizations.md](12-advanced-gpu-customizations.md): Advanced GPU customization techniques including thread divergence mitigation, register usage optimization, mixed precision computation, persistent threads for load balancing, and warp specialization patterns\n- **13-low-latency-gpu-packet-processing.cu** and [13-low-latency-gpu-packet-processing.md](13-low-latency-gpu-packet-processing.md): Techniques for minimizing latency in GPU-based network packet processing, including pinned memory, zero-copy memory, stream pipelining, persistent kernels, and CUDA Graphs for real-time network applications\n\nEach tutorial includes comprehensive documentation explaining the concepts, implementation details, and optimization techniques used in ML/AI workloads on GPUs.","funding_links":["https://github.com/sponsors/yunwei37","https://github.com/sponsors/Officeyutong"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feunomia-bpf%2Fbasic-cuda-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feunomia-bpf%2Fbasic-cuda-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feunomia-bpf%2Fbasic-cuda-tutorial/lists"}