{"id":22221015,"url":"https://github.com/seungjaelim/cuda.tutorial","last_synced_at":"2026-02-07T05:02:12.236Z","repository":{"id":259369352,"uuid":"860369304","full_name":"SeungjaeLim/CUDA.tutorial","owner":"SeungjaeLim","description":"References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)","archived":false,"fork":false,"pushed_at":"2024-11-21T06:53:08.000Z","size":86,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-27T16:50:53.856Z","etag":null,"topics":["cuda","gpu-programming","nsight-compute","nsight-systems"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SeungjaeLim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-20T09:57:51.000Z","updated_at":"2024-11-27T08:12:57.000Z","dependencies_parsed_at":"2024-10-24T22:38:23.841Z","dependency_job_id":"4e845be0-df14-49f6-b149-0cf5af4d2e95","html_url":"https://github.com/SeungjaeLim/CUDA.tutorial","commit_stats":null,"previous_names":["seungjaelim/cuda.tutorial"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SeungjaeLim/CUDA.tutorial","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeungjaeLim%2FCUDA.tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeungjaeLim%2FCUDA.tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeungjaeLim%2FCUDA.tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeungjaeLim%2FCUDA.tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SeungjaeLim","download_url":"https://codeload.github.com/SeungjaeLim/CUDA.tutorial/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SeungjaeLim%2FCUDA.tutorial/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29186742,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-07T03:35:06.566Z","status":"ssl_error","status_checked_at":"2026-02-07T03:34:57.604Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","gpu-programming","nsight-compute","nsight-systems"],"created_at":"2024-12-02T23:11:32.119Z","updated_at":"2026-02-07T05:02:12.214Z","avatar_url":"https://github.com/SeungjaeLim.png","language":"Cuda","readme":"# CASYS Kwonst CUDA Tutorial\nThis repository contains tutorials and labs for learning CUDA programming, covering concepts such as shared memory, parallelism, and kernel optimization. The repository is organized into labs, each with its own Makefile for compilation and execution.\n\n## Lab\n- **lab1-intro**: Introduction to basic CUDA programming with \"hello world,\" vector addition, and matrix multiplication examples.\n\n- **lab2-shared_memory**: Optimizing matrix multiplication and 1D stencil computations using shared memory in CUDA.\n\n- **lab3-grid_stride_loop**: Using grid-stride loops for efficient vector addition and better GPU utilization.\n\n- **lab4-matrix_sums**: Calculating row and column sums of a matrix in CUDA and analyzing performance with Nsight Compute.\n\n- **lab5-reductions**: Advanced reduction techniques in CUDA, including atomic reduction, parallel reduction with an atomic finish, and warp-shuffle reduction.\n\n- **lab6-managed_memory**: Porting linked lists and arrays to the GPU using manual memory management, Unified Memory, and prefetching, with a focus on profiling memory behavior.\n\n- **lab7-concurrency**: Exploring concurrency with CUDA streams to overlap computation and memory transfers, and distributing tasks across multiple GPUs.\n\n- **lab8-optimizing**: Optimizing a CUDA matrix transpose using global memory and shared memory, and mitigating shared-memory bank conflicts.\n\n- **lab9-cooperative-groups**: Using CUDA cooperative groups for reductions and stream compaction with thread-level and grid-level synchronization.\n\n- **lab10-multi-threading**: Exploring single- and multi-GPU configurations using OpenMP and CUDA streams to optimize tasks.\n\n- **lab11-multi-process-service**: Profiling the impact of NVIDIA MPS on kernel execution times to understand GPU resource sharing.\n\n- **lab12-debugging**: Debugging CUDA applications using compute-sanitizer and cuda-gdb.\n\n- **lab13-graphs**: Exploring CUDA Graphs through stream capture and explicit graph creation, and integrating cuBLAS operations for efficient execution.\n\n## How to Start\nTo get started with this CUDA tutorial, follow these steps:\n```\n# Clone the repository\ngit clone https://github.com/SeungjaeLim/CUDA.tutorial.git\n\n# Navigate into the project directory\ncd CUDA.tutorial\n\n# Build and run the Docker container with the tutorial environment\nmake up\n```\n### Docker Setup\nThis project uses Docker to create a containerized environment with CUDA support. You can build and run the container using the provided Makefile commands.\n\n### Makefile Commands\nThe Makefile provides commands for setting up and running the project:\n\n```\nmake build            # Build the cu_tutorial project.\nmake preprocess       # Preprocess step.\nmake run              # Start the Docker container.\nmake up               # Build and run the project.\nmake rm               # Remove the Docker container.\nmake stop             # Stop the Docker container.\nmake reset            # Stop and remove the Docker container.\nmake docker-setup     # Set up Docker permissions for the user.\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseungjaelim%2Fcuda.tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseungjaelim%2Fcuda.tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseungjaelim%2Fcuda.tutorial/lists"}