{"id":20759705,"url":"https://github.com/saiccoumar/cuda-programming-exercises","last_synced_at":"2025-12-25T06:05:02.282Z","repository":{"id":262924143,"uuid":"888796458","full_name":"saiccoumar/CUDA-Programming-Exercises","owner":"saiccoumar","description":"Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.","archived":false,"fork":false,"pushed_at":"2024-12-20T22:40:53.000Z","size":757,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-11T16:48:43.504Z","etag":null,"topics":["cuda","cuda-programming","nvcc","nvidia"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/saiccoumar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-15T02:49:11.000Z","updated_at":"2025-01-23T21:34:11.000Z","dependencies_parsed_at":"2024-11-15T03:27:48.535Z","dependency_job_id":"821897cd-5063-487e-af1a-95e5802386c6","html_url":"https://github.com/saiccoumar/CUDA-Programming-Exercises","commit_stats":null,"previous_names":["saiccoumar/cuda-programming-exercises"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/saiccoumar/CUDA-Programming-Exercises","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saiccoumar%2FCUDA-Programming-Exercises","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saiccoumar%2FCUDA-Programming-Exercises/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saic
coumar%2FCUDA-Programming-Exercises/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saiccoumar%2FCUDA-Programming-Exercises/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/saiccoumar","download_url":"https://codeload.github.com/saiccoumar/CUDA-Programming-Exercises/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saiccoumar%2FCUDA-Programming-Exercises/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28021569,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-25T02:00:05.988Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","cuda-programming","nvcc","nvidia"],"created_at":"2024-11-17T10:07:34.939Z","updated_at":"2025-12-25T06:05:02.237Z","avatar_url":"https://github.com/saiccoumar.png","language":"Cuda","readme":"# GPU-Programming-Exercises\nBrief collection of GPU exercises (my reimplementation). Comes with relevant resources.\n\n### Writing CUDA code\nWhile there are a few ways to write CUDA code, the most robust and popular approach I've seen is writing C++ CUDA code and compiling it with NVCC. You can write CUDA code with C++ syntax, but you must use certain CUDA-specific functions included in \u003ccuda_runtime.h\u003e, such as cudaMemcpy and cudaFree. 
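A minimal sketch of this workflow (a hypothetical vector-add kernel; the names here are illustrative, not from this repository):\n```\n#include \u003ccuda_runtime.h\u003e\n#include \u003ccstdio\u003e\n\n// One thread computes one output element\n__global__ void vecAdd(const float* a, const float* b, float* c, int n) {\n    int i = blockIdx.x * blockDim.x + threadIdx.x;\n    if (i \u003c n) c[i] = a[i] + b[i];\n}\n\nint main() {\n    const int n = 1024;\n    size_t bytes = n * sizeof(float);\n    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];\n    for (int i = 0; i \u003c n; i++) { h_a[i] = i; h_b[i] = 2.0f * i; }\n\n    // Allocate device memory and copy the inputs over\n    float *d_a, *d_b, *d_c;\n    cudaMalloc(\u0026d_a, bytes); cudaMalloc(\u0026d_b, bytes); cudaMalloc(\u0026d_c, bytes);\n    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);\n    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);\n\n    // Launch one thread per element, then copy the result back\n    vecAdd\u003c\u003c\u003c(n + 255) / 256, 256\u003e\u003e\u003e(d_a, d_b, d_c, n);\n    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);\n\n    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);\n    printf(\"c[1] = %f\\n\", h_c[1]);\n    delete[] h_a; delete[] h_b; delete[] h_c;\n    return 0;\n}\n```\nThe pattern is always the same: allocate on the device, copy host-to-device, launch the kernel, copy device-to-host, and free device memory. 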
More resources on how to write CUDA code that uses CUDA cores efficiently can be found below:\n\nhttps://youtu.be/G-EimI4q-TQ?feature=shared\n\nhttps://youtu.be/kUqkOAU84bA?feature=shared\n\nhttps://youtu.be/xwbD6fL5qC8?feature=shared\n\nhttps://www.nvidia.com/content/cudazone/download/Exercise_instructions.pdf\n\nhttps://github.com/csc-training/CUDA/blob/master/exercises/README.md\n\n### Hardware audits\nTo check your NVIDIA GPU's information, run\n```\nnvidia-smi\n```\nTo check the CUDA compiler version, run\n```\nnvcc --version\n```\n\n### Compiling CUDA code\nFirst ensure that CUDA and the nvcc compiler are installed. Then compile a program with the following command:\n```\nnvcc -o my_program my_program.cu\n```\nand run your program with the following:\n```\n./my_program\n```\n\n### Evaluating CUDA code\nNVIDIA's tools for CUDA code aren't immediately intuitive, but they provide very detailed information about your code. First, you can record the runtime of a CUDA kernel from within the code, with host-device transfer overhead excluded, and print it to stdout:\n```\n...\n    cudaEvent_t start, stop;\n    cudaEventCreate(\u0026start);\n    cudaEventCreate(\u0026stop);\n    cudaEventRecord(start);\n\n    // Launch CUDA kernel\n    matrixAdd\u003c\u003c\u003cnumBlocks, threadsPerBlock\u003e\u003e\u003e(d_A, d_B, d_C);\n    cudaDeviceSynchronize();\n\n    // Stop and compute CUDA execution time\n    cudaEventRecord(stop);\n    cudaEventSynchronize(stop);\n\n    float milliseconds = 0;\n    cudaEventElapsedTime(\u0026milliseconds, start, stop);\n    std::cout \u003c\u003c \"CUDA kernel execution time: \" \u003c\u003c milliseconds \u003c\u003c \" ms\" \u003c\u003c std::endl;\n...\n```\n\nAdditionally, there are a few monitoring tools you can use to observe your CUDA performance. 
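One such tool is the Nsight Systems command-line profiler; assuming nsys is installed, a run can be captured with (the report name is illustrative):\n```\nnsys profile -o my_report ./my_program\n```\nwhich writes a report file you can open in the Nsight Systems GUI. 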
For example, you can use NVIDIA Nsight Systems to look at the execution timeline of your code on the GPU:\n![image](https://github.com/user-attachments/assets/27a44358-34e9-4473-abca-879f3761a34f)\n\nNVIDIA tools change rapidly, so there's no guarantee that this will still be available as a tool in the near future, nor that it will be the best tool available. \n\n### Matrix Multiplication Example:\nMatrix multiplication is a quintessential parallelized GPU task. I implemented it in matrix_multiplication.cu and compiled it for Windows. Here are the performance metrics:\n```\nCPU (Intel i7-13700K) Execution Time: 471.213 ms\nA5000 Execution Time: 0.351008 ms\n4070ti Execution Time: 0.424288 ms\n2080ti Execution Time: 0.224864 ms\n```\n\nNote that a given GPU's runtime can vary by roughly 0.3 ms between runs, which is why a 2080ti can appear to run faster than a 4070ti. The difference between GPUs is less noticeable because most GPUs, even older ones, handle a simple task like small(ish)-scale matrix multiplication well.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaiccoumar%2Fcuda-programming-exercises","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaiccoumar%2Fcuda-programming-exercises","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaiccoumar%2Fcuda-programming-exercises/lists"}