{"id":27137071,"url":"https://github.com/hmunachi/henry-vjp","last_synced_at":"2025-04-08T03:08:47.831Z","repository":{"id":240809321,"uuid":"803513848","full_name":"HMUNACHI/henry-vjp","owner":"HMUNACHI","description":"From zero to hero CUDA for accelerating maths and machine learning on GPU.","archived":false,"fork":false,"pushed_at":"2025-03-25T22:23:36.000Z","size":412,"stargazers_count":183,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-03T11:22:12.936Z","etag":null,"topics":["cuda","cuda-kernels","cuda-programming","machine-learning","maths"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HMUNACHI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-20T21:41:00.000Z","updated_at":"2025-03-31T05:25:49.000Z","dependencies_parsed_at":"2024-10-25T17:34:43.333Z","dependency_job_id":"585a1b65-3c68-42e3-8901-3538b14dcdf1","html_url":"https://github.com/HMUNACHI/henry-vjp","commit_stats":{"total_commits":14,"total_committers":1,"mean_commits":14.0,"dds":0.0,"last_synced_commit":"7311001ac5a6ebc39b2a9ac25ee76fc0b936b117"},"previous_names":["hmunachi/cuda-repo","hmunachi/henry-vjp"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HMUNACHI%2Fhenry-vjp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HMUNACHI%2Fhenry-vjp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HMUNACHI%2Fhenry-vjp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HMUNACHI%2Fhenry-vjp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HMUNACHI","download_url":"https://codeload.github.com/HMUNACHI/henry-vjp/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247767234,"owners_count":20992547,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","cuda-kernels","cuda-programming","machine-learning","maths"],"created_at":"2025-04-08T03:08:47.267Z","updated_at":"2025-04-08T03:08:47.816Z","avatar_url":"https://github.com/HMUNACHI.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/logo.jpg\" alt=\"Alt text\"/\u003e\n\u003c/p\u003e\n\n# From zero to hero CUDA for accelerated maths and machine learning.\n\n![License](https://img.shields.io/github/license/hmunachi/cuda-repo?style=flat-square) [![LinkedIn](https://img.shields.io/badge/-LinkedIn-blue?style=flat-square\u0026logo=linkedin\u0026logoColor=white)](https://www.linkedin.com//company/80434055) [![Twitter](https://img.shields.io/twitter/follow/hmunachii?style=social)](https://twitter.com/hmunachii)\n\nAuthor: [Henry Ndubuaku](https://www.linkedin.com/in/henry-ndubuaku-7b6350b8/) \n\n## CUDA\n\nCUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA. \nIt lets developers use NVIDIA GPUs (Graphics Processing Units) for general-purpose computing, \nnot just for graphics rendering. \nA GPU contains thousands of simple cores that run many threads at the same time. \nThis makes GPUs well suited to tasks that can be broken down into many independent operations, \nsuch as scientific simulations, machine learning, video processing, and more.\nFor such data-parallel workloads, CUDA code can run substantially faster than CPU-only code, \nbecause the GPU processes large amounts of data in parallel rather than one element after another.\nFor some workloads, GPUs are also more energy-efficient than CPUs, delivering higher performance per watt.\n\n### CUDA Code Structure\n\nHost Code (CPU): This is standard C/C++ code that runs on the CPU. It typically handles:\n- Initialization of CUDA devices and contexts.\n- Allocation of memory on the GPU.\n- Transfer of input data from CPU to GPU.\n- Launching CUDA kernels (functions that execute on the GPU).\n- Transfer of results back from GPU to CPU.\n- Deallocation of GPU memory.\n\nDevice Code (GPU): This code, written in CUDA C/C++ (C/C++ with GPU-specific extensions), runs on the GPU. It defines:\n- Kernels: Functions executed in parallel by many GPU threads. Each thread computes a unique index from built-in variables such as `threadIdx`, `blockIdx`, and `blockDim`, and uses it to select its portion of the work.\n- Thread Hierarchy: Threads are grouped into blocks, and blocks are grouped into a grid. All threads in a block run on the same streaming multiprocessor, where they can share fast on-chip memory and synchronize with each other.\n\n## Preliminary Videos\n\n### 1. High-Level Concepts\n[![YouTube Video](https://img.youtube.com/vi/4APkMJdiudU/0.jpg)](https://www.youtube.com/watch?v=4APkMJdiudU)\n\n### 2. Programming Model\n[![YouTube Video](https://img.youtube.com/vi/cKI20rITSvo/0.jpg)](https://www.youtube.com/watch?v=cKI20rITSvo)\n\n### 3. Parallelising a For Loop\n[![YouTube Video](https://img.youtube.com/vi/BSzoEXqP9aU/0.jpg)](https://www.youtube.com/watch?v=BSzoEXqP9aU)\n\n### 4. Indexing Threads within Grids and Blocks\n[![YouTube Video](https://img.youtube.com/vi/cRY5utouJzQ/0.jpg)](https://www.youtube.com/watch?v=cRY5utouJzQ)\n\n### 5. Memory Model\n[![YouTube Video](https://img.youtube.com/vi/OSpy-HoR0ac/0.jpg)](https://www.youtube.com/watch?v=OSpy-HoR0ac)\n\n### 6. Synchronisation\n[![YouTube Video](https://img.youtube.com/vi/PJCISyoGpug/0.jpg)](https://www.youtube.com/watch?v=PJCISyoGpug)\n\n## Usage\n\nCompile and run any file with `nvcc \u003cfilename\u003e -o output \u0026\u0026 ./output`. This requires an NVIDIA GPU with a compatible driver and the CUDA Toolkit (which provides `nvcc`) installed. Starting from step 1, we progressively learn CUDA in the context of mathematics and machine learning. The material is aimed at researchers and applied practitioners who want to learn how to scale their algorithms on GPUs.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhmunachi%2Fhenry-vjp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhmunachi%2Fhenry-vjp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhmunachi%2Fhenry-vjp/lists"}