{"id":13809616,"url":"https://github.com/rbaygildin/learn-gpgpu","last_synced_at":"2025-05-14T08:33:03.078Z","repository":{"id":167630891,"uuid":"131720912","full_name":"rbaygildin/learn-gpgpu","owner":"rbaygildin","description":"Algorithms implemented in CUDA + resources about GPGPU ","archived":false,"fork":false,"pushed_at":"2022-01-18T09:14:44.000Z","size":231,"stargazers_count":52,"open_issues_count":1,"forks_count":15,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-08-04T02:06:38.848Z","etag":null,"topics":["cublas","cuda","curand","gpgpu","gpu","gpu-computing","image-processing","nvidia","opencl","parallel-computing","pycuda"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rbaygildin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2018-05-01T14:12:53.000Z","updated_at":"2024-06-04T22:49:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"ee0d4a02-8156-4768-8097-1aafd3f40504","html_url":"https://github.com/rbaygildin/learn-gpgpu","commit_stats":null,"previous_names":["rbaygildin/learn-gpgpu"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rbaygildin%2Flearn-gpgpu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rbaygildin%2Flearn-gpgpu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rbaygildin%2Flearn-gpgpu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rbaygildin%2Flearn-gpgpu/manifests","owner_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rbaygildin","download_url":"https://codeload.github.com/rbaygildin/learn-gpgpu/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225282505,"owners_count":17449524,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cublas","cuda","curand","gpgpu","gpu","gpu-computing","image-processing","nvidia","opencl","parallel-computing","pycuda"],"created_at":"2024-08-04T02:00:32.600Z","updated_at":"2024-11-19T02:30:59.283Z","avatar_url":"https://github.com/rbaygildin.png","language":"Cuda","funding_links":[],"categories":["Learning Resources"],"sub_categories":[],"readme":"# Awesome GPGPU\nThis is a curated list of examples of using GPUs for general-purpose computing, along with libraries and papers.\n\n## Examples\n\n### CUDA\n\n#### Linear algebra\n\n* [Vector addition](https://github.com/rbaygildin/awesome-gpgpu/tree/master/vectorAdd) - Simple, fast addition of one-dimensional vectors [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/tree/master/vectorAdd)]\n\n* [Sum of elements in an array](https://github.com/rbaygildin/awesome-gpgpu/blob/master/sumArray) - Parallel sum of elements in an array [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/blob/master/sumArray/sum.cu)]\n\n* [cuBLAS SAXPY](https://github.com/rbaygildin/awesome-gpgpu/blob/master/saxpy/saxpy.cu) - Implementation of SAXPY with cuBLAS [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/blob/master/saxpy/saxpy.cu)]\n\n#### Image processing\n\n* [2D 
convolution](https://github.com/rbaygildin/awesome-gpgpu/blob/master/convolution) - Naïve implementation of 2D convolution [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/blob/master/convolution/convolve2D.cu)]\n\n* [Median filter](https://github.com/rbaygildin/awesome-gpgpu/tree/master/medianFilter) - Median filter with arbitrary size kernel [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/tree/master/medianFilter)]\n\n* [Sobel edge-detection filter](https://github.com/rbaygildin/awesome-gpgpu/blob/master/sobel/sobel.cu) - Parallel implementation of Sobel Operator which is used in image processing [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/blob/master/sobel/sobel.cu)] \n\n#### Clustering\n\n* [K Means clustering](https://github.com/rbaygildin/awesome-gpgpu/blob/master/kmeans2/cuda_kmeans.cu) - Fast Floyd K Means on GPU. Shared memory and two-step reduction (partial and global) are used to implement finding cluster centers [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/blob/master/kmeans2/cuda_kmeans.cu)]\n\n* [Fuzzy C Means clustering](https://github.com/rbaygildin/awesome-gpgpu/blob/master/fcm/cuda_fcm.cu) - Fuzzy C Means. Shared memory and two-step reduction (partial and global) are used to implement finding cluster centers [[CUDA](https://github.com/rbaygildin/awesome-gpgpu/blob/master/fcm/cuda_fcm.cu)]\n\n#### Simulation\n\n* [Calculating PI with Monte Carlo method](https://github.com/rbaygildin/awesome-gpgpu/blob/master/monteCarloPi) - Find PI with Monte Carlo method [[CPU](https://github.com/rbaygildin/awesome-gpgpu/blob/master/monteCarloPi/cpu) | [CUDA](https://github.com/rbaygildin/awesome-gpgpu/blob/master/monteCarloPi/cuda)]\n\n## Libraries\n\n* [CUDA](https://developer.nvidia.com/cuda-toolkit) is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).\n\n* [Thrust](https://thrust.github.io/) is a powerful library of parallel algorithms and data structures. 
Thrust provides a flexible, high-level interface for GPU programming that greatly enhances developer productivity. Using Thrust, C++ developers can write just a few lines of code to perform GPU-accelerated sort, scan, transform, and reduction operations orders of magnitude faster than the latest multi-core CPUs. For example, the thrust::sort algorithm delivers 5x to 100x faster sorting performance than STL and TBB.\n\n* [OpenCL](https://www.khronos.org/opencl/) is the open, royalty-free standard for cross-platform, parallel programming of diverse processors found in personal computers, servers, mobile devices and embedded platforms. OpenCL greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including gaming and entertainment titles, scientific and medical software, professional creative tools, vision processing, and neural network training and inferencing.\n\n* [Boost.Compute](http://boostorg.github.io/compute/) is a GPU/parallel-computing library for C++ based on OpenCL. The core library is a thin C++ wrapper over the OpenCL API and provides access to compute devices, contexts, command queues and memory buffers. On top of the core library is a generic, STL-like interface providing common algorithms (e.g. transform(), accumulate(), sort()) along with common containers (e.g. vector\u003cT\u003e, flat_set\u003cT\u003e). It also features a number of extensions including parallel-computing algorithms (e.g. exclusive_scan(), scatter(), reduce()) and a number of fancy iterators (e.g. transform_iterator\u003c\u003e, permutation_iterator\u003c\u003e, zip_iterator\u003c\u003e).\n\n* [PyCUDA](https://documen.tician.de/pycuda/) lets you access Nvidia‘s CUDA parallel computation API from Python. Several wrappers of the CUDA API already exist–so what’s so special about PyCUDA?\n\n* [PyOpenCL](https://documen.tician.de/pyopencl/) gives you easy, Pythonic access to the OpenCL parallel computation API. 
\n\n* [OpenACC](https://www.openacc.org/) is a user-driven, directive-based, performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model.\n\n* [Hemi](http://harrism.github.io/hemi/) simplifies writing portable CUDA C/C++ code. With Hemi, you can write parallel kernels the way you write for loops, inline in your CPU code, and run them on your GPU.\n\n* [CUDPP](https://github.com/cudpp/cudpp) is the CUDA Data Parallel Primitives Library. CUDPP is a library of data-parallel algorithm primitives such as parallel-prefix-sum (\"scan\"), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.\n\n\n## Other awesome lists and repositories\n\n* [Awesome CUDA by Erkaman](https://github.com/Erkaman/Awesome-CUDA) is a list of useful libraries and resources for CUDA development\n\n* [CUDA Awesome by gmarciani](https://github.com/gmarciani/cudawesome) is a collection of awesome algorithms, implemented in CUDA\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frbaygildin%2Flearn-gpgpu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frbaygildin%2Flearn-gpgpu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frbaygildin%2Flearn-gpgpu/lists"}