Projects in Awesome Lists tagged with cuda-programming
A curated list of projects in awesome lists tagged with cuda-programming.
https://github.com/taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
concurrent-programming cuda-programming gpu-programming heterogeneous-parallel-programming high-performance-computing multi-threading multicore-programming multithreading parallel parallel-computing parallel-programming taskflow taskparallelism threadpool work-stealing
Last synced: 14 May 2025
https://github.com/rust-gpu/rust-cuda
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
cuda cuda-kernels cuda-programming gpgpu gpu gpu-programming rust rust-lang
Last synced: 14 May 2025
https://github.com/Rust-GPU/Rust-CUDA
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
cuda cuda-kernels cuda-programming gpgpu gpu gpu-programming rust rust-lang
Last synced: 27 Mar 2025
https://github.com/xlite-dev/cuda-learn-notes
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 15 Apr 2025
https://github.com/xlite-dev/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 26 Mar 2025
https://github.com/DefTruth/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm
Last synced: 20 Mar 2025
https://github.com/nvidia/cccl
CUDA Core Compute Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 13 May 2025
https://github.com/NVIDIA/cccl
CUDA Core Compute Libraries
accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming
Last synced: 14 May 2025
https://github.com/brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
cuda-programming gpu-programming molecular-dynamics-simulation
Last synced: 14 May 2025
https://github.com/mit-han-lab/tinychatengine
TinyChatEngine: On-Device LLM Inference Library
arm c cpp cuda-programming deep-learning edge-computing large-language-models on-device-ai quantization x86-64
Last synced: 13 May 2025
https://github.com/coreylowman/cudarc
Safe Rust wrapper around the CUDA toolkit
cublas cuda cuda-kernels cuda-programming cuda-toolkit cudnn curand gpu gpu-acceleration nccl nvrtc rust
Last synced: 14 May 2025
https://github.com/eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
api-wrapper cuda cuda-api-wrappers cuda-device cuda-driver cuda-driver-api cuda-programming cuda-runtime-api cuda-toolkit gpgpu gpgpu-computing gpu gpu-computing gpu-memory modern-cpp
Last synced: 21 Apr 2025
https://github.com/sail-sg/Adan
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
adan artificial-intelligence bert-model convnext cuda-programming deep-learning diffusion dreamfusion fairseq gpt2 llm-training llms mae moe optimizer pytorch resnet timm transformer-xl vit
Last synced: 05 Apr 2025
https://github.com/mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
arm c cpp cuda-programming deep-learning edge-computing large-language-models on-device-ai quantization x86-64
Last synced: 07 May 2025
https://github.com/laugh12321/TensorRT-YOLO
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.
cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9
Last synced: 18 Mar 2025
https://github.com/laugh12321/tensorrt-yolo
🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.
cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9
Last synced: 14 May 2025
https://github.com/PaddleJitLab/CUDATutorial
A self-learning tutorial for CUDA high-performance programming.
cuda-programming deep-learning
Last synced: 14 May 2025
https://github.com/nosferalatu/SimpleGPUHashTable
A simple GPU hash table implemented in CUDA using lock-free techniques
cuda cuda-programming data-structures gpu gpu-cuda-programs
Last synced: 06 May 2025
https://github.com/harleyszhang/llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes
cuda-programming kv-cache llm llm-inference transformer-models triton-kernels vllm
Last synced: 21 Dec 2024
https://github.com/hmunachi/henry-vjp
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 08 Apr 2025
https://github.com/HMUNACHI/CUDATutorials
Zero to Hero GPU and CUDA for Maths & ML tutorials with examples.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 24 Apr 2025
https://github.com/HMUNACHI/henry-vjp
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 05 Apr 2025
https://github.com/HMUNACHI/cuda-tutorials
CUDA tutorials for Maths & ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 13 May 2025
https://github.com/hmunachi/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 10 Feb 2025
https://github.com/MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
cuda cuda-cpp cuda-programming
Last synced: 20 Nov 2024
https://github.com/rocm/hip-cpu
An implementation of HIP that works on CPUs, across OSes.
cpp17 cuda cuda-programming hip hip-kernel-language hip-portability hip-runtime parallel-algorithms spmd stl-algorithms
Last synced: 12 Apr 2025
https://github.com/tgautam03/xgemm
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
cuda-programming gpu-programming matrix-multiplication sgemm
Last synced: 06 Apr 2025
https://github.com/sunsetquest/cudapad
CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.
cuda cuda-programming gpu nvidia ptx ptx-utils windows
Last synced: 01 Dec 2024
https://github.com/fahimfba/cuda-wsl2-ubuntu
Install CUDA on Windows 11 using WSL2
cuda cuda-programming cuda-support cuda-toolkit cuda-wsl deep-learning deep-reinforcement-learning deeplearning deeplearning-ai machine-learning machinelearning machinelearning-python wsl wsl-environment wsl-ubuntu wsl2
Last synced: 14 Apr 2025
https://github.com/emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt
Last synced: 06 Dec 2024
https://github.com/huangcongqing/cuda-learning
An introduction to learning CUDA programming
cuda cuda-kernels cuda-programming
Last synced: 15 Apr 2025
https://github.com/LinhanDai/yolov9-tensorrt
YOLOv9 TensorRT deployment acceleration, with two implementations: C++ and Python 🔥🔥🔥
cpp cuda-programming python tensorrt yolov9
Last synced: 18 Mar 2025
https://github.com/coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
cpp cublas cuda cuda-programming cudnn gpu gpu-programming nvcc nvidia parallel-programming python rust
Last synced: 15 Jun 2025
https://github.com/koushikphy/intro-to-cuda-fortran
A complete beginner's introduction to programming with CUDA Fortran
cuda cuda-fortran cuda-kernels cuda-programming fortran fortran90 gpgpu gpu gpu-computing high-performance-computing hpc nvidia nvidia-cuda parallel-computing parallel-programming
Last synced: 13 Feb 2025
https://github.com/ashvardanian/cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
cmake cuda cuda-programming hip hpc matrix-multiplication openmp parallel-computing parallel-programming pybind pybind11 python starter-kit starter-template tutorial
Last synced: 22 Mar 2025
https://github.com/Lin-Mao/DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
cuda-programming gpu-memory gpu-memory-profiler gpu-profiler memory-management
Last synced: 29 Nov 2024
https://github.com/yichengdwu/flashattention.jl
Julia implementation of the Flash Attention algorithm
cuda-programming deeplearning transfomers
Last synced: 17 Jun 2025
https://github.com/ahmetfurkandemir/nvidia-gpu-benchmark
NVIDIA GPU benchmark
aws c colab-notebook cpp cuda cuda-programming gpu gpu-computing gpu-programming linux nvidia nvidia-gpu tesla
Last synced: 15 Apr 2025
https://github.com/imsanjoykb/cuda-bootcamp
CUDA Programming Practices
computer-vision crypto-mining crypto-mining-program cuda cuda-api cuda-development cuda-device cuda-driver cuda-kernels cuda-library cuda-opengl cuda-programming cuda-resource cuda-support cuda-toolkit jetson jetson-inference jetson-xavier nvidia-cuda nvidia-jetson-nano
Last synced: 15 Feb 2025
https://github.com/rrze-hpc/md-bench
A performance-oriented prototyping harness for state of the art Molecular Dynamics algorithms
benchmark cuda-programming hpc molecular-dynamics scientific-computing
Last synced: 24 Apr 2025
https://github.com/tgautam03/tgemm
General Matrix Multiplication using NVIDIA Tensor Cores
cuda-kernels cuda-programming gpu-computing gpu-programming matrix-multiplication nvidia-cuda nvidia-gpu nvidia-tensor-cores sgemm tensor-cores
Last synced: 15 Apr 2025
https://github.com/littlebearsama/xxCu3Dlibrary
CUDA-accelerated 3D point cloud algorithm library, continuously updated (includes cudaicp, GLFW point cloud visualization, and more)
cuda-programming glfw3 pointcloud
Last synced: 18 Mar 2025
https://github.com/minnukota381/cuda-parallel-c-programming
This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.
cuda cuda-programming hpc nvcc nvidia
Last synced: 21 Nov 2024
https://github.com/nssharmaofficial/kmeans-in-cuda
K-Means algorithm parallelized in CUDA
cpp cuda cuda-programming high-performance high-performance-computing k-means k-means-algorithm k-means-clustering parallel parallel-computing
Last synced: 27 Apr 2025
https://github.com/neoblizz/hip_template
🖤 Template for starting a HIP/C++ project using CMake, with GitHub Actions for CI.
cpp cuda cuda-programming gpgpu gpu hip rocm template-project template-repository
Last synced: 26 Mar 2025
https://github.com/seieric/gst-dsobjectsmosaic
📀 NVIDIA DeepStream integrated GStreamer plugin. It can blur objects with CUDA cores on Jetson boards. Fast and smooth since everything is done on NVMM. 🏎
cuda-programming deepstream gstreamer gstreamer-plugins jetson-agx-orin jetson-agx-xavier jetson-tx1 jetson-tx2 jetson-xavier jetson-xavier-nx nvidia-jetson nvidia-jetson-nano opencv opencv4
Last synced: 27 Jun 2025
https://github.com/coderonion/cuda-beginner-course-rust-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Edition)"
candle cpp cublas cuda cuda-programming cudarc cudnn gpu gpu-programming nvcc nvidia parellel-programming python rust
Last synced: 15 Jun 2025
https://github.com/coderonion/cuda-beginner-course-python-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Edition)"
cpp cublas cuda cuda-programming cudnn cupy gpu gpu-programming nvcc nvidia parallel-programming python rust
Last synced: 15 Jun 2025
https://github.com/evanmcclure/hello_gpu
Hello world example for Rust on GPU
apple apple-silicon cuda cuda-programming example-project gpu gpu-programming gpu-support metal rust rust-lang
Last synced: 12 Apr 2025
https://github.com/wissem01chiha/cuastar
Parallel implementation of the A* trajectory planner algorithm on NVIDIA GPUs for dense point cloud environments
c cpp cpp17 cuda-programming cuda-toolkit motion-planning motion-planning-algorithm navigation nvidia-gpu openmp openmp-parallelization point-cloud point-cloud-processing simd-intrinsics vtk
Last synced: 11 Apr 2025
https://github.com/lawmurray/gpu-gemm
CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.
cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing
Last synced: 14 Apr 2025
https://github.com/tensorush/my-dev-containers
🐳 My development environments wrapped into VS Code Dev Containers (15.02.2022).
containers cuda cuda-programming dev-container development docker docker-container jax mamba micromamba python python3 vscode vscode-devcontainer
Last synced: 15 Apr 2025
https://github.com/marcoplaitano/counting-sort-cuda
Parallelized version of Counting Sort using CUDA
counting-sort cuda cuda-kernels cuda-programming gpu gpu-programming sort sorting sorting-algorithms
Last synced: 20 Jun 2025
https://github.com/yashkathe/image-noise-reduction-with-cuda
This project analyzes an image denoising technique, median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.
cuda cuda-programming gpu-programming hardware-speed-analysis image-analysis image-processing numba nvidia nvidia-cuda nvidia-gpu opencv parallel-programming
Last synced: 14 May 2025
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-c-cpp
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
cpp cuda cuda-kernels cuda-programming nsight nvidia profilling
Last synced: 10 Apr 2025
https://github.com/shikha-code36/cuda-programming-beginner-guide
A beginner's guide to CUDA programming
cuda cuda-basic cuda-basics cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-programming cuda-support cuda-toolkit
Last synced: 23 Mar 2025
https://github.com/pei-mao/cuda-erodeanddilate
This is a small implementation example of image processing using CUDA technology, demonstrating basic operation methods.
cuda-programming image-processing
Last synced: 22 Feb 2025
https://github.com/qin-yu/julia-svm-gpu-cuda
2019 [Julia] GPU CUDAnative SVM: a stochastic decomposition implementation of support-vector machine training
cpp cuda cuda-programming gpu gpu-computing gpu-programming julia julia-language julia-package machine-learning machine-learning-algorithms machine-learning-library online-learning supervised-learning svm svm-classifier svm-learning svm-library svm-model svm-training
Last synced: 15 Mar 2025
https://github.com/l1cacheDell/CUDA_Code
Codes for learning cuda. Implementation of multiple kernels.
Last synced: 10 Mar 2025
https://github.com/awrsha/cuda-gpus-and-triton-adcanced-review
This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.
cuda-programming gpu-programming jit kernels matmul mojo-language multiprocessing multithreading torchquantum triton
Last synced: 12 Jan 2025
https://github.com/pastekaztekastor/crowd-simulation
The project is a crowd simulation on a grid, with parallelized versions running on the graphics card. The goal is to model the movement of individuals in an environment using parameters such as the grid dimensions and the number of individuals, and to export the result of each frame to a binary file for analysis.
c cmake cpp crowdsimulation cuda-programming graphicscard grid-layout ipynb make nvidia-gpu parallelization
Last synced: 02 Mar 2025
https://github.com/cat-gawr/ai-python
A small AI whose peak response time was 0.02 seconds | Konata ~ 2025
cpp cuda-programming golang java python3 tex vhdl-modules
Last synced: 16 Jun 2025
https://github.com/babak2/optimizedsum
Optimized Parallel Sum program demonstrating CPU vs GPU performance
cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio
Last synced: 27 Mar 2025
https://github.com/orlandopalmeira/trabalho-cp-2023-2024
Repository for the practical assignment of the Parallel Computing (Computação Paralela, CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - Universidade do Minho (UMinho)
computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei
Last synced: 20 Mar 2025
https://github.com/jorgedavyd/nsight.nvim
A developer-oriented Neovim framework for CUDA performance profiling and analysis.
cuda cuda-kernels cuda-profiler cuda-programming cuda-support cuda-toolkit deep-learning machine-learning neovim neovim-plugin performance-engineering
Last synced: 21 Mar 2025
https://github.com/nrmancuso/big-bang
CUDA and OpenMP N-body simulation based on data from the Milky Way and Andromeda galaxies
c cuda-kernels cuda-programming nbody-simulation openmp-parallelization parallel-computing space
Last synced: 13 Jun 2025
https://github.com/enriquebdel/clases-cuda-programacion-paralela-en-c-
In this repository you will find several lessons I created on the CUDA library in C. The program I use for coding is MobaXterm.
c cuda cuda-programming gnu-linux googlecolab mobaxterm nvidia parallel-programming ubuntu university
Last synced: 21 Mar 2025
https://github.com/satyajitghana/gpu-programming
Contains the contents of GPU Architecture and Programming course done on NPTEL
c cpp cuda cuda-programming gpu-programming nptel nvidia
Last synced: 05 May 2025
https://github.com/alextmjugador/rust-cuda-quickstart
Bring the Rust-CUDA project back to life under modern Linux environments.
cuda cuda-programming cuda-rust cuda-support docker rust
Last synced: 13 Jun 2025
https://github.com/m15kh/cuda_programming
CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing
cuda cuda-programming gpu nvidia parallel-computing practice-programming
Last synced: 03 Apr 2025
https://github.com/GCaptainNemo/Cuda-Image-Processing
Using CUDA GPU Programming to speed up image processing.
cuda-programming image-processing
Last synced: 20 Mar 2025
https://github.com/headless-start/fashion-mnist-classifier
This repository contains Fashion MNIST Image Classification.
cuda-programming gpu keras mnist-dataset object-detection opencv-python python3 tensorflow tensorflow-models
Last synced: 03 Apr 2025
https://github.com/jadenmeyer/fourier-fft-project
Documentation of final project for Fourier Analysis
cuda-programming fft heat-equ matlab
Last synced: 22 Mar 2025
https://github.com/inventwithdean/cuda_mlp
Implementation of a simple Multilayer Perceptron in pure CUDA
cuda cuda-programming deep-learning neural-networks
Last synced: 30 Mar 2025
https://github.com/djdhairya/nut-bolt-classification
The "NutBoltClassifier" system represents a significant leap forward in automated fastener classification, harnessing deep learning and computer vision techniques.
aritificial-intelligence cnn cuda-programming deep-learning machine-learning nvidia-gpu rnn tensorflow
Last synced: 07 Jan 2025
https://github.com/tommaso-dognini/polimi_gpu101_courseproject
Polimi Passion In Action GPU101 course project. Implementation in CUDA of BFS algorithm
cpp cuda cuda-programming parallel-computing
Last synced: 17 Feb 2025
https://github.com/kartavyaantani/cuda_image_processing
A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line
cuda cuda-kernels cuda-programming cuda-toolkit gpu-programming high-performance-computing image-manipulation image-processing nvidia-cuda nvidia-gpu
Last synced: 19 Apr 2025
https://github.com/sartajbhuvaji/cuda
Developed CUDA kernel functions to load and train a Convolutional Neural Network from scratch.
cuda cuda-programming gpu-programming neural-network nvidia-cuda
Last synced: 30 Mar 2025
https://github.com/liberxue/parallel_computing
CUDA Algorithm && Hacker's Delight
algorithms cuda cuda-kernels cuda-programming hacker-s-delight nvidia
Last synced: 20 Feb 2025
https://github.com/dpetrosy/fractal
This project is a Fractal Visualizer developed in C++ with SFML and CUDA.
burning-ship cmake cmakelists cpp cpp-programming cpp-project cuda cuda-opengl cuda-programming fractal fractal-generation fractal-visualization julia mandelbox mandelbrot opengl opengl-project sfml sfml-library tricorn
Last synced: 20 Jun 2025
https://github.com/gravitytwog/electromagneticfield
Electro-magnetic field simulation made with CUDA
c cuda cuda-kernels cuda-programming
Last synced: 14 Apr 2025
https://github.com/giorgiogamba/parallel_programming
Experimenting with parallel programming
cuda cuda-kernels cuda-programming cuda-toolkit parallel parallel-computing parallel-processing parallel-programming visual-studio
Last synced: 20 Feb 2025
https://github.com/vietdoo/seam-carving-cuda
CUDA Seam Carving: Accelerating Image Resizing with GPU Computing
cc cuda cuda-programming gpu-computing parrallel-computing seam-carving
Last synced: 31 Mar 2025
https://github.com/aaditya29/parallel-computing-and-cuda
Learning about Parallel Computing and GPU programming using CUDA.
c cpp cuda cuda-kernels cuda-programming nvidia-cuda openmp openmpi parallel-computing parallel-programming
Last synced: 01 Apr 2025
https://github.com/0x778/gaussian_filter_using_cuda
Implementation of a Gaussian filter using CUDA
cuda cuda-kernels cuda-programming image-processing
Last synced: 09 Apr 2025
https://github.com/gopikrsmscs/matrix-mul-pytorch-cuda-cpu-analysis
Compares the performance of matrix multiplication on CPU and GPU using PyTorch CUDA programming.
cuda-programming matrix-multiplication python3 pytorch
Last synced: 25 Mar 2025
https://github.com/bardifarsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 19 Feb 2025
https://github.com/tgautam03/xfilters
GPU accelerated filters for high resolution images.
2d-convolution c cpp cuda cuda-programming gpu-acceleration gpu-computing gpu-programming image-filters image-processing
Last synced: 25 Mar 2025
https://github.com/versi379/optimized-matrix-multiplication
This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.
cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming
Last synced: 14 Mar 2025
https://github.com/chibby0ne/cuda_by_example
Old notes (and new ones) on the CUDA by Example book
cuda cuda-programming gpgpu gpu-computing gpu-programming
Last synced: 20 Feb 2025
https://github.com/isquicha/cuda-parallel-studies
Learning CUDA programming here =D
cuda cuda-programming cuda-toolkit
Last synced: 16 Mar 2025
https://github.com/thesupercd/cuda_sort
A simple project implementing and measuring the runtime performance metrics related to massively parallel algorithms (radix sort) on an NVIDIA GPU device.
benchmarking c cpp cuda cuda-programming gpu-acceleration gpu-programming multithreading parallel-processing radix-sort sorting-algorithms
Last synced: 30 Mar 2025
https://github.com/aarid/cuda_operations
This project compares CPU and GPU performance for CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.
conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication
Last synced: 20 Feb 2025
https://github.com/saiccoumar/cuda-programming-exercises
Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.
cuda cuda-programming nvcc nvidia
Last synced: 11 Mar 2025
https://github.com/chandkund/pytorch
Foundational introduction to PyTorch, focusing on the basics of tensors, their creation, manipulation, and operations, which are essential for understanding and building deep learning models
classification computer-vision cuda-programming deep-learning loss-functions matplotlib numpy optimization pandas pyhton pytroch workflow
Last synced: 12 Mar 2025
https://github.com/dasbd72/nthu-ipc-2022
National Tsing Hua University - Introduction to Parallel Computing - 2022
cuda cuda-programming hpc mpi openmp pthreads
Last synced: 30 Mar 2025
https://github.com/0xhilsa/vector-cuda
Vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 02 Apr 2025
https://github.com/sergeipapina/color2graycuda
NVIDIA CUDA kernel implementation of color-to-grayscale image conversion, compiled and linked with make or CMake
cmake cuda cuda-kernels cuda-programming link makefile nvidia
Last synced: 06 Apr 2025
https://github.com/rssr25/cuda
Following the CUDA by Example book.
cpp cuda cuda-programming hpc shaders
Last synced: 10 Apr 2025