An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with cuda-programming

A curated list of projects in awesome lists tagged with cuda-programming.

https://github.com/rust-gpu/rust-cuda

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

cuda cuda-kernels cuda-programming gpgpu gpu gpu-programming rust rust-lang

Last synced: 14 May 2025

https://github.com/Rust-GPU/Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.

cuda cuda-kernels cuda-programming gpgpu gpu gpu-programming rust rust-lang

Last synced: 27 Mar 2025

https://github.com/xlite-dev/cuda-learn-notes

📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 15 Apr 2025

https://github.com/xlite-dev/CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 26 Mar 2025

https://github.com/DefTruth/CUDA-Learn-Notes

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

cuda cuda-kernels cuda-programming cuda-toolkit cudnn cutlass flash-attention flash-mla gemm gemv hgemm

Last synced: 20 Mar 2025

https://github.com/laugh12321/TensorRT-YOLO

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.

cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9

Last synced: 18 Mar 2025

https://github.com/laugh12321/tensorrt-yolo

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.

cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9

Last synced: 14 May 2025

https://github.com/PaddleJitLab/CUDATutorial

A self-learning tutorial for CUDA high-performance programming.

cuda-programming deep-learning

Last synced: 14 May 2025

https://github.com/nosferalatu/SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock-free techniques

cuda cuda-programming data-structures gpu gpu-cuda-programs

Last synced: 06 May 2025

https://github.com/harleyszhang/llm_note

LLM notes, including model inference, transformer model structure, and llm framework code analysis notes

cuda-programming kv-cache llm llm-inference transformer-models triton-kernels vllm

Last synced: 21 Dec 2024

https://github.com/hmunachi/henry-vjp

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 08 Apr 2025

https://github.com/HMUNACHI/CUDATutorials

Zero to Hero GPU and CUDA for Maths & ML tutorials with examples.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 24 Apr 2025

https://github.com/HMUNACHI/henry-vjp

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 05 Apr 2025

https://github.com/HMUNACHI/cuda-tutorials

CUDA tutorials for Maths & ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 13 May 2025

https://github.com/hmunachi/cuda-repo

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 10 Feb 2025

https://github.com/MuGdxy/muda

μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.

cuda cuda-cpp cuda-programming

Last synced: 20 Nov 2024

https://github.com/tgautam03/xgemm

Accelerated General (FP32) Matrix Multiplication from scratch in CUDA

cuda-programming gpu-programming matrix-multiplication sgemm

Last synced: 06 Apr 2025

https://github.com/sunsetquest/cudapad

CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.

cuda cuda-programming gpu nvidia ptx ptx-utils windows

Last synced: 01 Dec 2024

https://github.com/emptysoal/cuda-image-preprocess

Speed up image preprocessing with CUDA for image handling or TensorRT inference

cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt

Last synced: 06 Dec 2024

https://github.com/LinhanDai/yolov9-tensorrt

YOLOv9 TensorRT deployment acceleration, providing two implementations: C++ and Python 🔥🔥🔥

cpp cuda-programming python tensorrt yolov9

Last synced: 18 Mar 2025

https://github.com/coderonion/cuda-beginner-course-cpp-version

Companion code for the bilibili video course "CUDA 12.x Parallel Programming Introduction (C++ Version)"

cpp cublas cuda cuda-programming cudnn gpu gpu-programming nvcc nvidia parallel-programming python rust

Last synced: 15 Jun 2025

https://github.com/ashvardanian/cuda-python-starter-kit

Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11

cmake cuda cuda-programming hip hpc matrix-multiplication openmp parallel-computing parallel-programming pybind pybind11 python starter-kit starter-template tutorial

Last synced: 22 Mar 2025

https://github.com/Lin-Mao/DrGPUM

A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.

cuda-programming gpu-memory gpu-memory-profiler gpu-profiler memory-management

Last synced: 29 Nov 2024

https://github.com/yichengdwu/flashattention.jl

Julia implementation of the Flash Attention algorithm

cuda-programming deeplearning transfomers

Last synced: 17 Jun 2025

https://github.com/rrze-hpc/md-bench

A performance-oriented prototyping harness for state of the art Molecular Dynamics algorithms

benchmark cuda-programming hpc molecular-dynamics scientific-computing

Last synced: 24 Apr 2025

https://github.com/littlebearsama/xxCu3Dlibrary

CUDA-accelerated 3D point cloud algorithm library, continuously updated (includes cudaicp, GLFW point cloud visualization, etc.)

cuda-programming glfw3 pointcloud

Last synced: 18 Mar 2025

https://github.com/minnukota381/cuda-parallel-c-programming

This repository contains various CUDA C programs demonstrating parallel computing techniques using NVIDIA's CUDA platform.

cuda cuda-programming hpc nvcc nvidia

Last synced: 21 Nov 2024

https://github.com/neoblizz/hip_template

🖤 Template for starting HIP/C++ project using CMake with Github Action for CI.

cpp cuda cuda-programming gpgpu gpu hip rocm template-project template-repository

Last synced: 26 Mar 2025

https://github.com/seieric/gst-dsobjectsmosaic

📀NVIDIA DeepStream integrated GStreamer Plugin. It can blur objects with cuda cores on Jetson boards. Fast and smooth since everything is done on NVMM.🏎

cuda-programming deepstream gstreamer gstreamer-plugins jetson-agx-orin jetson-agx-xavier jetson-tx1 jetson-tx2 jetson-xavier jetson-xavier-nx nvidia-jetson nvidia-jetson-nano opencv opencv4

Last synced: 27 Jun 2025

https://github.com/coderonion/cuda-beginner-course-python-version

Companion code for the bilibili video course "CUDA 12.x Parallel Programming Introduction (Python Version)"

cpp cublas cuda cuda-programming cudnn cupy gpu gpu-programming nvcc nvidia parallel-programming python rust

Last synced: 15 Jun 2025

https://github.com/wissem01chiha/cuastar

Parallel implementation of the A* trajectory planner algorithm on NVIDIA GPUs for dense point cloud environments

c cpp cpp17 cuda-programming cuda-toolkit motion-planning motion-planning-algorithm navigation nvidia-gpu openmp openmp-parallelization point-cloud point-cloud-processing simd-intrinsics vtk

Last synced: 11 Apr 2025

https://github.com/lawmurray/gpu-gemm

CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.

cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing

Last synced: 14 Apr 2025

https://github.com/tensorush/my-dev-containers

🐳 My development environments wrapped into VS Code Dev Containers (15.02.2022).

containers cuda cuda-programming dev-container development docker docker-container jax mamba micromamba python python3 vscode vscode-devcontainer

Last synced: 15 Apr 2025

https://github.com/yashkathe/image-noise-reduction-with-cuda

This project analyzes the median blur image denoising technique, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.

cuda cuda-programming gpu-programming hardware-speed-analysis image-analysis image-processing numba nvidia nvidia-cuda nvidia-gpu opencv parallel-programming

Last synced: 14 May 2025

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-c-cpp

Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.

cpp cuda cuda-kernels cuda-programming nsight nvidia profilling

Last synced: 10 Apr 2025

https://github.com/pei-mao/cuda-erodeanddilate

This is a small implementation example of image processing using CUDA technology, demonstrating basic operation methods.

cuda-programming image-processing

Last synced: 22 Feb 2025

https://github.com/l1cacheDell/CUDA_Code

Code for learning CUDA. Implementations of multiple kernels.

cuda cuda-programming

Last synced: 10 Mar 2025

https://github.com/awrsha/cuda-gpus-and-triton-adcanced-review

This repository provides a comprehensive guide to optimizing GPU kernels for performance, with a focus on NVIDIA GPUs. It covers key tools and techniques such as CUDA, PyTorch, and Triton, aimed at improving computational efficiency for deep learning and scientific computing tasks.

cuda-programming gpu-programming jit kernels matmul mojo-language multiprocessing multithreading torchquantum triton

Last synced: 12 Jan 2025

https://github.com/pastekaztekastor/crowd-simulation

The project is a crowd simulation on a grid, with parallelized versions running on the graphics card. The goal is to model the movement of individuals in an environment using parameters such as the grid dimensions and the number of individuals, and to export the result of each frame to a binary file for analysis.

c cmake cpp crowdsimulation cuda-programming graphicscard grid-layout ipynb make nvidia-gpu parallelization

Last synced: 02 Mar 2025

https://github.com/cat-gawr/ai-python

A small AI whose best response time was 0.02 seconds | Konata ~ 2025

cpp cuda-programming golang java python3 tex vhdl-modules

Last synced: 16 Jun 2025

https://github.com/babak2/optimizedsum

Optimized Parallel Sum program demonstrating CPU vs GPU performance

cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio

Last synced: 27 Mar 2025

https://github.com/orlandopalmeira/trabalho-cp-2023-2024

Repository for the practical assignment of the Parallel Computing (Computação Paralela, CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - Universidade do Minho (UMinho)

computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei

Last synced: 20 Mar 2025

https://github.com/nrmancuso/big-bang

CUDA and OpenMP N-body simulation based on data from the Milky Way and Andromeda galaxies

c cuda-kernels cuda-programming nbody-simulation openmp-parallelization parallel-computing space

Last synced: 13 Jun 2025

https://github.com/enriquebdel/clases-cuda-programacion-paralela-en-c-

In this repository you will find several lessons I created about the CUDA library in C. The program I use for programming is MobaXterm.

c cuda cuda-programming gnu-linux googlecolab mobaxterm nvidia parallel-programming ubuntu university

Last synced: 21 Mar 2025

https://github.com/satyajitghana/gpu-programming

Contains the contents of GPU Architecture and Programming course done on NPTEL

c cpp cuda cuda-programming gpu-programming nptel nvidia

Last synced: 05 May 2025

https://github.com/alextmjugador/rust-cuda-quickstart

Bring the Rust-CUDA project back to life under modern Linux environments.

cuda cuda-programming cuda-rust cuda-support docker rust

Last synced: 13 Jun 2025

https://github.com/m15kh/cuda_programming

CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing

cuda cuda-programming gpu nvidia parallel-computing practice-programming

Last synced: 03 Apr 2025

https://github.com/GCaptainNemo/Cuda-Image-Processing

Using CUDA GPU Programming to speed up image processing.

cuda-programming image-processing

Last synced: 20 Mar 2025

https://github.com/jadenmeyer/fourier-fft-project

Documentation of final project for Fourier Analysis

cuda-programming fft heat-equ matlab

Last synced: 22 Mar 2025

https://github.com/inventwithdean/cuda_mlp

Implementation of a simple Multilayer Perceptron in pure CUDA

cuda cuda-programming deep-learning neural-networks

Last synced: 30 Mar 2025

https://github.com/djdhairya/nut-bolt-classification

The "NutBoltClassifier" system represents a significant leap forward in automated fastener classification, harnessing deep learning and computer vision techniques.

aritificial-intelligence cnn cuda-programming deep-learning machine-learning nvidia-gpu rnn tensorflow

Last synced: 07 Jan 2025

https://github.com/tommaso-dognini/polimi_gpu101_courseproject

Polimi Passion In Action GPU101 course project. Implementation in CUDA of BFS algorithm

cpp cuda cuda-programming parallel-computing

Last synced: 17 Feb 2025

https://github.com/kartavyaantani/cuda_image_processing

A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line

cuda cuda-kernels cuda-programming cuda-toolkit gpu-programming high-performance-computing image-manipulation image-processing nvidia-cuda nvidia-gpu

Last synced: 19 Apr 2025

https://github.com/sartajbhuvaji/cuda

Developed CUDA kernel functions to load and train a Convolutional Neural Network from scratch.

cuda cuda-programming gpu-programming neural-network nvidia-cuda

Last synced: 30 Mar 2025

https://github.com/gravitytwog/electromagneticfield

Electro-magnetic field simulation made with CUDA

c cuda cuda-kernels cuda-programming

Last synced: 14 Apr 2025

https://github.com/vietdoo/seam-carving-cuda

CUDA Seam Carving: Accelerating Image Resizing with GPU Computing

cc cuda cuda-programming gpu-computing parrallel-computing seam-carving

Last synced: 31 Mar 2025

https://github.com/0x778/gaussian_filter_using_cuda

Implementation of a Gaussian filter using CUDA

cuda cuda-kernels cuda-programming image-processing

Last synced: 09 Apr 2025

https://github.com/gopikrsmscs/matrix-mul-pytorch-cuda-cpu-analysis

Compares matrix multiplication performance on CPU and GPU using PyTorch CUDA programming.

cuda-programming matrix-multiplication python3 pytorch

Last synced: 25 Mar 2025

https://github.com/bardifarsi/threadpoolmanager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 19 Feb 2025

https://github.com/versi379/optimized-matrix-multiplication

This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.

cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming

Last synced: 14 Mar 2025

https://github.com/chibby0ne/cuda_by_example

Old notes (and new ones) of the Cuda by Example book

cuda cuda-programming gpgpu gpu-computing gpu-programming

Last synced: 20 Feb 2025

https://github.com/isquicha/cuda-parallel-studies

Learning CUDA programming here =D

cuda cuda-programming cuda-toolkit

Last synced: 16 Mar 2025

https://github.com/thesupercd/cuda_sort

A simple project implementing and measuring the runtime performance metrics related to massively parallel algorithms (radix sort) on an NVIDIA GPU device.

benchmarking c cpp cuda cuda-programming gpu-acceleration gpu-programming multithreading parallel-processing radix-sort sorting-algorithms

Last synced: 30 Mar 2025

https://github.com/aarid/cuda_operations

This project compares performance between CPU and GPU with CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.

conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication

Last synced: 20 Feb 2025

https://github.com/saiccoumar/cuda-programming-exercises

Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.

cuda cuda-programming nvcc nvidia

Last synced: 11 Mar 2025

https://github.com/chandkund/pytorch

Foundational introduction to PyTorch, focusing on the basics of tensors, their creation, manipulation, and operations, which are essential for understanding and building deep learning models

classification computer-vision cuda-programming deep-learning loss-functions matplotlib numpy optimization pandas pyhton pytroch workflow

Last synced: 12 Mar 2025

https://github.com/dasbd72/nthu-ipc-2022

National Tsing Hua University - Introduction to Parallel Computing - 2022

cuda cuda-programming hpc mpi openmp pthreads

Last synced: 30 Mar 2025

https://github.com/0xhilsa/vector-cuda

vector calculation with GPU acceleration using CUDA

c cpp11 cuda cuda-kernels cuda-programming nvcc

Last synced: 02 Apr 2025

https://github.com/sergeipapina/color2graycuda

Color-to-grayscale image conversion implemented as an NVIDIA CUDA kernel, using make or CMake to compile and link

cmake cuda cuda-kernels cuda-programming link makefile nvidia

Last synced: 06 Apr 2025

https://github.com/rssr25/cuda

Following Cuda By Example book.

cpp cuda cuda-programming hpc shaders

Last synced: 10 Apr 2025