CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/shashshukla/ee-210-signals-and-systems
Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.
cuda image-processing signal-processing
Last synced: 26 Apr 2026
https://github.com/alexyzha/cuda-bioinformatics
A CUDA-Accelerated Bioinformatics Toolchain
bioinformatics bioinformatics-tool cplusplus cuda
Last synced: 26 Apr 2026
https://github.com/mateuszk098/parallel-programming-examples
Simple parallel programming examples with CUDA, MPI and OpenMP.
cpp cuda mpi openmp parallel-programming
Last synced: 27 Apr 2026
https://github.com/kbredies/tgv_pycuda
Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.
compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation
Last synced: 27 Apr 2026
https://github.com/notkartikye/cuda-image-box-filters
🖼️ CUDA-powered tool for applying box filters to a large amount of images
cuda cuda-library cuda-programming npp
Last synced: 27 Apr 2026
https://github.com/gladap/heterogeneous_computing_project
Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters
cuda heterogeneous-parallel-programming
Last synced: 27 Apr 2026
https://github.com/perhuepenbecker/cudyn
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular
Last synced: 28 Apr 2026
https://github.com/ncorgan/arrayfire-config-info
A small command-line utility that outputs all available ArrayFire devices
Last synced: 28 Apr 2026
https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 28 Apr 2026
https://github.com/rog0d/gpuss_watchers
"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."
cuda gpu-acceleration gpu-monitoring gpu-profiling
Last synced: 28 Apr 2026
https://github.com/axeloooo/pytorch
Collection of deep learning workflows in PyTorch, from fundamentals and classification to transfer learning and experiment tracking.
Last synced: 28 Apr 2026
https://github.com/ltsyk/smart-snake-ai
Advanced Deep Q-Network AI for Snake Game with CUDA support and 700% performance boost
artificial-intelligence cuda deep-q-network dqn game-ai machine-learning pytorch reinforcement-learning snake-game
Last synced: 28 Apr 2026
https://github.com/atelierarith/julia_gpu_playground
For those who want to use Julia with a GPU
cuda docker docker-compose julia
Last synced: 28 Apr 2026
https://github.com/ccfelius/hpc
High Performance Computing (CUDA, MPI/OpenMP, high performance ML)
cuda high-performance-computing machine-learning mpi
Last synced: 28 Apr 2026
https://github.com/mathiasotnes/gemm
General Matrix Multiplication (GEMM) optimization in CUDA.
Last synced: 26 Mar 2025
https://github.com/emanuelemessina/cuda-benchmark
Compare matrix calculation times between CPU and GPU (CUDA)
benchmark cuda matrix-calculations
Last synced: 28 Apr 2026
https://github.com/shermanlo77/modefilter
ImageJ plugin, Java and CuPy implementation of the mode filter and empirical null filter. The mode filter is an edge-preserving smoothing filter that takes the mode of the empirical density.
cuda cupy empirical-null fiji filter image-filter imagej jcuda mode-filter
Last synced: 28 Apr 2026
https://github.com/jalberty2018/run-pytorch-cuda-develop
Compile environment for PyTorch with CUDA
cloud code-server compiler cuda cuda-toolkit docker-image flash-attn jupyterlab python python3 pytorch sage-attention
Last synced: 28 Apr 2026
https://github.com/fedimser/aldyparen
Renders pictures and videos with algebraic fractals
Last synced: 29 Apr 2026
https://github.com/sandialabs/tenzing
Core library for optimizing CUDA+MPI programs as sequential decision problems.
cuda mpi scr-2759 sequential-decision-problem
Last synced: 29 Apr 2026
https://github.com/snandasena/cuda-at-scale-for-the-enterprise
Gauss Filter with CUDA and NPP
Last synced: 29 Apr 2026
https://github.com/apostolis1/parallel-processing-systems
Project of the undergrad course "Parallel Processing Systems" - NTUA
benchmark c cuda mpi openmp parallel-computing
Last synced: 29 Apr 2026
https://github.com/giog97/histogram_equalization_cuda
Performance comparison of sequential and parallel CUDA Histogram Equalization for image contrast enhancement.
cuda cuda-kernels cuda-programming histogram-equalization image-processing parallel-computing parallel-programming
Last synced: 29 Apr 2026
https://github.com/jonastoth/cuda_raytracer
University project to implement a basic Raytracer in CUDA
Last synced: 29 Apr 2026
https://github.com/mcobzarenco/bitonic.cu
CUDA bitonic sort in Rust
cuda parallel-computing rust sorting-algorithms
Last synced: 29 Apr 2026
https://github.com/dogrego/gpgpu-rainbow-raytracer
A GPU-accelerated rainbow ray tracer with CPU reference implementation, CUDA for parallelized refraction/reflection, and OpenGL for interactive visualization
Last synced: 29 Apr 2026
https://github.com/fikri-rouzan/cuda-c-program-part-2
CUDA C program from NVIDIA course.
Last synced: 30 Apr 2026
https://github.com/fulvius31/triton-cache-tracker
A lightweight utility for monitoring and analyzing Triton kernel compilation cache behavior.
cache cuda gpu gpu-kernels triton triton-openai
Last synced: 30 Apr 2026
https://github.com/gaurisharan/cuda-ml-kernels
Repo for CUDA C++ GPU kernels for ML and HPC.
cpp cuda gpu hpc kernels ml parallel-computing systems-ml
Last synced: 30 Apr 2026
https://github.com/neel-dandiwala/npp_cudaatscale_project
For the enterprise course project, I have created a model that executes the histogram equalisation procedure on the given input image file.
Last synced: 30 Apr 2026
https://github.com/mahshid1378/piper-plus-3
Multilingual neural TTS (6 languages: JA/EN/ZH/ES/FR/PT, code supports SV) — C++, C#, Rust, Go, Python, npm (WASM). VITS + Prosody, streaming, CUDA/CoreML/DirectML. pip install piper-plus | npm install piper-plus | cargo install piper-plus-cli
cross-platform csharp cuda deep-learning dotnet japanese multilingual nuget onnx pytorch rust speech-synthesis streaming text-to-speech tts vits webassembly
Last synced: 08 Jun 2026
https://github.com/actepukc/uv-app-starter-pack
Bootstrap PySide6 GUI apps quickly using uv, with built-in PyTorch/CUDA handling.
astral-uv cross-platform cuda gui pyside6 python pytorch qt6 starter-kit template
Last synced: 30 Apr 2026
https://github.com/ivanbuccella/sf2bio
Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment
cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method
Last synced: 01 May 2026
https://github.com/mrtejas/cv-sandbox
A collection of Computer Vision mini-projects tuned for a number of tasks, including face detection, object detection, image segmentation and CLIP. The models are trained on popular datasets, and the collection includes a comparative study of the methods. Done as part of the S24 course Computer Vision at IIIT Hyd.
computer-vision cuda ml opencv pytorch yolo
Last synced: 01 May 2026
https://github.com/fikri-rouzan/cuda-c-program-part-3
CUDA C program from NVIDIA course.
Last synced: 01 May 2026
https://github.com/darshanakgr/meanfiltergpu
A GPU implementation of a mean filter in CUDA
Last synced: 01 May 2026
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python
Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.
accelerated-computing cuda cuda-programming jit numba nvidia python
Last synced: 01 May 2026
https://github.com/andresvalle/ocr-extraction
Text extraction from images using EasyOCR and parallelization with PyTorch
Last synced: 01 May 2026
https://github.com/marius311/cudadistributedtools.jl
A set of utility tools for multi-GPU + multi-process workflows
Last synced: 01 May 2026
https://github.com/f14-bertolotti/torchess
CUDA torch extension for a chess engine
Last synced: 01 May 2026
https://github.com/imanghd/parallelprocessing
CE Algorithms Lab @ SUT
cuda openmp parallel-algorithm parallel-processing systolic
Last synced: 01 May 2026
https://github.com/lionpsiuc/postgraduate
A collection of assignments and projects completed during my M.Sc. in High-Performance Computing at Trinity College Dublin.
Last synced: 01 May 2026
https://github.com/zepedroresende/matrixmultiplication
Matrix Multiplication optimizations on Intel and CUDA
c cpp cuda hpc matrix-multiplication omp optimization
Last synced: 01 May 2026
https://github.com/d-krylov/cuda_to_opengl
Simple examples for CUDA OpenGL interoperability
Last synced: 01 May 2026
https://github.com/xueeinstein/udacity-cs344-cuda8
Code for Udacity CS344 (Intro to Parallel Programming) using CUDA 8.0
cuda cuda-8 parallel-computing
Last synced: 02 May 2026
https://github.com/cserajdeep/dnn-iris-pytorch
Deep Neural Network with Batch normalization for tabular datasets.
batch batch-normalization classification cuda deep-learning dnn iris-dataset
Last synced: 02 May 2026
https://github.com/snandasena/courseera_gpu_specilization_capstone_project
Coursera GPU Specialization Capstone Project
cpp cuda gpu-programming imageprocessing linearalgebra
Last synced: 02 May 2026
https://github.com/waz4/tinycomb
A lightweight C and CUDA library for efficiently calculating combinations with repetition. Jump to any combination much faster than bruteforce methods, leveraging precomputed factorials and `tiny-bignum-c` for big-number support.
c combinations-generator combinations-with-repetition cuda tiny-bignum-c tinycomb
Last synced: 02 May 2026
https://github.com/bjornmelin/edge-ai-engineering
📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖
cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite
Last synced: 02 May 2026
https://github.com/moshidev/acap
Coursework for the course Arquitectura y Computación de Altas Prestaciones (High-Performance Architecture and Computing)
cuda homework-assignments mpi pthreads
Last synced: 30 Mar 2025
https://github.com/rajshrestha86/kmeans-clusterize-cuda
Implementation of K-Means algorithm from scratch using CUDA.
Last synced: 18 May 2026
https://github.com/amruthapatil/nyu-cudaconvolution
Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN
Last synced: 13 Mar 2025
https://github.com/luchrist69/ascent
📄 Improve your resume with Ascent, a simple web app that provides instant feedback to help you land more interviews, all for free.
agentic-ai ascent cuda dapr dapr-pub-sub datalog differential-equations docker engine kafka mpi odeint openai openai-api rancher-desktop rendering simulation simulation-framework
Last synced: 02 May 2026
https://github.com/prateekshukla1108/thunderkittens-docs
Documentation for ThunderKittens framework
Last synced: 18 Mar 2025
https://github.com/jiriklepl/bits-knn-jpdc2024
Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search
bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k
Last synced: 21 Mar 2025
https://github.com/nourmorsy/convolution-neural-network-cuda
Code for optimizing a CNN using CUDA
Last synced: 13 May 2026
https://github.com/brendanm12345/simple_renderer_cs149
Simple CUDA renderer implementation. 19th most efficient out of 150+ submissions
Last synced: 18 May 2026
https://github.com/apws25/accelmoe
This repository contains a CUDA kernel re-implementation of a CPU-based MoE model.
Last synced: 11 May 2026
https://github.com/daniilvorontsov/fourier-option-pricing
MSc thesis project on option pricing for Lévy jump models. The package includes pricing implementations for European call and put options using the Carr-Madan, COS and Fourier time-stepping methods.
carr-madan cuda fourier-transform monte-carlo option-pricing
Last synced: 11 May 2026
https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4
Optimized vLLM and Docker setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.
am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm
Last synced: 11 May 2026
https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu
Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.
cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding
Last synced: 12 May 2026
https://github.com/skailasa/msc-thesis
A modular thesis
cuda fast-multipole-method kernel-independent numba python3
Last synced: 12 May 2026
https://github.com/tomaszrewak/csgpathtracer
A constructive solid geometry path tracer.
computer-graphics cuda path-tracing rendering
Last synced: 12 May 2026
https://github.com/thesupercd/rainbow_table_builder
A high performance CUDA-based GPU accelerated Rainbow-Table maker, written in C++ without any external libraries or dependencies needed.
cpp cryptography cuda hash-table hashing parallel-processing rainbow-table sha3 sha3-512 uuid
Last synced: 12 May 2026
https://github.com/vishalanandv/small_scale_parallel_programming
The project describes the design and development of a sparse matrix-vector product kernel, implemented on a supercomputer.
Last synced: 12 May 2026
https://github.com/brocbyte/cuball
CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]
Last synced: 12 May 2026
https://github.com/aspragueumkc/hydra2dgpu
GPU-accelerated 2D shallow water equation solver for QGIS — CUDA finite-volume method with unstructured mesh support
cuda finite-volume-method gis gpu-computing hydraulic-modeling hydrodynamics qgis shallow-water-equations
Last synced: 11 Jun 2026
https://github.com/programmergnome/cuda-codes
Snippet repository for learning parallel GPU programming with CUDA.
c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization
Last synced: 13 May 2026
https://github.com/rossbates/rummage
Rummage is a GPU accelerated npub miner for Nostr
Last synced: 13 May 2026
https://github.com/nyxflower/mosaics-cuda-openmp
Simple image mosaic command-line tool (CUDA-OpenMP-C implementation)
c cuda gpu-programming mosaic mosaic-images openmp parallel-computing parallel-processing
Last synced: 13 May 2026
https://github.com/gianmariaromano/pmc-translated-notes
The repository contains translated notes for the course "Programmazione di Sistemi Multicore" given by Professor De Sensi for the "Informatica" course at Sapienza Università di Roma.
cuda cuda-programming mpi multicore openmp parallel-computing parallel-programming pthreads
Last synced: 14 May 2026
https://github.com/gcol33/resolve
Neural network framework for species distribution modelling (PyTorch/C++/CUDA)
cpp cuda deep-learning ecology machine-learning neural-network pytorch species-distribution
Last synced: 12 Jun 2026
https://github.com/kaierikniermann/hpc-uzh-notes
These are some notes for the High Performance Computing course taught at UZH
cuda high-performance-computing mpi openacc openmp
Last synced: 13 Jun 2026
https://github.com/g023/cuda_inf
A self-contained CUDA inference engine for LiquidAI/LFM2.5-8B-A1B (hybrid conv + GQA-attention MoE, 8.5B params, 1B active) targeting a single RTX 3060 (12 GB). No Python, no frameworks at runtime: a single .cu engine + a header-only byte-level BPE tokenizer.
3060 ai c cpp cuda fast-inference gpu inference inference-engine large-language-models lfm25 liquidai llm moe nvidia open-source rtx token
Last synced: 15 Jun 2026
https://github.com/p4suta/mojiokoshi
Local audio transcription tool with real-time progress, powered by faster-whisper and CUDA
audio-transcription cuda docker fastapi faster-whisper gpu python self-hosted speech-to-text sveltekit transcription whisper
Last synced: 16 Jun 2026
https://github.com/acuoci/pbe-fixed-pivot-cuda
Fast CUDA implementation of aggregation and breakage terms in Population Balance Equations using the fixed pivot sectional method
aggregation breakage cuda fixed-pivot pbe
Last synced: 18 Jun 2026
https://github.com/farukalamai/jetson-yolo-cpp
Real-time object detection, segmentation and tracking on NVIDIA Jetson using YOLO + TensorRT in C++
cpp cuda jetson object-detection tensorrt yolo26
Last synced: 19 Jun 2026
https://github.com/aeyage/intraday-prices
GPU-accelerated portfolio optimisation
Last synced: 19 Jun 2026
https://github.com/drilonaliu/parallel-image-scaling
cuda gpu image-processing scaling-algorithms
Last synced: 21 Jun 2026
https://github.com/sbstndb/neural_k
A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend
ai cuda kokkos library neural-network openmp
Last synced: 22 Jun 2026
https://github.com/sebsop/kmeans-thesis-segmentation
Real-time hybrid quantum-classical K-means segmentation using C++ and CUDA. Bachelor's Thesis at BBU bridging HPC and Quantum Machine Learning (QML).
cpp cuda hpc imgui kmeans opencv quantum-computing
Last synced: 23 Jun 2026
https://github.com/sebsop/realtime-parallel-kmeans-segmentation
Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.
cpp cuda k-means-clustering mpi multithreading opencv rcc real-time-stream-processing
Last synced: 23 Jun 2026
https://github.com/cfregly/claude-gpu-perf-tune
31 GPU inference profiling and optimization skills for Claude Code, with a bundled MCP server
agent-skills claude-code cuda gpu inference llm mcp performance
Last synced: 23 Jun 2026
https://github.com/llm-db/understanding-gpu-architecture-implications-on-llm-serving-workloads
Understanding GPU Architecture Implications on LLM Serving Workloads (Master Thesis, ETH Zürich, 2024)
cuda inference pytorch rocm transformer
Last synced: 05 May 2026
https://github.com/yablokolabs/bendkernels
Pure Bend parallel algorithm kernels and GPU-scaling examples
algorithms bend cuda gpu hvm parallel-computing
Last synced: 24 Jun 2026
https://github.com/kobinarth-panchalingam/parallel-and-concurrent-programming
Semester - 7 | CS4533 - Parallel and Concurrent Programming | Labs
c concurrent-programming cuda java openmp pthreads
Last synced: 05 May 2026
https://github.com/zelosleone/audiobook-generator
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech
Last synced: 05 May 2026
https://github.com/hurbalurba/quick-llama.cpp-server
A framework for publishing a more modern CUDA image for llama.cpp, built with CUDA 13 for newer cards only and with RPC support. Started as an exercise in learning how to compile a custom llama.cpp build.
cuda cuda13 devops docker dockerbuild gguf llamacpp llm rpc
Last synced: 05 May 2026