CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-22 00:07:17 UTC
https://github.com/mrkct/cuda-raytracer
Simple CUDA-Accelerated raytracer
cuda gpu raytracing raytracing-one-weekend
Last synced: 21 Apr 2026
https://github.com/rai-project/dlperf
Déjà vu: Modeling DNN Performance by Recalling History
benchmark cuda deep-learning modeling onnx performance tensorflow
Last synced: 21 Apr 2026
https://github.com/musaibbashir/object-detection
PyTorch+CUDA implementation of several image classification and object detection models like YOLO, Fast-CNN, RF-DETR
cnn computer-vision cuda image-classification object-detection pytorch yolo
Last synced: 21 Apr 2026
https://github.com/mcp-tool-shop-org/gpu-container
Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow - declared placement.
cuda gpu inference llama-cpp llm moe offload vram wsl2
Last synced: 09 Jun 2026
https://github.com/dimitrijkrstev/pp-cuda-fft
A parallelised CUDA implementation of the FFT Radix-2 algorithm and its execution time comparison to the DFT and non-parallelised Radix-2
Last synced: 22 Apr 2026
https://github.com/mdnpascual/judgebarmashvp
Error bar for the game called Mash VP
cuda emgucv screencapturer tesseract-ocr
Last synced: 22 Apr 2026
https://github.com/bikemazzell/tuonella-sift
A high-performance, memory-efficient CSV deduplication tool
csv cuda deduplication logger osint rust
Last synced: 24 Apr 2026
https://github.com/brainlesslabs/jalebi
C++ String algorithms for maximum performance
c-plus-plus cplusplus cpp cpp-library cpu cuda library parallel performance simd sse string string-matching vectorization
Last synced: 14 May 2026
https://github.com/bardifarsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 24 Apr 2026
https://github.com/jackrekirby/raytracing-cuda
Raytracing using CUDA
cpp cuda raytracing raytracing-in-one-weekend
Last synced: 24 Apr 2026
https://github.com/juntyr/necsim-rust-analysis
Analysis of the spatially explicit biodiversity simulation `necsim-rust`
analysis biodiversity cuda mpi necsim rust simulation
Last synced: 24 Apr 2026
https://github.com/seanwevans/damnati
A CUDA-accelerated iterated prisoner's dilemma arena
arena cuda iterated-prisoners-dilemma prisoners-dilemma tournament
Last synced: 14 May 2026
https://github.com/0xsooki/extending-jax
JAX Custom Operations with C++ and CUDA (using Pybind11)
Last synced: 25 Apr 2026
https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36
Ubuntu 16.04 with Python 3.6 and CUDA9 Dockerfile
Last synced: 03 May 2026
https://github.com/sangioai/torchpace
PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.
Last synced: 25 Apr 2026
https://github.com/daviddavo/19gpu
Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab
accelerator cuda gpu-programming
Last synced: 26 Apr 2026
https://github.com/shashshukla/ee-210-signals-and-systems
Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.
cuda image-processing signal-processing
Last synced: 26 Apr 2026
https://github.com/dwain-barnes/llm-gguf-auto-converter
Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.
auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization
Last synced: 17 Jun 2025
https://github.com/alexyzha/cuda-bioinformatics
A CUDA-Accelerated Bioinformatics Toolchain
bioinformatics bioinformatics-tool cplusplus cuda
Last synced: 26 Apr 2026
https://github.com/mateuszk098/parallel-programming-examples
Simple parallel programming examples with CUDA, MPI and OpenMP.
cpp cuda mpi openmp parallel-programming
Last synced: 27 Apr 2026
https://github.com/kbredies/tgv_pycuda
Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.
compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation
Last synced: 27 Apr 2026
https://github.com/notkartikye/cuda-image-box-filters
🖼️ CUDA-powered tool for applying box filters to a large amount of images
cuda cuda-library cuda-programming npp
Last synced: 27 Apr 2026
https://github.com/gladap/heterogeneous_computing_project
Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters
cuda heterogeneous-parallel-programming
Last synced: 27 Apr 2026
https://github.com/perhuepenbecker/cudyn
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular
Last synced: 28 Apr 2026
https://github.com/ncorgan/arrayfire-config-info
A small command-line utility that outputs all available ArrayFire devices
Last synced: 28 Apr 2026
https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 28 Apr 2026
https://github.com/rog0d/gpuss_watchers
"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."
cuda gpu-acceleration gpu-monitoring gpu-profiling
Last synced: 28 Apr 2026
https://github.com/axeloooo/pytorch
Collection of deep learning workflows in PyTorch, from fundamentals and classification to transfer learning and experiment tracking.
Last synced: 28 Apr 2026
https://github.com/ltsyk/smart-snake-ai
Advanced Deep Q-Network AI for Snake Game with CUDA support and 700% performance boost
artificial-intelligence cuda deep-q-network dqn game-ai machine-learning pytorch reinforcement-learning snake-game
Last synced: 28 Apr 2026
https://github.com/atelierarith/julia_gpu_playground
For those who want to use Julia with a GPU
cuda docker docker-compose julia
Last synced: 28 Apr 2026
https://github.com/ccfelius/hpc
High Performance Computing (CUDA, MPI/OpenMP, high performance ML)
cuda high-performance-computing machine-learning mpi
Last synced: 28 Apr 2026
https://github.com/emanuelemessina/cuda-benchmark
Compare matrix calculation times between CPU and GPU (CUDA)
benchmark cuda matrix-calculations
Last synced: 28 Apr 2026
https://github.com/shermanlo77/modefilter
ImageJ plugin, Java and CuPy implementation of the mode filter and empirical null filter. The mode filter is an edge-preserving smoothing filter by taking the mode of the empirical density.
cuda cupy empirical-null fiji filter image-filter imagej jcuda mode-filter
Last synced: 28 Apr 2026
https://github.com/jalberty2018/run-pytorch-cuda-develop
Compile environment for PyTorch with CUDA
cloud code-server compiler cuda cuda-toolkit docker-image flash-attn jupyterlab python python3 pytorch sage-attention
Last synced: 28 Apr 2026
https://github.com/psteinb/gtc2017
Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley
compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim
Last synced: 03 May 2026
https://github.com/fedimser/aldyparen
Renders pictures and videos with algebraic fractals
Last synced: 29 Apr 2026
https://github.com/sandialabs/tenzing
Core library for optimizing CUDA+MPI programs as sequential decision problems.
cuda mpi scr-2759 sequential-decision-problem
Last synced: 29 Apr 2026
https://github.com/snandasena/cuda-at-scale-for-the-enterprise
Gauss Filter with CUDA and NPP
Last synced: 29 Apr 2026
https://github.com/baro-00/cpp-cuda-lab
Experimental C++ projects using NVIDIA CUDA for parallel computing. Learning & testing GPU kernels
Last synced: 04 May 2026
https://github.com/apostolis1/parallel-processing-systems
Project of the undergrad course "Parallel Processing Systems" - NTUA
benchmark c cuda mpi openmp parallel-computing
Last synced: 29 Apr 2026
https://github.com/giog97/histogram_equalization_cuda
Performance comparison of sequential and parallel CUDA Histogram Equalization for image contrast enhancement.
cuda cuda-kernels cuda-programming histogram-equalization image-processing parallel-computing parallel-programming
Last synced: 29 Apr 2026
https://github.com/jonastoth/cuda_raytracer
University project to implement a basic Raytracer in CUDA
Last synced: 29 Apr 2026
https://github.com/mcobzarenco/bitonic.cu
CUDA bitonic sort in rust
cuda parallel-computing rust sorting-algorithms
Last synced: 29 Apr 2026
https://github.com/dogrego/gpgpu-rainbow-raytracer
A GPU-accelerated rainbow ray tracer with CPU reference implementation, CUDA for parallelized refraction/reflection, and OpenGL for interactive visualization
Last synced: 29 Apr 2026
https://github.com/mathiasotnes/gemm
General Matrix Multiplication (GEMM) optimization in CUDA.
Last synced: 26 Mar 2025
https://github.com/fikri-rouzan/cuda-c-program-part-2
CUDA C program from NVIDIA course.
Last synced: 30 Apr 2026
https://github.com/fulvius31/triton-cache-tracker
A lightweight utility for monitoring and analyzing Triton kernel compilation cache behavior.
cache cuda gpu gpu-kernels triton triton-openai
Last synced: 30 Apr 2026
https://github.com/gaurisharan/cuda-ml-kernels
Repo for CUDA C++ GPU kernels for ML and HPC.
cpp cuda gpu hpc kernels ml parallel-computing systems-ml
Last synced: 30 Apr 2026
https://github.com/neel-dandiwala/npp_cudaatscale_project
For the enterprise course project, I have created a model that executes the histogram equalisation procedure on the given input image file.
Last synced: 30 Apr 2026
https://github.com/mahshid1378/piper-plus-3
Multilingual neural TTS (6 languages: JA/EN/ZH/ES/FR/PT, code supports SV) — C++, C#, Rust, Go, Python, npm (WASM). VITS + Prosody, streaming, CUDA/CoreML/DirectML. pip install piper-plus | npm install piper-plus | cargo install piper-plus-cli
cross-platform csharp cuda deep-learning dotnet japanese multilingual nuget onnx pytorch rust speech-synthesis streaming text-to-speech tts vits webassembly
Last synced: 08 Jun 2026
https://github.com/actepukc/uv-app-starter-pack
Bootstrap PySide6 GUI apps quickly using uv, with built-in PyTorch/CUDA handling.
astral-uv cross-platform cuda gui pyside6 python pytorch qt6 starter-kit template
Last synced: 30 Apr 2026
https://github.com/manishklach/gb300-rl-runtime
Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.
ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue
Last synced: 09 Jun 2026
https://github.com/ivanbuccella/sf2bio
Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment
cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method
Last synced: 01 May 2026
https://github.com/mrtejas/cv-sandbox
A collection of Computer Vision mini-projects tuned for a number of tasks, including face detection, object detection, image segmentation and CLIP. Trained on popular datasets; includes a comparative study of the methods. Done as part of the S24 course Computer Vision at IIIT Hyd
computer-vision cuda ml opencv pytorch yolo
Last synced: 01 May 2026
https://github.com/fikri-rouzan/cuda-c-program-part-3
CUDA C program from NVIDIA course.
Last synced: 01 May 2026
https://github.com/darshanakgr/meanfiltergpu
A GPU implementation of a mean filter in CUDA
Last synced: 01 May 2026
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python
Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.
accelerated-computing cuda cuda-programming jit numba nvidia python
Last synced: 01 May 2026
https://github.com/andresvalle/ocr-extraction
Text extraction from images using EasyOCR and parallelization with PyTorch
Last synced: 01 May 2026
https://github.com/marius311/cudadistributedtools.jl
A set of utility tools for multi-GPU + multi-process workflows
Last synced: 01 May 2026
https://github.com/f14-bertolotti/torchess
CUDA Torch extension for a chess engine
Last synced: 01 May 2026
https://github.com/imanghd/parallelprocessing
CE Algorithms Lab @ SUT
cuda openmp parallel-algorithm parallel-processing systolic
Last synced: 01 May 2026
https://github.com/jaidevd/ipec-fdp
cuda hpc keras mapreduce numba spark tensorflow
Last synced: 11 Apr 2026
https://github.com/vladd12/libexecstd
Modern C++ library for using an execution context of computer devices
cpp cpp17 cuda gpu-acceleration gpu-computing
Last synced: 06 May 2026
https://github.com/baudneo/zomi-server
FastAPI ML server designed for ZoneMinder (zomi-client)
alpr coral-tpu cuda face-detection face-recognition fastapi machine-learning object-detection onnxruntime opencv pydantic-v2 tensorrt torch zoneminder
Last synced: 18 Jan 2026
https://github.com/proafxin/cuda-docker
High performance computing Images with pycuda and tensorrt preinstalled
cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt
Last synced: 11 Apr 2026
https://github.com/zhaocc1106/cuxx-programing
Samples for several CUDA libraries: cuda, cublas, cublaslt, cusparse...
Last synced: 23 Mar 2025
https://github.com/gammahazard/locate-anything
Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.
bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui
Last synced: 28 May 2026
https://github.com/abhiram-kandiyana/cuda-blast-2024
Reimplementation of NCBI BLAST with CUDA backend for faster retrieval
blast cuda gpu-acceleration parallel-processing
Last synced: 15 Mar 2025
https://github.com/mvishiu11/kmeans-clustering
K-Means Clustering with both GPU (CUDA) and CPU implementations
Last synced: 15 Mar 2025
https://github.com/sahil-rajwar-2004/vector-cuda
vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 15 May 2025
https://github.com/neel-dandiwala/cuda-programs
Miscellaneous programs that grasp the concept of Parallel Computing
cuda gpu-programming parallel-programming
Last synced: 16 May 2025
https://github.com/tchung1970/sd-cli-cuda
CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop
cuda gpu linux nvidia stable-diffusion
Last synced: 09 May 2026
https://github.com/bikrammajhi/100-days-of-gpu
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.
cuda nsight-compute ptx triton
Last synced: 18 Jun 2025
https://github.com/bfalls/img-compressor
GPU-accelerated JPEG compressor
cli-tool command-line compression cpp cpp-cuda-gpu-programming-parallel-computing cuda dct demo-project gpgpu gpu-programming high-performance-computing hpc image-compression image-processing jpeg parallel-computing
Last synced: 20 Apr 2026
https://github.com/lk/gpu-nbody
GPU-accelerated n-body engine for t-SNE and physics simulation
cuda gpu n-body n-body-simulator
Last synced: 02 Sep 2025
https://github.com/usman619/pdc
Parallel and Distributed Computing
cuda distributed-computing distributed-systems nextcloud
Last synced: 11 Apr 2026
https://github.com/lordofhyphens/gpu-path-delay-coverage
CUDA-based Path Delay Fault Coverage
Last synced: 04 May 2026
https://github.com/hit07/ml-dl-torch
This repository contains a comprehensive overview of Machine Learning and Deep Learning using PyTorch
computer-vision convolutional-neural-networks cuda neural-networks pytorch
Last synced: 28 Feb 2025
https://github.com/gaaniruddha/mphil-gpu-imager
This repository contains code for project #1 of MPhil: test-version of GPU imager for a single time-step, single-channel and single time-step, multi-channel.
astronomy benchmarks cuda cufft google-sheets gpu-imager imaging-astronomy interferometry radio-astronomy
Last synced: 11 Jun 2026
https://github.com/alan-cooney/python-cuda-starter-template
Python CUDA Starter Template
Last synced: 30 Mar 2025
https://github.com/h4ck3r-04/fpassword
Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.
brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing
Last synced: 16 Jan 2026
https://github.com/jesuscopado/parallel-programming
My solutions for the course Programming Parallel Computers at Aalto University (http://ppc.cs.aalto.fi/). Grade: 5/5
cpp cuda image-segmentation median-filter sorting-algorithms
Last synced: 19 Apr 2026