CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-07-01 00:07:09 UTC
https://github.com/cscfi/csc-env-julia
Julia language environment including MPI.jl, CUDA.jl and AMDGPU.jl preferences for HPC clusters at CSC.
amdgpu ansible cuda hpc julia julia-language mpi
Last synced: 01 Feb 2026
https://github.com/teambipartite/bipartite-gemm
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores
Last synced: 17 Apr 2026
https://github.com/m-torhan/cuda-fractals
CUDA C++ implementation of Fractals visualization
Last synced: 25 Feb 2026
https://github.com/xza85hrf/flag_prediction_project
This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.
cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning
Last synced: 15 Apr 2026
https://github.com/fieldcure/fieldcure-whisper-runtimes
Pre-built Whisper.net native runtime binaries (CPU/CUDA/Vulkan) for the FieldCure software ecosystem.
cuda dotnet native-binaries nuget redistributable vulkan whisper whisper-net
Last synced: 01 Jun 2026
https://github.com/baremetalrt/baremetalrt
BareMetalRT — edge GPU compute mesh
cuda distributed-computing gpu inference llm nvidia tensorrt windows
Last synced: 18 Apr 2026
https://github.com/muppetsg2/cudaraytracer
A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.
cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project
Last synced: 16 Apr 2026
https://github.com/yashpotdar-py/flood-vision
Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.
cuda deep-learning flood-detection iot python
Last synced: 16 Apr 2026
https://github.com/sferez/sspp_sparse_matrix_cuda
Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA
cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix
Last synced: 30 Apr 2026
https://github.com/aaaastark/nvidia-cuda-google-colab
Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).
c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition
Last synced: 16 Apr 2026
https://github.com/alexjmercer/cuda-npp-assignment
Learning about CUDA and NVIDIA Performance Primitives. Part of Coursera Assignment.
Last synced: 13 Feb 2026
https://github.com/tlabaltoh/tlab-sharescreen-server-win
Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.
cuda desktop-capture screensharing unity unity3d windows-graphics-capture
Last synced: 14 Feb 2026
https://github.com/ankhoa1212/cuda-program
This is a GPU program built with CUDA using parallel reduction
cpp cuda curand gpu-programming parallel-reduction
Last synced: 14 Feb 2026
https://github.com/nagharjun17/mlir-to-ptx-cuda
Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR and generates PTX to run the kernel on CUDA GPU
cpp cuda deep-learning llvm mlir ptx
Last synced: 18 Apr 2026
https://github.com/mattjesc/gpu-accelerated-fap
GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings
c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing
Last synced: 16 Apr 2026
https://github.com/smoke-y/athena
Deep learning library
cuda deep-learning deep-learning-library
Last synced: 01 Mar 2026
https://github.com/aarid/cuda_operations
This project compares CPU and GPU performance for CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.
conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication
Last synced: 02 Mar 2026
https://github.com/anselm67/cuda_mnist
A CUDA implementation of MNIST - for CUDA beginners.
cuda gpu gpu-computing gpu-programming mnist mnist-classification
Last synced: 02 Mar 2026
https://github.com/atticuszeller/pytorch-lightning-uv
📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀
classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv
Last synced: 16 Apr 2026
https://github.com/eagleeee2/ethminer
EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.
cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source
Last synced: 16 Apr 2026
https://github.com/harmeshgv/gpu-powered-bert-finetuning
Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.
bert-model cuda finetuning-llms pytorch
Last synced: 16 Apr 2026
https://github.com/mathiasotnes/gemm
General Matrix Multiplication (GEMM) optimization in Cuda.
Last synced: 26 Mar 2025
https://github.com/iebeid/cuda-particles
A simple visualization of particles calculated using CUDA
Last synced: 17 Apr 2026
https://github.com/jonmarty/pycuda-kmeans
A parallelized PyCuda implementation of the KMeans clustering algorithm.
Last synced: 25 Apr 2026
https://github.com/jdibenes/game_of_life_cuda
OpenGL / CUDA implementation of Conway's Game of Life.
cpp cuda opengl qt6 simulation
Last synced: 02 Apr 2026
https://github.com/chrisdalvit/gpu-matrix-transpose
Implementation and benchmarking of different matrix transpose with CUDA
c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu
Last synced: 17 Apr 2026
https://github.com/leo27945875/pybind11_cuda_matmul
cpp cuda matrix-multiplication pybind11 python3
Last synced: 17 Apr 2026
https://github.com/loreloc/triturus
A bunch of triton kernels with increasing complexity for learning and exploring triton and GPU programming
Last synced: 17 Apr 2026
https://github.com/stckvrflw/pem-spgemm
pemSpGEMM - An Improved SpGEMM Algorithm
Last synced: 17 Apr 2026
https://github.com/void4main/bifurcation-diagram
These small Python scripts plot a bifurcation diagram to a PNG file. They run fine on a Raspberry Pi and are accelerated on an NVIDIA Jetson Nano, but there is still a lot of room for improvement.
bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize
Last synced: 17 Apr 2026
https://github.com/bjornmelin/ml-production-engineering
⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯
cuda deployment docker fastapi gpu-computing kubernetes mlops production
Last synced: 17 Apr 2026
https://github.com/bjornmelin/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-rvc
GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)
cuda docker gpu rocm rvc tts voice-cloning
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-musicgen
GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)
audiocraft cuda docker gpu musicgen python rocm text-to-music
Last synced: 17 Apr 2026
https://github.com/briiqn/obj2schem
A CUDA-enabled .obj model to schematic (Sponge V3) converter
cuda minecraft schematics wavefront-obj worldedit
Last synced: 17 Apr 2026
https://github.com/cs550-epfl/report
EPFL CS-550 project report
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 03 Jun 2026
https://github.com/synapticore-io/torch-cuda
PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.
cuda gpu project-template python pytorch
Last synced: 04 Apr 2026
https://github.com/seieric/pytorch-mpi-singularity
Singularity Container including PyTorch with CUDA and mpi backend for DistributedDataParallel
cuda hpc nvidia openmpi pytorch singularity utokyo
Last synced: 18 Apr 2026
https://github.com/thalesmg/haskell-accelerate-parconc
Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell
accelerate cuda gpu-computing haskell parallel-computing
Last synced: 18 Apr 2026
https://github.com/qanastek/concurency-tetravex
This software is a fast and reliable Tetravex solver based on C++ and CUDA.
c-plus-plus cuda parrallel-computing tetravex
Last synced: 18 Apr 2026
https://github.com/abdelrahman-amen/active_learning_in_nlp
I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.
activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty
Last synced: 18 Apr 2026
https://github.com/betarixm/csed490c
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cuda gpu parallel-computing postech
Last synced: 19 Apr 2026
https://github.com/flavienbwk/nvidia-cuda-mirror-docker
An all-in-one mirror for installing NVIDIA Docker.
cuda docker linux-mirror mirror nvidia nvidia-docker nvidia-docker2 offline offline-capable
Last synced: 18 Apr 2026
https://github.com/cooliron2311/cumd5bf
CUDA based md5 password bruteforcer
Last synced: 18 Apr 2026
https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker
NVIDIA CUDA base image on Ubuntu Linux, used to run machine learning workloads
ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu
Last synced: 18 Apr 2026
https://github.com/dmmutua/cuda_projects
An Implementation of a variety of Algorithms & Technical Papers Mostly Related to Machine Learning & Deep Learning in CUDA C
c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms
Last synced: 18 Apr 2026
https://github.com/genpat-it/ohe-rs
Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.
bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust
Last synced: 04 Jun 2026
https://github.com/ex539/docker-dev-env
A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.
big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11
Last synced: 05 Apr 2026
https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement
Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.
cpp cuda gpu-programming linux nvidia opencv
Last synced: 18 Apr 2026
https://github.com/intelav/gpu-agent-opt
AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.
ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch
Last synced: 19 Apr 2026
https://github.com/vicen-te/tiny-nn
A tiny neural network framework for fully-connected layers with CPU and CUDA support
backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn
Last synced: 19 Apr 2026
https://github.com/timanema/msc-thesis-public
Repository containing a GPU-accelerated compressor based on FSST
compression cpp cuda gpu thesis
Last synced: 19 Apr 2026
https://github.com/fatlipp/toyslam
SLAM implementation from scratch w/o external graph optimization libs
cuda gpu lidar-slam mapping odometry robotics slam
Last synced: 20 Apr 2026
https://github.com/ydkn/htw-progko-cuda
Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.
cuda image-transformations opencv
Last synced: 20 Apr 2026
https://github.com/rtfirst/voice-to-text
Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.
cuda macos push-to-talk python speech-to-text voice-input whisper windows
Last synced: 04 Jun 2026
https://github.com/amirbroker/cupydtw
Use Cuda for Dynamic Time Warping
cuda dtw dynamic-time-warping python
Last synced: 20 Apr 2026
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 20 Apr 2026
https://github.com/jusqua/dip-benchmark
Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.
benchmark cuda image-processing matlab opencv sycl visiongl
Last synced: 20 Apr 2026
https://github.com/bonevbs/cuknn
Cuda implementation of k-nearest neighbor search
Last synced: 20 Apr 2026
https://github.com/py-sandy/llama.cpp-windows-builder
Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.
ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11
Last synced: 20 Apr 2026
https://github.com/mrkct/cuda-raytracer
Simple CUDA-Accelerated raytracer
cuda gpu raytracing raytracing-one-weekend
Last synced: 21 Apr 2026
https://github.com/rai-project/dlperf
Déjà vu: Modeling DNN Performance by Recalling History
benchmark cuda deep-learning modeling onnx performance tensorflow
Last synced: 21 Apr 2026
https://github.com/musaibbashir/object-detection
PyTorch+CUDA implementation of several image classification and object detection models such as YOLO, Fast-CNN, and RF-DETR
cnn computer-vision cuda image-classification object-detection pytorch yolo
Last synced: 21 Apr 2026
https://github.com/dimitrijkrstev/pp-cuda-fft
A parallelised CUDA implementation of the FFT Radix-2 algorithm and its execution time comparison to the DFT and non-parallelised Radix-2
Last synced: 22 Apr 2026
https://github.com/mdnpascual/judgebarmashvp
Error bar for the game called Mash VP
cuda emgucv screencapturer tesseract-ocr
Last synced: 22 Apr 2026
https://github.com/bikemazzell/tuonella-sift
A high-performance, memory-efficient CSV deduplication tool
csv cuda deduplication logger osint rust
Last synced: 24 Apr 2026
https://github.com/bardifarsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 24 Apr 2026
https://github.com/jackrekirby/raytracing-cuda
Raytracing using CUDA
cpp cuda raytracing raytracing-in-one-weekend
Last synced: 24 Apr 2026
https://github.com/juntyr/necsim-rust-analysis
Analysis of the spatially explicit biodiversity simulation `necsim-rust`
analysis biodiversity cuda mpi necsim rust simulation
Last synced: 24 Apr 2026
https://github.com/0xsooki/extending-jax
JAX Custom Operations with C++ and CUDA (using Pybind11)
Last synced: 25 Apr 2026
https://github.com/sangioai/torchpace
PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.
Last synced: 25 Apr 2026
https://github.com/daviddavo/19gpu
Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab
accelerator cuda gpu-programming
Last synced: 26 Apr 2026
https://github.com/shashshukla/ee-210-signals-and-systems
Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.
cuda image-processing signal-processing
Last synced: 26 Apr 2026
https://github.com/alexyzha/cuda-bioinformatics
A CUDA-Accelerated Bioinformatics Toolchain
bioinformatics bioinformatics-tool cplusplus cuda
Last synced: 26 Apr 2026
https://github.com/mateuszk098/parallel-programming-examples
Simple parallel programming examples with CUDA, MPI and OpenMP.
cpp cuda mpi openmp parallel-programming
Last synced: 27 Apr 2026
https://github.com/kbredies/tgv_pycuda
Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.
compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation
Last synced: 27 Apr 2026
https://github.com/notkartikye/cuda-image-box-filters
🖼️ CUDA-powered tool for applying box filters to a large amount of images
cuda cuda-library cuda-programming npp
Last synced: 27 Apr 2026
https://github.com/gladap/heterogeneous_computing_project
Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters
cuda heterogeneous-parallel-programming
Last synced: 27 Apr 2026
https://github.com/perhuepenbecker/cudyn
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular
Last synced: 28 Apr 2026
https://github.com/ncorgan/arrayfire-config-info
A small command-line utility that outputs all available ArrayFire devices
Last synced: 28 Apr 2026
https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 28 Apr 2026