CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/aaaastark/nvidia-cuda-google-colab
Deployment of NVIDIA CUDA on Google Colab. Includes example code (vector addition and matrix multiplication).
c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition
Last synced: 16 Apr 2026
https://github.com/alexjmercer/cuda-npp-assignment
Learning about CUDA and NVIDIA Performance Primitives. Part of Coursera Assignment.
Last synced: 13 Feb 2026
https://github.com/tlabaltoh/tlab-sharescreen-server-win
Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.
cuda desktop-capture screensharing unity unity3d windows-graphics-capture
Last synced: 14 Feb 2026
https://github.com/AndreasKaratzas/orin
Setting up the NVIDIA Jetson Orin Nano Developer Kit
cuda cudnn jetpack6 nvidia-jetson nvidia-sdkmanager orin-nano
Last synced: 25 Feb 2025
https://github.com/ankhoa1212/cuda-program
This is a GPU program built with CUDA using parallel reduction
cpp cuda curand gpu-programming parallel-reduction
Last synced: 14 Feb 2026
https://github.com/nagharjun17/mlir-to-ptx-cuda
Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR and generates PTX to run the kernel on CUDA GPU
cpp cuda deep-learning llvm mlir ptx
Last synced: 18 Apr 2026
https://github.com/anne-andresen/autoencoder_3d_c_cuda
3D Autoencoder training in raw C/CUDA
Last synced: 28 Apr 2026
https://github.com/mattjesc/gpu-accelerated-fap
GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings
c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing
Last synced: 16 Apr 2026
https://github.com/smoke-y/athena
Deep learning library
cuda deep-learning deep-learning-library
Last synced: 01 Mar 2026
https://github.com/aarid/cuda_operations
This project compares CPU and GPU performance for CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.
conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication
Last synced: 02 Mar 2026
https://github.com/anselm67/cuda_mnist
A CUDA implementation of MNIST - for CUDA beginners.
cuda gpu gpu-computing gpu-programming mnist mnist-classification
Last synced: 02 Mar 2026
https://github.com/fedesky25/hpc-project-2024
Project for the 2024 HPC course: a streamplot generator for complex-valued functions
Last synced: 30 Mar 2025
https://github.com/aledinola/ifp_cuda_mex
Solve the income fluctuation problem on the GPU
Last synced: 14 May 2026
https://github.com/atticuszeller/pytorch-lightning-uv
📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀
classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv
Last synced: 16 Apr 2026
https://github.com/ronaldsg20/compu-paralela
Example code for parallel and distributed computing
cuda opencv openmp posix-threads
Last synced: 14 May 2026
https://github.com/arya2004/parallel-computing
Parallel Computing Uni Course
Last synced: 18 May 2026
https://github.com/juntyr/necsim-rust-docs
Documentation of the spatially explicit biodiversity simulation necsim-rust
biodiversity cuda docs mpi necsim rust simulation
Last synced: 14 May 2026
https://github.com/nguyenpanda/gemm
Parallel Computing Assignment - K251 - HCMUT - VNU
cpp23 cuda forkjoin matrix-multiplication mpi openmp openmpi parallel-computing simd simd-instructions strassen-multiplication
Last synced: 14 May 2026
https://github.com/cs550-epfl/review
Review of the paper "A Formal Analysis of the NVIDIA PTX Memory Consistency Model"
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 30 Mar 2025
https://github.com/eagleeee2/ethminer
EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.
cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source
Last synced: 16 Apr 2026
https://github.com/harmeshgv/gpu-powered-bert-finetuning
Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.
bert-model cuda finetuning-llms pytorch
Last synced: 16 Apr 2026
https://github.com/td99/ai-sandbox
A collection of AI tools and prototypes.
ai cuda docker image-generation-ai nvidia python
Last synced: 08 Apr 2026
https://github.com/brainlesslabs/jalebi
C++ String algorithms for maximum performance
c-plus-plus cplusplus cpp cpp-library cpu cuda library parallel performance simd sse string string-matching vectorization
Last synced: 14 May 2026
https://github.com/belrbez/ship-graphic-qt-qml-cuda-c
Client-Server application for Rocket driving in QML graphics
c client-server cpp cuda qml qt5 rocket
Last synced: 08 Apr 2026
https://github.com/uefi-code/bachelorgraduationdesign
I developed a PyTorch_For_PoorGuys framework and used it to train an LLM on an NVIDIA GeForce 2080Ti GPU as my Bachelor's Graduation Design Project
chatbot cuda gpu hacking large-language-models pytorch
Last synced: 03 May 2026
https://github.com/seanwevans/damnati
A CUDA-accelerated iterated prisoner's dilemma arena
arena cuda iterated-prisoners-dilemma prisoners-dilemma tournament
Last synced: 14 May 2026
https://github.com/dwain-barnes/llm-gguf-auto-converter
Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.
auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization
Last synced: 17 Jun 2025
https://github.com/sergeipapina/color2graycuda
Color-to-gray image conversion implemented as an NVIDIA CUDA kernel, using make or CMake to compile and link
cmake cuda cuda-kernels cuda-programming link makefile nvidia
Last synced: 06 Apr 2025
https://github.com/iebeid/cuda-particles
A simple visualization of particles calculated using CUDA
Last synced: 17 Apr 2026
https://github.com/jonmarty/pycuda-kmeans
A parallelized PyCuda implementation of the KMeans clustering algorithm.
Last synced: 25 Apr 2026
https://github.com/jdibenes/game_of_life_cuda
OpenGL / CUDA implementation of Conway's Game of Life.
cpp cuda opengl qt6 simulation
Last synced: 02 Apr 2026
https://github.com/chrisdalvit/gpu-matrix-transpose
Implementation and benchmarking of different matrix transpose approaches with CUDA
c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu
Last synced: 17 Apr 2026
https://github.com/leo27945875/pybind11_cuda_matmul
cpp cuda matrix-multiplication pybind11 python3
Last synced: 17 Apr 2026
https://github.com/loreloc/triturus
A bunch of triton kernels with increasing complexity for learning and exploring triton and GPU programming
Last synced: 17 Apr 2026
https://github.com/stckvrflw/pem-spgemm
pemSpGEMM - An Improved SpGEMM Algorithm
Last synced: 17 Apr 2026
https://github.com/void4main/bifurcation-diagram
These small Python scripts plot a bifurcation diagram to a PNG file. They work fine on a Raspberry Pi and are accelerated on an NVIDIA Jetson Nano, but there is still a lot of room for improvement.
bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize
Last synced: 17 Apr 2026
https://github.com/bjornmelin/ml-production-engineering
⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯
cuda deployment docker fastapi gpu-computing kubernetes mlops production
Last synced: 17 Apr 2026
https://github.com/bjornmelin/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-rvc
GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)
cuda docker gpu rocm rvc tts voice-cloning
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-musicgen
GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)
audiocraft cuda docker gpu musicgen python rocm text-to-music
Last synced: 17 Apr 2026
https://github.com/briiqn/obj2schem
A CUDA enabled .obj model to schematic (Sponge V3) converter
cuda minecraft schematics wavefront-obj worldedit
Last synced: 17 Apr 2026
https://github.com/cs550-epfl/report
EPFL CS-550 project report
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 03 Jun 2026
https://github.com/adesoji1/youtubesummaryai
Python script for YouTube summaries. The service should summarize a YouTube video given its URL. It should work for long videos and for different languages.
cuda googleapi python3 speech-recognition transformers youtube-api-v3 youtube-dl
Last synced: 04 Apr 2025
https://github.com/tylerfaulkner/n-body_simulation
CUDA N-Body Gravitational Simulation with rendering in Python with MatPlotLib
Last synced: 20 May 2026
https://github.com/synapticore-io/torch-cuda
PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.
cuda gpu project-template python pytorch
Last synced: 04 Apr 2026
https://github.com/seieric/pytorch-mpi-singularity
Singularity Container including PyTorch with CUDA and mpi backend for DistributedDataParallel
cuda hpc nvidia openmpi pytorch singularity utokyo
Last synced: 18 Apr 2026
https://github.com/thalesmg/haskell-accelerate-parconc
Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell
accelerate cuda gpu-computing haskell parallel-computing
Last synced: 18 Apr 2026
https://github.com/qanastek/concurency-tetravex
This software is a fast and reliable tetravex solver based on C++ and CUDA.
c-plus-plus cuda parrallel-computing tetravex
Last synced: 18 Apr 2026
https://github.com/abdelrahman-amen/active_learning_in_nlp
I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.
activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty
Last synced: 18 Apr 2026
https://github.com/betarixm/csed490c
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cuda gpu parallel-computing postech
Last synced: 19 Apr 2026
https://github.com/flavienbwk/nvidia-cuda-mirror-docker
An all-in-one mirror for installing NVIDIA Docker.
cuda docker linux-mirror mirror nvidia nvidia-docker nvidia-docker2 offline offline-capable
Last synced: 18 Apr 2026
https://github.com/cooliron2311/cumd5bf
CUDA based md5 password bruteforcer
Last synced: 18 Apr 2026
https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker
NVIDIA CUDA base image on Ubuntu Linux, used to run machine learning workloads
ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu
Last synced: 18 Apr 2026
https://github.com/dmmutua/cuda_projects
Implementations of a variety of algorithms and technical papers, mostly related to machine learning and deep learning, in CUDA C
c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms
Last synced: 18 Apr 2026
https://github.com/genpat-it/ohe-rs
Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.
bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust
Last synced: 04 Jun 2026
https://github.com/ex539/docker-dev-env
A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.
big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11
Last synced: 05 Apr 2026
https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement
Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.
cpp cuda gpu-programming linux nvidia opencv
Last synced: 18 Apr 2026
https://github.com/intelav/gpu-agent-opt
AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.
ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch
Last synced: 19 Apr 2026
https://github.com/vicen-te/tiny-nn
A tiny neural network framework for fully-connected layers with CPU and CUDA support
backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn
Last synced: 19 Apr 2026
https://github.com/timanema/msc-thesis-public
Repository containing a GPU-accelerated compressor based on FSST
compression cpp cuda gpu thesis
Last synced: 19 Apr 2026
https://github.com/fatlipp/toyslam
SLAM implementation from scratch w/o external graph optimization libs
cuda gpu lidar-slam mapping odometry robotics slam
Last synced: 20 Apr 2026
https://github.com/ydkn/htw-progko-cuda
Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.
cuda image-transformations opencv
Last synced: 20 Apr 2026
https://github.com/rtfirst/voice-to-text
Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.
cuda macos push-to-talk python speech-to-text voice-input whisper windows
Last synced: 04 Jun 2026
https://github.com/amirbroker/cupydtw
Use CUDA for Dynamic Time Warping
cuda dtw dynamic-time-warping python
Last synced: 20 Apr 2026
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 20 Apr 2026
https://github.com/rurumimic/candle
huggingface candle
cuda gpu huggingface nvidia transformer
Last synced: 05 May 2026
https://github.com/jusqua/dip-benchmark
Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.
benchmark cuda image-processing matlab opencv sycl visiongl
Last synced: 20 Apr 2026
https://github.com/bonevbs/cuknn
CUDA implementation of k-nearest neighbor search
Last synced: 20 Apr 2026
https://github.com/py-sandy/llama.cpp-windows-builder
Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.
ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11
Last synced: 20 Apr 2026
https://github.com/larygwil/cuda-samples-old
Old NVIDIA CUDA samples (5.0 - 7.5)
Last synced: 03 May 2026
https://github.com/mrkct/cuda-raytracer
Simple CUDA-Accelerated raytracer
cuda gpu raytracing raytracing-one-weekend
Last synced: 21 Apr 2026
https://github.com/rai-project/dlperf
Déjà vu: Modeling DNN Performance by Recalling History
benchmark cuda deep-learning modeling onnx performance tensorflow
Last synced: 21 Apr 2026
https://github.com/musaibbashir/object-detection
PyTorch + CUDA implementation of several image classification and object detection models such as YOLO, Fast-CNN, and RF-DETR
cnn computer-vision cuda image-classification object-detection pytorch yolo
Last synced: 21 Apr 2026
https://github.com/dimitrijkrstev/pp-cuda-fft
A parallelised CUDA implementation of the FFT Radix-2 algorithm and its execution time comparison to the DFT and non-parallelised Radix-2
Last synced: 22 Apr 2026
https://github.com/mdnpascual/judgebarmashvp
Error bar for the game called Mash VP
cuda emgucv screencapturer tesseract-ocr
Last synced: 22 Apr 2026
https://github.com/bikemazzell/tuonella-sift
A high-performance, memory-efficient CSV deduplication tool
csv cuda deduplication logger osint rust
Last synced: 24 Apr 2026
https://github.com/bardifarsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 24 Apr 2026
https://github.com/jackrekirby/raytracing-cuda
Raytracing using CUDA
cpp cuda raytracing raytracing-in-one-weekend
Last synced: 24 Apr 2026
https://github.com/juntyr/necsim-rust-analysis
Analysis of the spatially explicit biodiversity simulation `necsim-rust`
analysis biodiversity cuda mpi necsim rust simulation
Last synced: 24 Apr 2026
https://github.com/alkaifaftab000/autonomous-maze-solver
Building an Autonomous Maze Solver using reinforcement learning to train agents for decision-making in dynamic grid-based environments
agent criticism cuda gymnasium-environment maze-solving-bot pytorch reinforcement-learning reward-functions
Last synced: 12 Apr 2026
https://github.com/0xsooki/extending-jax
JAX Custom Operations with C++ and CUDA (using Pybind11)
Last synced: 25 Apr 2026
https://github.com/sangioai/torchpace
PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.
Last synced: 25 Apr 2026
https://github.com/daviddavo/19gpu
Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab
accelerator cuda gpu-programming
Last synced: 26 Apr 2026