CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-07-02 00:07:18 UTC
https://github.com/neuraladitya/neural_network_c
Neural Network C is an advanced neural network implementation in pure C, optimized for high performance on CPUs and NVIDIA GPUs.
artificial-intelligence bayesian-optimization c-programming convolutional-neural-networks cuda deep-learning encryption gpu-computing high-performance-computing machine-learning mpi multi-gpu neural-network openmp parallel-computing quantization real-time-monitoring secure-computing tensor-cores transformers
Last synced: 10 May 2026
https://github.com/sebftw/interp2gpu
GPU-accelerated 2D spline interpolation, à la interp2(..., "spline"), in MATLAB.
cuda gpu gpu-acceleration matlab spline spline-interpolation
Last synced: 10 May 2026
https://github.com/cashcon57/open-supersampling
OpenSuperSampling (OSS) — vendor-agnostic open-source RT denoising, upscaling, and frame extrapolation
cuda deep-learning dlss frame-generation fsr game-engine gaussian-splatting open-source real-time-rendering super-resolution upscaling
Last synced: 10 Jun 2026
https://github.com/dlr-amr/t8gpu
Header-only finite volume library targeting GPUs, using t8code as the meshing backend.
adaptive-mesh-refinement cuda finite-volume gpgpu-computing hpc mesh mpi parallel-computing simulation
Last synced: 10 May 2026
https://github.com/jeremywildsmith/shadowhash-distributed
Distributed shadow-file password cracker written in Elixir, with GPU-accelerated cracking for the md5crypt hashing algorithm.
cracking-hash cracking-hashes cracking-password cuda distributed-systems elixir erlang hashing nx security
Last synced: 11 May 2026
https://github.com/fabulani/360ip-with-cuda
360° Image Processing with CUDA and OpenCV.
360-image 360-video cpp cuda image-processing opencv
Last synced: 11 May 2026
https://github.com/islamshahil/live-video-analysis
Live Video Analysis using PyTorch
cuda deeplearning neural-network opencv-python python pytorch video-processing webcam
Last synced: 11 May 2026
https://github.com/apws25/accelmoe
This repository contains a CUDA kernel re-implementation of a CPU-based MoE model.
Last synced: 11 May 2026
https://github.com/xza85hrf/flag_prediction_project
This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.
cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning
Last synced: 15 Apr 2026
https://github.com/sergiomarquezdev/yt-transcriber
🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.
ai cli cuda gemini python transcription whisper youtube
Last synced: 15 May 2026
https://github.com/fieldcure/fieldcure-whisper-runtimes
Pre-built Whisper.net native runtime binaries (CPU/CUDA/Vulkan) for the FieldCure software ecosystem.
cuda dotnet native-binaries nuget redistributable vulkan whisper whisper-net
Last synced: 01 Jun 2026
https://github.com/separatrixxx/pgp_labs_7_sem
👓 Laboratory work for the 7th semester of MAI on PGP and PDP
Last synced: 15 May 2026
https://github.com/abhiksark/gluon-by-example
Learn Triton's Gluon by example — the same GPU kernels written in Triton and Gluon, benchmarked
cuda deep-learning gluon gpu gpu-kernels triton tutorial
Last synced: 01 Jul 2026
https://github.com/baremetalrt/baremetalrt
BareMetalRT — edge GPU compute mesh
cuda distributed-computing gpu inference llm nvidia tensorrt windows
Last synced: 18 Apr 2026
https://github.com/tornikeo/minimal-vscode-cuda-meson
Minimal sample of using VSCode and Meson to build CUDA applications
Last synced: 08 Sep 2025
https://github.com/lablup/backend.ai-accelerator-cuda
The Backend.AI CUDA Accelerator Plugin
Last synced: 16 May 2026
https://github.com/elcruzo/cuda-conv
Lightweight CUDA kernel for 2D image convolution achieving 20x+ speedup. Built with CuPy for the NVIDIA Hackathon.
computer-vision convolution cuda cupy gpu-computing hackathon high-performance-computing image-processing nvidia python
Last synced: 15 May 2026
https://github.com/muppetsg2/cudaraytracer
A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.
cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project
Last synced: 16 Apr 2026
https://github.com/farukalamai/cpp-for-cuda
A structured C++ learning path designed specifically for developers preparing to learn CUDA programming.
Last synced: 09 Jun 2026
https://github.com/bolu-atx/cuda-dojo
Level up your CUDA skills - RPG style. Do or do not, there is no try.
cuda examples learning tutorial
Last synced: 01 Jul 2026
https://github.com/yashpotdar-py/flood-vision
Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.
cuda deep-learning flood-detection iot python
Last synced: 16 Apr 2026
https://github.com/sferez/sspp_sparse_matrix_cuda
Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA
cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix
Last synced: 30 Apr 2026
https://github.com/antoniakras/semantic-video-search
GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.
bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai
Last synced: 03 May 2026
https://github.com/lehoangan2906/cuda_basics
A simple implementation of operations on vectors and matrices, optimized for running on NVIDIA GPUs with CUDA
Last synced: 16 Jun 2025
https://github.com/aaaastark/nvidia-cuda-google-colab
Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).
c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition
Last synced: 16 Apr 2026
https://github.com/alexjmercer/cuda-npp-assignment
Learning about CUDA and NVIDIA Performance Primitives. Part of a Coursera assignment.
Last synced: 13 Feb 2026
https://github.com/tlabaltoh/tlab-sharescreen-server-win
Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.
cuda desktop-capture screensharing unity unity3d windows-graphics-capture
Last synced: 14 Feb 2026
https://github.com/ankhoa1212/cuda-program
This is a GPU program built with CUDA using parallel reduction
cpp cuda curand gpu-programming parallel-reduction
Last synced: 14 Feb 2026
https://github.com/nagharjun17/mlir-to-ptx-cuda
Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR, and generates PTX to run the kernel on a CUDA GPU
cpp cuda deep-learning llvm mlir ptx
Last synced: 18 Apr 2026
https://github.com/mcp-tool-shop-org/gpu-container
Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow - declared placement.
cuda gpu inference llama-cpp llm moe offload vram wsl2
Last synced: 09 Jun 2026
https://github.com/mattjesc/gpu-accelerated-fap
GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings
c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing
Last synced: 16 Apr 2026
https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36
Dockerfile for Ubuntu 16.04 with Python 3.6 and CUDA 9
Last synced: 03 May 2026
https://github.com/smoke-y/athena
Deep learning library
cuda deep-learning deep-learning-library
Last synced: 01 Mar 2026
https://github.com/aarid/cuda_operations
This project compares performance between CPU and GPU with CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.
conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication
Last synced: 02 Mar 2026
https://github.com/anselm67/cuda_mnist
A CUDA implementation of MNIST - for CUDA beginners.
cuda gpu gpu-computing gpu-programming mnist mnist-classification
Last synced: 02 Mar 2026
https://github.com/deltatecs/voses
Volatile Secret Searcher - massively parallel, brute force memory dump analysis for (D)TLS secret extraction
cuda memory-hacking reverse-engineering tls
Last synced: 15 Jun 2025
https://github.com/atticuszeller/pytorch-lightning-uv
📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀
classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv
Last synced: 16 Apr 2026
https://github.com/psteinb/gtc2017
Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley
compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim
Last synced: 03 May 2026
https://github.com/viktor-akusoff/chernabogpy
ChernabogPy is a Python package for visualizing gravitational distortions caused by black holes using nonlinear ray tracing.
cuda gpu physics-simulation python3 relativity-of-space-and-time torch
Last synced: 15 May 2026
https://github.com/zury7/parallel-programming
A collection of performance optimizations and comparisons between multiprocessing and multithreading using pthreads, OpenMP, and CUDA. The experiments analyze execution speed, resource usage, and parallelization efficiency across different computational models. (CS 4553: Scientific Computing)
Last synced: 08 May 2026
https://github.com/eagleeee2/ethminer
EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.
cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source
Last synced: 16 Apr 2026
https://github.com/harmeshgv/gpu-powered-bert-finetuning
Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.
bert-model cuda finetuning-llms pytorch
Last synced: 16 Apr 2026
https://github.com/grizzz13/minimal-cuda
Minimal configuration to set up CUDA C++ in CMake.
Last synced: 18 Apr 2026
https://github.com/dstrigl/cnnplus
Master thesis 2010: Fast Convolutional Neural Network Training and Classification on CUDA GPUs
cnn convolutional-neural-networks cpp cuda gpu neural-networks speedup thesis
Last synced: 30 Jun 2026
https://github.com/AMYPAD/miutil
Basic functionality needed for AMYPAD
cuda matlab medical-imaging python
Last synced: 10 Apr 2025
https://github.com/phrutis/bip39scan
brute bip39 mnemonic GPU - $250
bip39 brute brute-force bruteforce cuda gpu mnemonic phrases seed
Last synced: 10 Apr 2025
https://github.com/joe-mruz/hgvisualizer
An interactive simulation and visualization tool for evolving hypergraphs, inspired by the Wolfram Physics Project.
cpp cuda hypergraph physics simulator wolfram
Last synced: 02 May 2026
https://github.com/iebeid/cuda-particles
A simple visualization of particles calculated using CUDA
Last synced: 17 Apr 2026
https://github.com/jonmarty/pycuda-kmeans
A parallelized PyCUDA implementation of the k-means clustering algorithm.
Last synced: 25 Apr 2026
https://github.com/jdibenes/game_of_life_cuda
OpenGL / CUDA implementation of Conway's Game of Life.
cpp cuda opengl qt6 simulation
Last synced: 02 Apr 2026
https://github.com/chrisdalvit/gpu-matrix-transpose
Implementation and benchmarking of different matrix transpose variants with CUDA
c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu
Last synced: 17 Apr 2026
https://github.com/leo27945875/pybind11_cuda_matmul
cpp cuda matrix-multiplication pybind11 python3
Last synced: 17 Apr 2026
https://github.com/manishklach/gb300-rl-runtime
Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.
ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue
Last synced: 09 Jun 2026
https://github.com/loreloc/triturus
A bunch of Triton kernels of increasing complexity for learning and exploring Triton and GPU programming
Last synced: 17 Apr 2026
https://github.com/stckvrflw/pem-spgemm
pemSpGEMM - An Improved SpGEMM Algorithm
Last synced: 17 Apr 2026
https://github.com/void4main/bifurcation-diagram
These small Python scripts plot a bifurcation diagram to a PNG file. They work fine on a Raspberry Pi and are accelerated on an NVIDIA Jetson Nano, but there is still a lot of room for improvement.
bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize
Last synced: 17 Apr 2026
https://github.com/bjornmelin/ml-production-engineering
⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯
cuda deployment docker fastapi gpu-computing kubernetes mlops production
Last synced: 17 Apr 2026
https://github.com/bjornmelin/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-rvc
GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)
cuda docker gpu rocm rvc tts voice-cloning
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-musicgen
GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)
audiocraft cuda docker gpu musicgen python rocm text-to-music
Last synced: 17 Apr 2026
https://github.com/briiqn/obj2schem
A CUDA enabled .obj model to schematic (Sponge V3) converter
cuda minecraft schematics wavefront-obj worldedit
Last synced: 17 Apr 2026
https://github.com/cs550-epfl/report
EPFL CS-550 project report
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 03 Jun 2026
https://github.com/kentakoong/mtnlog
A simple multinode performance logger for Python
cuda lanta nvitop python slurm-cluster
Last synced: 11 Jan 2026
https://github.com/synapticore-io/torch-cuda
PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.
cuda gpu project-template python pytorch
Last synced: 04 Apr 2026
https://github.com/seieric/pytorch-mpi-singularity
Singularity container including PyTorch with CUDA and the MPI backend for DistributedDataParallel
cuda hpc nvidia openmpi pytorch singularity utokyo
Last synced: 18 Apr 2026
https://github.com/thalesmg/haskell-accelerate-parconc
Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell
accelerate cuda gpu-computing haskell parallel-computing
Last synced: 18 Apr 2026
https://github.com/qanastek/concurency-tetravex
This software is a fast and reliable Tetravex solver based on C++ and CUDA.
c-plus-plus cuda parrallel-computing tetravex
Last synced: 18 Apr 2026
https://github.com/abdelrahman-amen/active_learning_in_nlp
I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.
activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty
Last synced: 18 Apr 2026
https://github.com/betarixm/csed490c
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cuda gpu parallel-computing postech
Last synced: 19 Apr 2026
https://github.com/flavienbwk/nvidia-cuda-mirror-docker
An all-in-one mirror for installing NVIDIA Docker.
cuda docker linux-mirror mirror nvidia nvidia-docker nvidia-docker2 offline offline-capable
Last synced: 18 Apr 2026
https://github.com/cooliron2311/cumd5bf
CUDA-based MD5 password brute-forcer
Last synced: 18 Apr 2026
https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker
NVIDIA CUDA base image on Ubuntu Linux, used for running machine learning
ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu
Last synced: 18 Apr 2026
https://github.com/dmmutua/cuda_projects
Implementations of a variety of algorithms and technical papers, mostly related to machine learning and deep learning, in CUDA C
c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms
Last synced: 18 Apr 2026
https://github.com/genpat-it/ohe-rs
Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.
bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust
Last synced: 04 Jun 2026
https://github.com/ex539/docker-dev-env
A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.
big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11
Last synced: 05 Apr 2026
https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement
Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.
cpp cuda gpu-programming linux nvidia opencv
Last synced: 18 Apr 2026
https://github.com/equiel-1703/cuhip
Wrapper tool to convert CUDA source code to HIP code and compile it with HIPCC. Useful for learning CUDA programming using AMD devices.
Last synced: 14 May 2026
https://github.com/intelav/gpu-agent-opt
AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.
ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch
Last synced: 19 Apr 2026
https://github.com/vicen-te/tiny-nn
A tiny neural network framework for fully-connected layers with CPU and CUDA support
backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn
Last synced: 19 Apr 2026
https://github.com/timanema/msc-thesis-public
Repository containing a GPU-accelerated compressor based on FSST
compression cpp cuda gpu thesis
Last synced: 19 Apr 2026
https://github.com/aledinola/ifp_cuda_mex
Solve the income fluctuation problem on the GPU
Last synced: 14 May 2026
https://github.com/fatlipp/toyslam
SLAM implementation from scratch w/o external graph optimization libs
cuda gpu lidar-slam mapping odometry robotics slam
Last synced: 20 Apr 2026
https://github.com/ydkn/htw-progko-cuda
Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.
cuda image-transformations opencv
Last synced: 20 Apr 2026
https://github.com/rtfirst/voice-to-text
Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.
cuda macos push-to-talk python speech-to-text voice-input whisper windows
Last synced: 04 Jun 2026
https://github.com/amirbroker/cupydtw
Use CUDA for dynamic time warping
cuda dtw dynamic-time-warping python
Last synced: 20 Apr 2026