CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
https://github.com/ophoperhpo/dcgan-lentach-logo-generator
The Lentach logo generator. #MachineLearningFun
cuda dcgan dcgan-tensorflow keras lentach machinelearning ml
Last synced: 26 Jun 2026
https://github.com/igorcosta/deep-docker
Docker image for Deep Learning on AWS Cloud
cuda deep-learning docker docker-image tensorflow
Last synced: 05 May 2026
https://github.com/daaboulex/unsloth-nix
Unsloth (git main) packaged for NixOS — CPU/CUDA/ROCm LoRA fine-tuning envs
cuda fine-tuning flake lora machine-learning nix nixos nixos-module pytorch rocm unsloth
Last synced: 10 Jun 2026
https://github.com/garciparedes/cuda-examples
CUDA examples that I develop to learn GPU-based HPC
c c-plus-plus cuda examples gpgpu gpu hpc
Last synced: 09 May 2026
https://github.com/brosnanyuen/raybnn_dataloader
Data Loader for RayBNN
arrayfire cpu csv csv-parser cuda data-structures gpu-computing oneapi opencl parallel parallel-computing rust
Last synced: 07 May 2026
https://github.com/abhans/archdev
Arch Linux-based container with NVIDIA driver and CUDA support, with PyTorch and TensorFlow built in.
archlinux container cuda docker
Last synced: 07 May 2026
https://github.com/seralexeev/rabbit0
Robot Rabbit
cuda jetson nvidia robotics ros2 zed-camera
Last synced: 15 Jun 2026
https://github.com/abdulfatir/subkmeans
NumPy and PyCUDA implementation of subKmeans
clustering cuda kdd kmeans numpy pycuda python subspace-clustering
Last synced: 09 May 2026
https://github.com/thesupercd/rainbow_table_builder
A high-performance, CUDA-based, GPU-accelerated rainbow table maker, written in C++ without any external libraries or dependencies.
cpp cryptography cuda hash-table hashing parallel-processing rainbow-table sha3 sha3-512 uuid
Last synced: 12 May 2026
https://github.com/vishalanandv/small_scale_parallel_programming
The project describes the design and development of a sparse matrix-vector product kernel, implemented on a supercomputer.
Last synced: 12 May 2026
https://github.com/brocbyte/cuball
CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]
Last synced: 12 May 2026
https://github.com/aspragueumkc/hydra2dgpu
GPU-accelerated 2D shallow water equation solver for QGIS — CUDA finite-volume method with unstructured mesh support
cuda finite-volume-method gis gpu-computing hydraulic-modeling hydrodynamics qgis shallow-water-equations
Last synced: 11 Jun 2026
https://github.com/programmergnome/cuda-codes
Snippet repository for learning parallel GPU programming with CUDA.
c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization
Last synced: 13 May 2026
https://github.com/rossbates/rummage
Rummage is a GPU accelerated npub miner for Nostr
Last synced: 13 May 2026
https://github.com/nyxflower/mosaics-cuda-openmp
Simple image mosaic command-line tool (CUDA-OpenMP-C implementation)
c cuda gpu-programming mosaic mosaic-images openmp parallel-computing parallel-processing
Last synced: 13 May 2026
https://github.com/gianmariaromano/pmc-translated-notes
The repository contains translated notes for the course "Programmazione di Sistemi Multicore" given by Professor De Sensi for the "Informatica" course at Sapienza Università di Roma.
cuda cuda-programming mpi multicore openmp parallel-computing parallel-programming pthreads
Last synced: 14 May 2026
https://github.com/gcol33/resolve
Neural network framework for species distribution modelling (PyTorch/C++/CUDA)
cpp cuda deep-learning ecology machine-learning neural-network pytorch species-distribution
Last synced: 12 Jun 2026
https://github.com/kaierikniermann/hpc-uzh-notes
These are some notes for the High Performance Computing course taught at UZH
cuda high-performance-computing mpi openacc openmp
Last synced: 13 Jun 2026
https://github.com/g023/cuda_inf
A self-contained CUDA inference engine for LiquidAI/LFM2.5-8B-A1B (hybrid conv + GQA-attention MoE, 8.5B params, 1B active) targeting a single RTX 3060 (12 GB). No Python, no frameworks at runtime: a single .cu engine + a header-only byte-level BPE tokenizer.
3060 ai c cpp cuda fast-inference gpu inference inference-engine large-language-models lfm25 liquidai llm moe nvidia open-source rtx token
Last synced: 15 Jun 2026
https://github.com/p4suta/mojiokoshi
Local audio transcription tool with real-time progress, powered by faster-whisper and CUDA
audio-transcription cuda docker fastapi faster-whisper gpu python self-hosted speech-to-text sveltekit transcription whisper
Last synced: 16 Jun 2026
https://github.com/acuoci/pbe-fixed-pivot-cuda
Fast CUDA implementation of aggregation and breakage terms in Population Balance Equations using the fixed pivot sectional method
aggregation breakage cuda fixed-pivot pbe
Last synced: 18 Jun 2026
https://github.com/farukalamai/jetson-yolo-cpp
Real-time object detection, segmentation and tracking on NVIDIA Jetson using YOLO + TensorRT in C++
cpp cuda jetson object-detection tensorrt yolo26
Last synced: 19 Jun 2026
https://github.com/aeyage/intraday-prices
GPU-accelerated portfolio optimisation
Last synced: 19 Jun 2026
https://github.com/drilonaliu/parallel-image-scaling
cuda gpu image-processing scaling-algorithms
Last synced: 21 Jun 2026
https://github.com/sbstndb/neural_k
A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend
ai cuda kokkos library neural-network openmp
Last synced: 22 Jun 2026
https://github.com/sebsop/kmeans-thesis-segmentation
Real-time hybrid quantum-classical K-means segmentation using C++ and CUDA. Bachelor's Thesis at BBU bridging HPC and Quantum Machine Learning (QML).
cpp cuda hpc imgui kmeans opencv quantum-computing
Last synced: 23 Jun 2026
https://github.com/sebsop/realtime-parallel-kmeans-segmentation
Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.
cpp cuda k-means-clustering mpi multithreading opencv rcc real-time-stream-processing
Last synced: 23 Jun 2026
https://github.com/cfregly/claude-gpu-perf-tune
31 GPU inference profiling and optimization skills for Claude Code, with a bundled MCP server
agent-skills claude-code cuda gpu inference llm mcp performance
Last synced: 23 Jun 2026
https://github.com/yablokolabs/bendkernels
Pure Bend parallel algorithm kernels and GPU-scaling examples
algorithms bend cuda gpu hvm parallel-computing
Last synced: 24 Jun 2026
https://github.com/awaldis/cuda-experiments
A place to explore the capabilities and limits of CUDA parallel processing.
cuda cuda-kernels cuda-programming
Last synced: 25 Jun 2026
https://github.com/angelnicolasc/meridian
Phase-aware vLLM scheduler for reasoning models: output-first dispatch, entropy-gated think termination, tiered KV eviction, and TTOT-focused benchmarking.
cuda inference kv-cache llm observability pyo3 python reasoning-models rust scheduler vllm
Last synced: 27 Jun 2026
https://github.com/kamb-code/sha256-r19-preimage
Oracle-free preimage attack on 19-round reduced SHA-256 — paper, solver, and independent verifier
cryptanalysis cryptography cuda gpu hash-functions preimage-attack security-research sha256
Last synced: 27 Jun 2026
https://github.com/huggon1/ml-algorithm-implementations
Educational implementations for ML, DL, LLM blocks, ViT, and CUDA.
cuda machine-learning numpy pytorch vision-transformer
Last synced: 28 Jun 2026
https://github.com/lk/gpu-nbody
GPU-accelerated n-body engine for t-SNE and physics simulation
cuda gpu n-body n-body-simulator
Last synced: 29 Jun 2026
https://github.com/thc1006/ct2-maxwell-final
Frozen CTranslate2 build re-introducing NVIDIA Compute Capability 5.0 (sm_50, Maxwell) so faster-whisper runs on Maxwell GPUs (Quadro K2200, GTX 750/9xx, 940MX, 960M). Packages upstream PR #1766; CUDA 12.9 + cuDNN 9.10; validated on a K2200.
ctranslate2 cuda faster-whisper maxwell sm50 speech-to-text whisper
Last synced: 29 Jun 2026
https://github.com/hurbalurba/quick-llama.cpp-server
A framework for publishing a more modern CUDA image for llama.cpp, built with CUDA 13 for newer cards only and with RPC support. Started as an exercise in learning how to compile a custom llama.cpp build.
cuda cuda13 devops docker dockerbuild gguf llamacpp llm rpc
Last synced: 05 May 2026
https://github.com/j89103138/yolov11-traffic-sign
This repository contains a YOLOv11 project for training, detection, and benchmarking of traffic signs. The project utilizes CUDA acceleration to enhance performance and efficiency in real-time traffic sign detection and evaluation.
cuda opencv python pytorch traffic traffic-sign yolov11
Last synced: 05 May 2026
https://github.com/barrrry1/claymore-s-dual-miner
Claymore's Dual Miner is a powerful GPU mining software designed for Ethereum (ETH) and simultaneous dual mining of coins like Decred, Siacoin, Pascal, and Lbry. It supports AMD and NVIDIA GPUs, leveraging OpenCL and CUDA optimization for maximum hashrate. Features include automatic GPU tuning, detailed statistics, and stability watchdog.
blockchain crypto-mining cryptocurrency cuda eth ethereum gpu-mining mining mining-pool opencl
Last synced: 05 May 2026
https://github.com/pauloruszel/yolo11_face_detection
cuda nvcc nvidia-gpu pip python3 pytorch widerface-dataset yolo11
Last synced: 05 May 2026
https://github.com/jakubfr4czek/concurrent-gauss-elimination
Concurrent Gaussian elimination algorithm implemented using trace theory. Parallelism is achieved using CUDA cores.
agh agh-ust agh-wi conda cuda cuda-kernels cuda-toolkit diekert-graph graphviz java python python3 traces-theory
Last synced: 05 May 2026
https://github.com/abdelrahman-amen/active_learning_with_different_query_strategies
This project explores the implementation of active learning techniques, focusing on various query strategies to optimize the selection of informative data points for model training. It aims to reduce the amount of labeled data required while improving model performance, especially in scenarios with limited labeled data.
activelearning cuda entropy kldivergence margin numpy python pyto uncertainty
Last synced: 06 May 2026
https://github.com/hritiksauw199/human-face-to-cartoon-conversion-using-optimized-cyclegan
Transform real human faces into cartoon-style images using a reduced CycleGAN architecture optimized for efficiency and quality.
cuda cyclegan data-science deep-learning deep-neural-networks gan human-cartoon matplotlib neural-network python pytorch torchvision
Last synced: 06 May 2026
https://github.com/iglee/jax-cuda-eicl-exp-docker
Docker for getting jax to work with cuda, for reproducing ml experiments like eicl. Sure, let's NOT make a compatibility matrix and let people fight for their lives on cuda
cuda docker jax jaxline ml-engineering ml-experiments tensorflow
Last synced: 06 May 2026
https://github.com/raiszo/cs334
Journey through Intro to Parallel Programming
Last synced: 06 May 2026
https://github.com/r00tens/text-classifier
Naive Bayes classifier for text classification with CPU and GPU (CUDA)
classification classifier cpp cuda machine-learning naive-bayes
Last synced: 06 May 2026
https://github.com/iamfaham/model-inference-profiler
A PyTorch-based tool for profiling deep learning model inference performance, analyzing computational bottlenecks, and visualizing resource utilization.
cuda memory pytorch visualizations
Last synced: 06 May 2026
https://github.com/rosnavigator/parallelkmeansimagecompressor
Parallel KMeans-based image quantization compressor that reduces the number of colors in an image while preserving visual quality. It uses KMeans clustering for color quantization and supports sequential, OpenMP, MPI, and CUDA implementations for performance and scalability. PoliMi - Advanced Methods for Scientific Computing (2023-2024)
boost clustering colors compression cuda image-quantization kmeans kmeans-clustering lossy-compression mpi odette opencv openmp parallel-computing parallel-programming performance polimi scalability sl-train
Last synced: 06 May 2026
https://github.com/sarodyatawatta/flagpol
Energy and polarization based interference mitigation
cuda energy mixed-precision polarization radio-frequency-interference radio-interferometry reinforcement-learning statistical-inference
Last synced: 06 May 2026
https://github.com/jamesnulliu/learning-programming-massively-parallel-processors
Learning notes for Programming Massively Parallel Processors, 4th edition.
Last synced: 06 May 2026
https://github.com/mka-codelake/wispy
Minimalist push-to-talk dictation tool for Windows. Faster Whisper, local, offline.
cuda dictation faster-whisper local offline portable push-to-talk python speech-to-text stt transcription voice-input whisper windows
Last synced: 06 May 2026
https://github.com/sebp/vscode-sycl-dpcpp-cuda
Sample project to use the VS Code Remote - Containers extension to develop SYCL applications for NVIDIA GPUs using the oneAPI DPC++ compiler.
cuda dpcpp fedora gpu-computing podman sycl vscode
Last synced: 06 May 2026
https://github.com/jpuigcerver/prob-phoc
Probabilistic relevance scores from PHOC embeddings
cuda keyword-spotting kws phoc pytorch
Last synced: 07 May 2026
https://github.com/drilonaliu/parallel-sierpinski-triangle
GPU-accelerated Sierpinski Triangle generation with CUDA and OpenGL interoperability.
cuda fractals gpu parallel-programming sierpinski-triangle
Last synced: 07 May 2026
https://github.com/yuuuuurei/yolo-sibi
Real-time SIBI hand gesture detection using YOLOv8 and deep learning classifiers.
bahasa-indonesia bahasa-isyarat cuda deep-learning hand-gesture hand-gesture-recognition pytorch real-time sibi sign-language yolo yolov8
Last synced: 07 May 2026
https://github.com/drilonaliu/parallel-koch-snowflake
GPU-accelerated Koch Snowflake generation with CUDA and OpenGL interoperability.
cuda fractals gpu koch-snowflake parallel-programming
Last synced: 07 May 2026
https://github.com/muhamadajiw/parallel-matrix-inversion
A parallel program for matrix inversion using MPI, OpenMP, and CUDA
Last synced: 07 May 2026
https://github.com/shreya888/learning-cuda-with-cpp-and-pytorch
My notes, code, & insights will be recorded here while learning CUDA with C++ and PyTorch
Last synced: 07 May 2026
https://github.com/stevenchang5/canny_edge
Implementation of Canny edge detection, with an option to use CUDA to improve performance
Last synced: 07 May 2026
https://github.com/rssr25/cuda
Following the book CUDA by Example.
cpp cuda cuda-programming hpc shaders
Last synced: 07 May 2026
https://github.com/wpjunior/cuda-numba-playground
Some uses of CUDA with the Numba framework
Last synced: 07 May 2026
https://github.com/pankajarm/ethereum-mining-cuda
cuda ethereum ethereum-mining ethminer ubuntu1604
Last synced: 08 May 2026
https://github.com/not-ml/ml-3
A PyTorch-based Convolutional Neural Network (CNN) for image classification using the CIFAR-10 dataset, featuring advanced architecture, data augmentation, GPU support, and dynamic learning rate scheduling.
ai cifar10 cnn cuda gpu image-classification machine-learning modeltraining python pytorch torchvision
Last synced: 08 May 2026
https://github.com/jimmygizmo/tensorpup
Machine-learning model training using parallelization strategies on multiple serverless GPU instances.
ai cuda cudnn distributed gpu serverless tensorflow
Last synced: 08 May 2026
https://github.com/popke523/rybki
A 3D shoal of fish animation using the boids algorithm, OpenGL for rendering and CUDA for parallel processing.
Last synced: 08 May 2026
https://github.com/leo27945875/parallel_pso
cpp cuda openmp parallel-programming particle-swarm-optimization pthread pybind11 python
Last synced: 08 May 2026
https://github.com/sydney-informatics-hub/computer-vision-fine-tuning
Fine-tune a computer vision model to solve your task locally, on HPC, in a container, or in the cloud!
computer-vision cuda deep-learning python
Last synced: 09 May 2026
https://github.com/sugarcane-mk/finetuning_wav2vec2
This repo provides a step-by-step process, from scratch, for fine-tuning Facebook's wav2vec2-large model using transformers
asr asr-model cuda facebook fairseq fine-tuning finetuning huggingface librosa python torch transformers wav2vec2 wav2vec2-large-960h
Last synced: 09 May 2026
https://github.com/ginkobalboa/parfis
Particles and field simulator. Written in C++ with Python bindings. The algorithm is based on the particle-in-cell (PIC) method used for interacting many-particle systems.
cpp cuda physics-simulation python
Last synced: 09 May 2026
https://github.com/dbklim/optimized_tensorflow_wheels
Optimized versions of TensorFlow and TensorFlow-GPU for specific CPUs and GPUs (both old and new).
cuda nvidia-cuda nvidia-gpu tensorflow tensorflow-community-wheels tensorflow-gpu tensorflow-packages tensorflow-whells wheels
Last synced: 09 May 2026
https://github.com/lfrati/subpair
Fast pairwise cosine distance calculation and numba accelerated evolutionary matrix subset extraction 🍐🚀
Last synced: 09 May 2026
https://github.com/nick8592/ubuntu-20.04-cuda-cudnn-pytorch
cuda cuda-toolkit cudnn python3 pytorch ubuntu2004
Last synced: 09 May 2026
https://github.com/xorengine/marvin4000
Real-time audio translation using Whisper + SeamlessM4T / NLLB-200
ai asr audio-processing consumer-hardware cuda gpu-accelerated machine-learning multilingual nllb nmt pytorch real-time seamlessm4t speech-recognition transcription translation whisper
Last synced: 09 May 2026
https://github.com/starlitdreams/lunar-landing
This project implements a DQN agent using PyTorch to solve the LunarLander-v2 environment from OpenAI Gym. The agent learns to control the lunar lander using experience replay and a target network, aiming to maximize rewards by landing smoothly. Uses CUDA for computation.
artificial-intelligence cuda deep-learning gymnasium neural-network neural-networks numpy nvidia-gpu python python3 torch
Last synced: 09 May 2026
https://github.com/donaurelio/ansible-playbooks
A bunch of Ansible playbooks that automate computer infrastructure provisioning
ansible-playbooks cuda docker gromacs openmpi
Last synced: 09 May 2026
https://github.com/michaelfranzl/image_fah-client
Dockerfile for Folding@home client with AMD and Nvidia GPGPU support
container cuda debian docker foldingathome gpu-computing opencl
Last synced: 09 May 2026
https://github.com/edumucelli/build-tensorflow
Build TensorFlow from source using a Dockerfile
Last synced: 10 May 2026
https://github.com/chris-official/pytorchgaf
PyTorch accelerated GAF transform
cuda gpu gramian-angular-fields image-analysis python pytorch time-series
Last synced: 10 May 2026
https://github.com/neuraladitya/neural_network_c
Neural Network C is an advanced neural network implementation in pure C, optimized for high performance on CPUs and NVIDIA GPUs.
artificial-intelligence bayesian-optimization c-programming convolutional-neural-networks cuda deep-learning encryption gpu-computing high-performance-computing machine-learning mpi multi-gpu neural-network openmp parallel-computing quantization real-time-monitoring secure-computing tensor-cores transformers
Last synced: 10 May 2026
https://github.com/sebftw/interp2gpu
GPU-accelerated 2D spline interpolation, à la interp2(..., "spline"), in MATLAB.
cuda gpu gpu-acceleration matlab spline spline-interpolation
Last synced: 10 May 2026
https://github.com/cashcon57/open-supersampling
OpenSuperSampling (OSS) — vendor-agnostic open-source RT denoising, upscaling, and frame extrapolation
cuda deep-learning dlss frame-generation fsr game-engine gaussian-splatting open-source real-time-rendering super-resolution upscaling
Last synced: 10 Jun 2026
https://github.com/dlr-amr/t8gpu
Header-only finite volume library targeting GPUs using t8code as meshing backend.
adaptive-mesh-refinement cuda finite-volume gpgpu-computing hpc mesh mpi parallel-computing simulation
Last synced: 10 May 2026
https://github.com/jeremywildsmith/shadowhash-distributed
Distributed shadow file password cracker written in Elixir, with GPU-accelerated cracking for the md5crypt hashing algorithm.
cracking-hash cracking-hashes cracking-password cuda distributed-systems elixir erlang hashing nx security
Last synced: 11 May 2026
https://github.com/fabulani/360ip-with-cuda
360° Image Processing with CUDA and OpenCV.
360-image 360-video cpp cuda image-processing opencv
Last synced: 11 May 2026
https://github.com/islamshahil/live-video-analysis
Live Video Analysis using PyTorch
cuda deeplearning neural-network opencv-python python pytorch video-processing webcam
Last synced: 11 May 2026
https://github.com/apws25/accelmoe
This repository is for CUDA kernel re-implementation of CPU-based MoE model.
Last synced: 11 May 2026
https://github.com/daniilvorontsov/fourier-option-pricing
MSc thesis project concerned with option pricing for Levy Jump models. Package includes pricing implementations for European Call and Put options for Carr-Madan, COS and Fourier Time Stepping.
carr-madan cuda fourier-transform monte-carlo option-pricing
Last synced: 11 May 2026
https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4
Optimized vLLM setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell using vllm and docker: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.
am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm
Last synced: 11 May 2026
https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu
Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.
cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding
Last synced: 12 May 2026
https://github.com/skailasa/msc-thesis
A modular thesis
cuda fast-multipole-method kernel-independent numba python3
Last synced: 12 May 2026
https://github.com/tomaszrewak/csgpathtracer
A constructive solid geometry path tracer.
computer-graphics cuda path-tracing rendering
Last synced: 12 May 2026