CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc,
- Last updated: 2026-06-30 00:07:24 UTC
- JSON Representation
https://github.com/cklxx/arle
Rust-native inference runtime for Qwen3 / Qwen3.5 — OpenAI-compatible serving + integrated agent, train, and self-evolution workflows. CUDA + Metal, no PyTorch on the hot path.
agent cuda flashinfer gspo inference infra kv-cache llm metal mlx openai-compatible qwen3 qwen35 rl rust
Last synced: 02 May 2026
https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129
Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)
cuda deep-learning pytorch rtx5060
Last synced: 20 Jul 2025
https://github.com/nellogan/distributed_compy
Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + mult-GPU), and multi-node support
cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi
Last synced: 16 Aug 2025
https://github.com/murrellgroup/conflux.jl
Single-node data parallelism in Julia with CUDA
cuda data-parallelism flux julia nccl
Last synced: 22 May 2026
https://github.com/xlite-dev/HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉
Last synced: 30 Jul 2025
https://github.com/B1-663R/docker-mining
Dockerfiles to build docker images to start mining with an NVIDIA Docker architecture
cryptocurrency cuda docker-image docker-nvidia mining
Last synced: 28 Mar 2025
https://github.com/mirzaim/cuda-devcontainer
CUDA Development Container
cuda devcontainer devcontainers docker remote-development
Last synced: 23 Apr 2025
https://github.com/stdogpkg/cukuramoto
A python/CUDA pkg which solves numerically the kuramoto model through the Heun's method
complex-networks cuda kuramoto-model
Last synced: 28 Jan 2026
https://github.com/debowin/gpu-parallel-recommender-system
GPGPU Parallel User-User Collaborative Filtering System in CUDA C
collaborative-filtering cuda gpu-programming movielens-dataset recommender-system
Last synced: 24 Apr 2026
https://github.com/nekon69/fastnoiselitecuda
A wrapper around C++ FastNoiseLite library for CUDA
cellular-noise computer-graphics cpp cuda fastnoiselite gamedev generative-art gpgpu gpu header-only noise opensimplex2-noise pcg perlin-noise procedural-generation simplex-noise terrain-generation texture-generation worley-noise
Last synced: 02 Oct 2025
https://github.com/kpetridis24/four-russians-algorithm
Boolean matrix multiplication accelerated by the four-Russians algorithm
c cuda gpu high-performance matrix-multiplication preprocess
Last synced: 29 May 2026
https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory
Performance comparison of two different forms of memory management in CUDA
c cuda explicit memory memory-management performance unified-memory
Last synced: 17 May 2026
https://github.com/l30nardosv/reproduce-parcosi-moleculardocking
Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"
autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research
Last synced: 16 Feb 2026
https://github.com/andreabak/whispersubs
Generate subtitles for your video or audio files using the power of AI
ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper
Last synced: 15 Feb 2026
https://github.com/teodutu/asc
Arhitectura Sistemelor de Calcul - UPB 2020
cache-optimization cuda parallel-programming profiling python-threading
Last synced: 24 Apr 2026
https://github.com/csvancea/gpu-hashtable
GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.
Last synced: 24 Apr 2026
https://github.com/bdwhst/fluora
A CUDA PBR path tracer
cpp cuda pathtracing pbr rendering
Last synced: 13 Feb 2026
https://github.com/amruthapatil/nyu-cudamatrixoperations
Optimizing CUDA programs for vector addition and matrix multiplication
cuda high-performance-computing
Last synced: 21 May 2026
https://github.com/tiw302/mandelbrot-c
A simple Mandelbrot set explorer written in C. Crafted with SDL2 and multithreaded rendering for a smooth experience. ‹(•_•)›
c cuda fractal graphics mandelbrot multithreading sdl2 web webassembly
Last synced: 26 Apr 2026
https://github.com/rogerallen/jmandelbrotr
Java CUDA Mandelbrot explorer
cuda cuda-opengl java jcuda joml lwjgl3 mandelbrot-viewer opengl
Last synced: 18 Apr 2026
https://github.com/alpinebuster/arkime-docker-compose
Deploy Arkime with GPU-accelerated Rust/Python parsers and custom plugins using Docker Compose.
arkime c cuda deep-neural-networks docker docker-compose llm machine-learning networking pcap pcapng python rust traffic-analysis
Last synced: 16 Apr 2026
https://github.com/true-real-michael/python-plane-ransac
Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA
cuda numba plane-detection python ransac
Last synced: 14 Mar 2025
https://github.com/kagof/julia-image-processing
Image processing programs written in Julia
Last synced: 18 May 2026
https://github.com/neoblizz/cupti-plus-plus
CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).
cpp cuda cuda-profiler cupti profiler
Last synced: 26 Apr 2026
https://github.com/mazharuddin-mohammed/semidgfem
High-performance TCAD Simulator Using Discontinuous Galerkin FEM
cuda discontinuous-galerkin-method tcad tcad-device-simulator
Last synced: 15 Jun 2025
https://github.com/acrlakshman/gradient-augmented-levelset-cuda
Implementation of Gradient Augmented Levelset method for CPU and GPU
Last synced: 17 Feb 2026
https://github.com/l1cacheDell/CUDA_Code
Codes for learning cuda. Implementation of multiple kernels.
Last synced: 10 Mar 2025
https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis
Master of science thesis project. Using CUDA to utilize a systems GPU to create pathfinding data (IM+PF), usable by multiple agents in the same environment.
ai cplusplus cuda gpgpu pathfinding starcraft2
Last synced: 15 May 2026
https://github.com/gjbex/gpu-programming
Material for a training on portable GPU programming
cuda gpu kokkos openmp openmp-off stl thrust
Last synced: 08 Feb 2026
https://github.com/djenriquez/ewbf-cuda-miner
Run ewbf-miner for zcash
cuda docker mining nvidia nvidia-docker zcash zcl zclassic
Last synced: 17 May 2026
https://github.com/avitase/fast_frechet
Comparison of different (fast) discrete Fréchet distance implementations in C++ and CUDA.
benchmark cpp cuda frechet-distance simd
Last synced: 18 May 2026
https://github.com/zeloe/juce_cuda_convolution
GPU acceleration for efficient, high-quality audio processing.
audio audio-processing convolution cuda dsp juce
Last synced: 03 Mar 2026
https://github.com/tank3-tk3/pi-calculation-cpu-gpu
PI calculation with CPU and GPU
c cpp cuda parallel-computing pi
Last synced: 13 Apr 2026
https://github.com/lukasboettcher/msc-code
This is the repo for my master thesis on a GPU accelerated andersen analysis.
andersen-analysis clang cuda llvm static-analysis
Last synced: 16 Jan 2026
https://github.com/alpha74/cuda_basics
Nvidia NVCC CUDA programs for begineers.
c cpp cuda cuda-programs nvcc nvidia parallel-computing parallel-programming
Last synced: 08 May 2026
https://github.com/terrylindev/image-to-ASCII
🖼️ A command-line tool for converting images to ASCII art
ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal
Last synced: 12 Jul 2025
https://github.com/neoblizz/spmv
Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).
csr cuda gpgpu load-balancing mtx spmv
Last synced: 28 Apr 2026
https://github.com/grakshith/parallel-k-means
K-Means clustering for Image Colour Quantization and Image Compression
cuda image-color-quantization image-compression k-means mpi opencv openmp
Last synced: 28 Apr 2026
https://github.com/xmas7/cudampi
A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.
cpu cuda gpu hybrid mpi network
Last synced: 29 Apr 2026
https://github.com/digimortl/libguess
Patches that give Bitcoin Core an ability of CUDA mining
bitcoin c-plus-plus cryptocurrency cuda
Last synced: 16 Apr 2026
https://github.com/tawssie/zmpy3d_pt
Python implementation of 3D Zernike moments with PyTorch
3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments
Last synced: 24 Oct 2025
https://github.com/nixos-cuda/cuda-legacy
Select CUDA package sets which have aged out of Nixpkgs. [maintainers=@ConnorBaker, @SomeoneSerge]
Last synced: 15 May 2026
https://github.com/shreyansh26/mlsys-experiments
A collection of scripts on experimenting and implementing MLSys-related stuff
cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton
Last synced: 30 Aug 2025
https://github.com/kishore-narendran/eecs221-highperformancecomputing
Assignments done during the graduate course EECS 221 - Introduction to HPC that I took in the Spring Quarter of 2016 at University of California, Irvine. Involves assignments that use OpenMP, MPI and CUDA.
Last synced: 17 May 2026
https://github.com/boltzmannentropy/vllm-5090
vLLM-5090: Docker Container for RTX 5090 on WSL2/Windows
Last synced: 08 Oct 2025
https://github.com/tvanfossen/entropic
Local-first agentic inference engine in C/C++. Multi-tier model routing, grammar-constrained output, MCP tool servers. Embeddable via C ABI.
agentic-ai agentic-framework cpp cpp20 cuda edge-ai embedded-ai gbnf gguf grammar-constrained-decoding inference-engine llama-cpp llm local-llm mcp on-device-ai privacy-first tool-calling
Last synced: 30 May 2026
https://github.com/seungjaelim/cuda.tutorial
References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)
cuda gpu-programming nsight-compute nsight-systems
Last synced: 07 Feb 2026
https://github.com/alexjmercer/fractal-art
Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!
cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2
Last synced: 05 Mar 2026
https://github.com/isazi/aoflagger
AOFlagger Radio Frequency Interference mitigation algorithm.
Last synced: 30 Apr 2026
https://github.com/headless-start/data-augmentation-impact
This repository contains effect of Data Augmentation of Training Set during Model Training.
augmented-images cuda data gpu keras matplotlib mnist opencv-python python3 tensorflow training-data
Last synced: 05 Apr 2026
https://github.com/muhac/jupyter-pytorch-docker
JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.
conda-environment cuda docker jupyterlab pytorch
Last synced: 01 Oct 2025
https://github.com/dqbd/cuda-btree
Implementation of B-Trees on NVIDIA CUDA
Last synced: 30 Apr 2026
https://github.com/elftausend/sliced
Array operations with automatic differentiation on CPU and GPU
autograd automatic-differentiation cuda custos matrix opencl
Last synced: 31 Jan 2026
https://github.com/szymon423/tsp-cpu-vs-gpu
Simple brute force approach to solve travelling salesman problem with CPU and GPU
Last synced: 11 Mar 2025
https://github.com/kilamper/matrix-multiplication
AC - Matrix multiplication using OpenMP, MPI and CUDA
Last synced: 16 May 2026
https://github.com/nondairyneutrino/pararealgpu.jl
A distributed and GPU-based implementation of the Parareal algorithm for parallel-in-time integration of equations of motion.
accelerator computational-physics computational-science cuda differential-equation-solvers distributed-computing gpu-computing high-performance-computing julialang ode ordinary-differential-equations parallel-computing parallel-in-time-integration parareal partial-differential-equation pde simulation
Last synced: 21 Apr 2026
https://github.com/lightshade12/kittlespt
A hobby CUDA pathtracing renderer.
3d-graphics computer-graphics cuda gpu path-tracing ray-tracing
Last synced: 18 Mar 2025
https://github.com/thunder-compute/thunder-compute-documentation
Documentation for Thunder Compute, a cloud platform creating technology to virtualize GPUs over TCP
ai artificial-intelligence cloud cloud-computing cuda gpu llm machine-learning nvidia pytorch tensorflow thunder-compute virtualization
Last synced: 15 Oct 2025
https://github.com/shahed-chy-suzan/psd-to-html--cuda
Cuda is a single page creative portfolio psd to html template which is built with HTML5 & CSS3. The site can be customized easily to suit your needs.
Last synced: 18 Jan 2026
https://github.com/dhruvsrikanth/fastconv
Distributed and serial implementations of the 2D Convolution operation in c++ and CUDA.
convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming
Last synced: 04 May 2026
https://github.com/pratikvn/nla4hpc-exercises-framework
The exercises framework for the Numerical Linear Algebra for HPC course at Karlsruhe Institute of Technology.
cuda ginkgo homeworks hpc-course teaching
Last synced: 19 May 2026
https://github.com/brosnanyuen/raybnn_graph
Graph Manipulation Library For GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cuda gpu graph graph-algorithms neural-network neural-networks opencl raybnn rust
Last synced: 06 Feb 2026
https://github.com/alpha74/hungarianalgocuda
Hungarian Algorithm for Linear Assignment Problem implemented using CUDA.
cuda nvcc parallel-computing parallel-programming
Last synced: 01 Jun 2026
https://github.com/ashwani-rathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 11 Jul 2025
https://github.com/haleelrah/Vision-pro-MAX
A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.
bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8
Last synced: 30 Dec 2025
https://github.com/mark0011astra/simplecuda
CUDAを使用したGPU演算をNumPyと同様のインターフェースで簡単行えるライブラリ。A library that allows users to easily perform GPU operations using CUDA with a NumPy-like interface.
cuda cupy gpu machine-learning numpy python vector
Last synced: 02 May 2026
https://github.com/jonathanraiman/mini_cuda_rtc
Miniature CUDA Array library with Runtime Compilation
cpp11 cuda jit runtime-compilation
Last synced: 14 Apr 2026
https://github.com/liberxue/parallel_computing
CUDA Algorithm && Hacker's Delight
algorithms cuda cuda-kernels cuda-programming hacker-s-delight nvidia
Last synced: 24 Feb 2026
https://github.com/yaronkoresh/definers
A comprehensive Python toolkit for AI, data processing, media manipulation, and system utilities.
artificial-intelligence cuda data-science deep-learning diffusers feature-extraction generative-ai gpu gradio image-generation machine-learning multimedia music-generation python-library pytorch scikit-learn toolkit transformers video-generation web-scraping
Last synced: 08 Apr 2026
https://github.com/mcp-tool-shop-org/backpropagate
Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.
api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows
Last synced: 31 May 2026
https://github.com/unvercan/ssd300-model-pytorch
SSD300 Model using PyTorch
cnn computer-vision convolutional-neural-networks cuda deep-learning image-processing neural-network object-detection opencv python pytorch single-shot-detection ssd ssd300
Last synced: 17 Mar 2025
https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition
This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.
audio-processing cpp cuda fourier-transform python
Last synced: 10 May 2026
https://github.com/lcsb-biocore/cufluxsampler.jl
GPU-accelerated algorithms for flux sampling in CUDA.jl
cobra cuda gpu julia metabolic-network metabolism sampling
Last synced: 02 May 2026
https://github.com/jxlarrea/homeassistant-voice-recipes
GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.
arm64 cuda dgx-spark gb10 gpu-acceleration home-assistant local-llm qwen3 speech-to-text text-to-speech voice-assistant x86-64
Last synced: 26 May 2026
https://github.com/vishwamartur/btc_recovery
High-performance Bitcoin wallet password recovery system with GPU acceleration and integrated graphics support. Recover Bitcoin Core wallet.dat files without blockchain download using advanced algorithms and blockchain APIs.
bitcoin bitcoin-core blockchain blockchain-api cpp cryptocurrency cuda electrum gpu-acceleration integrated-graphics multithreading opencl password-recovery private-keys recovery-tools wallet-dat wallet-recovery
Last synced: 14 Apr 2026
https://github.com/deepankaracharyya/6th_sem_assignments
c cuda data-mining postgresql-database python
Last synced: 02 May 2026
https://github.com/brendanbignell/cuda_montecarlooptionpricer
CUDA Monte Carlo Barrier Option Pricing Demo & Jupyer lab ML models
cuda deep-learning ml pytorch quantitative-finance xgboost-regression
Last synced: 19 Apr 2026
https://github.com/ruturaj4/cuda_nvidia_tutorial
cuda projects
cuda cuda-vector-addition nvidia nvidia-cuda parallel
Last synced: 26 Oct 2025
https://github.com/dvhh/masscorrelation
An exercise in writing an efficient correlation calculator
calculations correlation-calculation cuda matrix multi-threading openmp
Last synced: 15 May 2026
https://github.com/bokutotu/cudnn_graph_api_example
cudnn graph api example
Last synced: 04 May 2026
https://github.com/rnabla/cuda-des
Bruteforcing DES using CUDA
bruteforce cuda data des encryption gpu parallel standard
Last synced: 27 Oct 2025
https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection
Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.
Last synced: 12 Apr 2026
https://github.com/liuyuweitarek/pytorch-docker-builder
Automate PyTorch Docker image builds with compatible Python, CUDA, and Poetry versions, including CI/CD for testing.
cicd containerd cuda docker docker-image poetry-python python python3 pytorch pytorch-docker
Last synced: 06 Feb 2026
https://github.com/rhysdg/whisper-onnx-python
A low-footprint GPU accelerated Speech to Text Python package for the Jetpack 5 era bolstered by an optimized graph
ai chatbot cuda machine-learning onnxruntime speech-to-text whisper
Last synced: 16 Feb 2026
https://github.com/thomasonzhou/minitorch
rebuilding pytorch: from autograd to convolutions in CUDA
Last synced: 02 Feb 2026
https://github.com/Programmer-RD-AI/DetectX
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 04 May 2025
https://github.com/SanaeProject/Matrix-for-Cpp
This repository has types that handle matrices.
cpp14 cpp14-library cuda matrix-library
Last synced: 15 May 2025
https://github.com/alekseyscorpi/vacancies_server
This is a server for vacancies generation using LLM (Saiga3)
code cuda cuda-toolkit docker dockerfile flask llama3 llamacpp llm ngrok pydantic saiga
Last synced: 06 Feb 2026
https://github.com/hartorn/docker-python
Repository to build python image, based on ubuntu and CUDA
cuda docker mkl-dnn onednn python3 ubuntu ubuntu1804
Last synced: 05 May 2026