CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-22 00:07:17 UTC
https://github.com/sashakolpakov/graphem-rapids
Graph embedding for influence maximization in networks
cuda cuda-kernels embeddings graph-algorithms graph-theory pykeops pytorch rapidsai
Last synced: 16 Apr 2026
https://github.com/jtriley/gpucrate
Creates hard-linked GPU driver (currently just NVIDIA) volumes for use with docker, singularity, etc.
container cuda docker gpu singularity
Last synced: 27 Feb 2026
https://github.com/nolmoonen/cuda-sdf
CUDA-accelerated path traced Menger sponge using ray marching.
cuda menger path-tracer ray-marching sdf
Last synced: 12 Feb 2026
https://github.com/bensuperpc/easyai
Make your own AI easily!
ai cuda python python3 tensorflow
Last synced: 16 Feb 2026
https://github.com/lawmurray/gpu-gemm
CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.
cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing
Last synced: 01 Mar 2026
https://github.com/btursunbayev/nvsonar
Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes
cuda diagnostics gpu monitoring nvidia performance
Last synced: 02 Apr 2026
https://github.com/phael-exe/aco-selection-parallel
Parallelization of ACO with CUDA and OpenMP for large-scale instance selection.
cuda openmp parallel-computing
Last synced: 03 Jun 2026
https://github.com/arsfiqball/image-sharpen-cpp
Implementation of Image Sharpening algorithm in C++ & CUDA
cuda gpu image-processing image-sharpening-algorithm
Last synced: 22 Apr 2026
https://github.com/ventura8/whisper-pro-asr
A high-performance Docker container that runs OpenAI's Whisper model. Optimized for CPU, Intel NPU, Intel Arc/iGPU, and NVIDIA CUDA GPUs.
asr bazarr ctranslate2 cuda docker faster-whisper hardware-acceleration huggingface intel-npu media-automation openvino speech-to-text uvr vocal-isolation whisper whisper-asr
Last synced: 28 Apr 2026
https://github.com/yosh-matsuda/gpu-array
Maximum GPU performance with Modern C++ syntax. RAII and Range-based abstraction to GPU memory management and data layouts, enabling code safety and performance optimization with zero overhead.
cpp cpp20 cuda gpu header-only hip
Last synced: 08 Jun 2026
https://github.com/prince781/libgpublas
Drop-in GPU acceleration for linear algebra.
blas blas-kernels c cblas clblas cuda gpu gpu-acceleration hpc interposition linear-algebra nvidia opencl
Last synced: 29 Apr 2026
https://github.com/cniweb/srbminer-multi-cuda
Docker containing SRBMiner-Multi and CUDA
cpu-miner cpu-mining cuda gpu-miner gpu-mining miner srbminer srbminer-multi yespower yespoweric
Last synced: 01 May 2026
https://github.com/shmishtopher/cudnn-versions
A scoop bucket for installing NVIDIA cuDNN versions.
cuda cudnn scoop scoop-apps scoop-bucket
Last synced: 01 May 2026
https://github.com/phrb/nvidia-workshop-autotuning
Resources for autotuning CUDA compiler parameters
autotuning compilers cuda gpu julia nodal nvcc
Last synced: 03 May 2026
https://github.com/avitase/fast_frechet
Comparison of different (fast) discrete Fréchet distance implementations in C++ and CUDA.
benchmark cpp cuda frechet-distance simd
Last synced: 18 May 2026
https://github.com/matthias-fauconneau/combustion
Reaction rates and transport properties
ast cantera chemistry code-generation combustion compute cranelift cuda cvode interpreter ir rates reaction spirv transport vulkan
Last synced: 04 Apr 2026
https://github.com/tiw302/mandelbrot-c
A simple Mandelbrot set explorer written in C. Crafted with SDL2 and multithreaded rendering for a smooth experience. ‹(•_•)›
c cuda fractal graphics mandelbrot multithreading sdl2 web webassembly
Last synced: 26 Apr 2026
https://github.com/kpetridis24/four-russians-algorithm
Boolean matrix multiplication accelerated by the four-Russians algorithm
c cuda gpu high-performance matrix-multiplication preprocess
Last synced: 29 May 2026
https://github.com/boltzmannentropy/vllm-5090
vLLM-5090: Docker Container for RTX 5090 on WSL2/Windows
Last synced: 08 Oct 2025
https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129
Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)
cuda deep-learning pytorch rtx5060
Last synced: 20 Jul 2025
https://github.com/qin-yu/julia-svm-gpu-cuda
2019 [Julia] GPU CUDAnative SVM: a stochastic decomposition implementation of support-vector machine training
cpp cuda cuda-programming gpu gpu-computing gpu-programming julia julia-language julia-package machine-learning machine-learning-algorithms machine-learning-library online-learning supervised-learning svm svm-classifier svm-learning svm-library svm-model svm-training
Last synced: 12 Apr 2026
https://github.com/headless-start/data-augmentation-impact
This repository examines the effect of data augmentation of the training set during model training.
augmented-images cuda data gpu keras matplotlib mnist opencv-python python3 tensorflow training-data
Last synced: 05 Apr 2026
https://github.com/cklxx/arle
Rust-native inference runtime for Qwen3 / Qwen3.5 — OpenAI-compatible serving + integrated agent, train, and self-evolution workflows. CUDA + Metal, no PyTorch on the hot path.
agent cuda flashinfer gspo inference infra kv-cache llm metal mlx openai-compatible qwen3 qwen35 rl rust
Last synced: 02 May 2026
https://github.com/szaghi/adam
Multi-physics AMR SDK and apps for High Performance Computing — from laptop to exascale device-accelerated superpc
amr cfd cuda fluid-dynamics fortran gas-dynamics hpc hydro-dynamics mpi openacc openmp plasma-dynamics
Last synced: 04 Apr 2026
https://github.com/xlite-dev/HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉
Last synced: 30 Jul 2025
https://github.com/neoblizz/cupti-plus-plus
CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).
cpp cuda cuda-profiler cupti profiler
Last synced: 26 Apr 2026
https://github.com/kagof/julia-image-processing
Image processing programs written in Julia
Last synced: 18 May 2026
https://github.com/artain-ai/ignite-ms
Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search
Last synced: 04 Jun 2026
https://github.com/gjbex/gpu-programming
Material for a training on portable GPU programming
cuda gpu kokkos openmp openmp-off stl thrust
Last synced: 08 Feb 2026
https://github.com/muhac/jupyter-pytorch-docker
JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.
conda-environment cuda docker jupyterlab pytorch
Last synced: 01 Oct 2025
https://github.com/projectcontinuum/continuum-feature-ai
AI and ML features for continuum
ai continuum continuum-feature cuda llm ml mlops pytourch unsloth
Last synced: 04 Apr 2026
https://github.com/mr-technologies/imagefiltercpp
Example of custom image filter for MRTech IFF C++ SDK
camera cpp cuda demosaicing dng genicam gpu h264 h265 image-processing jetson json low-latency machine-vision mipi rest-api rtsp sdk tiff vulkan
Last synced: 26 Feb 2026
https://github.com/andreasholt/cusmc
A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata
Last synced: 11 Feb 2026
https://github.com/tthebc01/cudaconda3
Lightweight container environment with CUDA, Miniconda3, and JupyterLab.
cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application
Last synced: 11 Feb 2026
https://github.com/dpbm/qml-course
Mini-course on quantum machine learning
cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow
Last synced: 31 Jan 2026
https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis
Master of Science thesis project. Uses CUDA to harness a system's GPU to create pathfinding data (IM+PF), usable by multiple agents in the same environment.
ai cplusplus cuda gpgpu pathfinding starcraft2
Last synced: 15 May 2026
https://github.com/murrellgroup/conflux.jl
Single-node data parallelism in Julia with CUDA
cuda data-parallelism flux julia nccl
Last synced: 22 May 2026
https://github.com/galaxies99/inception-cuda
CUDA Implementation of Inception
Last synced: 12 Apr 2025
https://github.com/zeloe/juce_cuda_convolution
GPU acceleration for efficient, high-quality audio processing.
audio audio-processing convolution cuda dsp juce
Last synced: 03 Mar 2026
https://github.com/geekysuavo/gpufield
A CUDA-accelerated electromagnetostatics solver
cuda magnetic-fields magnetostatics
Last synced: 24 Dec 2025
https://github.com/dzimiks/cuda-matrix-multiplication
CUDA Matrix Multiplication
cuda matrix matrix-multiplication python
Last synced: 16 Apr 2026
https://github.com/brocbyte/realtime-deformations
Snow simulation (Material Point Method)
cuda glm material-point-method opengl
Last synced: 10 Aug 2025
https://github.com/podgorskiy/deeplearningserversetup
My notes on setting up a server for Deep-Learning
cuda deep-learning driver ethernet ipmi neural-network nfs notes nvidia nvidia-driver nvidia-gpu server sshfs ubuntu
Last synced: 22 Aug 2025
https://github.com/terrylindev/image-to-ASCII
🖼️ A command-line tool for converting images to ASCII art
ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal
Last synced: 12 Jul 2025
https://github.com/kim-hwiwon/T-espresso
A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data
Last synced: 10 Apr 2025
https://github.com/nixos-cuda/cuda-legacy
Select CUDA package sets which have aged out of Nixpkgs. [maintainers=@ConnorBaker, @SomeoneSerge]
Last synced: 15 May 2026
https://github.com/neoblizz/spmv
Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).
csr cuda gpgpu load-balancing mtx spmv
Last synced: 28 Apr 2026
https://github.com/dito97/gol
High-performance Computing (90535) final project at UniGe
Last synced: 02 May 2026
https://github.com/grakshith/parallel-k-means
K-Means clustering for Image Colour Quantization and Image Compression
cuda image-color-quantization image-compression k-means mpi opencv openmp
Last synced: 28 Apr 2026
https://github.com/mulx10/firefly
Enhancing object detection using thermal imaging for hard-to-identify objects with a thin cross-section (e.g. cyclists, pedestrians).
autonomous-cars autonomous-navigation autonomous-vehicles c cuda object-detection thermal-camera yolov3
Last synced: 03 Sep 2025
https://github.com/digimortl/libguess
Patches that give Bitcoin Core the ability to mine with CUDA
bitcoin c-plus-plus cryptocurrency cuda
Last synced: 16 Apr 2026
https://github.com/juntyr/necsim-rust
Spatially explicit biodiversity simulations using a parallel library written in Rust
biodiversity cuda mpi necsim rust simulation
Last synced: 22 Mar 2025
https://github.com/acrlakshman/gradient-augmented-levelset-cuda
Implementation of Gradient Augmented Levelset method for CPU and GPU
Last synced: 17 Feb 2026
https://github.com/toxy4ny/artaxerxes
Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the old Xerxes DDoS tool. Supports GPU+io_uring, DPDK, and eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs.
cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools security-tools stress-testing
Last synced: 08 Oct 2025
https://github.com/nachovizzo/saxpy_openacc_cpp
My way of thinking about OpenACC, C++, and Parallel computing in general
Last synced: 04 Sep 2025
https://github.com/prithivsakthiur/vlm-parsing
VLM-Parsing is a Gradio-based web application for parsing documents and images into structured HTML and Markdown formats using advanced Vision Language Models (VLMs).
cuda gradio html huggingface-models huggingface-spaces huggingface-transformers logics markdown ocr-recognition pytorch qwen2-5-vl spaces vlm
Last synced: 05 Apr 2026
https://github.com/xmas7/cudampi
A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.
cpu cuda gpu hybrid mpi network
Last synced: 29 Apr 2026
https://github.com/stdogpkg/cukuramoto
A Python/CUDA package that numerically solves the Kuramoto model using Heun's method
complex-networks cuda kuramoto-model
Last synced: 28 Jan 2026
https://github.com/amruthapatil/nyu-cudamatrixoperations
Optimizing CUDA programs for vector addition and matrix multiplication
cuda high-performance-computing
Last synced: 21 May 2026
https://github.com/l1cacheDell/CUDA_Code
Code for learning CUDA. Implementations of multiple kernels.
Last synced: 10 Mar 2025
https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory
Performance comparison of two different forms of memory management in CUDA
c cuda explicit memory memory-management performance unified-memory
Last synced: 17 May 2026
https://github.com/fattorib/thunderkittens-simple-gemm
Simple Tensorcore GEMM in ThunderKittens
Last synced: 09 Feb 2026
https://github.com/nellogan/distributed_compy
Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + multi-GPU), and multi-node support
cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi
Last synced: 16 Aug 2025
https://github.com/nekon69/fastnoiselitecuda
A wrapper around C++ FastNoiseLite library for CUDA
cellular-noise computer-graphics cpp cuda fastnoiselite gamedev generative-art gpgpu gpu header-only noise opensimplex2-noise pcg perlin-noise procedural-generation simplex-noise terrain-generation texture-generation worley-noise
Last synced: 02 Oct 2025
https://github.com/copperfr/blendervxkex
Windows 7 CUDA & OptiX support for Blender 4.x
blender cuda cycles-renderer optix vxkex windows-7
Last synced: 20 Jan 2026
https://github.com/hanzhi713/bitonic-sort
In-place GPU sort with bitonic sort
bitonic-sort cuda gpu in-place sorting
Last synced: 09 Feb 2026
https://github.com/djenriquez/ewbf-cuda-miner
Run ewbf-miner for zcash
cuda docker mining nvidia nvidia-docker zcash zcl zclassic
Last synced: 17 May 2026
https://github.com/alpha74/cuda_basics
NVIDIA NVCC CUDA programs for beginners.
c cpp cuda cuda-programs nvcc nvidia parallel-computing parallel-programming
Last synced: 08 May 2026
https://github.com/alexjmercer/fractal-art
Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!
cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2
Last synced: 05 Mar 2026
https://github.com/tyler-hilbert/cuda-kmeans
K-Means in CUDA
cuda kmeans-clustering machine-learning nsight
Last synced: 30 Mar 2025
https://github.com/l30nardosv/reproduce-parcosi-moleculardocking
Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"
autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research
Last synced: 16 Feb 2026
https://github.com/mazharuddin-mohammed/semidgfem
High-performance TCAD Simulator Using Discontinuous Galerkin FEM
cuda discontinuous-galerkin-method tcad tcad-device-simulator
Last synced: 15 Jun 2025
https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support
A detailed step-by-step guide to installing Tensorflow-2.0-gpu with CUDA drivers on Ubuntu Server/Desktop LTS
Last synced: 07 May 2025
https://github.com/isazi/aoflagger
AOFlagger Radio Frequency Interference mitigation algorithm.
Last synced: 30 Apr 2026
https://github.com/amypad/numcu
Numerical CUDA-based Python library
array buffer c cpp cpython cpython-api cpython-extensions cuda cxx hacktoberfest numpy python vector
Last synced: 29 Jun 2025
https://github.com/pothosware/pothosgpu
Pothos toolkit for ArrayFire API support
arrayfire cuda dataflow dataflow-programming gpu opencl pothos
Last synced: 19 Apr 2026
https://github.com/dqbd/cuda-btree
Implementation of B-Trees on NVIDIA CUDA
Last synced: 30 Apr 2026
https://github.com/betarixm/cuecc
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cryptography ctypes cuda ecc postech secp256k1
Last synced: 12 May 2025
https://github.com/navdeep-g/dimreduce4gpu
Dimensionality reduction ("dimreduce") on GPUs ("4gpu")
cplusplus cuda dimensionality-reduction gpu linear-algebra pca python svd unsupervised-learning
Last synced: 14 Apr 2025
https://github.com/mu7annad0/100gpu
100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥
Last synced: 08 Mar 2026
https://github.com/matthewfeickert/cuda-tf-torch
An Ubuntu 18.04 NVIDIA Docker image with CUDA 10.1 and cuDNN 7, plus TensorFlow and PyTorch
cuda cuda-101 cudnn cudnn-v7 docker docker-image gpu nvidia-docker nvidia-gpu pytorch tensorflow torch
Last synced: 07 Jan 2026
https://github.com/seungjaelim/cuda.tutorial
References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)
cuda gpu-programming nsight-compute nsight-systems
Last synced: 07 Feb 2026
https://github.com/mirzaim/cuda-devcontainer
CUDA Development Container
cuda devcontainer devcontainers docker remote-development
Last synced: 23 Apr 2025
https://github.com/ginkgo-project/cudaarchitectureselector
A CMake module simplifying the specification of CUDA architectures
Last synced: 05 Nov 2025
https://github.com/dhruvsrikanth/cudann
A distributed implementation of a deep learning framework in CUDA.
cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming
Last synced: 01 May 2026
https://github.com/true-real-michael/python-plane-ransac
Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA
cuda numba plane-detection python ransac
Last synced: 14 Mar 2025
https://github.com/lukasboettcher/msc-code
Repository for my master's thesis on GPU-accelerated Andersen analysis.
andersen-analysis clang cuda llvm static-analysis
Last synced: 16 Jan 2026
https://github.com/cfries/javagpuexperiments
Repository used to demo OpenCL, JOCL, JCuda.
Last synced: 25 Apr 2026
https://github.com/LKohlhepp/Ito-Monte-Carlo
MC-Simulation of the Ito-SDE (Krülls 1994)
astronomy astrophysics cuda gpu-acceleration monte-carlo physics-simulation simulation stochastic-differential-equations
Last synced: 10 Mar 2025
https://github.com/tawssie/zmpy3d_pt
Python implementation of 3D Zernike moments with PyTorch
3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments
Last synced: 24 Oct 2025
https://github.com/aiday-mar/mpi-cuda-project
Using MPI and CUDA to accelerate the conjugate gradient algorithm in C++
c-plus-plus cuda gpu mpi university-project
Last synced: 02 May 2026