CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-15 00:07:19 UTC
https://github.com/arbor-sim/arbor
The Arbor multi-compartment neural network simulation library.
cuda gpu hip hpc modern-cpp mpi neuroscience
Last synced: 16 May 2025
https://github.com/rocm/hip-cpu
An implementation of HIP that works on CPUs, across OSes.
cpp17 cuda cuda-programming hip hip-kernel-language hip-portability hip-runtime parallel-algorithms spmd stl-algorithms
Last synced: 12 Apr 2025
https://github.com/cgtuebingen/Flex-Convolution
Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018
3d-point-clouds cuda deep-learning deep-neural-networks machine-learning pointcloud pointcloudprocessing research research-paper segmentation shapenet tensorflow
Last synced: 20 Mar 2025
https://github.com/mseitzer/gpu-monitor
Script to remotely check GPU servers for free GPUs
cuda cudnn deep-learning gpu nvidia-smi remote ssh
Last synced: 12 Apr 2025
https://github.com/pytorch/extension-script
Example repository for custom C++/CUDA operators for TorchScript
Last synced: 30 Sep 2025
https://github.com/pennylaneai/pennylane-lightning
The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.
cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm
Last synced: 15 May 2025
https://github.com/juliagpu/acceleratedkernels.jl
Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.
amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library
Last synced: 04 Apr 2025
https://github.com/mind-inria/mri-nufft
Doing non-Cartesian MR Imaging has never been so easy.
cuda gpu mri mri-reconstruction nufft numerical-methods numpy tensorflow torch
Last synced: 02 Mar 2026
https://github.com/pwhiddy/fat-clouds
GPU Fluid Simulation with Volumetric Rendering
blaze cuda fluid gpu navier-stokes raymarching smoke
Last synced: 12 Jul 2025
https://github.com/windqaq/mpm
Simulating on GPU using Material Point Method and rendering.
cuda gvdb-voxels material-point-method raytracer simulation thrust
Last synced: 23 Mar 2025
https://github.com/cair/tmu
Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.
absorbing-states autoencoder convolution cuda gpu incremental incremental-computation multi-output pattern-recognition propositional-logic regression relational-logic sparse tsetlin-machine
Last synced: 09 Apr 2025
https://github.com/ANNetGPGPU/ANNetGPGPU
A GPU (CUDA) based Artificial Neural Network library
c-plus-plus-11 cuda propagation-network self-organizing-map
Last synced: 23 Apr 2025
https://github.com/qengineering/pytorch-jetson-nano
PyTorch installation wheels for Jetson Nano
aarch64 cuda cuda-10 cudnn cudnn8 jetson-nano jetson-tx2 jetson-xavier libtorch python-wheel python3 pytorch pytorch-installation torch torchvision wheel
Last synced: 13 Apr 2025
https://github.com/leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
Last synced: 01 Aug 2025
https://github.com/bazel-contrib/rules_cuda
Starlark implementation of bazel rules for CUDA.
Last synced: 15 Feb 2026
https://github.com/seeed-projects/recomputer-jetson-for-beginners
Beginner's Guide to reComputer Jetson
ai application beginner-friendly cuda cv deepstream examples generative-ai guide jetson llm ml nlp nvidia recomputer robotics tao tensorrt
Last synced: 05 Apr 2025
https://github.com/arrayfire/arrayfire-ml
ArrayFire's Machine Learning Library.
Last synced: 16 Jul 2025
https://github.com/mitulgarg/env-doctor
Debug your GPU, CUDA, and AI stacks across local, Docker, and CI/CD (CLI and MCP server)
compatibility-tool cuda cuda-library cuda-support cuda-toolkit cudnn gpu-acceleration mcp-server nvidia-driver nvidia-gpu nvidia-smi pytorch wsl2
Last synced: 02 Apr 2026
https://github.com/ayutaz/piper-plus
Multilingual neural TTS (6 languages: JA/EN/ZH/ES/FR/PT) — C++, C# (.NET), Rust, Python SDKs. VITS + Prosody, streaming, CUDA/CoreML/DirectML, custom dictionaries. Install: pip install piper-tts-plus | dotnet tool install PiperPlus.Cli | cargo install piper-plus-cli
cross-platform csharp cuda deep-learning dotnet japanese multilingual nuget onnx pytorch rust speech-synthesis streaming text-to-speech tts vits webassembly
Last synced: 01 Apr 2026
https://github.com/nvidia-merlin/hierarchicalkv
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on high-bandwidth memory (HBM) of GPUs and in host memory. It also can be used as a generic key-value storage.
cuda dynamic-embedding embedding-storage gpu hashtable key-value-store recommender-system
Last synced: 10 Apr 2025
https://github.com/upul/aurora
A minimal deep learning library written in Python/Cython/C++ and NumPy/CUDA/cuDNN.
cplusplus cuda cudnn cython deep-learning python3 system-design
Last synced: 08 May 2025
https://github.com/nvidia/nvimagecodec
The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface
computer-vision cpp cuda dali data-processing deep-learning fast-data-pipeline gpu image-processing machine-learning nvidia python pytorch
Last synced: 12 Jan 2026
https://github.com/mitmul/pynvvl
A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python
cuda cupy gpu numpy nvidia-video-loader nvvl python video video-processing
Last synced: 07 Oct 2025
https://github.com/harrism/mini-nbody
A simple gravitational N-body simulation in fewer than 100 lines of C code, with CUDA optimizations.
astrophysics benchmark cuda nbody
Last synced: 06 Oct 2025
https://github.com/Heteroflow/Heteroflow
Concurrent CPU-GPU Programming using Task Models
cpu-gpu-scheduling cuda gpu gpu-acceleration gpu-computing gpu-programming heterogeneous-computing heterogeneous-parallel-programming heterogeneous-systems multithreaded multithreading task-parallelism
Last synced: 01 Apr 2025
https://github.com/coreylowman/llama-dfdx
LLaMA 7B with CUDA acceleration, implemented in Rust. Minimal GPU memory needed!
cuda deep-learning inference language-model llama neural-network rust rust-lang
Last synced: 13 Apr 2025
https://github.com/osai-ai/dokai
Collection of Docker images for ML/DL and video processing projects
cuda deep-learning docker docker-image ffmpeg opencv python pytorch tensorrt video-processing
Last synced: 04 Apr 2025
https://github.com/mratsim/Arch-Data-Science
Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision
archlinux cuda cudnn data-science deep-learning lightgbm machine-learning mkl mxnet natural-language-processing natural-language-understanding nervana opencv package pandas pytorch scikit-learn spacy tensorflow xgboost
Last synced: 20 Jul 2025
https://github.com/sonots/cumo
Cumo (pronounced like "koomo") is a CUDA-aware numerical library whose interface is highly compatible with Ruby Numo
cuda numo ruby scicentific-computing
Last synced: 10 Apr 2025
https://github.com/1ytic/pytorch-edit-distance
Levenshtein edit-distance on PyTorch and CUDA
asr cuda edit-distance levenshtein nlp pytorch
Last synced: 19 Jun 2025
https://github.com/bdusell/singularity-tutorial
Tutorial for using Singularity containers
container cuda cudnn pytorch singularity
Last synced: 09 Apr 2025
https://github.com/openmm/nnpops
High-performance operations for neural network potentials
cuda gpu machine-learning molecular-dynamics molecular-modeling
Last synced: 20 Jun 2025
https://github.com/sasagawa888/deeppipe2
Deep Learning library using GPU (CUDA/cuBLAS)
cublas cuda deep-learning elixir gpu
Last synced: 15 Mar 2025
https://github.com/braintwister/docker-devel-env
Fast, reproducible, and portable software development environments
clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode
Last synced: 23 Oct 2025
https://github.com/ashvardanian/parallelreductionsbenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!
apple avx512 cuda glsl gpgpu gpu gpu-acceleration gpu-computing hpc intel metal nvidia opencl openmp parallel simd stl tbb thrust
Last synced: 06 Apr 2025
https://github.com/mitsuba-renderer/drjit-core
Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering (core library)
Last synced: 05 Apr 2025
https://github.com/eomii/rules_ll
An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming
bazel bleeding-edge build-system clang clang-tidy cpp cuda gpu-programming hermetic hip llvm nix openmp remote-caching remote-execution reproducible sanitizers
Last synced: 06 Apr 2025
https://github.com/soran-ghaderi/torchebm
🍓 Build and train energy-based and diffusion models in PyTorch ⚡.
contrastive-divergence cuda diffusion-models energy-based-model equilibrium equilibrium-matching equilibrium-modeling generative-ai hamiltonian hamiltonian-monte-carlo langevin-dynamics noise-contrastive-estimation probabilistic-machine-learning reasoning sampling-methods score-matching variational-inference
Last synced: 01 Jun 2026
https://github.com/esemeniuc/openpose-docker
A docker build file for CMU openpose with Python API support
cuda deep-learning deep-neural-networks docker openpose pose-estimation python
Last synced: 10 Feb 2026
https://github.com/sunsetquest/cudapad
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
cuda cuda-programming gpu nvidia ptx ptx-utils windows
Last synced: 25 Jul 2025
https://github.com/ttsiodras/mandelbrotsse
Real-time Mandelbrot zoom via SSE, AVX, OpenMP, CUDA, XaoS...
Last synced: 18 Oct 2025
https://github.com/villekf/omega
Open-source multi-dimensional tomographic reconstruction software (OMEGA)
computed-tomography cone-beam-ct cuda emission-tomography gate gpgpu gpu image-reconstruction inverse-problems matlab medical-imaging octave opencl opengate pet python reconstruction tomography
Last synced: 05 Sep 2025
https://github.com/shinmorino/sqaod
Solvers/annealers for simulated quantum annealing on CPU and CUDA (NVIDIA GPU).
accelearated cplusplus-11 cuda gpu linux monte-carlo-simulation nvidia-gpu python quantum-annealing quantum-computing windows
Last synced: 21 Aug 2025
https://github.com/puttsk/cuda-tutorial
A set of hands-on tutorials for CUDA programming
Last synced: 31 Jan 2026
https://github.com/WeltXing/PyDyNet
A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)
autograd cnn cuda cupy deep-learning-framework numpy python pytorch pytorch-implementation rnn transformer
Last synced: 01 Sep 2025
https://github.com/DevXT-LLC/ezlocalai
ezlocalai is an easy-to-set-up local artificial intelligence server with OpenAI-style endpoints.
ai artificial-intelligence cuda llamacpp local
Last synced: 21 Nov 2025
https://github.com/BobMcDear/neural-network-cuda
Neural network from scratch in CUDA/C++
cplusplus cuda deep-learning machine-learning neural-network
Last synced: 23 Aug 2025
https://paragroup.github.io/WindFlow/
A C++17 Data Stream Processing Parallel Library for Multicores and GPUs
cuda gpu gpu-acceleration gpu-computing gpu-programming multi-core multicore multithreading parallel-computing parallel-patterns parallel-programming parallelism sliding-windows stream stream-api stream-processing streaming streaming-api streaming-data streams
Last synced: 14 May 2025
https://github.com/spcl/daceml
A Data-Centric Compiler for Machine Learning
compiler cuda deep-learning fpga high-performance-computing machine-learning pytorch
Last synced: 10 Sep 2025
https://github.com/fixstars/cuda-multi-view-stereo
C++/CUDA library for Multi-View Stereo
3d-reconstruction computer-vision cuda multi-view-stereo structure-from-motion
Last synced: 08 Apr 2025
https://github.com/projectchrono/dem-engine
A dual-GPU DEM solver with complex grain geometry support
chrono cuda discrete-element-method gpu multi-gpu simulation
Last synced: 06 Apr 2025
https://github.com/fixstars/cuda-efficient-features
A CUDA implementation of keypoint detection and descriptor extraction
computer-vision cuda descriptors local-features robotics slam structure-from-motion
Last synced: 13 Apr 2025
https://github.com/matifali/dockerdl
Deep Learning Docker Image
cuda deep-learning docker jupyter numpy python pytorch tensorflow
Last synced: 10 Apr 2025
https://github.com/juliagpu/gemmkernels.jl
Flexible and performant GEMM kernels in Julia
Last synced: 24 Jul 2025
https://github.com/amypad/cuvec
Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory
array buffer c cpp cpu cpython cpython-api cpython-extensions cuda cxx gpu hacktoberfest pybind11 python swig vector
Last synced: 05 Apr 2025
https://github.com/zenustech/zpc
zenus parallel computing library for zenus physics-based simulations
cuda gpu hpc math physics simulation
Last synced: 06 Apr 2025
https://github.com/pika-org/pika
pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support.
concurrency cplusplus cpp cuda gpu hip mpi p2300 parallelism rocm stdexec
Last synced: 30 Jan 2026
https://github.com/dbraun/pytorchtop
GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV
cuda libtorch opencv pytorch touchdesigner
Last synced: 15 Apr 2025
https://github.com/shi-yan/FreeWill
A deep learning library in C++/CUDA
cnn cuda deep-learning dnn machine-learning neural-network qt5
Last synced: 09 Jul 2025
https://github.com/nvidia-ai-iot/deepstream_libraries
DeepStream Libraries offer CVCUDA, NvImageCodec, and PyNvVideoCodec modules as Python APIs for seamless integration into custom frameworks.
computer-vision cuda cv-cuda data-processing gpu image-processing nvidia nvimagecodec pynvvideocodec pytorch
Last synced: 01 Apr 2026
https://github.com/zidage/alcedostudio
Open-source RAW photo processing and digital asset management software.
cpp cuda image-processing photo-editor photography raw-image
Last synced: 30 May 2026
https://github.com/una-dinosauria/local-search-quantization
State-of-the-art method for large-scale ANN search as of Oct 2016. Presented at ECCV 16.
computer-vision cuda eccv-16 gpu julia multi-codebook quantization
Last synced: 02 Aug 2025
https://github.com/maxilevi/raytracer
C++ raytracer that supports custom models. Supports running the calculations on the CPU using C++11 threads or in the GPU via CUDA.
bvh cuda graphics-programming intersection raytracer
Last synced: 27 Apr 2025
https://github.com/xlite-dev/hgemm
⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs to achieve peak performance. ⚡️
Last synced: 11 Jun 2025
https://github.com/celeritas-project/celeritas
Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
computational-physics cuda detector-simulation gpu hep high-energy-physics hip monte-carlo particle-transport
Last synced: 05 Apr 2025
https://github.com/unum-cloud/udisk
The fastest ACID-transactional persisted Key-Value store designed as modified LSM-Tree for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel
cuda database io-uring iouring key-value key-value-store linux linux-kernel lsm-tree spdk
Last synced: 26 Feb 2026
https://github.com/a-new-bellhope/bellhopcuda
CUDA and C++ port of BELLHOP / BELLHOP3D underwater acoustics simulator
acoustics cuda hpc oceanography underwater-acoustics
Last synced: 07 Apr 2025
https://github.com/inconvergent/differential-lattice
A generative algorithm using CUDA
animation cuda generative generative-algorithm generative-art
Last synced: 09 Apr 2025
https://github.com/open-atmos/pysdm
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab
atmospheric-modelling atmospheric-physics cuda gpu gpu-computing monte-carlo-simulation numba nvrtc particle-system physics-simulation pint pypi-package python research simulation thrust
Last synced: 20 May 2026
https://github.com/elftausend/custos
A minimal OpenCL, CUDA, Vulkan and host CPU array manipulation engine / framework.
array-manipulations autograd automatic-differentiation cpu cuda cuda-support custos framework gpu lazy-evaluation no-std opencl rust vulkan wgsl
Last synced: 07 May 2025
https://github.com/nolanzzz/mtmct
Design and Implementation of a Multi-Target Multi-Camera Tracking Solution
cuda deep-learning detection machine-learning opencv python pytorch reidentification research-project resnet tracking
Last synced: 20 Mar 2025
https://github.com/celerity/ndzip
A High-Throughput Parallel Lossless Compressor for Scientific Data
compression cuda floating-point gpu simd sycl
Last synced: 11 Dec 2025
https://github.com/heethesh/computer-vision-and-deep-learning-setup
Tutorial on how to set up your system with an NVIDIA GPU and install deep learning frameworks such as TensorFlow, Darknet for YOLO, Theano, and Keras, along with OpenCV, NVIDIA drivers, CUDA, and cuDNN libraries, on Ubuntu 16.04, 17.10, and 18.04.
cuda cudnn gpu install keras nvidia opencv python tensorflow ubuntu ubuntu1710
Last synced: 19 Apr 2025
https://github.com/yalue/cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
benchmark cuda cuda-kernels gpu gpu-scheduling mandelbrot
Last synced: 17 Mar 2026
https://github.com/mantasu/glasses-detector
Glasses detection, classification and segmentation
classification computer-vision cuda detection detector eyeglasses eyes frames glasses gpu lenses mps pytorch segmentation sunglasses
Last synced: 14 Apr 2025
https://github.com/pinto0309/dmhead
Dual model head pose estimation. Fusion of SOTA models. 360° 6D HeadPose detection. All pre-processing and post-processing are fused together, allowing end-to-end processing in a single inference.
6d cuda head-pose-estimation headpose-detection headpose-estimation models onnx tensorrt
Last synced: 30 Apr 2025
https://github.com/lukeyeager/cmake-cuda-example
Example of how to use CUDA with CMake >= 3.8
Last synced: 25 Mar 2025
https://github.com/pypr/compyle
Execute a subset of Python on HPC platforms
cuda cython high-performance-computing opencl openmp python transpile
Last synced: 04 Apr 2025