CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-15 00:07:19 UTC
https://github.com/mkeeter/mpr
Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
cad cuda gpu implicit-surfaces rendering
Last synced: 16 Mar 2025
https://github.com/rapidsai/node
GPU-accelerated data science and visualization in node
cuda data-science data-visualization gpgpu gpu nodejs
Last synced: 16 May 2025
https://github.com/supranational/sppark
Zero-knowledge template library
bls12-377 bls12-381 cuda ntt pasta-curves rocm zero-knowledge zero-knowledge-proofs zk-snarks zk-starks
Last synced: 12 Apr 2025
https://github.com/guoriyue/3dgs-warp-scratch
Build 3D Gaussian Splatting from scratch with NVIDIA Warp in Python — CPU/GPU compatible, with a clean and minimalist design focused on learning modern graphics.
3dgs build-from-scratch cuda graphics nerf nvidia-warp python
Last synced: 05 Mar 2026
https://github.com/dividiti/ck-caffe
Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs).
accuracy android caffe collaborative-optimization collective-knowledge costs cuda customizable-workflows dnn-as-a-service dnn-optimization json-api linux opencl performance-portability portable-package-manager reproducible-experiments resources windows
Last synced: 04 May 2025
https://github.com/nobuyuki83/delfem2
Research prototyping framework for physics simulation written in C++
cuda fem-simulation finite-element-methods geometry-processing opengl physics-simulation simulation
Last synced: 06 Apr 2025
https://github.com/msminhas93/nviwatch
NviWatch: A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPU processes
bash command-line-tool cuda deeplearning gpu gpu-monitoring linux monitoring nvidia nvidia-smi nvml performant process-monitoring ratatui resource-monitoring rust terminal top tui ubuntu
Last synced: 09 Apr 2025
https://github.com/hijkzzz/cuda-neural-network
Convolutional Neural Network with CUDA (MNIST 99.23%)
cnn cpp cuda mnist neural-network
Last synced: 14 Jul 2025
https://github.com/helyim/helyim
seaweedfs implemented in pure Rust
cuda dpdk erasure-coding hdfs iouring kernel-bypass object-storage rdma s3 spdk webdav
Last synced: 03 Oct 2025
https://github.com/zhongkaifu/seq2seqsharp
Seq2SeqSharp is a fast and flexible tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, several network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.
attention-model cuda deep-learning encoder-decoder gpu image lstm machine-translation neural-network seq2seq sequence-to-sequence tensor text transformer transformer-architecture transformer-encoder translation vision-transformer
Last synced: 04 Apr 2025
https://github.com/proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
cuda cumulative-sum recurrent-neural-networks state-space-model torch
Last synced: 25 Dec 2025
https://github.com/nvidia/gmat
A toolkit showing GPU's all-round capability in video processing
codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec
Last synced: 11 May 2025
https://github.com/NVIDIA/GMAT
A toolkit showing GPU's all-round capability in video processing
codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec
Last synced: 04 Apr 2025
https://github.com/uncomplicate/clojurecuda
Clojure library for CUDA development
clojure clojure-library cuda cuda-development gpu-acceleration gpu-computing high-performance java
Last synced: 16 May 2025
https://github.com/xlite-dev/ffpa-attn
📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 11 Jun 2025
https://github.com/toruniina/lbvh
an implementation of parallel linear BVH (LBVH) on GPU
bvh cuda gpu nearest-neighbor-search parallel thrust
Last synced: 21 Aug 2025
https://github.com/HMUNACHI/CUDATutorials
Zero to Hero GPU and CUDA for Maths & ML tutorials with examples.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 24 Apr 2025
https://github.com/HMUNACHI/henry-vjp
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 05 Apr 2025
https://github.com/hmunachi/henry-vjp
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 08 Apr 2025
https://github.com/HMUNACHI/cuda-tutorials
CUDA tutorials for Maths & ML with examples, covering multi-GPU programming, fused attention, Winograd convolution, and reinforcement learning.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 13 May 2025
https://github.com/lxxue/frnn
Fixed Radius Nearest Neighbor Search on GPU
cuda nearest-neighbor-search pytorch
Last synced: 17 Mar 2026
https://github.com/cuMF/cumf_als
CUDA Matrix Factorization Library with Alternating Least Square (ALS)
als cuda gpu machine machine-learning matrix-factorization
Last synced: 04 May 2025
https://github.com/zpzim/scamp
The fastest way to compute matrix profiles on CPU and GPU!
cuda gpu matrix-profile python time-series time-series-analysis
Last synced: 05 Apr 2025
https://github.com/eth-cscs/cosma
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
communication-optimal cuda gpu-acceleration linear-algebra matmul matrix-multiplication mpi pdgemm rocm scalapack
Last synced: 02 Mar 2026
https://github.com/pykeio/diffusers
A modular Rust library for super fast Stable Diffusion inference - 45% faster than PyTorch 🔮
cuda diffusion-models onnx onnxruntime onnxruntime-gpu rust stable-diffusion stable-diffusion-v2
Last synced: 28 Mar 2025
https://github.com/yilingqiao/dmrf
Dynamic Mesh-Aware Radiance Fields (ICCV 2023): Ray-traced rendering and interactive simulation of meshes with NeRF
cuda nerf raytracing simulation
Last synced: 09 Apr 2025
https://github.com/primitiv/primitiv
A Neural Network Toolkit.
cmake cpp cuda deep-learning framework gpu neural-network opencl
Last synced: 18 Jun 2025
https://github.com/merzlab/QUICK
QUICK: A GPU-enabled ab initio quantum chemistry software package
chemistry computational-chemistry cuda density-functional-theory electronic-structure-calculations gpu gpu-acceleration hartree-fock parallel-computing quantum-chemistry
Last synced: 09 Jul 2025
https://github.com/invergent-ai/surogate
Training/Fine-tuning at the speed of light
cuda deep-learning fine-tuning generative-ai llama llm llms nvidia-gpu qwen sft
Last synced: 10 May 2026
https://github.com/sjtu-ipads/phoenixos
Fast OS-level support for GPU checkpoint and restore
checkpoint-restore criu cuda gpu
Last synced: 05 Apr 2025
https://github.com/cnugteren/cltune
CLTune: An automatic OpenCL & CUDA kernel tuner
Last synced: 21 Aug 2025
https://github.com/jimver/cuda-toolkit
GitHub Action to install CUDA
action cuda cuda-toolkit github-actions nvidia nvidia-cuda
Last synced: 14 Apr 2025
https://github.com/librapid/librapid
A highly optimised C++ library for mathematical applications and neural networks.
array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd
Last synced: 27 Mar 2026
https://github.com/qengineering/install-opencv-jetson-nano
OpenCV installation script with CUDA and cuDNN support
cuda cudnn jetson-nano jetson-xavier opencv opencv4
Last synced: 04 Apr 2025
https://github.com/acceleratehs/accelerate-llvm
LLVM backend for Accelerate
accelerate compiler cuda gpu gpu-computing hacktoberfest haskell llvm parallel-computing
Last synced: 07 Apr 2025
https://github.com/LibRapid/librapid
A highly optimised C++ library for mathematical applications and neural networks.
array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd
Last synced: 01 Aug 2025
https://github.com/p-ranav/PhotoLab
AI-Powered Photo Editor (Python, PyQt6, PyTorch)
animegan colorization cuda human-segmentation interactive nuitka numpy opencv photo-editor pillow portrait-mode pyqt6 pyqt6-desktop-application python python3 pytorch scikit-image spot-removal stacking
Last synced: 07 Apr 2025
https://github.com/dvlab-research/sparsetransformer
A fast and memory-efficient library for sparse transformer with varying token numbers (e.g., 3D point cloud).
3d-point-cloud cuda sparse-transformer transformer
Last synced: 03 Jul 2025
https://github.com/rocm/gpufort
GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm
Last synced: 21 Jun 2025
https://github.com/ROCm/gpufort
GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm
Last synced: 11 Mar 2025
https://github.com/deftruth/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity large headdim (D > 256), ~2x↑🎉vs SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 06 Apr 2025
https://github.com/zju3dv/envgs
[CVPR 2025] EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
2dgs 3dgs cuda optix path-tracing ray-tracing reflection
Last synced: 05 Apr 2025
https://github.com/dvlab-research/SparseTransformer
A fast and memory-efficient library for sparse transformer with varying token numbers (e.g., 3D point cloud).
3d-point-cloud cuda sparse-transformer transformer
Last synced: 20 Mar 2025
https://github.com/xlite-dev/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.
attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores
Last synced: 30 Mar 2025
https://github.com/pythonlessons/tensorflow-object-detection-tutorial
The purpose of this tutorial is to learn how to install and prepare the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch.
classifier cuda cudnn detection detection-api detection-classifier detection-tutorial gpu grabscreen labels object-detection pil python-mss tensorflow tensorflow-cpu tensorflow-gpu tensorflow-models tutorial
Last synced: 23 Oct 2025
https://github.com/kibae/onnxruntime-server
ONNX Runtime Server: The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.
ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime
Last synced: 05 Apr 2025
https://github.com/lucasdelimanogueira/PyNorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
c cuda deep-learning neural-network python pytorch
Last synced: 15 Sep 2025
https://github.com/cp2k/dbcsr
DBCSR: Distributed Block Compressed Sparse Row matrix library
blas cp2k cuda gemm hpc linear-algebra matrix-multiplication mpi openmp-parallelization sparse-matrix
Last synced: 21 Feb 2026
https://github.com/eth-cscs/implicitglobalgrid.jl
Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid
cuda distributed gpu julia julia-mpi-wrapper mpi multi-gpu staggered-grids stencil-codes
Last synced: 04 Apr 2025
https://github.com/lucasdelimanogueira/pynorch
Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)
c cuda deep-learning neural-network python pytorch
Last synced: 07 Jul 2025
https://github.com/rocm/hipblas
[DEPRECATED] Moved to ROCm/rocm-libraries repo
Last synced: 02 Apr 2026
https://github.com/patwie/cuda-design-patterns
Some CUDA design patterns and a bit of template magic for CUDA
bazel cpp11 cuda cuda-development cuda-device cuda-kernels cuda-utils gpu template-metaprogramming
Last synced: 14 Apr 2025
https://github.com/rust-nvml/nvml-wrapper
Safe Rust wrapper for the NVIDIA Management Library
cuda ffi ffi-bindings ffi-wrapper gpu hardware-management hardware-monitoring library monitoring nvidia nvml opencl
Last synced: 12 Dec 2025
https://github.com/arborx/arborx
Performance-portable geometric search library
bounding-volume-hierarchy c-plus-plus clustering cpp cuda dbscan distributed gpu hdbscan high-performance-computing hpc knn-search kokkos mpi nearest-neighbors parallel
Last synced: 10 Apr 2025
https://github.com/electronic-structure/SIRIUS
Domain specific library for electronic structure calculations
cuda density-functional-theory electronic-structure-calculations full-potential gpu lapw mpi planewave pseudopotential rocm
Last synced: 09 Jul 2025
https://github.com/chenhunghan/ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
ai cloudnative cuda ggml gptq gpu helm kubernetes langchain llamacpp llm llm-inference llm-serving openai python
Last synced: 30 Sep 2025
https://github.com/bobmcdear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
cuda deep-learning machine-learning openai openai-triton pytorch triton
Last synced: 22 Aug 2025
https://github.com/jamjamjon/usls
A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.
clip cuda florence2 grounding-dino imshow moondream ocr onnx onnxruntime rust-yolo sam sapiens smolvlm tensorrt yolo yolo-rs yolo-rust yolov10 yolov11 yolov8
Last synced: 16 May 2025
https://github.com/fangq/mcx
Monte Carlo eXtreme (MCX) - GPU-accelerated photon transport simulator
3d c cuda matlab monte-carlo optical-imaging pascal photon-transport physics-simulation ray-tracing volumetric-rendering voxel-based
Last synced: 15 May 2025
https://github.com/1461521844lijin/trt_yolo_video_pipeline
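A parallel video analysis and processing example using TensorRT and the YOLO series, with multi-stream, multi-GPU, and multi-instance support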
cuda ffmpeg opencv video-processing yolo yolov8
Last synced: 18 Jul 2025
https://github.com/devnen/dia-tts-server
Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), support for SafeTensors/BF16, voice cloning, dialogue generation, and GPU/CPU execution.
ai api-server audio-generation cuda dia dia-tts dialogue-tts fastapi huggingface openai-api python pytorch speech-synthesis speech-synthesis-api text-to-speech tts tts-api voice-cloning web-ui
Last synced: 06 May 2025
https://github.com/chonspqx/modulated-deform-conv
deformable convolution 2D 3D DeformableConvolution DeformConv Modulated Pytorch CUDA
cuda cuda-extension deform-conv3d deformable-convolutional deformable-convolutional-networks python pytorch
Last synced: 07 Jul 2025
https://github.com/qengineering/jetson-nano-image
Jetson Nano image with deep learning frameworks
cuda deep-learning jetson-nano mnn ncnn opencv pytorch sd-card-image team-viewer tegra tensorflow torch torchvision
Last synced: 05 Apr 2025
https://github.com/goofit/goofit
Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP
cuda fitting gpu gpu-computing omp physics root-cern thrust
Last synced: 10 Apr 2025
https://github.com/roastduck/FreeTensor
A language and compiler for irregular tensor programs.
ast automatic-differentiation code-generation cuda gpu jit openmp tensor
Last synced: 11 Apr 2025
https://github.com/fhamborg/newsmtsc
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.
cuda dataset deep-learning news-articles pytorch sentiment-analysis sentiment-classification text-classification tsc
Last synced: 07 Apr 2025
https://github.com/cgtuebingen/ggnn
GGNN: State of the Art Graph-based GPU Nearest Neighbor Search
ann approximate-nearest-neighbor-search cuda gpu nearest-neighbor-search vector-database vector-db
Last synced: 20 Nov 2025
https://github.com/openmlsys/openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
Last synced: 08 Oct 2025
https://github.com/charlesq34/diy-deep-learning-workstation
Build a deep learning workstation from scratch (HW & SW).
cuda deep-learning gpu ubuntu workstations
Last synced: 25 Feb 2026
https://github.com/anicetngrt/jiro-nn
A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.
adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd
Last synced: 09 Apr 2025
https://github.com/rsnk96/Ubuntu-Setup-Scripts
Scripts to help you set up your Ubuntu quickly, especially if you're in any subfield of Data Science or AI!
anaconda cuda deep-learning deeplearning dl ffmpeg installers ml opencv python pytorch tensorflow tensorflow-setup ubuntu zsh
Last synced: 07 Apr 2025
https://github.com/AnicetNgrt/jiro-nn
A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.
adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd
Last synced: 25 Sep 2025
https://github.com/acdslab/mppi-generic
Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
cpp cuda model-predictive-control model-predictive-path-integral robotics stochastic-optimization
Last synced: 05 Apr 2025
https://github.com/glotzerlab/fresnel
Publication quality path tracing in real time.
cuda optix path-tracing python simulation soft-matter
Last synced: 13 Oct 2025
https://github.com/GooFit/GooFit
Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP
cuda fitting gpu gpu-computing omp physics root-cern thrust
Last synced: 08 Apr 2025
https://github.com/naeioi/pbf-cuda
Position Based Fluids CUDA implementation
cuda fluid-solver opengl real-time simulation
Last synced: 26 Apr 2025
https://github.com/qdLMF/LIO-SAM-GPU-ScanToMapOpt
A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map (inspired by iVox of Faster-LIO) on GPU is used to accelerate 5-neighbour KNN search.
3d-mapping cuda faster-lio gpu ivox knn lidar lidar-inertial-odometry lidar-slam lio lio-sam loam slam
Last synced: 18 Mar 2025
https://github.com/jdermody/brightwire
Bright Wire is an open source machine learning library for .NET with GPU support (via CUDA)
convolutional-neural-networks csharp cuda cuda-support gpu gpu-support machine-learning machine-learning-library machinelearning neural-network recurrent-neural-networks
Last synced: 05 Apr 2025
https://github.com/ihhub/penguinv
Computer vision library with focus on heterogeneous systems
avx computer-vision cpp cuda gpu hacktoberfest heterogeneous-systems image-processing opencl python simd sse thread-pool
Last synced: 30 Oct 2025
https://github.com/inoryy/tensorflow-optimized-wheels
TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA, XLA
avx2 cuda cudnn python sse tensorflow tensorflow-gpu tensorflow-wheels wheels xla
Last synced: 02 Apr 2025
https://github.com/psmarter/mini-infer
High-performance large-model inference engine based on PagedAttention (refactoring in progress)
ai cuda deep-learning gpu inference language-model llm machine-learning pagedattention python pytorch transformer triton
Last synced: 02 Apr 2026
https://github.com/gpmueller/eigen-cuda
MWE for using the Eigen library in CUDA kernels
Last synced: 14 Apr 2025
https://github.com/MuGdxy/muda
μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.
cuda cuda-cpp cuda-programming
Last synced: 09 Jul 2025
https://github.com/nirw4nna/dsc
Tensor library & inference framework for machine learning
cuda gpu large-language-models machine-learning pytorch tensor-algebra
Last synced: 21 Jan 2026
https://github.com/sniklaus/pytorch-extension
an example of a CUDA extension for PyTorch using CuPy which computes the Hadamard product of two tensors
cuda cupy deep-learning python pytorch
Last synced: 25 Dec 2025
https://github.com/src-d/minhashcuda
Weighted MinHash implementation on CUDA (multi-gpu).
cuda lsh machine-learning minhash
Last synced: 09 Apr 2025
https://github.com/arbor-sim/arbor
The Arbor multi-compartment neural network simulation library.
cuda gpu hip hpc modern-cpp mpi neuroscience
Last synced: 16 May 2025