CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-20 00:07:16 UTC
https://github.com/lucaspar/poetry-torch
Installing hardware-accelerated PyTorch with Poetry on different hardware using the same `pyproject.toml`
cuda hardware-acceleration poetry pyproject-toml python-poetry pytorch torch
Last synced: 19 Oct 2025
https://github.com/llnl/fpchecker
A dynamic analysis tool to detect floating-point errors in HPC applications.
cancellation clang cuda exceptions floating-point floating-point-arithmetic infinity llvm overflow overflow-detection underflow-detection
Last synced: 21 May 2026
https://github.com/r00tman/eventhands
Real-Time Neural 3D Hand Pose Estimation from an Event Stream [ICCV 2021]
computer-vision cuda dataset deep-learning event-camera hand-pose hand-pose-estimation hand-tracking iccv2021 mano opengl pytorch smpl
Last synced: 11 Apr 2025
https://github.com/pkestene/euler2d_kokkos
Simple 2D finite-volume solver for the Euler equations using the C++ Kokkos library
cea cfd cpp cuda euler finite-volume gpu gpu-computing hydrodynamics kokkos miniapp multithreading openmp parallelism parallelization performance-portability
Last synced: 19 Aug 2025
https://github.com/CAPS-UMU/FIDESlib
A server-side CKKS GPU library fully interoperable with OpenFHE.
ckks cuda gpu homomorphic-encryption openfhe
Last synced: 11 Apr 2026
https://github.com/harrism/ranger
Generate simple index ranges in C++ and CUDA C++
Last synced: 22 Mar 2025
https://github.com/shrec/ultrafastsecp256k1
Ultra high-performance secp256k1 ECC engine | Python, Node.js, Rust, Go, C#, Swift, Java bindings | CUDA, Metal, OpenCL GPU | ECDSA, Schnorr, FROST, MuSig2, BIP-352 | 15+ platforms
android arm64 bitcoin constant-time cryptography cuda ecdsa embedded ethereum ffi gpu-cryptography ios nodejs opencl python riscv rust schnorr-signatures secp256k1 webassembly
Last synced: 04 Jun 2026
https://github.com/termoshtt/link_cuda_kernel
HowTo: Compile CUDA with nvcc, and link to Rust
Last synced: 24 Apr 2025
https://github.com/hpi-epic/gpucsl
Constraint-based Causal Structure Learning on GPUs.
csl cuda cupy gpu-computing python
Last synced: 30 Oct 2025
https://github.com/mravanelli/pytorch_mlp_for_asr
This code implements a basic MLP for speech recognition. The MLP is trained with pytorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.
asr cuda deep-learning deep-neural-networks feedforward-neural-network kaldi kaldi-asr mlp multilayer-perceptron neural-networks python pytorch speech-recognition timit
Last synced: 27 Jul 2025
https://github.com/gabrielscabrera/nbody
GPU-accelerated N-Body particle simulator with visualizer.
cuda cuda-support nbody nbody-gravity nbody-gravity-simulation nbody-sim nbody-simulation nbody-simulations particle-system particles particles-animations simulations sphere
Last synced: 30 Apr 2025
https://github.com/mravanelli/pytorch_MLP_for_ASR
This code implements a basic MLP for speech recognition. The MLP is trained with pytorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.
asr cuda deep-learning deep-neural-networks feedforward-neural-network kaldi kaldi-asr mlp multilayer-perceptron neural-networks python pytorch speech-recognition timit
Last synced: 19 Jul 2025
https://github.com/marianhlavac/fft-cuda
Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU.
c-plus-plus coursework cuda fast-fourier-transform fit-ctu nvidia python
Last synced: 14 Apr 2025
https://github.com/outofai/cudacanvas
Python Module for PyTorch Tensor Visualisation in CUDA Eliminating CPU Transfer
cuda deep-learning gpu machine-learning neural-network plotting pypi pypi-package python pytorch tensor visulization
Last synced: 09 Sep 2025
https://github.com/nvidia/numbast
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
Last synced: 25 Mar 2025
https://github.com/countzero/windows_llama.cpp
PowerShell automation to rebuild llama.cpp for a Windows environment.
cmake conda cuda llama-cpp openblas powershell windows
Last synced: 26 Apr 2026
https://github.com/feltor-dev/feltor
Numerical methods for edge and scrape-off layer blob and turbulence simulations. Homepage:
c-plus-plus computational-fluid-dynamics computational-physics cuda discontinuous-galerkin gpu mpi numerical-methods numerical-simulations openmp parallel-algorithm parallel-computing plasma-physics plasma-turbulence
Last synced: 10 Apr 2026
https://github.com/differentiableuniverseinitiative/jaxdecomp
JAX bindings for the NVIDIA cuDecomp library
Last synced: 08 May 2025
https://github.com/bruce-lee-ly/flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core
Last synced: 25 Aug 2025
https://github.com/prg-titech/dynasoar
CUDA Dynamic Memory Allocator for SOA Data Layout
cuda memory-allocation simd smmo
Last synced: 12 May 2025
https://github.com/tdortman/Cuckoo-GPU
High-Performance GPU Cuckoo Filter
bloom-filter cuckoo-filter cuda gpu membership-query probabilistic-data-structures probabilistic-filters
Last synced: 31 Mar 2026
https://github.com/kwea123/python-ray-tracing-with-cuda-example
An example of CUDA ray tracing in pure Python syntax.
Last synced: 26 Mar 2025
https://github.com/gangliao/vs-code-cuda
Support for CUDA grammars in Visual Studio Code
cuda vs vs-code vscode-extension
Last synced: 26 Jul 2025
https://github.com/jnbntz/gpu-edu-workshops
Code examples for CUDA and OpenACC
Last synced: 04 Mar 2025
https://github.com/pinto0309/facemesh_onnx_tensorrt
Verify that the post-processing merged into FaceMesh works correctly. The object detection model can be anything other than BlazeFace. YOLOv4 and FaceMesh committed to this repository have modified post-processing.
cuda facemesh onnx python tensorrt
Last synced: 30 Apr 2025
https://github.com/gangliao/VS-Code-Cuda
Support for CUDA grammars in Visual Studio Code
cuda vs vs-code vscode-extension
Last synced: 11 Mar 2025
https://github.com/SafeAILab/zkDL
zkDL, an open source toolkit for zero-knowledge proofs of deep learning powered by CUDA
cuda deep-neural-networks gpu-acceleration privacy-enhancing-technologies zero-knowledge-proof
Last synced: 01 May 2025
https://github.com/huangcongqing/cuda-learning
An introduction to learning CUDA programming
cuda cuda-kernels cuda-programming
Last synced: 15 Apr 2025
https://github.com/quim0/wfa-gpu
GPU implementation of the Wavefront Alignment Algorithm for global, gap-affine, pairwise sequence alignment
bioinformatics cuda gpu pairwise-alignment sequence-alignment
Last synced: 21 Jun 2025
https://github.com/wdmapp/gtensor
GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.
cpp cpp14 cuda gpu hacktoberfest rocm sycl
Last synced: 04 Apr 2025
https://github.com/pkestene/euler_kokkos
Compressible hydro and magneto-hydrodynamics (2nd order Godunov) implemented with MPI+Kokkos
cea cfd cmake cpp cuda finite-volume finite-volume-method fluid-dynamics gpu kokkos magnetohydrodynamics mpi parallel-computing parallelism performance-portability
Last synced: 19 Aug 2025
https://github.com/davidalgis/interopunitycuda
Demonstrate interoperability between Unity Engine and CUDA
cpp cuda dx11 gpu gpu-acceleration native-plugin opengl unity unity3d
Last synced: 26 Feb 2026
https://github.com/rightnow-ai/automegakernel
An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.
agent-harness cuda gpu gpu-programming kernel-fusion llm-inference machine-learning megakernel mlsys
Last synced: 15 Jun 2026
https://github.com/blurgyy/jaxngp
JAX implementation of instant-ngp (NeRF part)
cuda hashgrid instant-ngp jax nerf neural-radiance-field nix python wsl
Last synced: 09 Apr 2025
https://github.com/LatticeQCD/SIMULATeQCD
SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it easy for physicists to implement lattice QCD formulas while still providing competitive performance.
cuda gpu hip hpc lattice lattice-qcd mpi parallel physics
Last synced: 26 Mar 2025
https://github.com/AkashiSN/ffmpeg-docker
FFmpeg build in Docker
aribb24 crossbuild cuda docker ffmpeg intel-qsv mingw64 vaapi
Last synced: 18 Jul 2025
https://github.com/sdpython/onnx-extended
New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA
cuda machine-learning onnx onnxruntime python
Last synced: 11 Oct 2025
https://github.com/pawbz/geophyinv.jl
A Julia Toolbox for Geophysical Modeling and Inverse Problems
cuda elastic finite-difference-method gpu julia parallel-computing poisson seismic-inversion seismic-tomography seismic-waves staggeredgrid
Last synced: 09 Oct 2025
https://github.com/pkestene/euler2d_cudafortran
2nd order Godunov solver for 2D Euler equations written in CUDA Fortran and stdpar (standard parallelism)
cea conservation-laws cuda cuda-fortran euler-equations fortran gpu gpu-computing hydrodynamics nvfortran nvhpc stdpar
Last synced: 19 Aug 2025
https://github.com/baderlab/ecuda
STL-like containers (array, vector, matrix, cube) useable in device code.
Last synced: 15 Jun 2025
https://github.com/tweakoz/orkid
Orkid Media Engine (C++/Lua/Python3/Linux/MacOs/OpenVR)
audio bullet-physics cplusplus-20 cuda ecs game-development game-engine gltf2 graphics kurzweil-k2000 linux mac openvdb particles pbr-shading python3 pytorch simulation virtual-reality vulkan
Last synced: 10 Apr 2025
https://github.com/mumax/plus
More versatile and extensible GPU-accelerated micromagnetic simulator
cpp cuda finite-difference-time-domain gpu-computing micromagnetics python scientific-computing
Last synced: 20 Apr 2026
https://github.com/niftypet/nipet
High-throughput PET image reconstruction with high quantitative accuracy and precision
analysis cuda gpu image-reconstruction medical-imaging mlem pet processing python
Last synced: 10 Sep 2025
https://github.com/rodrgo/OpenPH
Parallel reduction of boundary matrices for Persistent Homology with CUDA
cuda gpu-computing numerical-computation parallel-computing persistent-homology topological-data-analysis
Last synced: 01 May 2025
https://github.com/lddl/object-detection-opencv-rust
A set of functions for using YOLO v3, v4, v7, and v8 with OpenCV's DNN module
computer-vision cuda object-detection onnx onnxruntime opencv yolov3 yolov4 yolov7 yolov8
Last synced: 11 May 2026
https://github.com/williamvenner/squad-mortar-helper
💣 SMH – a computer vision project for automatic, precision mortar strike calculations in Squad
computer-vision cuda cv gpu mortar mortars rust smh squad squad-game squadgame
Last synced: 08 May 2025
https://github.com/coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
cpp cublas cuda cuda-programming cudnn gpu gpu-programming nvcc nvidia parallel-programming python rust
Last synced: 15 Jun 2025
https://github.com/jundaf2/eigenmha
Forward and backward attention DNN operators implemented with LibTorch, cuDNN, and Eigen.
backpropagation cuda cudnn cudnn-v8 dnn inference pytorch
Last synced: 13 Apr 2025
https://github.com/gjbex/python-for-hpc
Repository for participants of the "Python for HPC" training
cuda cython dask gpu hpc mpi numba python python-training scientific-computing swig training
Last synced: 13 Jul 2025
https://github.com/xapajiamnu/glm
A GPU language model based on B-tree-backed tries.
c-plus-plus cuda fast gpu language-model
Last synced: 13 Apr 2025
https://github.com/openxrlab/xrtailor
OpenXRLab GPU Cloth Simulation Engine
cloth-simulation computer-graphics cpp cuda physics-simulation synthetic-data
Last synced: 14 Oct 2025
https://github.com/seonglae/llama2gptq
Chat to LLaMa 2 that also provides responses with reference documents over vector database. Locally available model using GPTQ 4bit quantization.
chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers
Last synced: 22 Apr 2025
https://github.com/conradsnicta/bandicoot-code
Bandicoot: C++ library for GPU linear algebra & scientific computing - https://coot.sourceforge.io
armadillo c-plus-plus clblas cublas cuda cuda-kernels cusolver gpu gpu-accelerated-library gpu-acceleration gpu-computing linear-algebra linear-algebra-library machine-learning matrix-functions matrix-library opencl opencl-kernels scientific-computing
Last synced: 07 Mar 2026
https://github.com/pawbz/GeoPhyInv.jl
A Julia Toolbox for Geophysical Modeling and Inverse Problems
cuda elastic finite-difference-method gpu julia parallel-computing poisson seismic-inversion seismic-tomography seismic-waves staggeredgrid
Last synced: 07 May 2025
https://github.com/ahdhn/cudatemplate
Template for starting CUDA/C++ project using CMake with Github Action for CI
Last synced: 19 Apr 2025
https://github.com/idealab-isu/GPView
GPU Accelerated Voxelization Framework for 3D CAD models.
Last synced: 20 Mar 2025
https://github.com/shibatch/tlfloat
C++ template library for floating point operations
arbitrary-precision bfloat16 constexpr cplusplus cpp20 cross-platform cuda elementary-functions float128 float256 floating-point half-precision heapless ieee754 library math octuple-precision quadruple-precision templates
Last synced: 14 Jul 2025
https://github.com/mnicely/nvml_examples
Examples showing how to utilize the NVML library for GPU monitoring
Last synced: 14 Apr 2025
https://github.com/jimouris/parallel-convolution
🖼️ Parallel Image Convolution, applying a blur filter to images. Written in C, optimized in three different ways: MPI, MPI & OpenMP and CUDA.
blur-filter cuda image-convolution image-processing mpi mpi-library parallel-processing
Last synced: 08 Apr 2025
https://github.com/dereklstinson/gocudnn
Go bindings for cuDNN and other CUDA packages.
convolutional-neural-networks cuda cudnn go golang machine-learning neural-network
Last synced: 24 Jul 2025
https://github.com/miguelcarcamov/gpuvmem
GPU Framework for Radio Astronomical Image Synthesis
alma astronomical-algorithms astronomical-images astrophysics complex-systems cuda gpu gpu-acceleration gpu-computing image-synthesis maximum-entropy multi-gpu optimization-methods radio-imaging radio-interferometry radioastronomy ska vla
Last synced: 14 Apr 2025
https://github.com/anibali/docker-torch
A Docker image for Lua Torch
cuda docker docker-image lua torch
Last synced: 28 Oct 2025
https://github.com/kamalkraj/swin-transformer-serve
Deploy Swin Transformer using TorchServe
cuda deploy docker eager-load microsoft python3 pytorch swin-transformer torch torchscript torchserve
Last synced: 18 Oct 2025
https://github.com/johnh2o2/cuvarbase
Python library for fast time-series analysis on CUDA GPUs
cuda fourier-methods gpu gpu-computing lomb-scargle-periodogram nfft python python-3 time-series
Last synced: 07 May 2025
https://github.com/hpcgarage/cuasr
cuASR: CUDA Algebra for Semirings
cuda high-performance-computing linear-algebra linear-algebra-library
Last synced: 29 Oct 2025
https://github.com/tier4/autoware_nova_carter
Integration of NVIDIA Nova Carter with Autoware
amr autonomous-driving autoware cuda nvidia nvidia-jetson ros2
Last synced: 07 Mar 2026
https://github.com/pkestene/ppkmhd
MPI+Kokkos implementation of spectral difference method (SDM) high order schemes
cea cfd cpp cuda finite-volume finite-volume-method finite-volumes gpu gpu-computing hpc hydrodynamics kokkos magnetohydrodynamics mpi parallel-computing performance-portability
Last synced: 19 Aug 2025
https://github.com/ashvardanian/cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
cmake cuda cuda-programming hip hpc matrix-multiplication openmp parallel-computing parallel-programming pybind pybind11 python starter-kit starter-template tutorial
Last synced: 13 Jul 2025
https://github.com/eth-cscs/spla
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.
cuda gemm linear-algebra mpi rocm
Last synced: 14 Apr 2025
https://github.com/anderson101866/cualgo
A cross-platform Python library for fundamental algorithms with GPU-accelerated computing
algorithm cuda gpu gpu-acceleration gpu-computing numpy python
Last synced: 14 Jul 2025
https://github.com/tjyuyao/cutex
PyCUDA based PyTorch Extension Made Easy
cuda customized extension operator pycuda pytorch
Last synced: 13 Sep 2025
https://github.com/akhuntsaria/canny-edge-detection
Canny edge detector implemented in CUDA C/C++
cuda image-processing video-processing
Last synced: 17 Jan 2026
https://github.com/cair/fast-tsetlin-machine-in-cuda-with-imdb-demo
A CUDA implementation of the Tsetlin Machine based on bitwise operators
bitwise-operators cuda explainable-artificial-intelligence gpu-computing pattern-recognition tsetlin-machine
Last synced: 08 Oct 2025
https://github.com/pkestene/ppkMHD
MPI+Kokkos implementation of spectral difference method (SDM) high order schemes
cea cfd cpp cuda finite-volume finite-volume-method finite-volumes gpu gpu-computing hpc hydrodynamics kokkos magnetohydrodynamics mpi parallel-computing performance-portability
Last synced: 10 Mar 2025
https://github.com/gritukan/hamkaas
cublas cuda cudnn deep-learning diy inference
Last synced: 13 Apr 2025
https://github.com/fclc/multi-plexer
Goal: Low power cluster capable of serving 24+ streams of 4KHDR60 source transcodes while consuming no more than 100W at peak and idling at less than 10W
arm64 clustering cuda decoding encoding ffmpeg hardware hdr jetson jetson-nano jetson-xavier-nx opencl plex pocl raspberry-pi-4 rockpro64 transcode zfs zfsonlinux
Last synced: 14 Aug 2025
https://github.com/bigmat18/cuda-mesh-voxelization
GPU-accelerated pipeline for robust 3D mesh Boolean operations (CSG) using voxelized Signed Distance Fields (SDFs).
computer-graphics cuda hpc mesh-processing voxel
Last synced: 15 May 2026
https://github.com/adamtiger/tinygpulang
Tutorial on building a gpu compiler backend in LLVM
Last synced: 25 Feb 2026
https://github.com/kerkelae/disimpy
Massively parallel Monte Carlo diffusion MR simulator written in Python.
cuda diffusion-mri gpu-computing monte-carlo-simulation
Last synced: 12 Apr 2025
https://github.com/eyalroz/gpu-kernel-runner
Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line
cuda debugging-tool gpgpu gpu gpu-kernel-performance gpu-kernels multi-language opencl performance-analysis performance-testing profiling runner
Last synced: 22 Jan 2026
https://github.com/whizzzkid/opencv-complete-build-cuda
Full build script for OpenCV with or without CUDA and Bumblebee support
build build-automation cuda cudnn install installer opencv opencv-library opencv-python opencv3 opencv3-python
Last synced: 17 Nov 2025
https://github.com/unsalted/docker-nheqminer-cuda
CUDA capable docker image of nheqminer (zcash/equihash miner)
blockchain cuda docker docker-image gpu nheqminer nicehash nvidia nvidia-docker zcash
Last synced: 04 Apr 2026
https://github.com/Deyht/CIANNA
Convolutional Interactive Artificial Neural Networks by/for Astrophysicists
astronomy astrophysics convolutional-neural-networks cuda deep-learning deep-neural-networks gpu machine-learning ml neural-network object-detection yolo
Last synced: 20 Apr 2025
https://github.com/m-pilia/disptools
Generate displacement fields with known volume changes
cuda image-processing jacobian python3
Last synced: 15 Oct 2025
https://github.com/yoyoberenguer/pygameshader
2D Game texture special effects
2d 2d-graphics cuda cuda-kernels cupy effects game game-2d game-development game-library gpu graphics image-processing indiegame openmp pygame shaders special-effects
Last synced: 03 Apr 2025
https://github.com/nasa-jpl/flightview
Real-time tools for Imaging Spectroscopy Data
aviris camera cameralink cuda hyperspectral hyperspectral-analysis hyperspectral-data rtp spectroscopy
Last synced: 24 Apr 2025
https://github.com/tristanbilot/mlx-gcn
MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2 Ultra, M3 Max).
apple cuda deep-learning gnn mlx pytorch
Last synced: 13 Sep 2025
https://github.com/omlins/libdiffusion
Proof of Concept: a C-callable GPU-enabled parallel 2-D heat diffusion solver written in Julia using CUDA, MPI and graphics
c cuda distributed julia mpi mult-gpu
Last synced: 04 Apr 2025