CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-20 00:07:16 UTC
https://github.com/nasa-jpl/flightview
Real-time tools for Imaging Spectroscopy Data
aviris camera cameralink cuda hyperspectral hyperspectral-analysis hyperspectral-data rtp spectroscopy
Last synced: 24 Apr 2025
https://github.com/sandialabs/lgrtk
Tool Kit for Lagrangian Grid Reconnection
cuda gpu hpc physics sandia-national-laboratories scr-2300 snl-applications
Last synced: 02 May 2025
https://github.com/juliagpu/nccl.jl
A Julia wrapper for the NVIDIA Collective Communications Library.
Last synced: 20 Sep 2025
https://github.com/sgl-project/whl
SGLang Kernel Wheel Index
cuda cutlass flashinfer sglang
Last synced: 12 Jun 2026
https://github.com/carpentries-incubator/lesson-gpu-programming
GPU Programming with Python and CUDA.
beta carpentries-incubator cuda cupy english gpu lesson lesson-gpu-programming numba parallel-programming programming python
Last synced: 16 Mar 2026
https://github.com/tree-sitter-grammars/tree-sitter-cuda
CUDA grammar for tree-sitter
Last synced: 30 Dec 2025
https://github.com/amusi/sift-gpu
A CUDA implementation of SIFT
cuda feature-detection gpu keypoints-detector sift
Last synced: 25 Mar 2025
https://github.com/iowar/kecmatch-gpu
Finds matching solidity function signatures using GPU
Last synced: 15 Jul 2025
https://github.com/shrec/UltrafastSecp256k1
Ultra high-performance secp256k1 ECC library | C++20 | CUDA, Metal, OpenCL, ROCm, WASM | Apple Silicon M1-M4 | 15+ platforms | Branchless, allocation-free hot paths
android arm64 bitcoin constant-time crypto cryptocurrency cryptography cuda ecc ecdsa embedded ethereum gpu-cryptography ios opencl performance riscv schnorr-signatures secp256k1 webassembly
Last synced: 03 Apr 2026
https://github.com/SomeoneSerge/nixpkgs-cuda-ci
Building and caching nixpkgs with cudaSupport=true. We push to https://cuda-maintainers.cachix.org/
computer-vision cuda deep-learning nix nixpkgs
Last synced: 08 Aug 2025
https://github.com/bluescarni/rakau
C++17 N-body Barnes-Hut on heterogeneous hardware architectures
astronomy astrophyics astrophysical-simulation avx avx2 avx512 cpp17 cuda n-body n-body-simulator nbody nbody-gravity-simulation nbody-problem nbody-sim nbody-simulation rocm simd vectorization
Last synced: 04 Jul 2025
https://github.com/ktaletsk/NCCV
Short course on computer vision and image processing using Numba+CUDA+OpenCV
computer-vision cuda jupyter-notebook numba
Last synced: 09 May 2025
https://github.com/sevagh/zen
optimized realtime harmonic/percussive source separation using the GPU (NVIDIA CUDA) and CPU (Intel IPP)
audio cuda digital-signal-processing dsp real-time source-separation thrust
Last synced: 13 Apr 2025
https://github.com/ENOT-AutoDL/ONNX-Runtime-with-TensorRT-and-OpenVINO
Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment
cuda nvidia onnx onnxruntime openvino tensorrt
Last synced: 20 Mar 2025
https://github.com/kostyaev/sentence2vec
Deep sentence embedding using Sequence to Sequence learning
cuda sentence2vec seq2seq torch
Last synced: 21 Mar 2025
https://github.com/illuhad/hipcpu
Implementation of AMD HIP for CPUs
cuda gpgpu hip hpc openmp openmp-parallelization
Last synced: 16 Apr 2025
https://github.com/TristanBilot/mlx-GCN
MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2 Ultra, M3 Max).
apple cuda deep-learning gnn mlx pytorch
Last synced: 27 Mar 2025
https://github.com/ptsolvers/chmy.jl
Finite differences and staggered grids on CPUs and GPUs
cuda gpu julialang metal mpi parallel rocm staggeredgrid stencil
Last synced: 23 Apr 2025
https://github.com/illuhad/hipCPU
Implementation of AMD HIP for CPUs
cuda gpgpu hip hpc openmp openmp-parallelization
Last synced: 21 Apr 2025
https://github.com/src-d/infrastructure-dockerfiles
Dockerfile-s to build the images which power source{d}'s computing infrastructure.
cuda dockerfile infrastructure jupyterhub pytorch tensorflow
Last synced: 05 May 2025
https://github.com/shahriarrezghi/spyker
High-performance Spiking Neural Networks Library Written From Scratch with C++ and Python Interfaces.
computational-neuroscience cuda cudnn cxx high-performance neuroscience onednn python r-stdp snn stdp
Last synced: 02 Oct 2025
https://github.com/ktaletsk/nccv
Short course on computer vision and image processing using Numba+CUDA+OpenCV
computer-vision cuda jupyter-notebook numba
Last synced: 04 Sep 2025
https://github.com/ShahriarRezghi/Spyker
High-performance Spiking Neural Networks Library Written From Scratch with C++ and Python Interfaces.
computational-neuroscience cuda cudnn cxx high-performance neuroscience onednn python r-stdp snn stdp
Last synced: 04 Apr 2025
https://github.com/mikeswang/triumvirate
A Python/C++ package for three-point clustering measurements in LSS analyses
clustering-statistics cpp cuda cython hip large-scale-structure-cosmology python
Last synced: 14 Mar 2026
https://github.com/c3sr/comm_scope
NUMA-aware multi-CPU multi-GPU data transfer benchmarks
bandwidth benchmark-suite cuda gpu hip numa nvlink performance
Last synced: 17 Jan 2026
https://github.com/microsoft/svirl
Svirl is a GPU-accelerated solver of the complex Ginzburg-Landau equations for superconductivity. It consists of a time-dependent solver to describe vortex dynamics and a free energy minimizer to accurately find static configurations.
cuda ginzburg-landau gpu python scientific-computing superconductivity vortex
Last synced: 30 Jul 2025
https://github.com/koushikphy/intro-to-cuda-fortran
A Complete beginner's introduction to programming with CUDA Fortran
cuda cuda-fortran cuda-kernels cuda-programming fortran fortran90 gpgpu gpu gpu-computing high-performance-computing hpc nvidia nvidia-cuda parallel-computing parallel-programming
Last synced: 28 Oct 2025
https://github.com/heavyai/heavyai.jl
Julia client for OmniSci GPU-accelerated SQL engine and analytics platform
cuda data-science database gpu julia-language julia-package julialang sql
Last synced: 13 Aug 2025
https://github.com/bdhu/gpuinfo
A minimal command-line utility written in Rust for querying GPU status
command-line-tool cuda gpu nvidia nvidia-smi nvml rust rust-lang
Last synced: 13 Apr 2025
https://github.com/xiaosong9905/hpc-notes
Personal Notes for Learning HPC & Parallel Computation [Actively Adding New Content]
cuda gpu hpc parallel-computing
Last synced: 15 May 2025
https://github.com/prg-titech/ikra-cpp
C++ Library for Object-oriented Programming with Structure of Arrays Layout
Last synced: 12 May 2025
https://github.com/tylerjthomas9/rapids.jl
An unofficial Julia wrapper for the RAPIDS.ai ecosystem using PythonCall.jl
Last synced: 05 May 2025
https://github.com/pinto0309/20220228_intel_deeplearning_day_hitnet_demo
Special Presentation Demo at Intel IoT Planet 2021 DeepLearning Day / Presentation materials for the special lecture at Intel IoT Planet 2021 DeepLearning Day https://www.intel.co.jp/content/www/jp/ja/now/iot-planet/deep-learning-day.html
cuda docker intel onnx openvino
Last synced: 05 May 2025
https://github.com/mxpv/nvml-go
golang wrapper for NVIDIA Management Library (NVML)
cuda golang golang-wrapper gpu nvidia nvidia-smi nvml
Last synced: 05 Oct 2025
https://github.com/gunrock/mini
mini is mini
cuda gpu graph-primitives gunrock mini-gunrock traversal-operators workload-mapping-strategies
Last synced: 28 Apr 2025
https://github.com/ema2159/equirectangular-cubemaptransform
OpenCV with CUDA and OpenMP implementations for transforming equirectangular images to cube maps and vice versa
cubemap-to-equirectangular cuda equirectangular-to-cubemap opencv openmp
Last synced: 15 Apr 2025
https://github.com/minhhn2910/cuda-half2
Convert CUDA programs from float data type to half or half2 with SIMDization
Last synced: 30 Apr 2025
https://github.com/shahruk10/nixshells
Frequently used nix shells for Python, CUDA and more.
cuda nix nix-shell nixpkgs python tensorflow torch virtualenv
Last synced: 10 Mar 2026
https://github.com/kerneltuner/kernel_launcher
Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner
Last synced: 12 Apr 2025
https://github.com/yhtang/graphdot
GPU-accelerated Marginalized Graph Kernel with customizable node and edge features; Gaussian process regression.
cheminformatics cuda gpu graph-algorithms machine-learning python
Last synced: 15 Apr 2025
https://github.com/justincdavis/trtutils
Utilities for enabling easier high-level usage of TensorRT in Python
cuda dnn-inference gpu-acceleration image-classification inference jetson nvidia object-detection python tensorrt tensorrt-inference
Last synced: 12 Mar 2026
https://github.com/pkestene/ms-hpc-ai-gpu
Resources for the introductory GPU programming course of the HPC-AI specialized master's program (mastère spécialisé HPC-AI)
cuda deep-learning gpu gpu-computing machine-learning physics-informed-neural-networks pinn pinns
Last synced: 19 Aug 2025
https://github.com/dbklim/docker_image_with_cuda10_cudnn7
Dockerfiles and manual for easy build of docker image with CUDA10.X and cuDNN7.6 to run TensorFlow/PyTorch on the nvidia GPU in docker-container.
cuda cudnn docker docker-gpu docker-image docker-nvidia dockerfile gpu gpu-docker nvidia nvidia-docker pytorch pytorch-gpu tensorflow tensorflow-examples tensorflow-gpu tensorflow-gpu-docker torch
Last synced: 24 Oct 2025
https://github.com/dancing-ui/uestc_vhm
Object tracking using yolov8, fast-reid, and deepsort; person re-identification using yolov8, fast-reid, and Faiss
cuda deepsort dockerfile faiss fast-reid tensorrt yolov8n
Last synced: 29 Jul 2025
https://github.com/pleiszenburg/gravitation
n-body-simulation performance test suite
benchmark cuda gpgpu gpgpu-computing high-performance-computing n-body numerical-computation openmp openmp-parallelization parallel-computing parallelization simd simd-parallelism test-suite
Last synced: 25 Jul 2025
https://github.com/nvidia/optix-dev
OptiX SDK headers, everything needed to build & run OptiX applications. SDK samples not included.
cuda gpu gpu-acceleration gpu-programming nvidia optix ray-tracing raytracing
Last synced: 14 Apr 2025
https://github.com/kyegomez/neva
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics
Last synced: 15 Oct 2025
https://github.com/ahmetfurkandemir/nvidia-gpu-benchmark
NVIDIA GPU benchmark
aws c colab-notebook cpp cuda cuda-programming gpu gpu-computing gpu-programming linux nvidia nvidia-gpu tesla
Last synced: 15 Apr 2025
https://github.com/bwohlberg/sporco-cuda
CUDA extension for the SPORCO project
convolutional-sparse-coding cuda gpu
Last synced: 12 Jul 2025
https://github.com/moldyn/clustering
Robust and stable clustering of molecular dynamics simulation trajectories.
biophysics clustering cpp cuda molecular-dynamics
Last synced: 14 Apr 2025
https://github.com/jtschwar/tomo_tv
C++ library for Regularized 2D and 3D Tomography Reconstructions.
3d-reconstruction cuda inverse-problems regularization tomography
Last synced: 25 Apr 2025
https://github.com/niftypet/nimpa
NiftyPET: Neuro-Image Manipulation, Processing and Analysis
analysis cuda gpu medical-imaging mr pet processing python
Last synced: 21 Sep 2025
https://github.com/pkestene/cuda-proj-tmpl
A minimal CMake-based project skeleton for developing a CUDA application
cea cmake cuda gpu gpu-computing parallel-computing parallel-programming template
Last synced: 29 Jul 2025
https://github.com/datarhei/ffmpeg
FFmpeg base image for datarhei/core.
alpine cuda docker ffmpeg mmal raspberry-pi vaapi
Last synced: 16 Sep 2025
https://github.com/rocm/hip-python
HIP Python Low-level Bindings
ai cuda cython gpu hip hpc interoperability ml python radeon-instinct-mi-series
Last synced: 12 Apr 2025
https://github.com/harrism/cuda_event_benchmark
Unit benchmarks of CUDA event APIs.
Last synced: 22 Mar 2025
https://github.com/m0dulo/InferSpore
🌱 A fully independent Large Language Model (LLM) inference engine, built leveraging cuBLAS and cub. 🧩
cuda inference-engine llama2 llm
Last synced: 25 Apr 2025
https://github.com/forkni/cuda-link
Zero-copy bidirectional GPU texture sharing between TouchDesigner and Python via CUDA IPC. Sub-microsecond per-frame overhead with ring buffer architecture and GPU-side synchronization.
cuda cupy gpu inter-process-communication ipc python pytorch real-time shared-memory texture-sharing touchdesigner zero-copy
Last synced: 30 May 2026
https://github.com/roflmaostc/radonka.jl
A simple yet sufficiently fast (attenuated) Radon and backproject implementation using KernelAbstractions.jl. Runs on CPU, CUDA, ...
automatic-differentiation computed-tomography ct cuda gpu julia julia-language optimization radon radon-transform tomography x-ray
Last synced: 22 Jul 2025
https://github.com/denzp/rust-inline-cuda-tutorial
Let's jump into CUDA development with Rust
Last synced: 22 Mar 2025
https://github.com/anroshka/snake-ai
🐍 A Snake game AI that learns to play through Deep Q-Learning. Built with PyTorch and Pygame, featuring CUDA acceleration and real-time visualization of the learning process.
artificial-intelligence collaborate collaboration cuda deep-learning deep-q-learning dqn game-ai gpu-acceleration machine-learning neural-network pygame python pytorch q-learning reinforcement-learning snake-game
Last synced: 24 Feb 2026
https://github.com/primitiv/primitiv-python
Python binding of primitiv.
cuda cython deep-learning framework gpu neural-network numpy opencl python
Last synced: 16 Jul 2025
https://github.com/matthewfeickert/nvidia-gpu-ml-library-test
Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being properly picked up
cuda cudnn gpu jax nvidia pytorch setup tensorflow torch
Last synced: 15 Apr 2025
https://github.com/willprice/flowty
The swiss army knife for extracting optical flow
brox cuda cython dense-inverse-search dis docker farneback lucas-kanade nvidia-docker opencv optic-flow optical-flow pyramidal tv-l1 tvl1 variational-refinement
Last synced: 10 Apr 2025
https://github.com/stellar-group/blaze_cuda
WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze
blaze cpp cpp14 cuda gpu hpc linear-algebra metaprogramming
Last synced: 30 Apr 2025
https://github.com/vovod/pytorch-who-is-that-pokemon
Classification of all 151 Gen 1 Pokémon classes with a torchvision model.
cuda deep-learning image-classification pokemon python pytorch torchvision
Last synced: 20 Jun 2025
https://github.com/bio-phys/cadishi
Cadishi: CAlculation of DIStance HIstograms
astrophysics capriqorn correlation cuda distance-histogram distribution function gpgpu gpu high-performance histogram molecular-dynamics openmp openmp-parallelization orthorhombic periodic-box python rdf triclinic vectorization
Last synced: 16 Jan 2026
https://github.com/OMEGAMAX10/Face-Mask-Detection-Using-YOLOv4
Because of the COVID-19 pandemic of 2020, more and more people are concerned with protecting themselves using masks, hence the need for software capable of monitoring whether people are wearing masks. That is why I created a Python application using OpenCV (with CUDA support) based on the YOLOv4 algorithm, capable of monitoring the safety level of a space with video surveillance.
computer-vision covid-19 cuda cuda-support face-mask-detection gui gui-application masks monitoring opencv pyqt5 python safety-level video-surveillance wearing-masks yolov4 yolov4-algorithm
Last synced: 21 Apr 2025
https://github.com/dehancer/dehancer-gpulib-cpp
C++ cross-platform gpu SDK
apple cpp cpplibrary cuda cuda-kernels linux macos metal metal-shader opencl opencl-kernels windows
Last synced: 21 Feb 2026
https://github.com/pinto0309/pytorch-build
Provide Docker build sequences of PyTorch for various environments.
Last synced: 07 May 2025
https://github.com/itzmeanjan/blake3
SYCL accelerated BLAKE3 Hash Implementation
avx2 avx512 binary-merklization blake3 cpu cryptographic-hash-functions cuda dpcpp gpu gpu-computing merkle-tree sycl
Last synced: 18 Aug 2025
https://github.com/imsanjoykb/cuda-bootcamp
CUDA Programming Practices
computer-vision crypto-mining crypto-mining-program cuda cuda-api cuda-development cuda-device cuda-driver cuda-kernels cuda-library cuda-opengl cuda-programming cuda-resource cuda-support cuda-toolkit jetson jetson-inference jetson-xavier nvidia-cuda nvidia-jetson-nano
Last synced: 05 Jul 2025
https://github.com/linonetwo/moss-dockerfile
For running Fudan's MOSS language model in Docker, with a WebUI provided by GradIO.
ai chatglm chatgpt cuda deeplearning docker gpu moss pytorch
Last synced: 12 Apr 2025
https://github.com/r-barnes/barnes2019-landscape
Landscape evolution models and graph processing on the GPU
Last synced: 15 Apr 2025
https://github.com/abhishekyana/cyclegans-pytorch
CycleGANs-PyTorch applied on Young to Old image converter.
cuda cyclegan faceapp gan python pytorch resnet tutorial-code young2old
Last synced: 16 Jul 2025
https://github.com/serengil/gpuutils
GpuUtils: A Simple Tool for GPU Analysis and Allocation
Last synced: 21 Aug 2025
https://github.com/evilfreelancer/docker-whisper-server
whisper.cpp HTTP transcription server with OpenAI-like API in Docker
api api-server asr cuda docker docker-compose dockerfile nvidia openai openai-api whisper whisper-cpp
Last synced: 23 Oct 2025
https://github.com/ivangabriele/docker-cuda-desktop
Ubuntu PyTorch CUDA Docker image with KDE Plasma Desktop & VNC. Ideal for LLM & Deep Learning remote work.
cuda d-bus dbus deep-learning desktop docker gpu large-language-models llm nvidia python pytorch remote-desktop server ubuntu ubuntu-desktop vnc vnc-server x11
Last synced: 07 Mar 2026
https://github.com/sparselinearalgebra/spbla
Sparse Boolean linear algebra for NVIDIA CUDA, OpenCL and CPU computations
boolean-algebra cplusplus cuda graph-algorithms graphblas opencl python sparse-matrix suitesparse
Last synced: 02 Jan 2026
https://github.com/yujun-shi/cfmatting_cuda_mkl
A CUDA & MKL implementation of closed-form matting
Last synced: 10 Apr 2025
https://github.com/triagemd/tensorflow-builds
Tensorflow binaries and Docker images compiled with GPU support and CPU optimizations.
bazel cuda cudnn docker gpu machine-learning nvidia python tensorflow tensorflow-serving
Last synced: 09 Jul 2025
https://github.com/uga-ssrl/SSRLCV
The UGA SSRL's Computer Vision Software Collection
3d-reconstruction computer-vision computer-vision-algorithms computervision cubesat cubesat-payload cubesatellite cuda gis gis-application jetson jetson-nano jetson-tx2 jetson-tx2i satellite-data
Last synced: 20 Mar 2025
https://github.com/yashassamaga/convolutionbuildingblocks
GEMM and Winograd based convolutions using CUTLASS
convolution cuda cutlass deep-learning
Last synced: 28 Jul 2025
https://github.com/mberr/torch-max-mem
Decorators for maximizing memory utilization with PyTorch & CUDA
Last synced: 30 Jul 2025
https://github.com/cggos/hpc
High-Performance Computing: CPU Instructions, GPU OpenCL & CUDA, etc.
cuda heterogeneous-parallel-programming multi-threading neon opencl openmp simd sse
Last synced: 21 Mar 2025
https://github.com/lnstadrum/fastaugment
A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.
augmentation-transformations brightness-correction cuda cutout data-augmentation gamma-correction gpu mixup perspective-distortions tensorflow-op
Last synced: 23 Mar 2025
https://github.com/bkraad47/fat_llama
fat_llama is a Python package for upscaling audio files to FLAC or WAV formats using advanced audio processing techniques. It utilizes CUDA-accelerated calculations to enhance audio quality by upsampling and adding missing frequencies through FFT, resulting in richer and more detailed audio.
audio audio-engineering audio-processing audiophile cuda cufft cupy fft flac hi-res hpc mp3 music nvidia ogg parallel-computing physics upscaling wav
Last synced: 05 May 2025
https://github.com/bfrg/vim-cuda-syntax
CUDA syntax highlighting for Vim
cuda highlighting syntax vim vim-syntax
Last synced: 09 Apr 2025
https://github.com/cartersusi/pacman_cuda
[AUR][Pacman] Current Cuda compatibility with Tensorflow and Torch on Arch Linux
arch arch-linux archlinux aur compatibility cuda guide installer linux pacman script tensorflow torch
Last synced: 23 Apr 2025
https://github.com/d9d-project/d9d
d9d - d[istribute]d - distributed training framework based on PyTorch that tries to be efficient yet hackable
ai cuda distributed distributed-systems llm pytorch
Last synced: 14 Apr 2026
https://github.com/PINTO0309/Open3D-build
Provide Docker build sequences of Open3D for various environments.
cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow
Last synced: 20 Mar 2025