CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-04-27 00:06:26 UTC
https://github.com/cybercongress/go-cyber
Your 🔵 Superintelligence
ai blockchain computation-graphs cosmos cosmos-sdk cuda cyber cyber-rank fuckgoogle great-web ipfs knowledge-graph protocol search search-engine soft3 supercomputer tendermint universe-mirror web3
Last synced: 07 Apr 2025
https://github.com/cybercongress/cyberd
Your 🔵 Superintelligence
ai blockchain computation-graphs cosmos cosmos-sdk cuda cyber cyber-rank fuckgoogle great-web ipfs knowledge-graph protocol search search-engine soft3 supercomputer tendermint universe-mirror web3
Last synced: 04 Feb 2025
https://github.com/harrism/hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
c-plus-plus cuda cuda-device cuda-kernels gpu hemi
Last synced: 06 Apr 2025
https://github.com/IBM/aihwkit
IBM Analog Hardware Acceleration Kit
ai analog-devices cuda neural-networks pytorch
Last synced: 17 Nov 2024
https://github.com/omlins/parallelstencil.jl
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
cuda gpu julia multi-gpu multi-xpu parallel staggered-grids stencil stencil-codes xpu
Last synced: 11 Apr 2025
https://github.com/omlins/ParallelStencil.jl
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
cuda gpu julia multi-gpu multi-xpu parallel staggered-grids stencil stencil-codes xpu
Last synced: 27 Mar 2025
https://github.com/UoB-HPC/BabelStream
STREAM, for lots of devices written in many programming models
benchmark cuda gpgpu gpu hpc kokkos memory-bandwidth openacc opencl openmp parallel-processing raja sycl
Last synced: 21 Apr 2025
https://github.com/agenium-scale/nsimd
Agenium Scale vectorization library for CPUs and GPUs
aarch64 avx avx2 avx512 cpp20 cpp20-library cuda hpc neon neon128 rocm simd simd-instructions simd-library simd-programming sse2 sse42 sve vectorization-library
Last synced: 09 Apr 2025
https://github.com/lmnt-com/haste
Haste: a fast, simple, and open RNN library
algorithm api cpp cuda deep-learning gru lstm machine-learning python pytorch rnn rnn-implementations rnn-layers tensorflow
Last synced: 04 Apr 2025
https://github.com/a2flo/floor
A C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.
c-plus-plus compiler compute cuda graphics ios linux macos metal opencl openxr rendering spir spir-v virtual-reality vulkan windows
Last synced: 12 Apr 2025
https://github.com/kerneltuner/kernel_tuner
Kernel Tuner
auto-tuning autotuning c cplusplus cuda cuda-kernels gpu gpu-computing kernel-tuner machine-learning opencl opencl-kernels optimization python software-development testing
Last synced: 14 Apr 2025
https://github.com/knightcrawler25/optix-pathtracer
Simple physically based path tracer based on Nvidia's Optix Ray Tracing Engine
brdf cuda disney gpu optix pathtracing raytracing
Last synced: 07 Apr 2025
https://github.com/nvidia/cuda-checkpoint
CUDA checkpoint and restore utility
Last synced: 12 Apr 2025
https://github.com/rkinas/triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
Last synced: 12 Apr 2025
https://github.com/QMCPACK/qmcpack
Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
c-plus-plus cuda electronic-structure gpu high-performance-computing hpc mpi oneapi quantum-chemistry quantum-monte-carlo rocm
Last synced: 26 Mar 2025
https://github.com/zjhellofss/kuiperllama
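A good project for campus, autumn, and spring recruiting and for internships: build an LLM inference framework supporting LLama2/3 and Qwen2.5 from scratch, hands-on.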
cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2
Last synced: 08 Apr 2025
https://github.com/charles-r-earp/autograph
A machine learning library for Rust.
cuda machine-learning neural-networks rust
Last synced: 19 Nov 2024
https://github.com/lattice/quda
QUDA is a library for performing calculations in lattice QCD on GPUs.
c c-plus-plus cuda gpu mpi multi-gpu qcd
Last synced: 08 Apr 2025
https://github.com/favreau/Sol-R
Open-Source CUDA/OpenCL Speed Of Light Ray-tracer
3d 3d-graphics-engine cuda gpgpu gpu-acceleration gpu-computing graphics-engine interactive opencl path-tracing pathtracing ray-tracing raytracer raytracing raytracing-engine realtime-rendering rendering science virtual-reality vr
Last synced: 12 Nov 2024
https://github.com/gezp/docker-ubuntu-desktop
Docker image for Ubuntu Desktop that supports hardware GPU-accelerated GUI apps. You can access the container over SSH or remote desktop, just like a cloud VM.
cuda docker kasmvnc nomachine nvidia-gpu opengl remote-desktop ubuntu virtualgl
Last synced: 13 Apr 2025
https://github.com/Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
cublas cuda gemm gpu hgemm matrix-multiply nvidia tensor-core
Last synced: 19 Nov 2024
https://github.com/sekwiatkowski/Komputation
Komputation is a neural network framework for the Java Virtual Machine written in Kotlin and CUDA C.
artificial-intelligence convolutional-neural-networks cuda framework gpu jvm kotlin machine-learning neural-networks nlp nvidia recurrent-neural-networks seq2seq
Last synced: 01 Apr 2025
https://github.com/pcb9382/FaceAlgorithm
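Face detection and face recognition, including face detection (retinaface, yolov5face, yolov7face, yolov8face), face detection and tracking (ByteTracker), face angle calculation (Face_Angle), face rectification (Face_Aligner), face recognition (Arcface), mask detection (MaskRecognitiion), age and gender detection (Gender_age), silent liveness detection (Silent_Face_Anti_Spoofing), and FaceAlignment (106keypoints)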
cuda face-alignment face-detection face-recognition tensorrt yolov5face yolov7face yolov8face
Last synced: 18 Mar 2025
https://github.com/nvidia-genomics-research/genomeworks
SDK for GPU accelerated genome assembly and analysis
alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api
Last synced: 05 Apr 2025
https://github.com/clara-parabricks/GenomeWorks
SDK for GPU accelerated genome assembly and analysis
alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api
Last synced: 26 Dec 2024
https://github.com/NVIDIA-Genomics-Research/GenomeWorks
SDK for GPU accelerated genome assembly and analysis
alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api
Last synced: 15 Nov 2024
https://github.com/GoodAI/BrainSimulator
Brain Simulator is a platform for visual prototyping of artificial intelligence architectures.
ai brain-simulator cuda machine-learning
Last synced: 20 Nov 2024
https://github.com/JuliaGPU/CuArrays.jl
A Curious Cumulation of CUDA Cuisine
Last synced: 29 Nov 2024
https://github.com/andrewkchan/yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
cpp cuda inference-engine llama llamacpp llm llm-inference machine-learning mistral
Last synced: 12 Apr 2025
https://github.com/rentainhe/pytorch-distributed-training
Simple tutorials on PyTorch DDP training
apex cuda ddp-training deep-learning pytorch
Last synced: 09 Apr 2025
https://github.com/LLNL/blt
A streamlined CMake build system foundation for developing HPC software
blt build-system build-tools cmake cpp cuda hpc radiuss testing
Last synced: 21 Apr 2025
https://github.com/marian-nmt/marian-dev
Fast Neural Machine Translation in C++ - development repository
cpp11 cuda fast gpu-acceleration neural-machine-translation
Last synced: 14 Apr 2025
https://github.com/llnl/blt
A streamlined CMake build system foundation for developing HPC software
blt build-system build-tools cmake cpp cuda hpc radiuss testing
Last synced: 08 Apr 2025
https://github.com/koide3/gtsam_points
A collection of GTSAM factors and optimizers for point cloud SLAM
bundle-adjustment continuous-time cuda factor-graph gpu gtsam kdtree localization mapping point-cloud registration slam voxelmap
Last synced: 12 Apr 2025
https://github.com/trinkle23897/fast-poisson-image-editing
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
cpp cuda high-performance-computing image-processing jacobi-iteration jacobi-method mpi numpy openmp parallel-computing poisson-image-editing pybind11 python
Last synced: 05 Apr 2025
https://github.com/Trinkle23897/Fast-Poisson-Image-Editing
A fast poisson image editing implementation that can utilize multi-core CPU or GPU to handle a high-resolution image input.
cpp cuda high-performance-computing image-processing jacobi-iteration jacobi-method mpi numpy openmp parallel-computing poisson-image-editing pybind11 python
Last synced: 02 Apr 2025
https://github.com/bwohlberg/sporco
Sparse Optimisation Research Code
admm convolutional-dictionary-learning convolutional-sparse-coding cuda dictionary-learning fista optimization optimization-algorithms plug-and-play-priors python robust-pca sparse-coding sparse-representations sparsity total-variation total-variation-minimization
Last synced: 01 Apr 2025
https://github.com/tumaer/JAXFLUIDS
Differentiable Fluid Dynamics Package
automatic-differentiation cfd compressible-flows computational-fluid-dynamics cuda deep-learning fluid-dynamics gpu gpu-computing high-performance hpc jax jaxfluids machine-learning multi-phase-flows tpu turbulence
Last synced: 11 Feb 2025
https://github.com/owensgroup/RXMesh
GPU-accelerated triangle mesh processing
3d 3d-graphics cuda data-structure geometry geometry-processing gpu mesh mesh-processing parallel-computing surface-mesh
Last synced: 25 Apr 2025
https://github.com/zjhellofss/KuiperLLama
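A good project for campus, autumn, and spring recruiting and for internships: build an LLM inference framework supporting LLama2/3 and Qwen2.5 from scratch, hands-on.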
cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2
Last synced: 03 Jan 2025
https://github.com/slicer/light-the-torch
Install PyTorch distributions with computation backend auto-detection
Last synced: 07 Apr 2025
https://github.com/pmeier/light-the-torch
Install PyTorch distributions with computation backend auto-detection
Last synced: 24 Mar 2025
https://github.com/asmirnou/watsor
Object detection for video surveillance
camera coral cuda detection ffmpeg gpu hardware-acceleration homeassistant ip mpegts mqtt person-detector python realtime stream surveillance tensorrt tensrflow video zones
Last synced: 05 Apr 2025
https://github.com/modelscope/dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
cpu cuda guided-decoding llm llm-inference native-engine
Last synced: 12 Apr 2025
https://github.com/AmusementClub/vs-mlrt
Efficient CPU/GPU/Vulkan ML Runtimes for VapourSynth (with built-in support for waifu2x, DPIR, RealESRGANv2/v3, Real-CUGAN, RIFE, SCUNet and more!)
artificial-intelligence cuda deep-learning directml dpir gpu migraphx ncnn neural-network onnx onnxruntime openvino real-cugan real-esrgan rife tensorrt vapoursynth vulkan waifu2x
Last synced: 24 Mar 2025
https://github.com/ritchieng/dlami
A Deep Learning Amazon Web Service (AWS) AMI that is open, free and works. Run in less than 5 minutes. TensorFlow, Keras, PyTorch, Theano, MXNet, CNTK, Caffe and all dependencies.
ami aws cuda cudnn5 keras python tensorflow ubuntu
Last synced: 10 Feb 2025
https://github.com/matteo-ronchetti/torch-radon
Computational Tomography in PyTorch
cuda hacktoberfest inverse-problems pytorch radon-transform shearlet-transform tomography
Last synced: 15 Apr 2025
https://github.com/shapelets/khiva
An open-source library of algorithms to analyse time series on GPU and CPU.
clustering cpp cuda data-series discords distances gpu khiva kshape matrix-profile motifs multicore opencl shapelets snippets time-series timeseries
Last synced: 27 Dec 2024
https://github.com/zjin-lcf/HeCBench
benchmark cuda gpu-computing hip hpc-applications openmp scientific-computing sycl test-driven-development
Last synced: 04 Apr 2025
https://github.com/opendilab/di-hpc
OpenDILab RL HPC OP Lib, including CUDA and Triton kernel
cuda hpc lstm pytorch reinforcement-learning triton
Last synced: 09 Apr 2025
https://github.com/marnovo/macos-egpu-cuda-guide
Set up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU
apple cuda deep-learning egpu gaming gpu guide hacktoberfest mac machine-learning macos nvidia
Last synced: 19 Dec 2024
https://github.com/marnovo/macOS-eGPU-CUDA-guide
Set up CUDA for machine learning (and gaming) on macOS using an NVIDIA eGPU
apple cuda deep-learning egpu gaming gpu guide hacktoberfest mac machine-learning macos nvidia
Last synced: 22 Nov 2024
https://github.com/wangzyon/NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
Last synced: 04 Apr 2025
https://github.com/Hellisotherpeople/CX_DB8
A contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (Bert, Universal Sentence Encoder, Flair)
contextual-summarization cuda debate-evidence embeddings extractive-summarization flair python semantic-search semantic-summarization summarization summarizer token-level-summarization universal-sentence-encoder
Last synced: 22 Nov 2024
https://github.com/ceed/libceed
CEED Library: Code for Efficient Extensible Discretizations
api ceed cuda ecp exascale-computing gpu high-order high-performance-computing hpc julia linear-algebra
Last synced: 14 Apr 2025
https://github.com/bh107/bohrium
Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX
cuda gpu gpu-acceleration multi-core numpy opencl parallel-computing
Last synced: 12 Nov 2024
https://github.com/bytedance/abq-llm
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
cuda llm-inference mlsys quantized-networks research
Last synced: 04 Apr 2025
https://github.com/llnl/hiop
HPC solver for nonlinear optimization problems
acopf bfgs constrained-optimization cuda gpu-support hpc interior-point-method interior-point-optimizer math-physics mpi nonlinear-optimization nonlinear-programming nonlinear-programming-algorithms nonsmooth-optimization optimization parallel-programming quasi-newton radiuss rocm solver
Last synced: 09 Apr 2025
https://github.com/1ytic/warp-rnnt
CUDA-Warp RNN-Transducer
cuda forward-backward pytorch rnn-transducer tensorflow warp
Last synced: 05 Apr 2025
https://github.com/demoriarty/torchpq
Approximate nearest neighbor search with product quantization on GPU in pytorch and cuda
cuda nearest-neighbor-search pytorch
Last synced: 05 Apr 2025
https://github.com/openucx/ucc
Unified Collective Communication Library
collectives cuda deep-learning hpc infiniband mpi openshmem pgas pytorch roce sharp
Last synced: 08 Apr 2025
https://github.com/DeMoriarty/TorchPQ
Approximate nearest neighbor search with product quantization on GPU in pytorch and cuda
cuda nearest-neighbor-search pytorch
Last synced: 01 Apr 2025
https://github.com/helmut-hoffer-von-ankershoffen/jetson
Helmut Hoffer von Ankershoffen experimenting with arm64 based NVIDIA Jetson (Nano and AGX Xavier) edge devices running Kubernetes (K8s) for machine learning (ML) including Jupyter Notebooks, TensorFlow Training and TensorFlow Serving using CUDA for smart IoT.
ansible archiconda cuda docker edge-devices hoffer-von-ankershoffen jupyter k8s kubeflow kubernetes kustomize machine-learning ml nvidia-jetson-nano nvidia-jetson-xavier skaffold smart-iot software-engineering tensorflow-serving virtualbox
Last synced: 14 Apr 2025
https://github.com/CEED/libCEED
CEED Library: Code for Efficient Extensible Discretizations
api ceed cuda ecp exascale-computing gpu high-order high-performance-computing hpc julia linear-algebra
Last synced: 14 Nov 2024
https://github.com/nvidia/dl4agx
Deep Learning tools and applications for NVIDIA AGX platforms.
autonomous-driving computer-vision cuda deep-learning drive-agx embedded
Last synced: 12 Apr 2025
https://github.com/mkeeter/mpr
Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)
cad cuda gpu implicit-surfaces rendering
Last synced: 16 Mar 2025
https://github.com/rapidsai/node
GPU-accelerated data science and visualization in node
cuda data-science data-visualization gpgpu gpu nodejs
Last synced: 08 Apr 2025
https://github.com/supranational/sppark
Zero-knowledge template library
bls12-377 bls12-381 cuda ntt pasta-curves rocm zero-knowledge zero-knowledge-proofs zk-snarks zk-starks
Last synced: 12 Apr 2025
https://github.com/nobuyuki83/delfem2
Research prototyping framework for physics simulation written in C++
cuda fem-simulation finite-element-methods geometry-processing opengl physics-simulation simulation
Last synced: 06 Apr 2025
https://github.com/dividiti/ck-caffe
Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs).
accuracy android caffe collaborative-optimization collective-knowledge costs cuda customizable-workflows dnn-as-a-service dnn-optimization json-api linux opencl performance-portability portable-package-manager reproducible-experiments resources windows
Last synced: 13 Nov 2024
https://github.com/msminhas93/nviwatch
NviWatch: A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPU processes
bash command-line-tool cuda deeplearning gpu gpu-monitoring linux monitoring nvidia nvidia-smi nvml performant process-monitoring ratatui resource-monitoring rust terminal top tui ubuntu
Last synced: 09 Apr 2025
https://github.com/zhongkaifu/seq2seqsharp
Seq2SeqSharp is a fast and flexible tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, several network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.
attention-model cuda deep-learning encoder-decoder gpu image lstm machine-translation neural-network seq2seq sequence-to-sequence tensor text transformer transformer-architecture transformer-encoder translation vision-transformer
Last synced: 04 Apr 2025
https://github.com/unitaryfoundation/qrack
Comprehensive, GPU accelerated framework for developing universal virtual quantum processors
cuda distributed-quantum-computing gpu hpc integrated-graphics intel-hd-graphics near-clifford opencl physics physics-simulation quantum quantum-computer-simulator quantum-computing quantum-information quantum-simulator qubits
Last synced: 02 Apr 2025
https://github.com/NVIDIA/GMAT
A toolkit showing GPU's all-round capability in video processing
codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec
Last synced: 04 Apr 2025
https://github.com/uncomplicate/clojurecuda
Clojure library for CUDA development
clojure clojure-library cuda cuda-development gpu-acceleration gpu-computing high-performance java
Last synced: 08 Apr 2025
https://github.com/toruniina/lbvh
An implementation of parallel linear BVH (LBVH) on GPU
bvh cuda gpu nearest-neighbor-search parallel thrust
Last synced: 20 Dec 2024
https://github.com/HMUNACHI/henry-vjp
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 05 Apr 2025
https://github.com/hmunachi/henry-vjp
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 08 Apr 2025
https://github.com/HMUNACHI/CUDATutorials
Zero to Hero GPU and CUDA for Maths & ML tutorials with examples.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 24 Apr 2025
https://github.com/nvidia/gmat
A toolkit showing GPU's all-round capability in video processing
codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec
Last synced: 10 Feb 2025
https://github.com/lxxue/frnn
Fixed Radius Nearest Neighbor Search on GPU
cuda nearest-neighbor-search pytorch
Last synced: 03 Apr 2025
https://github.com/zpzim/scamp
The fastest way to compute matrix profiles on CPU and GPU!
cuda gpu matrix-profile python time-series time-series-analysis
Last synced: 05 Apr 2025
https://github.com/eth-cscs/cosma
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
communication-optimal cuda gpu-acceleration linear-algebra matmul matrix-multiplication mpi pdgemm rocm scalapack
Last synced: 04 Apr 2025
https://github.com/pykeio/diffusers
A modular Rust library for super fast Stable Diffusion inference - 45% faster than PyTorch 🔮
cuda diffusion-models onnx onnxruntime onnxruntime-gpu rust stable-diffusion stable-diffusion-v2
Last synced: 28 Mar 2025
https://github.com/yilingqiao/dmrf
Dynamic Mesh-Aware Radiance Fields (ICCV2023): ray-traced rendering and interactive simulation of meshes with NeRF
cuda nerf raytracing simulation
Last synced: 09 Apr 2025
https://github.com/hmunachi/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 10 Feb 2025
https://github.com/primitiv/primitiv
A Neural Network Toolkit.
cmake cpp cuda deep-learning framework gpu neural-network opencl
Last synced: 10 Apr 2025
https://github.com/cnugteren/cltune
CLTune: An automatic OpenCL & CUDA kernel tuner
Last synced: 19 Dec 2024
https://github.com/cuMF/cumf_als
CUDA Matrix Factorization Library with Alternating Least Square (ALS)
als cuda gpu machine machine-learning matrix-factorization
Last synced: 13 Nov 2024
https://github.com/sjtu-ipads/phoenixos
Fast OS-level support for GPU checkpoint and restore
checkpoint-restore criu cuda gpu
Last synced: 05 Apr 2025
https://github.com/HMUNACHI/cuda-repo
From zero to hero CUDA for accelerating maths and machine learning on GPU.
cuda cuda-kernels cuda-programming machine-learning maths
Last synced: 12 Nov 2024
https://github.com/jimver/cuda-toolkit
GitHub Action to install CUDA
action cuda cuda-toolkit github-actions nvidia nvidia-cuda
Last synced: 14 Apr 2025