CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-20 00:07:16 UTC
https://github.com/phrb/gpu-autotuning
Autotuning NVCC Compiler Parameters, published @ CCPE Journal
autotuning cuda nvcc opentuner
Last synced: 06 Jul 2025
https://github.com/actypedef/mixedgemm
A mixed-precision GEMM with quantize and reorder kernels.
cuda inference-acceleration llm mlsys quantization
Last synced: 15 Jun 2025
https://github.com/alessandrobessi/cuda-lab
Playing with CUDA and GPUs in Google Colab
cuda cuda-kernels gpu gpu-acceleration gpu-programming parallel-algorithm parallel-computing
Last synced: 15 Apr 2025
https://github.com/nglsg/uniapi
The Universal LLM Gateway - Integrate ANY AI Model with One Consistent API
ai ai-tools api-client api-integration api-wrapper chatbot cpp cross-platform cuda gpu-accelerated high-performance http-server inference-server language-model llm llm-integration openai-compatible rest-api universal-api
Last synced: 17 Jun 2025
https://github.com/ai-dock/python
Python docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.
ai cuda docker machine-learning python rocm runpod vast
Last synced: 28 Aug 2025
https://github.com/pkestene/incremental-fluids-kokkos
Simple, single-file fluid solvers for learning purposes, revisited with parallel programming (Kokkos: OpenMP / CUDA)
cfd cuda kokkos openmp parallel-programming
Last synced: 19 Aug 2025
https://github.com/101001000/tfg-pathtracer
CUDA Path tracing render engine, with MIS and the Disney BRDF
cuda pathtracing raytracing renderer
Last synced: 11 Apr 2025
https://github.com/adamdempsey90/fvm
My finite volume method project. Here I will implement the many pieces of a finite volume method to incorporate into a larger code.
c cfd cuda fvm gpu hydrodynamics
Last synced: 13 Apr 2025
https://github.com/previsionio/damavand
Damavand is a quantum circuit simulator. It can run on laptops or on High Performance Computing architectures, such as distributed CPU architectures or distributed multi-GPU architectures.
cuda distributed-computing hpc multi-gpu multithreading quantum-computing rust simulator
Last synced: 19 Apr 2025
https://github.com/sashakolpakov/dire-rapids
DiRe accelerated by PyTorch, PyKeOps and cuVS
cuda cuda-kernels dimensionality-reduction pykeops pytorch rapidsai t-sne umap
Last synced: 05 Mar 2026
https://github.com/ctknight/fluidsimulator
A CUDA-accelerated SPH Fluid Simulator capable of simulating millions of particles in seconds
computer-animation computer-graphics cuda fluid-simulation hydrostatics simulation-engine
Last synced: 14 Apr 2025
https://github.com/neoblizz/hip_template
🖤 Template for starting a HIP/C++ project using CMake, with GitHub Actions for CI.
cpp cuda cuda-programming gpgpu gpu hip rocm template-project template-repository
Last synced: 26 Mar 2025
https://github.com/yuhui-zh15/gpu-smart
Interactive Automatic GPU Manager
cuda cudnn deep-learning gpu manager neural-network nvidia pytorch tensorflow
Last synced: 22 Jun 2025
https://github.com/jundaf2/gpu-tensor-permute
permute sequence data on GPU with high bandwidth
cuda gpu-acceleration sequence-to-sequence
Last synced: 13 Apr 2025
https://github.com/dansarie/socracked
Performs key-recovery attacks on the SoDark family of algorithms.
cryptanalysis cryptography cuda hf-radio key-recovery
Last synced: 21 Feb 2026
https://github.com/insightsoftwareconsortium/itkvkfftbackend
VkFFT backends for ITK FFT classes.
cpp cuda fft hip insight-toolkit itk itk-module opencl python vulkan
Last synced: 14 Apr 2025
https://github.com/alexiii/grafen
A performance-effective program for gravity field calculation for layered ellipsoidal density model.
cuda earth-science geophysical-inversions geophysics gravity-field gravity-model inverse-problems
Last synced: 27 Jun 2025
https://github.com/ghost---shadow/near-duplicate-image-detector
CUDA implementation of some perceptual hashing algorithms
Last synced: 29 Oct 2025
https://github.com/aknvictor/culingam
CULiNGAM accelerates LiNGAM analysis on GPUs.
Last synced: 05 May 2025
https://github.com/raad-labs/raad-video
A high-performance video loading library for machine learning, designed for efficient training data preparation.
cuda machine-learning training-data
Last synced: 17 Oct 2025
https://github.com/thelolagemann/docker-gminer
A Docker container for running GMiner
cryptocurrency cryptocurrency-mining cryptomining cuckoocycle cuda docker dual-mining equihash ethash ethereum ethereum-miner ethereum-mining gminer kawpow miner mining nvidia progpow ton
Last synced: 17 Jan 2026
https://github.com/amsokol/tensorflow-windows-build-tutorial
Tutorial on how to build and install TensorFlow GPU/CPU for Windows from source code using Bazel
bazel build cuda gpu sources tensorflow windows
Last synced: 30 Jun 2025
https://github.com/harrism/nsys_easy
Easier, quicker command-line CUDA profiling
Last synced: 15 Oct 2025
https://github.com/prg-titech/kani-cuda
A program synthesizer for a CUDA-like GPGPU language
Last synced: 12 May 2025
https://github.com/bobbui/tensorflow-serving-cuda-docker
Docker image for TensorFlow Serving with NVIDIA CUDA and cuDNN
cuda cudnn docker docker-image tensorflow tensorflow-serving ubuntu1604
Last synced: 09 Apr 2025
https://github.com/tgymnich/shallowwater.jl
🌊 Simple Finite Volumes models that solve the shallow water equations
cuda hpc julia shallow-water-equations simulation tsunami
Last synced: 13 Mar 2025
https://github.com/hevnsnt/collider
GPU-accelerated Bitcoin Puzzle solver using Pollard's Kangaroo algorithm. K=1.15 efficiency. CUDA + Metal.
bitcoin bitcoin-puzzle cryptocurrency cuda ecdlp gpu mining-pool open-source pollard-kangaroo secp256k1
Last synced: 21 May 2026
https://github.com/alpaka-group/bactria
Broadly Applicable C++ Tracing and Instrumentation API 🐫
cuda hardware-counters instrumentation-api metrics rocm tracing-events
Last synced: 21 Apr 2025
https://github.com/pkestene/kokkos-proj-tmpl
A minimal CMake-based project skeleton for developing a Kokkos application
cea cuda gpu kokkos openmp parallel-computing parallelization performance-portability
Last synced: 19 Aug 2025
https://github.com/elsa-lab/base-env
Basis of ELSA computational platform
cuda machine-learning server-utility ubuntu
Last synced: 14 Oct 2025
https://github.com/ikergarcia1996/matrix-benchmark
A cupy (GPU) / numpy benchmark to measure how fast different hardware can perform matrix operations.
benchmark cuda cupy embedding gpu matrix numpy python word-embeddings
Last synced: 05 Oct 2025
https://github.com/josonchan1998/opencv_install
Build OpenCV from source with CUDA in Anaconda3
anaconda3 cuda opencv shell-script
Last synced: 12 Oct 2025
https://github.com/zephirfxec/hnanosolver
Houdini GPU Fluid Solver powered by NanoVDB
cpp cuda fluid-dynamics houdini nanovdb openvdb
Last synced: 05 May 2025
https://github.com/lebedov/cudamps
Python interface to CUDA Multi-Process Service
Last synced: 02 Mar 2026
https://github.com/alankrantas/tensorflow-cuda-gpu-devcontainer
Tensorflow CUDA DevContainer Configuration for Supporting NVIDIA GPU
cuda cudnn deep-learning devcontainer gpu-acceleration keras machine-learning nvidia nvidia-gpu tensorflow
Last synced: 27 Apr 2025
https://github.com/nikelborm/amd-amdgpu-rocm-ollama-gfx90c-ati-radeon-vega-ryzen7-5800h-arch-linux
Run Ollama on AMD Ryzen 7 5800H CPU with integrated GPU AMD ATI Radeon Vega (gfx90c) with optimizations
amd amd-gpu amdgpu archlinux avx2 bash bash-scripting cuda linux llama llama3 llm ollama oneapi radeon rocm ssse3 vega
Last synced: 30 Apr 2025
https://github.com/ktaletsk/gpu_dsm
🔗Accessible quantitative polymer rheology predictions with slip-links on GPU
c-plus-plus cuda gpu polymer rheology
Last synced: 10 Sep 2025
https://github.com/NAGAGroup/Scalix
Scalix is a data parallel compute library that automatically scales to the available compute resources.
Last synced: 01 Apr 2025
https://github.com/jacobtomlinson/advent-of-gpu-code-2020
Solutions for Advent of Code 2020 written for the GPU in Python
advent-of-code cuda gpu jupyter-notebooks numba python
Last synced: 25 Mar 2025
https://github.com/alesiong/template-matching
Simple template matching by GPU (CUDA)
computer-vision cuda template-matching
Last synced: 30 Apr 2025
https://github.com/dendenxu/bvh-ray-tracing
CUDA Ray Tracing using BVH. Forked and modified from https://github.com/YuliangXiu/bvh-distance-queries
bvh cuda pytorch ray-tracing ray-triangle-intersection
Last synced: 28 Jul 2025
https://github.com/tillahoffmann/universal_tensorflow_image
Develop TensorFlow models with or without a GPU accelerator using the same Docker image. 🥳
Last synced: 12 Jul 2025
https://github.com/dusanerdeljan/stereo-depth
Bachelor thesis - GPU accelerated single view passive stereo depth estimation pipeline
convolutional-neural-networks cuda depth-estimation pytorch real-time stereo-matching stereo-vision
Last synced: 28 Oct 2025
https://github.com/lynncoleart/guda
A High-Performance CPU-Based CUDA-Compatible Linear Algebra Library
ai blas cuda inference llm-inference
Last synced: 04 Mar 2026
https://github.com/neka-nat/cuimage
Rust implementation of image processing library with CUDA
Last synced: 13 Apr 2025
https://github.com/jcbritobr/nvml-csharp
NVML (NVIDIA Management Library) wrapper for C#.
csharp cuda gpu library monitoring nvidia nvml
Last synced: 06 Apr 2025
https://github.com/enp1s0/shgemm
Fast multiplication of single-precision and half-precision matrices on Tensor Cores
Last synced: 31 Jul 2025
https://github.com/aniketsingh03/processing-history-of-images
💡 Detecting processing history of images by using Deep Learning
cuda deep-learning image-forensics matlab python3 pytorch
Last synced: 14 Jul 2025
https://github.com/mnicely/computeworks_examples
Matrix multiplication example performed with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA
blas cublas cuda docker eclipse-plugin nsight nvidia nvidia-docker openacc openmp pgi-compiler
Last synced: 14 Apr 2025
https://github.com/wi2trier/gpu-server
System configuration for a CUDA-based GPU server using Nix
cuda gpu nix server system-config ubuntu
Last synced: 17 Jan 2026
https://github.com/bryanoliveira/cellular-automata
A cellular automata program built with C++, OpenGL, CUDA and OpenMP.
cellular-automata cuda life opengl openmp
Last synced: 08 Sep 2025
https://github.com/elftausend/nvjpeg-rs
Rust bindings to the nvJPEG library.
bindings cuda ffi ffi-bindings ffi-wrapper image-processing jpg nvjpeg rust rust-lang
Last synced: 11 Apr 2025
https://github.com/elftausend/gradients
Deep Learning library written in Rust (OpenCL, CUDA & CPU)
cpu cuda deep-learning gpu gpu-acceleration machine-learning mlp neural-networks opencl rust
Last synced: 11 Apr 2025
https://github.com/drsnowbird/cuda-pytorch-docker
Nvidia CUDA for GPU + PyTorch (latest) in Docker
cuda deep-learning docker gpu jupyter-notebook nvidia-gpu pytorch ssl-proxy
Last synced: 10 Apr 2025
https://github.com/hejia-zhang/libwave
C++ library for hardware-accelerated video stream decoding
cuda ffmpeg gpu video-decoding video-streaming
Last synced: 15 Apr 2025
https://github.com/kerneltuner/kernel_float
CUDA/HIP header-only library for writing vectorized and low-precision (16-bit, 8-bit) GPU kernels
bfloat16 cpp cuda floating-point gpu half-precision header-only-library hip kernel-tuner low-precision mixed-precision performance reduced-precision vectorization
Last synced: 12 Apr 2025
https://github.com/joaomlneto/cpds-heat
Heat Equation using different solvers (Jacobi, Red-Black, Gaussian) in C using different paradigms (sequential, OpenMP, MPI, CUDA) - Assignments for the Concurrent, Parallel and Distributed Systems course @ UPC 2013
cuda cuda-support gauss-seidel gaussian heat-equation jacobi mpi mpi-applications openmp openmp-applications openmp-parallelization openmp-support openmpi paradigms performance red-black solvers
Last synced: 22 Apr 2025
https://github.com/microsoft/hat
TOML-annotated C header file format for packaging binary files, from Microsoft Research
benchmarking cpp cprogramming cuda metadata platform-independent python-library rocm toml
Last synced: 10 Apr 2025
https://github.com/guangyancai/isoext
GPU isosurface extraction with pytorch support.
cuda dual-contouring isosurface-extraction marching-cubes nanobind python pytorch thrust
Last synced: 23 Apr 2025
https://github.com/qengineering/tensorflow-addons-jetson-nano
TensorFlow Addons installation wheels for Jetson Nano
aarch64 cuda cudnn installation-wheel jetson-nano linux python3 tensorflow-addons wheel
Last synced: 10 Jun 2025
https://github.com/tcoppex/cudaraster-linux
Linux port of cudaraster, Nvidia's GPU rasterizer.
Last synced: 14 Apr 2025
https://github.com/tigercosmos/simple-vgg16-cu
Simple VGG16 implemented in CUDA
Last synced: 24 Jul 2025
https://github.com/nyo16/llama_cpp_ex
Elixir bindings for llama.cpp — run LLMs locally with Metal, CUDA, Vulkan, or CPU. Streaming, chat templates, embeddings, structured output, and concurrent batched inference.
Last synced: 04 Jun 2026
https://github.com/neoheartbeats/neoheartbeats-kernel
An architecture for LLMs' continual-learning and long-term memories
cuda fine-tuning llama-factory llm
Last synced: 05 May 2025
https://github.com/basemax/predictionwikipediamathematicsvisitsresearch
Improving Prediction of Daily Visits of Wikipedia Mathematics Topics using Graph Neural Networks
article convgru cuda gconvgru graph-neural-network graph-neural-networks neural-network neural-network-architectures neural-network-example neural-network-tutorials neural-networks python pytorch research torch wikipedia
Last synced: 05 May 2025
https://github.com/raymondcm/blockmatching
CPU and CUDA implementation of Full Exhaustive Block Matching Algorithm using Integral Images
block-matching-algorithm cuda integral-image parallel vision
Last synced: 27 Apr 2025
https://github.com/aresio/cupsoda
cupSODA is a CUDA-powered coarse-grain deterministic simulator of mass-action kinetics models
biochemical cuda gpu-computing mass-action simulation
Last synced: 21 Feb 2026
https://github.com/nexusgpu/tensor-fusion-site
TensorFusion landing page and product docs
ai cuda gpu gpu-acceleration gpu-management gpu-monitoring gpu-pooling gpu-sharing gpu-usage gpu-virtualization nvidia nvidia-cuda pytorch rcuda tensorflow
Last synced: 31 Jul 2025
https://github.com/frgfm/torch-cuda-template
Template for CUDA / C++ extension writing with PyTorch
cpp cuda pytorch pytorch-extension
Last synced: 31 Jul 2025
https://github.com/NCAR/micm
A model-independent chemistry module for atmosphere models
atmospheric-chemistry atmospheric-modeling atmospheric-science cuda gpu gpu-acceleration hpc ode-solver
Last synced: 20 Jul 2025
https://github.com/Eve-ning/glcm-cupy
GLCM in CUDA
computer-vision cuda cupy feature-engineering glcm python
Last synced: 15 Mar 2025
https://github.com/chrxh/alien-docs
Documentation for ALIEN
cuda evolution physics-simulation simulation
Last synced: 24 Jun 2025
https://github.com/rmiguelkelly/quickcluster
A k-means implementation in C++ with Python bindings and GPU acceleration
clustering clustering-algorithm cpp cuda gpu kmeans kmeans-clustering metal objective-c python python3 unsupervised-learning
Last synced: 26 Jul 2025
https://github.com/abus-aikorea/aria-coversong
The best Gradio web UI for creating cover songs using MDX-Net and RVC. Easy one-click installation. Fully portable.
cuda demucs gradio karaoke mdx-net nvidia python pytorch rvc song-covers uvr vocal-remover voice-conversion
Last synced: 25 Apr 2025
https://github.com/benediktalkin/kappaprofiler
Lightweight, simple profiling for Python/PyTorch
Last synced: 19 Jul 2025
https://github.com/bhattbhavesh91/cudf-rapids-demo
A simple demo of cuDF which is a RAPIDS GPU-Accelerated Dataframe Library!
arrow cuda cudf demo gpu gpu-dataframe pandas python rapids
Last synced: 17 Apr 2025
https://github.com/p-ranav/vulkan-earth
Vulkan-based 3D Rendering of Earth
3d cuda engine gpu rendering simulation vulkan
Last synced: 05 May 2025
https://github.com/marcogarlet/cuda_cubeattack
CUDA implementation of Cube Attack
Last synced: 28 Oct 2025
https://github.com/kabir5296/deep-learning-setup-for-ubuntu-guide
CUDA, cuDNN, NVIDIA Driver, and PyTorch Installation for Ubuntu
cuda cudnn deeplearning nlp python pytorch
Last synced: 15 Mar 2025
https://github.com/jaisw7/dgfs1d_gpu
Discontinuous Galerkin Fast Spectral (DGFS) in one dimension
boltzmann computational-fluid-dynamics cuda dgfs diffusion-process discontinuous-galerkin fast-spectral flow-transport gas-dynamics gpu-computing heat-transfer massively-parallel multi-species python3
Last synced: 02 Feb 2026
https://github.com/yomi4486/zundamon_v3
Bartender, a shot of ice water, please.
cuda discord-bot discord-py docker docker-compose python tts voicevox zundamon
Last synced: 14 Apr 2025
https://github.com/stonerlab/jams
JAMS: a GPU accelerated atomistic spin dynamics code
c-plus-plus cuda heisenberg-model leeds-university magnetism physics-simulation simulation spin-dynamics
Last synced: 28 Feb 2026
https://github.com/radenmuaz/slope-ad
A small automatic differentiation engine, supporting higher-order derivatives
array autograd automatic-differentiation cuda gradient iree jvp machine-learning metal mlir onnx onnxruntime tensor vjp
Last synced: 26 Jun 2025
https://github.com/hrntsm/ghgpucomputingtest
Test using CUDA with Alea GPU in Grasshopper.
Last synced: 14 Apr 2025