An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/arbor-sim/arbor

The Arbor multi-compartment neural network simulation library.

cuda gpu hip hpc modern-cpp mpi neuroscience

Last synced: 16 May 2025

https://github.com/cgtuebingen/Flex-Convolution

Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018

3d-point-clouds cuda deep-learning deep-neural-networks machine-learning pointcloud pointcloudprocessing research research-paper segmentation shapenet tensorflow

Last synced: 20 Mar 2025

https://github.com/mseitzer/gpu-monitor

Script to remotely check GPU servers for free GPUs

cuda cudnn deep-learning gpu nvidia-smi remote ssh

Last synced: 12 Apr 2025

https://github.com/pytorch/extension-script

Example repository for custom C++/CUDA operators for TorchScript

cpp cuda pytorch

Last synced: 30 Sep 2025

https://github.com/pennylaneai/pennylane-lightning

The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 15 May 2025

https://github.com/juliagpu/acceleratedkernels.jl

Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 04 Apr 2025

https://github.com/ROCm/rocRAND

RAND library for the HIP programming language

cuda gpu hip random rng rocm

Last synced: 18 Aug 2025

https://github.com/mind-inria/mri-nufft

Doing non-Cartesian MR Imaging has never been so easy.

cuda gpu mri mri-reconstruction nufft numerical-methods numpy tensorflow torch

Last synced: 02 Mar 2026

https://github.com/pwhiddy/fat-clouds

GPU Fluid Simulation with Volumetric Rendering

blaze cuda fluid gpu navier-stokes raymarching smoke

Last synced: 12 Jul 2025

https://github.com/PennyLaneAI/pennylane-lightning

The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 11 May 2025

https://github.com/cryinkfly/rhinoceros-3d-for-linux

This project gives you a way to use Rhino 3D on Linux!

archlinux cuda fedora international linux linuxmint manjaro nvidia opengl opensuse ubuntu wine

Last synced: 06 Apr 2025

https://github.com/JuliaGPU/AcceleratedKernels.jl

Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 17 Mar 2025

https://github.com/windqaq/mpm

Simulation on the GPU using the Material Point Method, with rendering.

cuda gvdb-voxels material-point-method raytracer simulation thrust

Last synced: 23 Mar 2025

https://github.com/cair/tmu

Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.

absorbing-states autoencoder convolution cuda gpu incremental incremental-computation multi-output pattern-recognition propositional-logic regression relational-logic sparse tsetlin-machine

Last synced: 09 Apr 2025

https://github.com/ANNetGPGPU/ANNetGPGPU

A GPU (CUDA) based Artificial Neural Network library

c-plus-plus-11 cuda propagation-network self-organizing-map

Last synced: 23 Apr 2025

https://github.com/leimao/CUDA-GEMM-Optimization

CUDA Matrix Multiplication Optimization

cuda

Last synced: 01 Aug 2025

https://github.com/bazel-contrib/rules_cuda

Starlark implementation of bazel rules for CUDA.

bazel cuda starlark

Last synced: 15 Feb 2026

https://github.com/arrayfire/arrayfire-ml

ArrayFire's Machine Learning Library.

arrayfire cpp11 cuda opencl

Last synced: 16 Jul 2025

https://github.com/cuihaoleo/gpg-fingerprint-filter-gpu

Generate OpenPGP keys with fingerprints that match a specific pattern (a.k.a. vanity keys)

cuda gnupg gpg gpu pgp vanity

Last synced: 15 Apr 2025

https://github.com/mitulgarg/env-doctor

Debug your GPU, CUDA, and AI stacks across local, Docker, and CI/CD (CLI and MCP server)

compatibility-tool cuda cuda-library cuda-support cuda-toolkit cudnn gpu-acceleration mcp-server nvidia-driver nvidia-gpu nvidia-smi pytorch wsl2

Last synced: 02 Apr 2026

https://github.com/ayutaz/piper-plus

Multilingual neural TTS (6 languages: JA/EN/ZH/ES/FR/PT) — C++, C# (.NET), Rust, Python SDKs. VITS + Prosody, streaming, CUDA/CoreML/DirectML, custom dictionaries. Install: pip install piper-tts-plus | dotnet tool install PiperPlus.Cli | cargo install piper-plus-cli

cross-platform csharp cuda deep-learning dotnet japanese multilingual nuget onnx pytorch rust speech-synthesis streaming text-to-speech tts vits webassembly

Last synced: 01 Apr 2026

https://github.com/nvidia-merlin/hierarchicalkv

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on high-bandwidth memory (HBM) of GPUs and in host memory. It also can be used as a generic key-value storage.

cuda dynamic-embedding embedding-storage gpu hashtable key-value-store recommender-system

Last synced: 10 Apr 2025

https://github.com/upul/aurora

A minimal deep learning library written in Python/Cython/C++ and NumPy/CUDA/cuDNN.

cplusplus cuda cudnn cython deep-learning python3 system-design

Last synced: 08 May 2025

https://github.com/nvidia/nvimagecodec

The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface

computer-vision cpp cuda dali data-processing deep-learning fast-data-pipeline gpu image-processing machine-learning nvidia python pytorch

Last synced: 12 Jan 2026

https://github.com/mitmul/pynvvl

A Python wrapper of NVIDIA Video Loader (NVVL) with CuPy for fast video loading with Python

cuda cupy gpu numpy nvidia-video-loader nvvl python video video-processing

Last synced: 07 Oct 2025

https://github.com/harrism/mini-nbody

A simple gravitational N-body simulation in less than 100 lines of C code, with CUDA optimizations.

astrophysics benchmark cuda nbody

Last synced: 06 Oct 2025

https://github.com/coreylowman/llama-dfdx

LLaMa 7b with CUDA acceleration implemented in Rust. Minimal GPU memory needed!

cuda deep-learning inference language-model llama neural-network rust rust-lang

Last synced: 13 Apr 2025

https://github.com/osai-ai/dokai

Collection of Docker images for ML/DL and video processing projects

cuda deep-learning docker docker-image ffmpeg opencv python pytorch tensorrt video-processing

Last synced: 04 Apr 2025

https://github.com/meiqua/pose_refine

CUDA ICP for 6D pose estimation

cuda icp renderer

Last synced: 14 Apr 2025

https://github.com/sonots/cumo

Cumo (pronounced like "koomo") is a CUDA-aware numerical library whose interface is highly compatible with Ruby Numo

cuda numo ruby scicentific-computing

Last synced: 10 Apr 2025

https://github.com/1ytic/pytorch-edit-distance

Levenshtein edit-distance on PyTorch and CUDA

asr cuda edit-distance levenshtein nlp pytorch

Last synced: 19 Jun 2025

https://github.com/bdusell/singularity-tutorial

Tutorial for using Singularity containers

container cuda cudnn pytorch singularity

Last synced: 09 Apr 2025

https://github.com/openmm/nnpops

High-performance operations for neural network potentials

cuda gpu machine-learning molecular-dynamics molecular-modeling

Last synced: 20 Jun 2025

https://github.com/sasagawa888/deeppipe2

Deep Learning library using GPU (CUDA/cuBLAS)

cublas cuda deep-learning elixir gpu

Last synced: 15 Mar 2025

https://github.com/openmm/NNPOps

High-performance operations for neural network potentials

cuda gpu machine-learning molecular-dynamics molecular-modeling

Last synced: 04 May 2025

https://github.com/braintwister/docker-devel-env

Fast, reproducible, and portable software development environments

clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode

Last synced: 23 Oct 2025

https://github.com/ashvardanian/parallelreductionsbenchmark

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!

apple avx512 cuda glsl gpgpu gpu gpu-acceleration gpu-computing hpc intel metal nvidia opencl openmp parallel simd stl tbb thrust

Last synced: 06 Apr 2025

https://github.com/mitsuba-renderer/drjit-core

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering (core library)

cuda jit llvm

Last synced: 05 Apr 2025

https://github.com/eomii/rules_ll

An Upstream Clang/LLVM-based toolchain for contemporary C++ and heterogeneous programming

bazel bleeding-edge build-system clang clang-tidy cpp cuda gpu-programming hermetic hip llvm nix openmp remote-caching remote-execution reproducible sanitizers

Last synced: 06 Apr 2025

https://github.com/BrainTwister/docker-devel-env

Fast, reproducible, and portable software development environments

clang cmake conan cuda development docker eclipse gcc jenkins nsight portability reproducibility vscode

Last synced: 06 Aug 2025

https://github.com/esemeniuc/openpose-docker

A Docker build file for CMU OpenPose with Python API support

cuda deep-learning deep-neural-networks docker openpose pose-estimation python

Last synced: 10 Feb 2026

https://github.com/sunsetquest/cudapad

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.

cuda cuda-programming gpu nvidia ptx ptx-utils windows

Last synced: 25 Jul 2025

https://github.com/ttsiodras/mandelbrotsse

Real-time Mandelbrot zoom via SSE, AVX, OpenMP, CUDA, XaoS...

avx cuda openmp sse

Last synced: 18 Oct 2025

https://github.com/kozyilmaz/nheqminer-macos

nheqminer for macOS with AVX and CUDA

apple cuda gpu-miner macos nheqminer osx zcash

Last synced: 24 Mar 2025

https://github.com/llnl/aluminum

High-performance, GPU-aware communication library

cpp cuda gpu hpc mpi

Last synced: 05 Apr 2025

https://github.com/shinmorino/sqaod

Solvers/annealers for simulated quantum annealing on CPU and CUDA (NVIDIA GPU).

accelearated cplusplus-11 cuda gpu linux monte-carlo-simulation nvidia-gpu python quantum-annealing quantum-computing windows

Last synced: 21 Aug 2025

https://github.com/puttsk/cuda-tutorial

A set of hands-on tutorials for CUDA programming

cuda tutorial

Last synced: 31 Jan 2026

https://github.com/WeltXing/PyDyNet

A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)

autograd cnn cuda cupy deep-learning-framework numpy python pytorch pytorch-implementation rnn transformer

Last synced: 01 Sep 2025

https://github.com/DevXT-LLC/ezlocalai

ezlocalai is an easy to set up local artificial intelligence server with OpenAI Style Endpoints.

ai artificial-intelligence cuda llamacpp local

Last synced: 21 Nov 2025

https://github.com/devxt-llc/ezlocalai

ezlocalai is an easy to set up local artificial intelligence server with OpenAI Style Endpoints.

ai artificial-intelligence cuda llamacpp local

Last synced: 21 Feb 2026

https://github.com/rxwei/cuda-swift

Parallel Computing Library for Linux and macOS & NVIDIA CUDA Wrapper

cublas cuda gpu parallel swift

Last synced: 29 Apr 2025

https://github.com/BobMcDear/neural-network-cuda

Neural network from scratch in CUDA/C++

cplusplus cuda deep-learning machine-learning neural-network

Last synced: 23 Aug 2025

https://github.com/spcl/daceml

A Data-Centric Compiler for Machine Learning

compiler cuda deep-learning fpga high-performance-computing machine-learning pytorch

Last synced: 10 Sep 2025

https://github.com/projectchrono/dem-engine

A dual-GPU DEM solver with complex grain geometry support

chrono cuda discrete-element-method gpu multi-gpu simulation

Last synced: 06 Apr 2025

https://github.com/fixstars/cuda-efficient-features

A CUDA implementation of keypoint detection and descriptor extraction

computer-vision cuda descriptors local-features robotics slam structure-from-motion

Last synced: 13 Apr 2025

https://github.com/juliagpu/gemmkernels.jl

Flexible and performant GEMM kernels in Julia

cuda gpu julia

Last synced: 24 Jul 2025

https://github.com/amypad/cuvec

Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory

array buffer c cpp cpu cpython cpython-api cpython-extensions cuda cxx gpu hacktoberfest pybind11 python swig vector

Last synced: 05 Apr 2025

https://github.com/zenustech/zpc

zenus parallel computing library for zenus physics-based simulations

cuda gpu hpc math physics simulation

Last synced: 06 Apr 2025

https://github.com/JuliaGPU/GemmKernels.jl

Flexible and performant GEMM kernels in Julia

cuda gpu julia

Last synced: 04 May 2025

https://github.com/pika-org/pika

pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support.

concurrency cplusplus cpp cuda gpu hip mpi p2300 parallelism rocm stdexec

Last synced: 30 Jan 2026

https://github.com/JuliaAttic/CUDArt.jl

Julia wrapper for CUDA runtime API

cuda gpu julia

Last synced: 22 Jul 2025

https://github.com/dbraun/pytorchtop

GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV

cuda libtorch opencv pytorch touchdesigner

Last synced: 15 Apr 2025

https://github.com/juliaattic/cudart.jl

Julia wrapper for CUDA runtime API

cuda gpu julia

Last synced: 17 Dec 2025

https://github.com/shi-yan/FreeWill

A deep learning library in C++/CUDA

cnn cuda deep-learning dnn machine-learning neural-network qt5

Last synced: 09 Jul 2025

https://github.com/nvidia-ai-iot/deepstream_libraries

DeepStream Libraries offer CVCUDA, NvImageCodec, and PyNvVideoCodec modules as Python APIs for seamless integration into custom frameworks.

computer-vision cuda cv-cuda data-processing gpu image-processing nvidia nvimagecodec pynvvideocodec pytorch

Last synced: 01 Apr 2026

https://github.com/zidage/alcedostudio

Open-source RAW photo processing and digital asset management software.

cpp cuda image-processing photo-editor photography raw-image

Last synced: 30 May 2026

https://github.com/rocm/rocalution

Next generation library for iterative sparse solvers for ROCm platform

cplusplus cuda fortran mpi opencl openmp solver sparse

Last synced: 05 Apr 2025

https://github.com/bobmcdear/neural-network-cuda

Neural network from scratch in CUDA/C++

cplusplus cuda deep-learning machine-learning neural-network

Last synced: 28 Jul 2025

https://github.com/una-dinosauria/local-search-quantization

State-of-the-art method for large-scale ANN search as of Oct 2016. Presented at ECCV 16.

computer-vision cuda eccv-16 gpu julia multi-codebook quantization

Last synced: 02 Aug 2025

https://github.com/maxilevi/raytracer

C++ raytracer that supports custom models. Supports running the calculations on the CPU using C++11 threads or on the GPU via CUDA.

bvh cuda graphics-programming intersection raytracer

Last synced: 27 Apr 2025

https://github.com/tmcdonell/cuda

Haskell FFI bindings to CUDA

cuda ffi-bindings haskell

Last synced: 06 Apr 2025

https://github.com/xlite-dev/hgemm

Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, and achieve peak performance.

cuda hgemm tensor-cores

Last synced: 11 Jun 2025

https://github.com/celeritas-project/celeritas

Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.

computational-physics cuda detector-simulation gpu hep high-energy-physics hip monte-carlo particle-transport

Last synced: 05 Apr 2025

https://github.com/unum-cloud/udisk

The fastest ACID-transactional persisted Key-Value store designed as modified LSM-Tree for NVMe block-devices with GPU-acceleration and SPDK to bypass the Linux kernel

cuda database io-uring iouring key-value key-value-store linux linux-kernel lsm-tree spdk

Last synced: 26 Feb 2026

https://github.com/a-new-bellhope/bellhopcuda

CUDA and C++ port of BELLHOP / BELLHOP3D underwater acoustics simulator

acoustics cuda hpc oceanography underwater-acoustics

Last synced: 07 Apr 2025

https://github.com/rocm/hipfort

Fortran interfaces for ROCm libraries

blas cuda fft fortran gpgpu gpu hip interoperability random rocm solver sparse

Last synced: 05 Apr 2025

https://github.com/open-atmos/pysdm

Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab

atmospheric-modelling atmospheric-physics cuda gpu gpu-computing monte-carlo-simulation numba nvrtc particle-system physics-simulation pint pypi-package python research simulation thrust

Last synced: 20 May 2026

https://github.com/elftausend/custos

A minimal OpenCL, CUDA, Vulkan and host CPU array manipulation engine / framework.

array-manipulations autograd automatic-differentiation cpu cuda cuda-support custos framework gpu lazy-evaluation no-std opencl rust vulkan wgsl

Last synced: 07 May 2025

https://github.com/nolanzzz/mtmct

Design and Implementation of a Multi-Target Multi-Camera Tracking Solution

cuda deep-learning detection machine-learning opencv python pytorch reidentification research-project resnet tracking

Last synced: 20 Mar 2025

https://github.com/celerity/ndzip

A High-Throughput Parallel Lossless Compressor for Scientific Data

compression cuda floating-point gpu simd sycl

Last synced: 11 Dec 2025

https://github.com/heethesh/computer-vision-and-deep-learning-setup

Tutorial on how to set up your system with an NVIDIA GPU and install deep learning frameworks like TensorFlow, Darknet for YOLO, Theano, and Keras; OpenCV; and NVIDIA drivers, CUDA, and cuDNN libraries on Ubuntu 16.04, 17.10, and 18.04.

cuda cudnn gpu install keras nvidia opencv python tensorflow ubuntu ubuntu1710

Last synced: 19 Apr 2025

https://github.com/yalue/cuda_scheduling_examiner_mirror

A tool for examining GPU scheduling behavior.

benchmark cuda cuda-kernels gpu gpu-scheduling mandelbrot

Last synced: 17 Mar 2026

https://github.com/pinto0309/dmhead

Dual model head pose estimation. Fusion of SOTA models. 360° 6D HeadPose detection. All pre-processing and post-processing are fused together, allowing end-to-end processing in a single inference.

6d cuda head-pose-estimation headpose-detection headpose-estimation models onnx tensorrt

Last synced: 30 Apr 2025

https://github.com/Ar-Ray-code/darknet_ros_fp16

darknet + ROS2 Humble + OpenCV4 + CUDA 11 (cuDNN, Jetson Orin)

cuda cudnn darknet object-detection opencv4 ros ros2-foxy yolo yolo-tiny yolov3 yolov7

Last synced: 20 Mar 2025

https://github.com/lukeyeager/cmake-cuda-example

Example of how to use CUDA with CMake >= 3.8

cmake cuda

Last synced: 25 Mar 2025

https://github.com/dr-bonez/tor-v3-vanity

A Tor v3 vanity URL generator designed to run on an NVIDIA GPU.

cuda gpu tor v3 vanity

Last synced: 16 Jan 2026

https://github.com/pypr/compyle

Execute a subset of Python on HPC platforms

cuda cython high-performance-computing opencl openmp python transpile

Last synced: 04 Apr 2025