An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/mkeeter/mpr

Reference implementation for "Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces" (SIGGRAPH 2020)

cad cuda gpu implicit-surfaces rendering

Last synced: 16 Mar 2025

https://github.com/QINZHAOYU/CudaSteps

A CUDA learning journey based on the book "CUDA Programming: Basics and Practice" (《CUDA编程：基础与实践》) by Zheyong Fan.

cuda gpu nvidia

Last synced: 14 May 2025

https://github.com/rapidsai/node

GPU-accelerated data science and visualization in node

cuda data-science data-visualization gpgpu gpu nodejs

Last synced: 16 May 2025

https://github.com/guoriyue/3dgs-warp-scratch

Build 3D Gaussian Splatting from scratch with NVIDIA Warp in Python — CPU/GPU compatible, with a clean and minimalist design focused on learning modern graphics.

3dgs build-from-scratch cuda graphics nerf nvidia-warp python

Last synced: 05 Mar 2026

https://github.com/dividiti/ck-caffe

Collective Knowledge workflow for Caffe to automate installation across diverse platforms and to collaboratively evaluate and optimize Caffe-based workloads across diverse hardware, software and data sets (compilers, libraries, tools, models, inputs).

accuracy android caffe collaborative-optimization collective-knowledge costs cuda customizable-workflows dnn-as-a-service dnn-optimization json-api linux opencl performance-portability portable-package-manager reproducible-experiments resources windows

Last synced: 04 May 2025

https://github.com/nobuyuki83/delfem2

Research prototyping framework for physics simulation written in C++

cuda fem-simulation finite-element-methods geometry-processing opengl physics-simulation simulation

Last synced: 06 Apr 2025

https://github.com/hijkzzz/cuda-neural-network

Convolutional Neural Network with CUDA (MNIST 99.23%)

cnn cpp cuda mnist neural-network

Last synced: 14 Jul 2025

https://github.com/zhongkaifu/seq2seqsharp

Seq2SeqSharp is a fast, flexible tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and others), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

attention-model cuda deep-learning encoder-decoder gpu image lstm machine-translation neural-network seq2seq sequence-to-sequence tensor text transformer transformer-architecture transformer-encoder translation vision-transformer

Last synced: 04 Apr 2025

https://github.com/proger/accelerated-scan

Accelerated First Order Parallel Associative Scan

cuda cumulative-sum recurrent-neural-networks state-space-model torch

Last synced: 25 Dec 2025

https://github.com/nvidia/gmat

A toolkit showing the GPU's all-round capability in video processing

codec cpp cuda deep-learning ffmpeg gpu image-processing nvidia video video-codec

Last synced: 11 May 2025

https://github.com/xlite-dev/ffpa-attn

📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 11 Jun 2025

https://github.com/toruniina/lbvh

an implementation of parallel linear BVH (LBVH) on GPU

bvh cuda gpu nearest-neighbor-search parallel thrust

Last synced: 21 Aug 2025

https://github.com/HMUNACHI/CUDATutorials

Zero-to-hero GPU and CUDA tutorials for maths and ML, with examples.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 24 Apr 2025

https://github.com/HMUNACHI/henry-vjp

From zero to hero CUDA for accelerating maths and machine learning on GPU.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 05 Apr 2025

https://github.com/HMUNACHI/cuda-tutorials

CUDA tutorials for maths and ML with examples, covering multi-GPU, fused attention, Winograd convolution, and reinforcement learning.

cuda cuda-kernels cuda-programming machine-learning maths

Last synced: 13 May 2025

https://github.com/lxxue/frnn

Fixed Radius Nearest Neighbor Search on GPU

cuda nearest-neighbor-search pytorch

Last synced: 17 Mar 2026

https://github.com/xmrminer/xmrminer

🐜 A CUDA-based miner for Monero

cuda gpu monero nvidia xmr

Last synced: 07 May 2025

https://github.com/cuMF/cumf_als

CUDA Matrix Factorization Library with Alternating Least Squares (ALS)

als cuda gpu machine machine-learning matrix-factorization

Last synced: 04 May 2025

https://github.com/zpzim/scamp

The fastest way to compute matrix profiles on CPU and GPU!

cuda gpu matrix-profile python time-series time-series-analysis

Last synced: 05 Apr 2025

https://github.com/eth-cscs/cosma

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

communication-optimal cuda gpu-acceleration linear-algebra matmul matrix-multiplication mpi pdgemm rocm scalapack

Last synced: 02 Mar 2026

https://github.com/pykeio/diffusers

A modular Rust library for super fast Stable Diffusion inference - 45% faster than PyTorch 🔮

cuda diffusion-models onnx onnxruntime onnxruntime-gpu rust stable-diffusion stable-diffusion-v2

Last synced: 28 Mar 2025

https://github.com/yilingqiao/dmrf

Dynamic Mesh-Aware Radiance Fields (ICCV 2023): ray-traced rendering and interactive simulation of meshes with NeRF

cuda nerf raytracing simulation

Last synced: 09 Apr 2025

https://github.com/invergent-ai/surogate

Training/Fine-tuning at the speed of light

cuda deep-learning fine-tuning generative-ai llama llm llms nvidia-gpu qwen sft

Last synced: 10 May 2026

https://github.com/rocm/rocprim

ROCm Parallel Primitives

amd cuda gpu hip parallel primitive rocm

Last synced: 02 Apr 2026

https://github.com/sjtu-ipads/phoenixos

Fast OS-level support for GPU checkpoint and restore

checkpoint-restore criu cuda gpu

Last synced: 05 Apr 2025

https://github.com/cnugteren/cltune

CLTune: An automatic OpenCL & CUDA kernel tuner

auto-tuning cuda opencl tuner

Last synced: 21 Aug 2025

https://github.com/jimver/cuda-toolkit

GitHub Action to install CUDA

action cuda cuda-toolkit github-actions nvidia nvidia-cuda

Last synced: 14 Apr 2025

https://github.com/librapid/librapid

A highly optimised C++ library for mathematical applications and neural networks.

array cpp cpp20 cpp23 cuda gpu high-performance-computing library matrix multidimensional-arrays multithreading parallel-programming pypy pypy3 python python3 simd

Last synced: 27 Mar 2026

https://github.com/qengineering/install-opencv-jetson-nano

OpenCV installation script with CUDA and cuDNN support

cuda cudnn jetson-nano jetson-xavier opencv opencv4

Last synced: 04 Apr 2025

https://github.com/dvlab-research/sparsetransformer

A fast and memory-efficient library for sparse transformers with varying token numbers (e.g., 3D point clouds).

3d-point-cloud cuda sparse-transformer transformer

Last synced: 03 Jul 2025

https://github.com/rocm/gpufort

GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify

cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm

Last synced: 21 Jun 2025

https://github.com/deftruth/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) SRAM complexity large headdim (D > 256), ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 06 Apr 2025

https://github.com/zju3dv/envgs

[CVPR 2025] EnvGS: Modeling View-Dependent Appearance with Environment Gaussian

2dgs 3dgs cuda optix path-tracing ray-tracing reflection

Last synced: 05 Apr 2025

https://github.com/xlite-dev/ffpa-attn-mma

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.

attention cuda deepseek deepseek-r1 deepseek-v3 flash-attention flash-mla fused-mla mla mlsys sdpa tensor-cores

Last synced: 30 Mar 2025

https://github.com/pythonlessons/tensorflow-object-detection-tutorial

This tutorial shows how to install and prepare the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch.

classifier cuda cudnn detection detection-api detection-classifier detection-tutorial gpu grabscreen labels object-detection pil python-mss tensorflow tensorflow-cpu tensorflow-gpu tensorflow-models tutorial

Last synced: 23 Oct 2025

https://github.com/kibae/onnxruntime-server

ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

ai contributions-welcome cuda deep-learning inference-server machine-learning nueral-networks onnx onnxruntime

Last synced: 05 Apr 2025

https://github.com/lucasdelimanogueira/PyNorch

Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!)

c cuda deep-learning neural-network python pytorch

Last synced: 15 Sep 2025

https://github.com/cp2k/dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library

blas cp2k cuda gemm hpc linear-algebra matrix-multiplication mpi openmp-parallelization sparse-matrix

Last synced: 21 Feb 2026

https://github.com/eth-cscs/implicitglobalgrid.jl

Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid

cuda distributed gpu julia julia-mpi-wrapper mpi multi-gpu staggered-grids stencil-codes

Last synced: 04 Apr 2025

https://github.com/rocm/hipblas

[DEPRECATED] Moved to ROCm/rocm-libraries repo

blas cuda hip rocm

Last synced: 02 Apr 2026

https://github.com/patwie/cuda-design-patterns

Some CUDA design patterns and a bit of template magic for CUDA

bazel cpp11 cuda cuda-development cuda-device cuda-kernels cuda-utils gpu template-metaprogramming

Last synced: 14 Apr 2025

https://github.com/dr-noob/gpufetch

Simple yet fancy GPU architecture fetching tool

cuda gpu igpu intel nvidia

Last synced: 13 Apr 2025

https://github.com/dizcza/docker-hashcat

Latest hashcat docker for CUDA, OpenCL, and POCL. Deployed on Vast.ai

cuda docker hashcat nvidia opencl pocl vast-ai

Last synced: 01 Apr 2025

https://github.com/chenhunghan/ialacol

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

ai cloudnative cuda ggml gptq gpu helm kubernetes langchain llamacpp llm llm-inference llm-serving openai python

Last synced: 30 Sep 2025

https://github.com/bobmcdear/attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

cuda deep-learning machine-learning openai openai-triton pytorch triton

Last synced: 22 Aug 2025

https://github.com/jamjamjon/usls

A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.

clip cuda florence2 grounding-dino imshow moondream ocr onnx onnxruntime rust-yolo sam sapiens smolvlm tensorrt yolo yolo-rs yolo-rust yolov10 yolov11 yolov8

Last synced: 16 May 2025

https://github.com/fangq/mcx

Monte Carlo eXtreme (MCX) - GPU-accelerated photon transport simulator

3d c cuda matlab monte-carlo optical-imaging pascal photon-transport physics-simulation ray-tracing volumetric-rendering voxel-based

Last synced: 15 May 2025

https://github.com/libmir/dcompute

DCompute: Native execution of D on GPUs and other Accelerators

cuda d fpga gpgpu gpu ldc opencl

Last synced: 20 Aug 2025

https://github.com/1461521844lijin/trt_yolo_video_pipeline

A multi-stream, multi-GPU, multi-instance parallel video analysis and processing example built on TensorRT and the YOLO series.

cuda ffmpeg opencv video-processing yolo yolov8

Last synced: 18 Jul 2025

https://github.com/devnen/dia-tts-server

Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), support for SafeTensors/BF16, voice cloning, dialogue generation, and GPU/CPU execution.

ai api-server audio-generation cuda dia dia-tts dialogue-tts fastapi huggingface openai-api python pytorch speech-synthesis speech-synthesis-api text-to-speech tts tts-api voice-cloning web-ui

Last synced: 06 May 2025

https://github.com/chonspqx/modulated-deform-conv

deformable convolution 2D 3D DeformableConvolution DeformConv Modulated Pytorch CUDA

cuda cuda-extension deform-conv3d deformable-convolutional deformable-convolutional-networks python pytorch

Last synced: 07 Jul 2025

https://github.com/mathiasbourgoin/spoc

Stream Processing with OCaml

cuda gpgpu ocaml opencl spoc

Last synced: 10 Apr 2025

https://github.com/goofit/goofit

Code repository for the massively-parallel framework for maximum-likelihood fits, implemented in CUDA/OpenMP

cuda fitting gpu gpu-computing omp physics root-cern thrust

Last synced: 10 Apr 2025

https://github.com/roastduck/FreeTensor

A language and compiler for irregular tensor programs.

ast automatic-differentiation code-generation cuda gpu jit openmp tensor

Last synced: 11 Apr 2025

https://github.com/fhamborg/newsmtsc

Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.

cuda dataset deep-learning news-articles pytorch sentiment-analysis sentiment-classification text-classification tsc

Last synced: 07 Apr 2025

https://github.com/cgtuebingen/ggnn

GGNN: State of the Art Graph-based GPU Nearest Neighbor Search

ann approximate-nearest-neighbor-search cuda gpu nearest-neighbor-search vector-database vector-db

Last synced: 20 Nov 2025

https://github.com/openmlsys/openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

cuda gpu machine-learning

Last synced: 08 Oct 2025

https://github.com/JulianAssmann/opencv-cuda-docker

Dockerfiles for OpenCV compiled with CUDA, opencv_contrib modules and Python 3 bindings

cuda docker gpu nvidia opencv

Last synced: 06 Apr 2025

https://github.com/charlesq34/diy-deep-learning-workstation

Build a deep learning workstation from scratch (HW & SW).

cuda deep-learning gpu ubuntu workstations

Last synced: 25 Feb 2026

https://github.com/anicetngrt/jiro-nn

A Deep Learning and preprocessing framework in Rust with support for CPU and GPU.

adam classification cuda data-analysis deep-learning dropout gpu gpu-computing machine-learning ml nalgebra neural-networks nn opencl pipelines regression rust sgd

Last synced: 09 Apr 2025

https://github.com/rsnk96/Ubuntu-Setup-Scripts

Scripts to help you set up your Ubuntu quickly, especially if you're in any subfield of Data Science or AI!

anaconda cuda deep-learning deeplearning dl ffmpeg installers ml opencv python pytorch tensorflow tensorflow-setup ubuntu zsh

Last synced: 07 Apr 2025

https://github.com/acdslab/mppi-generic

Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)

cpp cuda model-predictive-control model-predictive-path-integral robotics stochastic-optimization

Last synced: 05 Apr 2025

https://github.com/glotzerlab/fresnel

Publication quality path tracing in real time.

cuda optix path-tracing python simulation soft-matter

Last synced: 13 Oct 2025

https://github.com/ROCm/hipBLAS

ROCm BLAS marshalling library

blas cuda hip rocm

Last synced: 23 Jul 2025

https://github.com/naeioi/pbf-cuda

Position Based Fluids CUDA implementation

cuda fluid-solver opengl real-time simulation

Last synced: 26 Apr 2025

https://github.com/qdLMF/LIO-SAM-GPU-ScanToMapOpt

A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map (inspired by iVox of Faster-LIO) on GPU is used to accelerate 5-neighbour KNN search.

3d-mapping cuda faster-lio gpu ivox knn lidar lidar-inertial-odometry lidar-slam lio lio-sam loam slam

Last synced: 18 Mar 2025

https://github.com/ihhub/penguinv

Computer vision library with focus on heterogeneous systems

avx computer-vision cpp cuda gpu hacktoberfest heterogeneous-systems image-processing opencl python simd sse thread-pool

Last synced: 30 Oct 2025

https://github.com/inoryy/tensorflow-optimized-wheels

TensorFlow wheels built for the latest CUDA/cuDNN with performance flags enabled: SSE, AVX, FMA, and XLA

avx2 cuda cudnn python sse tensorflow tensorflow-gpu tensorflow-wheels wheels xla

Last synced: 02 Apr 2025

https://github.com/psmarter/mini-infer

A high-performance large-model inference engine based on PagedAttention (refactoring in progress).

ai cuda deep-learning gpu inference language-model llm machine-learning pagedattention python pytorch transformer triton

Last synced: 02 Apr 2026

https://github.com/gpmueller/eigen-cuda

MWE for using the Eigen library in CUDA kernels

cuda eigen eigen-cuda mwe

Last synced: 14 Apr 2025

https://github.com/MuGdxy/muda

μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.

cuda cuda-cpp cuda-programming

Last synced: 09 Jul 2025

https://github.com/rocm/rocrand

RAND library for HIP programming language

cuda gpu hip random rng rocm

Last synced: 02 Apr 2026

https://github.com/nirw4nna/dsc

Tensor library & inference framework for machine learning

cuda gpu large-language-models machine-learning pytorch tensor-algebra

Last synced: 21 Jan 2026

https://github.com/sniklaus/pytorch-extension

an example of a CUDA extension for PyTorch using CuPy which computes the Hadamard product of two tensors

cuda cupy deep-learning python pytorch

Last synced: 25 Dec 2025

https://github.com/src-d/minhashcuda

Weighted MinHash implementation on CUDA (multi-gpu).

cuda lsh machine-learning minhash

Last synced: 09 Apr 2025

https://github.com/arbor-sim/arbor

The Arbor multi-compartment neural network simulation library.

cuda gpu hip hpc modern-cpp mpi neuroscience

Last synced: 16 May 2025