An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/goldbattle/libelas-gpu

Implementation of LIBELAS in CUDA.

cpu cuda depth-maps gpu libelas libelas-gpu

Last synced: 10 Apr 2025

https://github.com/ar-ray-code/darknet_ros_fp16

darknet + ROS2 Humble + OpenCV4 + CUDA 11(cuDNN, Jetson Orin)

cuda cudnn darknet object-detection opencv4 ros ros2-foxy yolo yolo-tiny yolov3 yolov7

Last synced: 29 Jul 2025

https://github.com/ecrc/kblas-gpu

Subset of BLAS routines optimized for NVIDIA GPUs

blas cuda

Last synced: 01 Mar 2026

https://github.com/harrism/sublimetext-cuda-cpp

CUDA C++ package for Sublime Text 2 & 3

cuda snippets sublime-text tmlanguage

Last synced: 05 Aug 2025

https://github.com/iconben/z-image-studio

A CLI, a web UI, and an MCP server for the Z-Image-Turbo text-to-image generation model (Tongyi-MAI/Z-Image-Turbo base model as well as quantized models)

ai apple apple-silicon cuda diffusers localllm lora mcp-server mps python text-to-image text2image webui z-image z-image-turbo

Last synced: 15 Jan 2026

https://github.com/nickkarpowicz/lightwaveexplorer

An efficient, user-friendly solver for nonlinear light-matter interaction

c-plus-plus cuda nonlinear-optics oneapi optics-simulation simulation sycl

Last synced: 07 Feb 2026

https://github.com/ztxtech/Time-Evidence-Fusion-Network

Official implementation of "Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting" (https://arxiv.org/abs/2405.06419)

cuda deep-learning machine-learning macos neural-network neural-networks pytorch time-series time-series-analysis time-series-forecasting time-series-prediction uestc

Last synced: 01 Apr 2025

https://github.com/sh1ng/arboretum

Gradient Boosting powered by GPU (NVIDIA CUDA)

arboretum cuda gpu gradient-boosting gradient-boosting-machine machine-learning python

Last synced: 07 Nov 2025

https://github.com/owensgroup/bght

BGHT: High-performance static GPU hash tables.

cuckoo cuda gpu hashing hashmap

Last synced: 25 Jul 2025

https://github.com/parthenon-hpc-lab/kharma

Kokkos-based High-Accuracy Relativistic Magnetohydrodynamics with AMR

cuda gpu grmhd hip kokkos mhd openmp sycl

Last synced: 04 Jun 2026

https://github.com/gunrock/loops

🎃 GPU load-balancing library for regular and irregular computations.

cuda gpu gpu-computing hpc load-balancing parallel

Last synced: 09 May 2026

https://github.com/tomrunia/pytorchsteerablepyramid

PyTorch implementation of the Complex Steerable Pyramid

batch computer-vision cuda image-processing mkl pyramid pytorch

Last synced: 04 May 2025

https://github.com/bruce-lee-ly/cuda_hgemv

Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.

cublas cuda cuda-core gemm gemv gpu hgemm hgemv matrix-multiply nvidia tensor-core

Last synced: 17 Jun 2025

https://github.com/open-atmos/PySDM

Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab

atmospheric-modelling atmospheric-physics cuda gpu gpu-computing monte-carlo-simulation numba nvrtc particle-system physics-simulation pint pypi-package python research simulation thrust

Last synced: 04 Apr 2025

https://github.com/jpuigcerver/pytorch-baidu-ctc

PyTorch bindings for Baidu's Warp-CTC

ctc-loss cuda pytorch

Last synced: 11 Apr 2025

https://github.com/rapidsai/nx-cugraph

GPU Accelerated Backend for NetworkX

cuda graph networkx rapids

Last synced: 17 Mar 2026

https://github.com/bokutotu/zenu

A Deep Learning framework with very few dependencies, Written in Rust

ai autograd blas cublas cuda cudnn deep-learning deep-neural-networks gpu-computing hpc rust

Last synced: 05 Apr 2025

https://github.com/dakenf/stable-diffusion-nodejs

GPU-accelerated JavaScript runtime for Stable Diffusion. Uses a modified ONNX Runtime to support CUDA and DirectML.

cuda directml nodejs stable-diffusion typescript

Last synced: 26 Oct 2025

https://github.com/fynv/thrustrtc

CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.

cuda nvrtc thrust

Last synced: 07 Apr 2025

https://github.com/wizyoung/optical-flow-gpu-docker

Compute dense optical flow using TV-L1 algorithm with NVIDIA GPU acceleration.

cuda gpu optical-flow tvl1

Last synced: 07 May 2025

https://github.com/moinfra/sylvan

🌳 An educational modern C++ deep learning framework supporting CUDA

autograd cuda deep-learning-framework dnn machine-learning transformer

Last synced: 14 Jun 2026

https://github.com/saddam213/llamastack

ASP.NET Core Web, WebApi & WPF implementations for LLama.cpp & LLamaSharp

alpaca chatgpt cuda huggingface llama llama2 llamacpp llamasharp llm

Last synced: 30 Sep 2025

https://github.com/enp1s0/ozimmu

FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme

cuda gemm mixed-precision tensorcore tensorcores

Last synced: 09 Apr 2025

https://github.com/kibae/pg_onnx

pg_onnx: ONNX Runtime integrated with PostgreSQL. Perform ML inference with data in your database.

ai contributions-welcome cuda deep-learning inference machine-learning onnx onnxruntime postgresql postgresql-extension

Last synced: 28 Apr 2026

https://github.com/goldsborough/k-means

Code accompanying my blog post on k-means in Python, C++ and CUDA

cpp cuda k-means machine-learning parallel python

Last synced: 12 Oct 2025

https://github.com/denzp/rust-ptx-builder

Convenient `build.rs` helper for NVPTX crates

cuda nvptx rust

Last synced: 16 Mar 2025

https://github.com/brickray/gpu-pathtracer

Physically based path tracer on the GPU

cuda gpu pathtracing raytracing tracing

Last synced: 08 May 2025

https://github.com/xiaosong9905/cuda-optimization-guide

Xiao's CUDA Optimization Guide [Actively Adding New Content]

cuda gpu hpc nvidia-gpu optimization parallel-computing

Last synced: 15 May 2025

https://github.com/jeng1220/openacc_fortran_examples

Simple OpenACC Fortran Examples

cuda fortran openacc

Last synced: 22 Mar 2025

https://github.com/pikselkroken/pixlstash

PixlStash is a Python-based image management, tagging and editing web app leveraging AI tools for tagging, similarity checks, grouping and quality assessment. It has a REST-API and a VUE-based web frontend.

captioning-images comfyui comfyui-workflow cross-platform cuda docker-image image-classification image-database image-editing image-management image-manager image-tagging locally-hosted machine-learning picture-management pictures python self-hosted stable-diffusion vue

Last synced: 07 Jun 2026

https://github.com/adamtiger/tinyGPUlang

Tutorial on building a gpu compiler backend in LLVM

cuda llvm

Last synced: 31 Mar 2026

https://github.com/stereolabs/zed-docker

Docker images for the ZED SDK

cuda docker nvidia-docker zed-camera

Last synced: 22 Feb 2026

https://github.com/rokibulislaam/colab-ffmpeg-cuda

FFmpeg build with CUDA support for Linux (especially for Google Colab)

colab-notebook cuda ffmpeg ffmpeg-installer h264 h265 hevc-encoder nvenc ubuntu1804

Last synced: 14 Apr 2025

https://github.com/khrylx/dsgpuraytracing

A GPU-based ray tracer using CUDA

cuda gpu raytracer raytracing

Last synced: 11 Jul 2025

https://github.com/Natsu-Akatsuki/RangeNet-TensorRT

RangeNet++ with recent TensorRT versions (e.g. 8 to 10), libtorch, and CUDA programming.

cuda libtorch semantic-segmentation tensorrt

Last synced: 31 Jul 2025

https://github.com/loeeeee/immich-in-lxc

Install Immich in LXC with optional CUDA support

bare-metal cuda guide immich install-script lxc machine-learning proxmox-ve ubuntu

Last synced: 01 Oct 2025

https://github.com/luigifcruz/blade

Beamforming & Stuff ™

astronomy cuda dsp gpu

Last synced: 22 Jan 2026

https://github.com/juliafolds/foldscuda.jl

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

cuda gpu high-performance iterators julia map-reduce parallel transducers

Last synced: 13 Apr 2025

https://github.com/DefTruth/ffpa-attn-mma

📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉faster vs SDPA EA.

attention cuda flash-attention mlsys sdpa tensor-cores

Last synced: 09 Oct 2025

https://github.com/forceflow/cuda2glcore

Implementation of CUDA to OpenGL rendering

cuda graphics opengl rendering

Last synced: 21 Jan 2026

https://github.com/Par4All/par4all

Par4All is an automatic parallelizing and optimizing compiler (workbench) for C and Fortran sequential programs

abstract-interpretation automatic-parallelization c99 cuda fortran interprocedural opencl parallelization polyhedral-model

Last synced: 22 Apr 2025

https://github.com/emptysoal/cuda-image-preprocess

Speed up image preprocessing with CUDA when handling images or running TensorRT inference

cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt

Last synced: 01 Aug 2025

https://github.com/rbaygildin/learn-gpgpu

Algorithms implemented in CUDA + resources about GPGPU

cublas cuda curand gpgpu gpu gpu-computing image-processing nvidia opencl parallel-computing pycuda

Last synced: 14 May 2025

https://github.com/3dlg-hcvc/m3dref-clip

[ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects

3d clip computer-vision cuda deep-learning localization pytorch pytorch-lightning transformer visual-grounding

Last synced: 04 Aug 2025

https://github.com/ingonyama-zk/fast-danksharding

Danksharding Builder with GPU acceleration

cuda danksharding icicle rust

Last synced: 10 Apr 2025

https://github.com/ctuning/ctuning-programs

Collective Knowledge extension with unified and customizable benchmarks (with extensible JSON meta information) to be easily integrated with customizable and portable Collective Knowledge workflows. You can easily compile and run these benchmarks using different compilers, environments, hardware and OS (Linux, MacOS, Windows, Android).

c collaborative-benchmarking collaborative-optimization collective-knowledge common-benchmarks cpp crowd-benchmarking crowd-tuning cuda customizable-benchmarking fortran json-api json-metadata open-benchmarks opencl reproducible-research reproducible-workflows

Last synced: 10 Jan 2026

https://github.com/q-minh/physicsbasedanimationtoolkit

Cross-platform C++ library of algorithms and data structures commonly used in computer graphics research on physically-based simulation with Python bindings.

animation cmake cpp cuda gpu graphics physics python simulation

Last synced: 13 May 2025

https://github.com/stellar-group/octotiger

Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees

astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl

Last synced: 04 Jul 2025

https://github.com/yehengchen/ubuntu-deep-learning-environment-setup

Guide to installing TensorFlow with an NVIDIA GPU and setting up a deep learning environment - NVIDIA drivers/CUDA/cuDNN/tensorflow-gpu/Chinese documentation

cuda cudnn deep-learning nvidia-gpu tensorflow tensorflow-gpu ubuntu

Last synced: 05 May 2025

https://github.com/projectphysx/ptxprofiler

A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.

cuda gpu gpu-acceleration gpu-computing gpu-programming hpc nvidia nvidia-cuda nvidia-gpu opencl profiler ptx ptx-utils roofline-model sycl

Last synced: 10 Sep 2025

https://github.com/jefflarkin/openacc-interoperability

Interoperability examples for OpenACC.

cuda fortran gpu openacc

Last synced: 31 Jul 2025

https://github.com/1ytic/warp-rna

Recurrent Neural Aligner

cuda forward-backward rna rnn-transducer

Last synced: 14 Aug 2025

https://github.com/andi611/apriori-and-eclat-frequent-itemset-mining

Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions.

apriori apriori-algorithm cuda data-mining data-mining-algorithms eclat eclat-algorithm frequent-itemset-mining frequent-itemsets frequent-pattern-mining gcc gpu gpu-acceleration gpu-programming plot pycuda python transaction transactions

Last synced: 13 Apr 2025

https://github.com/kevinzakka/learn-cuda

Learning some parallel programming with CUDA

cuda gpu

Last synced: 24 Mar 2025

https://github.com/denzp/rust-ptx-linker

The missing puzzle piece for NVPTX experience with Rust

cuda linker llvm nvptx rust

Last synced: 16 Mar 2025

https://github.com/eth-cscs/spfft

Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support

cuda fft fft-library gpu-acceleration hpc mpi rocm

Last synced: 17 Jun 2025

https://github.com/abraham-ai/eden

Eden converts your Python function into a hosted endpoint with minimal changes to your existing code 🧙‍♂️

celery cuda fastapi python redis-client task-queue

Last synced: 23 Oct 2025

https://github.com/pwhiddy/pybind11-cuda

Template for GPU accelerated python libraries

cuda gpu numpy pybind11 python

Last synced: 13 Apr 2025

https://github.com/abhisheknair10/llama3.cu

Lightweight Llama 3 8B Inference Engine in CUDA C

cuda llama llm-inference

Last synced: 14 Apr 2025

https://github.com/passionlab/openequivariance

OpenEquivariance: a fast, open-source GPU JIT kernel generator for the Clebsch-Gordan Tensor Product.

cuda equivariance geometric-deep-learning graph-neural-networks sparse-tensors

Last synced: 16 Jan 2026

https://github.com/gangliao/bazel.cmake

bazel.cmake mimics the behavior of bazel to simplify the usability of CMake

bazel cmake cpp11 cuda golang

Last synced: 26 Jul 2025

https://quokka-astro.github.io/quokka/

Two-moment AMR radiation hydrodynamics (with self-gravity, particles, and chemistry) on CPUs/GPUs for astrophysics

adaptive-mesh-refinement astrochemistry astrophysics cuda gpu hip hydrodynamics particles rocm self-gravity

Last synced: 09 Mar 2025

https://github.com/govertb/GPUGraphLayout

An experimental GPU accelerated implementation of ForceAtlas2

cuda forceatlas2 gephi graph-algorithms graph-layout social-network-analysis visualization

Last synced: 04 Apr 2025

https://github.com/js1010/cusim

Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)

cuda gensim gpu lda topic-modeling w2v word-embedding

Last synced: 30 Apr 2025

https://github.com/flatironinstitute/jaxmg

JAXMg: A multi-GPU linear solver in JAX

cuda distributed-computing jax

Last synced: 17 Mar 2026

https://github.com/AstroAccelerateOrg/astro-accelerate

AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.

cuda gpu radio-astronomy

Last synced: 31 Mar 2025

https://github.com/safeailab/zkdl

zkDL, an open source toolkit for zero-knowledge proofs of deep learning powered by CUDA

cuda deep-neural-networks gpu-acceleration privacy-enhancing-technologies zero-knowledge-proof

Last synced: 17 Jan 2026

https://github.com/lucidrains/autoregressive-linear-attention-cuda

CUDA implementation of autoregressive linear attention, with all the latest research findings

artificial-intelligence attention-mechanisms cuda deep-learning linear-attention

Last synced: 09 Oct 2025

https://github.com/opendronemap/pypopsift

Python module for CUDA accelerated SIFT on GPUs

cuda gpu popsift python sift

Last synced: 25 Jun 2025

https://github.com/chiehpower/Setup-deeplearning-tools

Set up CI in DL/ cuda/ cudnn/ TensorRT/ onnx2trt/ onnxruntime/ onnxsim/ Pytorch/ Triton-Inference-Server/ Bazel/ Tesseract/ PaddleOCR/ NVIDIA-docker/ minIO/ Supervisord on AGX or PC from scratch.

agx ci cuda cudnn deep-learning docker installation minio nvidia onnx-simplifier onnx2trt onnxruntime paddleocr pytorch supervisord tensorrt tensorrt-inference-server tesseract-ocr triton-inference-server triton-server

Last synced: 20 Mar 2025

https://github.com/neur1n/x.h

Cross platform C/C++ utilities.

c cpp cross-platform cublas cuda logger logging

Last synced: 14 Jan 2026

https://github.com/autodesk/neon

Multi-GPU Framework for Voxel Grid Computations

cuda gpu gpu-acceleration grid hpc lbm parallel parallel-computing

Last synced: 21 Aug 2025

https://github.com/weft/warp

Continuous-energy Monte Carlo neutron transport in general geometries on GPUs

carlo cuda gpu monte monte-carlo neutron transport

Last synced: 04 Apr 2025

https://github.com/sskorol/vosk-api-gpu

Vosk ASR Docker images with GPU support for Jetson boards, PCs, M1 laptops and GCP

asr cuda docker gcp gpu jetson jetson-nano jetson-xavier-nx m1 nvidia nvidia-docker vosk vosk-api

Last synced: 23 Mar 2025

https://github.com/dansarie/sboxgates

Program for finding low gate count implementations of S-boxes.

cryptanalysis cuda logic-circuit mpi

Last synced: 21 Feb 2026

https://github.com/andravin/spio

Memory-Efficient CUDA kernels for training ConvNets with PyTorch.

convolutional-neural-networks cuda pytorch

Last synced: 14 Jul 2025

https://github.com/adrianpangithub/houdinipackage

A few small parts of my personal, daily-use Houdini accessories

city cuda gpu houdini landscape pcg terrain

Last synced: 13 Jul 2025

https://github.com/lwYeo/SoliditySHA3Miner

All-in-one mixed multi-GPU (nVidia, AMD, Intel) & CPU miner solves proof of work to mine supported EIP918 tokens in a single instance (with API).

0xbitcoin amdminer cpuminer cuda ethos gpu-miner gpu-mining gpumining hiveos igpu linux miner nvidia-miner opencl solo-mining windows-10

Last synced: 06 May 2025

https://github.com/bruce-lee-ly/decoding_attention

Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.

cuda cuda-core decoding-attention flash-attention flashinfer flashmla gpu gqa inference large-language-model llm mha mla mqa multi-head-attention nvidia

Last synced: 19 Aug 2025