An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/lucaspar/poetry-torch

Installing hardware-accelerated PyTorch with Poetry on different hardware using the same `pyproject.toml`

cuda hardware-acceleration poetry pyproject-toml python-poetry pytorch torch

Last synced: 19 Oct 2025

https://github.com/llnl/fpchecker

A dynamic analysis tool to detect floating-point errors in HPC applications.

cancellation clang cuda exceptions floating-point floating-point-arithmetic infinity llvm overflow overflow-detection underflow-detection

Last synced: 21 May 2026

https://github.com/r00tman/eventhands

Real-Time Neural 3D Hand Pose Estimation from an Event Stream [ICCV 2021]

computer-vision cuda dataset deep-learning event-camera hand-pose hand-pose-estimation hand-tracking iccv2021 mano opengl pytorch smpl

Last synced: 11 Apr 2025

https://github.com/CAPS-UMU/FIDESlib

A server-side CKKS GPU library fully interoperable with OpenFHE.

ckks cuda gpu homomorphic-encryption openfhe

Last synced: 11 Apr 2026

https://github.com/harrism/ranger

Generate simple index ranges in C++ and CUDA C++

cpp cuda loops ranges

Last synced: 22 Mar 2025

https://github.com/shrec/ultrafastsecp256k1

Ultra high-performance secp256k1 ECC engine | Python, Node.js, Rust, Go, C#, Swift, Java bindings | CUDA, Metal, OpenCL GPU | ECDSA, Schnorr, FROST, MuSig2, BIP-352 | 15+ platforms

android arm64 bitcoin constant-time cryptography cuda ecdsa embedded ethereum ffi gpu-cryptography ios nodejs opencl python riscv rust schnorr-signatures secp256k1 webassembly

Last synced: 04 Jun 2026

https://github.com/termoshtt/link_cuda_kernel

HowTo: Compile CUDA with nvcc, and link to Rust

cuda nvcc rust

Last synced: 24 Apr 2025

https://github.com/hpi-epic/gpucsl

Constraint-based Causal Structure Learning on GPUs.

csl cuda cupy gpu-computing python

Last synced: 30 Oct 2025

https://github.com/mravanelli/pytorch_mlp_for_asr

This code implements a basic MLP for speech recognition. The MLP is trained with pytorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.

asr cuda deep-learning deep-neural-networks feedforward-neural-network kaldi kaldi-asr mlp multilayer-perceptron neural-networks python pytorch speech-recognition timit

Last synced: 27 Jul 2025

https://github.com/bacpop/mandrake

Mandrake 🌿/👨‍🔬🦆 – Fast visualisation of the population structure of pathogens using Stochastic Cluster Embedding

cuda embedding genomics gpu pathogens

Last synced: 30 Oct 2025

https://github.com/marianhlavac/fft-cuda

Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU.

c-plus-plus coursework cuda fast-fourier-transform fit-ctu nvidia python

Last synced: 14 Apr 2025

https://github.com/AFD-Illinois/kharma

Kokkos-based High-Accuracy Relativistic Magnetohydrodynamics with AMR

cuda gpu grmhd hip kokkos mhd openmp sycl

Last synced: 04 Apr 2025

https://github.com/outofai/cudacanvas

Python Module for PyTorch Tensor Visualisation in CUDA Eliminating CPU Transfer

cuda deep-learning gpu machine-learning neural-network plotting pypi pypi-package python pytorch tensor visulization

Last synced: 09 Sep 2025

https://github.com/nvidia/numbast

Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.

cuda numba

Last synced: 25 Mar 2025

https://github.com/countzero/windows_llama.cpp

PowerShell automation to rebuild llama.cpp for a Windows environment.

cmake conda cuda llama-cpp openblas powershell windows

Last synced: 26 Apr 2026

https://github.com/quantumbfs/cuyao.jl

CUDA extension for Yao.jl

circuit cuda gpu quantum yao

Last synced: 07 Apr 2025

https://github.com/differentiableuniverseinitiative/jaxdecomp

JAX bindings for the NVIDIA cuDecomp library

cuda hpc jax xla

Last synced: 08 May 2025

https://github.com/bruce-lee-ly/flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core

Last synced: 25 Aug 2025

https://github.com/prg-titech/dynasoar

CUDA Dynamic Memory Allocator for SOA Data Layout

cuda memory-allocation simd smmo

Last synced: 12 May 2025

https://github.com/kwea123/python-ray-tracing-with-cuda-example

An example of CUDA ray tracing in pure Python syntax.

cuda numba python ray-tracing

Last synced: 26 Mar 2025

https://github.com/gangliao/vs-code-cuda

Supports CUDA grammars in Visual Studio Code

cuda vs vs-code vscode-extension

Last synced: 26 Jul 2025

https://github.com/jnbntz/gpu-edu-workshops

Code examples for CUDA and OpenACC

cuda gpu openacc

Last synced: 04 Mar 2025

https://github.com/pinto0309/facemesh_onnx_tensorrt

Verify that the post-processing merged into FaceMesh works correctly. The object detection model can be anything other than BlazeFace. YOLOv4 and FaceMesh committed to this repository have modified post-processing.

cuda facemesh onnx python tensorrt

Last synced: 30 Apr 2025

https://github.com/SafeAILab/zkDL

zkDL, an open source toolkit for zero-knowledge proofs of deep learning powered by CUDA

cuda deep-neural-networks gpu-acceleration privacy-enhancing-technologies zero-knowledge-proof

Last synced: 01 May 2025

https://github.com/quim0/wfa-gpu

GPU implementation of the Wavefront Alignment Algorithm for global, gap-affine, pairwise sequence alignment

bioinformatics cuda gpu pairwise-alignment sequence-alignment

Last synced: 21 Jun 2025

https://github.com/wdmapp/gtensor

GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.

cpp cpp14 cuda gpu hacktoberfest rocm sycl

Last synced: 04 Apr 2025

https://github.com/pkestene/euler_kokkos

Compressible hydro and magneto-hydrodynamics (2nd order Godunov) implemented with MPI+Kokkos

cea cfd cmake cpp cuda finite-volume finite-volume-method fluid-dynamics gpu kokkos magnetohydrodynamics mpi parallel-computing parallelism performance-portability

Last synced: 19 Aug 2025

https://github.com/rust-av/nvidia-video-codec-rs

Bindings for the NVIDIA Video Codec SDK

cuda cuvid nvenc nvidia rust rust-av

Last synced: 16 May 2025

https://github.com/eth-cscs/tiled-mm

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

amd cublas cublasxt cuda gpu matmul matrix-multiplication nvidia rocblas rocblasxt rocm

Last synced: 19 Jul 2025

https://github.com/slyautomation/osrs_yolov5

Yolov5 Object Detection In OSRS using Python code, Detecting Cows - Botting

botting cuda machine-learning mlbot osrs pycharm python pytorch runescape yolov5

Last synced: 23 Oct 2025

https://github.com/davidalgis/interopunitycuda

Demonstrate interoperability between Unity Engine and CUDA

cpp cuda dx11 gpu gpu-acceleration native-plugin opengl unity unity3d

Last synced: 26 Feb 2026

https://github.com/rightnow-ai/automegakernel

An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.

agent-harness cuda gpu gpu-programming kernel-fusion llm-inference machine-learning megakernel mlsys

Last synced: 15 Jun 2026

https://github.com/blurgyy/jaxngp

JAX implementation of instant-ngp (NeRF part)

cuda hashgrid instant-ngp jax nerf neural-radiance-field nix python wsl

Last synced: 09 Apr 2025

https://github.com/LatticeQCD/SIMULATeQCD

SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it easy for physicists to implement lattice QCD formulas while still providing competitive performance.

cuda gpu hip hpc lattice lattice-qcd mpi parallel physics

Last synced: 26 Mar 2025

https://github.com/sdpython/onnx-extended

New operators for the ReferenceEvaluator, new kernels for onnxruntime, CPU, CUDA

cuda machine-learning onnx onnxruntime python

Last synced: 11 Oct 2025

https://github.com/pkestene/euler2d_cudafortran

2nd order Godunov solver for 2d Euler equations written in CUDA Fortran and stdpar (standard parallelism)

cea conservation-laws cuda cuda-fortran euler-equations fortran gpu gpu-computing hydrodynamics nvfortran nvhpc stdpar

Last synced: 19 Aug 2025

https://github.com/baderlab/ecuda

STL-like containers (array, vector, matrix, cube) useable in device code.

c-plus-plus cuda

Last synced: 15 Jun 2025

https://github.com/mumax/plus

More versatile and extensible GPU-accelerated micromagnetic simulator

cpp cuda finite-difference-time-domain gpu-computing micromagnetics python scientific-computing

Last synced: 20 Apr 2026

https://github.com/ashermancinelli/cxbqn

BQN virtual machine

apl bqn cplusplus cpp20 cuda

Last synced: 04 Sep 2025

https://github.com/niftypet/nipet

High-throughput PET image reconstruction with high quantitative accuracy and precision

analysis cuda gpu image-reconstruction medical-imaging mlem pet processing python

Last synced: 10 Sep 2025

https://github.com/rodrgo/OpenPH

Parallel reduction of boundary matrices for Persistent Homology with CUDA

cuda gpu-computing numerical-computation parallel-computing persistent-homology topological-data-analysis

Last synced: 01 May 2025

https://github.com/lddl/object-detection-opencv-rust

A set of functions for using YOLO v3, v4, v7 and v8 with OpenCV's DNN module

computer-vision cuda object-detection onnx onnxruntime opencv yolov3 yolov4 yolov7 yolov8

Last synced: 11 May 2026

https://github.com/williamvenner/squad-mortar-helper

💣 SMH – a computer vision project for automatic, precision mortar strike calculations in Squad

computer-vision cuda cv gpu mortar mortars rust smh squad squad-game squadgame

Last synced: 08 May 2025

https://github.com/coderonion/cuda-beginner-course-cpp-version

Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"

cpp cublas cuda cuda-programming cudnn gpu gpu-programming nvcc nvidia parallel-programming python rust

Last synced: 15 Jun 2025

https://github.com/jundaf2/eigenmha

Forward and backward Attention DNN operators implemented with LibTorch, cuDNN, and Eigen.

backpropagation cuda cudnn cudnn-v8 dnn inference pytorch

Last synced: 13 Apr 2025

https://github.com/optinsoft/gen_eth

Generate Ethereum addresses on the GPU (CUDA)

address cuda ethereum generate gpu keccak256 secp256k1

Last synced: 16 Jan 2026

https://github.com/gjbex/python-for-hpc

Repository for participants of the "Python for HPC" training

cuda cython dask gpu hpc mpi numba python python-training scientific-computing swig training

Last synced: 13 Jul 2025

https://github.com/xapajiamnu/glm

A GPU language model, based on btree backed tries.

c-plus-plus cuda fast gpu language-model

Last synced: 13 Apr 2025

https://github.com/seonglae/llama2gptq

Chat with LLaMa 2 and get responses backed by reference documents retrieved from a vector database. Runs a locally available model using GPTQ 4-bit quantization.

chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers

Last synced: 22 Apr 2025

https://github.com/juliagpu/nvtx.jl

Julia bindings for NVTX, for instrumenting with the Nvidia Nsight Systems profiler

cuda julia nsys nvtx profiling

Last synced: 30 Apr 2025

https://github.com/ahdhn/cudatemplate

Template for starting CUDA/C++ project using CMake with Github Action for CI

cmake cuda template

Last synced: 19 Apr 2025

https://github.com/idealab-isu/GPView

GPU Accelerated Voxelization Framework for 3D CAD models.

cpp cuda gpu voxelization

Last synced: 20 Mar 2025

https://github.com/mnicely/nvml_examples

Examples showing how to utilize the NVML library for GPU monitoring

cublas cuda nvidia nvml

Last synced: 14 Apr 2025

https://github.com/jimouris/parallel-convolution

🖼️ Parallel Image Convolution, applying a blur filter to images. Written in C, optimized in three different ways: MPI, MPI & OpenMP and CUDA.

blur-filter cuda image-convolution image-processing mpi mpi-library parallel-processing

Last synced: 08 Apr 2025

https://github.com/dereklstinson/gocudnn

Go bindings for cuDNN and other CUDA packages.

convolutional-neural-networks cuda cudnn go golang machine-learning neural-network

Last synced: 24 Jul 2025

https://github.com/anibali/docker-torch

A Docker image for Lua Torch

cuda docker docker-image lua torch

Last synced: 28 Oct 2025

https://github.com/johnh2o2/cuvarbase

Python library for fast time-series analysis on CUDA GPUs

cuda fourier-methods gpu gpu-computing lomb-scargle-periodogram nfft python python-3 time-series

Last synced: 07 May 2025

https://github.com/tier4/autoware_nova_carter

Integration of NVIDIA Nova Carter with Autoware

amr autonomous-driving autoware cuda nvidia nvidia-jetson ros2

Last synced: 07 Mar 2026

https://github.com/ashvardanian/cuda-python-starter-kit

Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11

cmake cuda cuda-programming hip hpc matrix-multiplication openmp parallel-computing parallel-programming pybind pybind11 python starter-kit starter-template tutorial

Last synced: 13 Jul 2025

https://github.com/eth-cscs/spla

Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.

cuda gemm linear-algebra mpi rocm

Last synced: 14 Apr 2025

https://github.com/anderson101866/cualgo

A cross-platform Python library for fundamental algorithms with GPU-accelerated computing

algorithm cuda gpu gpu-acceleration gpu-computing numpy python

Last synced: 14 Jul 2025

https://github.com/tjyuyao/cutex

PyCUDA based PyTorch Extension Made Easy

cuda customized extension operator pycuda pytorch

Last synced: 13 Sep 2025

https://github.com/akhuntsaria/canny-edge-detection

Canny edge detector implemented in CUDA C/C++

cuda image-processing video-processing

Last synced: 17 Jan 2026

https://github.com/fclc/multi-plexer

Goal: Low power cluster capable of serving 24+ streams of 4KHDR60 source transcodes while consuming no more than 100W at peak and idling at less than 10W

arm64 clustering cuda decoding encoding ffmpeg hardware hdr jetson jetson-nano jetson-xavier-nx opencl plex pocl raspberry-pi-4 rockpro64 transcode zfs zfsonlinux

Last synced: 14 Aug 2025

https://github.com/bigmat18/cuda-mesh-voxelization

GPU-accelerated pipeline for robust 3D mesh Boolean operations (CSG) using voxelized Signed Distance Fields (SDFs).

computer-graphics cuda hpc mesh-processing voxel

Last synced: 15 May 2026

https://github.com/adamtiger/tinygpulang

Tutorial on building a gpu compiler backend in LLVM

cuda llvm

Last synced: 25 Feb 2026

https://github.com/kerkelae/disimpy

Massively parallel Monte Carlo diffusion MR simulator written in Python.

cuda diffusion-mri gpu-computing monte-carlo-simulation

Last synced: 12 Apr 2025

https://github.com/eyalroz/gpu-kernel-runner

Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line

cuda debugging-tool gpgpu gpu gpu-kernel-performance gpu-kernels multi-language opencl performance-analysis performance-testing profiling runner

Last synced: 22 Jan 2026

https://github.com/whizzzkid/opencv-complete-build-cuda

Full build script for OpenCV with or without CUDA and Bumblebee support

build build-automation cuda cudnn install installer opencv opencv-library opencv-python opencv3 opencv3-python

Last synced: 17 Nov 2025

https://github.com/unsalted/docker-nheqminer-cuda

CUDA capable docker image of nheqminer (zcash/equihash miner)

blockchain cuda docker docker-image gpu nheqminer nicehash nvidia nvidia-docker zcash

Last synced: 04 Apr 2026

https://github.com/m-pilia/disptools

Generate displacement fields with known volume changes

cuda image-processing jacobian python3

Last synced: 15 Oct 2025

https://github.com/pdziepak/ranges-gpu

Experimental ranges for CUDA

c-plus-plus cuda range

Last synced: 12 Apr 2025

https://github.com/gvaliente/pcps

CPU and GPU point cloud plane segmentation

cpp cpp11 cuda opencl pcl thrust

Last synced: 11 Jul 2025

https://github.com/tristanbilot/mlx-gcn

MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2 Ultra, M3 Max).

apple cuda deep-learning gnn mlx pytorch

Last synced: 13 Sep 2025

https://github.com/omlins/libdiffusion

Proof of Concept: a C-callable GPU-enabled parallel 2-D heat diffusion solver written in Julia using CUDA, MPI and graphics

c cuda distributed julia mpi mult-gpu

Last synced: 04 Apr 2025