An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/phrb/gpu-autotuning

Autotuning NVCC Compiler Parameters, published @ CCPE Journal

autotuning cuda nvcc opentuner

Last synced: 06 Jul 2025

https://github.com/actypedef/mixedgemm

A mixed-precision GEMM with quantize and reorder kernels.

cuda inference-acceleration llm mlsys quantization

Last synced: 15 Jun 2025

https://github.com/ai-dock/python

Python docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

ai cuda docker machine-learning python rocm runpod vast

Last synced: 28 Aug 2025

https://github.com/pkestene/incremental-fluids-kokkos

Simple, single-file fluid solvers for learning purposes, revisited with parallel programming (Kokkos: OpenMP / CUDA)

cfd cuda kokkos openmp parallel-programming

Last synced: 19 Aug 2025

https://github.com/101001000/tfg-pathtracer

CUDA Path tracing render engine, with MIS and the Disney BRDF

cuda pathtracing raytracing renderer

Last synced: 11 Apr 2025

https://github.com/adamdempsey90/fvm

My finite volume method project. Here I will implement the many pieces of a finite volume method to incorporate into a larger code.

c cfd cuda fvm gpu hydrodynamics

Last synced: 13 Apr 2025

https://github.com/previsionio/damavand

Damavand is a quantum circuit simulator. It can run on laptops or High Performance Computing architectures, such as CPU distributed architectures or multi-GPU distributed architectures.

cuda distributed-computing hpc multi-gpu multithreading quantum-computing rust simulator

Last synced: 19 Apr 2025

https://github.com/sashakolpakov/dire-rapids

DiRe accelerated by PyTorch, PyKeOps and cuVS

cuda cuda-kernels dimensionality-reduction pykeops pytorch rapidsai t-sne umap

Last synced: 05 Mar 2026

https://github.com/acecoooool/cs344-note

CS344-Note-zh

c cuda gpu

Last synced: 30 Jul 2025

https://github.com/ctknight/fluidsimulator

A CUDA-accelerated SPH Fluid Simulator capable of simulating millions of particles in seconds

computer-animation computer-graphics cuda fluid-simulation hydrostatics simulation-engine

Last synced: 14 Apr 2025

https://github.com/neoblizz/hip_template

🖤 Template for starting a HIP/C++ project using CMake, with GitHub Actions for CI.

cpp cuda cuda-programming gpgpu gpu hip rocm template-project template-repository

Last synced: 26 Mar 2025

https://github.com/jundaf2/gpu-tensor-permute

permute sequence data on GPU with high bandwidth

cuda gpu-acceleration sequence-to-sequence

Last synced: 13 Apr 2025

https://github.com/ingonyama-zk/icicle-snark

Groth16 over ICICLE

cuda zkp

Last synced: 20 Mar 2025

https://github.com/dansarie/socracked

Performs key-recovery attacks on the SoDark family of algorithms.

cryptanalysis cryptography cuda hf-radio key-recovery

Last synced: 21 Feb 2026

https://github.com/wrathematics/proginfo

A small utility for getting some info post-hoc about a program's run.

cuda gpu nvidia profiler profiling

Last synced: 30 Apr 2025

https://github.com/alexiii/grafen

A performance-effective program for gravity field calculation for a layered ellipsoidal density model.

cuda earth-science geophysical-inversions geophysics gravity-field gravity-model inverse-problems

Last synced: 27 Jun 2025

https://github.com/ghost---shadow/near-duplicate-image-detector

CUDA implementation of some perceptual hashing algorithms

cuda image-hashing thrust

Last synced: 29 Oct 2025

https://github.com/aknvictor/culingam

CULiNGAM accelerates LiNGAM analysis on GPUs.

causal-discovery cuda lingam

Last synced: 05 May 2025

https://github.com/raad-labs/raad-video

A high-performance video loading library for machine learning, designed for efficient training data preparation.

cuda machine-learning training-data

Last synced: 17 Oct 2025

https://github.com/amsokol/tensorflow-windows-build-tutorial

Tutorial on how to build and install TensorFlow GPU/CPU for Windows from source code using Bazel

bazel build cuda gpu sources tensorflow windows

Last synced: 30 Jun 2025

https://github.com/bourbonut/lbm-gpu

The Lattice Boltzmann Method on GPU

cuda cupy gpu hpc numba nvidia python

Last synced: 06 May 2025

https://github.com/harrism/nsys_easy

Easier, quicker command-line CUDA profiling

cuda nsight-systems profiling

Last synced: 15 Oct 2025

https://github.com/vc-bonn/charonload

Develop C++/CUDA extensions with PyTorch like Python scripts

cmake cpp cuda jit python pytorch torch

Last synced: 16 Aug 2025

https://github.com/prg-titech/kani-cuda

A program synthesizer for a CUDA-like GPGPU language

cuda racket

Last synced: 12 May 2025

https://github.com/bobbui/tensorflow-serving-cuda-docker

Docker image for TensorFlow Serving with NVIDIA CUDA and cuDNN

cuda cudnn docker docker-image tensorflow tensorflow-serving ubuntu1604

Last synced: 09 Apr 2025

https://github.com/tgymnich/shallowwater.jl

🌊 Simple Finite Volumes models that solve the shallow water equations

cuda hpc julia shallow-water-equations simulation tsunami

Last synced: 13 Mar 2025

https://github.com/hevnsnt/collider

GPU-accelerated Bitcoin Puzzle solver using Pollard's Kangaroo algorithm. K=1.15 efficiency. CUDA + Metal.

bitcoin bitcoin-puzzle cryptocurrency cuda ecdlp gpu mining-pool open-source pollard-kangaroo secp256k1

Last synced: 21 May 2026

https://github.com/comcast/rapid-ip-checker

A GPU accelerated tool to compare large lists of IPv4/IPv6 addresses.

cuda ipv4 ipv6 numba parallel

Last synced: 11 Apr 2025

https://github.com/alpaka-group/bactria

Broadly Applicable C++ Tracing and Instrumentation API 🐫

cuda hardware-counters instrumentation-api metrics rocm tracing-events

Last synced: 21 Apr 2025

https://github.com/pkestene/kokkos-proj-tmpl

A minimal CMake-based project skeleton for developing a Kokkos application

cea cuda gpu kokkos openmp parallel-computing parallelization performance-portability

Last synced: 19 Aug 2025

https://github.com/elsa-lab/base-env

Basis of ELSA computational platform

cuda machine-learning server-utility ubuntu

Last synced: 14 Oct 2025

https://github.com/ikergarcia1996/matrix-benchmark

A cupy (GPU) / numpy benchmark to measure how fast different hardware can perform matrix operations.

benchmark cuda cupy embedding gpu matrix numpy python word-embeddings

Last synced: 05 Oct 2025

https://github.com/josonchan1998/opencv_install

Build OpenCV from source with CUDA in Anaconda3

anaconda3 cuda opencv shell-script

Last synced: 12 Oct 2025

https://github.com/zephirfxec/hnanosolver

Houdini GPU Fluid Solver powered by NanoVDB

cpp cuda fluid-dynamics houdini nanovdb openvdb

Last synced: 05 May 2025

https://github.com/lebedov/cudamps

Python interface to CUDA Multi-Process Service

cuda gpu multi-gpu python

Last synced: 02 Mar 2026

https://github.com/nikelborm/amd-amdgpu-rocm-ollama-gfx90c-ati-radeon-vega-ryzen7-5800h-arch-linux

Run Ollama on AMD Ryzen 7 5800H CPU with integrated GPU AMD ATI Radeon Vega (gfx90c) with optimizations

amd amd-gpu amdgpu archlinux avx2 bash bash-scripting cuda linux llama llama3 llm ollama oneapi radeon rocm ssse3 vega

Last synced: 30 Apr 2025

https://github.com/ktaletsk/gpu_dsm

🔗Accessible quantitative polymer rheology predictions with slip-links on GPU

c-plus-plus cuda gpu polymer rheology

Last synced: 10 Sep 2025

https://github.com/NAGAGroup/Scalix

Scalix is a data parallel compute library that automatically scales to the available compute resources.

cuda hpc scientific-computing

Last synced: 01 Apr 2025

https://github.com/jacobtomlinson/advent-of-gpu-code-2020

Solutions for Advent of Code 2020 written for the GPU in Python

advent-of-code cuda gpu jupyter-notebooks numba python

Last synced: 25 Mar 2025

https://github.com/alesiong/template-matching

Simple template matching by GPU (CUDA)

computer-vision cuda template-matching

Last synced: 30 Apr 2025

https://github.com/christophe-foyer/darknet_wsl_cuda_install_scripts

Install scripts for Darknet and OpenCV with CUDA support on WSL

cuda darknet opencv wsl wsl2

Last synced: 31 Jul 2025

https://github.com/dendenxu/bvh-ray-tracing

CUDA Ray Tracing using BVH. Forked and modified from https://github.com/YuliangXiu/bvh-distance-queries

bvh cuda pytorch ray-tracing ray-triangle-intersection

Last synced: 28 Jul 2025

https://github.com/tillahoffmann/universal_tensorflow_image

Develop tensorflow models with or without a GPU accelerator using the same Docker image. 🥳

cuda nvidia-docker tensorflow

Last synced: 12 Jul 2025

https://github.com/dusanerdeljan/stereo-depth

Bachelor thesis - GPU accelerated single view passive stereo depth estimation pipeline

convolutional-neural-networks cuda depth-estimation pytorch real-time stereo-matching stereo-vision

Last synced: 28 Oct 2025

https://github.com/lynncoleart/guda

A High-Performance CPU-Based CUDA-Compatible Linear Algebra Library

ai blas cuda inference llm-inference

Last synced: 04 Mar 2026

https://github.com/neka-nat/cuimage

Rust implementation of image processing library with CUDA

computer-vision cuda rust

Last synced: 13 Apr 2025

https://github.com/jcbritobr/nvml-csharp

NVML (NVIDIA Management Library) wrapper for C#.

csharp cuda gpu library monitoring nvidia nvml

Last synced: 06 Apr 2025

https://github.com/enp1s0/shgemm

Fast multiplication of single-precision and half-precision matrices on Tensor Cores

cuda

Last synced: 31 Jul 2025

https://github.com/aniketsingh03/processing-history-of-images

💡 Detecting processing history of images by using Deep Learning

cuda deep-learning image-forensics matlab python3 pytorch

Last synced: 14 Jul 2025

https://github.com/alpindale/kizuna

Fast TTS Library for Kokoro

cuda text-to-speech

Last synced: 14 Apr 2025

https://github.com/mnicely/computeworks_examples

Matrix multiplication example performed with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA

blas cublas cuda docker eclipse-plugin nsight nvidia nvidia-docker openacc openmp pgi-compiler

Last synced: 14 Apr 2025

https://github.com/wi2trier/gpu-server

System configuration for a CUDA-based GPU server using Nix

cuda gpu nix server system-config ubuntu

Last synced: 17 Jan 2026

https://github.com/bryanoliveira/cellular-automata

A cellular automata program built with C++, OpenGL, CUDA and OpenMP.

cellular-automata cuda life opengl openmp

Last synced: 08 Sep 2025

https://github.com/elftausend/gradients

Deep Learning library written in Rust (OpenCL, CUDA & CPU)

cpu cuda deep-learning gpu gpu-acceleration machine-learning mlp neural-networks opencl rust

Last synced: 11 Apr 2025

https://github.com/drsnowbird/cuda-pytorch-docker

Nvidia CUDA for GPU + PyTorch (latest) in Docker

cuda deep-learning docker gpu jupyter-notebook nvidia-gpu pytorch ssl-proxy

Last synced: 10 Apr 2025

https://github.com/hejia-zhang/libwave

C++ library for hardware-accelerated video stream decoding

cuda ffmpeg gpu video-decoding video-streaming

Last synced: 15 Apr 2025

https://github.com/kerneltuner/kernel_float

CUDA/HIP header-only library for writing vectorized and low-precision (16-bit, 8-bit) GPU kernels

bfloat16 cpp cuda floating-point gpu half-precision header-only-library hip kernel-tuner low-precision mixed-precision performance reduced-precision vectorization

Last synced: 12 Apr 2025

https://github.com/joaomlneto/cpds-heat

Heat Equation using different solvers (Jacobi, Red-Black, Gaussian) in C using different paradigms (sequential, OpenMP, MPI, CUDA) - Assignments for the Concurrent, Parallel and Distributed Systems course @ UPC 2013

cuda cuda-support gauss-seidel gaussian heat-equation jacobi mpi mpi-applications openmp openmp-applications openmp-parallelization openmp-support openmpi paradigms performance red-black solvers

Last synced: 22 Apr 2025

https://github.com/microsoft/hat

TOML-annotated C header file format for packaging binary files, from Microsoft Research

benchmarking cpp cprogramming cuda metadata platform-independent python-library rocm toml

Last synced: 10 Apr 2025

https://github.com/guangyancai/isoext

GPU isosurface extraction with pytorch support.

cuda dual-contouring isosurface-extraction marching-cubes nanobind python pytorch thrust

Last synced: 23 Apr 2025

https://github.com/tcoppex/cudaraster-linux

Linux port of cudaraster, Nvidia's GPU rasterizer.

cuda gpu rasterizer

Last synced: 14 Apr 2025

https://github.com/tigercosmos/simple-vgg16-cu

Simple VGG16 implemented in CUDA

cublas cuda cudnn vgg16

Last synced: 24 Jul 2025

https://github.com/nyo16/llama_cpp_ex

Elixir bindings for llama.cpp — run LLMs locally with Metal, CUDA, Vulkan, or CPU. Streaming, chat templates, embeddings, structured output, and concurrent batched inference.

cuda elixir llamacpp llm

Last synced: 04 Jun 2026

https://github.com/neoheartbeats/neoheartbeats-kernel

An architecture for LLMs' continual-learning and long-term memories

cuda fine-tuning llama-factory llm

Last synced: 05 May 2025

https://github.com/koesie10/gpjson

GPU-based JSON data processing system accessible via all GraalVM languages

cuda gpu graalvm json jsonpath

Last synced: 20 Jun 2025

https://github.com/raymondcm/blockmatching

CPU and CUDA implementation of Full Exhaustive Block Matching Algorithm using Integral Images

block-matching-algorithm cuda integral-image parallel vision

Last synced: 27 Apr 2025

https://github.com/aresio/cupsoda

cupSODA is a CUDA-powered coarse-grain deterministic simulator of mass-action kinetics models

biochemical cuda gpu-computing mass-action simulation

Last synced: 21 Feb 2026

https://github.com/frgfm/torch-cuda-template

Template for CUDA / C++ extension writing with PyTorch

cpp cuda pytorch pytorch-extension

Last synced: 31 Jul 2025

https://github.com/NCAR/micm

A model-independent chemistry module for atmosphere models

atmospheric-chemistry atmospheric-modeling atmospheric-science cuda gpu gpu-acceleration hpc ode-solver

Last synced: 20 Jul 2025

https://github.com/chrxh/alien-docs

Documentation for ALIEN

cuda evolution physics-simulation simulation

Last synced: 24 Jun 2025

https://github.com/rmiguelkelly/quickcluster

A k-means implementation in C++ with Python bindings and GPU acceleration

clustering clustering-algorithm cpp cuda gpu kmeans kmeans-clustering metal objective-c python python3 unsupervised-learning

Last synced: 26 Jul 2025

https://github.com/abus-aikorea/aria-coversong

The best Gradio web UI for creating cover songs, using MDX-Net and RVC. Easy one-click installation. Fully portable.

cuda demucs gradio karaoke mdx-net nvidia python pytorch rvc song-covers uvr vocal-remover voice-conversion

Last synced: 25 Apr 2025

https://github.com/pliablepixels/simpleyolo

A dead simple python wrapper for darknet that works with OpenCV 4.1, CUDA 10.1

cuda darknet opencv python3 yolov3

Last synced: 26 Oct 2025

https://github.com/benediktalkin/kappaprofiler

Lightweight, simple profiling for Python/PyTorch

cuda profiler python pytorch

Last synced: 19 Jul 2025

https://github.com/bhattbhavesh91/cudf-rapids-demo

A simple demo of cuDF which is a RAPIDS GPU-Accelerated Dataframe Library!

arrow cuda cudf demo gpu gpu-dataframe pandas python rapids

Last synced: 17 Apr 2025

https://github.com/xiangronglin/grayscale-conversion

grayscale conversion optimized with OpenMP, SIMD and CUDA

cuda grayscale hpc openmp simd

Last synced: 23 Mar 2025

https://github.com/p-ranav/vulkan-earth

Vulkan-based 3D Rendering of Earth

3d cuda engine gpu rendering simulation vulkan

Last synced: 05 May 2025

https://github.com/marcogarlet/cuda_cubeattack

CUDA implementation of Cube Attack

cryptography cubeattack cuda

Last synced: 28 Oct 2025

https://github.com/phineas-pta/nvidia-win

NVIDIA’s deep learning stack on Windows: CUDA toolkit + cuDNN + TensorRT

cuda cudnn guide tensorrt tutorial windows

Last synced: 12 Apr 2025

https://github.com/kabir5296/deep-learning-setup-for-ubuntu-guide

CUDA, CuDNN, NVIDIA Driver, and PyTorch Installation for Ubuntu

cuda cudnn deeplearning nlp python pytorch

Last synced: 15 Mar 2025

https://github.com/yomi4486/zundamon_v3

Bartender, a shot of ice water, please.

cuda discord-bot discord-py docker docker-compose python tts voicevox zundamon

Last synced: 14 Apr 2025

https://github.com/radenmuaz/slope-ad

A small automatic differentiation engine, supporting higher-order derivatives

array autograd automatic-differentiation cuda gradient iree jvp machine-learning metal mlir onnx onnxruntime tensor vjp

Last synced: 26 Jun 2025

https://github.com/hrntsm/ghgpucomputingtest

Test using CUDA with Alea GPU in Grasshopper.

cuda grasshopper3d

Last synced: 14 Apr 2025