An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/sandialabs/lgrtk

Tool Kit for Lagrangian Grid Reconnection

cuda gpu hpc physics sandia-national-laboratories scr-2300 snl-applications

Last synced: 02 May 2025

https://github.com/juliagpu/nccl.jl

A Julia wrapper for the NVIDIA Collective Communications Library.

cuda gpu julia nccl

Last synced: 20 Sep 2025

https://github.com/sgl-project/whl

SGLang Kernel Wheel Index

cuda cutlass flashinfer sglang

Last synced: 12 Jun 2026

https://github.com/tree-sitter-grammars/tree-sitter-cuda

CUDA grammar for tree-sitter

cuda parser tree-sitter

Last synced: 30 Dec 2025

https://github.com/amusi/sift-gpu

A CUDA implementation of SIFT

cuda feature-detection gpu keypoints-detector sift

Last synced: 25 Mar 2025

https://github.com/iowar/kecmatch-gpu

Finds matching Solidity function signatures using the GPU

cuda keccak256 solidity

Last synced: 15 Jul 2025

https://github.com/shrec/UltrafastSecp256k1

Ultra high-performance secp256k1 ECC library | C++20 | CUDA, Metal, OpenCL, ROCm, WASM | Apple Silicon M1-M4 | 15+ platforms | Branchless, allocation-free hot paths

android arm64 bitcoin constant-time crypto cryptocurrency cryptography cuda ecc ecdsa embedded ethereum gpu-cryptography ios opencl performance riscv schnorr-signatures secp256k1 webassembly

Last synced: 03 Apr 2026

https://github.com/SomeoneSerge/nixpkgs-cuda-ci

Building and caching nixpkgs with cudaSupport=true. We push to https://cuda-maintainers.cachix.org/

computer-vision cuda deep-learning nix nixpkgs

Last synced: 08 Aug 2025

https://github.com/ktaletsk/NCCV

Short course on computer vision and image processing using Numba+CUDA+OpenCV

computer-vision cuda jupyter-notebook numba

Last synced: 09 May 2025

https://github.com/sevagh/zen

Optimized real-time harmonic/percussive source separation using the GPU (NVIDIA CUDA) and CPU (Intel IPP)

audio cuda digital-signal-processing dsp real-time source-separation thrust

Last synced: 13 Apr 2025

https://github.com/ENOT-AutoDL/ONNX-Runtime-with-TensorRT-and-OpenVINO

Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment

cuda nvidia onnx onnxruntime openvino tensorrt

Last synced: 20 Mar 2025

https://github.com/kostyaev/sentence2vec

Deep sentence embedding using Sequence to Sequence learning

cuda sentence2vec seq2seq torch

Last synced: 21 Mar 2025

https://github.com/illuhad/hipcpu

Implementation of AMD HIP for CPUs

cuda gpgpu hip hpc openmp openmp-parallelization

Last synced: 16 Apr 2025

https://github.com/TristanBilot/mlx-GCN

MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2 Ultra, M3 Max).

apple cuda deep-learning gnn mlx pytorch

Last synced: 27 Mar 2025

https://github.com/ptsolvers/chmy.jl

Finite differences and staggered grids on CPUs and GPUs

cuda gpu julialang metal mpi parallel rocm staggeredgrid stencil

Last synced: 23 Apr 2025

https://github.com/illuhad/hipCPU

Implementation of AMD HIP for CPUs

cuda gpgpu hip hpc openmp openmp-parallelization

Last synced: 21 Apr 2025

https://github.com/src-d/infrastructure-dockerfiles

Dockerfiles to build the images which power source{d}'s computing infrastructure.

cuda dockerfile infrastructure jupyterhub pytorch tensorflow

Last synced: 05 May 2025

https://github.com/shahriarrezghi/spyker

High-performance Spiking Neural Networks Library Written From Scratch with C++ and Python Interfaces.

computational-neuroscience cuda cudnn cxx high-performance neuroscience onednn python r-stdp snn stdp

Last synced: 02 Oct 2025

https://github.com/ktaletsk/nccv

Short course on computer vision and image processing using Numba+CUDA+OpenCV

computer-vision cuda jupyter-notebook numba

Last synced: 04 Sep 2025

https://github.com/ShahriarRezghi/Spyker

High-performance Spiking Neural Networks Library Written From Scratch with C++ and Python Interfaces.

computational-neuroscience cuda cudnn cxx high-performance neuroscience onednn python r-stdp snn stdp

Last synced: 04 Apr 2025

https://github.com/mikeswang/triumvirate

A Python/C++ package for three-point clustering measurements in LSS analyses

clustering-statistics cpp cuda cython hip large-scale-structure-cosmology python

Last synced: 14 Mar 2026

https://github.com/c3sr/comm_scope

NUMA-aware multi-CPU multi-GPU data transfer benchmarks

bandwidth benchmark-suite cuda gpu hip numa nvlink performance

Last synced: 17 Jan 2026

https://github.com/microsoft/svirl

Svirl is a GPU-accelerated solver of complex Ginzburg-Landau equations for superconductivity. It consists of a time-dependent solver to describe vortex dynamics and a free energy minimizer to accurately find static configurations.

cuda ginzburg-landau gpu python scientific-computing superconductivity vortex

Last synced: 30 Jul 2025

https://github.com/heavyai/heavyai.jl

Julia client for OmniSci GPU-accelerated SQL engine and analytics platform

cuda data-science database gpu julia-language julia-package julialang sql

Last synced: 13 Aug 2025

https://github.com/bdhu/gpuinfo

A minimal command-line utility written in Rust for querying GPU status

command-line-tool cuda gpu nvidia nvidia-smi nvml rust rust-lang

Last synced: 13 Apr 2025

https://github.com/xiaosong9905/hpc-notes

Personal Notes for Learning HPC & Parallel Computation [Actively Adding New Content]

cuda gpu hpc parallel-computing

Last synced: 15 May 2025

https://github.com/prg-titech/ikra-cpp

C++ Library for Object-oriented Programming with Structure of Arrays Layout

cpp cuda data-layout simd

Last synced: 12 May 2025

https://github.com/tylerjthomas9/rapids.jl

An unofficial Julia wrapper for the RAPIDS.ai ecosystem using PythonCall.jl

cuda gpu-acceleration julia

Last synced: 05 May 2025

https://github.com/pinto0309/20220228_intel_deeplearning_day_hitnet_demo

Special Presentation Demo at Intel IoT Planet 2021 DeepLearning Day, with the presentation materials from the special lecture. https://www.intel.co.jp/content/www/jp/ja/now/iot-planet/deep-learning-day.html

cuda docker intel onnx openvino

Last synced: 05 May 2025

https://github.com/mxpv/nvml-go

Golang wrapper for the NVIDIA Management Library (NVML)

cuda golang golang-wrapper gpu nvidia nvidia-smi nvml

Last synced: 05 Oct 2025

https://github.com/ema2159/equirectangular-cubemaptransform

OpenCV with CUDA and OpenMP implementations for transforming equirectangular images to cube maps and vice versa

cubemap-to-equirectangular cuda equirectangular-to-cubemap opencv openmp

Last synced: 15 Apr 2025

https://github.com/minhhn2910/cuda-half2

Convert CUDA programs from float data type to half or half2 with SIMDization

clang cuda half-precision

Last synced: 30 Apr 2025

https://github.com/shahruk10/nixshells

Frequently used nix shells for Python, CUDA and more.

cuda nix nix-shell nixpkgs python tensorflow torch virtualenv

Last synced: 10 Mar 2026

https://github.com/tracel-ai/cubek

CubeK: high-performance multi-platform kernels in CubeCL

cuda gpu hpc rocm vulkan

Last synced: 13 Jan 2026

https://github.com/kerneltuner/kernel_launcher

Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner

cpp cuda gpu kernel-tuner

Last synced: 12 Apr 2025

https://github.com/yhtang/graphdot

GPU-accelerated Marginalized Graph Kernel with customizable node and edge features; Gaussian process regression.

cheminformatics cuda gpu graph-algorithms machine-learning python

Last synced: 15 Apr 2025

https://github.com/enp1s0/cutf

CUDA Template Functions

cuda gpu

Last synced: 09 Apr 2025

https://github.com/pkestene/ms-hpc-ai-gpu

Resources for the introductory GPU programming course of the HPC-AI specialized master's program (mastère spécialisé HPC-AI)

cuda deep-learning gpu gpu-computing machine-learning physics-informed-neural-networks pinn pinns

Last synced: 19 Aug 2025

https://github.com/dbklim/docker_image_with_cuda10_cudnn7

Dockerfiles and a manual for easily building a Docker image with CUDA 10.X and cuDNN 7.6 to run TensorFlow/PyTorch on an NVIDIA GPU in a Docker container.

cuda cudnn docker docker-gpu docker-image docker-nvidia dockerfile gpu gpu-docker nvidia nvidia-docker pytorch pytorch-gpu tensorflow tensorflow-examples tensorflow-gpu tensorflow-gpu-docker torch

Last synced: 24 Oct 2025

https://github.com/parker-int64/yolov5-RGBD

Qt QML based yolov5 + RGBD camera program

cuda cudnn depth opencv openvino qml-applications qt rgbd tensorrt yolov5

Last synced: 21 Apr 2025

https://github.com/dancing-ui/uestc_vhm

Object tracking with yolov8, fast-reid, and deepsort; person re-identification with yolov8, fast-reid, and Faiss

cuda deepsort dockerfile faiss fast-reid tensorrt yolov8n

Last synced: 29 Jul 2025

https://github.com/nvidia/optix-dev

OptiX SDK headers, everything needed to build & run OptiX applications. SDK samples not included.

cuda gpu gpu-acceleration gpu-programming nvidia optix ray-tracing raytracing

Last synced: 14 Apr 2025

https://github.com/kyegomez/neva

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics

Last synced: 15 Oct 2025

https://github.com/bwohlberg/sporco-cuda

CUDA extension for the SPORCO project

convolutional-sparse-coding cuda gpu

Last synced: 12 Jul 2025

https://github.com/moldyn/clustering

Robust and stable clustering of molecular dynamics simulation trajectories.

biophysics clustering cpp cuda molecular-dynamics

Last synced: 14 Apr 2025

https://github.com/jtschwar/tomo_tv

C++ library for Regularized 2D and 3D Tomography Reconstructions.

3d-reconstruction cuda inverse-problems regularization tomography

Last synced: 25 Apr 2025

https://github.com/niftypet/nimpa

NiftyPET: Neuro-Image Manipulation, Processing and Analysis

analysis cuda gpu medical-imaging mr pet processing python

Last synced: 21 Sep 2025

https://github.com/pkestene/cuda-proj-tmpl

A minimal CMake-based project skeleton for developing a CUDA application

cea cmake cuda gpu gpu-computing parallel-computing parallel-programming template

Last synced: 29 Jul 2025

https://github.com/datarhei/ffmpeg

FFmpeg base image for datarhei/core.

alpine cuda docker ffmpeg mmal raspberry-pi vaapi

Last synced: 16 Sep 2025

https://github.com/harrism/cuda_event_benchmark

Unit benchmarks of CUDA event APIs.

benchmarks cuda

Last synced: 22 Mar 2025

https://github.com/m0dulo/InferSpore

🌱 A fully independent Large Language Model (LLM) inference engine, built leveraging cuBLAS and cub. 🧩

cuda inference-engine llama2 llm

Last synced: 25 Apr 2025

https://github.com/forkni/cuda-link

Zero-copy bidirectional GPU texture sharing between TouchDesigner and Python via CUDA IPC. Sub-microsecond per-frame overhead with ring buffer architecture and GPU-side synchronization.

cuda cupy gpu inter-process-communication ipc python pytorch real-time shared-memory texture-sharing touchdesigner zero-copy

Last synced: 30 May 2026

https://github.com/roflmaostc/radonka.jl

A simple yet sufficiently fast (attenuated) Radon and backproject implementation using KernelAbstractions.jl. Runs on CPU, CUDA, ...

automatic-differentiation computed-tomography ct cuda gpu julia julia-language optimization radon radon-transform tomography x-ray

Last synced: 22 Jul 2025

https://github.com/denzp/rust-inline-cuda-tutorial

Let's jump into CUDA development with Rust

cuda rust

Last synced: 22 Mar 2025

https://github.com/anroshka/snake-ai

🐍 A Snake game AI that learns to play through Deep Q-Learning. Built with PyTorch and Pygame, featuring CUDA acceleration and real-time visualization of the learning process.

artificial-intelligence collaborate collaboration cuda deep-learning deep-q-learning dqn game-ai gpu-acceleration machine-learning neural-network pygame python pytorch q-learning reinforcement-learning snake-game

Last synced: 24 Feb 2026

https://github.com/ptaxom/pnn

pnn is a Darknet-compatible neural network inference engine implemented in Rust.

cuda cudnn darknet rust tensorrt yolo

Last synced: 20 Apr 2025

https://github.com/matthewfeickert/nvidia-gpu-ml-library-test

Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being properly picked up

cuda cudnn gpu jax nvidia pytorch setup tensorflow torch

Last synced: 15 Apr 2025

https://github.com/stellar-group/blaze_cuda

WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze

blaze cpp cpp14 cuda gpu hpc linear-algebra metaprogramming

Last synced: 30 Apr 2025

https://github.com/vovod/pytorch-who-is-that-pokemon

Classification of all 151 Gen 1 Pokémon classes with a torchvision model.

cuda deep-learning image-classification pokemon python pytorch torchvision

Last synced: 20 Jun 2025

https://github.com/OMEGAMAX10/Face-Mask-Detection-Using-YOLOv4

Because of the COVID-19 pandemic of 2020, more and more people are concerned with protecting themselves using masks, hence the need for software capable of monitoring whether people are wearing masks or not. That is why I created a Python application using OpenCV (with CUDA support) based on the YOLOv4 algorithm, capable of monitoring the safety level of a space with video surveillance.

computer-vision covid-19 cuda cuda-support face-mask-detection gui gui-application masks monitoring opencv pyqt5 python safety-level video-surveillance wearing-masks yolov4 yolov4-algorithm

Last synced: 21 Apr 2025

https://github.com/ydrmaster/cuda-driver

A CUDA runtime environment based on the CUDA Driver API

cuda nvidia

Last synced: 23 Aug 2025

https://github.com/pinto0309/pytorch-build

Provide Docker build sequences of PyTorch for various environments.

cuda cudnn docker pytorch

Last synced: 07 May 2025

https://github.com/yinguobing/yolov5-trt

YOLO v5 inference with TensorRT (C++)

cpp cuda nvidia opencv tensorrt yolov5

Last synced: 09 Oct 2025

https://github.com/linonetwo/moss-dockerfile

Runs Fudan's MOSS language model in Docker, with a WebUI provided by GradIO.

ai chatglm chatgpt cuda deeplearning docker gpu moss pytorch

Last synced: 12 Apr 2025

https://github.com/r-barnes/barnes2019-landscape

Landscape evolution models and graph processing on the GPU

algorithm cuda gpu

Last synced: 15 Apr 2025

https://github.com/abhishekyana/cyclegans-pytorch

CycleGANs-PyTorch applied to a young-to-old image converter.

cuda cyclegan faceapp gan python pytorch resnet tutorial-code young2old

Last synced: 16 Jul 2025

https://github.com/serengil/gpuutils

GpuUtils: A Simple Tool for GPU Analysis and Allocation

cuda gpu nvidia nvidia-smi

Last synced: 21 Aug 2025

https://github.com/evilfreelancer/docker-whisper-server

whisper.cpp HTTP transcription server with OpenAI-like API in Docker

api api-server asr cuda docker docker-compose dockerfile nvidia openai openai-api whisper whisper-cpp

Last synced: 23 Oct 2025

https://github.com/ivangabriele/docker-cuda-desktop

Ubuntu PyTorch CUDA Docker image with KDE Plasma Desktop & VNC. Ideal for LLM & Deep Learning remote work.

cuda d-bus dbus deep-learning desktop docker gpu large-language-models llm nvidia python pytorch remote-desktop server ubuntu ubuntu-desktop vnc vnc-server x11

Last synced: 07 Mar 2026

https://github.com/sparselinearalgebra/spbla

Sparse Boolean linear algebra for NVIDIA CUDA, OpenCL and CPU computations

boolean-algebra cplusplus cuda graph-algorithms graphblas opencl python sparse-matrix suitesparse

Last synced: 02 Jan 2026

https://github.com/yujun-shi/cfmatting_cuda_mkl

A CUDA & MKL implementation of closed-form matting

cuda vision

Last synced: 10 Apr 2025

https://github.com/triagemd/tensorflow-builds

TensorFlow binaries and Docker images compiled with GPU support and CPU optimizations.

bazel cuda cudnn docker gpu machine-learning nvidia python tensorflow tensorflow-serving

Last synced: 09 Jul 2025

https://github.com/yashassamaga/convolutionbuildingblocks

GEMM and Winograd based convolutions using CUTLASS

convolution cuda cutlass deep-learning

Last synced: 28 Jul 2025

https://github.com/mberr/torch-max-mem

Decorators for maximizing memory utilization with PyTorch & CUDA

cuda python pytorch torch

Last synced: 30 Jul 2025

https://github.com/cggos/hpc

High-Performance Computing: CPU Instructions, GPU OpenCL & CUDA, etc.

cuda heterogeneous-parallel-programming multi-threading neon opencl openmp simd sse

Last synced: 21 Mar 2025

https://github.com/lnstadrum/fastaugment

A handy data augmentation toolkit for image classification put in a single efficient TensorFlow/PyTorch op.

augmentation-transformations brightness-correction cuda cutout data-augmentation gamma-correction gpu mixup perspective-distortions tensorflow-op

Last synced: 23 Mar 2025

https://github.com/bkraad47/fat_llama

fat_llama is a Python package for upscaling audio files to FLAC or WAV formats using advanced audio processing techniques. It utilizes CUDA-accelerated calculations to enhance audio quality by upsampling and adding missing frequencies through FFT, resulting in richer and more detailed audio.

audio audio-engineering audio-processing audiophile cuda cufft cupy fft flac hi-res hpc mp3 music nvidia ogg parallel-computing physics upscaling wav

Last synced: 05 May 2025

https://github.com/bfrg/vim-cuda-syntax

CUDA syntax highlighting for Vim

cuda highlighting syntax vim vim-syntax

Last synced: 09 Apr 2025

https://github.com/cartersusi/pacman_cuda

[AUR][Pacman] Current CUDA compatibility with TensorFlow and Torch on Arch Linux

arch arch-linux archlinux aur compatibility cuda guide installer linux pacman script tensorflow torch

Last synced: 23 Apr 2025

https://github.com/d9d-project/d9d

d9d - d[istribute]d - distributed training framework based on PyTorch that tries to be efficient yet hackable

ai cuda distributed distributed-systems llm pytorch

Last synced: 14 Apr 2026

https://github.com/PINTO0309/Open3D-build

Provide Docker build sequences of Open3D for various environments.

cuda docker jetson jetson-nano open3d open3d-python pytorch tensorflow

Last synced: 20 Mar 2025