An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/zeloe/rtconvolver

A realtime convolution VST3

c convolution cplusplus cuda juce

Last synced: 22 Apr 2025

https://github.com/BrosnanYuen/RayBNN_Raytrace

Ray tracing library using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

arrayfire cuda gpu gpu-computing opencl parallel parallel-computing ray ray-tracing raybnn raylib raytracer raytracing rust

Last synced: 04 Apr 2025

https://github.com/kuroko1t/gocuda

Go binding for Cuda Driver API

cuda go golang

Last synced: 02 May 2026

https://github.com/kareimgazer/mat-transpose-cuda

series of trials for optimizing matrix transpose with CUDA

cuda hpc matrix parallel-computing simd

Last synced: 29 Mar 2025

https://github.com/arminms/p2rng

A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI

cpp cuda cxx gpu header-only library linux macos multiplatorm oneapi openmp parallel pcg-random prng pseudorandom-number-generator random-number-distributions random-number-generation rocm stl-algorithms windows

Last synced: 04 Apr 2025

https://github.com/jackeylea/cuda_linux

CUDA/Qt tutorials on Linux

cpp cuda cudnn qt5

Last synced: 26 Jul 2025

https://github.com/ivanrs297/pycuda-covariance-matrix

A parallel covariance matrix implementation in PyCUDA

covariance-matrix cuda pycuda

Last synced: 25 Oct 2025
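
For readers new to the computation, here is a minimal CPU reference for the sample covariance matrix in plain Python. It is not the repo's PyCUDA code; the function name `covariance_matrix` is hypothetical, and it assumes each row is one observation and each column one variable. Every (i, j) entry is an independent sum over the centered data, which is the kind of work a GPU version can spread across threads.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator).

    rows: list of observations, each a list of d variable values.
    """
    n = len(rows)
    d = len(rows[0])
    # Column means, then center every observation.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[0.0] * d for _ in range(d)]
    # Each (i, j) entry is independent; only the upper triangle is computed
    # and mirrored, since the matrix is symmetric.
    for i in range(d):
        for j in range(i, d):
            s = sum(c[i] * c[j] for c in centered) / (n - 1)
            cov[i][j] = cov[j][i] = s
    return cov


print(covariance_matrix([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]))  # [[1.0, 2.0], [2.0, 4.0]]
```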

https://github.com/jaxony/pynvidia

⚙️ NVIDIA GPU utilities for Python 🔧

cuda deep-learning nvidia-gpu pip python utility

Last synced: 07 May 2025

https://github.com/luismisanve/gguf-to-pytorchtensor

A simple Python script that converts the weights of a GGUF model to PyTorch tensors

cuda gguf-models huggingface llamacpp numpy python pytorch tensor

Last synced: 20 Apr 2026

https://github.com/ellite/anchor-sub-sync

Anchor: A universal, hardware-accelerated CLI tool for subtitle synchronization (Whisper) and context-aware translation (NLLB)

ai audio-transcription automation cli cuda nllb python pytorch srt subtitle-sync subtitle-translation subtitles synchronization translation whisper

Last synced: 24 Feb 2026

https://github.com/cloudmercato/python-fpb

Python Floating Point Benchmark

benchmark cuda floating-point numpy pandas python

Last synced: 19 Apr 2026

https://github.com/gpuengineering/gputils

A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)

cplusplus-17 cplusplus-20 cpp cuda cuda-c cuda-cpp cuda-programming header-only linear-algebra

Last synced: 13 Aug 2025

https://github.com/shunk031/nvinfo-go

Rewrite in Go of ikr7/nvinfo, a simple utility for monitoring your CUDA-enabled GPUs

cli cuda go golang gpu nvidia nvidia-smi

Last synced: 02 Apr 2025

https://github.com/webis-de/pytorch-window-matmul

A custom CUDA kernel for windowed matrix multiplication

cuda cuda-kernel pytorch

Last synced: 31 Oct 2025

https://github.com/donpablonows/coin

🪙 Crypto Optimization Interface Network (aka COIN) is a high-performance Bitcoin address generator using CUDA acceleration and multi-threading. It optimizes GPU and CPU resources for fast address generation, ensures secure private key creation, and includes real-time monitoring and automatic system optimizations.

bitcoin blockchain cryptography cuda gpu-acceleration

Last synced: 07 May 2026

https://github.com/biodasturchi/gmx

🔬 Molecular modeling with GROMACS

cuda gpu gromacs mdp topology tpr trr

Last synced: 12 May 2026

https://github.com/lordmathis/cudanet

Convolutional Neural Network inference library running on CUDA

convolutional-neural-networks cpp cuda pytorch

Last synced: 08 May 2026

https://github.com/tthebc01/kawpow

Containerized KAWPOW miner.

cuda docker kawpow ravencoin

Last synced: 22 Jun 2026

https://github.com/larygwil/ffmpeg-static-cuda

FFmpeg static binaries for Linux that work on some older NVIDIA GPUs (not tested)

avc cuda cuvid ffmpeg h264 h265 hevc nvdec nvenc

Last synced: 06 May 2026

https://github.com/ran-2012/inversion

Solving geophysical inversion problems using CUDA & TensorFlow

cpp cuda geophysics inversion-method python

Last synced: 11 May 2026

https://github.com/simmsb/p4haskell

P4 backend in Haskell

compiler cuda gpu p4 p4c p4language

Last synced: 13 May 2026

https://github.com/deftruth/ptx-isa-8.2-zh

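🎉 Continuously updated: study notes on CUDA 12.2 PTX-ISA-8.2, with a partial Chinese translation, personal commentary, and inline assembly examples explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; work in progress...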

asm cpp cuda ptx

Last synced: 13 May 2026

https://github.com/misha-kis/python-plane-ransac

Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA

cuda numba plane-detection python ransac

Last synced: 13 May 2026

https://github.com/mindstudioofficial/fl_cuda_mandelbrot

Flutter example for visualizing the Mandelbrot Set using CUDA

cuda flutter-examples fractal-rendering

Last synced: 16 May 2026
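
The Mandelbrot set is a classic GPU demo because every pixel is computed independently with the escape-time iteration z → z² + c. Below is a minimal CPU sketch of that iteration in Python, rendered as ASCII art. It is not the Flutter/CUDA code from the repo; `escape_time` and `render` are hypothetical names, and the viewport bounds are arbitrary choices. In a CUDA kernel, each thread would typically evaluate `escape_time` for one pixel.

```python
def escape_time(c, max_iter=100):
    """Iterations of z -> z*z + c (from z = 0) before |z| exceeds 2,
    or max_iter if the point never escapes (treated as inside the set)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width, height, max_iter=50):
    """ASCII rendering of the region [-2, 1] x [-1.2, 1.2]."""
    rows = []
    for py in range(height):
        y = 1.2 - 2.4 * py / max(height - 1, 1)
        row = ""
        for px in range(width):
            x = -2.0 + 3.0 * px / max(width - 1, 1)
            # Each pixel is independent: one GPU thread per pixel in CUDA.
            row += "#" if escape_time(complex(x, y), max_iter) == max_iter else "."
        rows.append(row)
    return "\n".join(rows)


print(render(60, 24))
```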

https://github.com/paulvirally/vkfftcuda.jl

Julia bindings for VkFFT

cuda fft julia

Last synced: 04 May 2026

https://github.com/jtriley/gpucrate

Creates hard-linked GPU driver (currently just NVIDIA) volumes for use with docker, singularity, etc.

container cuda docker gpu singularity

Last synced: 27 Feb 2026

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-c-cpp

Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.

cpp cuda cuda-kernels cuda-programming nsight nvidia profilling

Last synced: 10 Apr 2025

https://github.com/eddieoz/bananaforge

🎨 Professional AI-powered multi-layer 3D printing optimization tool that converts 2D images into optimized multi-layer 3D models for color printing with advanced transparency mixing.

3d-printer 3d-printing 3dprinting art cli-app cuda hueforge machine-learning python

Last synced: 17 Aug 2025

https://github.com/alejandrogallo/atrip

High Performance library for the CCSD(T) algorithm in quantum chemistry

asynchronous-programming coupled-cluster cuda literate-programming mpi quantum-chemistry

Last synced: 28 Oct 2025

https://github.com/yashkathe/image-noise-reduction-with-cuda

This project analyzes an image denoising technique, median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.

cuda cuda-programming gpu-programming hardware-speed-analysis image-analysis image-processing numba nvidia nvidia-cuda nvidia-gpu opencv parallel-programming

Last synced: 14 May 2025
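
For reference, a 3×3 median blur replaces each pixel with the median of its 3×3 neighborhood, which removes isolated "salt and pepper" outliers. The pure-Python sketch below is independent of the repo's Numba and OpenCV code. The name `median_blur3` is hypothetical, and the edge-replication border handling is an assumption made here for simplicity. Each output pixel depends only on its own window, which is why the filter maps well to one GPU thread per pixel.

```python
def median_blur3(img):
    """3x3 median filter on a 2D list of grayscale values.

    Border pixels use edge replication: out-of-range coordinates are
    clamped to the nearest valid row/column.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            window.sort()
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_blur3(noisy))
```

The single bright outlier is removed because it is never the middle value of any window.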

https://github.com/ventura8/whisper-pro-asr

A high-performance Docker container that runs OpenAI's Whisper model. Optimized for CPU, Intel NPU, Intel Arc/iGPU, and NVIDIA CUDA GPUs.

asr bazarr ctranslate2 cuda docker faster-whisper hardware-acceleration huggingface intel-npu media-automation openvino speech-to-text uvr vocal-isolation whisper whisper-asr

Last synced: 28 Apr 2026

https://github.com/sashakolpakov/graphem-rapids

Graph embedding for influence maximization in networks

cuda cuda-kernels embeddings graph-algorithms graph-theory pykeops pytorch rapidsai

Last synced: 16 Apr 2026

https://github.com/cppalliance/crypt

A C++20 module of cryptographic utilities for CPU and GPU

cpp20 cuda security

Last synced: 23 Apr 2025

https://github.com/eggy115/cuda

CUDA

cuda

Last synced: 22 Apr 2025

https://github.com/brosnanyuen/raybnn_neural

Neural Networks with Sparse Weights in Rust using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

cpu cuda deep-learning gpu machine-learning machine-learning-algorithms neural-network neural-networks opencl parallel raybnn rust sparse-network sparse-neural-networks

Last synced: 09 Apr 2025

https://github.com/brosnanyuen/raybnn_diffeq

Differential Equation Solver using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

arrayfire cuda differential differential-equations gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust

Last synced: 09 Apr 2025

https://github.com/btursunbayev/nvsonar

Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes

cuda diagnostics gpu monitoring nvidia performance

Last synced: 02 Apr 2026

https://github.com/generic-matrix/node-js-cuda

CUDA Node.js binding using the NAN API, with a working example.

binding cuda cuda-node node-js node-js-cuda nodejs nodejs-gpu nodejs-modules

Last synced: 24 Jul 2025

https://github.com/postmalloc/barycuda

A tiny CUDA library for fast barycentric operations.

3d-graphics barycentric-coordinates cuda python simplex

Last synced: 31 Oct 2025
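
Barycentric coordinates express a point p as a weighted combination u·a + v·b + w·c of a triangle's vertices, with u + v + w = 1. The sketch below solves for them in 2D using Cramer's rule. It illustrates the underlying math only and does not reflect barycuda's actual API; `barycentric` is a hypothetical name. A point lies inside (or on the edge of) the triangle exactly when all three coordinates are non-negative.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (u, v, w) of 2D point p in triangle abc,
    so that p = u*a + v*b + w*c and u + v + w = 1."""
    # Solve p - a = v*(b - a) + w*(c - a) for v and w via Cramer's rule.
    v0 = (b[0] - a[0], b[1] - a[1])
    v1 = (c[0] - a[0], c[1] - a[1])
    v2 = (p[0] - a[0], p[1] - a[1])
    denom = v0[0] * v1[1] - v1[0] * v0[1]  # twice the signed area of abc
    v = (v2[0] * v1[1] - v1[0] * v2[1]) / denom
    w = (v0[0] * v2[1] - v2[0] * v0[1]) / denom
    return 1.0 - v - w, v, w


u, v, w = barycentric((0.25, 0.25), (0, 0), (1, 0), (0, 1))
print(u, v, w, min(u, v, w) >= 0)  # inside the triangle
```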

https://github.com/kianenigma/pmms-heat-dissipation

A set of assignments with comprehensive documentation to demonstrate multiple approaches to parallel programming in multi-core and many-core systems

cuda openmp parallel-programming pthreads

Last synced: 11 Sep 2025

https://github.com/lu-zero/nvidia-video-codec

Redistributable headers to build cuvid and nvenc

cuda cuvid nvenc nvidia nvidia-video-codec

Last synced: 19 Apr 2025

https://github.com/thomasjo/cudalicious

C++ header library intended to reduce CUDA boilerplate code

boilerplate cpp cuda header-only

Last synced: 19 May 2026

https://github.com/hurricane1988/check-gpu-device

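✨ This project is an API service built on Flask + Gunicorn + NVIDIA CUDA that provides endpoints for querying CUDA device information and for health checks. It supports running on GPUs and can be used when deploying deep learning inference environments.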

cuda docker makefile nvidia python3 pytorch

Last synced: 10 Jul 2025

https://github.com/arsfiqball/image-sharpen-cpp

Implementation of Image Sharpening algorithm in C++ & CUDA

cuda gpu image-processing image-sharpening-algorithm

Last synced: 22 Apr 2026

https://github.com/kiwijuice56/cuda-mandelbox

Ray marching renderer of the 3D mandelbox fractal, accelerated with CUDA GPU code

3d 3d-graphics cpp cuda fractal fractal-images fractal-rendering mandelbox nvidia-cuda

Last synced: 02 May 2026

https://github.com/thomasvonwu/interview-note

Share Interview Questions and Summarize Answers

cuda interview llm

Last synced: 23 Jun 2025

https://github.com/zhangge6/how-to-optimize-playground

High-performance computing (HPC) demos since I was a freshman.

cuda gemm x86

Last synced: 15 May 2026

https://github.com/ancry1596/bitlocker-recovery-password-brute-forcer

GPU-accelerated BitLocker recovery password brute-forcer using BitCracker and CUDA

bitcracker bitlocker brute-force cuda gpu nvidia password-recovery python

Last synced: 08 Apr 2026

https://github.com/aresio/lassie

LASSIE is a black-box deterministic simulator of large-scale mass-action biochemical systems

biochemical cuda gpu-computing large-scale mass-action simulation stiff

Last synced: 21 Feb 2026

https://github.com/meetps/me-766

Assignment Solutions to course ME766 High Performance Scientific Computing.

cuda gpu-computing opencl openmp parallel-computing

Last synced: 18 May 2026

https://github.com/santhsecurity/vyre

Compiler-grade sequential GPU compute. Workgroup-local stacks, queues, hashmaps, dominator trees, fixed-point dataflow. CUDA + WGPU + SPIR-V with bit-exact conformance gate. Rust.

compute cuda gpgpu gpu gpu-computing parallel-computing rust spir-v wgpu

Last synced: 23 Jun 2026

https://github.com/ragibson/cuda-k-means

An implementation of Lloyd's algorithm for data clustering on GPUs and computational accelerators.

clustering cuda gpu k-means unsupervised-clustering

Last synced: 18 Jun 2026
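
Lloyd's algorithm alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, until the centroids stop changing. Below is a minimal pure-Python sketch for CPU reference, not the repo's CUDA implementation. The name `lloyd_kmeans` is hypothetical, initial centroids are passed in explicitly, and keeping an empty cluster's centroid in place is one common choice among several. The assignment step is independent per point, which is the part that maps most naturally onto GPU threads.

```python
import math


def lloyd_kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm for k-means; returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for each point (independent per point).
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new.append(c)  # keep an empty cluster's centroid where it was
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, labels


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(lloyd_kmeans(pts, [(0, 0), (10, 10)]))
```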

https://github.com/rkv0id/boltzmanumba

GPU parallelization of a sequential Lattice Boltzmann gist on CUDA-capable devices using Numba.

cuda lbm numba

Last synced: 08 Sep 2025

https://github.com/silviopaganini/darknet-docker-nvidia

Docker image to run Darknet on NVIDIA GPUs with CUDA 9.0 and OpenCV 3.4.0

cuda darknet docker nvidia-docker opencv

Last synced: 13 Jul 2025

https://github.com/iitii/useless

Backups of the Doubi scripts, plus some personal configuration files and personal scripts

aria2 bash-script cuda docker doubi ffmpeg frpc frps oh-my-zsh powerlevel10k

Last synced: 10 Apr 2026

https://github.com/fabryprog/java-gpu

Support for offloading parallel-for loops in Java to NVIDIA CUDA compatible cards.

cuda gpu java nvidia parallel-computing

Last synced: 15 Apr 2026

https://github.com/gurbaaz27/cs433a-design-exercises

Solutions of design exercises in CS433A: Parallel Programming, Spring Semester 2021-22

barriers cuda gpu-programming locks openmp parallel-programming posix-threads semaphores

Last synced: 29 Jan 2026

https://github.com/jedbrooke/cuda_bwt

CUDA-accelerated Burrows-Wheeler transform

bioinformatics burrows-wheeler-transform bwt compression cuda

Last synced: 19 May 2026
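
The Burrows-Wheeler transform (BWT) sorts all rotations of a string terminated by a unique sentinel and keeps the last column; the output tends to group equal characters, which helps compression and underpins FM-index read aligners in bioinformatics. The naive sketch below (O(n² log n), fine only for short strings) shows what the transform computes and how it is inverted. It is not the repo's CUDA code; `bwt` and `inverse_bwt` are hypothetical names, and it assumes the sentinel `$` sorts before every character in the input. Practical implementations, on CPU or GPU, usually derive the BWT from a suffix array instead.

```python
def bwt(s, sentinel="$"):
    """Naive BWT: last column of the sorted rotations of s + sentinel."""
    assert sentinel not in s, "sentinel must not occur in the input"
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last, sentinel="$"):
    """Naive inverse: repeatedly prepend the BWT column and re-sort;
    after len(last) rounds the table holds every sorted rotation."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


encoded = bwt("banana")
print(encoded, inverse_bwt(encoded))
```

Here `bwt("banana")` returns `"annb$aa"`, and `inverse_bwt` recovers `"banana"`.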

https://github.com/maximedebarbat/dolphin

Dolphin is a Python toolkit meant to speed up TensorRT inference by providing CUDA-accelerated processing.

cuda python tensorrt-inference

Last synced: 07 Jul 2025

https://github.com/usegalaxy-eu/ansible-cuda

Ansible role to install the CUDA toolkit, as described in the NVIDIA CUDA Installation Guide, on a Red Hat/CentOS system.

ansible cuda

Last synced: 17 Jan 2026

https://github.com/puzzlef/pagerank-cuda

Design of CUDA-based PageRank algorithm for link analysis.

algorithm block config cuda graph launch pagerank point switch switched thread vertex

Last synced: 03 Feb 2026
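
PageRank is usually computed by power iteration: each round, every vertex receives a damped share of its in-neighbors' rank plus a uniform teleport term, until the ranks stop changing. The minimal sketch below shows that iteration in plain Python. It is not the design in the repo; the name `pagerank`, the damping factor 0.85, the convergence tolerance, and the choice to spread dangling vertices' rank uniformly are all assumptions. GPU versions typically parallelize the per-vertex update, often by having each vertex pull from its in-edges rather than push along out-edges as done here.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank.

    out_links maps every vertex (each must appear as a key) to the list of
    vertices it links to. Rank held by dangling vertices (no out-links) is
    redistributed uniformly.
    """
    vertices = list(out_links)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        # Teleport term plus the uniformly spread dangling rank.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        # Push each vertex's rank equally along its out-links.
        for v in vertices:
            for u in out_links[v]:
                new[u] += damping * rank[v] / len(out_links[v])
        if sum(abs(new[v] - rank[v]) for v in vertices) < tol:
            return new
        rank = new
    return rank


print(pagerank({"a": ["b"], "b": [], "c": ["b"]}))
```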

https://github.com/ekzhang/kernels

Kernel programming environment

cuda cute-dsl gpu

Last synced: 06 Mar 2026

https://github.com/weiyu0824/flash-attention-lite

Basic FlashAttention implementation

attention cuda torch

Last synced: 24 Jun 2025

https://github.com/phrb/nvidia-workshop-autotuning

Resources for autotuning CUDA compiler parameters

autotuning compilers cuda gpu julia nodal nvcc

Last synced: 03 May 2026

https://github.com/alejandroamat/3dgs-vulkan-cpp

Cross-platform Vulkan 3D Gaussian Splatting renderer - Windows/Mac/Linux, any GPU, with Python binding support

3d 3d-graphics 3d-reconstruction 3dgs apple computer-vision cuda differentiable-rendering gaussian-splatting glfw3 gpu gpu-acceleration linux macos nerf neural-rendering python real-time vulkan windows

Last synced: 15 Jun 2025

https://github.com/brosnanyuen/raybnn_optimizer

Gradient Descent Optimizers and Genetic Algorithms using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

arrayfire cuda genetic-algorithm genetic-algorithms gpu gpu-computing gradient gradient-descent parallel parallel-computing raybnn rust

Last synced: 07 Oct 2025

https://github.com/archibate/cuda_aero_lbm

A toy 3D LBM solver in CUDA

cuda graphics simulation

Last synced: 13 Jun 2025

https://github.com/franneck94/cuda-aes

AES Implementation (Counter Mode) in C++, OpenMP and CUDA.

aes c-plus-plus counter cuda encryption openmp parallel

Last synced: 13 Apr 2025

https://github.com/krzemienski/ffmpeg-nvenc-bento4

Container to transcode and package HLS and DASH assets, using GPU acceleration for transcoding

bento4 cuda dash docker encoding ffmpeg hls mp4 nvenc nvidia video

Last synced: 29 Apr 2026

https://github.com/hahnjo/cgxx

Object-Oriented Implementation of the Conjugate Gradients Method

cg cuda hpc openacc opencl openmp

Last synced: 28 Apr 2026
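
The conjugate gradient (CG) method solves A x = b for a symmetric positive-definite matrix A using only matrix-vector products and dot products, which are exactly the operations that CUDA, OpenCL, OpenMP, and OpenACC backends accelerate. The plain-Python sketch below shows the core iteration with dense lists. It is not the repo's object-oriented implementation; `matvec`, `dot`, and `conjugate_gradient` are hypothetical names, and the tolerance is an arbitrary default. In exact arithmetic CG converges in at most n iterations for an n×n system.

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)  # residual r = b - A x (x = 0 initially)
    p = list(r)  # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction, A-conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The printed solution is approximately [1/11, 7/11], reached after two iterations for this 2×2 system.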

https://github.com/potato3d/grid-rt

GPU-accelerated ray tracing using GLSL and CUDA

cuda glsl gpu ray-tracing real-time-rendering

Last synced: 15 Apr 2026

https://github.com/mrfoxak/evaluate-lip-reading-using-deep-learning-techniques.

This paper explores Silent Sound Technology, focusing on its potential to enhance communication in noisy environments through lip-reading and deep learning, with applications in hearing aids and security.

bi-lstm cnn cuda deep-learning image-processing lstm machine-learning mathematics neural-networks ovencv python research-paper sklearn tensorflow

Last synced: 03 Sep 2025

https://github.com/ammaryasirnaich/deeplearning_playland

This repository contains Docker Image files, which support the common frameworks required for Deep learning implementation. The images support both the latest GPU (Nvidia CUDA) and CPU processors.

cuda cuda11 cudnn cudnn8 deep-learning docker docker-image dockerfile gpu kersa opencv pytorch pytorch-cnn scikit-learn tensorflow2

Last synced: 12 Apr 2026

https://github.com/nodef/nvgraph.sh

CLI for nvGraph, which is a GPU-based graph analytics library written by NVIDIA, using CUDA.

analytics cli console cuda gpu graph nvgraph nvidia pagerank terminal

Last synced: 12 Sep 2025

https://github.com/romnn/nvbit-rs

Rust bindings to the NVIDIA NVBIT binary instrumentation API

cuda ffi gpgpu instrumentation nvbit nvidia profiling ptx rust sass tracing

Last synced: 05 May 2025

https://github.com/phael-exe/aco-selection-parallel

Parallelization of ACO with CUDA and OpenMP for large-scale instance selection.

cuda openmp parallel-computing

Last synced: 03 Jun 2026

https://github.com/lawmurray/gpu-gemm

CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.

cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing

Last synced: 01 Mar 2026
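
The gpu-gemm description mentions using a Hilbert curve to improve L2 cache utilization. As a hedged illustration of the general idea, and not the repo's code, the textbook conversion from a distance d along a Hilbert curve to (x, y) cell coordinates on an n×n grid (n a power of two) is shown below. A GEMM kernel could, hypothetically, map a linear thread-block index to an output tile this way, so that blocks with nearby indices work on neighboring tiles and tend to reuse the same input rows and columns while they are still in L2. The name `hilbert_d2xy` is hypothetical.

```python
def hilbert_d2xy(n, d):
    """Map distance d (0 <= d < n*n) along a Hilbert curve filling an
    n x n grid, n a power of two, to (x, y) coordinates."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so the curve pieces connect.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Visiting order of tiles on a 4x4 tile grid:
print([hilbert_d2xy(4, d) for d in range(16)])
```

Consecutive distances always land on edge-adjacent cells, which is the locality property the cache argument relies on.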

https://github.com/bensuperpc/easyai

Make your own AI easily!

ai cuda python python3 tensorflow

Last synced: 16 Feb 2026

https://github.com/egororachyov/spbench

Benchmark for sparse linear algebra libraries for CPU and GPU platforms.

benchmark cpp cpu cuda gpu-computing graphblas opencl sparse-matrices

Last synced: 15 May 2025

https://github.com/demwafflez/cuda-2d-softbody-physics-simulation

Handcrafted from scratch! Felt and dealt with every single one of those thousands of ACCESS_VIOLATION errors!

cpp cuda gpu-computing opengl physics-2d physics-simulation softbody-physics softbody-simulation verlet-physics

Last synced: 02 Mar 2025

https://github.com/rocm/rocmds-cmake

This is a collection of CMake modules that are useful for all ROCm-DS projects. By sharing the code in a single place it makes rolling out CMake fixes easier.

amd cmake cuda hip radeon-instinct-mi-series rocm

Last synced: 10 Apr 2025