An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/cklxx/arle

Rust-native inference runtime for Qwen3 / Qwen3.5 — OpenAI-compatible serving + integrated agent, train, and self-evolution workflows. CUDA + Metal, no PyTorch on the hot path.

agent cuda flashinfer gspo inference infra kv-cache llm metal mlx openai-compatible qwen3 qwen35 rl rust

Last synced: 02 May 2026

https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129

Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)

cuda deep-learning pytorch rtx5060

Last synced: 20 Jul 2025

https://github.com/nellogan/distributed_compy

Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + multi-GPU), and multi-node support

cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi

Last synced: 16 Aug 2025

https://github.com/tortillazhawaii/rr_sort

Various sorting implementations using distributed and parallel methods

bazel cpp cuda java openmp spark threads

Last synced: 14 Apr 2026

https://github.com/murrellgroup/conflux.jl

Single-node data parallelism in Julia with CUDA

cuda data-parallelism flux julia nccl

Last synced: 22 May 2026

https://github.com/xlite-dev/HGEMM

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉

cuda hgemm tensor-cores

Last synced: 30 Jul 2025

https://github.com/B1-663R/docker-mining

Dockerfiles to build docker images to start mining with an NVIDIA Docker architecture

cryptocurrency cuda docker-image docker-nvidia mining

Last synced: 28 Mar 2025

https://github.com/stdogpkg/cukuramoto

A Python/CUDA package that numerically solves the Kuramoto model using Heun's method

complex-networks cuda kuramoto-model

Last synced: 28 Jan 2026

https://github.com/debowin/gpu-parallel-recommender-system

GPGPU Parallel User-User Collaborative Filtering System in CUDA C

collaborative-filtering cuda gpu-programming movielens-dataset recommender-system

Last synced: 24 Apr 2026

https://github.com/kpetridis24/four-russians-algorithm

Boolean matrix multiplication accelerated by the four-Russians algorithm

c cuda gpu high-performance matrix-multiplication preprocess

Last synced: 29 May 2026

https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory

Performance comparison of two different forms of memory management in CUDA

c cuda explicit memory memory-management performance unified-memory

Last synced: 17 May 2026

https://github.com/l30nardosv/reproduce-parcosi-moleculardocking

Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"

autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research

Last synced: 16 Feb 2026

https://github.com/puzzlef/pagerank-cuda-dynamic

Design of CUDA-based Parallel Dynamic PageRank algorithm for measuring importance.

algorithm cuda gpu graph pagerank static temporal

Last synced: 21 Feb 2026

https://github.com/andreabak/whispersubs

Generate subtitles for your video or audio files using the power of AI

ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper

Last synced: 15 Feb 2026

https://github.com/teodutu/asc

Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020

cache-optimization cuda parallel-programming profiling python-threading

Last synced: 24 Apr 2026

https://github.com/csvancea/gpu-hashtable

GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.

cuda hashtable

Last synced: 24 Apr 2026

https://github.com/bdwhst/fluora

A CUDA PBR path tracer

cpp cuda pathtracing pbr rendering

Last synced: 13 Feb 2026

https://github.com/amruthapatil/nyu-cudamatrixoperations

Optimizing CUDA programs for vector addition and matrix multiplication

cuda high-performance-computing

Last synced: 21 May 2026

https://github.com/tiw302/mandelbrot-c

A simple Mandelbrot set explorer written in C. Crafted with SDL2 and multithreaded rendering for a smooth experience. ‹(•_•)›

c cuda fractal graphics mandelbrot multithreading sdl2 web webassembly

Last synced: 26 Apr 2026

https://github.com/denzp/current

CUDA high-level Rust framework

cuda rust

Last synced: 26 Apr 2026

https://github.com/alpinebuster/arkime-docker-compose

Deploy Arkime with GPU-accelerated Rust/Python parsers and custom plugins using Docker Compose.

arkime c cuda deep-neural-networks docker docker-compose llm machine-learning networking pcap pcapng python rust traffic-analysis

Last synced: 16 Apr 2026

https://github.com/true-real-michael/python-plane-ransac

Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA

cuda numba plane-detection python ransac

Last synced: 14 Mar 2025

https://github.com/kagof/julia-image-processing

Image processing programs written in Julia

cuda image-processing julia

Last synced: 18 May 2026

https://github.com/neoblizz/cupti-plus-plus

CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).

cpp cuda cuda-profiler cupti profiler

Last synced: 26 Apr 2026

https://github.com/mazharuddin-mohammed/semidgfem

High-performance TCAD Simulator Using Discontinuous Galerkin FEM

cuda discontinuous-galerkin-method tcad tcad-device-simulator

Last synced: 15 Jun 2025

https://github.com/acrlakshman/gradient-augmented-levelset-cuda

Implementation of Gradient Augmented Levelset method for CPU and GPU

cfd cuda levelset

Last synced: 17 Feb 2026

https://github.com/l1cacheDell/CUDA_Code

Code for learning CUDA, with implementations of multiple kernels.

cuda cuda-programming

Last synced: 10 Mar 2025

https://github.com/hansalemaos/nvidiacheck

Monitors NVIDIA GPU information and logs the data into a pandas DataFrame - Windows only.

cuda log logging nvidia torch

Last synced: 27 Apr 2026

https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis

Master of Science thesis project that uses CUDA on a system's GPU to create pathfinding data (IM+PF), usable by multiple agents in the same environment.

ai cplusplus cuda gpgpu pathfinding starcraft2

Last synced: 15 May 2026

https://github.com/gjbex/gpu-programming

Material for a training on portable GPU programming

cuda gpu kokkos openmp openmp-off stl thrust

Last synced: 08 Feb 2026

https://github.com/avitase/fast_frechet

Comparison of different (fast) discrete Fréchet distance implementations in C++ and CUDA.

benchmark cpp cuda frechet-distance simd

Last synced: 18 May 2026

https://github.com/zeloe/juce_cuda_convolution

GPU acceleration for efficient, high-quality audio processing.

audio audio-processing convolution cuda dsp juce

Last synced: 03 Mar 2026

https://github.com/tank3-tk3/pi-calculation-cpu-gpu

PI calculation with CPU and GPU

c cpp cuda parallel-computing pi

Last synced: 13 Apr 2026

https://github.com/lukasboettcher/msc-code

This is the repo for my master's thesis on GPU-accelerated Andersen analysis.

andersen-analysis clang cuda llvm static-analysis

Last synced: 16 Jan 2026

https://github.com/alpha74/cuda_basics

Nvidia NVCC CUDA programs for beginners.

c cpp cuda cuda-programs nvcc nvidia parallel-computing parallel-programming

Last synced: 08 May 2026

https://github.com/terrylindev/image-to-ASCII

🖼️ A command-line tool for converting images to ASCII art

ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal

Last synced: 12 Jul 2025

https://github.com/neoblizz/spmv

Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).

csr cuda gpgpu load-balancing mtx spmv

Last synced: 28 Apr 2026

https://github.com/grakshith/parallel-k-means

K-Means clustering for Image Colour Quantization and Image Compression

cuda image-color-quantization image-compression k-means mpi opencv openmp

Last synced: 28 Apr 2026

https://github.com/demoriarty/doksparse

sparse DOK tensors on GPU, pytorch

cuda pytorch sparse

Last synced: 28 Jun 2026

https://github.com/xmas7/cudampi

A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.

cpu cuda gpu hybrid mpi network

Last synced: 29 Apr 2026

https://github.com/digimortl/libguess

Patches that give Bitcoin Core the ability to mine with CUDA

bitcoin c-plus-plus cryptocurrency cuda

Last synced: 16 Apr 2026

https://github.com/tawssie/zmpy3d_pt

Python implementation of 3D Zernike moments with PyTorch

3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments

Last synced: 24 Oct 2025

https://github.com/nixos-cuda/cuda-legacy

Select CUDA package sets which have aged out of Nixpkgs. [maintainers=@ConnorBaker, @SomeoneSerge]

cuda nixpkgs nixpkgs-overlay

Last synced: 15 May 2026

https://github.com/shreyansh26/mlsys-experiments

A collection of scripts on experimenting and implementing MLSys-related stuff

cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton

Last synced: 30 Aug 2025

https://github.com/kishore-narendran/eecs221-highperformancecomputing

Assignments done during the graduate course EECS 221 - Introduction to HPC that I took in the Spring Quarter of 2016 at University of California, Irvine. Involves assignments that use OpenMP, MPI and CUDA.

cuda hpc mpi openmp

Last synced: 17 May 2026

https://github.com/boltzmannentropy/vllm-5090

vLLM-5090: Docker Container for RTX 5090 on WSL2/Windows

5090 cuda docker vllm

Last synced: 08 Oct 2025

https://github.com/tvanfossen/entropic

Local-first agentic inference engine in C/C++. Multi-tier model routing, grammar-constrained output, MCP tool servers. Embeddable via C ABI.

agentic-ai agentic-framework cpp cpp20 cuda edge-ai embedded-ai gbnf gguf grammar-constrained-decoding inference-engine llama-cpp llm local-llm mcp on-device-ai privacy-first tool-calling

Last synced: 30 May 2026

https://github.com/seungjaelim/cuda.tutorial

References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)

cuda gpu-programming nsight-compute nsight-systems

Last synced: 07 Feb 2026

https://github.com/alexjmercer/fractal-art

Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!

cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2

Last synced: 05 Mar 2026

https://github.com/isazi/aoflagger

AOFlagger Radio Frequency Interference mitigation algorithm.

cuda gpu many-core rfi

Last synced: 30 Apr 2026

https://github.com/headless-start/data-augmentation-impact

This repository covers the effect of data augmentation of the training set during model training.

augmented-images cuda data gpu keras matplotlib mnist opencv-python python3 tensorflow training-data

Last synced: 05 Apr 2026

https://github.com/muhac/jupyter-pytorch-docker

JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.

conda-environment cuda docker jupyterlab pytorch

Last synced: 01 Oct 2025

https://github.com/dqbd/cuda-btree

Implementation of B-Trees on NVIDIA CUDA

b-tree cuda nvidia

Last synced: 30 Apr 2026

https://github.com/elftausend/sliced

Array operations with automatic differentiation on CPU and GPU

autograd automatic-differentiation cuda custos matrix opencl

Last synced: 31 Jan 2026

https://github.com/szymon423/tsp-cpu-vs-gpu

Simple brute force approach to solve travelling salesman problem with CPU and GPU

cuda tsp

Last synced: 11 Mar 2025

https://github.com/steleman/pytorch-cuda-2.7.1

Clone of PyTorch: Tensors and Dynamic neural networks in Python and C++ with strong GPU acceleration.

cuda fedora macos pytorch sequoia

Last synced: 30 Apr 2026

https://github.com/kilamper/matrix-multiplication

AC - Matrix multiplication using OpenMP, MPI and CUDA

cuda ms-mpi openmp

Last synced: 16 May 2026

https://github.com/lightshade12/kittlespt

A hobby CUDA pathtracing renderer.

3d-graphics computer-graphics cuda gpu path-tracing ray-tracing

Last synced: 18 Mar 2025

https://github.com/thunder-compute/thunder-compute-documentation

Documentation for Thunder Compute, a cloud platform creating technology to virtualize GPUs over TCP

ai artificial-intelligence cloud cloud-computing cuda gpu llm machine-learning nvidia pytorch tensorflow thunder-compute virtualization

Last synced: 15 Oct 2025

https://github.com/shahed-chy-suzan/psd-to-html--cuda

Cuda is a single-page creative portfolio PSD-to-HTML template built with HTML5 & CSS3. The site can be customized easily to suit your needs.

cuda portfolio psd-to-html

Last synced: 18 Jan 2026

https://github.com/dhruvsrikanth/fastconv

Distributed and serial implementations of the 2D convolution operation in C++ and CUDA.

convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming

Last synced: 04 May 2026

https://github.com/pratikvn/nla4hpc-exercises-framework

The exercises framework for the Numerical Linear Algebra for HPC course at Karlsruhe Institute of Technology.

cuda ginkgo homeworks hpc-course teaching

Last synced: 19 May 2026

https://github.com/brosnanyuen/raybnn_graph

Graph Manipulation Library For GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI

cuda gpu graph graph-algorithms neural-network neural-networks opencl raybnn rust

Last synced: 06 Feb 2026

https://github.com/perl-openmp/p5-openmp-environment

Perl interface for manipulating OpenMP's environmental runtime execution variables

compiler cuda gcc gpu hpc openmp perl pthreads

Last synced: 19 Feb 2026

https://github.com/alpha74/hungarianalgocuda

Hungarian Algorithm for Linear Assignment Problem implemented using CUDA.

cuda nvcc parallel-computing parallel-programming

Last synced: 01 Jun 2026

https://github.com/ashwani-rathee/imagesgpu.jl

Image Processing on GPU in Julia

cuda gpu image image-processing julia

Last synced: 11 Jul 2025

https://github.com/haleelrah/Vision-pro-MAX

A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.

bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8

Last synced: 30 Dec 2025

https://github.com/mark0011astra/simplecuda

A library that allows users to easily perform GPU operations using CUDA with a NumPy-like interface.

cuda cupy gpu machine-learning numpy python vector

Last synced: 02 May 2026

https://github.com/jonathanraiman/mini_cuda_rtc

Miniature CUDA Array library with Runtime Compilation

cpp11 cuda jit runtime-compilation

Last synced: 14 Apr 2026

https://github.com/mcp-tool-shop-org/backpropagate

Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.

api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows

Last synced: 31 May 2026

https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition

This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.

audio-processing cpp cuda fourier-transform python

Last synced: 10 May 2026

https://github.com/lcsb-biocore/cufluxsampler.jl

GPU-accelerated algorithms for flux sampling in CUDA.jl

cobra cuda gpu julia metabolic-network metabolism sampling

Last synced: 02 May 2026

https://github.com/jxlarrea/homeassistant-voice-recipes

GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.

arm64 cuda dgx-spark gb10 gpu-acceleration home-assistant local-llm qwen3 speech-to-text text-to-speech voice-assistant x86-64

Last synced: 26 May 2026

https://github.com/vishwamartur/btc_recovery

High-performance Bitcoin wallet password recovery system with GPU acceleration and integrated graphics support. Recover Bitcoin Core wallet.dat files without blockchain download using advanced algorithms and blockchain APIs.

bitcoin bitcoin-core blockchain blockchain-api cpp cryptocurrency cuda electrum gpu-acceleration integrated-graphics multithreading opencl password-recovery private-keys recovery-tools wallet-dat wallet-recovery

Last synced: 14 Apr 2026

https://github.com/brendanbignell/cuda_montecarlooptionpricer

CUDA Monte Carlo Barrier Option Pricing Demo & JupyterLab ML models

cuda deep-learning ml pytorch quantitative-finance xgboost-regression

Last synced: 19 Apr 2026

https://github.com/dvhh/masscorrelation

An exercise in writing an efficient correlation calculator

calculations correlation-calculation cuda matrix multi-threading openmp

Last synced: 15 May 2026

https://github.com/bokutotu/cudnn_graph_api_example

cuDNN Graph API example

cuda cudnn cudnn-v8

Last synced: 04 May 2026

https://github.com/rnabla/cuda-des

Bruteforcing DES using CUDA

bruteforce cuda data des encryption gpu parallel standard

Last synced: 27 Oct 2025

https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection

Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.

cpp cuda opencv python yolo

Last synced: 12 Apr 2026

https://github.com/liuyuweitarek/pytorch-docker-builder

Automate PyTorch Docker image builds with compatible Python, CUDA, and Poetry versions, including CI/CD for testing.

cicd containerd cuda docker docker-image poetry-python python python3 pytorch pytorch-docker

Last synced: 06 Feb 2026

https://github.com/hubenchang0515/fft-benchmark

Performance benchmarks of several FFT libraries

cuda fft

Last synced: 27 Oct 2025

https://github.com/rhysdg/whisper-onnx-python

A low-footprint, GPU-accelerated speech-to-text Python package for the JetPack 5 era, bolstered by an optimized graph

ai chatbot cuda machine-learning onnxruntime speech-to-text whisper

Last synced: 16 Feb 2026

https://github.com/thomasonzhou/minitorch

rebuilding pytorch: from autograd to convolutions in CUDA

cuda numba numpy

Last synced: 02 Feb 2026

https://github.com/Programmer-RD-AI/DetectX

A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.

coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet

Last synced: 04 May 2025

https://github.com/SanaeProject/Matrix-for-Cpp

This repository has types that handle matrices.

cpp14 cpp14-library cuda matrix-library

Last synced: 15 May 2025

https://github.com/alekseyscorpi/vacancies_server

This is a server for vacancies generation using LLM (Saiga3)

code cuda cuda-toolkit docker dockerfile flask llama3 llamacpp llm ngrok pydantic saiga

Last synced: 06 Feb 2026

https://github.com/hartorn/docker-python

Repository to build a Python image based on Ubuntu and CUDA

cuda docker mkl-dnn onednn python3 ubuntu ubuntu1804

Last synced: 05 May 2026