An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/sashakolpakov/graphem-rapids

Graph embedding for influence maximization in networks

cuda cuda-kernels embeddings graph-algorithms graph-theory pykeops pytorch rapidsai

Last synced: 16 Apr 2026

https://github.com/jtriley/gpucrate

Creates hard-linked GPU driver (currently just NVIDIA) volumes for use with docker, singularity, etc.

container cuda docker gpu singularity

Last synced: 27 Feb 2026

https://github.com/nolmoonen/cuda-sdf

CUDA-accelerated path traced Menger sponge using ray marching.

cuda menger path-tracer ray-marching sdf

Last synced: 12 Feb 2026

https://github.com/bensuperpc/easyai

Make your own AI easily!

ai cuda python python3 tensorflow

Last synced: 16 Feb 2026

https://github.com/lawmurray/gpu-gemm

CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.

cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing

Last synced: 01 Mar 2026
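The gpu-gemm entry above orders its work along a Hilbert curve to improve L2 cache utilization. As a rough illustration, and not code taken from that repository, the standard distance-to-coordinate mapping below could decide which output tile each thread block computes. The function name `hilbert_d2xy` and the tile-grid framing are assumptions made for this sketch.

```python
def hilbert_d2xy(n: int, d: int) -> tuple[int, int]:
    """Return the (x, y) cell visited at step d of a Hilbert curve on an n x n grid.

    Assumes n is a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current sub-square so consecutive pieces of the curve join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


if __name__ == "__main__":
    # Hypothetical use: block b of a GEMM launch computes output tile hilbert_d2xy(4, b),
    # so blocks that run close together work on neighbouring tiles.
    for block_id in range(16):
        print(block_id, hilbert_d2xy(4, block_id))
```

Consecutive steps always land on adjacent tiles, and adjacent output tiles share either a row panel of A or a column panel of B, which is why this ordering tends to keep reused data in cache.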

https://github.com/btursunbayev/nvsonar

Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes

cuda diagnostics gpu monitoring nvidia performance

Last synced: 02 Apr 2026

https://github.com/phael-exe/aco-selection-parallel

Parallelization of ACO with CUDA and OpenMP for large-scale instance selection.

cuda openmp parallel-computing

Last synced: 03 Jun 2026

https://github.com/arsfiqball/image-sharpen-cpp

Implementation of Image Sharpening algorithm in C++ & CUDA

cuda gpu image-processing image-sharpening-algorithm

Last synced: 22 Apr 2026

https://github.com/ventura8/whisper-pro-asr

A high-performance Docker container that runs OpenAI's Whisper model. Optimized for CPU, Intel NPU, Intel Arc/iGPU, and NVIDIA CUDA GPUs.

asr bazarr ctranslate2 cuda docker faster-whisper hardware-acceleration huggingface intel-npu media-automation openvino speech-to-text uvr vocal-isolation whisper whisper-asr

Last synced: 28 Apr 2026

https://github.com/yosh-matsuda/gpu-array

Maximum GPU performance with Modern C++ syntax. RAII and Range-based abstraction to GPU memory management and data layouts, enabling code safety and performance optimization with zero overhead.

cpp cpp20 cuda gpu header-only hip

Last synced: 08 Jun 2026

https://github.com/shmishtopher/cudnn-versions

A scoop bucket for installing NVIDIA cuDNN versions.

cuda cudnn scoop scoop-apps scoop-bucket

Last synced: 01 May 2026

https://github.com/phrb/nvidia-workshop-autotuning

Resources for autotuning CUDA compiler parameters

autotuning compilers cuda gpu julia nodal nvcc

Last synced: 03 May 2026

https://github.com/paulvirally/vkfftcuda.jl

Julia bindings for VkFFT

cuda fft julia

Last synced: 04 May 2026

https://github.com/demoriarty/doksparse

Sparse DOK (dictionary-of-keys) tensors on the GPU for PyTorch

cuda pytorch sparse

Last synced: 23 Feb 2025

https://github.com/avitase/fast_frechet

Comparison of different (fast) discrete Fréchet distance implementations in C++ and CUDA.

benchmark cpp cuda frechet-distance simd

Last synced: 18 May 2026

https://github.com/tiw302/mandelbrot-c

A simple Mandelbrot set explorer written in C. Crafted with SDL2 and multithreaded rendering for a smooth experience. ‹(•_•)›

c cuda fractal graphics mandelbrot multithreading sdl2 web webassembly

Last synced: 26 Apr 2026

https://github.com/kpetridis24/four-russians-algorithm

Boolean matrix multiplication accelerated by the four-Russians algorithm

c cuda gpu high-performance matrix-multiplication preprocess

Last synced: 29 May 2026
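The four-russians-algorithm entry above refers to the classic table-lookup speed-up for Boolean matrix products. The following is a minimal CPU sketch in Python, assuming each matrix row is packed into an integer bitset (bit j of `a_rows[i]` is A[i][j]). The function name and the block width `k` are illustrative choices and are not taken from that repository, whose CUDA implementation may organize its tables differently.

```python
def boolean_matmul_four_russians(a_rows, b_rows, n, k=4):
    """Boolean product C = A * B (OR of ANDs) using the Method of Four Russians.

    a_rows: rows of A as int bitsets over n columns.
    b_rows: the n rows of B as int bitsets.
    Returns the rows of C as int bitsets.
    """
    c_rows = [0] * len(a_rows)
    for start in range(0, n, k):
        block = b_rows[start:start + k]
        width = len(block)
        # table[mask] = OR of the rows of B in this block selected by the bits of mask.
        table = [0] * (1 << width)
        for mask in range(1, 1 << width):
            low = mask & -mask              # lowest set bit of mask
            bit = low.bit_length() - 1      # index of that bit within the block
            table[mask] = table[mask ^ low] | block[bit]
        # One table lookup per row of A replaces up to `width` OR operations.
        for i, a in enumerate(a_rows):
            idx = (a >> start) & ((1 << width) - 1)
            c_rows[i] |= table[idx]
    return c_rows
```

Each block of `k` rows of B gets a table of all 2^k unions of those rows, built incrementally from smaller masks. Choosing `k` near log2(n) gives the textbook O(n^3 / log n) bound, counting bit operations.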

https://github.com/boltzmannentropy/vllm-5090

vLLM-5090: Docker Container for RTX 5090 on WSL2/Windows

5090 cuda docker vllm

Last synced: 08 Oct 2025

https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129

Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)

cuda deep-learning pytorch rtx5060

Last synced: 20 Jul 2025

https://github.com/denzp/current

CUDA high-level Rust framework

cuda rust

Last synced: 26 Apr 2026

https://github.com/headless-start/data-augmentation-impact

This repository examines the effect of training-set data augmentation during model training.

augmented-images cuda data gpu keras matplotlib mnist opencv-python python3 tensorflow training-data

Last synced: 05 Apr 2026

https://github.com/cklxx/arle

Rust-native inference runtime for Qwen3 / Qwen3.5 — OpenAI-compatible serving + integrated agent, train, and self-evolution workflows. CUDA + Metal, no PyTorch on the hot path.

agent cuda flashinfer gspo inference infra kv-cache llm metal mlx openai-compatible qwen3 qwen35 rl rust

Last synced: 02 May 2026

https://github.com/szaghi/adam

Multi-physics AMR SDK and apps for High Performance Computing — from laptop to exascale device-accelerated superpc

amr cfd cuda fluid-dynamics fortran gas-dynamics hpc hydro-dynamics mpi openacc openmp plasma-dynamics

Last synced: 04 Apr 2026

https://github.com/xlite-dev/HGEMM

⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉

cuda hgemm tensor-cores

Last synced: 30 Jul 2025

https://github.com/neoblizz/cupti-plus-plus

CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).

cpp cuda cuda-profiler cupti profiler

Last synced: 26 Apr 2026

https://github.com/kagof/julia-image-processing

Image processing programs written in Julia

cuda image-processing julia

Last synced: 18 May 2026

https://github.com/artain-ai/ignite-ms

Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.

batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search

Last synced: 04 Jun 2026

https://github.com/sthysel/jtx2-tools

NVIDIA Jetson TX/Xavier GPU monitoring tool

cuda nvidia txt2 xavier

Last synced: 19 May 2026

https://github.com/gjbex/gpu-programming

Material for a training on portable GPU programming

cuda gpu kokkos openmp openmp-off stl thrust

Last synced: 08 Feb 2026

https://github.com/muhac/jupyter-pytorch-docker

JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.

conda-environment cuda docker jupyterlab pytorch

Last synced: 01 Oct 2025

https://github.com/hansalemaos/nvidiacheck

Monitors NVIDIA GPU information and logs the data into a pandas DataFrame (Windows only).

cuda log logging nvidia torch

Last synced: 27 Apr 2026

https://github.com/andreasholt/cusmc

A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata

cuda smc

Last synced: 11 Feb 2026

https://github.com/tthebc01/cudaconda3

Lightweight container environment with CUDA, Miniconda3, and JupyterLab.

cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application

Last synced: 11 Feb 2026

https://github.com/dpbm/qml-course

Quantum machine learning mini-course

cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow

Last synced: 31 Jan 2026

https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis

Master of Science thesis project. Uses CUDA to harness a system's GPU to create pathfinding data (IM+PF) usable by multiple agents in the same environment.

ai cplusplus cuda gpgpu pathfinding starcraft2

Last synced: 15 May 2026

https://github.com/murrellgroup/conflux.jl

Single-node data parallelism in Julia with CUDA

cuda data-parallelism flux julia nccl

Last synced: 22 May 2026

https://github.com/galaxies99/inception-cuda

CUDA Implementation of Inception

cuda inception-v3

Last synced: 12 Apr 2025

https://github.com/zeloe/juce_cuda_convolution

GPU acceleration for efficient, high-quality audio processing.

audio audio-processing convolution cuda dsp juce

Last synced: 03 Mar 2026

https://github.com/geekysuavo/gpufield

A CUDA-accelerated electromagnetostatics solver

cuda magnetic-fields magnetostatics

Last synced: 24 Dec 2025

https://github.com/brocbyte/realtime-deformations

Snow simulation (Material Point Method)

cuda glm material-point-method opengl

Last synced: 10 Aug 2025

https://github.com/terrylindev/image-to-ASCII

🖼️ A command-line tool for converting images to ASCII art

ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal

Last synced: 12 Jul 2025

https://github.com/kim-hwiwon/T-espresso

A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data

cuda profiler

Last synced: 10 Apr 2025

https://github.com/nixos-cuda/cuda-legacy

Select CUDA package sets which have aged out of Nixpkgs. [maintainers=@ConnorBaker, @SomeoneSerge]

cuda nixpkgs nixpkgs-overlay

Last synced: 15 May 2026

https://github.com/neoblizz/spmv

Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).

csr cuda gpgpu load-balancing mtx spmv

Last synced: 28 Apr 2026

https://github.com/dito97/gol

High-performance Computing (90535) final project at UniGe

cuda mpi openmp

Last synced: 02 May 2026

https://github.com/grakshith/parallel-k-means

K-Means clustering for Image Colour Quantization and Image Compression

cuda image-color-quantization image-compression k-means mpi opencv openmp

Last synced: 28 Apr 2026

https://github.com/mulx10/firefly

Enhancing object detection using thermal imaging for otherwise unidentifiable objects with thin cross-sections (e.g., cyclists, pedestrians).

autonomous-cars autonomous-navigation autonomous-vehicles c cuda object-detection thermal-camera yolov3

Last synced: 03 Sep 2025

https://github.com/digimortl/libguess

Patches that give Bitcoin Core the ability to mine with CUDA

bitcoin c-plus-plus cryptocurrency cuda

Last synced: 16 Apr 2026

https://github.com/juntyr/necsim-rust

Spatially explicit biodiversity simulations using a parallel library written in Rust

biodiversity cuda mpi necsim rust simulation

Last synced: 22 Mar 2025

https://github.com/acrlakshman/gradient-augmented-levelset-cuda

Implementation of Gradient Augmented Levelset method for CPU and GPU

cfd cuda levelset

Last synced: 17 Feb 2026

https://github.com/toxy4ny/artaxerxes

Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the old Xerxes DDoS tool. Supports GPU+io_uring, DPDK, and eBPF/XDP with intelligent fallbacks. An educational tool for advanced cybersecurity labs.

cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools security-tools stress-testing

Last synced: 08 Oct 2025

https://github.com/nachovizzo/saxpy_openacc_cpp

My way of thinking about OpenACC, C++, and parallel computing in general

cpp cuda gpu openacc

Last synced: 04 Sep 2025

https://github.com/prithivsakthiur/vlm-parsing

VLM-Parsing is a Gradio-based web application for parsing documents and images into structured HTML and Markdown formats using advanced Vision Language Models (VLMs).

cuda gradio html huggingface-models huggingface-spaces huggingface-transformers logics markdown ocr-recognition pytorch qwen2-5-vl spaces vlm

Last synced: 05 Apr 2026

https://github.com/xmas7/cudampi

A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.

cpu cuda gpu hybrid mpi network

Last synced: 29 Apr 2026

https://github.com/stdogpkg/cukuramoto

A Python/CUDA package that numerically solves the Kuramoto model using Heun's method

complex-networks cuda kuramoto-model

Last synced: 28 Jan 2026
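The cukuramoto entry above integrates the Kuramoto model with Heun's method. Below is a small pure-Python sketch of one common network form of the model, dθ_i/dt = ω_i + K Σ_j A_ij sin(θ_j − θ_i). The adjacency matrix `adj`, the coupling `coupling`, and all function names are assumptions for illustration, and the package's own formulation and normalization may differ. Heun's method takes an Euler predictor step and then averages the slopes at the start and end of the step.

```python
import math


def kuramoto_rhs(theta, omega, adj, coupling):
    """d(theta_i)/dt = omega_i + coupling * sum_j adj[i][j] * sin(theta_j - theta_i)."""
    n = len(theta)
    return [
        omega[i] + coupling * sum(adj[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
        for i in range(n)
    ]


def heun_step(theta, omega, adj, coupling, dt):
    """One Heun (explicit trapezoidal) step: Euler predictor, then averaged slopes."""
    k1 = kuramoto_rhs(theta, omega, adj, coupling)
    predictor = [t + dt * d for t, d in zip(theta, k1)]
    k2 = kuramoto_rhs(predictor, omega, adj, coupling)
    return [t + 0.5 * dt * (a + b) for t, a, b in zip(theta, k1, k2)]


def simulate(theta0, omega, adj, coupling, dt, steps):
    theta = list(theta0)
    for _ in range(steps):
        theta = heun_step(theta, omega, adj, coupling, dt)
    return theta


def order_parameter(theta):
    """Kuramoto order parameter r = |mean(exp(i * theta))|; r close to 1 means synchronized."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)
```

Within one step, every oscillator's right-hand side depends only on the previous phases, so one natural CUDA mapping is one thread (or warp) per oscillator for each of the two slope evaluations.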

https://github.com/amruthapatil/nyu-cudamatrixoperations

Optimizing CUDA programs for vector addition and matrix multiplication

cuda high-performance-computing

Last synced: 21 May 2026

https://github.com/l1cacheDell/CUDA_Code

Code for learning CUDA, with implementations of multiple kernels.

cuda cuda-programming

Last synced: 10 Mar 2025

https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory

Performance comparison of two different forms of memory management in CUDA

c cuda explicit memory memory-management performance unified-memory

Last synced: 17 May 2026

https://github.com/fattorib/thunderkittens-simple-gemm

Simple Tensorcore GEMM in ThunderKittens

cuda gemm gpu thunderkittens

Last synced: 09 Feb 2026

https://github.com/nellogan/distributed_compy

Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + multi-GPU), and multi-node support

cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi

Last synced: 16 Aug 2025

https://github.com/copperfr/blendervxkex

Windows 7 CUDA & OptiX support for Blender 4.x

blender cuda cycles-renderer optix vxkex windows-7

Last synced: 20 Jan 2026

https://github.com/hanzhi713/bitonic-sort

In-place GPU sort with bitonic sort

bitonic-sort cuda gpu in-place sorting

Last synced: 09 Feb 2026
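The bitonic-sort entry above sorts in place on the GPU with a bitonic network. The Python sketch below shows the standard iterative compare-exchange form of that network for power-of-two lengths; it illustrates the technique and is not code from the hanzhi713 repository. On a GPU, the innermost loop over `i` is what would typically be spread across threads, with a synchronization between passes.

```python
def bitonic_sort(data):
    """Sort a list in place with a bitonic sorting network; len(data) must be a power of two."""
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:           # compare-exchange distance within the merge
            # Each i is independent in this pass; a GPU kernel would assign one thread per i.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (ascending and data[i] > data[partner]) or (
                        not ascending and data[i] < data[partner]
                    ):
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

The network performs the same comparisons regardless of the input values, which is what makes it attractive for GPUs and for the hybrid CPU/GPU sorter listed earlier.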

https://github.com/alpha74/cuda_basics

NVIDIA NVCC CUDA programs for beginners.

c cpp cuda cuda-programs nvcc nvidia parallel-computing parallel-programming

Last synced: 08 May 2026

https://github.com/alexjmercer/fractal-art

Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!

cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2

Last synced: 05 Mar 2026

https://github.com/l30nardosv/reproduce-parcosi-moleculardocking

Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"

autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research

Last synced: 16 Feb 2026

https://github.com/mazharuddin-mohammed/semidgfem

High-performance TCAD Simulator Using Discontinuous Galerkin FEM

cuda discontinuous-galerkin-method tcad tcad-device-simulator

Last synced: 15 Jun 2025

https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support

A detailed step-by-step guide to installing TensorFlow 2.0 GPU with CUDA drivers on Ubuntu Server/Desktop LTS

cuda gpu nvidia ubuntu

Last synced: 07 May 2025

https://github.com/isazi/aoflagger

AOFlagger Radio Frequency Interference mitigation algorithm.

cuda gpu many-core rfi

Last synced: 30 Apr 2026

https://github.com/pothosware/pothosgpu

Pothos toolkit for ArrayFire API support

arrayfire cuda dataflow dataflow-programming gpu opencl pothos

Last synced: 19 Apr 2026

https://github.com/dqbd/cuda-btree

Implementation of B-Trees on NVIDIA CUDA

b-tree cuda nvidia

Last synced: 30 Apr 2026

https://github.com/tortillazhawaii/rr_sort

Various sorting implementations using distributed and parallel methods

bazel cpp cuda java openmp spark threads

Last synced: 14 Apr 2026

https://github.com/betarixm/cuecc

POSTECH: Heterogeneous Parallel Computing (Fall 2023)

cryptography ctypes cuda ecc postech secp256k1

Last synced: 12 May 2025

https://github.com/steleman/pytorch-cuda-2.7.1

Clone of PyTorch: Tensors and Dynamic neural networks in Python and C++ with strong GPU acceleration.

cuda fedora macos pytorch sequoia

Last synced: 30 Apr 2026

https://github.com/navdeep-g/dimreduce4gpu

Dimensionality reduction ("dimreduce") on GPUs ("4gpu")

cplusplus cuda dimensionality-reduction gpu linear-algebra pca python svd unsupervised-learning

Last synced: 14 Apr 2025

https://github.com/mu7annad0/100gpu

100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥

cuda gpu

Last synced: 08 Mar 2026

https://github.com/matthewfeickert/cuda-tf-torch

An Ubuntu 18.04 NVIDIA Docker image with CUDA 10.1 and cuDNN 7, plus TensorFlow and PyTorch

cuda cuda-101 cudnn cudnn-v7 docker docker-image gpu nvidia-docker nvidia-gpu pytorch tensorflow torch

Last synced: 07 Jan 2026

https://github.com/seungjaelim/cuda.tutorial

References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)

cuda gpu-programming nsight-compute nsight-systems

Last synced: 07 Feb 2026

https://github.com/ginkgo-project/cudaarchitectureselector

A CMake module simplifying the specification of CUDA architectures

cmake cmake-modules cuda

Last synced: 05 Nov 2025

https://github.com/dhruvsrikanth/cudann

A distributed implementation of a deep learning framework in CUDA.

cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming

Last synced: 01 May 2026

https://github.com/true-real-michael/python-plane-ransac

Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA

cuda numba plane-detection python ransac

Last synced: 14 Mar 2025

https://github.com/lukasboettcher/msc-code

This is the repository for my master's thesis on a GPU-accelerated Andersen analysis.

andersen-analysis clang cuda llvm static-analysis

Last synced: 16 Jan 2026

https://github.com/cfries/javagpuexperiments

Repository used to demo OpenCL, JOCL, JCuda.

cuda

Last synced: 25 Apr 2026

https://github.com/tawssie/zmpy3d_pt

Python implementation of 3D Zernike moments with PyTorch

3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments

Last synced: 24 Oct 2025

https://github.com/aiday-mar/mpi-cuda-project

Using MPI and CUDA in order to accelerate the conjugate gradient algorithm execution in C++

c-plus-plus cuda gpu mpi university-project

Last synced: 02 May 2026