CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/mrkct/cuda-raytracer

Simple CUDA-Accelerated raytracer

cuda gpu raytracing raytracing-one-weekend

Last synced: 21 Apr 2026

https://github.com/rai-project/dlperf

Déjà vu: Modeling DNN Performance by Recalling History

benchmark cuda deep-learning modeling onnx performance tensorflow

Last synced: 21 Apr 2026

https://github.com/musaibbashir/object-detection

PyTorch + CUDA implementation of several image classification and object detection models, such as YOLO, Fast-CNN, and RF-DETR

cnn computer-vision cuda image-classification object-detection pytorch yolo

Last synced: 21 Apr 2026

https://github.com/jiaau/kernels

This repository showcases common optimization techniques for kernels.

cpp cuda cute cutlass hpc kernel

Last synced: 21 Apr 2026

https://github.com/fxzxmicah/fedora-llama-cpp

llama.cpp tools with OpenMP, CUDA, and OpenVINO support

cuda fedora llama-cpp openmp openvino rpm

Last synced: 05 Jun 2026

https://github.com/mcp-tool-shop-org/gpu-container

Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow, but declared placement.

cuda gpu inference llama-cpp llm moe offload vram wsl2

Last synced: 09 Jun 2026

https://github.com/dimitrijkrstev/pp-cuda-fft

A parallelised CUDA implementation of the radix-2 FFT algorithm, with execution-time comparisons against the DFT and a non-parallelised radix-2 FFT

cuda fft parallel-computing

Last synced: 22 Apr 2026

https://github.com/mdnpascual/judgebarmashvp

Error bar for the game called Mash VP

cuda emgucv screencapturer tesseract-ocr

Last synced: 22 Apr 2026

https://github.com/maxenceleguery/jat

Tensor library

computation cuda tensor

Last synced: 24 Apr 2026

https://github.com/bikemazzell/tuonella-sift

A high-performance, memory-efficient CSV deduplication tool

csv cuda deduplication logger osint rust

Last synced: 24 Apr 2026

https://github.com/bardifarsi/threadpoolmanager

ThreadPoolManager is a C++ project that implements an efficient multithreading system using a thread pool that runs generic functions of the same type across different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 24 Apr 2026

https://github.com/juntyr/necsim-rust-analysis

Analysis of the spatially explicit biodiversity simulation `necsim-rust`

analysis biodiversity cuda mpi necsim rust simulation

Last synced: 24 Apr 2026

https://github.com/seanwevans/damnati

A CUDA-accelerated iterated prisoner's dilemma arena

arena cuda iterated-prisoners-dilemma prisoners-dilemma tournament

Last synced: 14 May 2026

https://github.com/0xsooki/extending-jax

JAX Custom Operations with C++ and CUDA (using Pybind11)

cuda jax pybind11 xla

Last synced: 25 Apr 2026

https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36

Dockerfile for Ubuntu 16.04 with Python 3.6 and CUDA 9

cuda dockerfile python ubuntu

Last synced: 03 May 2026

https://github.com/sangioai/torchpace

PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.

cuda pytorch transformer

Last synced: 25 Apr 2026

https://github.com/daviddavo/19gpu

Short GPU programming exercises from the Complutense University of Madrid. Mirrored from GitLab.

accelerator cuda gpu-programming

Last synced: 26 Apr 2026

https://github.com/shashshukla/ee-210-signals-and-systems

Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.

cuda image-processing signal-processing

Last synced: 26 Apr 2026

https://github.com/dwain-barnes/llm-gguf-auto-converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.

auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization

Last synced: 17 Jun 2025

https://github.com/countzero/windows_exllama

This is a playground to explore the ExLlama project in a Windows environment.

conda cuda exllama python torch

Last synced: 26 Apr 2026

https://github.com/alexyzha/cuda-bioinformatics

A CUDA-Accelerated Bioinformatics Toolchain

bioinformatics bioinformatics-tool cplusplus cuda

Last synced: 26 Apr 2026

https://github.com/mateuszk098/parallel-programming-examples

Simple parallel programming examples with CUDA, MPI and OpenMP.

cpp cuda mpi openmp parallel-programming

Last synced: 27 Apr 2026

https://github.com/kbredies/tgv_pycuda

Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.

compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation

Last synced: 27 Apr 2026

https://github.com/notkartikye/cuda-image-box-filters

🖼️ CUDA-powered tool for applying box filters to a large number of images

cuda cuda-library cuda-programming npp

Last synced: 27 Apr 2026

https://github.com/gladap/heterogeneous_computing_project

Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters

cuda heterogeneous-parallel-programming

Last synced: 27 Apr 2026

https://github.com/perhuepenbecker/cudyn

CUDA library for irregular tasks using a dynamic block-internal balancing mechanism

cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular

Last synced: 28 Apr 2026

https://github.com/ncorgan/arrayfire-config-info

A small command-line utility that outputs all available ArrayFire devices

arrayfire cuda gpu opencl

Last synced: 28 Apr 2026

https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator

High-performance YOLOv5 inference framework accelerated by TensorRT, with dynamic optimization

cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5

Last synced: 28 Apr 2026

https://github.com/dlzou/rt-weekend

Ray Tracing in One Weekend, using CUDA

cuda ray-tracing

Last synced: 28 Apr 2026

https://github.com/rog0d/gpuss_watchers

"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."

cuda gpu-acceleration gpu-monitoring gpu-profiling

Last synced: 28 Apr 2026

https://github.com/axeloooo/pytorch

Collection of deep learning workflows in PyTorch, from fundamentals and classification to transfer learning and experiment tracking.

cuda python pytorch

Last synced: 28 Apr 2026

https://github.com/ltsyk/smart-snake-ai

Advanced Deep Q-Network AI for the Snake game, with CUDA support and a 700% performance boost

artificial-intelligence cuda deep-q-network dqn game-ai machine-learning pytorch reinforcement-learning snake-game

Last synced: 28 Apr 2026

https://github.com/atelierarith/julia_gpu_playground

For those who want to use Julia with a GPU

cuda docker docker-compose julia

Last synced: 28 Apr 2026

https://github.com/ccfelius/hpc

High Performance Computing (CUDA, MPI/OpenMP, high-performance ML)

cuda high-performance-computing machine-learning mpi

Last synced: 28 Apr 2026

https://github.com/emanuelemessina/cuda-benchmark

Compares matrix computation times between the CPU and the GPU (CUDA)

benchmark cuda matrix-calculations

Last synced: 28 Apr 2026

https://github.com/shermanlo77/modefilter

ImageJ plugin, Java and CuPy implementation of the mode filter and empirical null filter. The mode filter is an edge-preserving smoothing filter that takes the mode of the empirical density.

cuda cupy empirical-null fiji filter image-filter imagej jcuda mode-filter

Last synced: 28 Apr 2026

https://github.com/psteinb/gtc2017

Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley

compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim

Last synced: 03 May 2026

https://github.com/fedimser/aldyparen

Renders pictures and videos with algebraic fractals

cuda fractals graphics

Last synced: 29 Apr 2026

https://github.com/sandialabs/tenzing

Core library for optimizing CUDA+MPI programs as sequential decision problems.

cuda mpi scr-2759 sequential-decision-problem

Last synced: 29 Apr 2026

https://github.com/snandasena/cuda-at-scale-for-the-enterprise

Gaussian filter with CUDA and NPP

cpp cuda gpu nvidia

Last synced: 29 Apr 2026

https://github.com/baro-00/cpp-cuda-lab

Experimental C++ projects using NVIDIA CUDA for parallel computing: learning and testing GPU kernels.

cpp cuda

Last synced: 04 May 2026

https://github.com/apostolis1/parallel-processing-systems

Project of the undergrad course "Parallel Processing Systems" - NTUA

benchmark c cuda mpi openmp parallel-computing

Last synced: 29 Apr 2026

https://github.com/giog97/histogram_equalization_cuda

Performance comparison of sequential and parallel (CUDA) histogram equalization for image contrast enhancement.

cuda cuda-kernels cuda-programming histogram-equalization image-processing parallel-computing parallel-programming

Last synced: 29 Apr 2026

https://github.com/jonastoth/cuda_raytracer

University project to implement a basic raytracer in CUDA

cpp14 cuda raytracer

Last synced: 29 Apr 2026

https://github.com/ousscher/esi_2cs_hpc_tp

A collection of High-Performance Computing (HPC) codes showcasing parallel computing techniques. This repository includes implementations in CUDA, MPI, OpenMP, and threading ...

c cuda mpi openmp pthreads

Last synced: 18 Mar 2025

https://github.com/rdma-from-gpu/.github

Public code release for our paper "Toward GPU-centric Networking on Commodity Hardware"

cuda gpu linux network rdma research

Last synced: 29 Apr 2026

https://github.com/dogrego/gpgpu-rainbow-raytracer

A GPU-accelerated rainbow ray tracer with CPU reference implementation, CUDA for parallelized refraction/reflection, and OpenGL for interactive visualization

cuda gpgpu raytracing

Last synced: 29 Apr 2026

https://github.com/jeong-j/multicore

Multithreading in Java, C, C++, Pthreads, and CUDA

c cpp cuda java multicore pthread thread

Last synced: 29 Apr 2026

https://github.com/mathiasotnes/gemm

General Matrix Multiplication (GEMM) optimization in CUDA.

cuda gpu

Last synced: 26 Mar 2025

https://github.com/fikri-rouzan/cuda-c-program-part-2

CUDA C program from an NVIDIA course.

c cuda

Last synced: 30 Apr 2026

https://github.com/fulvius31/triton-cache-tracker

A lightweight utility for monitoring and analyzing Triton kernel compilation cache behavior.

cache cuda gpu gpu-kernels triton triton-openai

Last synced: 30 Apr 2026

https://github.com/gaurisharan/cuda-ml-kernels

Repo for CUDA C++ GPU kernels for ML and HPC.

cpp cuda gpu hpc kernels ml parallel-computing systems-ml

Last synced: 30 Apr 2026

https://github.com/neel-dandiwala/npp_cudaatscale_project

For the enterprise course project, I created a program that applies histogram equalisation to a given input image file.

cuda npp

Last synced: 30 Apr 2026

https://github.com/puzzlef/vector-multiplication-cuda

Comparing approaches for CUDA-based vector multiplication.

algorithm cuda map multiply operation pagerank primitive

Last synced: 30 Apr 2026

https://github.com/mahshid1378/piper-plus-3

Multilingual neural TTS (6 languages: JA/EN/ZH/ES/FR/PT, code supports SV) — C++, C#, Rust, Go, Python, npm (WASM). VITS + Prosody, streaming, CUDA/CoreML/DirectML. pip install piper-plus | npm install piper-plus | cargo install piper-plus-cli

cross-platform csharp cuda deep-learning dotnet japanese multilingual nuget onnx pytorch rust speech-synthesis streaming text-to-speech tts vits webassembly

Last synced: 08 Jun 2026

https://github.com/actepukc/uv-app-starter-pack

Bootstrap PySide6 GUI apps quickly using uv, with built-in PyTorch/CUDA handling.

astral-uv cross-platform cuda gui pyside6 python pytorch qt6 starter-kit template

Last synced: 30 Apr 2026

https://github.com/manishklach/gb300-rl-runtime

Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.

ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue

Last synced: 09 Jun 2026

https://github.com/ivanbuccella/sf2bio

Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment

cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method

Last synced: 01 May 2026

https://github.com/mrtejas/cv-sandbox

A collection of computer vision mini-projects covering a number of tasks, including face detection, object detection, image segmentation, and CLIP. The models are trained on popular datasets, and the repository includes a comparative study of the methods. Done as part of the S24 Computer Vision course at IIIT Hyderabad.

computer-vision cuda ml opencv pytorch yolo

Last synced: 01 May 2026

https://github.com/fikri-rouzan/cuda-c-program-part-3

CUDA C program from an NVIDIA course.

c cuda

Last synced: 01 May 2026

https://github.com/darshanakgr/meanfiltergpu

A GPU implementation of a mean filter in CUDA

c cuda image-processing

Last synced: 01 May 2026

https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python

Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.

accelerated-computing cuda cuda-programming jit numba nvidia python

Last synced: 01 May 2026

https://github.com/andresvalle/ocr-extraction

Text extraction from images using EasyOCR and parallelization with PyTorch

cuda ocr pytorch

Last synced: 01 May 2026

https://github.com/marius311/cudadistributedtools.jl

A set of utility tools for multi-GPU + multi-process workflows

cuda distributed julia

Last synced: 01 May 2026

https://github.com/f14-bertolotti/torchess

CUDA Torch extension for a chess engine

chess cuda torch

Last synced: 01 May 2026

https://github.com/vladd12/libexecstd

Modern C++ library for using an execution context of computer devices

cpp cpp17 cuda gpu-acceleration gpu-computing

Last synced: 06 May 2026

https://github.com/proafxin/cuda-docker

High-performance computing Docker images with PyCUDA and TensorRT preinstalled

cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt

Last synced: 11 Apr 2026

https://github.com/zhaocc1106/cuda-programming

Learning CUDA programming

cuda nvidia

Last synced: 23 Mar 2025

https://github.com/zhaocc1106/cuxx-programing

Examples for several CUDA libraries: CUDA, cuBLAS, cuBLASLt, cuSPARSE...

cublas cublaslt cuda cusparse

Last synced: 23 Mar 2025

https://github.com/gammahazard/locate-anything

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui

Last synced: 28 May 2026

https://github.com/abhiram-kandiyana/cuda-blast-2024

Reimplementation of NCBI BLAST with CUDA backend for faster retrieval

blast cuda gpu-acceleration parallel-processing

Last synced: 15 Mar 2025

https://github.com/mvishiu11/kmeans-clustering

K-Means Clustering with both GPU (CUDA) and CPU implementations

cuda kmeans-clustering

Last synced: 15 Mar 2025

https://github.com/sahil-rajwar-2004/vector-cuda

Vector calculations with GPU acceleration using CUDA

c cpp11 cuda cuda-kernels cuda-programming nvcc

Last synced: 15 May 2025

https://github.com/neel-dandiwala/cuda-programs

Miscellaneous programs that illustrate the concept of parallel computing

cuda gpu-programming parallel-programming

Last synced: 16 May 2025

https://github.com/tchung1970/sd-cli-cuda

CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop

cuda gpu linux nvidia stable-diffusion

Last synced: 09 May 2026

https://github.com/bikrammajhi/100-days-of-gpu

This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.

cuda nsight-compute ptx triton

Last synced: 18 Jun 2025

https://github.com/lk/gpu-nbody

GPU-accelerated n-body engine for t-SNE and physics simulation

cuda gpu n-body n-body-simulator

Last synced: 02 Sep 2025

https://github.com/usman619/pdc

Parallel and Distributed Computing

cuda distributed-computing distributed-systems nextcloud

Last synced: 11 Apr 2026

https://github.com/neugence/acehub

AI Champions for Excellence: Fresh, informative courses and content designed to help developers, researchers, and leaders advance in the field of AI.

ai cuda cv ml mlops nlp pytorch rl rlhf tensorflow

Last synced: 05 Jan 2026

https://github.com/lordofhyphens/gpu-path-delay-coverage

CUDA-based Path Delay Fault Coverage

cpp cuda gpgpu moderngpu

Last synced: 04 May 2026

https://github.com/hit07/ml-dl-torch

This repository provides a comprehensive introduction to machine learning and deep learning using PyTorch

computer-vision convolutional-neural-networks cuda neural-networks pytorch

Last synced: 28 Feb 2025

https://github.com/gaaniruddha/mphil-gpu-imager

This repository contains code for project #1 of an MPhil: a test version of a GPU imager for the single-time-step, single-channel case and the single-time-step, multi-channel case.

astronomy benchmarks cuda cufft google-sheets gpu-imager imaging-astronomy interferometry radio-astronomy

Last synced: 11 Jun 2026

https://github.com/alan-cooney/python-cuda-starter-template

Python CUDA Starter Template

cuda deep-learning

Last synced: 30 Mar 2025

https://github.com/h4ck3r-04/fpassword

Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.

brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing

Last synced: 16 Jan 2026

https://github.com/jesuscopado/parallel-programming

My solutions for the course Programming Parallel Computers at Aalto University (http://ppc.cs.aalto.fi/). Grade: 5/5

cpp cuda image-segmentation median-filter sorting-algorithms

Last synced: 19 Apr 2026