An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4

Optimized vLLM setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell using vLLM and Docker: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.

am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm

Last synced: 11 May 2026

https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu

Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.

cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding

Last synced: 12 May 2026

https://github.com/nourmorsy/convolution-neural-network-cuda

Code for optimizing a CNN using CUDA

c cnn cuda

Last synced: 13 May 2026

https://github.com/yinguobing/opencv-docker

Dockerfiles for OpenCV build.

cuda docker ffmpeg opencv

Last synced: 10 Apr 2026

https://github.com/kenmalik/cuda-dr-bcg

CUDA C++ implementation of the DR-BCG algorithm for numerically solving linear systems.

cpp cuda hpc numerical-methods

Last synced: 19 Apr 2026

https://github.com/prateekshukla1108/thunderkittens-docs

Documentation for ThunderKittens framework

cuda deep-le

Last synced: 18 Mar 2025

https://github.com/moshidev/acap

Lab assignments for the course Arquitectura y Computación de Altas Prestaciones (Architecture and High-Performance Computing)

cuda homework-assignments mpi pthreads

Last synced: 30 Mar 2025

https://github.com/ragu-manjegowda/parallel-programming

Assignments and Projects of Udacity's Introduction to Parallel Programming Course

cuda gpu-programming nvidia-cuda nvidia-gpu udacity-parallel-programming

Last synced: 25 May 2026

https://github.com/unknownnuts/meshsdk

Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.

cuda dicom electron emscripten mesh modelling pybind11 stl stomatology threejs wasm

Last synced: 10 Apr 2026

https://github.com/separatrixxx/pgp_labs_7_sem

👓 Laboratory work for the 7th semester at MAI on PGP and PDP

cpp cuda nvidia

Last synced: 15 May 2026

https://github.com/m-torhan/advent-of-code

🎄 Solutions for the Advent of Code

advent-of-code advent-of-code-2024 cuda

Last synced: 07 Apr 2025

https://github.com/jpodivin/gputomata

Cellular automata running on CUDA capable GPUs

cellular-automata cellular-automaton cuda

Last synced: 07 Nov 2025

https://github.com/kylesayrs/pttp

PyTorch Tensor Profiler with fully-supported memory timelines and events

cuda memory profiling pytorch

Last synced: 07 May 2026

https://github.com/kis-balazs/cuda-research

CUDA Research & Code. Structured like a course. Inspired by @Infatoshi.

cuda

Last synced: 14 May 2025

https://github.com/mcp-tool-shop-org/gpu-container

Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow - declared placement.

cuda gpu inference llama-cpp llm moe offload vram wsl2

Last synced: 09 Jun 2026

https://github.com/bjornmelin/ai-system-design

🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️

architecture cuda distributed-systems engineering gpu-computing production scalability system-design

Last synced: 23 Jul 2025

https://github.com/derek-palmer/dvr-scan-file-organizer

DVR-Scan-Organizer is a Dockerized extension for DVR-Scan, designed to process multiple video files and organize output in a structured format.

cuda dvr dvr-scan multimedia opencv opencv-python python video video-processing

Last synced: 01 May 2026

https://github.com/ngoma1713/rushirb2001

🤖 Explore advanced AI and machine learning solutions for protein modeling and medical applications, developed by a dedicated data science graduate student.

computer-vision-opencv cuda data-science-portfolio deep-learning generative-ai machine-learning medical-ai protein-modeling published-researcher pytorch quantum-ml rag-chatbot tensorflow

Last synced: 02 May 2026

https://github.com/bjornmelin/llm-gpu-optimization

🚄 Advanced LLM optimization techniques using CUDA. Features efficient attention mechanisms, custom CUDA kernels for transformers, and memory-efficient training strategies. ⚡

cuda deep-learning gpu-acceleration llm-optimization machine-learning memory-optimization parallel-computing transformers

Last synced: 18 Mar 2025

https://github.com/elymsyr/auv_ws

An open-source simulation and control workspace for an Autonomous Underwater Vehicle (AUV) built on ROS 2 Humble and Gazebo. It features a high-fidelity dynamics model and an advanced AI-based motion controller (FossenNet) that uses a pre-trained LibTorch model to imitate an NL-MPC for real-time, high-performance manoeuvring.

autonomous-vehicles auv control-systems cpp cuda deep-learning gazebo imitation-learning libtorch mpc python robotics ros2 simulation

Last synced: 15 Apr 2026

https://github.com/quik-fe/node-nvidia-smi

Node wrapper around nvidia-smi.

cuda gpu nodejs nvidia nvidia-smi typescript

Last synced: 19 Feb 2026

https://github.com/deepschneider/tinygrad-universal

Universal version of Tinygrad with CUDA and OpenCL support

autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda

Last synced: 06 Mar 2025

https://github.com/maneeshsit/pcie

Modifies the code of run:ai and other FOSS projects for use with PCIe card-based AI accelerators, for both inference and training

cuda cxl cxl-mem distro exo k3s k8s kestra llamacpp llm-d mpi4py mpio onnxoptimizer opentelemetry-ebpf-profiler paxos-cluster pcie photonics-computing runai visualize vllm

Last synced: 24 Aug 2025

https://github.com/ghusta/jcuda-demo

JCUDA demo

cuda java nvidia

Last synced: 14 May 2026

https://github.com/mjun0812/setup-cuda

Set up a specific version of NVIDIA CUDA in GitHub Actions on Linux x86_64 and arm64 (Debian- and Fedora-based distributions) and Windows

action cuda cuda-toolkit github-actions

Last synced: 13 Jan 2026

https://github.com/sshoecraft/shepherd

An interactive multi-backend LLM runtime with intelligent cache eviction and persistent retrieval-augmented memory.

anthropic cli cpp cuda gemini grok inference kv-cache llama-cpp llm mcp ollama openai openai-server rag smart-evictions tensorrt tool-calling ulimited-context

Last synced: 10 Apr 2026

https://github.com/kmock930/texture-image-comparison

This project aims to build a model that classifies the type of an unseen image as accurately as possible, by implementing, evaluating, and comparing two different multi-layer perceptron neural networks.

computer-vision conda confusion-matrix convolutional-neural-networks cuda image-preprocessing keras keras-tensorflow learning-curve-analysis matplotlib multi-layer-perceptron neural-network pickle-file python3 skimage

Last synced: 12 Apr 2026

https://github.com/camille-004/cusprec

🏁 Sparse signal recovery library written in PyCUDA.

cuda ml python signal-processing sparse-recovery

Last synced: 18 Jan 2026

https://github.com/1ytic/cuda-gpu-zoo

Properties of the CUDA devices

cuda gpu

Last synced: 20 Aug 2025

https://github.com/sid911/neuralnetworkcpp

A small experiment to learn about neural networks and their runtimes in cpp

cpp cuda machine-learning neural-network

Last synced: 20 Aug 2025

https://github.com/flavienbwk/tensorflow2-cuda-10.2-docker

TensorFlow 2.3, CUDA 10.2, Docker-compatible image

cuda docker python3 tensorflow ubuntu1804

Last synced: 11 Apr 2026

https://github.com/pvgupta24/parallel-programming

Basic algorithms for parallel programming in CUDA C++, Java and OpenMP

cuda openmp parallel-programming

Last synced: 19 Aug 2025

https://github.com/dmalexx/cuda_check

How to check whether CUDA is available in TensorFlow

cuda python tensorflow

Last synced: 10 Apr 2026

https://github.com/promptromp/aws-bootstrap-g4dn

Fast and easy bootstrapping of AWS EC2 instances for CUDA development. Use as a CLI, as a programmatic SDK, or as an Agent Skill!

aws cuda ec2 jupyter-notebook machine-learning mlops python

Last synced: 21 Feb 2026

https://github.com/williamzhang20/cuda-practice

Exercises in CUDA

cuda n-body-problem

Last synced: 23 Mar 2025

https://github.com/rmeli/cuda-pg

CUDA C++ Playground

cpp cuda gpu

Last synced: 16 Apr 2026

https://github.com/ojeda-e/fokker-planck

Numerical solution of the Fokker-Planck equation at large times using CUDA/C.

cuda fokker-planck-equations

Last synced: 17 Aug 2025

https://github.com/alessiobugetti/integral-image-processing

Implements sequential and parallel integral image computation in C++ and Python, utilizing CUDA for parallel computation on GPU

cuda gpu-acceleration integral-image numba parallel-computing pycuda

Last synced: 24 May 2026

https://github.com/i-m-iron-man/abmax

Abmax is an agent-based modelling framework in Jax, focused on dynamic population size

abm agent agent-based agent-based-modeling agent-based-simulation agents cuda jax python

Last synced: 04 Oct 2025

https://github.com/sonhm3029/setup-experience

This project stores my setup experience and the errors I met and solved while developing end-to-end AI and software projects

ai computer-vision cuda deep-learning software

Last synced: 10 Jun 2026

https://github.com/jesuscopado/parallel-programming

My solutions for the course Programming Parallel Computers at Aalto University (http://ppc.cs.aalto.fi/). Grade: 5/5

cpp cuda image-segmentation median-filter sorting-algorithms

Last synced: 19 Apr 2026

https://github.com/andreeo/parallel-computing-cuda

Terminal programs applying the parallel programming model with the CUDA architecture

c cpp cuda docker lineal-search parallel-computing parallel-reduction rank-sort-algorithm

Last synced: 09 Apr 2026

https://github.com/nwpu66/cookiekiss-engine

CookieKiss Engine includes a renderer and other small techniques related to computer graphics.

compute-graphics cpp cuda opengl vulkan

Last synced: 09 Apr 2026

https://github.com/alan-cooney/python-cuda-starter-template

Python CUDA Starter Template

cuda deep-learning

Last synced: 30 Mar 2025

https://github.com/ibrar-syed/complete_deep-learning-nvidia_gpu-setup-linux

Full setup for a deep learning environment on Ubuntu Linux with CUDA, cuDNN, TensorRT, and TensorFlow GPU. Includes scripts, test code, and environment configuration

ai bash conda cuda cudnn deep-learning environment-setup gcc gpu jupyter linux machine-learning nvidia-cuda nvidia-gpu pytorch setup-script tensorflow tensorrt

Last synced: 09 Apr 2026

https://github.com/timdev-r/cv-ground-truth-extraction

(Dump) Helper for ground truth extraction, movement analytics and silhouette visual demonstration

computer-vision cuda ground-truth intel-realsense pandas python

Last synced: 18 Apr 2026

https://github.com/datasagess/fic

NLP Hackathon w/ NN + FastAPI + Docker

catboost cuda docker fastapi lstm python pytorch rapidfuzz tensorflow

Last synced: 08 Aug 2025

https://github.com/notargets/gocca

Go bindings for OCCA - Portable parallel programming framework

bindings cfd cgo cuda golang gpu hpc occa opencl parallel-computing

Last synced: 20 Jan 2026

https://github.com/gaaniruddha/mphil-gpu-imager

This repository contains code for project #1 of MPhil: a test version of a GPU imager for the single time-step, single-channel case and the single time-step, multi-channel case.

astronomy benchmarks cuda cufft google-sheets gpu-imager imaging-astronomy interferometry radio-astronomy

Last synced: 11 Jun 2026

https://github.com/dmitryyurov/bitonic-cuda

An implementation of bitonic search on CUDA

cuda gpu-programming sorting-algorithms

Last synced: 02 Oct 2025

https://github.com/conan-kiln/kiln

An actively maintained fork of ConanCenter with an emphasis on CV, ML and robotics capabilities on edge devices

computer-vision conan cuda machine-learning oneapi packaging robotics rust scientific-computing

Last synced: 02 Oct 2025

https://github.com/brave-tarnished/gpu-accelerated-opc

Optical Proximity Correction (OPC) is a photolithography technique that modifies photomask geometry to counteract diffraction and process effects, ensuring accurate printing of patterns on the wafer. This work demonstrates a proof of concept showing how using a GPU-based approach can significantly speed up these modifications compared to a CPU.

cpp cuda gpu-acceleration photolithography semiconductors

Last synced: 02 Oct 2025

https://github.com/hit07/ml-dl-torch

This repository contains a comprehensive treatment of machine learning and deep learning using PyTorch

computer-vision convolutional-neural-networks cuda neural-networks pytorch

Last synced: 28 Feb 2025

https://github.com/lordofhyphens/gpu-path-delay-coverage

CUDA-based Path Delay Fault Coverage

cpp cuda gpgpu moderngpu

Last synced: 04 May 2026

https://github.com/sankeer28/pptx-text-audio-transcriber

Extract text and transcribe audio from PowerPoint presentations using OpenAI Whisper.

audio-transcription cuda openai-whisper powerpoint pptx-parser

Last synced: 02 Oct 2025

https://github.com/usman619/pdc

Parallel and Distributed Computing

cuda distributed-computing distributed-systems nextcloud

Last synced: 11 Apr 2026

https://github.com/desmondjs/cuda_mceliece_kem

CUDA-Accelerated McEliece KEM 🔑 | Post-Quantum Cryptography on GPU. Implementation of Classic McEliece key encapsulation, encryption, decryption, and decapsulation on CPU & GPU with CUDA, including benchmarking scripts and full FYP2 report

academic-project benchmarking classic-mceliece cuda fyp gpu-acceleration kem pqc

Last synced: 02 Oct 2025

https://github.com/nvaranki/cmmx

CUDA matrix multiplication (official guide, modified)

cuda cuda-kernels

Last synced: 08 Aug 2025

https://github.com/f-koehler/itesol

WIP: Iterative eigensolvers for C++20, Python and CUDA

cpp20 cuda eigenvalues linear-algebra python

Last synced: 08 Nov 2025

https://github.com/cerit-sc/scipion-docker

Scipion, a cryo-EM image processing framework (https://scipion.i2pc.es/), adapted to run in Kubernetes.

cryo-em cryoem cuda desktop kubernetes scipion vnc

Last synced: 02 Aug 2025

https://github.com/empenoso/doorcam-face-report

Example face recognition project with CUDA acceleration. Includes scripts for automatically building dlib and analyzing video on the GPU

cuda dlib dlib-face-detection

Last synced: 19 May 2026

https://github.com/tornikeo/sample-openmp-in-cuda

Sample of using OpenMP and CUDA: single GPU, multiple CPU

cuda meson openmp

Last synced: 01 Aug 2025

https://github.com/9prady9/imageconvolve

Qt app for previewing Image convolution. Uses CUDA for convolution.

c-plus-plus convolution cuda desktop-app qt

Last synced: 03 May 2026

https://github.com/jarmak-personal/vibespatial

GPU-first spatial analytics for Python. Drop-in GeoPandas replacement powered by runtime-compiled CUDA kernels

cccl cuda geodataframe geopandas geospatial gpu gpu-computing nvrtc python spatial-analytics

Last synced: 21 Apr 2026

https://github.com/mvishiu11/kmeans-clustering

K-Means Clustering with both GPU (CUDA) and CPU implementations

cuda kmeans-clustering

Last synced: 15 Mar 2025

https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36

Ubuntu 16.04 with Python 3.6 and CUDA9 Dockerfile

cuda dockerfile python ubuntu

Last synced: 03 May 2026

https://github.com/abhiram-kandiyana/cuda-blast-2024

Reimplementation of NCBI BLAST with CUDA backend for faster retrieval

blast cuda gpu-acceleration parallel-processing

Last synced: 15 Mar 2025

https://github.com/shambac/shamboflow

Fierce tensorflow competitor

cuda cupy machine-learning numpy pypi-package

Last synced: 19 Feb 2026

https://github.com/zhaocc1106/cuxx-programing

Samples for some CUDA libraries: cuda, cublas, cublaslt, cusparse...

cublas cublaslt cuda cusparse

Last synced: 23 Mar 2025

https://github.com/zhaocc1106/cuda-programming

Learning cuda programming

cuda nvidia

Last synced: 23 Mar 2025

https://github.com/luis-kr/depthmap

Depth map estimation tool using Depth-Anything-V2. Generate accurate depth maps from images with support for both relative and metric depth measurements.

cuda depth-anything depth-estimation depth-map image-processing python pytorch

Last synced: 08 Feb 2026

https://github.com/proafxin/cuda-docker

High-performance computing images with pycuda and tensorrt preinstalled

cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt

Last synced: 11 Apr 2026

https://github.com/BardiFarsi/ThreadPoolManager

ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.

cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe

Last synced: 15 May 2025

https://github.com/macaycz/nn

A lightweight, GPU-accelerated machine learning library built with CUDA.

cuda deep-learning gpu machine-learning neural-network

Last synced: 25 Jul 2025

https://github.com/vladd12/libexecstd

Modern C++ library for using an execution context of computer devices

cpp cpp17 cuda gpu-acceleration gpu-computing

Last synced: 06 May 2026

https://github.com/psteinb/gtc2017

Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley

compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim

Last synced: 03 May 2026

https://github.com/malolm/football-player-detection-with-yolov8

Football player detection YOLOv8 fine-tuning

cuda jupyterlab python3 yolov8-detection

Last synced: 07 May 2026

https://github.com/faresargus/artaxerxes

Adaptive high-performance stress tester "artaxerxes" supports GPU, io_uring, DPDK, and eBPF/XDP for advanced cybersecurity labs. Ideal for network testing. 🚀🛠️

cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational github-config high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools stress-testing

Last synced: 24 Jul 2025

https://github.com/gammahazard/locate-anything

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui

Last synced: 28 May 2026

https://github.com/drc0ns0le/rtxvideoprocessor

CLI tool to apply NVIDIA RTX VSR and TrueHDR processing to video files

cuda ffmpeg hdr nvidia rtx upscale

Last synced: 20 Apr 2026

https://github.com/lttofu/cosmic

Fast, lightweight GUI-based C++ Ethereum ERC918 token miner for Win64 | CUDA GPUs | CPUs | Pool | Solo Mining

0xbitcoin 0xbtc cplusplus cplusplus-cli cpuminer cuda erc20 erc918 ethereum ethereum-token gpuminer gui pool-mining solo-mining windows windows-10 windows-7 windows-gui winforms

Last synced: 08 Apr 2026

https://github.com/sahil-rajwar-2004/vector-cuda

vector calculation with GPU acceleration using CUDA

c cpp11 cuda cuda-kernels cuda-programming nvcc

Last synced: 15 May 2025

https://github.com/neel-dandiwala/cuda-programs

Miscellaneous programs that explore the concepts of parallel computing

cuda gpu-programming parallel-programming

Last synced: 16 May 2025

https://github.com/tchung1970/sd-cli-cuda

CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop

cuda gpu linux nvidia stable-diffusion

Last synced: 09 May 2026

https://github.com/bikrammajhi/100-days-of-gpu

This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.

cuda nsight-compute ptx triton

Last synced: 18 Jun 2025

https://github.com/awikramanayake/optimized-matrix-mult

Optimizing matrix multiplication using parallelism and SIMD (AVX2, CUDA)

avx2 cuda matrix-multiplication

Last synced: 22 May 2026

https://github.com/manishklach/gb300-rl-runtime

Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.

ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue

Last synced: 09 Jun 2026

https://github.com/curiousci/wind

Multicore Systems Programming project

cuda mpi openmp pthreads

Last synced: 25 Dec 2025

https://github.com/neugence/acehub

AI Champions for Excellence: Fresh, informative courses and content designed to help developers, researchers, and leaders advance in the field of AI.

ai cuda cv ml mlops nlp pytorch rl rlhf tensorflow

Last synced: 05 Jan 2026