CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4
Optimized vLLM setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell using vllm and docker: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.
am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm
Last synced: 11 May 2026
https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu
Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.
cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding
Last synced: 12 May 2026
https://github.com/skailasa/msc-thesis
A modular thesis
cuda fast-multipole-method kernel-independent numba python3
Last synced: 12 May 2026
https://github.com/nourmorsy/convolution-neural-network-cuda
Code for optimizing a CNN using CUDA
Last synced: 13 May 2026
https://github.com/kenmalik/cuda-dr-bcg
CUDA C++ implementation of the DR-BCG algorithm for numerically solving linear systems.
cpp cuda hpc numerical-methods
Last synced: 19 Apr 2026
https://github.com/wojcikmikolaj/particles-in-a-jar
Collisions between particles simulated on GPU.
algorithms-and-data-structures collision-detection collisions cuda gpu-programming
Last synced: 25 May 2026
https://github.com/prateekshukla1108/thunderkittens-docs
Documentation for ThunderKittens framework
Last synced: 18 Mar 2025
https://github.com/moshidev/acap
Coursework for the course Arquitectura y Computación de Altas Prestaciones (High-Performance Architecture and Computing)
cuda homework-assignments mpi pthreads
Last synced: 30 Mar 2025
https://github.com/ragu-manjegowda/parallel-programming
Assignments and Projects of Udacity's Introduction to Parallel Programming Course
cuda gpu-programming nvidia-cuda nvidia-gpu udacity-parallel-programming
Last synced: 25 May 2026
https://github.com/unknownnuts/meshsdk
Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.
cuda dicom electron emscripten mesh modelling pybind11 stl stomatology threejs wasm
Last synced: 10 Apr 2026
https://github.com/followthesapper/atlas-q
GPU-accelerated quantum tensor network simulator with adaptive MPS
ai cuda gpu-acceleration high-performance-computing matrix-product-states nisq python pytorch qaoa quantum-algorithms quantum-computing quantum-simulator scientific-computing shors-algorithm tensor-networks triton vqe
Last synced: 20 Jan 2026
https://github.com/separatrixxx/pgp_labs_7_sem
👓 Laboratory work for the 7th semester of MAI on PGP and PDP
Last synced: 15 May 2026
https://github.com/m-torhan/advent-of-code
🎄 Solutions for the Advent of Code
advent-of-code advent-of-code-2024 cuda
Last synced: 07 Apr 2025
https://github.com/jpodivin/gputomata
Cellular automata running on CUDA-capable GPUs
cellular-automata cellular-automaton cuda
Last synced: 07 Nov 2025
https://github.com/kylesayrs/pttp
PyTorch Tensor Profiler with fully-supported memory timelines and events
Last synced: 07 May 2026
https://github.com/kis-balazs/cuda-research
CUDA Research & Code. Course-style structured. Inspiration from @Infatoshi.
Last synced: 14 May 2025
https://github.com/mcp-tool-shop-org/gpu-container
Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow - declared placement.
cuda gpu inference llama-cpp llm moe offload vram wsl2
Last synced: 09 Jun 2026
https://github.com/bjornmelin/ai-system-design
🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️
architecture cuda distributed-systems engineering gpu-computing production scalability system-design
Last synced: 23 Jul 2025
https://github.com/derek-palmer/dvr-scan-file-organizer
DVR-Scan-Organizer is a Dockerized extension for DVR-Scan, designed to process multiple video files and organize output in a structured format.
cuda dvr dvr-scan multimedia opencv opencv-python python video video-processing
Last synced: 01 May 2026
https://github.com/GTruf/Driver-Drowsiness-Detector
Prototype of an intelligent safety system for detecting driver drowsiness
cpp cuda cudnn deep-learning driver-drowsiness-detection driver-drowsiness-detector drowsiness-detection face-recognition image-recognition machine-learning neural-network nvidia-cuda object-recognition opencv qt6 recognition-neural-network yolo yolov10 yolov5 yolov9
Last synced: 14 Mar 2025
https://github.com/ngoma1713/rushirb2001
🤖 Explore advanced AI and machine learning solutions for protein modeling and medical applications, developed by a dedicated data science graduate student.
computer-vision-opencv cuda data-science-portfolio deep-learning generative-ai machine-learning medical-ai protein-modeling published-researcher pytorch quantum-ml rag-chatbot tensorflow
Last synced: 02 May 2026
https://github.com/bjornmelin/llm-gpu-optimization
🚄 Advanced LLM optimization techniques using CUDA. Features efficient attention mechanisms, custom CUDA kernels for transformers, and memory-efficient training strategies. ⚡
cuda deep-learning gpu-acceleration llm-optimization machine-learning memory-optimization parallel-computing transformers
Last synced: 18 Mar 2025
https://github.com/elymsyr/auv_ws
An open-source simulation and control workspace for an Autonomous Underwater Vehicle (AUV) built on ROS 2 Humble and Gazebo. It features a high-fidelity dynamics model and an advanced AI-based motion controller (FossenNet) that uses a pre-trained LibTorch model to imitate a NL-MPC for real-time, high-performance manoeuvring.
autonomous-vehicles auv control-systems cpp cuda deep-learning gazebo imitation-learning libtorch mpc python robotics ros2 simulation
Last synced: 15 Apr 2026
https://github.com/quik-fe/node-nvidia-smi
Node wrapper around nvidia-smi.
cuda gpu nodejs nvidia nvidia-smi typescript
Last synced: 19 Feb 2026
https://github.com/deepschneider/tinygrad-universal
Universal version of Tinygrad with CUDA and OpenCL support
autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda
Last synced: 06 Mar 2025
https://github.com/maneeshsit/pcie
Modifies code from run:ai and other FOSS projects for use with PCIe card-based AI accelerators, for both inference and training
cuda cxl cxl-mem distro exo k3s k8s kestra llamacpp llm-d mpi4py mpio onnxoptimizer opentelemetry-ebpf-profiler paxos-cluster pcie photonics-computing runai visualize vllm
Last synced: 24 Aug 2025
https://github.com/mjun0812/setup-cuda
Set up a specific version of NVIDIA CUDA in GitHub Actions on Linux x86_64 and arm64 (Debian- and Fedora-based distributions) and on Windows
action cuda cuda-toolkit github-actions
Last synced: 13 Jan 2026
https://github.com/sshoecraft/shepherd
An interactive multi-backend LLM runtime with intelligent cache eviction and persistent retrieval-augmented memory.
anthropic cli cpp cuda gemini grok inference kv-cache llama-cpp llm mcp ollama openai openai-server rag smart-evictions tensorrt tool-calling ulimited-context
Last synced: 10 Apr 2026
https://github.com/kmock930/texture-image-comparison
This project aims to build a model that classifies the type of an unseen image as accurately as possible by implementing, evaluating, and comparing two multi-layer perceptron neural networks.
computer-vision conda confusion-matrix convolutional-neural-networks cuda image-preprocessing keras keras-tensorflow learning-curve-analysis matplotlib multi-layer-perceptron neural-network pickle-file python3 skimage
Last synced: 12 Apr 2026
https://github.com/camille-004/cusprec
🏁 Sparse signal recovery library written in PyCUDA.
cuda ml python signal-processing sparse-recovery
Last synced: 18 Jan 2026
https://github.com/sid911/neuralnetworkcpp
A small experiment to learn about neural networks and their runtimes in cpp
cpp cuda machine-learning neural-network
Last synced: 20 Aug 2025
https://github.com/flavienbwk/tensorflow2-cuda-10.2-docker
Docker-compatible image with TensorFlow 2.3 and CUDA 10.2
cuda docker python3 tensorflow ubuntu1804
Last synced: 11 Apr 2026
https://github.com/pvgupta24/parallel-programming
Basic algorithms for parallel programming in CUDA C++, Java and OpenMP
cuda openmp parallel-programming
Last synced: 19 Aug 2025
https://github.com/dmalexx/cuda_check
How to check whether CUDA is available in TensorFlow
Last synced: 10 Apr 2026
https://github.com/promptromp/aws-bootstrap-g4dn
Fast and easy bootstrapping of AWS EC2 instances for CUDA development. Use as a CLI, as a programmatic SDK, or as an Agent Skill!
aws cuda ec2 jupyter-notebook machine-learning mlops python
Last synced: 21 Feb 2026
https://github.com/ojeda-e/fokker-planck
Numerical solution of the Fokker-Planck equation at large times using CUDA/C.
Last synced: 17 Aug 2025
https://github.com/alessiobugetti/integral-image-processing
Implements sequential and parallel integral image computation in C++ and Python, utilizing CUDA for parallel computation on GPU
cuda gpu-acceleration integral-image numba parallel-computing pycuda
Last synced: 24 May 2026
https://github.com/i-m-iron-man/abmax
Abmax is an agent-based modelling framework in JAX, focused on dynamic population sizes
abm agent agent-based agent-based-modeling agent-based-simulation agents cuda jax python
Last synced: 04 Oct 2025
https://github.com/sonhm3029/setup-experience
This project records my setup experience and the errors encountered and solved while developing end-to-end AI and software projects
ai computer-vision cuda deep-learning software
Last synced: 10 Jun 2026
https://github.com/jesuscopado/parallel-programming
My solutions for the course Programming Parallel Computers at Aalto University (http://ppc.cs.aalto.fi/). Grade: 5/5
cpp cuda image-segmentation median-filter sorting-algorithms
Last synced: 19 Apr 2026
https://github.com/andreeo/parallel-computing-cuda
Terminal programs applying the parallel programming model with the CUDA architecture
c cpp cuda docker lineal-search parallel-computing parallel-reduction rank-sort-algorithm
Last synced: 09 Apr 2026
https://github.com/nwpu66/cookiekiss-engine
CookieKiss Engine includes a renderer and other small techniques related to computer graphics.
compute-graphics cpp cuda opengl vulkan
Last synced: 09 Apr 2026
https://github.com/alan-cooney/python-cuda-starter-template
Python CUDA Starter Template
Last synced: 30 Mar 2025
https://github.com/ibrar-syed/complete_deep-learning-nvidia_gpu-setup-linux
Full setup for a deep learning environment on Ubuntu Linux with CUDA, cuDNN, TensorRT, and TensorFlow GPU. Includes scripts, test code, and environment configuration
ai bash conda cuda cudnn deep-learning environment-setup gcc gpu jupyter linux machine-learning nvidia-cuda nvidia-gpu pytorch setup-script tensorflow tensorrt
Last synced: 09 Apr 2026
https://github.com/timdev-r/cv-ground-truth-extraction
(Dump) Helper for ground truth extraction, movement analytics and silhouette visual demonstration
computer-vision cuda ground-truth intel-realsense pandas python
Last synced: 18 Apr 2026
https://github.com/datasagess/fic
NLP hackathon with NN + FastAPI + Docker
catboost cuda docker fastapi lstm python pytorch rapidfuzz tensorflow
Last synced: 08 Aug 2025
https://github.com/gaaniruddha/mphil-gpu-imager
This repository contains code for project #1 of the MPhil: a test version of a GPU imager for the single-time-step, single-channel case and the single-time-step, multi-channel case.
astronomy benchmarks cuda cufft google-sheets gpu-imager imaging-astronomy interferometry radio-astronomy
Last synced: 11 Jun 2026
https://github.com/dmitryyurov/bitonic-cuda
An implementation of bitonic sort on CUDA
cuda gpu-programming sorting-algorithms
Last synced: 02 Oct 2025
https://github.com/conan-kiln/kiln
An actively maintained fork of ConanCenter with an emphasis on CV, ML and robotics capabilities on edge devices
computer-vision conan cuda machine-learning oneapi packaging robotics rust scientific-computing
Last synced: 02 Oct 2025
https://github.com/brave-tarnished/gpu-accelerated-opc
Optical Proximity Correction (OPC) is a photolithography technique that modifies photomask geometry to counteract diffraction and process effects, ensuring accurate printing of patterns on the wafer. This work demonstrates a proof of concept showing how using a GPU-based approach can significantly speed up these modifications compared to a CPU.
cpp cuda gpu-acceleration photolithography semiconductors
Last synced: 02 Oct 2025
https://github.com/hit07/ml-dl-torch
This repository covers machine learning and deep learning using PyTorch
computer-vision convolutional-neural-networks cuda neural-networks pytorch
Last synced: 28 Feb 2025
https://github.com/lordofhyphens/gpu-path-delay-coverage
CUDA-based Path Delay Fault Coverage
Last synced: 04 May 2026
https://github.com/sankeer28/pptx-text-audio-transcriber
Extract text and transcribe audio from PowerPoint presentations using OpenAI Whisper.
audio-transcription cuda openai-whisper powerpoint pptx-parser
Last synced: 02 Oct 2025
https://github.com/usman619/pdc
Parallel and Distributed Computing
cuda distributed-computing distributed-systems nextcloud
Last synced: 11 Apr 2026
https://github.com/desmondjs/cuda_mceliece_kem
CUDA-Accelerated McEliece KEM 🔑 | Post-Quantum Cryptography on GPU. Implementation of Classic McEliece key encapsulation, encryption, decryption, and decapsulation on CPU and GPU with CUDA, including benchmarking scripts and the full FYP2 report
academic-project benchmarking classic-mceliece cuda fyp gpu-acceleration kem pqc
Last synced: 02 Oct 2025
https://github.com/nvaranki/cmmx
CUDA matrix multiplication (official guide, modified)
Last synced: 08 Aug 2025
https://github.com/f-koehler/itesol
WIP: Iterative eigensolvers for C++20, Python and CUDA
cpp20 cuda eigenvalues linear-algebra python
Last synced: 08 Nov 2025
https://github.com/cerit-sc/scipion-docker
Scipion, a cryo-EM image processing framework (https://scipion.i2pc.es/), adapted to run in Kubernetes.
cryo-em cryoem cuda desktop kubernetes scipion vnc
Last synced: 02 Aug 2025
https://github.com/empenoso/doorcam-face-report
Example face recognition project with CUDA acceleration. Includes scripts for automatically building dlib and analyzing video on the GPU
Last synced: 19 May 2026
https://github.com/bfalls/img-compressor
GPU-accelerated JPEG compressor
cli-tool command-line compression cpp cpp-cuda-gpu-programming-parallel-computing cuda dct demo-project gpgpu gpu-programming high-performance-computing hpc image-compression image-processing jpeg parallel-computing
Last synced: 20 Apr 2026
https://github.com/tornikeo/sample-openmp-in-cuda
Sample of using OpenMP and CUDA: single GPU, multiple CPU
Last synced: 01 Aug 2025
https://github.com/9prady9/imageconvolve
Qt app for previewing Image convolution. Uses CUDA for convolution.
c-plus-plus convolution cuda desktop-app qt
Last synced: 03 May 2026
https://github.com/jarmak-personal/vibespatial
GPU-first spatial analytics for Python. Drop-in GeoPandas replacement powered by runtime-compiled CUDA kernels
cccl cuda geodataframe geopandas geospatial gpu gpu-computing nvrtc python spatial-analytics
Last synced: 21 Apr 2026
https://github.com/mvishiu11/kmeans-clustering
K-Means Clustering with both GPU (CUDA) and CPU implementations
Last synced: 15 Mar 2025
https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36
Dockerfile for Ubuntu 16.04 with Python 3.6 and CUDA 9
Last synced: 03 May 2026
https://github.com/abhiram-kandiyana/cuda-blast-2024
Reimplementation of NCBI BLAST with CUDA backend for faster retrieval
blast cuda gpu-acceleration parallel-processing
Last synced: 15 Mar 2025
https://github.com/shambac/shamboflow
Fierce tensorflow competitor
cuda cupy machine-learning numpy pypi-package
Last synced: 19 Feb 2026
https://github.com/zhaocc1106/cuxx-programing
Samples for several CUDA libraries: cuda, cublas, cublaslt, cusparse...
Last synced: 23 Mar 2025
https://github.com/luis-kr/depthmap
Depth map estimation tool using Depth-Anything-V2. Generate accurate depth maps from images with support for both relative and metric depth measurements.
cuda depth-anything depth-estimation depth-map image-processing python pytorch
Last synced: 08 Feb 2026
https://github.com/proafxin/cuda-docker
High-performance computing images with pycuda and tensorrt preinstalled
cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt
Last synced: 11 Apr 2026
https://github.com/BardiFarsi/ThreadPoolManager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 15 May 2025
https://github.com/baudneo/zomi-server
FastAPI ML server designed for ZoneMinder (zomi-client)
alpr coral-tpu cuda face-detection face-recognition fastapi machine-learning object-detection onnxruntime opencv pydantic-v2 tensorrt torch zoneminder
Last synced: 18 Jan 2026
https://github.com/macaycz/nn
A lightweight, GPU-accelerated machine learning library built with CUDA.
cuda deep-learning gpu machine-learning neural-network
Last synced: 25 Jul 2025
https://github.com/vladd12/libexecstd
Modern C++ library for using an execution context of computer devices
cpp cpp17 cuda gpu-acceleration gpu-computing
Last synced: 06 May 2026
https://github.com/psteinb/gtc2017
Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley
compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim
Last synced: 03 May 2026
https://github.com/jaidevd/ipec-fdp
cuda hpc keras mapreduce numba spark tensorflow
Last synced: 11 Apr 2026
https://github.com/malolm/football-player-detection-with-yolov8
Football player detection YOLOv8 fine-tuning
cuda jupyterlab python3 yolov8-detection
Last synced: 07 May 2026
https://github.com/faresargus/artaxerxes
Adaptive high-performance stress tester "artaxerxes" supports GPU, io_uring, DPDK, and eBPF/XDP for advanced cybersecurity labs. Ideal for network testing. 🚀🛠️
cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational github-config high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools stress-testing
Last synced: 24 Jul 2025
https://github.com/gammahazard/locate-anything
Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.
bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui
Last synced: 28 May 2026
https://github.com/lttofu/cosmic
Fast, lightweight GUI-based C++ Ethereum ERC918 token miner for Win64 | CUDA GPUs | CPUs | Pool | Solo Mining
0xbitcoin 0xbtc cplusplus cplusplus-cli cpuminer cuda erc20 erc918 ethereum ethereum-token gpuminer gui pool-mining solo-mining windows windows-10 windows-7 windows-gui winforms
Last synced: 08 Apr 2026
https://github.com/sahil-rajwar-2004/vector-cuda
Vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 15 May 2025
https://github.com/neel-dandiwala/cuda-programs
Miscellaneous programs that explore the concepts of parallel computing
cuda gpu-programming parallel-programming
Last synced: 16 May 2025
https://github.com/tchung1970/sd-cli-cuda
CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop
cuda gpu linux nvidia stable-diffusion
Last synced: 09 May 2026
https://github.com/bikrammajhi/100-days-of-gpu
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.
cuda nsight-compute ptx triton
Last synced: 18 Jun 2025
https://github.com/awikramanayake/optimized-matrix-mult
Optimizing matrix multiplication using parallelism and SIMD (AVX2, CUDA)
avx2 cuda matrix-multiplication
Last synced: 22 May 2026
https://github.com/manishklach/gb300-rl-runtime
Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.
ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue
Last synced: 09 Jun 2026