CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-07-01 00:07:09 UTC
https://github.com/haleelrah/Vision-pro-MAX
A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.
bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8
Last synced: 30 Dec 2025
https://github.com/programmer-rd-ai/object-detection-framework
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 24 Sep 2025
https://github.com/chintak/theano-lasagne-docker
Dockerfile for Lasagne with CUDA support. See the ``cpu`` and ``gpu`` branches for the relevant Dockerfiles.
caffe cuda docker dockerfile install-script lasagne machine-learning machine-learning-library theano
Last synced: 10 Apr 2025
https://github.com/gogolb/ee147
Intro to GPU Computing
c cuda cuda-kernels cuda-toolkit gpu-computing gpu-programming university-course
Last synced: 01 May 2026
https://github.com/rhysdg/whisper-onnx-python
A low-footprint GPU accelerated Speech to Text Python package for the Jetpack 5 era bolstered by an optimized graph
ai chatbot cuda machine-learning onnxruntime speech-to-text whisper
Last synced: 16 Feb 2026
https://github.com/mortafix/quickshift
A working, GPU-compatible implementation of the Quickshift algorithm in CUDA.
Last synced: 08 May 2026
https://github.com/brendanbignell/cuda_montecarlooptionpricer
CUDA Monte Carlo Barrier Option Pricing Demo & JupyterLab ML models
cuda deep-learning ml pytorch quantitative-finance xgboost-regression
Last synced: 19 Apr 2026
https://github.com/tommaso-dognini/polimi_gpu101_courseproject
Polimi Passion In Action GPU101 course project. A CUDA implementation of the BFS algorithm.
cpp cuda cuda-programming parallel-computing
Last synced: 10 Apr 2026
https://github.com/kpetridis24/non-local-means
Gaussian noise image-filtering using GPU
cuda gaussian-noise gpu-computing image-denoising image-processing non-local-means parallel-computing pixels
Last synced: 29 May 2026
https://github.com/a-nau/python-cuda-envs
Script to automatically map a specific CUDA version to a Conda Python environment.
anaconda anaconda-environment cuda installation installation-script python python-environment python3
Last synced: 18 Apr 2026
https://github.com/jessetg/cuda-practice
Working through the chapters of CUDA by Example
c cpp cuda cuda-by-example gpgpu
Last synced: 01 May 2026
https://github.com/antonioberna/nn-gpu-logic-gates
Neural network implementation on GPU using CUDA C++ to learn logic gate operations
cpp cuda gpu logic-gates neural-networks nvidia
Last synced: 01 May 2026
https://github.com/alwaysai/jetpack-46-hacky-hour
NVIDIA’s Jetpack 4.6 capabilities and how to use them with EdgeIQ, alwaysAI Computer Vision framework.
alwaysai computer-vision cuda edge-computing jetpack tensorrt
Last synced: 01 May 2026
https://github.com/dhruvsrikanth/monte-carlo-ray-tracing
In this repository, you will find a serial and distributed GPU-based implementation of the ray tracing simulation.
c cpp cuda gpu-computing gpu-programming high-performance-computing parallel-programming raytracing unified-memory-parallelism
Last synced: 01 May 2026
https://github.com/daelsepara/hipmandelbrot
GPU Implementation of Mandelbrot Fractal Generator with Benchmarking
amd cuda fractal gpu gpu-compute gpu-computing hip mandelbrot parallel-computing rocm sdk
Last synced: 20 Feb 2026
https://github.com/pvdberg1998/cufft_rust
A safe Rust wrapper around a subset of cuFFT.
Last synced: 19 Apr 2025
https://github.com/shahed-chy-suzan/psd-to-html--cuda
Cuda is a single-page creative portfolio PSD-to-HTML template built with HTML5 & CSS3. The site can be customized easily to suit your needs.
Last synced: 18 Jan 2026
https://github.com/gvvsnrnaveen/cuda
This repository contains various programs that can be written using the CUDA Toolkit.
c cpp cuda nvcc nvidia-cuda nvidia-gpu
Last synced: 17 Jan 2026
https://github.com/abhisheknair10/occupancy.nn
A multi-step pipeline for training Occupancy Networks and running inference with them
Last synced: 20 Jul 2025
https://github.com/villekf/helmet
High-dimensional Kalman filter toolbox (HELMET)
arrayfire cuda gpgpu kalman-filter kalman-smoother matlab octave opencl reconstruction scientific-computing state-estimation
Last synced: 01 May 2026
https://github.com/naidezhujimo/cuda-rewrite-fast-matrix-multiplication
This repository contains an optimized implementation of matrix multiplication using CUDA. The goal of this project is to provide a high-performance solution for matrix multiplication operations on NVIDIA GPUs.
Last synced: 26 Mar 2025
https://github.com/stanczakdominik/cuda_poisson
A 2D Poisson solver via CUDA
Last synced: 29 Jun 2025
https://github.com/torotoki/simple-paged-attention
A simple implementation of PagedAttention purely written in CUDA and C++.
attention cpp cuda llm transformer
Last synced: 18 May 2026
https://github.com/hdelan/msc-hpc-final-project
In this project I implement a CUDA Lanczos method to approximate the matrix exponential. The matrix exponential is an important centrality measure for large, sparse graphs.
cuda graph-algorithms krylov-methods
Last synced: 12 Apr 2025
https://github.com/le-ander/msc_bioinfo-experimental_design
Using information theory to inform experimental design with GPU acceleration. Computing group project as part of the MSc in Bioinformatics and Theoretical Systems Biology at Imperial College London 2016/2017.
cuda experimental-design gpu-computing information-theory pycuda systems-biology
Last synced: 26 Apr 2026
https://github.com/yaronkoresh/definers
A comprehensive Python toolkit for AI, data processing, media manipulation, and system utilities.
artificial-intelligence cuda data-science deep-learning diffusers feature-extraction generative-ai gpu gradio image-generation machine-learning multimedia music-generation python-library pytorch scikit-learn toolkit transformers video-generation web-scraping
Last synced: 08 Apr 2026
https://github.com/lcsb-biocore/cufluxsampler.jl
GPU-accelerated algorithms for flux sampling in CUDA.jl
cobra cuda gpu julia metabolic-network metabolism sampling
Last synced: 02 May 2026
https://github.com/deepankaracharyya/6th_sem_assignments
c cuda data-mining postgresql-database python
Last synced: 02 May 2026
https://github.com/vietdoo/seam-carving-cuda
CUDA Seam Carving: Accelerating Image Resizing with GPU Computing
cc cuda cuda-programming gpu-computing parrallel-computing seam-carving
Last synced: 02 May 2026
https://github.com/saiccoumar/cuda-programming-exercises
Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.
cuda cuda-programming nvcc nvidia
Last synced: 25 May 2026
https://github.com/straightchlorine/quantum-pipeline
A Python module for executing and monitoring quantum algorithms across local simulators and IBM Quantum platforms. Seamlessly handles data collection, organization, and streaming to Apache Kafka
apache-kafka apache-spark aws-s3 cuda docker gpu-acceleration ibm-cloud ibm-quantum minio qiskit qiskit-aer qiskit-nature quantum-computing visualizations vqe
Last synced: 08 Oct 2025
https://github.com/dansolombrino/gphungarian
A GPU-accelerated implementation of the Hungarian Algorithm, written in CUDA
Last synced: 31 Aug 2025
https://github.com/manishklach/intent-attention-kernel
Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.
agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton
Last synced: 28 May 2026
https://github.com/mala13f/statistical-learning-in-finance
This repository contains all the code, papers, and related data for assignments done during the course.
cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning
Last synced: 12 Apr 2026
https://github.com/bjornmelin/deep-learning-evolution
🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀
ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers
Last synced: 09 May 2026
https://github.com/ashwani-rathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 11 Jul 2025
https://github.com/aliyoussef97/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 30 Mar 2025
https://github.com/gabrielmaialva33/enton
Autonomous AI Robot Assistant — Vision, Voice, and Soul
ai autonomous-agent computer-vision cuda llm python pytorch robot stt tts whisper yolo
Last synced: 01 Apr 2026
https://github.com/mcp-tool-shop-org/backpropagate
Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.
api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows
Last synced: 31 May 2026
https://github.com/graiphic/graiphic-documentation
Graiphic Toolkits for LabVIEW provide advanced AI, GPU, and graph-oriented computing capabilities directly inside LabVIEW. Built on ONNX Runtime, they enable seamless integration of SOTA, Accelerator, and Deep Learning Toolkit for high-performance execution across CPUs, GPUs, and edge devices.
accelerator-toolkit ai-orchestration computer-vision cuda deep-learning directml edge-ai graph-computing hardware-acceleration high-performance-computing inference labview neural-networks onednn onnx onnxruntime openvino sota tensorrt training
Last synced: 22 Nov 2025
https://github.com/pratikvn/nla4hpc-exercises-framework
The exercises framework for the Numerical Linear Algebra for HPC course at Karlsruhe Institute of Technology.
cuda ginkgo homeworks hpc-course teaching
Last synced: 19 May 2026
https://github.com/giorgiogamba/parallel_programming
Experimenting with parallel programming
cuda cuda-kernels cuda-programming cuda-toolkit parallel parallel-computing parallel-processing parallel-programming visual-studio
Last synced: 18 Feb 2026
https://github.com/anne-andresen/multi-modal-cuda-c-gan
Raw C/CUDA implementation of a 3D GAN
3d 3d-models attention-mechanism c cross-attention cross-attention-c cuda gan gan-models low-level-programming medical-imaging multimodal-deep-learning pytorch transformer-pytorch transformers transformers-c
Last synced: 06 Jan 2026
https://github.com/dpetrosy/fractal
This project is a Fractal Visualizer developed in C++ with SFML and CUDA.
burning-ship cmake cmakelists cpp cpp-programming cpp-project cuda cuda-opengl cuda-programming fractal fractal-generation fractal-visualization julia mandelbox mandelbrot opengl opengl-project sfml sfml-library tricorn
Last synced: 21 Feb 2026
https://github.com/unvercan/ssd300-model-pytorch
SSD300 Model using PyTorch
cnn computer-vision convolutional-neural-networks cuda deep-learning image-processing neural-network object-detection opencv python pytorch single-shot-detection ssd ssd300
Last synced: 17 Mar 2025
https://github.com/applicative-systems/nixos-gpu-tests
GPU-enabled tests with CUDA in the NixOS integration test driver
amd cuda nix nixos nvidia nvidia-gpu radeon sandbox test test-automation test-automation-framework test-framework zluda
Last synced: 02 Apr 2026
https://github.com/exprays/atlas
Atlas is a specialized convolutional neural network designed for satellite image change detection
alembic celery cnn-for-visual-recognition cuda geospatial-visualization python pytorch tensors
Last synced: 28 Feb 2026
https://github.com/nickolasrm/gpuvscpumatrixmultiplication
CPU and GPU optimized matrix multiplication (AVX, transposition, CUDA and other)
avx comparison cuda hpc matrix multiplication
Last synced: 06 Sep 2025
https://github.com/fardinsabid/aleam
Aleam: True randomness for AI. Non-recursive, stateless, cryptographically secure random number generator.
ai aleam cryptographic-random cuda cupy deep-learning distributions entropy gpu-acceleration jax machine-learning opensource probability pypi python pytorch random-number-generator statistics tensorflow true-randomness
Last synced: 06 Apr 2026
https://github.com/sleeepyjack/multisplit
Simple multisplit for CUDA accelerators
cpp cuda gpu nvidia parallel-programming primitive split
Last synced: 20 May 2026
https://github.com/pintamonas4575/tfg-diffusion-model-customdataset
A diffusion model built in PyTorch for unconditional image generation with a custom dataset.
attention-mechanism cnn cosine-scheduler cuda custom-dataset ddim deep-learning diffusion-models gpu image-generation pytorch
Last synced: 17 Apr 2026
https://github.com/memergamer/cuda-fluid-simulation-with-interactive-visualization
A real-time fluid dynamics simulation implemented in Python using CUDA for GPU acceleration, featuring interactive ASCII visualization and automated movement patterns.
colab-notebook cuda liquid-simulations navier-stokes
Last synced: 18 May 2026
https://github.com/programmer-rd-ai/digivis
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 10 Jun 2025
https://github.com/pintamonas4575/tfg-classification-model-customdataset
Classification model in TensorFlow and Keras on a custom dataset.
cnn cnn-classification cuda deep-learning efficientnet gpu image-classification keras tensorflow transfer-learning
Last synced: 02 May 2026
https://github.com/aaronms1/ai-initializer-project
Universal LLM framework designed to abstract away the technical mumbo-jumbo of using pre-trained LLMs or creating new ones.
ai cuda djl foss java llm-framework llm-inference llm-training nvidia-gpu opencl oss reactor spirv spring tornadovm typescript-react
Last synced: 30 Jun 2026
https://github.com/viktor-shcherb/triage
Script running tool for optimizing GPU memory utilization
automation cli cuda deep-learning devops-tools experiment-runner gpu-monitoring gpu-scheduler hyperparameter-sweep job-queue machine-learning nvidia-smi pypi-package python resource-management script-runner
Last synced: 12 Feb 2026
https://github.com/thisalmandula/gpu_accelerated_lpt_cfd_code
This repository contains a GPU-accelerated version of the particle tracking model developed by Merel Kooi for biofouled microplastic particles (available at: https://pubs.acs.org/doi/10.1021/acs.est.6b04702), written in CUDA Fortran and CUDA Python. This repository is intended as a learning tool for GPU programming.
biofouling computational-fluid-dynamics cuda fortran lagrangian-particle-tracking microplastics python
Last synced: 02 May 2026
https://github.com/meirbek-dev/face-mask_detector
Real-time face mask detection
artificial-intelligence covid-19 cuda cudnn deep-learning face-mask graduation-project jupyter-notebook keras machine-learning mask-detection mobilnet-v2 object-detection object-recognition object-tracking opencv4-python python real-time supervised-learning tensorflow2-gpu
Last synced: 03 May 2026
https://github.com/ergonomech/comfyui-windows-installer
Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to run as a server with error correction. Easy installation and launch using Miniconda.
automation comfy conda conda-environment cuda hosting-deployment setup windows
Last synced: 31 Mar 2025
https://github.com/aaronjs99/planmux
PlanMux: Path Planning using Parallel/Multiplexed Computing
bellman-ford-algorithm cpp cuda dijkstra-algorithm floyd-warshall-algorithm graphs hpc openmp parallel-computing path-planning shortest-path-algorithm slurm
Last synced: 03 May 2026
https://github.com/tortillazhawaii/fishes_cuda
3D boid simulation with GPU.
Last synced: 04 May 2026
https://github.com/davidalgis/godot_cuda
Demonstration that it is possible to use CUDA directly from Godot engine.
Last synced: 03 May 2026
https://github.com/tensorbfs/cutropicalgemm.jl
The fastest Tropical number matrix multiplication on GPU
Last synced: 20 Jan 2026
https://github.com/matx64/rs-netbot
Old School Runescape bot with CNN for object identification
Last synced: 04 May 2026
https://github.com/bhattbhavesh91/rapids-cudf-cuml-example
Running KNN algorithm much faster on GPU for free using RAPIDS packages like cuML and cuDF
cuda cuml deep-learning nvidia-gpu rapids rapidsai
Last synced: 17 Apr 2026
https://github.com/bjornmelin/ml-vision-lab
👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸
computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics
Last synced: 04 Apr 2026
https://github.com/poyea/lollipop
🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy
cuda cuda-kernel cuda-kernels cuda-programming gpu-kernels gpu-programming python
Last synced: 17 Jun 2026
https://github.com/ezamagni/knapsack-simd
A genetic 01-Knapsack problem solver in CUDA
cuda knapsack-problem knapsack01
Last synced: 09 May 2026
https://github.com/sun-zhenxing/fast-neural-style
Fast style transfer deployment
cuda cv2 fast-neural-style opencv
Last synced: 05 May 2026
https://github.com/manishklach/gpu-resident-inference-lab
Research lab for GPU-resident LLM inference loops: persistent kernels, sparse KV selection, tiered residency, speculative decode, and trace-driven scheduling.
cuda gpu-systems kv-cache llm-inference mega-kernel model-systems persistent-kernel runtime speculative-decoding
Last synced: 19 Jun 2026
https://github.com/jayemscript/llm-systems-from-scratch
A hands-on learning project for building the core systems behind Large Language Models using C++, Rust, and optional Python/JavaScript bindings. Includes tensor operations, autograd, neural networks, tokenization, and a minimal transformer pipeline.
ai-systems autograd c-language cpp cuda educational-project high-performance-computing inference-engine machine-learning neural-networks-from-scratch pybind11 tensor-library tokenization transformers wasm
Last synced: 19 Jun 2026
https://github.com/speedcell4/torchdevice
Setup CUDA_VISIBLE_DEVICES
cuda deep-learning gpu machine-learning pytorch
Last synced: 07 May 2026
https://github.com/seongwon980/htop-gpu
Terminal dashboard for NVIDIA GPUs, system CPU/memory, and processes — clickable, with conda env / docker container / cwd info per process.
btop cli cuda dashboard gpu htop machine-learning monitor nvidia nvtop python sysadmin terminal tui
Last synced: 22 Jun 2026
https://github.com/daaboulex/unsloth-nix
Unsloth (git main) packaged for NixOS — CPU/CUDA/ROCm LoRA fine-tuning envs
cuda fine-tuning flake lora machine-learning nix nixos nixos-module pytorch rocm unsloth
Last synced: 10 Jun 2026
https://github.com/daelsepara/hipslm
CPU and GPU (using HIP) implementations of phase pattern generators for use with spatial light modulators
computer-generated-holography cuda gpu hip hologram holography phase phase-pattern slm spatial-light-modulator
Last synced: 22 Jun 2026
https://github.com/alextmjugador/rust-cuda-quickstart
Bring the Rust-CUDA project back to life under modern Linux environments.
cuda cuda-programming cuda-rust cuda-support docker rust
Last synced: 06 May 2026
https://github.com/abhans/archdev
Container that is built with Arch Linux with NVIDIA Driver & CUDA support, PyTorch and TensorFlow built in.
archlinux container cuda docker
Last synced: 07 May 2026
https://github.com/xebastex/sfw-python
Python package designed to provide the essential tools for off-the-grid inverse problems. This is the bedrock for a future GUI implementation.
blasso cuda frank-wolfe pytorch
Last synced: 09 May 2026
https://github.com/jblaschke/pynvtx
Thin pybind11 wrapper for NVTX wrappers -- with some bells and whistles attached.
Last synced: 23 Jun 2026
https://github.com/kibotu/llm-windows-server
Turn your Windows GPU into a private, low-latency LLM server. Docker-based, OpenAI-compatible API.
agentic cuda docker gguf llma-cpp local-llm nvidia-gpu openai-api opencode qwen self-hosted windows
Last synced: 10 Jun 2026
https://github.com/timothystewart6/ubuntu-gb10
Ubuntu 24.04 + NVIDIA stack setup guide for GB10 / DGX Spark systems
ansible ansible-playbook arm64 blackwell cuda dgx gpu grace-blackwell homelab nvidia nvidia-driver ubuntu
Last synced: 26 Jun 2026