CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
https://github.com/mhaseeb123/gcb
GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.
cpp cpp23 cuda mpi partitioned-communication st-mpi
Last synced: 17 May 2026
https://github.com/pjueon/cuda_intellisense
A simple Python script to fix CUDA C++ IntelliSense for Visual Studio.
Last synced: 09 Apr 2026
https://github.com/antonioberna/nn-gpu-logic-gates
Neural network implementation on GPU using CUDA C++ to learn logic gate operations
cpp cuda gpu logic-gates neural-networks nvidia
Last synced: 01 May 2026
https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection
Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.
Last synced: 12 Apr 2026
https://github.com/enriquebdel/clases-cuda-programacion-paralela-en-c-
In this repository you will find several lessons I created on the CUDA library in C. The program I use for programming is MobaXterm.
c cuda cuda-programming gnu-linux googlecolab mobaxterm nvidia parallel-programming ubuntu university
Last synced: 19 May 2026
https://github.com/satyajitghana/gpu-programming
Contains the contents of GPU Architecture and Programming course done on NPTEL
c cpp cuda cuda-programming gpu-programming nptel nvidia
Last synced: 09 Mar 2026
https://github.com/brendanbignell/cuda_montecarlooptionpricer
CUDA Monte Carlo Barrier Option Pricing Demo & JupyterLab ML models
cuda deep-learning ml pytorch quantitative-finance xgboost-regression
Last synced: 19 Apr 2026
https://github.com/tensorbfs/cutropicalgemm.jl
The fastest Tropical number matrix multiplication on GPU
Last synced: 20 Jan 2026
https://github.com/graiphic/graiphic-documentation
Graiphic Toolkits for LabVIEW provide advanced AI, GPU, and graph-oriented computing capabilities directly inside LabVIEW. Built on ONNX Runtime, they enable seamless integration of SOTA, Accelerator, and Deep Learning Toolkit for high-performance execution across CPUs, GPUs, and edge devices.
accelerator-toolkit ai-orchestration computer-vision cuda deep-learning directml edge-ai graph-computing hardware-acceleration high-performance-computing inference labview neural-networks onednn onnx onnxruntime openvino sota tensorrt training
Last synced: 22 Nov 2025
https://github.com/rhysdg/whisper-onnx-python
A low-footprint GPU accelerated Speech to Text Python package for the Jetpack 5 era bolstered by an optimized graph
ai chatbot cuda machine-learning onnxruntime speech-to-text whisper
Last synced: 16 Feb 2026
https://github.com/dhruvsrikanth/fastconv
Distributed and serial implementations of the 2D convolution operation in C++ and CUDA.
convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming
Last synced: 04 May 2026
https://github.com/aliyoussef97/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 30 Mar 2025
https://github.com/thomasonzhou/minitorch
rebuilding pytorch: from autograd to convolutions in CUDA
Last synced: 02 Feb 2026
https://github.com/gogolb/ee147
Intro to GPU Computing
c cuda cuda-kernels cuda-toolkit gpu-computing gpu-programming university-course
Last synced: 01 May 2026
https://github.com/gabrielmaialva33/enton
Autonomous AI Robot Assistant — Vision, Voice, and Soul
ai autonomous-agent computer-vision cuda llm python pytorch robot stt tts whisper yolo
Last synced: 01 Apr 2026
https://github.com/mcp-tool-shop-org/backpropagate
Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.
api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows
Last synced: 31 May 2026
https://github.com/manishklach/intent-attention-kernel
Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.
agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton
Last synced: 28 May 2026
https://github.com/anne-andresen/multi-modal-cuda-c-gan
Raw C/CUDA implementation of a 3D GAN
3d 3d-models attention-mechanism c cross-attention cross-attention-c cuda gan gan-models low-level-programming medical-imaging multimodal-deep-learning pytorch transformer-pytorch transformers transformers-c
Last synced: 06 Jan 2026
https://github.com/dpetrosy/fractal
This project is a Fractal Visualizer developed in C++ with SFML and CUDA.
burning-ship cmake cmakelists cpp cpp-programming cpp-project cuda cuda-opengl cuda-programming fractal fractal-generation fractal-visualization julia mandelbox mandelbrot opengl opengl-project sfml sfml-library tricorn
Last synced: 21 Feb 2026
https://github.com/vietdoo/seam-carving-cuda
CUDA Seam Carving: Accelerating Image Resizing with GPU Computing
cc cuda cuda-programming gpu-computing parrallel-computing seam-carving
Last synced: 02 May 2026
https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition
This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.
audio-processing cpp cuda fourier-transform python
Last synced: 10 May 2026
https://github.com/Programmer-RD-AI/DetectX
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 04 May 2025
https://github.com/hartorn/docker-python
Repository to build python image, based on ubuntu and CUDA
cuda docker mkl-dnn onednn python3 ubuntu ubuntu1804
Last synced: 05 May 2026
https://github.com/xza85hrf/ml-framework_checker
ML Framework and CUDA Checker is a Python-based GUI application for checking PyTorch, TensorFlow, and CUDA installations. It provides detailed system specs, compatibility checks, advanced GPU management, and offers options to view instructions, export logs, and update machine learning frameworks.
compatibility cuda gpu-management gui-application machine-learning python pytorch system-checker system-specs tensorflow
Last synced: 28 Apr 2026
https://github.com/ezroot/gacc
GIACC - Generate Images, Art, Code and Conversations
ai codegen cuda huggingface image imagegeneration python rust stablediffusion
Last synced: 06 Apr 2026
https://github.com/programmer-rd-ai/digivis
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 10 Jun 2025
https://github.com/ergonomech/comfyui-windows-installer
Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to run as a server and error-correct. Easy installation and launch using Miniconda.
automation comfy conda conda-environment cuda hosting-deployment setup windows
Last synced: 31 Mar 2025
https://github.com/tommaso-dognini/polimi_gpu101_courseproject
Polimi Passion In Action GPU101 course project. CUDA implementation of the BFS algorithm
cpp cuda cuda-programming parallel-computing
Last synced: 10 Apr 2026
https://github.com/sleeepyjack/multisplit
Simple multisplit for CUDA accelerators
cpp cuda gpu nvidia parallel-programming primitive split
Last synced: 20 May 2026
https://github.com/villekf/helmet
High-dimensional Kalman filter toolbox (HELMET)
arrayfire cuda gpgpu kalman-filter kalman-smoother matlab octave opencl reconstruction scientific-computing state-estimation
Last synced: 01 May 2026
https://github.com/mala13f/statistical-learning-in-finance
This repository contains all the code, papers, and related data for assignments done during the course.
cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning
Last synced: 12 Apr 2026
https://github.com/bjornmelin/deep-learning-evolution
🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀
ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers
Last synced: 09 May 2026
https://github.com/applicative-systems/nixos-gpu-tests
GPU-enabled tests with CUDA in the NixOS integration test driver
amd cuda nix nixos nvidia nvidia-gpu radeon sandbox test test-automation test-automation-framework test-framework zluda
Last synced: 02 Apr 2026
https://github.com/tortillazhawaii/fishes_cuda
3D boid simulation with GPU.
Last synced: 04 May 2026
https://github.com/kpetridis24/non-local-means
Gaussian noise image-filtering using GPU
cuda gaussian-noise gpu-computing image-denoising image-processing non-local-means parallel-computing pixels
Last synced: 29 May 2026
https://github.com/fardinsabid/aleam
Aleam: True randomness for AI. Non-recursive, stateless, cryptographically secure random number generator.
ai aleam cryptographic-random cuda cupy deep-learning distributions entropy gpu-acceleration jax machine-learning opensource probability pypi python pytorch random-number-generator statistics tensorflow true-randomness
Last synced: 06 Apr 2026
https://github.com/vishwamartur/btc_recovery
High-performance Bitcoin wallet password recovery system with GPU acceleration and integrated graphics support. Recover Bitcoin Core wallet.dat files without blockchain download using advanced algorithms and blockchain APIs.
bitcoin bitcoin-core blockchain blockchain-api cpp cryptocurrency cuda electrum gpu-acceleration integrated-graphics multithreading opencl password-recovery private-keys recovery-tools wallet-dat wallet-recovery
Last synced: 14 Apr 2026
https://github.com/saiccoumar/cuda-programming-exercises
Brief collection of GPU exercises (my reimplementation). Comes with relevant resources.
cuda cuda-programming nvcc nvidia
Last synced: 25 May 2026
https://github.com/bl33h/pythagoreantheorem
A program that calculates the Pythagorean theorem for a large number of elements using GPU parallel processing.
arrays cuda kernel parallel-programming pythagoras pythagorean-theorem
Last synced: 19 May 2026
https://github.com/matx64/rs-netbot
Old School Runescape bot with CNN for object identification
Last synced: 04 May 2026
https://github.com/straightchlorine/quantum-pipeline
A Python module for executing and monitoring quantum algorithms across local simulators and IBM Quantum platforms. Seamlessly handles data collection, organization, and streaming to Apache Kafka
apache-kafka apache-spark aws-s3 cuda docker gpu-acceleration ibm-cloud ibm-quantum minio qiskit qiskit-aer qiskit-nature quantum-computing visualizations vqe
Last synced: 08 Oct 2025
https://github.com/haleelrah/Vision-pro-MAX
A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.
bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8
Last synced: 30 Dec 2025
https://github.com/pintamonas4575/tfg-diffusion-model-customdataset
PyTorch implementation of a diffusion model for unconditional image generation with a custom dataset.
attention-mechanism cnn cosine-scheduler cuda custom-dataset ddim deep-learning diffusion-models gpu image-generation pytorch
Last synced: 17 Apr 2026
https://github.com/kchristin22/ising_model
Implementation of a cellular automaton on GPU using different features of CUDA
cellular-automaton cuda gpu-programming hpc ising-model parallel-computing
Last synced: 15 Mar 2025
https://github.com/ruturaj4/cuda_nvidia_tutorial
CUDA projects
cuda cuda-vector-addition nvidia nvidia-cuda parallel
Last synced: 26 Oct 2025
https://github.com/dvhh/masscorrelation
An exercise in writing an efficient correlation calculator
calculations correlation-calculation cuda matrix multi-threading openmp
Last synced: 15 May 2026
https://github.com/piyush26c/cuda-programming
c cuda ipynb-jupyter-notebook mathematics sppu-computer-engineering
Last synced: 03 Mar 2026
https://github.com/microo8/micronn
Simple neural network library with backpropagation using CUDA
Last synced: 19 May 2026
https://github.com/giorgiogamba/parallel_programming
Experimenting with parallel programming
cuda cuda-kernels cuda-programming cuda-toolkit parallel parallel-computing parallel-processing parallel-programming visual-studio
Last synced: 18 Feb 2026
https://github.com/jtompuri/weighted-voronoi-stippling
High-performance weighted Voronoi stippling implementation. Exports PNG and TSP files. Visualizes TSP tours as continuous line drawings.
computer-graphics cuda gpu-acceleration lloyd-relaxation numba python stippling traveling-salesman tsp voronoi
Last synced: 18 May 2026
https://github.com/rnabla/cuda-des
Bruteforcing DES using CUDA
bruteforce cuda data des encryption gpu parallel standard
Last synced: 27 Oct 2025
https://github.com/shahed-chy-suzan/psd-to-html--cuda
Cuda is a single page creative portfolio psd to html template which is built with HTML5 & CSS3. The site can be customized easily to suit your needs.
Last synced: 18 Jan 2026
https://github.com/liuyuweitarek/pytorch-docker-builder
Automate PyTorch Docker image builds with compatible Python, CUDA, and Poetry versions, including CI/CD for testing.
cicd containerd cuda docker docker-image poetry-python python python3 pytorch pytorch-docker
Last synced: 06 Feb 2026
https://github.com/bhattbhavesh91/rapids-cudf-cuml-example
Running KNN algorithm much faster on GPU for free using RAPIDS packages like cuML and cuDF
cuda cuml deep-learning nvidia-gpu rapids rapidsai
Last synced: 17 Apr 2026
https://github.com/pintamonas4575/tfg-classification-model-customdataset
Classification model in TensorFlow and Keras on a custom dataset.
cnn cnn-classification cuda deep-learning efficientnet gpu image-classification keras tensorflow transfer-learning
Last synced: 02 May 2026
https://github.com/alwaysai/jetpack-46-hacky-hour
NVIDIA’s Jetpack 4.6 capabilities and how to use them with EdgeIQ, alwaysAI Computer Vision framework.
alwaysai computer-vision cuda edge-computing jetpack tensorrt
Last synced: 01 May 2026
https://github.com/xihuai18/image-processing-in-cuda
Implementation of Image Processing Method
Last synced: 04 Oct 2025
https://github.com/ayoussf/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 14 Apr 2025
https://github.com/andrewboessen/bitonic-merge-sort
Bitonic Merge Sort algorithm optimized for GPU execution
bitonic-merge-sort cuda sorting-network
Last synced: 16 May 2026
https://github.com/dansolombrino/gphungarian
A GPU-accelerated implementation of the Hungarian Algorithm, written in CUDA
Last synced: 31 Aug 2025
https://github.com/bl33h/productoftwovectors
This code uses CUDA for parallel vector multiplication on a GPU, demonstrating the GPU's acceleration capabilities.
cuda gpu kernel paralelism parallel-programming product vector
Last synced: 16 May 2026
https://github.com/ehsanmok/cs-521
UBC CS 521: Parallel Computing and Architectures
cuda erlang parallel-algorithm parallel-computing
Last synced: 16 May 2026
https://github.com/alekseyscorpi/vacancies_server
This is a server for vacancies generation using LLM (Saiga3)
code cuda cuda-toolkit docker dockerfile flask llama3 llamacpp llm ngrok pydantic saiga
Last synced: 06 Feb 2026
https://github.com/ashwani-rathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 11 Jul 2025
https://github.com/gunrock/template
Template repository for essentials applications to get you started asap!
cpp cuda essentials gpu graph-algorithms graph-analytics gunrock
Last synced: 15 May 2026
https://github.com/bjornmelin/ml-vision-lab
👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸
computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics
Last synced: 04 Apr 2026
https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator
This simulator computes all possible intersections for a very small timestep for a particle model
Last synced: 17 Apr 2026
https://github.com/dafadey/GPGPU_OpenCL_vs_CUDA
This is a repository with sample codes for testing memory bandwidth, arithmetic latency hiding and shared/local memory performance on AMD and nVidia devices
cuda gpgpu gpgpu-computing opencl
Last synced: 16 May 2025
https://github.com/michaelfranzl/image_debian-gpgpu
Dockerfile for a Debian base image with AMD and Nvidia GPGPU support
amd container container-image cuda debian docker gpgpu nvidia opencl
Last synced: 10 May 2026
https://github.com/shivendrra/axgrad
Lightweight tensor library that contains its own auto-diff engine, like PyTorch
autograd cuda pytorch scratch-implementation tinygrad
Last synced: 08 May 2026
https://github.com/enp1s0/curand_fp16
FP16 pseudo random number generator on GPU
cuda gpu half-precision random-number-generators
Last synced: 20 Aug 2025
https://github.com/mre/talks
...mostly Computer Science related.
computer-science cuda talks tech-talks
Last synced: 28 Apr 2026
https://github.com/andih/cuda-fortran-stream
Variant of STREAM Benchmark in CUDA Fortran
cuda cuda-fortran gpu stream-benchmarks variants
Last synced: 02 Mar 2025
https://github.com/iag-geo/image-classification
Image classification scripts using YOLOv5 with aerial imagery
cuda image-classification python pytorch swimming-pools yolov5
Last synced: 22 Feb 2026
https://github.com/ezamagni/knapsack-simd
A genetic 01-Knapsack problem solver in CUDA
cuda knapsack-problem knapsack01
Last synced: 09 May 2026
https://github.com/poyea/lollipop
🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy
cuda cuda-kernel cuda-kernels cuda-programming gpu-kernels gpu-programming python
Last synced: 17 Jun 2026
https://github.com/seieric/gst-dsobjectsmask
📀NVIDIA DeepStream integrated GStreamer Plugin. Mask objects with cuda cores on Jetson boards. Fast and smooth since everything is done on NVMM.🏎
cuda cuda-programming deepstream gpu gstreamer gstreamer-plugins instance-segmentation jetson-agx-orin jetson-agx-xavier jetson-tx1 jetson-tx2 jetson-xavier maskrcnn nvidia-jetson nvidia-jetson-nano opencv opencv4 resnet resnet50
Last synced: 06 May 2026
https://github.com/manishklach/gpu-resident-inference-lab
Research lab for GPU-resident LLM inference loops: persistent kernels, sparse KV selection, tiered residency, speculative decode, and trace-driven scheduling.
cuda gpu-systems kv-cache llm-inference mega-kernel model-systems persistent-kernel runtime speculative-decoding
Last synced: 19 Jun 2026
https://github.com/jayemscript/llm-systems-from-scratch
A hands-on learning project for building the core systems behind Large Language Models using C++, Rust, and optional Python/JavaScript bindings. Includes tensor operations, autograd, neural networks, tokenization, and a minimal transformer pipeline.
ai-systems autograd c-language cpp cuda educational-project high-performance-computing inference-engine machine-learning neural-networks-from-scratch pybind11 tensor-library tokenization transformers wasm
Last synced: 19 Jun 2026
https://github.com/speedcell4/torchdevice
Setup CUDA_VISIBLE_DEVICES
cuda deep-learning gpu machine-learning pytorch
Last synced: 07 May 2026
https://github.com/seongwon980/htop-gpu
Terminal dashboard for NVIDIA GPUs, system CPU/memory, and processes — clickable, with conda env / docker container / cwd info per process.
btop cli cuda dashboard gpu htop machine-learning monitor nvidia nvtop python sysadmin terminal tui
Last synced: 22 Jun 2026
https://github.com/alextmjugador/rust-cuda-quickstart
Bring the Rust-CUDA project back to life under modern Linux environments.
cuda cuda-programming cuda-rust cuda-support docker rust
Last synced: 06 May 2026
https://github.com/daelsepara/hipslm
CPU and GPU (using HIP) implementations of phase pattern generators for use with spatial light modulators
computer-generated-holography cuda gpu hip hologram holography phase phase-pattern slm spatial-light-modulator
Last synced: 22 Jun 2026
https://github.com/poodarchu/vision-lab
Computer Vision Experiments in all.
computer-vision cuda object-detection
Last synced: 07 May 2026
https://github.com/kibotu/llm-windows-server
Turn your Windows GPU into a private, low-latency LLM server. Docker-based, OpenAI-compatible API.
agentic cuda docker gguf llma-cpp local-llm nvidia-gpu openai-api opencode qwen self-hosted windows
Last synced: 10 Jun 2026
https://github.com/jblaschke/pynvtx
Thin pybind11 wrapper for NVTX wrappers -- with some bells and whistles attached.
Last synced: 23 Jun 2026
https://github.com/xebastex/sfw-python
Python package designed to provide the essential tools for off-the-grid inverse problems. This is the bedrock for a future GUI implementation.
blasso cuda frank-wolfe pytorch
Last synced: 09 May 2026
https://github.com/uefi-code/msra_thepracticespaceproject_pytorchcuda
My repo to attend MSRA the Practice Space Project 2022, CUDA Implement and Optimize
Last synced: 06 May 2026
https://github.com/timothystewart6/ubuntu-gb10
Ubuntu 24.04 + NVIDIA stack setup guide for GB10 / DGX Spark systems
ansible ansible-playbook arm64 blackwell cuda dgx gpu grace-blackwell homelab nvidia nvidia-driver ubuntu
Last synced: 26 Jun 2026
https://github.com/sun-zhenxing/fast-neural-style
Fast style transfer deployment
cuda cv2 fast-neural-style opencv
Last synced: 05 May 2026