CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-22 00:07:17 UTC
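In the CUDA programming model, a kernel is launched over a grid of thread blocks, and each thread computes a global index from its block index, the block size, and its thread index within the block. The host-only C++ sketch below emulates that mapping for vector addition with ordinary loops so it runs without a GPU. The function names and the default block size of 256 are illustrative assumptions. In real CUDA C++, the two launch loops are replaced by the hardware running `grid_dim × block_dim` threads, and the index comes from `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of a hypothetical vector-add kernel: one "thread" handles one element.
// In CUDA C++ the index would be blockIdx.x * blockDim.x + threadIdx.x.
void vector_add_thread(std::size_t block_idx, std::size_t block_dim,
                       std::size_t thread_idx, const std::vector<float>& a,
                       const std::vector<float>& b, std::vector<float>& c) {
    std::size_t i = block_idx * block_dim + thread_idx;
    if (i < c.size()) {  // guard: the last block may have threads past the end
        c[i] = a[i] + b[i];
    }
}

// Host-side emulation of launching the kernel over a 1-D grid.
std::vector<float> emulate_vector_add(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      std::size_t block_dim = 256) {
    assert(a.size() == b.size() && block_dim > 0);
    std::vector<float> c(a.size());
    // Round up so that every element is covered by some block.
    std::size_t grid_dim = (a.size() + block_dim - 1) / block_dim;
    for (std::size_t block = 0; block < grid_dim; ++block)
        for (std::size_t t = 0; t < block_dim; ++t)
            vector_add_thread(block, block_dim, t, a, b, c);
    return c;
}
```

The bounds check inside the thread body matters because the grid size is rounded up. With 5 elements and a block size of 2, three blocks (6 threads) are launched, and the last thread must do nothing.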
https://github.com/programmer-rd-ai/digivis
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 10 Jun 2025
https://github.com/brendanbignell/cuda_montecarlooptionpricer
CUDA Monte Carlo Barrier Option Pricing Demo & JupyterLab ML models
cuda deep-learning ml pytorch quantitative-finance xgboost-regression
Last synced: 19 Apr 2026
https://github.com/exprays/atlas
Atlas is a specialized convolutional neural network designed for satellite image change detection
alembic celery cnn-for-visual-recognition cuda geospatial-visualization python pytorch tensors
Last synced: 28 Feb 2026
https://github.com/orgh0/highperformancecnn
Implementation of a High Performance CNN for the MNIST dataset
Last synced: 18 May 2026
https://github.com/rhysdg/whisper-onnx-python
A low-footprint, GPU-accelerated speech-to-text Python package for the JetPack 5 era, bolstered by an optimized graph
ai chatbot cuda machine-learning onnxruntime speech-to-text whisper
Last synced: 16 Feb 2026
https://github.com/aaronjs99/planmux
PlanMux: Path Planning using Parallel/Multiplexed Computing
bellman-ford-algorithm cpp cuda dijkstra-algorithm floyd-warshall-algorithm graphs hpc openmp parallel-computing path-planning shortest-path-algorithm slurm
Last synced: 03 May 2026
https://github.com/davidalgis/godot_cuda
Demonstration that it is possible to use CUDA directly from Godot engine.
Last synced: 03 May 2026
https://github.com/sbstndb/grayscott_k
A simple 3D Gray-Scott simulation using Kokkos, with a CUDA or OpenMP backend
cuda finite-difference grayscott grid kokkos laplacian openmp simulation visualisation
Last synced: 16 May 2026
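For reference, the Gray-Scott reaction-diffusion model is usually written as the coupled equations below. Here u and v are the two concentrations, D_u and D_v are their diffusion rates, F is the feed rate, and k is the kill rate. A finite-difference code typically discretizes the Laplacian with a stencil; the 7-point 3D stencil with grid spacing h is shown as one common choice. This stencil is an assumption, since the entry does not state the discretization the repository uses.

```latex
\frac{\partial u}{\partial t} = D_u \nabla^2 u - u v^2 + F (1 - u), \qquad
\frac{\partial v}{\partial t} = D_v \nabla^2 v + u v^2 - (F + k)\, v

\nabla^2 u_{i,j,l} \approx \frac{u_{i+1,j,l} + u_{i-1,j,l} + u_{i,j+1,l} + u_{i,j-1,l} + u_{i,j,l+1} + u_{i,j,l-1} - 6\, u_{i,j,l}}{h^2}
```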
https://github.com/jxlarrea/homeassistant-voice-recipes
GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.
arm64 cuda dgx-spark gb10 gpu-acceleration home-assistant local-llm qwen3 speech-to-text text-to-speech voice-assistant x86-64
Last synced: 26 May 2026
https://github.com/manishklach/intent-attention-kernel
Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.
agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton
Last synced: 28 May 2026
https://github.com/thomasonzhou/minitorch
rebuilding pytorch: from autograd to convolutions in CUDA
Last synced: 02 Feb 2026
https://github.com/le-ander/msc_bioinfo-experimental_design
Using information theory to inform experimental design with GPU acceleration. Computing group project as part of the MSc in Bioinformatics and Theoretical Systems Biology at Imperial College London 2016/2017.
cuda experimental-design gpu-computing information-theory pycuda systems-biology
Last synced: 26 Apr 2026
https://github.com/tudasc/cusan-tests
A test suite for CUDA-aware MPI race detection
Last synced: 03 May 2026
https://github.com/Programmer-RD-AI/DetectX
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 04 May 2025
https://github.com/bolner/totally-diffused
Debian/NVIDIA Docker image for AUTOMATIC1111's Stable Diffusion application.
automatic1111 cuda debian docker-image nvidia stable-diffusion xformers
Last synced: 11 Apr 2026
https://github.com/hartorn/docker-python
Repository to build a Python image based on Ubuntu and CUDA
cuda docker mkl-dnn onednn python3 ubuntu ubuntu1804
Last synced: 05 May 2026
https://github.com/fynv/cudainline
A CUDA interface for Python. A distillation of the engine part of ThrustRTC.
Last synced: 18 May 2026
https://github.com/ezroot/gacc
GIACC - Generate Images, Art, Code and Conversations
ai codegen cuda huggingface image imagegeneration python rust stablediffusion
Last synced: 06 Apr 2026
https://github.com/naidezhujimo/cuda-rewrite-fast-matrix-multiplication
This repository contains an optimized implementation of matrix multiplication using CUDA. The goal of this project is to provide a high-performance solution for matrix multiplication operations on NVIDIA GPUs.
Last synced: 26 Mar 2025
https://github.com/stanczakdominik/cuda_poisson
A 2D Poisson solver using CUDA
Last synced: 29 Jun 2025
https://github.com/chintak/theano-lasagne-docker
Dockerfile for Lasagne with CUDA support. See the cpu and gpu branches for the relevant Dockerfiles.
caffe cuda docker dockerfile install-script lasagne machine-learning machine-learning-library theano
Last synced: 10 Apr 2025
https://github.com/gunrock/template
Template repository for Gunrock essentials applications to get you started quickly!
cpp cuda essentials gpu graph-algorithms graph-analytics gunrock
Last synced: 15 May 2026
https://github.com/shivendrra/axgrad
Lightweight tensor library with its own autodiff engine, similar to PyTorch
autograd cuda pytorch scratch-implementation tinygrad
Last synced: 08 May 2026
https://github.com/gabrielmaialva33/enton
Autonomous AI Robot Assistant — Vision, Voice, and Soul
ai autonomous-agent computer-vision cuda llm python pytorch robot stt tts whisper yolo
Last synced: 01 Apr 2026
https://github.com/torotoki/simple-paged-attention
A simple implementation of PagedAttention purely written in CUDA and C++.
attention cpp cuda llm transformer
Last synced: 18 May 2026
https://github.com/eshibusawa/cupy-cuda
Learn CUDA programming essentials with CuPy, from basic kernels to advanced memory patterns
cooperative-thread-array cub cuda cupy gpu parallel-computing python
Last synced: 15 Jun 2025
https://github.com/kchristin22/ising_model
Implementation of a cellular automaton on GPU using different features of CUDA
cellular-automaton cuda gpu-programming hpc ising-model parallel-computing
Last synced: 15 Mar 2025
https://github.com/sd7campeon/yelp-sentiment-analysis-with-python-bs4-and-llm
A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.
beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nlp-machine-learning nltk opinion-mining pandas python python3 requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews
Last synced: 06 May 2026
https://github.com/memergamer/cuda-fluid-simulation-with-interactive-visualization
A real-time fluid dynamics simulation implemented in Python using CUDA for GPU acceleration, featuring interactive ASCII visualization and automated movement patterns.
colab-notebook cuda liquid-simulations navier-stokes
Last synced: 18 May 2026
https://github.com/applicative-systems/nixos-gpu-tests
GPU-enabled tests with CUDA in the NixOS integration test driver
amd cuda nix nixos nvidia nvidia-gpu radeon sandbox test test-automation test-automation-framework test-framework zluda
Last synced: 02 Apr 2026
https://github.com/fardinsabid/aleam
Aleam: True randomness for AI. Non-recursive, stateless, cryptographically secure random number generator.
ai aleam cryptographic-random cuda cupy deep-learning distributions entropy gpu-acceleration jax machine-learning opensource probability pypi python pytorch random-number-generator statistics tensorflow true-randomness
Last synced: 06 Apr 2026
https://github.com/pintamonas4575/tfg-diffusion-model-customdataset
A PyTorch diffusion model for unconditional image generation, trained on a custom dataset.
attention-mechanism cnn cosine-scheduler cuda custom-dataset ddim deep-learning diffusion-models gpu image-generation pytorch
Last synced: 17 Apr 2026
https://github.com/tortillazhawaii/fishes_cuda
3D boid simulation with GPU.
Last synced: 04 May 2026
https://github.com/mayukhdeb/patrick
Tiny neural net library written from scratch with CuPy ⚠️ under construction ⚠️
cuda deep-learning gpu-computing machine-learning neural-network regression
Last synced: 08 May 2026
https://github.com/rkv0id/automata-vtk
Multi-dimensional cellular automata visualization using Python's VTK bindings on top of CUDA-parallel grid updates.
cellular-automata cuda game-of-life python vtk
Last synced: 19 Apr 2026
https://github.com/satyajitghana/gpu-programming
Contents of the GPU Architecture and Programming course taken on NPTEL
c cpp cuda cuda-programming gpu-programming nptel nvidia
Last synced: 09 Mar 2026
https://github.com/steleman/openai-triton
Fork of OpenAI's Triton compiler v3.4.0 using LLVM 21.1.0 / 21.1.1 on Fedora 41+
cuda fedora linux llvm mlir mlir-dialect openai rocm triton
Last synced: 08 Apr 2026
https://github.com/bhattbhavesh91/rapids-cudf-cuml-example
Running KNN algorithm much faster on GPU for free using RAPIDS packages like cuML and cuDF
cuda cuml deep-learning nvidia-gpu rapids rapidsai
Last synced: 17 Apr 2026
https://github.com/mcp-tool-shop-org/backpropagate
Headless LLM fine-tuning in 3 lines — smart defaults, VRAM-aware batch sizing, multi-run SLAO, GGUF export for Ollama.
api cuda fine-tuning headless llm lora machine-learning ollama python qlora training unsloth web-security windows
Last synced: 31 May 2026
https://github.com/denyskryvytskyi/capgemini-cuda
CUDA implementation of vector addition, matrix multiplication, reduction and sorting
bitonic-sort cpp cuda cuda-kernels gpgpu matrix matrix-multiplication matrix-multiplication-parallel matrix-transpose nvidia nvidia-cuda nvidia-gpu reduction-dimension sort sorting-algorithms-implemented vector vector-addition vectorization
Last synced: 14 May 2026
https://github.com/bjornmelin/ml-vision-lab
👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸
computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics
Last synced: 04 Apr 2026
https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator
This simulator computes all possible intersections for a very small timestep for a particle model
Last synced: 17 Apr 2026
https://github.com/brosnanyuen/raybnn_sparse
Sparse Matrix Library for GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cpu cuda gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust sparse sparse-coding sparse-matrix sparse-neural-networks
Last synced: 19 Jan 2026
https://github.com/ergonomech/comfyui-windows-installer
Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Designed to run as a server with error correction. Easy installation and launch using Miniconda.
automation comfy conda conda-environment cuda hosting-deployment setup windows
Last synced: 31 Mar 2025
https://github.com/m15kh/cuda_programming
CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing
cuda cuda-programming gpu nvidia parallel-computing practice-programming
Last synced: 03 Apr 2025
https://github.com/malolm/jupyter-ml-with-gpu-support
Jupyter with GPU acceleration for Windows 10/11
cuda cudnn jupternotebook jupyter jupyterlab nvidia-gpu windows-10 windows-11
Last synced: 09 Apr 2026
https://github.com/umer-farooq-cs/canny-edge-detector
High-performance Canny edge detector with CPU and CUDA implementations. Loads PGM images, performs Gaussian smoothing, gradients, non-max suppression, and hysteresis. Benchmarks both paths, outputs edge maps, and reports speedup. Simple Makefile, sample images included.
c canny-edge-detection computer-vision cpp cuda gpu high-performance-computing image-processing nvcc pgm
Last synced: 18 Apr 2026
https://github.com/bjornmelin/deep-learning-evolution
🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀
ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers
Last synced: 09 May 2026
https://github.com/vietdoo/seam-carving-cuda
CUDA Seam Carving: Accelerating Image Resizing with GPU Computing
cc cuda cuda-programming gpu-computing parrallel-computing seam-carving
Last synced: 02 May 2026
https://github.com/yooodleee/hello-cuda
👽Nice to meet you, CUDA!👽
c cc cuda gpgpu multiprocessing
Last synced: 09 Apr 2026
https://github.com/sleeepyjack/multisplit
Simple multisplit for CUDA accelerators
cpp cuda gpu nvidia parallel-programming primitive split
Last synced: 20 May 2026
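Multisplit generalizes a two-way split. Given a bucket id for every element, it reorders the input so that all elements of bucket 0 come first, then bucket 1, and so on, keeping the original order within each bucket. A common formulation runs in three phases: histogram, exclusive prefix sum, and scatter. The sequential C++ sketch below shows that formulation under the assumption that bucket ids are precomputed and lie in `[0, num_buckets)`. It is not the repository's algorithm. GPU implementations parallelize each of the three phases, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stable multisplit: reorder `keys` so that elements are grouped by bucket
// id (0, 1, ..., num_buckets - 1), preserving input order inside each bucket.
std::vector<int> multisplit(const std::vector<int>& keys,
                            const std::vector<std::size_t>& bucket_of,
                            std::size_t num_buckets) {
    assert(keys.size() == bucket_of.size());
    // 1. Histogram: count how many elements fall into each bucket.
    std::vector<std::size_t> offset(num_buckets, 0);
    for (std::size_t b : bucket_of) {
        assert(b < num_buckets);
        ++offset[b];
    }
    // 2. Exclusive prefix sum turns the counts into each bucket's start index.
    std::size_t running = 0;
    for (std::size_t& o : offset) {
        std::size_t count = o;
        o = running;
        running += count;
    }
    // 3. Scatter: visiting elements in input order keeps the split stable.
    std::vector<int> out(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i)
        out[offset[bucket_of[i]]++] = keys[i];
    return out;
}
```

For example, keys `{5, 1, 4, 2, 3}` with bucket ids `{1, 0, 1, 0, 2}` become `{1, 2, 5, 4, 3}`.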
https://github.com/hyunjinno/multicore_computing
A repository of multicore programming in Java and C.
c cpp cuda java multithreading openmp thread thrust
Last synced: 18 Apr 2026
https://github.com/wallneradam/docker-ccminer
CCMiner (tpruvot version) Docker Builder
ccminer cuda docker gpu litecoin miner monero nvidia nvidia-docker
Last synced: 18 Apr 2026
https://github.com/straightchlorine/quantum-pipeline
A Python module for executing and monitoring quantum algorithms across local simulators and IBM Quantum platforms. Seamlessly handles data collection, organization, and streaming to Apache Kafka
apache-kafka apache-spark aws-s3 cuda docker gpu-acceleration ibm-cloud ibm-quantum minio qiskit qiskit-aer qiskit-nature quantum-computing visualizations vqe
Last synced: 08 Oct 2025
https://github.com/mala13f/statistical-learning-in-finance
This repository contains the code, papers, and related data for the assignments done during the course.
cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning
Last synced: 12 Apr 2026
https://github.com/senli1073/docker-gpu-monitor
A lightweight GPU monitor designed for real-time web-based viewing of GPU server status.
container cuda docker flask gpu gpu-monitoring linux memory-usage nvidia-smi web
Last synced: 05 Apr 2026
https://github.com/makischristou/mandelbrot
Mandelbrot set visualizer using CUDA.
cpp cuda gpu mandelbrot nvidia renderer rust
Last synced: 09 Apr 2026
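Mandelbrot visualizers commonly color each pixel by an escape-time count. Starting from z = 0, they iterate z ← z² + c and record how many steps pass before |z| exceeds 2. Once |z| exceeds 2, the orbit is guaranteed to diverge. The C++ sketch below illustrates this general technique and is not the repository's renderer. Because every pixel is independent, a CUDA version would usually assign one thread per pixel and run this loop inside the kernel.

```cpp
#include <cassert>
#include <complex>

// Escape-time iteration count for point c. Returns max_iter if the orbit
// has not left the radius-2 disk after max_iter steps (treated as "inside").
int mandelbrot_escape(std::complex<double> c, int max_iter) {
    assert(max_iter >= 0);
    std::complex<double> z = 0.0;
    for (int i = 0; i < max_iter; ++i) {
        // std::norm is |z|^2, so comparing with 4 avoids a square root.
        if (std::norm(z) > 4.0) return i;
        z = z * z + c;
    }
    return max_iter;
}
```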
https://github.com/5had3z/torch-discounted-cumsum-nd
PyTorch Discounted Cumsum with Autograd (CPU + CUDA)
Last synced: 18 Apr 2026
https://github.com/sohhamseal/scalable-systems-programs
A little less effort to learn parallel programming...
Last synced: 18 Apr 2026
https://github.com/franciscoda/psvm
R package and C++ library that allows training SVM models in a GPU using CUDA and predicting out-of-sample data. A support vector machine (SVM) is a type of machine learning model that is trained using supervised data to classify samples.
cpp cpp17 cuda machine-learning r svm-classifier svm-training
Last synced: 18 Apr 2026
https://github.com/steleman/quadratic-assignment
Research on the Quadratic Assignment Problem with CUDA Acceleration
cuda cuda-kernels cuda-programming cuda-programming-project quadratic-assignment quadratic-assignment-problem
Last synced: 07 Apr 2026
https://github.com/giorgiogamba/parallel_programming
Experimenting with parallel programming
cuda cuda-kernels cuda-programming cuda-toolkit parallel parallel-computing parallel-processing parallel-programming visual-studio
Last synced: 18 Feb 2026
https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition
This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.
audio-processing cpp cuda fourier-transform python
Last synced: 10 May 2026
https://github.com/ayoussf/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 14 Apr 2025
https://github.com/baonguyen6742/uv-install-torch
Tutorial to install torch/pytorch with cuda using uv
cuda install installation package python pytorch resolver torch torchaudio torchvision tutorial uv
Last synced: 13 Apr 2026
https://github.com/abhisheknair10/occupancy.nn
A multi-step pipeline to train and run inference with Occupancy Networks
Last synced: 20 Jul 2025
https://github.com/adamczykpiotr/cudamatrixlibrary
Matrix operation library using a single thread, n threads, or a CUDA-capable GPU
agh agh-ust cpp cuda cuda-library matrix matrix-computations matrix-functions matrix-multiplication
Last synced: 19 Apr 2026
https://github.com/programmer-rd-ai/object-detection-framework
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 24 Sep 2025
https://github.com/dotblueshoes/robertscross
The Roberts cross operator is used in image processing and computer vision for edge detection.
cuda edge-detection image-processing
Last synced: 30 Mar 2025
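The Roberts cross operator convolves the image with two 2×2 kernels, [[+1, 0], [0, -1]] and [[0, +1], [-1, 0]], and combines the two responses into a gradient magnitude sqrt(Gx² + Gy²). The C++ sketch below is a CPU reference and not the repository's code. It assumes a row-major grayscale image stored as floats and returns a (height - 1) × (width - 1) magnitude map. Each output pixel depends only on one 2×2 neighborhood, so the operator maps naturally to one GPU thread per output pixel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Roberts cross gradient magnitude for a row-major grayscale image.
// The output has (height - 1) x (width - 1) pixels because each result
// uses the 2x2 block whose top-left corner is (y, x).
std::vector<float> roberts_cross(const std::vector<float>& img,
                                 std::size_t width, std::size_t height) {
    assert(width >= 2 && height >= 2 && img.size() == width * height);
    std::vector<float> out((width - 1) * (height - 1));
    for (std::size_t y = 0; y + 1 < height; ++y) {
        for (std::size_t x = 0; x + 1 < width; ++x) {
            float p00 = img[y * width + x];
            float p01 = img[y * width + x + 1];
            float p10 = img[(y + 1) * width + x];
            float p11 = img[(y + 1) * width + x + 1];
            float gx = p00 - p11;  // kernel [[+1, 0], [0, -1]]
            float gy = p01 - p10;  // kernel [[0, +1], [-1, 0]]
            out[y * (width - 1) + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

On a flat image every output is 0. A vertical step edge produces a nonzero magnitude only in the 2×2 windows that straddle it.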
https://github.com/inventwithdean/cuda_mlp
Implementation of a simple Multilayer Perceptron in pure CUDA
cuda cuda-programming deep-learning neural-networks
Last synced: 30 Mar 2025
https://github.com/quantum-integrated-technologies/deepforge
DeepForge: a framework for machine learning.
ai artificial-intelligence cuda library machine-learning ml neural-network
Last synced: 31 Jul 2025
https://github.com/mhaseeb123/gcb
GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.
cpp cpp23 cuda mpi partitioned-communication st-mpi
Last synced: 17 May 2026
https://github.com/varun-1703/eu-act-navigator-rag-qabot
An interactive, privacy-first application for querying the European Union’s AI Act using a local Retrieval-Augmented Generation (RAG) pipeline. Combines semantic search (FAISS) and a quantized TinyLlama LLM for fast, accurate, and context-aware answers—all running on your own hardware.
cuda faiss hugging-face-transformers langchain legal-tech local-slm machine-learning nlp open-source privacy rag-chatbot sentence-transformers streamlit tinyllama
Last synced: 03 May 2026
https://github.com/fandreuz/parallel-programming-for-hpc
Scientific codes in C/C++ with CUDA, OpenACC, FFTW, (cu)BLAS
Last synced: 20 Apr 2026
https://github.com/ssoehdata/cuda_fortran_sci_eng
Working through examples from the book CUDA Fortran for Scientists and Engineers, 2nd Edition
cuda cuda-fortran fortran hpc nvfortran
Last synced: 21 Aug 2025
https://github.com/brosnanyuen/raybnn_graph
Graph Manipulation Library For GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cuda gpu graph graph-algorithms neural-network neural-networks opencl raybnn rust
Last synced: 06 Feb 2026
https://github.com/nondairyneutrino/pararealgpu.jl
A distributed and GPU-based implementation of the Parareal algorithm for parallel-in-time integration of equations of motion.
accelerator computational-physics computational-science cuda differential-equation-solvers distributed-computing gpu-computing high-performance-computing julialang ode ordinary-differential-equations parallel-computing parallel-in-time-integration parareal partial-differential-equation pde simulation
Last synced: 21 Apr 2026
https://github.com/jakubriegel/game_of_life_3d
3D game of life implemented in CUDA
concurency cuda gameoflife nvidia put-poznan
Last synced: 21 Apr 2026
https://github.com/patrickm663/localglmnet.jl
This is a WIP implementation of Richman & Wüthrich (2022) using Julia's Flux.jl + CUDA.jl
cuda deep-learning flux julia neural-networks symbolic-regression xai
Last synced: 22 Apr 2026
https://github.com/duskvirkus/ofxarrayfire
An openFrameworks addon with pre-compiled binaries of ArrayFire.
arrayfire cuda ofxaddon openframeworks openframeworks-addon
Last synced: 09 May 2026
https://github.com/katpercent/raytracing
A foundation for ray tracing using CUDA and parallel computing techniques.
3d cuda engine game parrallel-computing ray raytracing
Last synced: 01 Nov 2025
https://github.com/bokutotu/cudnn_graph_api_example
cuDNN Graph API example
Last synced: 04 May 2026
https://github.com/hariprashad-ravikumar/accelerated-computing-in-cuda-c
This repo contains my code for the problem sets in NVIDIA's Getting Started with Accelerated Computing in CUDA C/C++ course
c cuda cuda-kernels cuda-toolkit
Last synced: 24 Apr 2026
https://github.com/sarah627/horus_eye_fcih_graduation_project
An AI-powered tourism website using YOLOv7 for real-time landmark detection in images. Built with Flask, PyTorch, and Roboflow for seamless tourist interaction.
computer-vision cuda flask jupyter-notebook kaggle matplotlib object-detection opencv python pytorch roboflow
Last synced: 14 Apr 2026
https://github.com/tensorbfs/cutropicalgemm.jl
The fastest Tropical number matrix multiplication on GPU
Last synced: 20 Jan 2026
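In tropical (max-plus) algebra, "addition" is max and "multiplication" is ordinary +. A tropical matrix product is therefore C[i][j] = max over k of (A[i][k] + B[k][j]). The naive C++ sketch below assumes the max-plus convention (a min-plus variant also exists) and uses negative infinity as the tropical zero. It only illustrates the semiring and is not the optimized GPU kernel the repository describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Max-plus product: C[i][j] = max over k of (A[i][k] + B[k][j]).
// Negative infinity is the additive identity ("tropical zero") of this semiring.
Matrix tropical_matmul(const Matrix& a, const Matrix& b) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    const std::size_t n = a.size(), inner = b.size(), m = b[0].size();
    const double neg_inf = -std::numeric_limits<double>::infinity();
    Matrix c(n, std::vector<double>(m, neg_inf));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t k = 0; k < inner; ++k)
                c[i][j] = std::max(c[i][j], a[i][k] + b[k][j]);
    return c;
}
```

The loop structure is identical to ordinary matrix multiplication, with + replaced by max and × replaced by +. This is why GEMM-style tiling techniques carry over to the tropical case.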
https://github.com/alegau03/parallel-k-means
C implementations of the K-Means algorithm for parallel computing.
c c-programming cuda parallel parallel-programming
Last synced: 24 Apr 2026
https://github.com/fblupi/grado_informatica-ppr
Lab assignments for the Parallel Programming course at the UGR (University of Granada)
cuda mpi openmp parallel-computing
Last synced: 22 Apr 2026
https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection
Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.
Last synced: 12 Apr 2026
https://github.com/kpetridis24/non-local-means
Gaussian-noise image filtering on the GPU
cuda gaussian-noise gpu-computing image-denoising image-processing non-local-means parallel-computing pixels
Last synced: 29 May 2026