CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
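The programming model described above can be illustrated with a minimal sketch that is not taken from any of the listed repositories. It launches one GPU thread per array element to add two vectors, then checks the result against a CPU loop. The names `vector_add` and `vector_add_matches_cpu`, the 256-thread block size and the input values are illustrative assumptions; the sketch assumes a CUDA-capable NVIDIA GPU and the `nvcc` compiler.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical example kernel: each thread computes one element c[i] = a[i] + b[i].
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                     // guard: the grid may overshoot n
        c[i] = a[i] + b[i];
    }
}

// Hypothetical helper: runs vector_add on the GPU and compares against a CPU loop.
bool vector_add_matches_cpu(int n) {
    std::vector<float> h_a(n), h_b(n), h_c(n);
    for (int i = 0; i < n; ++i) { h_a[i] = float(i); h_b[i] = 2.0f * i; }

    size_t bytes = n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;  // device buffers
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // threads per block (assumed value)
    int blocks = (n + threads - 1) / threads;   // round up so every element is covered
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // cudaGetLastError reports any earlier runtime or launch failure;
    // the device-to-host copy waits for the kernel to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }

    for (int i = 0; i < n; ++i) {
        if (h_c[i] != h_a[i] + h_b[i]) return false;  // values stay exact in float
    }
    return true;
}

int main() {
    bool ok = vector_add_matches_cpu(1 << 20);
    printf("%s\n", ok ? "PASS" : "FAIL");
    return ok ? 0 : 1;
}
```

Compiled with, for example, `nvcc vector_add.cu -o vector_add`, the program prints PASS on a working GPU setup. The bounds check inside the kernel is what allows the block count to be rounded up without writing past the end of the arrays.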
https://github.com/umer-farooq-cs/canny-edge-detector
High-performance Canny edge detector with CPU and CUDA implementations. Loads PGM images, performs Gaussian smoothing, gradients, non-max suppression, and hysteresis. Benchmarks both paths, outputs edge maps, and reports speedup. Simple Makefile, sample images included.
c canny-edge-detection computer-vision cpp cuda gpu high-performance-computing image-processing nvcc pgm
Last synced: 18 Apr 2026
https://github.com/shahed-chy-suzan/psd-to-html--cuda
Cuda is a single-page creative portfolio PSD-to-HTML template built with HTML5 and CSS3. The site can be easily customized to suit your needs.
Last synced: 18 Jan 2026
https://github.com/nondairyneutrino/pararealgpu.jl
A distributed and GPU-based implementation of the Parareal algorithm for parallel-in-time integration of equations of motion.
accelerator computational-physics computational-science cuda differential-equation-solvers distributed-computing gpu-computing high-performance-computing julialang ode ordinary-differential-equations parallel-computing parallel-in-time-integration parareal partial-differential-equation pde simulation
Last synced: 21 Apr 2026
https://github.com/seralexeev/rabbit0
Robot Rabbit
cuda jetson nvidia robotics ros2 zed-camera
Last synced: 15 Jun 2026
https://github.com/sarah627/horus_eye_fcih_graduation_project
An AI-powered tourism website using YOLOv7 for real-time landmark detection in images. Built with Flask, PyTorch, and Roboflow for seamless tourist interaction.
computer-vision cuda flask jupyter-notebook kaggle matplotlib object-detection opencv python pytorch roboflow
Last synced: 14 Apr 2026
https://github.com/jessetg/cuda-practice
Working through the chapters of CUDA by Example
c cpp cuda cuda-by-example gpgpu
Last synced: 01 May 2026
https://github.com/alekseyscorpi/vacancies_server
A server for vacancy generation using an LLM (Saiga3)
code cuda cuda-toolkit docker dockerfile flask llama3 llamacpp llm ngrok pydantic saiga
Last synced: 06 Feb 2026
https://github.com/alwaysai/jetpack-46-hacky-hour
NVIDIA’s JetPack 4.6 capabilities and how to use them with EdgeIQ, the alwaysAI computer vision framework.
alwaysai computer-vision cuda edge-computing jetpack tensorrt
Last synced: 01 May 2026
https://github.com/villekf/helmet
High-dimensional Kalman filter toolbox (HELMET)
arrayfire cuda gpgpu kalman-filter kalman-smoother matlab octave opencl reconstruction scientific-computing state-estimation
Last synced: 01 May 2026
https://github.com/m-torhan/cuda-stl-renderer
CUDA C++ implementation of an STL file renderer using ray tracing
Last synced: 25 Feb 2026
https://github.com/straightchlorine/quantum-pipeline
A Python module for executing and monitoring quantum algorithms across local simulators and IBM Quantum platforms. Seamlessly handles data collection, organization, and streaming to Apache Kafka
apache-kafka apache-spark aws-s3 cuda docker gpu-acceleration ibm-cloud ibm-quantum minio qiskit qiskit-aer qiskit-nature quantum-computing visualizations vqe
Last synced: 08 Oct 2025
https://github.com/deepankaracharyya/6th_sem_assignments
c cuda data-mining postgresql-database python
Last synced: 02 May 2026
https://github.com/gravitytwog/electromagneticfield
Electromagnetic field simulation made with CUDA
c cuda cuda-kernels cuda-programming
Last synced: 26 Apr 2026
https://github.com/codingrule/cuda-mbrot
Just another Mandelbrot implementation with CUDA
cuda cuda-toolkit cupy fractal mandelbrot mathematics nvidia
Last synced: 27 Apr 2026
https://github.com/davidalgis/godot_cuda
Demonstration that it is possible to use CUDA directly from Godot engine.
Last synced: 03 May 2026
https://github.com/pintamonas4575/tfg-diffusion-model-customdataset
A PyTorch implementation of a diffusion model for unconditional image generation on a custom dataset.
attention-mechanism cnn cosine-scheduler cuda custom-dataset ddim deep-learning diffusion-models gpu image-generation pytorch
Last synced: 17 Apr 2026
https://github.com/ophoperhpo/dcgan-lentach-logo-generator
The Lentach logo generator. #MachineLearningFun
cuda dcgan dcgan-tensorflow keras lentach machinelearning ml
Last synced: 26 Jun 2026
https://github.com/poyea/lollipop
🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy
cuda cuda-kernel cuda-kernels cuda-programming gpu-kernels gpu-programming python
Last synced: 17 Jun 2026
https://github.com/abhinavsharma07/streamlit
Stable Diffusion
clip cuda denoising diffusers generative-models latent-diffusion latent-space lms-scheduler unet
Last synced: 28 Apr 2026
https://github.com/liberxue/parallel_computing
CUDA Algorithm && Hacker's Delight
algorithms cuda cuda-kernels cuda-programming hacker-s-delight nvidia
Last synced: 24 Feb 2026
https://github.com/naidezhujimo/cuda-rewrite-fast-matrix-multiplication
This repository contains an optimized implementation of matrix multiplication using CUDA. The goal of this project is to provide a high-performance solution for matrix multiplication operations on NVIDIA GPUs.
Last synced: 26 Mar 2025
https://github.com/bhattbhavesh91/rapids-cudf-cuml-example
Running KNN algorithm much faster on GPU for free using RAPIDS packages like cuML and cuDF
cuda cuml deep-learning nvidia-gpu rapids rapidsai
Last synced: 17 Apr 2026
https://github.com/kartavyaantani/cuda_image_processing
A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line
cuda cuda-kernels cuda-programming cuda-toolkit gpu-programming high-performance-computing image-manipulation image-processing nvidia-cuda nvidia-gpu
Last synced: 30 Apr 2026
https://github.com/ismailtekin05/caloriedetectingai
🍎🔍 Smart AI system that identifies food items in photos and calculates their calorie content automatically. Built with TensorFlow, YOLOv8, CUDA and computer vision for accurate nutrition tracking.
ai aimodel calorie-calculator computer-vision cuda data-analysis data-science data-segmentation data-visualization dataset dataset-generation image-processing image-recognition python segmentation-models tensorflow ultralytics yaml yolo yolov8
Last synced: 29 Apr 2026
https://github.com/jblaschke/pynvtx
Thin pybind11 wrapper for NVTX wrappers -- with some bells and whistles attached.
Last synced: 23 Jun 2026
https://github.com/timothystewart6/ubuntu-gb10
Ubuntu 24.04 + NVIDIA stack setup guide for GB10 / DGX Spark systems
ansible ansible-playbook arm64 blackwell cuda dgx gpu grace-blackwell homelab nvidia nvidia-driver ubuntu
Last synced: 26 Jun 2026
https://github.com/nofaralfasi/parallel-sequence-alignment
A parallelized multiple DNA sequence alignment algorithm using MPI, OpenMP, and CUDA
cuda mpi openmp sequence-alignment
Last synced: 29 Apr 2026
https://github.com/asadiahmad/gesture-detection
Real-time Gesture Detection using CUDA-accelerated OpenCV in Python.
computer-vision cuda gesture-recognition gpu-acceleration open-pose opencv opencv-cuda pose-detection real-time
Last synced: 29 Apr 2026
https://github.com/sartajbhuvaji/cuda
Developed CUDA kernel functions to load and train a convolutional neural network from scratch.
cuda cuda-programming gpu-programming neural-network nvidia-cuda
Last synced: 30 Mar 2025
https://github.com/torotoki/simple-paged-attention
A simple implementation of PagedAttention purely written in CUDA and C++.
attention cpp cuda llm transformer
Last synced: 18 May 2026
https://github.com/croko22/vit-cpp
An implementation of the Transformer model architecture ("Attention Is All You Need") in pure C++17 from scratch
cpp cuda deep-learning machine-learning neural-network transformer
Last synced: 17 Jan 2026
https://github.com/thisalmandula/gpu_accelerated_lpt_cfd_code
This repository contains a GPU-accelerated version of the particle tracking model developed by Merel Kooi for biofouled microplastic particles (available at: https://pubs.acs.org/doi/10.1021/acs.est.6b04702), written in CUDA Fortran and CUDA Python. This repository is intended as a learning tool for GPU programming.
biofouling computational-fluid-dynamics cuda fortran lagrangian-particle-tracking microplastics python
Last synced: 02 May 2026
https://github.com/a-nau/python-cuda-envs
Script to automatically map a specific CUDA version to a Conda Python environment.
anaconda anaconda-environment cuda installation installation-script python python-environment python3
Last synced: 18 Apr 2026
https://github.com/SanaeProject/Matrix-for-Cpp
This repository provides types for handling matrices.
cpp14 cpp14-library cuda matrix-library
Last synced: 15 May 2025
https://github.com/rajarsheya/real-time-audio-feature-extraction-with-cuda-for-speech-recognition
This project accelerates MFCC extraction using CUDA for real-time speech recognition. Offloading the process to the GPU reduces latency and speeds up processing, enabling fast, local speech-to-text transcription for applications like virtual assistants, without cloud reliance.
audio-processing cpp cuda fourier-transform python
Last synced: 10 May 2026
https://github.com/tensorbfs/cutropicalgemm.jl
The fastest Tropical number matrix multiplication on GPU
Last synced: 20 Jan 2026
https://github.com/daelsepara/hipmandelbrot
GPU Implementation of Mandelbrot Fractal Generator with Benchmarking
amd cuda fractal gpu gpu-compute gpu-computing hip mandelbrot parallel-computing rocm sdk
Last synced: 20 Feb 2026
https://github.com/sunsided/rust-arrayfire-experiments
Toying around with ArrayFire in Rust
arrayfire conways-game-of-life cuda gpgpu gpu-acceleration gpu-computing opencl rust
Last synced: 28 Apr 2026
https://github.com/xusworld/tars
Tars is a cool deep learning framework.
avx2 avx512 cuda deep-learning
Last synced: 27 Apr 2026
https://github.com/shivendrra/axgrad
Lightweight tensor library with its own autodiff engine, similar to PyTorch
autograd cuda pytorch scratch-implementation tinygrad
Last synced: 08 May 2026
https://github.com/pkestene/mandelbrot_kokkos
cuda gpu gpu-computing kokkos mandelbrot openmp performance-portability
Last synced: 27 Apr 2026
https://github.com/satyajitghana/gpu-programming
Contains material from the GPU Architecture and Programming course on NPTEL
c cpp cuda cuda-programming gpu-programming nptel nvidia
Last synced: 09 Mar 2026
https://github.com/axel-ex/seame-ads-autonomous-lane-detection-24-25
🚗 Real-time lane detection and autonomous steering for JetRacer, powered by ROS2 and GPU-accelerated CV on Jetson Nano.
cuda jetson-nano ros2 tensorrt
Last synced: 27 Apr 2026
https://github.com/pharmcat/metidacu.jl
CUDA solver for Metida.jl
cuda julia-language metida mixed-models
Last synced: 27 Apr 2026
https://github.com/brosnanyuen/raybnn_sparse
Sparse Matrix Library for GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cpu cuda gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust sparse sparse-coding sparse-matrix sparse-neural-networks
Last synced: 19 Jan 2026
https://github.com/rkv0id/automata-vtk
Multi-dimensional cellular automata visualization using Python's VTK bindings on top of CUDA-parallel grid updates.
cellular-automata cuda game-of-life python vtk
Last synced: 19 Apr 2026
https://github.com/lhldev/rust-neural-network
Neural network implementation in Rust
cuda feedforward-neural-network
Last synced: 16 May 2026
https://github.com/david-palma/cuda-programming
Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.
c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads
Last synced: 25 Apr 2026
https://github.com/fynv/cudainline
A CUDA interface for Python. A distillation of the engine part of ThrustRTC.
Last synced: 18 May 2026
https://github.com/rajarsheya/real-time-traffic-analysis-with-cuda-object-detection
Implemented CUDA-accelerated object detection (YOLO) to analyze a sample image dataset. Performed vehicle counting and simulated speed estimation to demonstrate real-time traffic analysis capabilities.
Last synced: 12 Apr 2026
https://github.com/alegau03/parallel-k-means
C implementations of the K-Means algorithm for parallel computing.
c c-programming cuda parallel parallel-programming
Last synced: 24 Apr 2026
https://github.com/bolner/totally-diffused
Debian/NVIDIA Docker image for AUTOMATIC1111's Stable Diffusion application.
automatic1111 cuda debian docker-image nvidia stable-diffusion xformers
Last synced: 11 Apr 2026
https://github.com/piyush26c/cuda-programming
c cuda ipynb-jupyter-notebook mathematics sppu-computer-engineering
Last synced: 03 Mar 2026
https://github.com/gunrock/template
Template repository for essentials applications to get you started asap!
cpp cuda essentials gpu graph-algorithms graph-analytics gunrock
Last synced: 15 May 2026
https://github.com/emilienmendes/gpgpu
Parallelization and optimization of point recognition in an image
cuda gpgpu parallel-programming
Last synced: 28 Oct 2025
https://github.com/hariprashad-ravikumar/accelerated-computing-in-cuda-c
This repo contains my code for the problem sets in NVIDIA's Getting Started with Accelerated Computing in CUDA C/C++ course
c cuda cuda-kernels cuda-toolkit
Last synced: 24 Apr 2026
https://github.com/orgh0/highperformancecnn
Implementation of a High Performance CNN for MNIST dataset
Last synced: 18 May 2026
https://github.com/patrickm663/localglmnet.jl
This is a WIP implementation of Richman & Wüthrich (2022) using Julia's Flux.jl + CUDA.jl
cuda deep-learning flux julia neural-networks symbolic-regression xai
Last synced: 22 Apr 2026
https://github.com/jakubriegel/game_of_life_3d
3D game of life implemented in CUDA
concurency cuda gameoflife nvidia put-poznan
Last synced: 21 Apr 2026
https://github.com/kchristin22/ising_model
Implementation of a cellular automaton on GPU using different features of CUDA
cellular-automaton cuda gpu-programming hpc ising-model parallel-computing
Last synced: 15 Mar 2025
https://github.com/xihuai18/image-processing-in-cuda
Implementation of Image Processing Method
Last synced: 04 Oct 2025
https://github.com/subatomicplanets/simplebitcoinminer
A simple Bitcoin C++ and CUDA solo miner
bitcoin cpp cryptocurrency cuda miner
Last synced: 19 Apr 2026
https://github.com/lightshade12/kittlespt
A hobby CUDA pathtracing renderer.
3d-graphics computer-graphics cuda gpu path-tracing ray-tracing
Last synced: 18 Mar 2025
https://github.com/hatamiarash7/cuda-python
GPU programming using CUDA & Python
cuda gpu gpu-computing gpu-programming python
Last synced: 29 Apr 2026
https://github.com/haleelrah/Vision-pro-MAX
A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.
bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8
Last synced: 30 Dec 2025
https://github.com/raumberg/hypervision
Neural Network based real-time aimbot system, operating on TensorRT with custom CUDA kernel and C FFI extensions
ai aim cuda cython neural-networks python tensorrt yolo
Last synced: 20 May 2026
https://github.com/adamczykpiotr/cudamatrixlibrary
Matrix operation library that runs on a single thread, n threads, or a CUDA-capable GPU
agh agh-ust cpp cuda cuda-library matrix matrix-computations matrix-functions matrix-multiplication
Last synced: 19 Apr 2026
https://github.com/eric900115/parallelprogramming
The repository contains the coursework for CS5422, NTHU's Parallel Programming Course.
Last synced: 26 May 2026
https://github.com/eshibusawa/cupy-cuda
Learn CUDA programming essentials with CuPy, from basic kernels to advanced memory patterns
cooperative-thread-array cub cuda cupy gpu parallel-computing python
Last synced: 15 Jun 2025
https://github.com/sohhamseal/scalable-systems-programs
A little less effort to learn parallel programming...
Last synced: 18 Apr 2026
https://github.com/5had3z/torch-discounted-cumsum-nd
PyTorch Discounted Cumsum with Autograd (CPU + CUDA)
Last synced: 18 Apr 2026
https://github.com/senli1073/docker-gpu-monitor
A lightweight GPU monitor designed for real-time web-based viewing of GPU server status.
container cuda docker flask gpu gpu-monitoring linux memory-usage nvidia-smi web
Last synced: 05 Apr 2026
https://github.com/inventwithdean/cuda_mlp
Implementation of a simple Multilayer Perceptron in pure CUDA
cuda cuda-programming deep-learning neural-networks
Last synced: 30 Mar 2025
https://github.com/matx64/rs-netbot
Old School RuneScape bot with a CNN for object identification
Last synced: 04 May 2026
https://github.com/gvvsnrnaveen/cuda
This repository contains various programs that can be written using the CUDA Toolkit.
c cpp cuda nvcc nvidia-cuda nvidia-gpu
Last synced: 17 Jan 2026
https://github.com/wallneradam/docker-ccminer
CCMiner (tpruvot version) Docker Builder
ccminer cuda docker gpu litecoin miner monero nvidia nvidia-docker
Last synced: 18 Apr 2026
https://github.com/sd7campeon/yelp-sentiment-analysis-with-python-bs4-and-llm
A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.
beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nlp-machine-learning nltk opinion-mining pandas python python3 requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews
Last synced: 06 May 2026
https://github.com/andrewboessen/bitonic-merge-sort
Bitonic Merge Sort algorithm optimized for GPU execution
bitonic-merge-sort cuda sorting-network
Last synced: 16 May 2026
https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator
This simulator computes all possible intersections for a very small timestep for a particle model
Last synced: 17 Apr 2026
https://github.com/le-ander/msc_bioinfo-experimental_design
Using information theory to inform experimental design with GPU acceleration. Computing group project as part of the MSc in Bioinformatics and Theoretical Systems Biology at Imperial College London 2016/2017.
cuda experimental-design gpu-computing information-theory pycuda systems-biology
Last synced: 26 Apr 2026
https://github.com/tortillazhawaii/fishes_cuda
3D boid simulation with GPU.
Last synced: 04 May 2026
https://github.com/kichappa/spy-sim
Simulate a spying strategy on a topography
combat-modeling cuda differential-equations julia modeling-and-simulation topography-simulation
Last synced: 09 Mar 2026
https://github.com/ergonomech/comfyui-windows-installer
Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to run as a server with automatic error correction. Easy installation and launch using Miniconda.
automation comfy conda conda-environment cuda hosting-deployment setup windows
Last synced: 31 Mar 2025
https://github.com/jtompuri/weighted-voronoi-stippling
High-performance weighted Voronoi stippling implementation. Exports PNG and TSP files. Visualizes TSP tours as continuous line drawings.
computer-graphics cuda gpu-acceleration lloyd-relaxation numba python stippling traveling-salesman tsp voronoi
Last synced: 18 May 2026
https://github.com/bl33h/productoftwovectors
This code utilizes CUDA for parallel vector multiplication on a GPU, demonstrating the GPU's acceleration capabilities.
cuda gpu kernel paralelism parallel-programming product vector
Last synced: 16 May 2026
https://github.com/enp1s0/curand_fp16
FP16 pseudo random number generator on GPU
cuda gpu half-precision random-number-generators
Last synced: 20 Aug 2025
https://github.com/jxlarrea/homeassistant-voice-recipes
GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.
arm64 cuda dgx-spark gb10 gpu-acceleration home-assistant local-llm qwen3 speech-to-text text-to-speech voice-assistant x86-64
Last synced: 26 May 2026
https://github.com/ehsanmok/cs-521
UBC CS 521: Parallel Computing and Architectures
cuda erlang parallel-algorithm parallel-computing
Last synced: 16 May 2026
https://github.com/denyskryvytskyi/capgemini-cuda
CUDA implementation of vector addition, matrix multiplication, reduction, and sorting
bitonic-sort cpp cuda cuda-kernels gpgpu matrix matrix-multiplication matrix-multiplication-parallel matrix-transpose nvidia nvidia-cuda nvidia-gpu reduction-dimension sort sorting-algorithms-implemented vector vector-addition vectorization
Last synced: 14 May 2026
https://github.com/tudasc/cusan-tests
A test suite for CUDA-aware MPI race detection
Last synced: 03 May 2026
https://github.com/matteogianferrari/qr-decomposition
This project implements different methods that exploit cache usage, multicore CPUs, and GPU architectures in the Gram-Schmidt QR decomposition algorithm, and measures the performance of the different implementations.
cuda openmp parallel-computing
Last synced: 12 Apr 2026
https://github.com/microo8/micronn
Simple neural network library with backpropagation using CUDA
Last synced: 19 May 2026
https://github.com/programmer-rd-ai/digivis
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 10 Jun 2025
https://github.com/ashwani-rathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 11 Jul 2025