CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
https://github.com/ezroot/gacc
GIACC - Generate Images, Art, Code and Conversations
ai codegen cuda huggingface image imagegeneration python rust stablediffusion
Last synced: 06 Apr 2026
https://github.com/enriquebdel/clases-cuda-programacion-paralela-en-c-
In this repository you will find several lessons I created on the CUDA library in C. The program I use for programming is MobaXterm.
c cuda cuda-programming gnu-linux googlecolab mobaxterm nvidia parallel-programming ubuntu university
Last synced: 19 May 2026
https://github.com/vietdoo/seam-carving-cuda
CUDA Seam Carving: Accelerating Image Resizing with GPU Computing
cc cuda cuda-programming gpu-computing parrallel-computing seam-carving
Last synced: 02 May 2026
https://github.com/renatomaynard/a-multiple-population-coarse-grained-genetic-algorithm-to-solve-the-quadratic-assignment-problem-
A Multiple-population coarse-grained Genetic Algorithm to solve the Quadratic Assignment Problem
c cuda genetic-algorithm quadratic-assignment-problem
Last synced: 09 May 2026
https://github.com/ssoehdata/cuda_fortran_sci_eng
Working through examples from the book CUDA Fortran for Scientists and Engineers, 2nd Edition
cuda cuda-fortran fortran hpc nvfortran
Last synced: 21 Aug 2025
https://github.com/xza85hrf/ml-framework_checker
ML Framework and CUDA Checker is a Python-based GUI application for checking PyTorch, TensorFlow, and CUDA installations. It provides detailed system specs, compatibility checks, advanced GPU management, and offers options to view instructions, export logs, and update machine learning frameworks.
compatibility cuda gpu-management gui-application machine-learning python pytorch system-checker system-specs tensorflow
Last synced: 28 Apr 2026
https://github.com/bl33h/pythagoreantheorem
A program that calculates the Pythagorean theorem for a large number of elements using GPU parallel processing.
arrays cuda kernel parallel-programming pythagoras pythagorean-theorem
Last synced: 19 May 2026
https://github.com/microo8/micronn
Simple neural network library with backpropagation using CUDA
Last synced: 19 May 2026
https://github.com/piyush26c/cuda-programming
c cuda ipynb-jupyter-notebook mathematics sppu-computer-engineering
Last synced: 03 Mar 2026
https://github.com/m-torhan/cuda-stl-renderer
CUDA C++ implementation of STL file renderer using ray tracing method
Last synced: 25 Feb 2026
https://github.com/rkv0id/automata-vtk
Multi-dimensional Cellular Automata visualization using Python's VTK bindings on top of CUDA-parallel grid updates.
cellular-automata cuda game-of-life python vtk
Last synced: 19 Apr 2026
https://github.com/arakiss/hecate-os
Linux distro with automatic hardware detection and per-system optimization. Ubuntu 24.04 base. Alpha.
ai cuda docker gpu hardware-optimization kernel-tuning linux linux-distribution machine-learning nvidia operating-system performance sysctl ubuntu workstation zram
Last synced: 16 Feb 2026
https://github.com/gunrock/template
Template repository for essentials applications to get you started asap!
cpp cuda essentials gpu graph-algorithms graph-analytics gunrock
Last synced: 15 May 2026
https://github.com/shivendrra/axgrad
Lightweight tensor library that contains its own auto-diff engine, like PyTorch
autograd cuda pytorch scratch-implementation tinygrad
Last synced: 08 May 2026
https://github.com/manishklach/intent-attention-kernel
Intent-aware attention research prototype that treats long-context inference as structured semantic blocks instead of a flat token stream, proving CPU-first correctness and analytical KV/FLOP savings before GPU kernel implementation.
agentic-ai ai-infrastructure attention block-attention cost-model cuda gpu-kernels inference kernel-research kv-cache llm-inference long-context python pytorch research semantic-attention sparse-attention systems transformers triton
Last synced: 28 May 2026
https://github.com/greg-tarr/fastsimplex
CUDA/MPS accelerated 2D & 3D simplex noise generation.
cuda mps noise-generator python simplex-noise
Last synced: 20 Apr 2026
https://github.com/aaronms1/ai-initializer-project
Universal LLM framework designed to abstract away the technical mumbo-jumbo of using pre-trained LLMs or creating new ones.
ai cuda djl foss java llm-framework llm-inference llm-training nvidia-gpu opencl oss reactor spirv spring tornadovm typescript-react
Last synced: 30 Jun 2026
https://github.com/a-nau/python-cuda-envs
Script to automatically map a specific CUDA version to a Conda Python environment.
anaconda anaconda-environment cuda installation installation-script python python-environment python3
Last synced: 18 Apr 2026
https://github.com/amirbroker/cudadtw
Use CUDA with numba for Dynamic Time Warping
cuda dtw dynamic-time-warping gpu numba
Last synced: 16 Apr 2026
https://github.com/mre/talks
...mostly Computer Science related.
computer-science cuda talks tech-talks
Last synced: 28 Apr 2026
https://github.com/viktor-shcherb/triage
Script running tool for optimizing GPU memory utilization
automation cli cuda deep-learning devops-tools experiment-runner gpu-monitoring gpu-scheduler hyperparameter-sweep job-queue machine-learning nvidia-smi pypi-package python resource-management script-runner
Last synced: 12 Feb 2026
https://github.com/kchristin22/ising_model
Implementation of a cellular automaton on GPU using different features of CUDA
cellular-automaton cuda gpu-programming hpc ising-model parallel-computing
Last synced: 15 Mar 2025
https://github.com/exprays/atlas
Atlas is a specialized convolutional neural network designed for satellite image change detection
alembic celery cnn-for-visual-recognition cuda geospatial-visualization python pytorch tensors
Last synced: 28 Feb 2026
https://github.com/pjueon/cuda_intellisense
A simple python script to fix cuda C++ intellisense for visual studio.
Last synced: 09 Apr 2026
https://github.com/daelsepara/hipmandelbrot
GPU Implementation of Mandelbrot Fractal Generator with Benchmarking
amd cuda fractal gpu gpu-compute gpu-computing hip mandelbrot parallel-computing rocm sdk
Last synced: 20 Feb 2026
https://github.com/chintak/theano-lasagne-docker
Dockerfile for Lasagne with Cuda support. Look at the branches for relevant Dockerfiles - ``cpu`` and ``gpu``.
caffe cuda docker dockerfile install-script lasagne machine-learning machine-learning-library theano
Last synced: 10 Apr 2025
https://github.com/tommaso-dognini/polimi_gpu101_courseproject
Polimi Passion In Action GPU101 course project. Implementation in CUDA of BFS algorithm
cpp cuda cuda-programming parallel-computing
Last synced: 10 Apr 2026
https://github.com/croko22/vit-cpp
An implementation of the Transformer model architecture ("Attention Is All You Need") in pure C++17 from scratch
cpp cuda deep-learning machine-learning neural-network transformer
Last synced: 17 Jan 2026
https://github.com/hdelan/msc-hpc-final-project
In this project I implement a CUDA Lanczos method to approximate the matrix exponential. The matrix exponential is an important centrality measure for large, sparse graphs.
cuda graph-algorithms krylov-methods
Last synced: 12 Apr 2025
https://github.com/eshibusawa/cupy-cuda
Learn CUDA programming essentials with CuPy, from basic kernels to advanced memory patterns
cooperative-thread-array cub cuda cupy gpu parallel-computing python
Last synced: 15 Jun 2025
https://github.com/hatamiarash7/cuda-python
GPU programming using CUDA & Python
cuda gpu gpu-computing gpu-programming python
Last synced: 29 Apr 2026
https://github.com/sd7campeon/yelp-sentiment-analysis-with-python-bs4-and-llm
A scalable pipeline for automated extraction, preprocessing, and sentiment analysis of Yelp reviews. Uses advanced HTTP requests, HTML parsing, and text normalization (tokenization, stopword removal, lemmatization) to enable precise polarity and subjectivity analysis for consumer insights and business analytics.
beautifulsoup beautifulsoup4 business-analytics cuda data-analysis nlp-machine-learning nltk opinion-mining pandas python python3 requests-library-python sentiment-analysis text-preprocessing textblob torch web-scraping yelp-reviews
Last synced: 06 May 2026
https://github.com/sartajbhuvaji/cuda
Developed CUDA kernel functions to load and train a convolutional neural network from scratch.
cuda cuda-programming gpu-programming neural-network nvidia-cuda
Last synced: 30 Mar 2025
https://github.com/denyskryvytskyi/capgemini-cuda
CUDA implementation of vector addition, matrix multiplication, reduction and sorting
bitonic-sort cpp cuda cuda-kernels gpgpu matrix matrix-multiplication matrix-multiplication-parallel matrix-transpose nvidia nvidia-cuda nvidia-gpu reduction-dimension sort sorting-algorithms-implemented vector vector-addition vectorization
Last synced: 14 May 2026
https://github.com/brosnanyuen/raybnn_sparse
Sparse Matrix Library for GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cpu cuda gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust sparse sparse-coding sparse-matrix sparse-neural-networks
Last synced: 19 Jan 2026
https://github.com/m15kh/cuda_programming
CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing
cuda cuda-programming gpu nvidia parallel-computing practice-programming
Last synced: 03 Apr 2025
https://github.com/subatomicplanets/simplebitcoinminer
A simple Bitcoin C++ and CUDA solo miner
bitcoin cpp cryptocurrency cuda miner
Last synced: 19 Apr 2026
https://github.com/meirbek-dev/face-mask_detector
Real-time face mask detection
artificial-intelligence covid-19 cuda cudnn deep-learning face-mask graduation-project jupyter-notebook keras machine-learning mask-detection mobilnet-v2 object-detection object-recognition object-tracking opencv4-python python real-time supervised-learning tensorflow2-gpu
Last synced: 03 May 2026
https://github.com/gabrielmaialva33/enton
Autonomous AI Robot Assistant — Vision, Voice, and Soul
ai autonomous-agent computer-vision cuda llm python pytorch robot stt tts whisper yolo
Last synced: 01 Apr 2026
https://github.com/aaronjs99/planmux
PlanMux: Path Planning using Parallel/Multiplexed Computing
bellman-ford-algorithm cpp cuda dijkstra-algorithm floyd-warshall-algorithm graphs hpc openmp parallel-computing path-planning shortest-path-algorithm slurm
Last synced: 03 May 2026
https://github.com/iag-geo/image-classification
Image classification scripts using YOLOv5 with aerial imagery
cuda image-classification python pytorch swimming-pools yolov5
Last synced: 22 Feb 2026
https://github.com/davidalgis/godot_cuda
Demonstration that it is possible to use CUDA directly from Godot engine.
Last synced: 03 May 2026
https://github.com/matteogianferrari/qr-decomposition
This project implements different methods that exploit cache usage, the multicore CPU, and GPU architectures for the Gram-Schmidt QR decomposition algorithm, and measures the performance of the different implementations.
cuda openmp parallel-computing
Last synced: 12 Apr 2026
https://github.com/applicative-systems/nixos-gpu-tests
GPU-enabled tests with CUDA in the NixOS integration test driver
amd cuda nix nixos nvidia nvidia-gpu radeon sandbox test test-automation test-automation-framework test-framework zluda
Last synced: 02 Apr 2026
https://github.com/fardinsabid/aleam
Aleam: True randomness for AI. Non-recursive, stateless, cryptographically secure random number generator.
ai aleam cryptographic-random cuda cupy deep-learning distributions entropy gpu-acceleration jax machine-learning opensource probability pypi python pytorch random-number-generator statistics tensorflow true-randomness
Last synced: 06 Apr 2026
https://github.com/kichappa/spy-sim
Simulate a spying strategy on a topography
combat-modeling cuda differential-equations julia modeling-and-simulation topography-simulation
Last synced: 09 Mar 2026
https://github.com/malolm/jupyter-ml-with-gpu-support
Jupyter with GPU acceleration for Windows 10/11
cuda cudnn jupternotebook jupyter jupyterlab nvidia-gpu windows-10 windows-11
Last synced: 09 Apr 2026
https://github.com/pintamonas4575/tfg-diffusion-model-customdataset
PyTorch implementation of a diffusion model for unconditional image generation with a custom dataset.
attention-mechanism cnn cosine-scheduler cuda custom-dataset ddim deep-learning diffusion-models gpu image-generation pytorch
Last synced: 17 Apr 2026
https://github.com/orgh0/highperformancecnn
Implementation of a High Performance CNN for MNIST dataset
Last synced: 18 May 2026
https://github.com/matx64/rs-netbot
Old School RuneScape bot with CNN for object identification
Last synced: 04 May 2026
https://github.com/yooodleee/hello-cuda
👽Nice to meet you, CUDA!👽
c cc cuda gpgpu multiprocessing
Last synced: 09 Apr 2026
https://github.com/bhattbhavesh91/rapids-cudf-cuml-example
Running KNN algorithm much faster on GPU for free using RAPIDS packages like cuML and cuDF
cuda cuml deep-learning nvidia-gpu rapids rapidsai
Last synced: 17 Apr 2026
https://github.com/stanczakdominik/cuda_poisson
A 2D Poisson solver via CUDA
Last synced: 29 Jun 2025
https://github.com/andih/cuda-fortran-stream
Variant of STREAM Benchmark in CUDA Fortran
cuda cuda-fortran gpu stream-benchmarks variants
Last synced: 02 Mar 2025
https://github.com/bjornmelin/ml-vision-lab
👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸
computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics
Last synced: 04 Apr 2026
https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator
This simulator computes all possible intersections for a very small timestep for a particle model
Last synced: 17 Apr 2026
https://github.com/tortillazhawaii/fishes_cuda
3D boid simulation with GPU.
Last synced: 04 May 2026
https://github.com/naidezhujimo/cuda-rewrite-fast-matrix-multiplication
This repository contains an optimized implementation of matrix multiplication using CUDA. The goal of this project is to provide a high-performance solution for matrix multiplication operations on NVIDIA GPUs.
Last synced: 26 Mar 2025
https://github.com/bolner/totally-diffused
Debian/NVIDIA Docker image for AUTOMATIC1111's Stable Diffusion application.
automatic1111 cuda debian docker-image nvidia stable-diffusion xformers
Last synced: 11 Apr 2026
https://github.com/umer-farooq-cs/canny-edge-detector
High-performance Canny edge detector with CPU and CUDA implementations. Loads PGM images, performs Gaussian smoothing, gradients, non-max suppression, and hysteresis. Benchmarks both paths, outputs edge maps, and reports speedup. Simple Makefile, sample images included.
c canny-edge-detection computer-vision cpp cuda gpu high-performance-computing image-processing nvcc pgm
Last synced: 18 Apr 2026
https://github.com/fynv/cudainline
A CUDA interface for Python. A distillation of the engine part of ThrustRTC.
Last synced: 18 May 2026
https://github.com/ergonomech/comfyui-windows-installer
Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to run as a server and error-correct. Easy installation and launch using Miniconda.
automation comfy conda conda-environment cuda hosting-deployment setup windows
Last synced: 31 Mar 2025
https://github.com/sbstndb/grayscott_k
A simple 3D Gray-Scott simulation using Kokkos, with a CUDA or OpenMP backend
cuda finite-difference grayscott grid kokkos laplacian openmp simulation visualisation
Last synced: 16 May 2026
https://github.com/hyunjinno/multicore_computing
A repository of multicore programming in Java and C.
c cpp cuda java multithreading openmp thread thrust
Last synced: 18 Apr 2026
https://github.com/wallneradam/docker-ccminer
CCMiner (tpruvot version) Docker Builder
ccminer cuda docker gpu litecoin miner monero nvidia nvidia-docker
Last synced: 18 Apr 2026
https://github.com/torotoki/simple-paged-attention
A simple implementation of PagedAttention purely written in CUDA and C++.
attention cpp cuda llm transformer
Last synced: 18 May 2026
https://github.com/ayoussf/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 14 Apr 2025
https://github.com/senli1073/docker-gpu-monitor
A lightweight GPU monitor designed for real-time web-based viewing of GPU server status.
container cuda docker flask gpu gpu-monitoring linux memory-usage nvidia-smi web
Last synced: 05 Apr 2026
https://github.com/manishklach/thermal-observatory
A generic thermal observability framework for CPU, GPU, board, and platform telemetry across vendor APIs, kernel interfaces, and runtime correlation layers.
amd arm64 cuda linux nvidia nvml observability rocm telemetry thermal-framework thermal-monitoring x86-64
Last synced: 09 Jun 2026
https://github.com/5had3z/torch-discounted-cumsum-nd
PyTorch Discounted Cumsum with Autograd (CPU + CUDA)
Last synced: 18 Apr 2026
https://github.com/sohhamseal/scalable-systems-programs
A little less effort to learn parallel programming...
Last synced: 18 Apr 2026
https://github.com/franciscoda/psvm
R package and C++ library that allows training SVM models in a GPU using CUDA and predicting out-of-sample data. A support vector machine (SVM) is a type of machine learning model that is trained using supervised data to classify samples.
cpp cpp17 cuda machine-learning r svm-classifier svm-training
Last synced: 18 Apr 2026
https://github.com/sleeepyjack/multisplit
Simple multisplit for CUDA accelerators
cpp cuda gpu nvidia parallel-programming primitive split
Last synced: 20 May 2026
https://github.com/memergamer/cuda-fluid-simulation-with-interactive-visualization
A real-time fluid dynamics simulation implemented in Python using CUDA for GPU acceleration, featuring interactive ASCII visualization and automated movement patterns.
colab-notebook cuda liquid-simulations navier-stokes
Last synced: 18 May 2026
https://github.com/baonguyen6742/uv-install-torch
Tutorial to install torch/pytorch with cuda using uv
cuda install installation package python pytorch resolver torch torchaudio torchvision tutorial uv
Last synced: 13 Apr 2026
https://github.com/adamczykpiotr/cudamatrixlibrary
Matrix operation library using a single thread, n threads, or a CUDA-supported GPU
agh agh-ust cpp cuda cuda-library matrix matrix-computations matrix-functions matrix-multiplication
Last synced: 19 Apr 2026
https://github.com/dotblueshoes/robertscross
The Roberts cross operator is used in image processing and computer vision for edge detection.
cuda edge-detection image-processing
Last synced: 30 Mar 2025
https://github.com/inventwithdean/cuda_mlp
Implementation of a simple Multilayer Perceptron in pure CUDA
cuda cuda-programming deep-learning neural-networks
Last synced: 30 Mar 2025
https://github.com/andrewboessen/bitonic-merge-sort
Bitonic Merge Sort algorithm optimized for GPU execution
bitonic-merge-sort cuda sorting-network
Last synced: 16 May 2026
https://github.com/enp1s0/curand_fp16
FP16 pseudo random number generator on GPU
cuda gpu half-precision random-number-generators
Last synced: 20 Aug 2025
https://github.com/bl33h/productoftwovectors
This code uses CUDA for parallel vector multiplication on a GPU, demonstrating the GPU's acceleration capabilities.
cuda gpu kernel paralelism parallel-programming product vector
Last synced: 16 May 2026
https://github.com/varun-1703/eu-act-navigator-rag-qabot
An interactive, privacy-first application for querying the European Union’s AI Act using a local Retrieval-Augmented Generation (RAG) pipeline. Combines semantic search (FAISS) and a quantized TinyLlama LLM for fast, accurate, and context-aware answers—all running on your own hardware.
cuda faiss hugging-face-transformers langchain legal-tech local-slm machine-learning nlp open-source privacy rag-chatbot sentence-transformers streamlit tinyllama
Last synced: 03 May 2026
https://github.com/raumberg/hypervision
Neural Network based real-time aimbot system, operating on TensorRT with custom CUDA kernel and C FFI extensions
ai aim cuda cython neural-networks python tensorrt yolo
Last synced: 20 May 2026
https://github.com/programmer-rd-ai/digivis
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 10 Jun 2025
https://github.com/makischristou/mandelbrot
Mandelbrot set visualizer using CUDA.
cpp cuda gpu mandelbrot nvidia renderer rust
Last synced: 09 Apr 2026
https://github.com/fandreuz/parallel-programming-for-hpc
Scientific codes in C/C++ with CUDA, OpenACC, FFTW, (cu)BLAS
Last synced: 20 Apr 2026
https://github.com/satyajitghana/gpu-programming
Contains the contents of GPU Architecture and Programming course done on NPTEL
c cpp cuda cuda-programming gpu-programming nptel nvidia
Last synced: 09 Mar 2026
https://github.com/bjornmelin/deep-learning-evolution
🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀
ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers
Last synced: 09 May 2026