CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/ray-chew/modified_ch
Density functional theory (DFT) and self-consistent field theory (SCFT) simulation of diblock copolymers
cuda density-functional-theory diblock-copolymer numerical-analysis numerical-methods self-consistent-field-theory
Last synced: 11 May 2026
https://github.com/hr-fahim/transformer-model-optimization
Sample GPT Transformer Model from Scratch.
cuda few-shot-learning transfomers
Last synced: 02 May 2026
https://github.com/fedimser/aldyparen
Renders pictures and videos with algebraic fractals
Last synced: 29 Apr 2026
https://github.com/nikhilrout/thetensorcoreproject
Microarchitecture implementation of Nvidia's Tensor Cores
cuda floating-point gpgpu hybrid-precision-training tensorcore
Last synced: 01 Apr 2025
https://github.com/aurelienperez/gpu-heston-monte-carlo
GPU-accelerated Monte Carlo simulation for option pricing under the Heston model using CUDA.
Last synced: 01 Apr 2025
https://github.com/drilonaliu/parallel-mandelbrot-set
GPU-accelerated Mandelbrot Set generation with CUDA and OpenGL interoperability.
cuda fractals gpu mandelbrot-fractal parallel-programming
Last synced: 12 Apr 2026
https://github.com/TeamBipartite/bipartite-gemm
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores
Last synced: 14 Jan 2026
https://github.com/sephiroth7712/k-nearest-neigbours
Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.
cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark
Last synced: 12 Apr 2026
https://github.com/tomtolleson/cuda-kernel-benchmarking-tool
A benchmarking tool in C++ that creates CUDA kernels and compares overall system performance between CPU and GPU
cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu
Last synced: 30 Mar 2025
https://github.com/ysl1016/cudadigitfilter
CUDA-based parallel image filtering system for MNIST dataset
computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing
Last synced: 28 Mar 2025
https://github.com/asadiahmad/100_sports_image_classification
A deep learning project for sport image classification using a custom VGG19-based architecture with integrated Grad-CAM heatmap visualization for model interpretability.
computer-vision cuda data-augmentation deep-learning explainable-ai gpu-acceleration grad-cam heatmap-visualization image-classification mixed-precision-training pytorch pytorch-grad-cam sports-analytics sports-classification transfer-learning vgg19
Last synced: 11 Jun 2025
https://github.com/eminem5410/devmind-platform
Linux-first CLI for AI environment diagnostics, repair & automation
ai automation cli cuda developer-tools devops docker generative-ai linux local-llm observability ollama python self-hosted system-monitoring
Last synced: 30 May 2026
https://github.com/xza85hrf/flux_pipeline
FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.
ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model
Last synced: 05 Jul 2025
https://github.com/doxakis/cosinesimilaritydistancesongpu
Compute cosine similarity distances for all combinations of the dataset on the gpu with CUDA
Last synced: 13 Apr 2026
https://github.com/kronbii/thermal-super-resolution
State-of-the-art thermal super-resolution system (IMDN) with RGB→thermal adaptation, custom multi-component loss, 29.6 dB PSNR, 0.713 SSIM, 250+ FPS, production-ready PyTorch + CUDA implementation.
computer-vision cuda deep-learning image-enhancement imdn model-optimization production-machine-learning pytorch real-time real-time-processing research super-resolution thermal-imaging
Last synced: 18 Apr 2026
https://github.com/voltr0x/raytracing-cuda
Raytracing in a weekend using CUDA
Last synced: 01 Apr 2026
https://github.com/codename-detective/cuda_gpgpus_shared_memory_systems_pdp
CUDA GPGPUs Shared Memory Systems Parallel & Distributed Programming
cuda cuda-programming numa parallel-programming
Last synced: 30 Mar 2025
https://github.com/tdavidcl/cu_intercept
cuda cuda-memory cuda-programming hook massif memory-tracking preload
Last synced: 03 May 2026
https://github.com/kataglyphis/machinelearningalgorithms
Basic Machine Learning Algorithms
cuda machine-learning python tensorflow
Last synced: 31 Mar 2025
https://github.com/Neuro-Mechatronics-Interfaces/python-intan
Tools and demos for working with EMG data from Intan using Python
circuitpython cuda emg pico python realtime tensorflow
Last synced: 13 Jan 2026
https://github.com/eyelor/text-to-image-item-generator
A Python workflow for generating random item images using models from Hugging Face.
ai conda cuda flux-schnell generator huggingface item llama python pytorch text-to-image
Last synced: 13 Apr 2026
https://github.com/sandialabs/tenzing
Core library for optimizing CUDA+MPI programs as sequential decision problems.
cuda mpi scr-2759 sequential-decision-problem
Last synced: 29 Apr 2026
https://github.com/monajemi-arman/sparkling
Easy to use Spark cluster management panel with GPU support
apache-spark csharp cuda distributed-computing distributed-learning docker gpu javascript nextjs torch typescript
Last synced: 12 Apr 2026
https://github.com/shtrophic/wicuvanity
Generate wireguard vanity keys on your Nvidia GPU
cuda gpu vanity-address vanity-addresses vanitygen wireguard
Last synced: 10 Mar 2025
https://github.com/cuda8/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 10 Mar 2025
https://github.com/ionmich/cs149-local-dev
Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments
conda cs149 cuda ispc parallel-computing
Last synced: 31 Mar 2025
https://github.com/marnovo/cuda-projects
cuda cuda-kernels gpu gpu-programming nvidia-cuda parallel-computing
Last synced: 10 Jun 2025
https://github.com/ndgigliotti/torch-ipca
GPU-accelerated Incremental PCA for PyTorch
cuda dimensionality-reduction gpu incremental-pca machine-learning pca pytorch
Last synced: 26 Jan 2026
https://github.com/moesio-f/cla
C Linear Algebra (CLA) library. A simple toy library for basic vector/matrix operations with CUDA support and Python bindings.
Last synced: 09 May 2026
https://github.com/minseoc03/cuda-100-days
A 100-day journey to master CUDA programming, inspired by the CUDA-120-DAYS--CHALLENGE project. This repo contains daily CUDA exercises and code folders, with learning notes hosted on Notion. Practicing on leetgpu.com due to lack of local NVIDIA GPU.
100daysofcode cuda deeplearning gpgpu gpu hpc nvidia parallel-computing
Last synced: 19 Apr 2025
https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is
Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization
ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow
Last synced: 05 Jan 2026
https://github.com/Parxd/cuda-optim
Various CUDA kernels optimized for specific ML algorithms
Last synced: 02 Sep 2025
https://github.com/snandasena/cuda-at-scale-for-the-enterprise
Gauss Filter with CUDA and NPP
Last synced: 29 Apr 2026
https://github.com/efecaliskannn/pneumonia-detection-with-cnn--vgg16--and-resnet50-deep-learning-models
This project aims to detect pneumonia using deep learning, a subset of artificial intelligence. It examines the performance of deep learning models, including CNN, VGG16, and ResNet50, in detecting pneumonia.
artificial-intelligence convolutional-neural-networks cuda deep-learning keras-tensorflow nvidia-cuda pyhton transfer-learning
Last synced: 13 Jun 2025
https://github.com/cmazakas/cuda-stuff
A CUDA-based playground
cmake cuda delaunay-triangulation vscode
Last synced: 24 Mar 2025
https://github.com/akshaysinhaaa/emova
A deep learning framework designed for emotion and sentiment recognition using text, audio, and video modalities. This project leverages the MELD (Multimodal EmotionLines Dataset) to train a robust and flexible model that reflects human communication more accurately than unimodal models.
bert cnn cuda deep-learning multimodal python pytorch resnet-18 tensorboard transformers
Last synced: 05 May 2026
https://github.com/lucatedeschini/feedforwardnn
This project is my submission for the exam "Project Work in Architecture and Platform for Artificial Intelligence"
c cuda neural-networks openmp scratch-implementation
Last synced: 20 Apr 2026
https://github.com/h4ck3r-04/fpassword
Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.
brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing
Last synced: 16 Jan 2026
https://github.com/lk/gpu-nbody
GPU-accelerated n-body engine for t-SNE and physics simulation
cuda gpu n-body n-body-simulator
Last synced: 02 Sep 2025
https://github.com/bikrammajhi/100-days-of-gpu
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.
cuda nsight-compute ptx triton
Last synced: 18 Jun 2025
https://github.com/tchung1970/sd-cli-cuda
CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop
cuda gpu linux nvidia stable-diffusion
Last synced: 09 May 2026
https://github.com/mrgkanev/tensorflow-gpu-docker-setup
A Docker environment for TensorFlow GPU development with optimized configurations for WSL2, troubleshooting guides, and common error fixes
cuda cuda-toolkit deep-learning dev-environment development-tools docker gpu-acceleration machine-learning nvidia-docker nvidia-docker-support python tensorflow
Last synced: 13 Apr 2026
https://github.com/neel-dandiwala/cuda-programs
Miscellaneous programs that grasp the concept of Parallel Computing
cuda gpu-programming parallel-programming
Last synced: 16 May 2025
https://github.com/sahil-rajwar-2004/vector-cuda
vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 15 May 2025
https://github.com/hrshl212/custom-cuda-kernels-with-neural-network-implementation
The repository contains custom CUDA kernels for a linear layer, softmax, and ReLU, which are integrated with Python to build a neural network
cuda neural-network python pytorch
Last synced: 08 May 2026
https://github.com/gammahazard/locate-anything
Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.
bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui
Last synced: 28 May 2026
https://github.com/viktor-akusoff/chernabogpy
ChernabogPy is a Python package for visualizing gravitational distortions caused by black holes using nonlinear ray tracing.
cuda gpu physics-simulation python3 relativity-of-space-and-time torch
Last synced: 15 May 2026
https://github.com/jaidevd/ipec-fdp
cuda hpc keras mapreduce numba spark tensorflow
Last synced: 11 Apr 2026
https://github.com/vladd12/libexecstd
Modern C++ library for using an execution context of computer devices
cpp cpp17 cuda gpu-acceleration gpu-computing
Last synced: 06 May 2026
https://github.com/baudneo/zomi-server
FastAPI ML server designed for ZoneMinder (zomi-client)
alpr coral-tpu cuda face-detection face-recognition fastapi machine-learning object-detection onnxruntime opencv pydantic-v2 tensorrt torch zoneminder
Last synced: 18 Jan 2026
https://github.com/lord-turmoil/cudacmakedemo
A demo for building CUDA program with CMake
Last synced: 16 Mar 2025
https://github.com/delusionary/histoptimizer
Solves a minimum variance cost of the partition problem.
Last synced: 14 Jan 2026
https://github.com/dgcnz/nvtx-vscode
Create NVIDIA NVTX ranges directly in VS Code, then profile with Nsight Systems without modifying source code.
Last synced: 13 Apr 2026
https://github.com/ran-2012/cuda-practice
CUDA practice code for the NVIDIA programming guide
Last synced: 27 Feb 2025
https://github.com/BardiFarsi/ThreadPoolManager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 15 May 2025
https://github.com/avicted/hip_fm_synthesis
This project demonstrates FM (frequency modulation) synthesis using HIP (Heterogeneous-computing Interface for Portability), enabling high-performance sound generation on both AMD and NVIDIA GPUs.
amd audio-processing cuda fm-synthesis hip nvidia rocm
Last synced: 16 Mar 2025
https://github.com/jeremywildsmith/shadowhash-distributed
Elixir distributed Shadow File password cracker with GPU accelerated cracking for md5crypt hashing algorithm.
cracking-hash cracking-hashes cracking-password cuda distributed-systems elixir erlang hashing nx security
Last synced: 11 May 2026
https://github.com/fabulani/360ip-with-cuda
360° Image Processing with CUDA and OpenCV.
360-image 360-video cpp cuda image-processing opencv
Last synced: 11 May 2026
https://github.com/islamshahil/live-video-analysis
Live Video Analysis using PyTorch
cuda deeplearning neural-network opencv-python python pytorch video-processing webcam
Last synced: 11 May 2026
https://github.com/apws25/accelmoe
This repository is for CUDA kernel re-implementation of CPU-based MoE model.
Last synced: 11 May 2026
https://github.com/daniilvorontsov/fourier-option-pricing
MSc thesis project concerned with option pricing for Levy Jump models. Package includes pricing implementations for European Call and Put options for Carr-Madan, COS and Fourier Time Stepping.
carr-madan cuda fourier-transform monte-carlo option-pricing
Last synced: 11 May 2026
https://github.com/theogravity/dual-rtx-6000-blackwell-gemma-4-31b-it-nvfp4
Optimized vLLM setup for Gemma 4 31B NVFP4 with MTP on dual RTX PRO 6000 Blackwell using vllm and docker: native FP4 Tensor Cores, Multi-Token Prediction (96.5% acceptance rate), and prefix caching. Includes benchmark results and replication scripts.
am5 amd blackwell cuda docker fp4 gemma gemma4 llm-inference multi-token-prediction nvfp4 prefix-caching rtx-6000 speculative-decoding tensor-parallel vllm
Last synced: 11 May 2026
https://github.com/realdougeubanks/unmanic.plugin.encoder_video_hevc_nvenc_gpu
Unmanic plugin: H.265/HEVC encoder using NVIDIA hevc_nvenc with a true end-to-end GPU pipeline. Fork of Josh5/unmanic.plugin.encoder_video_hevc_nvenc that adds -hwaccel_output_format cuda when NVDEC HW decoding is enabled, keeping decoded frames in GPU memory through NVENC. Drop-in replacement with sensible defaults and full settings parity.
cuda ffmpeg hardware-acceleration nvdec nvenc nvidia unmanic unmanic-plugin video-transcoding
Last synced: 12 May 2026
https://github.com/skailasa/msc-thesis
A modular thesis
cuda fast-multipole-method kernel-independent numba python3
Last synced: 12 May 2026
https://github.com/tomaszrewak/csgpathtracer
A constructive solid geometry path tracer.
computer-graphics cuda path-tracing rendering
Last synced: 12 May 2026
https://github.com/thesupercd/rainbow_table_builder
A high performance CUDA-based GPU accelerated Rainbow-Table maker, written in C++ without any external libraries or dependencies needed.
cpp cryptography cuda hash-table hashing parallel-processing rainbow-table sha3 sha3-512 uuid
Last synced: 12 May 2026
https://github.com/vishalanandv/small_scale_parallel_programming
The project describes the design and development of a sparse matrix-vector product kernel, implemented on a supercomputer.
Last synced: 12 May 2026
https://github.com/brocbyte/cuball
CUDA-based implementation of "Real-Time Rigid Body Simulation on GPUs" [from GPU Gems 3]
Last synced: 12 May 2026
https://github.com/aspragueumkc/hydra2dgpu
GPU-accelerated 2D shallow water equation solver for QGIS — CUDA finite-volume method with unstructured mesh support
cuda finite-volume-method gis gpu-computing hydraulic-modeling hydrodynamics qgis shallow-water-equations
Last synced: 11 Jun 2026
https://github.com/programmergnome/cuda-codes
Snippet repository for learning parallel GPU programming with CUDA.
c cpp-programming cuda cuda-kernel gpu-programming learning-materials parallel-programming parallelization
Last synced: 13 May 2026
https://github.com/rossbates/rummage
Rummage is a GPU accelerated npub miner for Nostr
Last synced: 13 May 2026
https://github.com/nyxflower/mosaics-cuda-openmp
Simple image mosaic command-line tool (CUDA-OpenMP-C implementation)
c cuda gpu-programming mosaic mosaic-images openmp parallel-computing parallel-processing
Last synced: 13 May 2026
https://github.com/gianmariaromano/pmc-translated-notes
The repository contains translated notes for the course "Programmazione di Sistemi Multicore" given by Professor De Sensi for the "Informatica" course at Sapienza Università di Roma.
cuda cuda-programming mpi multicore openmp parallel-computing parallel-programming pthreads
Last synced: 14 May 2026
https://github.com/gcol33/resolve
Neural network framework for species distribution modelling (PyTorch/C++/CUDA)
cpp cuda deep-learning ecology machine-learning neural-network pytorch species-distribution
Last synced: 12 Jun 2026
https://github.com/kaierikniermann/hpc-uzh-notes
These are some notes for the High Performance Computing course taught at UZH
cuda high-performance-computing mpi openacc openmp
Last synced: 13 Jun 2026
https://github.com/g023/cuda_inf
A self-contained CUDA inference engine for LiquidAI/LFM2.5-8B-A1B (hybrid conv + GQA-attention MoE, 8.5B params, 1B active) targeting a single RTX 3060 (12 GB). No Python, no frameworks at runtime: a single .cu engine + a header-only byte-level BPE tokenizer.
3060 ai c cpp cuda fast-inference gpu inference inference-engine large-language-models lfm25 liquidai llm moe nvidia open-source rtx token
Last synced: 15 Jun 2026
https://github.com/p4suta/mojiokoshi
Local audio transcription tool with real-time progress, powered by faster-whisper and CUDA
audio-transcription cuda docker fastapi faster-whisper gpu python self-hosted speech-to-text sveltekit transcription whisper
Last synced: 16 Jun 2026
https://github.com/acuoci/pbe-fixed-pivot-cuda
Fast CUDA implementation of aggregation and breakage terms in Population Balance Equations using the fixed pivot sectional method
aggregation breakage cuda fixed-pivot pbe
Last synced: 18 Jun 2026
https://github.com/farukalamai/jetson-yolo-cpp
Real-time object detection, segmentation and tracking on NVIDIA Jetson using YOLO + TensorRT in C++
cpp cuda jetson object-detection tensorrt yolo26
Last synced: 19 Jun 2026
https://github.com/aeyage/intraday-prices
GPU-accelerated portfolio optimisation
Last synced: 19 Jun 2026
https://github.com/drilonaliu/parallel-image-scaling
cuda gpu image-processing scaling-algorithms
Last synced: 21 Jun 2026
https://github.com/sbstndb/neural_k
A simple Neural Network library using Kokkos enabling CUDA or OpenMP backend
ai cuda kokkos library neural-network openmp
Last synced: 22 Jun 2026
https://github.com/sebsop/kmeans-thesis-segmentation
Real-time hybrid quantum-classical K-means segmentation using C++ and CUDA. Bachelor's Thesis at BBU bridging HPC and Quantum Machine Learning (QML).
cpp cuda hpc imgui kmeans opencv quantum-computing
Last synced: 23 Jun 2026
https://github.com/ironjr/minimal-cuda-pytorch
Repository-level snippet for minimal implementation of a PyTorch CUDA extension.
Last synced: 04 May 2026
https://github.com/sebsop/realtime-parallel-kmeans-segmentation
Real-time C++ K-means image segmentation on live video streams, using OpenCV, RCC trees, and 5D features, optimized for consumer hardware with Sequential, Multi-threaded, MPI, and CUDA backends.
cpp cuda k-means-clustering mpi multithreading opencv rcc real-time-stream-processing
Last synced: 23 Jun 2026
https://github.com/rurumimic/candle
Hugging Face Candle
cuda gpu huggingface nvidia transformer
Last synced: 05 May 2026
https://github.com/llm-db/understanding-gpu-architecture-implications-on-llm-serving-workloads
Understanding GPU Architecture Implications on LLM Serving Workloads (Master Thesis, ETH Zürich, 2024)
cuda inference pytorch rocm transformer
Last synced: 05 May 2026
https://github.com/kobinarth-panchalingam/parallel-and-concurrent-programming
Semester - 7 | CS4533 - Parallel and Concurrent Programming | Labs
c concurrent-programming cuda java openmp pthreads
Last synced: 05 May 2026
https://github.com/zelosleone/audiobook-generator
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
ai-audio audiobook cuda gpu-acceleration machine-learning pdf-converter python pytorch speech-synthesis text-processing text-to-speech
Last synced: 05 May 2026