CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/xueeinstein/udacity-cs344-cuda8
Code for Udacity CS344 (Intro to Parallel Programming) using CUDA 8.0
cuda cuda-8 parallel-computing
Last synced: 02 May 2026
https://github.com/juntyr/necsim-rust-docs
Documentation of the spatially explicit biodiversity simulation necsim-rust
biodiversity cuda docs mpi necsim rust simulation
Last synced: 14 May 2026
https://github.com/shermanlo77/oxwasp_phd
Code for the PhD thesis. The topic was on defect detection of 3D printing using x-rays. The repository includes an implementation of the mode filter and empirical null filter.
3d-printing applied-statistics computational-statistics cuda empirical-null imagej mode-filter statistics xray-projection
Last synced: 27 Mar 2025
https://github.com/dimitrijkrstev/pp-cuda-fft
A parallelised CUDA implementation of the FFT Radix-2 algorithm and its execution time comparison to the DFT and non-parallelised Radix-2
Last synced: 22 Apr 2026
https://github.com/mdnpascual/judgebarmashvp
Error bar for the game called Mash VP
cuda emgucv screencapturer tesseract-ocr
Last synced: 22 Apr 2026
https://github.com/bergolho/sycl
Repository with simple programs to learn SYCL.
Last synced: 16 May 2026
https://github.com/bikemazzell/tuonella-sift
A high-performance, memory-efficient CSV deduplication tool
csv cuda deduplication logger osint rust
Last synced: 24 Apr 2026
https://github.com/cserajdeep/dnn-iris-pytorch
Deep Neural Network with Batch normalization for tabular datasets.
batch batch-normalization classification cuda deep-learning dnn iris-dataset
Last synced: 02 May 2026
https://github.com/bardifarsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 24 Apr 2026
https://github.com/jackrekirby/raytracing-cuda
Raytracing using CUDA
cpp cuda raytracing raytracing-in-one-weekend
Last synced: 24 Apr 2026
https://github.com/nguyenpanda/gemm
Parallel Computing Assignment - K251 - HCMUT - VNU
cpp23 cuda forkjoin matrix-multiplication mpi openmp openmpi parallel-computing simd simd-instructions strassen-multiplication
Last synced: 14 May 2026
https://github.com/fanziyang-v/parallel-computing
Parallel Computing course materials from Harbin Institute of Technology (Shenzhen).
cuda openmp openmpi parallel-computing
Last synced: 27 Mar 2025
https://github.com/juntyr/necsim-rust-analysis
Analysis of the spatially explicit biodiversity simulation `necsim-rust`
analysis biodiversity cuda mpi necsim rust simulation
Last synced: 24 Apr 2026
https://github.com/tzervas/unsloth-rs
Memory-optimized GPU kernels for LLM fine-tuning in Rust (2-5x speedup, 70-80% less VRAM)
cuda gpu machine-learning optimization rust
Last synced: 25 Jan 2026
https://github.com/illagrenan/cuda-80-cudnn6-runtime-1604-py36
Ubuntu 16.04 with Python 3.6 and CUDA Dockerfile
Last synced: 22 Jun 2025
https://github.com/0xsooki/extending-jax
JAX Custom Operations with C++ and CUDA (using Pybind11)
Last synced: 25 Apr 2026
https://github.com/danieljvickers/fluid_simulation
An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.
cpp cuda differential-equations navier-stokes numpy physics python simulation
Last synced: 04 May 2026
https://github.com/sangioai/torchpace
PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.
Last synced: 25 Apr 2026
https://github.com/snandasena/courseera_gpu_specilization_capstone_project
Coursera GPU Specialization Capstone Project
cpp cuda gpu-programming imageprocessing linearalgebra
Last synced: 02 May 2026
https://github.com/shineiarakawa/particle-stabilizer
A C++ and CUDA-based program for simulating the motion of particles.
Last synced: 12 May 2026
https://github.com/brainlesslabs/jalebi
C++ String algorithms for maximum performance
c-plus-plus cplusplus cpp cpp-library cpu cuda library parallel performance simd sse string string-matching vectorization
Last synced: 14 May 2026
https://github.com/daviddavo/19gpu
Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab
accelerator cuda gpu-programming
Last synced: 26 Apr 2026
https://github.com/oaslananka/cv_cuda_cpp_sample
This is a sample project demonstrating how to use OpenCV and CUDA in C++ for detecting people in drone footage with YOLO. The project aims to be simple and understandable for those who want to learn how to use OpenCV and CUDA in C++.
computervision cpp cuda opencv
Last synced: 01 May 2026
https://github.com/waz4/tinycomb
A lightweight C and CUDA library for efficiently calculating combinations with repetition. Jump to any combination much faster than bruteforce methods, leveraging precomputed factorials and `tiny-bignum-c` for big-number support.
c combinations-generator combinations-with-repetition cuda tiny-bignum-c tinycomb
Last synced: 02 May 2026
https://github.com/shashshukla/ee-210-signals-and-systems
Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.
cuda image-processing signal-processing
Last synced: 26 Apr 2026
https://github.com/sergiomarquezdev/yt-transcriber
🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.
ai cli cuda gemini python transcription whisper youtube
Last synced: 15 May 2026
https://github.com/alexyzha/cuda-bioinformatics
A CUDA-Accelerated Bioinformatics Toolchain
bioinformatics bioinformatics-tool cplusplus cuda
Last synced: 26 Apr 2026
https://github.com/separatrixxx/pgp_labs_7_sem
👓 Laboratory work for the 7th semester of MAI on PGP and PDP
Last synced: 15 May 2026
https://github.com/bjornmelin/edge-ai-engineering
📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖
cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite
Last synced: 02 May 2026
https://github.com/mathiasotnes/gemm
General Matrix Multiplication (GEMM) optimization in CUDA.
Last synced: 26 Mar 2025
https://github.com/baro-00/cpp-cuda-lab
Experimental C++ projects using NVIDIA CUDA for parallel computing. Learning & testing GPU kernels
Last synced: 04 May 2026
https://github.com/mateuszk098/parallel-programming-examples
Simple parallel programming examples with CUDA, MPI and OpenMP.
cpp cuda mpi openmp parallel-programming
Last synced: 27 Apr 2026
https://github.com/kbredies/tgv_pycuda
Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.
compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation
Last synced: 27 Apr 2026
https://github.com/notkartikye/cuda-image-box-filters
🖼️ CUDA-powered tool for applying box filters to a large amount of images
cuda cuda-library cuda-programming npp
Last synced: 27 Apr 2026
https://github.com/tornikeo/minimal-vscode-cuda-meson
Minimal sample of using VSCode and Meson to build CUDA applications
Last synced: 08 Sep 2025
https://github.com/lablup/backend.ai-accelerator-cuda
The Backend.AI CUDA Accelerator Plugin
Last synced: 16 May 2026
https://github.com/luchrist69/ascent
📄 Improve your resume with Ascent, a simple web app that provides instant feedback to help you land more interviews, all for free.
agentic-ai ascent cuda dapr dapr-pub-sub datalog differential-equations docker engine kafka mpi odeint openai openai-api rancher-desktop rendering simulation simulation-framework
Last synced: 02 May 2026
https://github.com/0x778/gaussian_filter_using_cuda
Implementation of a Gaussian filter using CUDA
cuda cuda-kernels cuda-programming image-processing
Last synced: 04 May 2026
https://github.com/seanwevans/damnati
A CUDA-accelerated iterated prisoner's dilemma arena
arena cuda iterated-prisoners-dilemma prisoners-dilemma tournament
Last synced: 14 May 2026
https://github.com/gladap/heterogeneous_computing_project
Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters
cuda heterogeneous-parallel-programming
Last synced: 27 Apr 2026
https://github.com/perhuepenbecker/cudyn
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular
Last synced: 28 Apr 2026
https://github.com/timxor/c_code
Some of my C code
c cuda m4 parallel-programming
Last synced: 03 May 2026
https://github.com/ncorgan/arrayfire-config-info
A small command-line utility that outputs all available ArrayFire devices
Last synced: 28 Apr 2026
https://github.com/dwain-barnes/llm-gguf-auto-converter
Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.
auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization
Last synced: 17 Jun 2025
https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 28 Apr 2026
https://github.com/rajkamalsah/flow-hpc-shocktrack
GPU-accelerated, fault-tolerant Schlieren/PIV shock tracking with interactive ROI, 1-px edges, and resumable training.
ai-ml computer-vision cuda fluid-dynamics hpc mlsystem opencv piv pytorch schlieren scientific-ml smalldata transformer
Last synced: 03 May 2026
https://github.com/rog0d/gpuss_watchers
"The GPU Watchers swore upon their shared memory hierarchy, from L1 to global memory, which also served as their mandate as lords of parallel computation."
cuda gpu-acceleration gpu-monitoring gpu-profiling
Last synced: 28 Apr 2026
https://github.com/axeloooo/pytorch
Collection of deep learning workflows in PyTorch, from fundamentals and classification to transfer learning and experiment tracking.
Last synced: 28 Apr 2026
https://github.com/ltsyk/smart-snake-ai
Advanced Deep Q-Network AI for Snake Game with CUDA support and 700% performance boost
artificial-intelligence cuda deep-q-network dqn game-ai machine-learning pytorch reinforcement-learning snake-game
Last synced: 28 Apr 2026
https://github.com/elcruzo/cuda-conv
Lightweight CUDA kernel for 2D image convolution achieving 20x+ speedup. Built with CuPy for the NVIDIA Hackathon.
computer-vision convolution cuda cupy gpu-computing hackathon high-performance-computing image-processing nvidia python
Last synced: 15 May 2026
https://github.com/atelierarith/julia_gpu_playground
For those who want to use Julia with a GPU
cuda docker docker-compose julia
Last synced: 28 Apr 2026
https://github.com/jegp/aestream-paper
AEStream paper
coroutines cuda event-based-vision gpu
Last synced: 03 May 2026
https://github.com/ccfelius/hpc
High Performance Computing (CUDA, MPI/OpenMP, high performance ML)
cuda high-performance-computing machine-learning mpi
Last synced: 28 Apr 2026
https://github.com/lehoangan2906/cuda_basics
A simple implementation of operations on vectors and matrices, optimized for running on Nvidia GPU with CUDA
Last synced: 16 Jun 2025
https://github.com/emanuelemessina/cuda-benchmark
Compare matrix calculation times between CPU and GPU (CUDA)
benchmark cuda matrix-calculations
Last synced: 28 Apr 2026
https://github.com/shermanlo77/modefilter
ImageJ plugin, Java and CuPy implementation of the mode filter and empirical null filter. The mode filter is an edge-preserving smoothing filter that takes the mode of the empirical density.
cuda cupy empirical-null fiji filter image-filter imagej jcuda mode-filter
Last synced: 28 Apr 2026
https://github.com/lionpsiuc/cflow
A computational model for heat propagation in a cylindrical radiator using both CPU and GPU parallel processing. The simulation uses finite difference methods to model the directional flow of heat through a cylindrical pipe system with specific boundary conditions and cyclic connections between pipe segments.
Last synced: 29 May 2026
https://github.com/bjornmelin/cuda-core-projects
🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻
cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing
Last synced: 12 Apr 2026
https://github.com/jalberty2018/run-pytorch-cuda-develop
Compile environment for PyTorch with CUDA
cloud code-server compiler cuda cuda-toolkit docker-image flash-attn jupyterlab python python3 pytorch sage-attention
Last synced: 28 Apr 2026
https://github.com/redhat-et/triton-cache-performance-comparison
amd-gpu cache cuda gpu nvidia-gpu performance rocm triton
Last synced: 12 Apr 2026
https://github.com/karusb/2dca-cuda
2 Dimensional Cellular Automata Visualisation (Game of Life)
algorithm-flowchart cellular-automata cuda game game-of-life glut visual-studio
Last synced: 12 Apr 2026
https://github.com/emanuelemessina/gigacheck
ABFT Matrix Multiplication of any size in CUDA
abft cuda matrix-multiplication
Last synced: 28 Feb 2025
https://github.com/enapiuz/logic-circuit-simulator
Logic circuit (based on NAND gates) simulator using OpenCL
c circuit-simulator cuda digital-logic gpgpu logic-gates opencl simulator
Last synced: 03 May 2026
https://github.com/deltatecs/voses
Volatile Secret Searcher - massively parallel, brute force memory dump analysis for (D)TLS secret extraction
cuda memory-hacking reverse-engineering tls
Last synced: 15 Jun 2025
https://github.com/fmigneault/dockers
Collection of docker setup with common libraries for image processing and machine learning.
boost cuda docker image-processing opencv python
Last synced: 12 Apr 2026
https://github.com/mohammadshabazuddin/text_to_speech_generation_with_llm_with_hugging_face
Build a text-to-speech generation system using LLMs and Hugging Face to convert text into natural audio speech.
cuda huggingface-transformers llms nlp
Last synced: 03 May 2026
https://github.com/boned-fruitwood759/whisperx-asr-with-fastapi
🎤 Enable real-time speech recognition with WhisperX using FastAPI for efficient, scalable audio processing.
asr ctranslate2 cuda fastapi openai python speech-recognition torch transformers whisper whisperx
Last synced: 12 Apr 2026
https://github.com/occisor2/fluidsimulation
Second project of my parallel algorithms course
cuda high-performance-computing
Last synced: 28 Feb 2025
https://github.com/prdai/mnist-digit-recognition
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 12 Apr 2026
https://github.com/boohohoo/shamining
Shamining is a cloud mining service that allows users to mine cryptocurrencies without the need for personal hardware. By renting computing power from eco-friendly data centers, users can mine efficiently. The platform offers an easy-to-use interface, flexible contracts, and daily payouts.
cryptocurrency cryptomining cuda gpu-mining mining mining-software open-source opencl
Last synced: 04 Jul 2025
https://github.com/marcorentap/kokkos-docker-cluster
Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.
Last synced: 10 Mar 2025
https://github.com/pintamonas4575/rlgan-project-maadm-upm
Neuroevolution to learn the Lunar Lander from Gymnasium and a GAN to learn to color images. Subject from the ML and BD master's degree of UPM.
cifar10 cuda dcgan deep-learning flappy-bird gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow wgan-gp
Last synced: 12 Apr 2026
https://github.com/phrutis/bip39scan.com
Collective search for old coins
bip39 brute-force client-server cuda gpu mnemonic pass passphrase passphrase-generator passwords
Last synced: 04 Sep 2025
https://github.com/pipecruz/cuda-flocking-sim
CPU and GPU (CUDA) implementations of naive/optimized flocking algorithms
Last synced: 07 May 2026
https://github.com/hrolive/data-analytics-in-the-era-of-large-scale-machine-learning
Slides and other material for the Cyprus NCC training event about "Data analytics in the era of large-scale machine learning".
cuda deep-learning gpu-acceleration gradient-boosting large-language-models machine-learning preprocessing python pytorch
Last synced: 13 Apr 2026
https://github.com/alpinebuster/meshlib
Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.
cuda dicom electron emscripten mesh mesh-modelling pybind11 stl stomatology threejs wasm
Last synced: 03 Jul 2025
https://github.com/9prady9/archdock
Arch Linux Docker image for app development
arch-linux arrayfire cuda docker-image forge opencl
Last synced: 03 May 2026
https://github.com/crazyguitar/libefaxx
aws benchmark cpp20-coroutine cuda efa gpu gpu-benchmarks hpc large-language-models llm rdma rdma-benchmarks
Last synced: 16 Jan 2026
https://github.com/fikri-rouzan/cuda-c-program-part-1
CUDA C program from NVIDIA course.
Last synced: 12 Apr 2026
https://github.com/isquicha/cuda-parallel-studies
Learning CUDA programming here =D
cuda cuda-programming cuda-toolkit
Last synced: 03 Jul 2025
https://github.com/yutakseo/docker_ubuntu-cuda_environment
🐳 A ready-to-use Docker environment for deep learning development with Ubuntu 22.04 and CUDA 11.8.
container cuda docker environment ubuntu
Last synced: 12 Apr 2026
https://github.com/matthewfeickert/report-urssi-fellowship-2025
Report on URSSI 2025 Early-Career Fellowship
Last synced: 17 Jan 2026
https://github.com/alessiobugetti/histogram-equalization
Implements sequential and parallel histogram equalization in C++ and Python, utilizing CUDA for parallel computation on GPU
cuda gpu-acceleration histogram-equalization parallel-computing pycuda
Last synced: 04 May 2026