CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-07-02 00:07:18 UTC
https://github.com/py-sandy/llama.cpp-windows-builder
Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.
ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11
Last synced: 20 Apr 2026
https://github.com/zepedroresende/matrixmultiplication
Matrix multiplication optimizations on Intel and CUDA
c cpp cuda hpc matrix-multiplication omp optimization
Last synced: 01 May 2026
https://github.com/maxenceleguery/jare
3D Render engine accelerated with CUDA
Last synced: 21 May 2026
https://github.com/mrkct/cuda-raytracer
Simple CUDA-Accelerated raytracer
cuda gpu raytracing raytracing-one-weekend
Last synced: 21 Apr 2026
https://github.com/rai-project/dlperf
Déjà vu: Modeling DNN Performance by Recalling History
benchmark cuda deep-learning modeling onnx performance tensorflow
Last synced: 21 Apr 2026
https://github.com/musaibbashir/object-detection
PyTorch+CUDA implementation of several image classification and object detection models, such as YOLO, Fast-CNN, and RF-DETR
cnn computer-vision cuda image-classification object-detection pytorch yolo
Last synced: 21 Apr 2026
https://github.com/d-krylov/cuda_to_opengl
Simple examples for CUDA OpenGL interoperability
Last synced: 01 May 2026
https://github.com/sbstndb/nbody_k
A simple 3D naïve NBody simulation using Kokkos enabling CUDA or OpenMP backend
cuda kokkos nbody openmp simulation
Last synced: 21 May 2026
https://github.com/dimitrijkrstev/pp-cuda-fft
A parallelised CUDA implementation of the radix-2 FFT algorithm, with an execution-time comparison against the DFT and a non-parallelised radix-2 FFT
Last synced: 22 Apr 2026
https://github.com/mdnpascual/judgebarmashvp
Error bar for the game called Mash VP
cuda emgucv screencapturer tesseract-ocr
Last synced: 22 Apr 2026
https://github.com/shermanlo77/poisson_icing
Gibbs sampling on the Poisson-Ising model. The Poisson-Ising model is a 2D image of Poisson-distributed random variables in which each variable depends on its four neighbours. This dependency causes the Poisson random variables to be similar (or dissimilar) to their neighbours.
cuda cupy gibbs-sampling gpu ising-model mcmc monte-carlo poisson poisson-ising
Last synced: 21 May 2026
https://github.com/bikemazzell/tuonella-sift
A high-performance, memory-efficient CSV deduplication tool
csv cuda deduplication logger osint rust
Last synced: 24 Apr 2026
https://github.com/xueeinstein/udacity-cs344-cuda8
Code for Udacity CS344 (Intro to Parallel Programming) using CUDA 8.0
cuda cuda-8 parallel-computing
Last synced: 02 May 2026
https://github.com/bjornmelin/ml-algorithm-playground
🧪 Core ML algorithm implementations with GPU acceleration. Featuring optimized implementations across various libraries with comprehensive analysis. 📈
algorithms cuda gpu-computing lightgbm machine-learning python scikit-learn xgboost
Last synced: 13 May 2026
https://github.com/bardifarsi/threadpoolmanager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 24 Apr 2026
https://github.com/jackrekirby/raytracing-cuda
Raytracing using CUDA
cpp cuda raytracing raytracing-in-one-weekend
Last synced: 24 Apr 2026
https://github.com/ojaswithag/opencv-doc
A comprehensive Turkish-language guide to image and video processing, machine learning, and project applications with OpenCV. 🐙 Learn with step-by-step code examples and build projects.
arm-architecture cuda cuda-support deployment django docker-image docker-images heroku image-processing javascript nodejs nvidia opencv-contrib opencv3 production python scanner tutorial
Last synced: 08 Apr 2026
https://github.com/dasbd72/nthu-ipc-2022
National Tsing Hua University - Introduction to Parallel Computing - 2022
cuda cuda-programming hpc mpi openmp pthreads
Last synced: 30 Mar 2025
https://github.com/juntyr/necsim-rust-analysis
Analysis of the spatially explicit biodiversity simulation `necsim-rust`
analysis biodiversity cuda mpi necsim rust simulation
Last synced: 24 Apr 2026
https://github.com/daelsepara/hipnewton
GPU Implementation of Newton Fractal Generator with Benchmarking
amd cuda fractal gpu gpu-compute gpu-computing hip newton parallel-computing rocm sdk
Last synced: 03 May 2026
https://github.com/anne-andresen/autoencoder_3d_c_cuda
3D Autoencoder training in raw C/CUDA
Last synced: 28 Apr 2026
https://github.com/0xsooki/extending-jax
JAX Custom Operations with C++ and CUDA (using Pybind11)
Last synced: 25 Apr 2026
https://github.com/fedesky25/hpc-project-2024
Project for the 2024 course of HPC: generator of streamplot of complex-valued functions
Last synced: 30 Mar 2025
https://github.com/sangioai/torchpace
PyTorch CUDA/C++ extension of PACE: Transformer non-linearity accelerator engine.
Last synced: 25 Apr 2026
https://github.com/cserajdeep/dnn-iris-pytorch
Deep neural network with batch normalization for tabular datasets.
batch batch-normalization classification cuda deep-learning dnn iris-dataset
Last synced: 02 May 2026
https://github.com/cs550-epfl/review
Review of the paper A Formal Analysis of the NVIDIA PTX Memory Consistency Model
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 30 Mar 2025
https://github.com/nxoti1/points-reader-ocr
🖥️ Extract text from images easily with POINTS-Reader OCR, a high-accuracy application for seamless document conversion and processing.
cuda gradio huggingface-transformers ocr open-source points-reader reportlab spaces tencent vision-language-model vlm
Last synced: 20 May 2026
https://github.com/daviddavo/19gpu
Short exercises for GPU at Complutense University of Madrid. Mirror from GitLab
accelerator cuda gpu-programming
Last synced: 26 Apr 2026
https://github.com/snandasena/courseera_gpu_specilization_capstone_project
Coursera GPU Specialization Capstone Project
cpp cuda gpu-programming imageprocessing linearalgebra
Last synced: 02 May 2026
https://github.com/shashshukla/ee-210-signals-and-systems
Code for the assignments for EE-210, Signals and Systems, at IIT Bombay 2016.
cuda image-processing signal-processing
Last synced: 26 Apr 2026
https://github.com/td99/ai-sandbox
A collection of AI tools and prototypes.
ai cuda docker image-generation-ai nvidia python
Last synced: 08 Apr 2026
https://github.com/alexyzha/cuda-bioinformatics
A CUDA-Accelerated Bioinformatics Toolchain
bioinformatics bioinformatics-tool cplusplus cuda
Last synced: 26 Apr 2026
https://github.com/belrbez/ship-graphic-qt-qml-cuda-c
Client-Server application for Rocket driving in QML graphics
c client-server cpp cuda qml qt5 rocket
Last synced: 08 Apr 2026
https://github.com/yangfengzzz/tardis
Travel space and time by using autodiff and codegen
Last synced: 03 May 2026
https://github.com/waz4/tinycomb
A lightweight C and CUDA library for efficiently calculating combinations with repetition. Jump to any combination much faster than brute-force methods, leveraging precomputed factorials and `tiny-bignum-c` for big-number support.
c combinations-generator combinations-with-repetition cuda tiny-bignum-c tinycomb
Last synced: 02 May 2026
https://github.com/bjornmelin/edge-ai-engineering
📱 Optimized ML for edge devices. Showcasing efficient model deployment, GPU-CPU memory transfer optimization, and real-world edge AI applications. 🤖
cuda edge-computing embedded-systems gpu-optimization iot mobile-ml model-optimization python tflite
Last synced: 02 May 2026
https://github.com/mateuszk098/parallel-programming-examples
Simple parallel programming examples with CUDA, MPI and OpenMP.
cpp cuda mpi openmp parallel-programming
Last synced: 27 Apr 2026
https://github.com/kbredies/tgv_pycuda
Algorithms, examples and tests for denoising, deblurring, zooming, dequantization and compressive imaging with total variation (TV) and second-order total generalized variation (TGV) regularization. GPU-accelerated code using PyCUDA.
compressive-imaging cuda image-deblurring image-denoising image-dequantization image-zooming python3 total-generalized-variation total-variation
Last synced: 27 Apr 2026
https://github.com/notkartikye/cuda-image-box-filters
🖼️ CUDA-powered tool for applying box filters to a large amount of images
cuda cuda-library cuda-programming npp
Last synced: 27 Apr 2026
https://github.com/uefi-code/bachelorgraduationdesign
I developed a PyTorch_For_PoorGuys framework and used it to train an LLM on an NVIDIA GeForce 2080Ti GPU as my bachelor's graduation design project
chatbot cuda gpu hacking large-language-models pytorch
Last synced: 03 May 2026
https://github.com/sergeipapina/color2graycuda
NVIDIA CUDA kernel implementation of color-to-gray image conversion, compiled and linked with make or CMake
cmake cuda cuda-kernels cuda-programming link makefile nvidia
Last synced: 06 Apr 2025
https://github.com/manishklach/gb300-rl-runtime
Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.
ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue
Last synced: 09 Jun 2026
https://github.com/andreasholt/cuda-matmul-benchmarking
Implementing and benchmarking various matmul implementations in CUDA
Last synced: 01 Nov 2025
https://github.com/luchrist69/ascent
📄 Improve your resume with Ascent, a simple web app that provides instant feedback to help you land more interviews, all for free.
agentic-ai ascent cuda dapr dapr-pub-sub datalog differential-equations docker engine kafka mpi odeint openai openai-api rancher-desktop rendering simulation simulation-framework
Last synced: 02 May 2026
https://github.com/redhat-et/triton-cache-performance-comparison
amd-gpu cache cuda gpu nvidia-gpu performance rocm triton
Last synced: 12 Apr 2026
https://github.com/tzervas/unsloth-rs
Memory-optimized GPU kernels for LLM fine-tuning in Rust (2-5x speedup, 70-80% less VRAM)
cuda gpu machine-learning optimization rust
Last synced: 25 Jan 2026
https://github.com/fanziyang-v/parallel-computing
Parallel Computing course materials from Harbin Institute of Technology (Shenzhen).
cuda openmp openmpi parallel-computing
Last synced: 27 Mar 2025
https://github.com/illagrenan/cuda-80-cudnn6-runtime-1604-py36
Ubuntu 16.04 with Python 3.6 and CUDA Dockerfile
Last synced: 22 Jun 2025
https://github.com/bjornmelin/cuda-core-projects
🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻
cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing
Last synced: 12 Apr 2026
https://github.com/karusb/2dca-cuda
2 Dimensional Cellular Automata Visualisation (Game of Life)
algorithm-flowchart cellular-automata cuda game game-of-life glut visual-studio
Last synced: 12 Apr 2026
https://github.com/emanuelemessina/gigacheck
ABFT Matrix Multiplication of any size in CUDA
abft cuda matrix-multiplication
Last synced: 28 Feb 2025
https://github.com/danieljvickers/fluid_simulation
An educational example for learning the Navier-Stokes equations. Also included is a C++ and CUDA shared object library, buildable with CMake, for use in your personal projects.
cpp cuda differential-equations navier-stokes numpy physics python simulation
Last synced: 04 May 2026
https://github.com/fmigneault/dockers
Collection of docker setup with common libraries for image processing and machine learning.
boost cuda docker image-processing opencv python
Last synced: 12 Apr 2026
https://github.com/lionpsiuc/cflow
A computational model for heat propagation in a cylindrical radiator using both CPU and GPU parallel processing. The simulation uses finite difference methods to model the directional flow of heat through a cylindrical pipe system with specific boundary conditions and cyclic connections between pipe segments.
Last synced: 29 May 2026
https://github.com/boned-fruitwood759/whisperx-asr-with-fastapi
🎤 Enable real-time speech recognition with WhisperX using FastAPI for efficient, scalable audio processing.
asr ctranslate2 cuda fastapi openai python speech-recognition torch transformers whisper whisperx
Last synced: 12 Apr 2026
https://github.com/bergolho/sycl
Repository with simple programs to learn SYCL.
Last synced: 16 May 2026
https://github.com/occisor2/fluidsimulation
Second project of my parallel algorithms course
cuda high-performance-computing
Last synced: 28 Feb 2025
https://github.com/prdai/mnist-digit-recognition
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 12 Apr 2026
https://github.com/shermanlo77/oxwasp_phd
Code for the PhD thesis. The topic was on defect detection of 3D printing using x-rays. The repository includes an implementation of the mode filter and empirical null filter.
3d-printing applied-statistics computational-statistics cuda empirical-null imagej mode-filter statistics xray-projection
Last synced: 27 Mar 2025
https://github.com/boohohoo/shamining
Shamining is a cloud mining service that allows users to mine cryptocurrencies without the need for personal hardware. By renting computing power from eco-friendly data centers, users can mine efficiently. The platform offers easy-to-use interface, flexible contracts, and daily payouts.
cryptocurrency cryptomining cuda gpu-mining mining mining-software open-source opencl
Last synced: 04 Jul 2025
https://github.com/shineiarakawa/particle-stabilizer
A C++ and CUDA-based program for simulating the motion of particles.
Last synced: 12 May 2026
https://github.com/marcorentap/kokkos-docker-cluster
Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.
Last synced: 10 Mar 2025
https://github.com/oaslananka/cv_cuda_cpp_sample
This is a sample project demonstrating how to use OpenCV and CUDA in C++ for detecting people in drone footage with YOLO. The project aims to be simple and understandable for those who want to learn how to use OpenCV and CUDA in C++.
computervision cpp cuda opencv
Last synced: 01 May 2026
https://github.com/gladap/heterogeneous_computing_project
Heterogeneous parallel programming exercise using OpenMP and CUDA to parallelize image filters
cuda heterogeneous-parallel-programming
Last synced: 27 Apr 2026
https://github.com/riciokzz/computer-vision
Computer Vision project
cuda data-cleaning data-engineering data-science exploratory-data-analysis machine-learning neural-network
Last synced: 20 May 2026
https://github.com/perhuepenbecker/cudyn
CUDA library for irregular tasks using a dynamic block-internal balancing mechanism
cpp cuda cuda-library cuda-programming gpu-computing gpu-programming irregular
Last synced: 28 Apr 2026
https://github.com/pintamonas4575/rlgan-project-maadm-upm
Neuroevolution to learn the Lunar Lander from Gymnasium and a GAN to learn to color images. Coursework from the ML and BD master's degree at UPM.
cifar10 cuda dcgan deep-learning flappy-bird gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow wgan-gp
Last synced: 12 Apr 2026
https://github.com/sergiomarquezdev/yt-transcriber
🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.
ai cli cuda gemini python transcription whisper youtube
Last synced: 15 May 2026
https://github.com/voduchuy/cudafsp
CUDA-based implementation of the Finite State Projection (FSP) algorithm.
chemical-master-equation cuda stochastic-reaction-networks sundials
Last synced: 20 Jan 2026
https://github.com/ncorgan/arrayfire-config-info
A small command-line utility that outputs all available ArrayFire devices
Last synced: 28 Apr 2026
https://github.com/phrutis/bip39scan.com
Collective search for old coins
bip39 brute-force client-server cuda gpu mnemonic pass passphrase passphrase-generator passwords
Last synced: 04 Sep 2025
https://github.com/djenriquez/ccminer
Dockerized ccminer
cuda docker ethereum mining nvidia nvidia-docker
Last synced: 05 May 2026
https://github.com/airvzxf/c-plus-plus-understanding-cuda
Understanding CUDA with C++
cuda hacktoberfest hacktoberfest-accepted
Last synced: 22 Mar 2025
https://github.com/voschezang/holographic-projector-simulations
Optimizations of Simulations of Holographic Projectors using CUDA
cuda gpu holography parallel-computing photonics
Last synced: 16 May 2026
https://github.com/pipecruz/cuda-flocking-sim
CPU and GPU (CUDA) implementations of naive/optimized flocking algorithms
Last synced: 07 May 2026
https://github.com/obsidianplusplus/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 28 Apr 2026
https://github.com/alpinebuster/meshlib
Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.
cuda dicom electron emscripten mesh mesh-modelling pybind11 stl stomatology threejs wasm
Last synced: 03 Jul 2025
https://github.com/crazyguitar/libefaxx
aws benchmark cpp20-coroutine cuda efa gpu gpu-benchmarks hpc large-language-models llm rdma rdma-benchmarks
Last synced: 16 Jan 2026
https://github.com/separatrixxx/pgp_labs_7_sem
👓 Laboratory work for the 7th semester of MAI on PGP and PDP
Last synced: 15 May 2026
https://github.com/fikri-rouzan/cuda-c-program-part-1
CUDA C program from NVIDIA course.
Last synced: 12 Apr 2026
https://github.com/hrolive/data-analytics-in-the-era-of-large-scale-machine-learning
Slides and other material for the Cyprus NCC training event about "Data analytics in the era of large-scale machine learning".
cuda deep-learning gpu-acceleration gradient-boosting large-language-models machine-learning preprocessing python pytorch
Last synced: 13 Apr 2026
https://github.com/lanceberge/cuda-newton-fractals
Parallelize and visualize the Newton Iteration
cpp cuda mathematical-modelling visualization
Last synced: 16 May 2026
https://github.com/yash-1335/qwen600
🚀 Build a fast inference engine for the QWEN3-0.6B model using CUDA, optimizing performance with minimal dependencies for efficient learning and practice.
cuda cuda-programming gpu llamacpp llm llm-inference qwen qwen3 transformer
Last synced: 16 May 2026
https://github.com/timxor/c_code
Some of my C code
c cuda m4 parallel-programming
Last synced: 03 May 2026
https://github.com/bd2720/accesspatterns
Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.
c cache cuda cuda-toolkit performance-analysis performance-testing profiling
Last synced: 16 May 2026