CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc,
- Last updated: 2026-07-01 00:07:09 UTC
https://github.com/potato3d/grid-rt
GPU-accelerated ray tracing using GLSL and CUDA
cuda glsl gpu ray-tracing real-time-rendering
Last synced: 15 Apr 2026
https://github.com/yashkathe/image-noise-reduction-with-cuda
This project analyzes an image-denoising technique, median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.
cuda cuda-programming gpu-programming hardware-speed-analysis image-analysis image-processing numba nvidia nvidia-cuda nvidia-gpu opencv parallel-programming
Last synced: 14 May 2025
https://github.com/zhangge6/how-to-optimize-playground
High-performance computing (HPC) demos since I was a freshman.
Last synced: 15 May 2026
https://github.com/egororachyov/spbench
Benchmark for sparse linear algebra libraries for CPU and GPU platforms.
benchmark cpp cpu cuda gpu-computing graphblas opencl sparse-matrices
Last synced: 15 May 2025
https://github.com/tristanpenman/cuda-examples
A collection of CUDA example code
Last synced: 10 Apr 2025
https://github.com/yunzhu-li/recognizer
An object recognizer mobile app based on deep convolutional neural networks
cnn cuda cudnn gpu ios python swift tensorflow
Last synced: 20 Apr 2026
https://github.com/kiwijuice56/cuda-mandelbox
Ray marching renderer of the 3D mandelbox fractal, accelerated with CUDA GPU code
3d 3d-graphics cpp cuda fractal fractal-images fractal-rendering mandelbox nvidia-cuda
Last synced: 02 May 2026
https://github.com/fabryprog/java-gpu
Support for offloading parallel-for loops in Java to NVIDIA CUDA-compatible cards.
cuda gpu java nvidia parallel-computing
Last synced: 15 Apr 2026
https://github.com/mr-technologies/streamadapter
GStreamer integration for MRTech IFF SDK
c camera cuda demosaicing dng genicam gpu gstreamer h264 h265 image-processing jetson json low-latency machine-vision mipi rest-api rtsp tiff vulkan
Last synced: 06 Apr 2026
https://github.com/simmsb/p4haskell
P4 backend in Haskell
compiler cuda gpu p4 p4c p4language
Last synced: 13 May 2026
https://github.com/misha-kis/python-plane-ransac
Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA
cuda numba plane-detection python ransac
Last synced: 13 May 2026
https://github.com/deftruth/ptx-isa-8.2-zh
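🎉 Continuously updated: CUDA 12.2 PTX-ISA-8.2 study notes, with partial Chinese translation, personal understanding, and inline-assembly examples explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; work in progress...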
Last synced: 13 May 2026
https://github.com/marcoplaitano/counting-sort-cuda
Parallelized version of Counting Sort using CUDA
counting-sort cuda cuda-kernels cuda-programming gpu gpu-programming sort sorting sorting-algorithms
Last synced: 14 May 2026
https://github.com/lordmathis/cudanet
Convolutional Neural Network inference library running on CUDA
convolutional-neural-networks cpp cuda pytorch
Last synced: 08 May 2026
https://github.com/ran-2012/inversion
Solving geophysical inversion problems using CUDA & TensorFlow
cpp cuda geophysics inversion-method python
Last synced: 11 May 2026
https://github.com/pd2871/high-performance-computing
This repo contains the logs of the High Performance Computing module's final assignment
blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api
Last synced: 10 May 2026
https://github.com/mrglaster/cuda-acfcalc
Calculation of the smallest ACF for signals of length N using CUDA technology.
acf c calculations cpp cuda google-colaboratory google-colaboratory-notebooks isu
Last synced: 06 May 2026
https://github.com/nachovizzo/saxpy_openacc_cpp
My way of thinking about OpenACC, C++, and Parallel computing in general
Last synced: 23 Jun 2026
https://github.com/tank3-tk3/parallel-processing-cuda
Parallel processing with CUDA C / C++
c cpp cuda parallel-computing parallel-programming
Last synced: 09 May 2026
https://github.com/tky823/bitlinear158compression
Compare compression models for inference by BitLinear158
Last synced: 12 Jun 2026
https://github.com/dereklstinson/nccl
Go wrapper for NCCL
cuda deep-learning go nccl parallel-computing
Last synced: 14 May 2026
https://github.com/debowin/gpu-parallel-recommender-system
GPGPU Parallel User-User Collaborative Filtering System in CUDA C
collaborative-filtering cuda gpu-programming movielens-dataset recommender-system
Last synced: 24 Apr 2026
https://github.com/nixos-cuda/cuda-legacy
Select CUDA package sets which have aged out of Nixpkgs. [maintainers=@ConnorBaker, @SomeoneSerge]
Last synced: 15 May 2026
https://github.com/acrlakshman/gradient-augmented-levelset-cuda
Implementation of Gradient Augmented Levelset method for CPU and GPU
Last synced: 17 Feb 2026
https://github.com/nellogan/distributed_compy
Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + multi-GPU), and multi-node support
cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi
Last synced: 16 Aug 2025
https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129
Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)
cuda deep-learning pytorch rtx5060
Last synced: 20 Jul 2025
https://github.com/xlite-dev/HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉
Last synced: 30 Jul 2025
https://github.com/djenriquez/ewbf-cuda-miner
Run ewbf-miner for zcash
cuda docker mining nvidia nvidia-docker zcash zcl zclassic
Last synced: 17 May 2026
https://github.com/podgorskiy/deeplearningserversetup
My notes on setting up a server for Deep-Learning
cuda deep-learning driver ethernet ipmi neural-network nfs notes nvidia nvidia-driver nvidia-gpu server sshfs ubuntu
Last synced: 22 Aug 2025
https://github.com/lmlsna/install-scripts
Ubuntu install scripts
cuda do-release-upgrade eol nvidia tailscale ubuntu
Last synced: 18 Jul 2025
https://github.com/yingding/applyllm
A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host
accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers
Last synced: 24 Aug 2025
https://github.com/xkevio/cuda-raytracer
A simple ray tracer written with CUDA that saves its output in a .ppm file, CPU version included for reference.
Last synced: 25 Aug 2025
https://github.com/frozenassassine/neuralnetwork-fromscratch
Neural Network from scratch in C# with CUDA support
ai classification csharp cuda gpu gpu-acceleration neural-network neural-networks nvidia
Last synced: 20 Feb 2026
https://github.com/shreyansh26/mlsys-experiments
A collection of scripts for experimenting with and implementing MLSys-related stuff
cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton
Last synced: 30 Aug 2025
https://github.com/kim-hwiwon/t-espresso
A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data
Last synced: 04 May 2026
https://github.com/navdeep-g/dimreduce4gpu
Dimensionality reduction ("dimreduce") on GPUs ("4gpu")
cplusplus cuda dimensionality-reduction gpu linear-algebra pca python svd unsupervised-learning
Last synced: 14 Apr 2025
https://github.com/pelayo-felgueroso/tensorflow-gpu-setup
Step-by-step guide to installing TensorFlow with GPU support on Conda.
artificial-intelligence cuda deep-learning gpu machine-learning nvidia nvidia-gpu setup-guide tensorflow
Last synced: 17 Feb 2026
https://github.com/aiday-mar/mpi-cuda-project
Using MPI and CUDA to accelerate execution of the conjugate gradient algorithm in C++
c-plus-plus cuda gpu mpi university-project
Last synced: 02 May 2026
https://github.com/mu7annad0/100gpu
100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥
Last synced: 08 Mar 2026
https://github.com/B1-663R/docker-mining
Dockerfiles to build docker images to start mining with an NVIDIA Docker architecture
cryptocurrency cuda docker-image docker-nvidia mining
Last synced: 28 Mar 2025
https://github.com/ginkgo-project/cudaarchitectureselector
A CMake module simplifying the specification of CUDA architectures
Last synced: 05 Nov 2025
https://github.com/cklxx/arle
Rust-native inference runtime for Qwen3 / Qwen3.5 — OpenAI-compatible serving + integrated agent, train, and self-evolution workflows. CUDA + Metal, no PyTorch on the hot path.
agent cuda flashinfer gspo inference infra kv-cache llm metal mlx openai-compatible qwen3 qwen35 rl rust
Last synced: 02 May 2026
https://github.com/dito97/gol
High-performance Computing (90535) final project at UniGe
Last synced: 02 May 2026
https://github.com/superlinear-ai/scipy-notebook-gpu
jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT
cuda cudnn docker nccl scipy-notebook tensorflow tensorrt
Last synced: 01 May 2026
https://github.com/bogdanminko/laperf
La Perf is a framework for AI performance benchmarking — covering LLMs, VLMs, embeddings, with power-metrics collection.
ai-benchmark ai-performance apple-silicon cuda lmstudio ml-benchmark mlx mps nvidia-gpu ollama open-source-benchmark
Last synced: 15 May 2026
https://github.com/lzyrapx/llm-grandmaster-notes
🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.
cuda deep-learning llm reinforcement-learning
Last synced: 14 May 2025
https://github.com/true-real-michael/python-plane-ransac
Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA
cuda numba plane-detection python ransac
Last synced: 14 Mar 2025
https://github.com/galaxies99/inception-cuda
CUDA Implementation of Inception
Last synced: 12 Apr 2025
https://github.com/dhruvsrikanth/cudann
A distributed implementation of a deep learning framework in CUDA.
cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming
Last synced: 01 May 2026
https://github.com/pnocera/cembedd
Rust embeddings API serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled
Last synced: 12 Jan 2026
https://github.com/murrellgroup/conflux.jl
Single-node data parallelism in Julia with CUDA
cuda data-parallelism flux julia nccl
Last synced: 22 May 2026
https://github.com/dqbd/cuda-btree
Implementation of B-Trees on NVIDIA CUDA
Last synced: 30 Apr 2026
https://github.com/amypad/numcu
Numerical CUDA-based Python library
array buffer c cpp cpython cpython-api cpython-extensions cuda cxx hacktoberfest numpy python vector
Last synced: 29 Jun 2025
https://github.com/isazi/aoflagger
AOFlagger Radio Frequency Interference mitigation algorithm.
Last synced: 30 Apr 2026
https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory
Performance comparison of two different forms of memory management in CUDA
c cuda explicit memory memory-management performance unified-memory
Last synced: 17 May 2026
https://github.com/xmas7/cudampi
A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.
cpu cuda gpu hybrid mpi network
Last synced: 29 Apr 2026
https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis
Master of Science thesis project. Uses CUDA to harness a system's GPU to create pathfinding data (IM+PF) usable by multiple agents in the same environment.
ai cplusplus cuda gpgpu pathfinding starcraft2
Last synced: 15 May 2026
https://github.com/grakshith/parallel-k-means
K-Means clustering for Image Colour Quantization and Image Compression
cuda image-color-quantization image-compression k-means mpi opencv openmp
Last synced: 28 Apr 2026
https://github.com/neoblizz/spmv
Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).
csr cuda gpgpu load-balancing mtx spmv
Last synced: 28 Apr 2026
https://github.com/andreimoraru123/contextcollector
Mixed vision-language Attention Model that gets better by making mistakes
attention attention-mechanism coco-api computer-vision cuda cudnn image-captioning lstm mscoco-dataset multimodal-deep-learning natural-language-processing object-detection opencv pytorch resnet show-and-tell show-attend-and-tell video-inference vision-language yolo
Last synced: 11 Apr 2026
https://github.com/terrylindev/image-to-ASCII
🖼️ A command-line tool for converting images to ASCII art
ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal
Last synced: 12 Jul 2025
https://github.com/neoblizz/cupti-plus-plus
CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).
cpp cuda cuda-profiler cupti profiler
Last synced: 26 Apr 2026
https://github.com/dark-art108/artistic-style-transfer-cnn
cnn-architecture colab-notebooks cuda pil vgg19
Last synced: 01 Mar 2025
https://github.com/tiw302/mandelbrot-c
A simple Mandelbrot set explorer written in C. Crafted with SDL2 and multithreaded rendering for a smooth experience. ‹(•_•)›
c cuda fractal graphics mandelbrot multithreading sdl2 web webassembly
Last synced: 26 Apr 2026
https://github.com/lchsk/ney
A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs
cuda gpu linux parallel phi scientific xeon xeonphi
Last synced: 07 May 2026
https://github.com/csvancea/gpu-hashtable
GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.
Last synced: 24 Apr 2026
https://github.com/teodutu/asc
Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020
cache-optimization cuda parallel-programming profiling python-threading
Last synced: 24 Apr 2026
https://github.com/geekysuavo/gpufield
A CUDA-accelerated electromagnetostatics solver
cuda magnetic-fields magnetostatics
Last synced: 24 Dec 2025
https://github.com/kishore-narendran/eecs221-highperformancecomputing
Assignments done during the graduate course EECS 221 - Introduction to HPC that I took in the Spring Quarter of 2016 at University of California, Irvine. Involves assignments that use OpenMP, MPI and CUDA.
Last synced: 17 May 2026
https://github.com/amruthapatil/nyu-cudamatrixoperations
Optimizing CUDA programs for vector addition and matrix multiplication
cuda high-performance-computing
Last synced: 21 May 2026
https://github.com/markdtw/parallel-programming
Basic Pthread, OpenMP, CUDA examples
cuda openmp parallel-programming pthreads
Last synced: 20 Apr 2026
https://github.com/peri044/cuda
GPU implementations of algorithms
cuda gauss-jordan parallel-programming
Last synced: 14 Jul 2025
https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support
A detailed step-by-step guide to installing Tensorflow-2.0-gpu with CUDA drivers on Ubuntu Server/Desktop LTS
Last synced: 07 May 2025
https://github.com/pothosware/pothosgpu
Pothos toolkit for ArrayFire API support
arrayfire cuda dataflow dataflow-programming gpu opencl pothos
Last synced: 19 Apr 2026
https://github.com/shikha-code36/cuda-programming-beginner-guide
A beginner's guide to CUDA programming
cuda cuda-basic cuda-basics cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-programming cuda-support cuda-toolkit
Last synced: 05 Jan 2026
https://github.com/trick-17/backends
Interchangeable backends in C++, OpenMP, CUDA, OpenCL, OpenACC
c-plus-plus cross-platform cuda cuda-backend header-only openacc openacc-backend opencl opencl-backend openmp openmp-backend
Last synced: 11 Apr 2026
https://github.com/l30nardosv/reproduce-parcosi-moleculardocking
Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"
autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research
Last synced: 16 Feb 2026
https://github.com/avitase/fast_frechet
Comparison of different (fast) discrete Fréchet distance implementations in C++ and CUDA.
benchmark cpp cuda frechet-distance simd
Last synced: 18 May 2026
https://github.com/prithivsakthiur/vlm-parsing
VLM-Parsing is a Gradio-based web application for parsing documents and images into structured HTML and Markdown formats using advanced Vision Language Models (VLMs).
cuda gradio html huggingface-models huggingface-spaces huggingface-transformers logics markdown ocr-recognition pytorch qwen2-5-vl spaces vlm
Last synced: 05 Apr 2026
https://github.com/kilamper/matrix-multiplication
AC - Matrix multiplication using OpenMP, MPI and CUDA
Last synced: 16 May 2026
https://github.com/mirzaim/cuda-devcontainer
CUDA Development Container
cuda devcontainer devcontainers docker remote-development
Last synced: 23 Apr 2025
https://github.com/projectcontinuum/continuum-feature-ai
AI and ML features for continuum
ai continuum continuum-feature cuda llm ml mlops pytourch unsloth
Last synced: 04 Apr 2026
https://github.com/artain-ai/ignite-ms
Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search
Last synced: 04 Jun 2026
https://github.com/szaghi/adam
Multi-physics AMR SDK and apps for High Performance Computing — from laptop to exascale device-accelerated superpc
amr cfd cuda fluid-dynamics fortran gas-dynamics hpc hydro-dynamics mpi openacc openmp plasma-dynamics
Last synced: 04 Apr 2026
https://github.com/matthias-fauconneau/combustion
Reaction rates and transport properties
ast cantera chemistry code-generation combustion compute cranelift cuda cvode interpreter ir rates reaction spirv transport vulkan
Last synced: 04 Apr 2026
https://github.com/agalue/sherpa-voice-assistant
Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama
coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai
Last synced: 04 Apr 2026