CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/firaja/parallel-floydwarshall
Various parallel implementations of Floyd-Warshall algorithm
algorithms c cuda distributed-computing floyd-warshall gpu-computing mpi multiprocessing openmp parallel-computing parallel-programming
Last synced: 16 Apr 2026
https://github.com/redhat-na-ssa/gpu-workshop
Using GPUs on Red Hat Platforms
Last synced: 30 Jul 2025
https://github.com/definetlynotai/llm_data
Source code of many well-known repos in Python, gathered in this repo as plain localdocs for training code AI
c code-examples cpp cuda data data-dum jupyter-notebook llm llm-code llm-datasets programming-data programming-data-sets python3
Last synced: 08 Oct 2025
https://github.com/hiway-media/ffmpeg-nvenc-static
FFmpeg supports NVENC encoding
cuda ffmpeg ffmpeg-cuda ffmpeg-nvenc nvidia-gpu
Last synced: 11 Apr 2026
https://github.com/nolmoonen/cuda-sdf
CUDA-accelerated path traced Menger sponge using ray marching.
cuda menger path-tracer ray-marching sdf
Last synced: 12 Feb 2026
https://github.com/prince781/libgpublas
Drop-in GPU acceleration for linear algebra.
blas blas-kernels c cblas clblas cuda gpu gpu-acceleration hpc interposition linear-algebra nvidia opencl
Last synced: 29 Apr 2026
https://github.com/zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
cuda gpt inference-engine llama llm llm-serving pytorch
Last synced: 12 Aug 2025
https://github.com/yosh-matsuda/gpu-array
Maximum GPU performance with Modern C++ syntax. RAII and Range-based abstraction to GPU memory management and data layouts, enabling code safety and performance optimization with zero overhead.
cpp cpp20 cuda gpu header-only hip
Last synced: 08 Jun 2026
https://github.com/tristanpenman/cuda-examples
A collection of CUDA example code
Last synced: 10 Apr 2025
https://github.com/shmishtopher/cudnn-versions
A scoop bucket for installing NVIDIA cuDNN versions.
cuda cudnn scoop scoop-apps scoop-bucket
Last synced: 01 May 2026
https://github.com/sanastasiou/dictation-service
GPU-accelerated speech-to-text service that types what you say, powered by OpenAI's Whisper AI
accessibility cuda dictation gpu-acceleration linux openai-whisper productivity python pytorch speech-recognition speech-to-text transcription voice-to-text voice-typing whisper
Last synced: 08 Apr 2026
https://github.com/yunzhu-li/recognizer
An object recognizer mobile app based on deep convolutional neural networks
cnn cuda cudnn gpu ios python swift tensorflow
Last synced: 20 Apr 2026
https://github.com/shalithasuranga/cudaperformance
Compare the performance of matrix multiplication among GPU shared memory, GPU global memory and CPU
cuda cuda-demo matrix-multiplication nvidia
Last synced: 21 Jan 2026
https://github.com/3zrv/raytracerincpp
A ray tracer that renders in 16-color VGA palette at 640x480 resolution.
Last synced: 18 May 2026
https://github.com/lzyrapx/llm-grandmaster-notes
🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.
cuda deep-learning llm reinforcement-learning
Last synced: 14 May 2025
https://github.com/amypad/numcu
Numerical CUDA-based Python library
array buffer c cpp cpython cpython-api cpython-extensions cuda cxx hacktoberfest numpy python vector
Last synced: 29 Jun 2025
https://github.com/lmlsna/install-scripts
Ubuntu install scripts
cuda do-release-upgrade eol nvidia tailscale ubuntu
Last synced: 18 Jul 2025
https://github.com/qin-yu/julia-svm-gpu-cuda
2019 [Julia] GPU CUDAnative SVM: a stochastic decomposition implementation of support-vector machine training
cpp cuda cuda-programming gpu gpu-computing gpu-programming julia julia-language julia-package machine-learning machine-learning-algorithms machine-learning-library online-learning supervised-learning svm svm-classifier svm-learning svm-library svm-model svm-training
Last synced: 12 Apr 2026
https://github.com/kishore-narendran/eecs221-highperformancecomputing
Assignments from the graduate course EECS 221 - Introduction to HPC, which I took in the Spring Quarter of 2016 at the University of California, Irvine. The assignments use OpenMP, MPI, and CUDA.
Last synced: 17 May 2026
https://github.com/copperfr/blendervxkex
Windows 7 CUDA & OptiX support for Blender 4.x
blender cuda cycles-renderer optix vxkex windows-7
Last synced: 20 Jan 2026
https://github.com/pnocera/cembedd
Rust embeddings API serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled
Last synced: 12 Jan 2026
https://github.com/murrellgroup/conflux.jl
Single-node data parallelism in Julia with CUDA
cuda data-parallelism flux julia nccl
Last synced: 22 May 2026
https://github.com/yosh-matsuda/gpu-ptr
Cross-platform GPU smart pointer with C++20 range support
cpp cpp20 cuda gpu header-only hip
Last synced: 17 Jan 2026
https://github.com/scarfy-sysu/rtx5060-pytorch-cuda129
Run PyTorch with CUDA 12.9 on RTX 50 series (e.g. RTX 5060)
cuda deep-learning pytorch rtx5060
Last synced: 20 Jul 2025
https://github.com/dujonwalker/nixos-config-x86_64-cuda
This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.
cuda flatpak nix nixos nixos-configuration ollama
Last synced: 17 Jan 2026
https://github.com/boltzmannentropy/vllm-5090
vLLM-5090: Docker Container for RTX 5090 on WSL2/Windows
Last synced: 08 Oct 2025
https://github.com/peri044/cuda
GPU implementations of algorithms
cuda gauss-jordan parallel-programming
Last synced: 14 Jul 2025
https://github.com/toxy4ny/artaxerxes
Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the older Xerxes DDoS tool. Supports GPU+io_uring, DPDK, and eBPF/XDP with intelligent fallbacks. An educational tool for advanced cybersecurity labs
cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools security-tools stress-testing
Last synced: 08 Oct 2025
https://github.com/capelliexp/sc2-im-pf-pathfinding-thesis
Master of Science thesis project. Uses CUDA to run pathfinding data generation (IM+PF) on a system's GPU; the data is usable by multiple agents in the same environment.
ai cplusplus cuda gpgpu pathfinding starcraft2
Last synced: 15 May 2026
https://github.com/l30nardosv/reproduce-parcosi-moleculardocking
Reproducing paper: "Benchmarking the Performance of Irregular Computations in AutoDock-GPU Molecular Docking"
autodock-gpu cpu cuda gpu molecular-docking molecular-docking-scripts opencl paper reproducible-research
Last synced: 16 Feb 2026
https://github.com/terrylindev/image-to-ASCII
🖼️ A command-line tool for converting images to ASCII art
ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal
Last synced: 12 Jul 2025
https://github.com/brocbyte/realtime-deformations
Snow simulation (Material Point Method)
cuda glm material-point-method opengl
Last synced: 10 Aug 2025
https://github.com/hadv/vaneth
GPU-accelerated CREATE2 vanity address miner for Ethereum
create2-contract-deployment cuda ethereum gpu gpu-acceleration gpu-programming open-cl vanity-address
Last synced: 21 Jan 2026
https://github.com/kim-hwiwon/T-espresso
A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data
Last synced: 10 Apr 2025
https://github.com/lintenn/cudaaddvectors-explicit-vs-unified-memory
Performance comparison of two different forms of memory management in CUDA
c cuda explicit memory memory-management performance unified-memory
Last synced: 17 May 2026
https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support
A detailed step-by-step guide to installing Tensorflow-2.0-gpu with CUDA drivers on Ubuntu Server/Desktop LTS
Last synced: 07 May 2025
https://github.com/shreyansh26/mlsys-experiments
A collection of scripts for experimenting with and implementing MLSys-related ideas
cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton
Last synced: 30 Aug 2025
https://github.com/kilamper/matrix-multiplication
AC - Matrix multiplication using OpenMP, MPI and CUDA
Last synced: 16 May 2026
https://github.com/bdwhst/fluora
A CUDA PBR path tracer
cpp cuda pathtracing pbr rendering
Last synced: 13 Feb 2026
https://github.com/andreimoraru123/contextcollector
Mixed vision-language Attention Model that gets better by making mistakes
attention attention-mechanism coco-api computer-vision cuda cudnn image-captioning lstm mscoco-dataset multimodal-deep-learning natural-language-processing object-detection opencv pytorch resnet show-and-tell show-attend-and-tell video-inference vision-language yolo
Last synced: 11 Apr 2026
https://github.com/shikha-code36/cuda-programming-beginner-guide
A beginner's guide to CUDA programming
cuda cuda-basic cuda-basics cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-programming cuda-support cuda-toolkit
Last synced: 05 Jan 2026
https://github.com/nekon69/fastnoiselitecuda
A wrapper around C++ FastNoiseLite library for CUDA
cellular-noise computer-graphics cpp cuda fastnoiselite gamedev generative-art gpgpu gpu header-only noise opensimplex2-noise pcg perlin-noise procedural-generation simplex-noise terrain-generation texture-generation worley-noise
Last synced: 02 Oct 2025
https://github.com/geekysuavo/gpufield
A CUDA-accelerated electromagnetostatics solver
cuda magnetic-fields magnetostatics
Last synced: 24 Dec 2025
https://github.com/trick-17/backends
Interchangeable backends in C++, OpenMP, CUDA, OpenCL, OpenACC
c-plus-plus cross-platform cuda cuda-backend header-only openacc openacc-backend opencl opencl-backend openmp openmp-backend
Last synced: 11 Apr 2026
https://github.com/amruthapatil/nyu-cudamatrixoperations
Optimizing CUDA programs for vector addition and matrix multiplication
cuda high-performance-computing
Last synced: 21 May 2026
https://github.com/tawssie/zmpy3d_pt
Python implementation of 3D Zernike moments with PyTorch
3d-zernike cuda gpu protein-structure python pytorch structural-bioinformatics superposition zernike-moments
Last synced: 24 Oct 2025
https://github.com/trahay/mpi-wattmeter
MPI-Wattmeter measures the power consumption of MPI programs
carbon-emissions cuda energy-consumption energy-monitor gpu hpc mpi
Last synced: 17 May 2026
https://github.com/infotrend-inc/ctpo-demo_projects
Jupyter Notebook examples using CTPO as their source container.
cuda opencv pytroch tensorflow2
Last synced: 14 Apr 2026
https://github.com/kagof/julia-image-processing
Image processing programs written in Julia
Last synced: 18 May 2026
https://github.com/lukasboettcher/msc-code
This is the repo for my master's thesis on GPU-accelerated Andersen analysis.
andersen-analysis clang cuda llvm static-analysis
Last synced: 16 Jan 2026
https://github.com/betarixm/cuecc
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cryptography ctypes cuda ecc postech secp256k1
Last synced: 12 May 2025
https://github.com/trilliwon/cuda-examples
CUDA examples
cuda gpu-computing nvidia-cuda parallel parallel-computing parallel-programming
Last synced: 25 Mar 2025
https://github.com/mazharuddin-mohammed/semidgfem
High-performance TCAD Simulator Using Discontinuous Galerkin FEM
cuda discontinuous-galerkin-method tcad tcad-device-simulator
Last synced: 15 Jun 2025
https://github.com/muhac/jupyter-pytorch-docker
JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.
conda-environment cuda docker jupyterlab pytorch
Last synced: 01 Oct 2025
https://github.com/seungjaelim/cuda.tutorial
References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)
cuda gpu-programming nsight-compute nsight-systems
Last synced: 07 Feb 2026
https://github.com/elftausend/sliced
Array operations with automatic differentiation on CPU and GPU
autograd automatic-differentiation cuda custos matrix opencl
Last synced: 31 Jan 2026
https://github.com/gjbex/gpu-programming
Material for a training on portable GPU programming
cuda gpu kokkos openmp openmp-off stl thrust
Last synced: 08 Feb 2026
https://github.com/dpbm/qml-course
Mini-course on quantum machine learning
cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow
Last synced: 31 Jan 2026
https://github.com/frozenassassine/neuralnetwork-fromscratch
Neural Network from scratch in C# with CUDA support
ai classification csharp cuda gpu gpu-acceleration neural-network neural-networks nvidia
Last synced: 20 Feb 2026
https://github.com/acrlakshman/gradient-augmented-levelset-cuda
Implementation of Gradient Augmented Levelset method for CPU and GPU
Last synced: 17 Feb 2026
https://github.com/mr-technologies/imagefiltercpp
Example of custom image filter for MRTech IFF C++ SDK
camera cpp cuda demosaicing dng genicam gpu h264 h265 image-processing jetson json low-latency machine-vision mipi rest-api rtsp sdk tiff vulkan
Last synced: 26 Feb 2026
https://github.com/fattorib/thunderkittens-simple-gemm
Simple Tensorcore GEMM in ThunderKittens
Last synced: 09 Feb 2026
https://github.com/hanzhi713/bitonic-sort
In-place GPU sort with bitonic sort
bitonic-sort cuda gpu in-place sorting
Last synced: 09 Feb 2026
https://github.com/xkevio/cuda-raytracer
A simple ray tracer written with CUDA that saves its output to a .ppm file; a CPU version is included for reference.
Last synced: 25 Aug 2025
https://github.com/andreasholt/cusmc
A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata
Last synced: 11 Feb 2026
https://github.com/tthebc01/cudaconda3
Lightweight container environment with CUDA, Miniconda3, and JupyterLab.
cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application
Last synced: 11 Feb 2026
https://github.com/dark-art108/artistic-style-transfer-cnn
cnn-architecture colab-notebooks cuda pil vgg19
Last synced: 01 Mar 2025
https://github.com/dzimiks/cuda-matrix-multiplication
CUDA Matrix Multiplication
cuda matrix matrix-multiplication python
Last synced: 16 Apr 2026
https://github.com/yingding/applyllm
A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host
accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers
Last synced: 24 Aug 2025
https://github.com/andreabak/whispersubs
Generate subtitles for your video or audio files using the power of AI
ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper
Last synced: 15 Feb 2026
https://github.com/rogerallen/jmandelbrotr
Java CUDA Mandelbrot explorer
cuda cuda-opengl java jcuda joml lwjgl3 mandelbrot-viewer opengl
Last synced: 18 Apr 2026
https://github.com/alpinebuster/arkime-docker-compose
Deploy Arkime with GPU-accelerated Rust/Python parsers and custom plugins using Docker Compose.
arkime c cuda deep-neural-networks docker docker-compose llm machine-learning networking pcap pcapng python rust traffic-analysis
Last synced: 16 Apr 2026
https://github.com/lchsk/ney
A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs
cuda gpu linux parallel phi scientific xeon xeonphi
Last synced: 07 May 2026
https://github.com/zeloe/juce_cuda_convolution
GPU acceleration for efficient, high-quality audio processing.
audio audio-processing convolution cuda dsp juce
Last synced: 03 Mar 2026
https://github.com/mirzaim/cuda-devcontainer
CUDA Development Container
cuda devcontainer devcontainers docker remote-development
Last synced: 23 Apr 2025
https://github.com/digimortl/libguess
Patches that give Bitcoin Core the ability to mine with CUDA
bitcoin c-plus-plus cryptocurrency cuda
Last synced: 16 Apr 2026
https://github.com/orlandopalmeira/trabalho-cp-2023-2024
Repository for the practical assignment of the Parallel Computing (CP) course, Master's in Informatics Engineering (MEI/MIEI), Universidade do Minho (UMinho)
computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei
Last synced: 18 May 2026
https://github.com/alexjmercer/fractal-art
Generating Fractals in C++ using SFML. For the ultimate visual stimulation and in-depth code!
cmake cmakelists cpp20 cuda cuda-programming fractal-rendering graphics mandelbrot multithreading sfml2
Last synced: 05 Mar 2026
https://github.com/openspeedshop/cbtf-argonavis-gui
Baseline for the next-generation Open|SpeedShop Graphical User Interface (GUI). The primary focus of this GUI will be the processing and display of CUDA collector performance data. However, there will be refactoring phases to adapt the GUI to support the processing and display of any collector performance data.
cuda performance profiler profiling
Last synced: 18 Apr 2026
https://github.com/cfries/javagpuexperiments
Repository used to demo OpenCL, JOCL, JCuda.
Last synced: 25 Apr 2026
https://github.com/droduit/multiprocessor-architecture
Introduction to Multiprocessor Architecture @ EPFL
cuda multiprocessor multithreading openmp-parallelization
Last synced: 17 Apr 2026
https://github.com/matthewfeickert/cuda-tf-torch
An Ubuntu 18.04 NVIDIA Docker image with CUDA 10.1 and cuDNN 7, plus TensorFlow and PyTorch
cuda cuda-101 cudnn cudnn-v7 docker docker-image gpu nvidia-docker nvidia-gpu pytorch tensorflow torch
Last synced: 07 Jan 2026
https://github.com/xiongsp/pytorch-docker
Pure PyTorch Docker images. Supports almost all combinations of PyTorch, Python, Ubuntu, CentOS, and CUDA versions.
centos cuda docker docker-image python3 pytorch ubuntu
Last synced: 17 Apr 2026
https://github.com/babak2/optimizedsum
Optimized Parallel Sum program demonstrating CPU vs GPU performance
cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio
Last synced: 27 Mar 2025
https://github.com/agalue/sherpa-voice-assistant
Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama
coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai
Last synced: 04 Apr 2026
https://github.com/matthias-fauconneau/combustion
Reaction rates and transport properties
ast cantera chemistry code-generation combustion compute cranelift cuda cvode interpreter ir rates reaction spirv transport vulkan
Last synced: 04 Apr 2026
https://github.com/djenriquez/ewbf-cuda-miner
Run ewbf-miner for zcash
cuda docker mining nvidia nvidia-docker zcash zcl zclassic
Last synced: 17 May 2026
https://github.com/szaghi/adam
Multi-physics AMR SDK and apps for High Performance Computing — from laptop to exascale device-accelerated superpc
amr cfd cuda fluid-dynamics fortran gas-dynamics hpc hydro-dynamics mpi openacc openmp plasma-dynamics
Last synced: 04 Apr 2026
https://github.com/artain-ai/ignite-ms
Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search
Last synced: 04 Jun 2026
https://github.com/projectcontinuum/continuum-feature-ai
AI and ML features for continuum
ai continuum continuum-feature cuda llm ml mlops pytourch unsloth
Last synced: 04 Apr 2026
https://github.com/juntyr/necsim-rust
Spatially explicit biodiversity simulations using a parallel library written in Rust
biodiversity cuda mpi necsim rust simulation
Last synced: 22 Mar 2025