CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
https://github.com/tristanpenman/cuda-examples
A collection of CUDA example code
Last synced: 10 Apr 2025
https://github.com/usegalaxy-eu/ansible-cuda
Ansible role that installs the CUDA Toolkit on a Red Hat/CentOS system, as described in the NVIDIA CUDA Installation Guide.
Last synced: 17 Jan 2026
https://github.com/phrb/nvidia-workshop-autotuning
Resources for autotuning CUDA compiler parameters
autotuning compilers cuda gpu julia nodal nvcc
Last synced: 03 May 2026
https://github.com/lawmurray/gpu-gemm
CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.
cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing
Last synced: 01 Mar 2026
https://github.com/meetps/me-766
Assignment solutions for the course ME766 High Performance Scientific Computing.
cuda gpu-computing opencl openmp parallel-computing
Last synced: 18 May 2026
https://github.com/zhangge6/how-to-optimize-playground
High-performance computing (HPC) demos since I was a freshman.
Last synced: 15 May 2026
https://github.com/jedbrooke/cuda_bwt
CUDA-accelerated Burrows-Wheeler transform
bioinformatics burrows-wheeler-transform bwt compression cuda
Last synced: 19 May 2026
https://github.com/rocm/rocmds-cmake
This is a collection of CMake modules that are useful for all ROCm-DS projects. By sharing the code in a single place it makes rolling out CMake fixes easier.
amd cmake cuda hip radeon-instinct-mi-series rocm
Last synced: 10 Apr 2025
https://github.com/simmsb/p4haskell
P4 backend in Haskell
compiler cuda gpu p4 p4c p4language
Last synced: 13 May 2026
https://github.com/misha-kis/python-plane-ransac
Parallel RANSAC for plane detection for multiple point clouds using Python and CUDA
cuda numba plane-detection python ransac
Last synced: 13 May 2026
https://github.com/deftruth/ptx-isa-8.2-zh
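🎉 Continuously updated: CUDA 12.2 PTX-ISA-8.2 study notes, with partial Chinese translation, personal commentary, and inline assembly examples, explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; in progress.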
Last synced: 13 May 2026
https://github.com/marcoplaitano/counting-sort-cuda
Parallelized version of Counting Sort using CUDA
counting-sort cuda cuda-kernels cuda-programming gpu gpu-programming sort sorting sorting-algorithms
Last synced: 14 May 2026
https://github.com/lordmathis/cudanet
Convolutional Neural Network inference library running on CUDA
convolutional-neural-networks cpp cuda pytorch
Last synced: 08 May 2026
https://github.com/ran-2012/inversion
Solve geophysics problems using CUDA & TensorFlow
cpp cuda geophysics inversion-method python
Last synced: 11 May 2026
https://github.com/pd2871/high-performance-computing
This repo contains the logs of the High Performance Computing module's final assignment
blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api
Last synced: 10 May 2026
https://github.com/mrglaster/cuda-acfcalc
Calculation of the smallest ACF for signals of length N using CUDA technology.
acf c calculations cpp cuda google-colaboratory google-colaboratory-notebooks isu
Last synced: 06 May 2026
https://github.com/nachovizzo/saxpy_openacc_cpp
My way of thinking about OpenACC, C++, and Parallel computing in general
Last synced: 23 Jun 2026
https://github.com/tank3-tk3/parallel-processing-cuda
Parallel processing with CUDA C / C++
c cpp cuda parallel-computing parallel-programming
Last synced: 09 May 2026
https://github.com/tky823/bitlinear158compression
Compare compression models for inference by BitLinear158
Last synced: 12 Jun 2026
https://github.com/dereklstinson/nccl
Go wrapper for NCCL
cuda deep-learning go nccl parallel-computing
Last synced: 14 May 2026
https://github.com/kar-dim/watermarking-gpu
Code for my Diploma thesis at Information and Communication Systems Engineering (University of the Aegean, School of Engineering) with title "Efficient implementation of watermark and watermark detection algorithms for image and video using the graphics processing unit". Part 2 / GPU
arrayfire cpp cuda ffmpeg gpu image-processing opencl parallel-computing video-processing watermark-image watermarking
Last synced: 09 Apr 2025
https://github.com/openspeedshop/cbtf-argonavis-gui
Baseline for the next-generation Open|SpeedShop Graphical User Interface (GUI). The primary focus of this GUI will be the processing and display of CUDA collector performance data. However, there will be refactoring phases to adapt the GUI to support the processing and display of any collector performance data.
cuda performance profiler profiling
Last synced: 18 Apr 2026
https://github.com/trahay/mpi-wattmeter
MPI-Wattmeter measures the power consumption of MPI programs
carbon-emissions cuda energy-consumption energy-monitor gpu hpc mpi
Last synced: 17 May 2026
https://github.com/dhruvsrikanth/cudann
A distributed implementation of a deep learning framework in CUDA.
cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming
Last synced: 01 May 2026
https://github.com/cfries/javagpuexperiments
Repository used to demo OpenCL, JOCL, JCuda.
Last synced: 25 Apr 2026
https://github.com/qin-yu/julia-svm-gpu-cuda
2019 [Julia] GPU CUDAnative SVM: a stochastic decomposition implementation of support-vector machine training
cpp cuda cuda-programming gpu gpu-computing gpu-programming julia julia-language julia-package machine-learning machine-learning-algorithms machine-learning-library online-learning supervised-learning svm svm-classifier svm-learning svm-library svm-model svm-training
Last synced: 12 Apr 2026
https://github.com/mulx10/firefly
Enhancing object detection using thermal imaging for thin cross-section unidentifiable objects (e.g., cyclists, pedestrians).
autonomous-cars autonomous-navigation autonomous-vehicles c cuda object-detection thermal-camera yolov3
Last synced: 03 Sep 2025
https://github.com/kim-hwiwon/T-espresso
A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data
Last synced: 10 Apr 2025
https://github.com/droduit/multiprocessor-architecture
Introduction to Multiprocessor Architecture @ EPFL
cuda multiprocessor multithreading openmp-parallelization
Last synced: 17 Apr 2026
https://github.com/programmer-rd-ai/detectx
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 10 Jun 2025
https://github.com/copperfr/blendervxkex
Windows 7 CUDA & OptiX support for Blender 4.x
blender cuda cycles-renderer optix vxkex windows-7
Last synced: 20 Jan 2026
https://github.com/dark-art108/artistic-style-transfer-cnn
cnn-architecture colab-notebooks cuda pil vgg19
Last synced: 01 Mar 2025
https://github.com/bogdanminko/laperf
La Perf is a framework for AI performance benchmarking — covering LLMs, VLMs, embeddings, with power-metrics collection.
ai-benchmark ai-performance apple-silicon cuda lmstudio ml-benchmark mlx mps nvidia-gpu ollama open-source-benchmark
Last synced: 15 May 2026
https://github.com/superlinear-ai/scipy-notebook-gpu
jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT
cuda cudnn docker nccl scipy-notebook tensorflow tensorrt
Last synced: 01 May 2026
https://github.com/galaxies99/inception-cuda
CUDA Implementation of Inception
Last synced: 12 Apr 2025
https://github.com/xiongsp/pytorch-docker
Pure PyTorch Docker images. Supports almost all combinations of PyTorch, Python, Ubuntu, CentOS, and CUDA versions.
centos cuda docker docker-image python3 pytorch ubuntu
Last synced: 17 Apr 2026
https://github.com/dpbm/qml-course
Mini-course on quantum machine learning
cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow
Last synced: 31 Jan 2026
https://github.com/peri044/cuda
GPU implementations of algorithms
cuda gauss-jordan parallel-programming
Last synced: 14 Jul 2025
https://github.com/infotrend-inc/ctpo-demo_projects
Jupyter Notebook examples using CTPO as their source container.
cuda opencv pytroch tensorflow2
Last synced: 14 Apr 2026
https://github.com/toxy4ny/artaxerxes
Artaxerxes - Adaptive High-Performance Stress Tester v.1.0. A rebuild of the old Xerxes DDoS tool. Supports GPU+io_uring, DPDK, eBPF/XDP with intelligent fallbacks. Educational tool for advanced cybersecurity labs.
cuda cuda-programming cybersecurity cybersecurity-education cybersecurity-tools dpdk ebpf educational high-performance network-security network-security-tool penetration-testing penetration-testing-framework penetration-testing-tools security-tools stress-testing
Last synced: 08 Oct 2025
https://github.com/dujonwalker/nixos-config-x86_64-cuda
This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.
cuda flatpak nix nixos nixos-configuration ollama
Last synced: 17 Jan 2026
https://github.com/LKohlhepp/Ito-Monte-Carlo
MC-Simulation of the Ito-SDE (Krülls 1994)
astronomy astrophysics cuda gpu-acceleration monte-carlo physics-simulation simulation stochastic-differential-equations
Last synced: 10 Mar 2025
https://github.com/lchsk/ney
A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs
cuda gpu linux parallel phi scientific xeon xeonphi
Last synced: 07 May 2026
https://github.com/hadv/vaneth
GPU-accelerated CREATE2 vanity address miner for Ethereum
create2-contract-deployment cuda ethereum gpu gpu-acceleration gpu-programming open-cl vanity-address
Last synced: 21 Jan 2026
https://github.com/andreimoraru123/contextcollector
Mixed vision-language Attention Model that gets better by making mistakes
attention attention-mechanism coco-api computer-vision cuda cudnn image-captioning lstm mscoco-dataset multimodal-deep-learning natural-language-processing object-detection opencv pytorch resnet show-and-tell show-attend-and-tell video-inference vision-language yolo
Last synced: 11 Apr 2026
https://github.com/agalue/sherpa-voice-assistant
Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama
coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai
Last synced: 04 Apr 2026
https://github.com/yingding/applyllm
A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host
accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers
Last synced: 24 Aug 2025
https://github.com/tyler-hilbert/cuda-kmeans
K-Means in CUDA
cuda kmeans-clustering machine-learning nsight
Last synced: 30 Mar 2025
https://github.com/matthias-fauconneau/combustion
Reaction rates and transport properties
ast cantera chemistry code-generation combustion compute cranelift cuda cvode interpreter ir rates reaction spirv transport vulkan
Last synced: 04 Apr 2026
https://github.com/amypad/numcu
Numerical CUDA-based Python library
array buffer c cpp cpython cpython-api cpython-extensions cuda cxx hacktoberfest numpy python vector
Last synced: 29 Jun 2025
https://github.com/navdeep-g/dimreduce4gpu
Dimensionality reduction ("dimreduce") on GPUs ("4gpu")
cplusplus cuda dimensionality-reduction gpu linear-algebra pca python svd unsupervised-learning
Last synced: 14 Apr 2025
https://github.com/mu7annad0/100gpu
100 Days of CUDA: Optimizing My Life, One Kernel at a Time. 🔄🔥
Last synced: 08 Mar 2026
https://github.com/szaghi/adam
Multi-physics AMR SDK and apps for High Performance Computing — from laptop to exascale device-accelerated superpc
amr cfd cuda fluid-dynamics fortran gas-dynamics hpc hydro-dynamics mpi openacc openmp plasma-dynamics
Last synced: 04 Apr 2026
https://github.com/orlandopalmeira/trabalho-cp-2023-2024
Repository for the practical assignment of the Parallel Computing (CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - University of Minho (UMinho)
computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei
Last synced: 18 May 2026
https://github.com/ginkgo-project/cudaarchitectureselector
A CMake module simplifying the specification of CUDA architectures
Last synced: 05 Nov 2025
https://github.com/babak2/optimizedsum
Optimized Parallel Sum program demonstrating CPU vs GPU performance
cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio
Last synced: 27 Mar 2025
https://github.com/frozenassassine/neuralnetwork-fromscratch
Neural Network from scratch in C# with CUDA support
ai classification csharp cuda gpu gpu-acceleration neural-network neural-networks nvidia
Last synced: 20 Feb 2026
https://github.com/brocbyte/realtime-deformations
Snow simulation (Material Point Method)
cuda glm material-point-method opengl
Last synced: 10 Aug 2025
https://github.com/xkevio/cuda-raytracer
A simple ray tracer written with CUDA that saves its output in a .ppm file, CPU version included for reference.
Last synced: 25 Aug 2025
https://github.com/artain-ai/ignite-ms
Fast self-hosted embedding engine for search, RAG, and reindexing workloads on NVIDIA GPUs. Built in Rust + TensorRT for teams that care about scale, cost, and control.
batch-inference batch-processing cuda embeddings gpu high-performance huggingface machine-learning multi-gpu nlp rag rust self-hosted semantic-search tensorrt text-embeddings vector-search
Last synced: 04 Jun 2026
https://github.com/shikha-code36/cuda-programming-beginner-guide
A beginner's guide to CUDA programming
cuda cuda-basic cuda-basics cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-programming cuda-support cuda-toolkit
Last synced: 05 Jan 2026
https://github.com/matthewfeickert/cuda-tf-torch
An Ubuntu 18.04 NVIDIA Docker image with CUDA 10.1 and cuDNN 7, plus TensorFlow and PyTorch
cuda cuda-101 cudnn cudnn-v7 docker docker-image gpu nvidia-docker nvidia-gpu pytorch tensorflow torch
Last synced: 07 Jan 2026
https://github.com/projectcontinuum/continuum-feature-ai
AI and ML features for continuum
ai continuum continuum-feature cuda llm ml mlops pytourch unsloth
Last synced: 04 Apr 2026
https://github.com/mr-technologies/imagefiltercpp
Example of custom image filter for MRTech IFF C++ SDK
camera cpp cuda demosaicing dng genicam gpu h264 h265 image-processing jetson json low-latency machine-vision mipi rest-api rtsp sdk tiff vulkan
Last synced: 26 Feb 2026
https://github.com/betarixm/cuecc
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cryptography ctypes cuda ecc postech secp256k1
Last synced: 12 May 2025
https://github.com/dito97/gol
High-performance Computing (90535) final project at UniGe
Last synced: 02 May 2026
https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support
A detailed step-by-step guide to installing Tensorflow-2.0-gpu with CUDA drivers on Ubuntu Server/Desktop LTS
Last synced: 07 May 2025
https://github.com/podgorskiy/deeplearningserversetup
My notes on setting up a server for Deep-Learning
cuda deep-learning driver ethernet ipmi neural-network nfs notes nvidia nvidia-driver nvidia-gpu server sshfs ubuntu
Last synced: 22 Aug 2025
https://github.com/trilliwon/cuda-examples
CUDA examples
cuda gpu-computing nvidia-cuda parallel parallel-computing parallel-programming
Last synced: 25 Mar 2025
https://github.com/trick-17/backends
Interchangeable backends in C++, OpenMP, CUDA, OpenCL, OpenACC
c-plus-plus cross-platform cuda cuda-backend header-only openacc openacc-backend opencl opencl-backend openmp openmp-backend
Last synced: 11 Apr 2026
https://github.com/fattorib/thunderkittens-simple-gemm
Simple Tensor Core GEMM in ThunderKittens
Last synced: 09 Feb 2026
https://github.com/maliknaik16/parallel-computing
CUDA programming in C++ for high-performance computing on Nvidia GPUs, optimized for tasks like machine learning or image processing
cores cpp cuda gpu makefile matrix nvcc optimization
Last synced: 10 Jun 2025
https://github.com/aiday-mar/mpi-cuda-project
Using MPI and CUDA in order to accelerate the conjugate gradient algorithm execution in C++
c-plus-plus cuda gpu mpi university-project
Last synced: 02 May 2026
https://github.com/hanzhi713/bitonic-sort
In-place GPU sort with bitonic sort
bitonic-sort cuda gpu in-place sorting
Last synced: 09 Feb 2026
https://github.com/prithivsakthiur/vlm-parsing
VLM-Parsing is a Gradio-based web application for parsing documents and images into structured HTML and Markdown formats using advanced Vision Language Models (VLMs).
cuda gradio html huggingface-models huggingface-spaces huggingface-transformers logics markdown ocr-recognition pytorch qwen2-5-vl spaces vlm
Last synced: 05 Apr 2026
https://github.com/yosh-matsuda/gpu-ptr
Cross-platform GPU smart pointer with C++20 range support
cpp cpp20 cuda gpu header-only hip
Last synced: 17 Jan 2026
https://github.com/pnocera/cembedd
Rust embeddings API serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled
Last synced: 12 Jan 2026
https://github.com/pothosware/pothosgpu
Pothos toolkit for ArrayFire API support
arrayfire cuda dataflow dataflow-programming gpu opencl pothos
Last synced: 19 Apr 2026
https://github.com/pelayo-felgueroso/tensorflow-gpu-setup
Step-by-step guide to installing TensorFlow with GPU support on Conda.
artificial-intelligence cuda deep-learning gpu machine-learning nvidia nvidia-gpu setup-guide tensorflow
Last synced: 17 Feb 2026
https://github.com/juntyr/necsim-rust
Spatially explicit biodiversity simulations using a parallel library written in Rust
biodiversity cuda mpi necsim rust simulation
Last synced: 22 Mar 2025
https://github.com/andreasholt/cusmc
A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata
Last synced: 11 Feb 2026
https://github.com/tthebc01/cudaconda3
Lightweight container environment with CUDA, Miniconda3, and Jupyter Lab.
cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application
Last synced: 11 Feb 2026
https://github.com/lzyrapx/llm-grandmaster-notes
🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.
cuda deep-learning llm reinforcement-learning
Last synced: 14 May 2025
https://github.com/dzimiks/cuda-matrix-multiplication
CUDA Matrix Multiplication
cuda matrix matrix-multiplication python
Last synced: 16 Apr 2026
https://github.com/markdtw/parallel-programming
Basic Pthread, OpenMP, CUDA examples
cuda openmp parallel-programming pthreads
Last synced: 20 Apr 2026
https://github.com/lmlsna/install-scripts
Ubuntu install scripts
cuda do-release-upgrade eol nvidia tailscale ubuntu
Last synced: 18 Jul 2025
https://github.com/geekysuavo/gpufield
A CUDA-accelerated electromagnetostatics solver
cuda magnetic-fields magnetostatics
Last synced: 24 Dec 2025