CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-22 00:07:17 UTC
https://github.com/trahay/mpi-wattmeter
MPI-Wattmeter measures the power consumption of MPI programs
carbon-emissions cuda energy-consumption energy-monitor gpu hpc mpi
Last synced: 17 May 2026
https://github.com/infotrend-inc/ctpo-demo_projects
Jupyter Notebook examples using CTPO as their source container.
cuda opencv pytroch tensorflow2
Last synced: 14 Apr 2026
https://github.com/andreabak/whispersubs
Generate subtitles for your video or audio files using the power of AI
ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper
Last synced: 15 Feb 2026
https://github.com/hadv/vaneth
GPU-accelerated CREATE2 vanity address miner for Ethereum
create2-contract-deployment cuda ethereum gpu gpu-acceleration gpu-programming open-cl vanity-address
Last synced: 21 Jan 2026
https://github.com/kishore-narendran/eecs221-highperformancecomputing
Assignments completed for the graduate course EECS 221 - Introduction to HPC, which I took in the Spring Quarter of 2016 at the University of California, Irvine. The assignments use OpenMP, MPI and CUDA.
Last synced: 17 May 2026
https://github.com/andreimoraru123/contextcollector
Mixed vision-language Attention Model that gets better by making mistakes
attention attention-mechanism coco-api computer-vision cuda cudnn image-captioning lstm mscoco-dataset multimodal-deep-learning natural-language-processing object-detection opencv pytorch resnet show-and-tell show-attend-and-tell video-inference vision-language yolo
Last synced: 11 Apr 2026
https://github.com/markdtw/parallel-programming
Basic Pthread, OpenMP, CUDA examples
cuda openmp parallel-programming pthreads
Last synced: 20 Apr 2026
https://github.com/B1-663R/docker-mining
Dockerfiles to build docker images to start mining with an NVIDIA Docker architecture
cryptocurrency cuda docker-image docker-nvidia mining
Last synced: 28 Mar 2025
https://github.com/droduit/multiprocessor-architecture
Introduction to Multiprocessor Architecture @ EPFL
cuda multiprocessor multithreading openmp-parallelization
Last synced: 17 Apr 2026
https://github.com/pelayo-felgueroso/tensorflow-gpu-setup
Step-by-step guide to installing TensorFlow with GPU support on Conda.
artificial-intelligence cuda deep-learning gpu machine-learning nvidia nvidia-gpu setup-guide tensorflow
Last synced: 17 Feb 2026
https://github.com/bogdanminko/laperf
La Perf is a framework for AI performance benchmarking — covering LLMs, VLMs, embeddings, with power-metrics collection.
ai-benchmark ai-performance apple-silicon cuda lmstudio ml-benchmark mlx mps nvidia-gpu ollama open-source-benchmark
Last synced: 15 May 2026
https://github.com/superlinear-ai/scipy-notebook-gpu
jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT
cuda cudnn docker nccl scipy-notebook tensorflow tensorrt
Last synced: 01 May 2026
https://github.com/peri044/cuda
GPU implementations of algorithms
cuda gauss-jordan parallel-programming
Last synced: 14 Jul 2025
https://github.com/maliknaik16/parallel-computing
CUDA programming in C++ for high-performance computing on Nvidia GPUs, optimized for tasks such as machine learning or image processing
cores cpp cuda gpu makefile matrix nvcc optimization
Last synced: 10 Jun 2025
https://github.com/programmer-rd-ai/detectx
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 10 Jun 2025
https://github.com/yosh-matsuda/gpu-ptr
Cross-platform GPU smart pointer with C++20 range support
cpp cpp20 cuda gpu header-only hip
Last synced: 17 Jan 2026
https://github.com/xiongsp/pytorch-docker
Pure PyTorch Docker images. Supports almost all combinations of PyTorch, Python, Ubuntu, CentOS, and CUDA versions.
centos cuda docker docker-image python3 pytorch ubuntu
Last synced: 17 Apr 2026
https://github.com/bdwhst/fluora
A CUDA PBR path tracer
cpp cuda pathtracing pbr rendering
Last synced: 13 Feb 2026
https://github.com/shreyansh26/mlsys-experiments
A collection of scripts for experimenting with and implementing MLSys-related ideas
cuda cuda-kernel gpu gpu-programming llm-inference profiling pytorch triton
Last synced: 30 Aug 2025
https://github.com/tvanfossen/entropic
Local-first agentic inference engine in C/C++. Multi-tier model routing, grammar-constrained output, MCP tool servers. Embeddable via C ABI.
agentic-ai agentic-framework cpp cpp20 cuda edge-ai embedded-ai gbnf gguf grammar-constrained-decoding inference-engine llama-cpp llm local-llm mcp on-device-ai privacy-first tool-calling
Last synced: 30 May 2026
https://github.com/frozenassassine/neuralnetwork-fromscratch
Neural Network from scratch in C# with CUDA support
ai classification csharp cuda gpu gpu-acceleration neural-network neural-networks nvidia
Last synced: 20 Feb 2026
https://github.com/kilamper/matrix-multiplication
AC - Matrix multiplication using OpenMP, MPI and CUDA
Last synced: 16 May 2026
https://github.com/lmlsna/install-scripts
Ubuntu install scripts
cuda do-release-upgrade eol nvidia tailscale ubuntu
Last synced: 18 Jul 2025
https://github.com/trilliwon/cuda-examples
CUDA examples
cuda gpu-computing nvidia-cuda parallel parallel-computing parallel-programming
Last synced: 25 Mar 2025
https://github.com/xkevio/cuda-raytracer
A simple ray tracer written with CUDA that saves its output in a .ppm file, CPU version included for reference.
Last synced: 25 Aug 2025
https://github.com/shikha-code36/cuda-programming-beginner-guide
A beginner's guide to CUDA programming
cuda cuda-basic cuda-basics cuda-cpp cuda-demo cuda-kernel cuda-kernels cuda-library cuda-programming cuda-support cuda-toolkit
Last synced: 05 Jan 2026
https://github.com/orlandopalmeira/trabalho-cp-2023-2024
Repository for the practical assignment of the Parallel Computing (CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - University of Minho (UMinho)
computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei
Last synced: 18 May 2026
https://github.com/debowin/gpu-parallel-recommender-system
GPGPU Parallel User-User Collaborative Filtering System in CUDA C
collaborative-filtering cuda gpu-programming movielens-dataset recommender-system
Last synced: 24 Apr 2026
https://github.com/tank3-tk3/pi-calculation-cpu-gpu
PI calculation with CPU and GPU
c cpp cuda parallel-computing pi
Last synced: 13 Apr 2026
https://github.com/trick-17/backends
Interchangeable backends in C++, OpenMP, CUDA, OpenCL, OpenACC
c-plus-plus cross-platform cuda cuda-backend header-only openacc openacc-backend opencl opencl-backend openmp openmp-backend
Last synced: 11 Apr 2026
https://github.com/rogerallen/jmandelbrotr
Java CUDA Mandelbrot explorer
cuda cuda-opengl java jcuda joml lwjgl3 mandelbrot-viewer opengl
Last synced: 18 Apr 2026
https://github.com/babak2/optimizedsum
Optimized Parallel Sum program demonstrating CPU vs GPU performance
cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio
Last synced: 27 Mar 2025
https://github.com/alpinebuster/arkime-docker-compose
Deploy Arkime with GPU-accelerated Rust/Python parsers and custom plugins using Docker Compose.
arkime c cuda deep-neural-networks docker docker-compose llm machine-learning networking pcap pcapng python rust traffic-analysis
Last synced: 16 Apr 2026
https://github.com/lchsk/ney
A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs
cuda gpu linux parallel phi scientific xeon xeonphi
Last synced: 07 May 2026
https://github.com/szymon423/tsp-cpu-vs-gpu
A simple brute-force approach to solving the travelling salesman problem on CPU and GPU
Last synced: 11 Mar 2025
https://github.com/dark-art108/artistic-style-transfer-cnn
cnn-architecture colab-notebooks cuda pil vgg19
Last synced: 01 Mar 2025
https://github.com/elftausend/sliced
Array operations with automatic differentiation on CPU and GPU
autograd automatic-differentiation cuda custos matrix opencl
Last synced: 31 Jan 2026
https://github.com/openspeedshop/cbtf-argonavis-gui
Baseline for the next-generation Open|SpeedShop graphical user interface (GUI). The primary focus of this GUI will be the processing and display of CUDA collector performance data. However, there will be refactoring phases to adapt the GUI to support the processing and display of any collector performance data.
cuda performance profiler profiling
Last synced: 18 Apr 2026
https://github.com/kar-dim/watermarking-gpu
Code for my Diploma thesis at Information and Communication Systems Engineering (University of the Aegean, School of Engineering), titled "Efficient implementation of watermark and watermark detection algorithms for image and video using the graphics processing unit". Part 2 / GPU
arrayfire cpp cuda ffmpeg gpu image-processing opencl parallel-computing video-processing watermark-image watermarking
Last synced: 09 Apr 2025
https://github.com/yingding/applyllm
A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host
accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers
Last synced: 24 Aug 2025
https://github.com/teodutu/asc
Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020
cache-optimization cuda parallel-programming profiling python-threading
Last synced: 24 Apr 2026
https://github.com/dujonwalker/nixos-config-x86_64-cuda
This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.
cuda flatpak nix nixos nixos-configuration ollama
Last synced: 17 Jan 2026
https://github.com/lzyrapx/llm-grandmaster-notes
🎓The path to LLM mastery is paved with broken embeddings and resurrected gradients.
cuda deep-learning llm reinforcement-learning
Last synced: 14 May 2025
https://github.com/csvancea/gpu-hashtable
GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.
Last synced: 24 Apr 2026
https://github.com/agalue/sherpa-voice-assistant
Local AI-based voice assistant implemented using Sherpa, Whisper, Kokoro, and Ollama
coreml cuda golang kokoro-tts linux macos ollama onnx-runtime rust sherpa whisper-ai
Last synced: 04 Apr 2026
https://github.com/pnocera/cembedd
Embeddings API in Rust serving intfloat/multilingual-e5-large using huggingface/candle with CUDA enabled
Last synced: 12 Jan 2026
https://github.com/mrglaster/cuda-acfcalc
Calculation of the smallest ACF for signals of length N using CUDA technology.
acf c calculations cpp cuda google-colaboratory google-colaboratory-notebooks isu
Last synced: 06 May 2026
https://github.com/tky823/bitlinear158compression
Comparison of compression models for inference with BitLinear158
Last synced: 12 Jun 2026
https://github.com/dereklstinson/nccl
Go wrapper for NCCL
cuda deep-learning go nccl parallel-computing
Last synced: 14 May 2026
https://github.com/pd2871/high-performance-computing
This repo contains the logs of the High Performance Computing module's final assignment
blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api
Last synced: 10 May 2026
https://github.com/tank3-tk3/parallel-processing-cuda
Parallel processing with CUDA C / C++
c cpp cuda parallel-computing parallel-programming
Last synced: 09 May 2026
https://github.com/ezamagni/knapsack-simd
A genetic 01-Knapsack problem solver in CUDA
cuda knapsack-problem knapsack01
Last synced: 09 May 2026
https://github.com/dhruvsrikanth/fastconv
Distributed and serial implementations of the 2D convolution operation in C++ and CUDA.
convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming
Last synced: 04 May 2026
https://github.com/abhans/archdev
Container built on Arch Linux with NVIDIA driver and CUDA support, with PyTorch and TensorFlow built in.
archlinux container cuda docker
Last synced: 07 May 2026
https://github.com/kibotu/llm-windows-server
Turn your Windows GPU into a private, low-latency LLM server. Docker-based, OpenAI-compatible API.
agentic cuda docker gguf llma-cpp local-llm nvidia-gpu openai-api opencode qwen self-hosted windows
Last synced: 10 Jun 2026
https://github.com/poodarchu/vision-lab
Computer vision experiments, all in one place.
computer-vision cuda object-detection
Last synced: 07 May 2026
https://github.com/daaboulex/unsloth-nix
Unsloth (git main) packaged for NixOS — CPU/CUDA/ROCm LoRA fine-tuning envs
cuda fine-tuning flake lora machine-learning nix nixos nixos-module pytorch rocm unsloth
Last synced: 10 Jun 2026
https://github.com/sun-zhenxing/fast-neural-style
Fast neural style transfer deployment
cuda cv2 fast-neural-style opencv
Last synced: 05 May 2026
https://github.com/xebastex/sfw-python
Python package designed to provide the essential tools for off-the-grid inverse problems. This is the bedrock for a future GUI implementation.
blasso cuda frank-wolfe pytorch
Last synced: 09 May 2026
https://github.com/speedcell4/torchdevice
Set up CUDA_VISIBLE_DEVICES
cuda deep-learning gpu machine-learning pytorch
Last synced: 07 May 2026
https://github.com/alextmjugador/rust-cuda-quickstart
Bring the Rust-CUDA project back to life under modern Linux environments.
cuda cuda-programming cuda-rust cuda-support docker rust
Last synced: 06 May 2026
https://github.com/uefi-code/msra_thepracticespaceproject_pytorchcuda
My repo for the MSRA Practice Space Project 2022: CUDA implementation and optimization
Last synced: 06 May 2026
https://github.com/garciparedes/cuda-examples
CUDA examples I developed to learn GPU-based HPC
c c-plus-plus cuda examples gpgpu gpu hpc
Last synced: 09 May 2026
https://github.com/igorcosta/deep-docker
Docker image for Deep Learning on AWS Cloud
cuda deep-learning docker docker-image tensorflow
Last synced: 05 May 2026
https://github.com/seralexeev/rabbit0
Robot Rabbit
cuda jetson nvidia robotics ros2 zed-camera
Last synced: 15 Jun 2026
https://github.com/seieric/gst-dsobjectsmask
📀NVIDIA DeepStream-integrated GStreamer plugin. Masks objects with CUDA cores on Jetson boards. Fast and smooth since everything is done in NVMM.🏎
cuda cuda-programming deepstream gpu gstreamer gstreamer-plugins instance-segmentation jetson-agx-orin jetson-agx-xavier jetson-tx1 jetson-tx2 jetson-xavier maskrcnn nvidia-jetson nvidia-jetson-nano opencv opencv4 resnet resnet50
Last synced: 06 May 2026
https://github.com/poyea/lollipop
🍭 Sweet GPU compute kernels in CUDA, wrapped via CuPy
cuda cuda-kernel cuda-kernels cuda-programming gpu-kernels gpu-programming python
Last synced: 17 Jun 2026
https://github.com/abdulfatir/subkmeans
NumPy and PyCUDA implementation of subKmeans
clustering cuda kdd kmeans numpy pycuda python subspace-clustering
Last synced: 09 May 2026
https://github.com/brosnanyuen/raybnn_dataloader
Data Loader for RayBNN
arrayfire cpu csv csv-parser cuda data-structures gpu-computing oneapi opencl parallel parallel-computing rust
Last synced: 07 May 2026
https://github.com/manishklach/gpu-resident-inference-lab
Research lab for GPU-resident LLM inference loops: persistent kernels, sparse KV selection, tiered residency, speculative decode, and trace-driven scheduling.
cuda gpu-systems kv-cache llm-inference mega-kernel model-systems persistent-kernel runtime speculative-decoding
Last synced: 19 Jun 2026
https://github.com/jayemscript/llm-systems-from-scratch
A hands-on learning project for building the core systems behind Large Language Models using C++, Rust, and optional Python/JavaScript bindings. Includes tensor operations, autograd, neural networks, tokenization, and a minimal transformer pipeline.
ai-systems autograd c-language cpp cuda educational-project high-performance-computing inference-engine machine-learning neural-networks-from-scratch pybind11 tensor-library tokenization transformers wasm
Last synced: 19 Jun 2026
https://github.com/pharmcat/metidacu.jl
CUDA solver for Metida.jl
cuda julia-language metida mixed-models
Last synced: 27 Apr 2026
https://github.com/codingrule/cuda-mbrot
Just another Mandelbrot with CUDA
cuda cuda-toolkit cupy fractal mandelbrot mathematics nvidia
Last synced: 27 Apr 2026
https://github.com/axel-ex/seame-ads-autonomous-lane-detection-24-25
🚗 Real-time lane detection and autonomous steering for JetRacer, powered by ROS2 and GPU-accelerated CV on Jetson Nano.
cuda jetson-nano ros2 tensorrt
Last synced: 27 Apr 2026
https://github.com/andrewboessen/bitonic-merge-sort
Bitonic Merge Sort algorithm optimized for GPU execution
bitonic-merge-sort cuda sorting-network
Last synced: 16 May 2026
https://github.com/bl33h/productoftwovectors
This code uses CUDA for parallel vector multiplication on a GPU, demonstrating the GPU's acceleration capabilities.
cuda gpu kernel paralelism parallel-programming product vector
Last synced: 16 May 2026
https://github.com/ehsanmok/cs-521
UBC CS 521: Parallel Computing and Architectures
cuda erlang parallel-algorithm parallel-computing
Last synced: 16 May 2026
https://github.com/maelstrom6/mandelpy
A Mandelbrot and Buddhabrot viewer with GPU acceleration
buddhabrot cuda gpu mandelbrot python3
Last synced: 27 Apr 2026
https://github.com/pkestene/mandelbrot_kokkos
cuda gpu gpu-computing kokkos mandelbrot openmp performance-portability
Last synced: 27 Apr 2026
https://github.com/xusworld/tars
Tars is a cool deep learning framework.
avx2 avx512 cuda deep-learning
Last synced: 27 Apr 2026
https://github.com/thunder-compute/thunder-compute-documentation
Documentation for Thunder Compute, a cloud platform creating technology to virtualize GPUs over TCP
ai artificial-intelligence cloud cloud-computing cuda gpu llm machine-learning nvidia pytorch tensorflow thunder-compute virtualization
Last synced: 15 Oct 2025
https://github.com/abhinavsharma07/streamlit
Stable Diffusion
clip cuda denoising diffusers generative-models latent-diffusion latent-space lms-scheduler unet
Last synced: 28 Apr 2026
https://github.com/ashwani-rathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 11 Jul 2025
https://github.com/sunsided/rust-arrayfire-experiments
Toying around with ArrayFire in Rust
arrayfire conways-game-of-life cuda gpgpu gpu-acceleration gpu-computing opencl rust
Last synced: 28 Apr 2026
https://github.com/pvdberg1998/cufft_rust
A safe Rust wrapper around a subset of cuFFT.
Last synced: 19 Apr 2025
https://github.com/dolongbien/cuda
CUDA and Caffe/Caffe2 installation on Ubuntu 16.04
c3d-intel-caffe caffe caffe2 cuda cudnn deep-learning ubuntu
Last synced: 28 Apr 2026