CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/ubermorgott/morgottalk
Cross-platform desktop push-to-talk voice transcription. Single binary. GPU accelerated (CUDA/Vulkan/Metal/ROCm/OpenCL). Powered by whisper.cpp.
cuda desktop go gpu speech-to-text svelte transcription voice wails whisper
Last synced: 07 Apr 2026
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-python
Explore how to use Numba—the just-in-time, type-specializing Python function compiler—to create and launch CUDA kernels to accelerate Python programs on massively parallel NVIDIA GPUs.
accelerated-computing cuda cuda-programming jit numba nvidia python
Last synced: 01 May 2026
https://github.com/programmergnome/kutyai
This is a Python graphical application for dog breed recognition, with 420 breeds and 42,000 images.
cuda deep-learning image-classification python3 qt5-gui tensorflow transfer-learning
Last synced: 11 May 2026
https://github.com/drilonaliu/parallel-permuation-cipher-attack
attack cryptography cuda gpu parallel-computing
Last synced: 21 Mar 2025
https://github.com/drilonaliu/bachelor-thesis
Parallel Programming Fractals
cuda fractals gpu parallel-programming
Last synced: 15 May 2026
https://github.com/drilonaliu/parallel-permutation-cipher
cryptography cuda gpu parallel-programming permutation
Last synced: 19 Jul 2025
https://github.com/phantom7knight/cuda-fusion
This project is for learning CUDA to understand the GPU work better.
cuda cuda-programming gpgpu gpu
Last synced: 17 May 2026
https://github.com/aaditya29/parallel-computing-and-cuda
Learning about Parallel Computing and GPU programming using CUDA.
c cpp cuda cuda-kernels cuda-programming nvidia-cuda openmp openmpi parallel-computing parallel-programming
Last synced: 18 Jul 2025
https://github.com/chensongpoixs/cmedia_transcode
Media service transcoding build, GPU (CUDA) version; supports H264 and H265 transcoding
cuda gpu h264 h265 media transcode-media
Last synced: 19 May 2026
https://github.com/iebeid/cuda-particles
A simple visualization of particles calculated using CUDA
Last synced: 17 Apr 2026
https://github.com/drilonaliu/parallel-image-edge-detection
cuda edge-detection gpu image-processing
Last synced: 17 May 2026
https://github.com/kratugautam99/logiclink-project
LogicLink is a conversational AI chatbot developed by Kratu Gautam (AIML Engineer). Powered by the TinyLlama-1.1B-Chat-v1.0 model, it provides an interactive interface for engaging conversations, query resolution, and task assistance. Version 5 features streaming responses, conversation management, and a sleek GUI.
antd-design chatbot-application conversational-ai cuda gradio graphical-user-interface huggingface-spaces huggingface-transformers jupyter-notebooks keras large-language-models mlops model-service-controller modelscope-studio natural-language-generation natural-language-processing pytorch reasoning-agent tensorflow
Last synced: 07 Apr 2026
https://github.com/andresvalle/ocr-extraction
Text extraction from images using EasyOCR and parallelization with PyTorch
Last synced: 01 May 2026
https://github.com/antoniakras/semantic-video-search
GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.
bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai
Last synced: 03 May 2026
https://github.com/jonmarty/pycuda-kmeans
A parallelized PyCuda implementation of the KMeans clustering algorithm.
Last synced: 25 Apr 2026
https://github.com/mcp-tool-shop-org/gpu-container
Model-aware inference memory-placement planner for single-GPU rigs: profile hardware + model, generate explicit VRAM/RAM/NVMe placement plans across runtimes (llama.cpp/vLLM/...), and prove them with a measured receipt. Not VRAM overflow - declared placement.
cuda gpu inference llama-cpp llm moe offload vram wsl2
Last synced: 09 Jun 2026
https://github.com/jdibenes/game_of_life_cuda
OpenGL / CUDA implementation of Conway's Game of Life.
cpp cuda opengl qt6 simulation
Last synced: 02 Apr 2026
https://github.com/kar-dim/cas-2d
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA/OpenCL, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 22 Jun 2025
https://github.com/chrisdalvit/gpu-matrix-transpose
Implementation and benchmarking of different matrix transpose kernels with CUDA
c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu
Last synced: 17 Apr 2026
https://github.com/marius311/cudadistributedtools.jl
A set of utility tools for multi-GPU + multi-process workflows
Last synced: 01 May 2026
https://github.com/leo27945875/pybind11_cuda_matmul
cpp cuda matrix-multiplication pybind11 python3
Last synced: 17 Apr 2026
https://github.com/kanchishimono/python-images
Ubuntu based Python container images, including CUDA images
container-image cuda docker dockerfile machine-learning python python3
Last synced: 30 Apr 2026
https://github.com/loreloc/triturus
A bunch of triton kernels with increasing complexity for learning and exploring triton and GPU programming
Last synced: 17 Apr 2026
https://github.com/stckvrflw/pem-spgemm
pemSpGEMM - An Improved SpGEMM Algorithm
Last synced: 17 Apr 2026
https://github.com/void4main/bifurcation-diagram
These little Python scripts plot a bifurcation diagram to a PNG file (they work fine on a Raspberry Pi and are accelerated on an NVIDIA Jetson Nano), but there is still a lot of room for improvement ...
bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize
Last synced: 17 Apr 2026
https://github.com/bjornmelin/ml-production-engineering
⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯
cuda deployment docker fastapi gpu-computing kubernetes mlops production
Last synced: 17 Apr 2026
https://github.com/bjornmelin/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers
Last synced: 17 Apr 2026
https://github.com/rkarahul/person-detector-faceverifier
Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.
bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8
Last synced: 07 Apr 2026
https://github.com/f14-bertolotti/torchess
CUDA Torch extension for a chess engine
Last synced: 01 May 2026
https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04
Installation of CUDA-11.8 with cuDNN-8.7 for ubuntu(20.04) server A30 GPU, and onnx gpu installation guide
cuda gpu linux onnxruntime server
Last synced: 16 May 2026
https://github.com/vibesmiths/mcp-rvc
GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)
cuda docker gpu rocm rvc tts voice-cloning
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-musicgen
GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)
audiocraft cuda docker gpu musicgen python rocm text-to-music
Last synced: 17 Apr 2026
https://github.com/briiqn/obj2schem
A CUDA enabled .obj model to schematic (Sponge V3) converter
cuda minecraft schematics wavefront-obj worldedit
Last synced: 17 Apr 2026
https://github.com/cs550-epfl/report
EPFL CS-550 project report
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 03 Jun 2026
https://github.com/flosmume/cpp-cuda-deepvision-rtx-starter
CUDA C++ practice project for RTX 4070 SUPER — explore GPU concurrency, pinned memory, and Nsight profiling. Includes SAXPY and 2D blur kernels to train optimization, stream overlap, and timing analysis for NVIDIA Developer Technology Engineering skillset.
cpp cuda cuda-kernels cuda-streams deep-learning-inference gpu gpu-optimization gpu-profiling high-performance-computing nsight nvidia parrallel-computing pinned-memory
Last synced: 16 May 2026
https://github.com/ahmadrafidev/learn-cuda
A place where I learn about CUDA
cuda cuda-programming gpu os parallel-programming
Last synced: 13 Apr 2025
https://github.com/aeyage/intraday_prices
GPU-accelerated portfolio optimisation
Last synced: 05 Apr 2025
https://github.com/illagrenan/cuda-90-cudnn7-runtime-1604-py36
Ubuntu 16.04 with Python 3.6 and CUDA9 Dockerfile
Last synced: 03 May 2026
https://github.com/imanghd/parallelprocessing
CE Algorithms Lab @ SUT
cuda openmp parallel-algorithm parallel-processing systolic
Last synced: 01 May 2026
https://github.com/jadc/cuda-raytracer
A simple path tracer written in CUDA.
cpp cuda gpu-programming graphics parallel-programming path-tracing raytracing
Last synced: 16 May 2026
https://github.com/synapticore-io/torch-cuda
PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.
cuda gpu project-template python pytorch
Last synced: 04 Apr 2026
https://github.com/seieric/pytorch-mpi-singularity
Singularity Container including PyTorch with CUDA and mpi backend for DistributedDataParallel
cuda hpc nvidia openmpi pytorch singularity utokyo
Last synced: 18 Apr 2026
https://github.com/psteinb/gtc2017
Slides for my presentation at GTC 2017 from May 8-11 in Silicon Valley
compression cuda ffmpeg gpu gpu-computing h264 h265 microscopes spim
Last synced: 03 May 2026
https://github.com/thalesmg/haskell-accelerate-parconc
Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell
accelerate cuda gpu-computing haskell parallel-computing
Last synced: 18 Apr 2026
https://github.com/qanastek/concurency-tetravex
This software is a fast and reliable tetravex solver based on C++ and CUDA.
c-plus-plus cuda parrallel-computing tetravex
Last synced: 18 Apr 2026
https://github.com/abdelrahman-amen/active_learning_in_nlp
I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.
activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty
Last synced: 18 Apr 2026
https://github.com/betarixm/csed490c
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cuda gpu parallel-computing postech
Last synced: 19 Apr 2026
https://github.com/aledinola/ifp_cuda_mex
Solve the income fluctuation problem on the GPU
Last synced: 14 May 2026
https://github.com/flavienbwk/nvidia-cuda-mirror-docker
An all-in-one mirror for installing NVIDIA Docker.
cuda docker linux-mirror mirror nvidia nvidia-docker nvidia-docker2 offline offline-capable
Last synced: 18 Apr 2026
https://github.com/cooliron2311/cumd5bf
CUDA based md5 password bruteforcer
Last synced: 18 Apr 2026
https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker
NVIDIA CUDA base image on Ubuntu Linux, used to run machine learning
ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu
Last synced: 18 Apr 2026
https://github.com/toshikinakamura0412/dockerfiles
Development environment using Docker for some Linux distributions
alpine bash cuda debian devcontainer devcontainers docker docker-compose fedora opencv opensuse ros ros-humble ros-noetic ros2 ubuntu ubuntu2004 ubuntu2204 vscode zsh
Last synced: 10 Jul 2025
https://github.com/dmmutua/cuda_projects
An Implementation of a variety of Algorithms & Technical Papers Mostly Related to Machine Learning & Deep Learning in CUDA C
c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms
Last synced: 18 Apr 2026
https://github.com/ne0nwinds/gpupuzzles
My solutions to srush/GPU-Puzzles using CUDA
Last synced: 16 May 2026
https://github.com/genpat-it/ohe-rs
Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.
bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust
Last synced: 04 Jun 2026
https://github.com/ronaldsg20/compu-paralela
Example code for parallel and distributed computing
cuda opencv openmp posix-threads
Last synced: 14 May 2026
https://github.com/ex539/docker-dev-env
A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.
big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11
Last synced: 05 Apr 2026
https://github.com/lionpsiuc/postgraduate
A collection of assignments and projects completed during my M.Sc. in High-Performance Computing at Trinity College Dublin.
Last synced: 01 May 2026
https://github.com/manishklach/gb300-rl-runtime
Close-to-metal C/CUDA lab for RL inference fast paths: persistent GPU workers, hugepage KV arenas, cacheline-aware command rings, and async reward handoff. Goal: remove page faults, malloc/free, scheduler wakeups, CPU round-trips, and KV migration from the per-token path.
ai-infrastructure close-to-metal cuda gb300 gpu-inference hpc lock-free nvlink reinforcement-learning spsc-queue
Last synced: 09 Jun 2026
https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement
Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.
cpp cuda gpu-programming linux nvidia opencv
Last synced: 18 Apr 2026
https://github.com/intelav/gpu-agent-opt
AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.
ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch
Last synced: 19 Apr 2026
https://github.com/vicen-te/tiny-nn
A tiny neural network framework for fully-connected layers with CPU and CUDA support
backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn
Last synced: 19 Apr 2026
https://github.com/bd2720/accesspatterns
Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.
c cache cuda cuda-toolkit performance-analysis performance-testing profiling
Last synced: 16 May 2026
https://github.com/timanema/msc-thesis-public
Repository containing a GPU-accelerated compressor based on FSST
compression cpp cuda gpu thesis
Last synced: 19 Apr 2026
https://github.com/zepedroresende/matrixmultiplication
Matrix multiplication optimizations on Intel and CUDA
c cpp cuda hpc matrix-multiplication omp optimization
Last synced: 01 May 2026
https://github.com/yash-1335/qwen600
🚀 Build a fast inference engine for the QWEN3-0.6B model using CUDA, optimizing performance with minimal dependencies for efficient learning and practice.
cuda cuda-programming gpu llamacpp llm llm-inference qwen qwen3 transformer
Last synced: 16 May 2026
https://github.com/fatlipp/toyslam
SLAM implementation from scratch w/o external graph optimization libs
cuda gpu lidar-slam mapping odometry robotics slam
Last synced: 20 Apr 2026
https://github.com/ydkn/htw-progko-cuda
Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.
cuda image-transformations opencv
Last synced: 20 Apr 2026
https://github.com/rtfirst/voice-to-text
Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.
cuda macos push-to-talk python speech-to-text voice-input whisper windows
Last synced: 04 Jun 2026
https://github.com/amirbroker/cupydtw
Use Cuda for Dynamic Time Warping
cuda dtw dynamic-time-warping python
Last synced: 20 Apr 2026
https://github.com/lanceberge/cuda-newton-fractals
Parallelize and visualize the Newton Iteration
cpp cuda mathematical-modelling visualization
Last synced: 16 May 2026
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 20 Apr 2026
https://github.com/voschezang/holographic-projector-simulations
Optimizations of Simulations of Holographic Projectors using CUDA
cuda gpu holography parallel-computing photonics
Last synced: 16 May 2026
https://github.com/djenriquez/ccminer
Dockerized ccminer
cuda docker ethereum mining nvidia nvidia-docker
Last synced: 05 May 2026
https://github.com/jusqua/dip-benchmark
Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.
benchmark cuda image-processing matlab opencv sycl visiongl
Last synced: 20 Apr 2026
https://github.com/bonevbs/cuknn
Cuda implementation of k-nearest neighbor search
Last synced: 20 Apr 2026
https://github.com/py-sandy/llama.cpp-windows-builder
Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.
ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11
Last synced: 20 Apr 2026
https://github.com/riciokzz/computer-vision
Computer Vision project
cuda data-cleaning data-engineering data-science exploratory-data-analysis machine-learning neural-network
Last synced: 20 May 2026
https://github.com/mrkct/cuda-raytracer
Simple CUDA-Accelerated raytracer
cuda gpu raytracing raytracing-one-weekend
Last synced: 21 Apr 2026
https://github.com/rai-project/dlperf
Déjà vu: Modeling DNN Performance by Recalling History
benchmark cuda deep-learning modeling onnx performance tensorflow
Last synced: 21 Apr 2026
https://github.com/musaibbashir/object-detection
PyTorch + CUDA implementation of several image classification and object detection models such as YOLO, Fast-CNN, and RF-DETR
cnn computer-vision cuda image-classification object-detection pytorch yolo
Last synced: 21 Apr 2026
https://github.com/arya2004/parallel-computing
Parallel Computing Uni Course
Last synced: 18 May 2026