CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/eshibusawa/cupy-cuda
Learn CUDA programming essentials with CuPy, from basic kernels to advanced memory patterns
cooperative-thread-array cub cuda cupy gpu parallel-computing python
Last synced: 15 Jun 2025
https://github.com/haleelrah/Vision-pro-MAX
A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.
bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8
Last synced: 30 Dec 2025
https://github.com/bjornmelin/ml-vision-lab
👁️ Production-grade computer vision implementations. Real-world applications in image processing, object detection, and video analytics with GPU acceleration. 📸
computer-vision cuda deep-learning image-processing object-detection opencv pytorch video-analytics
Last synced: 04 Apr 2026
https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator
This simulator computes all possible intersections for a very small timestep for a particle model
Last synced: 17 Apr 2026
https://github.com/antonioberna/nn-gpu-logic-gates
Neural network implementation on GPU using CUDA C++ to learn logic gate operations
cpp cuda gpu logic-gates neural-networks nvidia
Last synced: 01 May 2026
https://github.com/kenmalik/cuda-dr-bcg
CUDA C++ implementation of the DR-BCG algorithm for numerically solving linear systems.
cpp cuda hpc numerical-methods
Last synced: 19 Apr 2026
https://github.com/wojcikmikolaj/particles-in-a-jar
Collisions between particles simulated on GPU.
algorithms-and-data-structures collision-detection collisions cuda gpu-programming
Last synced: 25 May 2026
https://github.com/simonschoelly/poisson-solver
A solver for a modified Poisson equation using CUDA.
cpp cuda finite-difference gpgpu pgc poisson-equation preconditioned-conjugate-gradient thomas-algorithm
Last synced: 18 May 2026
https://github.com/edcalderin/huggingface_ragflow
This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.
bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation
Last synced: 15 Jul 2025
https://github.com/xstupi00/N-Body-CUDA
PCG - Parallel Computations on GPU - Project - N-Body-CUDA
cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit
Last synced: 11 Mar 2025
https://github.com/ragu-manjegowda/parallel-programming
Assignments and Projects of Udacity's Introduction to Parallel Programming Course
cuda gpu-programming nvidia-cuda nvidia-gpu udacity-parallel-programming
Last synced: 25 May 2026
https://github.com/unknownnuts/meshsdk
Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.
cuda dicom electron emscripten mesh modelling pybind11 stl stomatology threejs wasm
Last synced: 10 Apr 2026
https://github.com/aayes89/pyllm
Train your own LLM from scratch
cpu cuda llm llm-training pip python3
Last synced: 18 May 2026
https://github.com/matteopolak/stock-predict
Stock prediction with LSTM using TensorFlow and TypeScript.
ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript
Last synced: 09 May 2026
https://github.com/avarga1/vllm-hb
vLLM-compatible inference runtime in pure Rust. Zero Python. Zero libtorch. CUDA via candle.
candle cuda inference llm openai-api rust tokio vllm
Last synced: 07 Apr 2026
https://github.com/jpodivin/gputomata
Cellular automata running on CUDA capable GPUs
cellular-automata cellular-automaton cuda
Last synced: 07 Nov 2025
https://github.com/followthesapper/atlas-q
GPU-accelerated quantum tensor network simulator with adaptive MPS
ai cuda gpu-acceleration high-performance-computing matrix-product-states nisq python pytorch qaoa quantum-algorithms quantum-computing quantum-simulator scientific-computing shors-algorithm tensor-networks triton vqe
Last synced: 20 Jan 2026
https://github.com/debanjan06/spatial-streamio
An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.
asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch
Last synced: 11 Jun 2026
https://github.com/loveboyme/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 29 Mar 2025
https://github.com/GTruf/Driver-Drowsiness-Detector
Prototype of an intelligent safety system for detecting driver drowsiness
cpp cuda cudnn deep-learning driver-drowsiness-detection driver-drowsiness-detector drowsiness-detection face-recognition image-recognition machine-learning neural-network nvidia-cuda object-recognition opencv qt6 recognition-neural-network yolo yolov10 yolov5 yolov9
Last synced: 14 Mar 2025
https://github.com/ngoma1713/rushirb2001
🤖 Explore advanced AI and machine learning solutions for protein modeling and medical applications, developed by a dedicated data science graduate student.
computer-vision-opencv cuda data-science-portfolio deep-learning generative-ai machine-learning medical-ai protein-modeling published-researcher pytorch quantum-ml rag-chatbot tensorflow
Last synced: 02 May 2026
https://github.com/bjornmelin/llm-gpu-optimization
🚄 Advanced LLM optimization techniques using CUDA. Features efficient attention mechanisms, custom CUDA kernels for transformers, and memory-efficient training strategies. ⚡
cuda deep-learning gpu-acceleration llm-optimization machine-learning memory-optimization parallel-computing transformers
Last synced: 18 Mar 2025
https://github.com/amitkumarj441/deep-learning-on-your-finger
A rich collection of Dockerfiles for installing deep learning dependencies on your way :rocket:
Last synced: 18 Apr 2026
https://github.com/m-torhan/advent-of-code
🎄 Solutions for the Advent of Code
advent-of-code advent-of-code-2024 cuda
Last synced: 07 Apr 2025
https://github.com/deepschneider/tinygrad-universal
Universal version of Tinygrad with CUDA and OpenCL support
autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda
Last synced: 06 Mar 2025
https://github.com/kylesayrs/pttp
PyTorch Tensor Profiler with fully-supported memory timelines and events
Last synced: 07 May 2026
https://github.com/wiktor2718/matrix_flow
Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.
adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust
Last synced: 18 May 2026
https://github.com/mjun0812/setup-cuda
Set up a specific version of NVIDIA CUDA in GitHub Actions on Linux x86_64 and arm64 (Debian- and Fedora-based distributions) and Windows
action cuda cuda-toolkit github-actions
Last synced: 13 Jan 2026
https://github.com/cppshizoids/cuda
My basic CUDA lessons
cuda cuda-demo cuda-programming
Last synced: 15 Jul 2025
https://github.com/kmock930/texture-image-comparison
This project aims to build a model that classifies the type of an unseen image as accurately as possible by implementing, evaluating, and comparing two different multi-layer perceptron neural networks.
computer-vision conda confusion-matrix convolutional-neural-networks cuda image-preprocessing keras keras-tensorflow learning-curve-analysis matplotlib multi-layer-perceptron neural-network pickle-file python3 skimage
Last synced: 12 Apr 2026
https://github.com/lruizap/testcuda
Guide to installing and using CUDA for programming
Last synced: 12 May 2026
https://github.com/flavienbwk/tensorflow2-cuda-10.2-docker
Tensorflow 2.3, CUDA 10.2, Docker compatible image
cuda docker python3 tensorflow ubuntu1804
Last synced: 11 Apr 2026
https://github.com/tfogal/gemm-db
For creating a cacheable GEMM cost model.
Last synced: 18 May 2026
https://github.com/demetriantitus/machine-vision---yolov8
This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams
computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8
Last synced: 18 May 2026
https://github.com/promptromp/aws-bootstrap-g4dn
Fast and easy bootstrapping of AWS EC2 instances for CUDA development. Use as a CLI, as a programmatic SDK, or as an Agent Skill!
aws cuda ec2 jupyter-notebook machine-learning mlops python
Last synced: 21 Feb 2026
https://github.com/obj-wtf/gan-architecture
App for training GAN models on architectural plans
architecture building cuda gan pix2pix-tensorflow plan
Last synced: 18 May 2026
https://github.com/moshiba/fmindex
Ultra-fast parallel FM-index generation for DNA reads
Last synced: 18 May 2026
https://github.com/sangioai/sph
CUDA and OpenMP versions of SPH (Smoothed Particle Hydrodynamics) serial algorithm.
Last synced: 27 Apr 2026
https://github.com/sonhm3029/setup-experience
This project stores my setup experience and the errors I met and solved while developing end-to-end AI and software projects
ai computer-vision cuda deep-learning software
Last synced: 10 Jun 2026
https://github.com/jesuscopado/parallel-programming
My solutions for the course Programming Parallel Computers at Aalto University (http://ppc.cs.aalto.fi/). Grade: 5/5
cpp cuda image-segmentation median-filter sorting-algorithms
Last synced: 19 Apr 2026
https://github.com/kis-balazs/cuda-research
CUDA Research & Code. Course-style structured. Inspiration from @Infatoshi.
Last synced: 14 May 2025
https://github.com/alan-cooney/python-cuda-starter-template
Python CUDA Starter Template
Last synced: 30 Mar 2025
https://github.com/ludgerpaehler/lulesh-enzyme
AD with Enzyme through Lulesh.
automatic-differentiation cuda cuda-programming gpu-computing high-performance-computing llvm-enzyme scientific-computing
Last synced: 15 Jun 2026
https://github.com/bjornmelin/ai-system-design
🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️
architecture cuda distributed-systems engineering gpu-computing production scalability system-design
Last synced: 23 Jul 2025
https://github.com/ivanbgd/cuda_quad_c
Calculates a definite integral by using three different rules. Compares sequential to parallel implementations.
cuda integrals parallel-implementations
Last synced: 28 Mar 2025
https://github.com/rushirg/cuda-matrix-multiplication
Matrix Multiplication on GPGPU in CUDA
cpu cuda gpu parallel-processing
Last synced: 17 May 2026
https://github.com/derek-palmer/dvr-scan-file-organizer
DVR-Scan-Organizer is a Dockerized extension for DVR-Scan, designed to process multiple video files and organize output in a structured format.
cuda dvr dvr-scan multimedia opencv opencv-python python video video-processing
Last synced: 01 May 2026
https://github.com/gaaniruddha/mphil-gpu-imager
This repository contains code for project #1 of MPhil: test-version of GPU imager for a single time-step, single-channel and single time-step, multi-channel.
astronomy benchmarks cuda cufft google-sheets gpu-imager imaging-astronomy interferometry radio-astronomy
Last synced: 11 Jun 2026
https://github.com/elymsyr/auv_ws
An open-source simulation and control workspace for an Autonomous Underwater Vehicle (AUV) built on ROS 2 Humble and Gazebo. It features a high-fidelity dynamics model and an advanced AI-based motion controller (FossenNet) that uses a pre-trained LibTorch model to imitate a NL-MPC for real-time, high-performance manoeuvring.
autonomous-vehicles auv control-systems cpp cuda deep-learning gazebo imitation-learning libtorch mpc python robotics ros2 simulation
Last synced: 15 Apr 2026
https://github.com/awaldis/cuda-experiments
A place to explore the capabilities and limits of CUDA parallel processing.
cuda cuda-kernels cuda-programming
Last synced: 27 Aug 2025
https://github.com/puzzlef/vector-max-cuda
Performance of sequential vs CUDA-based vector element max.
basics cuda element experiment max vector
Last synced: 17 May 2026
https://github.com/hit07/ml-dl-torch
This repository covers machine learning and deep learning comprehensively, using PyTorch
computer-vision convolutional-neural-networks cuda neural-networks pytorch
Last synced: 28 Feb 2025
https://github.com/quik-fe/node-nvidia-smi
Node wrapper around nvidia-smi.
cuda gpu nodejs nvidia nvidia-smi typescript
Last synced: 19 Feb 2026
https://github.com/lordofhyphens/gpu-path-delay-coverage
CUDA-based Path Delay Fault Coverage
Last synced: 04 May 2026
https://github.com/usman619/pdc
Parallel and Distributed Computing
cuda distributed-computing distributed-systems nextcloud
Last synced: 11 Apr 2026
https://github.com/miferreiro/cdap-cuda
CUDA exercises for the subject "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo in 2020
Last synced: 17 May 2026
https://github.com/flagro/paralleltasks
CUDA/OpenMP parallel tasks
algorithms compression cpp cuda openmp parallel-computing unique-values
Last synced: 17 May 2026
https://github.com/bfalls/img-compressor
GPU-accelerated JPEG compressor
cli-tool command-line compression cpp cpp-cuda-gpu-programming-parallel-computing cuda dct demo-project gpgpu gpu-programming high-performance-computing hpc image-compression image-processing jpeg parallel-computing
Last synced: 20 Apr 2026
https://github.com/maneeshsit/pcie
Modify the code of run:ai and other FOSS projects for use with PCIe card-based AI accelerators for both inference and training
cuda cxl cxl-mem distro exo k3s k8s kestra llamacpp llm-d mpi4py mpio onnxoptimizer opentelemetry-ebpf-profiler paxos-cluster pcie photonics-computing runai visualize vllm
Last synced: 24 Aug 2025
https://github.com/timxor/c_code
Some of my C code
c cuda m4 parallel-programming
Last synced: 03 May 2026
https://github.com/drilonaliu/parallel-s_aes-ccm-xts
aes cryptography cuda gpu parallel-programming saes
Last synced: 21 Mar 2025
https://github.com/drilonaliu/parallel-caesar-cipher
caesar-cipher cryptography cuda gpu parallel-programming
Last synced: 21 Mar 2025
https://github.com/mvishiu11/kmeans-clustering
K-Means Clustering with both GPU (CUDA) and CPU implementations
Last synced: 15 Mar 2025
https://github.com/abhiram-kandiyana/cuda-blast-2024
Reimplementation of NCBI BLAST with CUDA backend for faster retrieval
blast cuda gpu-acceleration parallel-processing
Last synced: 15 Mar 2025
https://github.com/rajkamalsah/flow-hpc-shocktrack
GPU-accelerated, fault-tolerant Schlieren/PIV shock tracking with interactive ROI, 1-px edges, and resumable training.
ai-ml computer-vision cuda fluid-dynamics hpc mlsystem opencv piv pytorch schlieren scientific-ml smalldata transformer
Last synced: 03 May 2026
https://github.com/tianzonglin/cloud-control-gui
A tool to compute, visualize, analyse and drag points (high-dimensional data)
cuda interaction-design visualization
Last synced: 25 Apr 2026
https://github.com/sshoecraft/shepherd
An interactive multi-backend LLM runtime with intelligent cache eviction and persistent retrieval-augmented memory.
anthropic cli cpp cuda gemini grok inference kv-cache llama-cpp llm mcp ollama openai openai-server rag smart-evictions tensorrt tool-calling ulimited-context
Last synced: 10 Apr 2026
https://github.com/zhaocc1106/cuxx-programing
Samples for several CUDA libraries: cuda, cublas, cublaslt, cusparse...
Last synced: 23 Mar 2025
https://github.com/versi379/optimized-matrix-multiplication
This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.
cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming
Last synced: 17 May 2026
https://github.com/proafxin/cuda-docker
High performance computing Images with pycuda and tensorrt preinstalled
cuda docker dockerfile libcudnn nvidia-tensorrt pycuda python tensorrt
Last synced: 11 Apr 2026
https://github.com/BardiFarsi/ThreadPoolManager
ThreadPoolManager is a C++ project that implements an efficient multi-threading system using a thread pool for generic functions of the same type and different tasks. It includes task management, synchronization mechanisms, and thread-safe logging to demonstrate concurrent task execution.
cpp cpp17 cpp20 cuda cuda-programming memory-management multiprocessing multithreading parallel-computing parallel-processing parallel-programming thread thread-pool thread-safety threadpool threads threadsafe
Last synced: 15 May 2025
https://github.com/baudneo/zomi-server
FastAPI ML server designed for ZoneMinder (zomi-client)
alpr coral-tpu cuda face-detection face-recognition fastapi machine-learning object-detection onnxruntime opencv pydantic-v2 tensorrt torch zoneminder
Last synced: 18 Jan 2026
https://github.com/kirubhakaranm/vision-pipeline-cuda
High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)
computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry
Last synced: 12 Apr 2026
https://github.com/camille-004/cusprec
🏁 Sparse signal recovery library written in PyCUDA.
cuda ml python signal-processing sparse-recovery
Last synced: 18 Jan 2026
https://github.com/vladd12/libexecstd
Modern C++ library for using an execution context of computer devices
cpp cpp17 cuda gpu-acceleration gpu-computing
Last synced: 06 May 2026
https://github.com/jegp/aestream-paper
AEStream paper
coroutines cuda event-based-vision gpu
Last synced: 03 May 2026
https://github.com/jaidevd/ipec-fdp
cuda hpc keras mapreduce numba spark tensorflow
Last synced: 11 Apr 2026
https://github.com/mxm-tr/docker-darknet-opencv
Accelerated object detection on streams and files, using a Docker darknet YOLO container
cuda docker docker-compose object-recognition opencv-python python3 yolo
Last synced: 10 Apr 2026
https://github.com/ergus/algorithms
Set of multiple algorithms implemented in multiple paradigms
algorithms cmake concurrency cpp cuda gpgpu inter-language metaprogramming multithreading pthreads stl testing
Last synced: 17 May 2026
https://github.com/ubermorgott/morgottalk
Cross-platform desktop push-to-talk voice transcription. Single binary. GPU accelerated (CUDA/Vulkan/Metal/ROCm/OpenCL). Powered by whisper.cpp.
cuda desktop go gpu speech-to-text svelte transcription voice wails whisper
Last synced: 07 Apr 2026
https://github.com/programmergnome/kutyai
This is a Python dog breed recognizer graphical application with 420 breeds and 42000 images.
cuda deep-learning image-classification python3 qt5-gui tensorflow transfer-learning
Last synced: 11 May 2026
https://github.com/drilonaliu/parallel-permuation-cipher-attack
attack cryptography cuda gpu parallel-computing
Last synced: 21 Mar 2025
https://github.com/drilonaliu/bachelor-thesis
Parallel Programming Fractals
cuda fractals gpu parallel-programming
Last synced: 15 May 2026
https://github.com/gammahazard/locate-anything
Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.
bounding-boxes computer-vision cuda docker fastapi gpu grounding locate-anything machine-learning nvidia object-detection ocr open-vocabulary-detection react self-hosted tailwindcss typescript vision-language-model web-ui
Last synced: 28 May 2026