CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-22 00:07:17 UTC
https://github.com/nabilshadman/cuda-4-dummies
Lecture slides and exercise files of the CUDA 4 Dummies course (2025)
cuda gpu-computing high-performance-computing nsight-systems nvidia-gpu parallel-computing
Last synced: 31 Oct 2025
https://github.com/mohamedsamirx/yolov12-tensorrt-cpp
YOLOv12 inference using C++, TensorRT, and CUDA
cpp cuda tensorrt tensorrt-inference yolo yolov12
Last synced: 15 Apr 2026
https://github.com/manu-sh/cuda-mandelbrot
How to use CUDA acceleration to compute the Mandelbrot set
Last synced: 15 Apr 2026
https://github.com/ahmed5827/image_generation
This application provides a graphical user interface (GUI) for generating images using the Stable Diffusion model. The GUI allows users to input a text prompt, and the application generates an image based on the prompt.
ai cuda generative-ai image-generation
Last synced: 15 Apr 2026
https://github.com/lyynn777/cuda-bitonic-sort
Simple CUDA project to implement Bitonic Sort and compare it with normal CPU sorting.
bitonic-sort cuda gpu-computing gpu-vs-cpu parallel-computing performance-testing pycuda python
Last synced: 15 Apr 2026
https://github.com/flosmume/cpp-cuda-streams-and-pinned-mem
A CUDA C++ demo showing how to overlap data transfer and kernel execution using multiple streams and pinned (page-locked) host memory. This project illustrates asynchronous memcpy, event timing, and performance benefits of concurrent GPU execution — essential for building high-throughput pipelines.
asynchronous-execution cuda cuda-streams gpu parallel-programming performance-optimization pinned-memory
Last synced: 13 May 2026
https://github.com/uva-trasgo/controllers
Read-only mirror of the official repository: https://gitlab.com/trasgo-group-valladolid/controllers. Controllers is a library written in C11 that provides a simplified way to program applications that can exploit heterogeneous computational platforms including accelerators and/or multi-core CPUs.
cuda heterogeneous-computing heterogeneous-parallel-programming hip opencl openmp
Last synced: 12 May 2026
https://github.com/mahdi-hasan-shuvo/ml-opensource-project
An open-source repository focused on providing practical and educational machine learning resources. The project aims to make learning and applying machine learning more accessible through well-documented code, tutorials, and real-world examples.
cuda machine-learning machine-learning-algorithms ml-projects open-source python
Last synced: 19 May 2026
https://github.com/tkemmer/cunessie.jl
CUDA-accelerated Nonlocal Electrostatics in Structured Solvents
bioinformatics boundary-element-method cuda electrostatics gpu-computing julia proteins
Last synced: 31 Jan 2026
https://github.com/snandasena/courseera_gpu_specilization
Example of CUDA streaming
Last synced: 15 Apr 2026
https://github.com/sneha-at-hub/bruteforce_passwordcracking_in-milliseconds
Last synced: 28 Apr 2026
https://github.com/eastonman/tensorrt-pytorch-wrapper
A wrapper that lets a TensorRT engine accept PyTorch CUDA tensors.
Last synced: 06 May 2026
https://github.com/starlitdreams/pacman-convolutional-q-learning
This project implements a Deep Q-Network (DQN) using PyTorch to train an agent to play Atari's Ms. Pac-Man. It utilizes reinforcement learning with a convolutional neural network (CNN) for image processing. Features include experience replay, frame preprocessing, and CUDA support, with trained model saving and video rendering of gameplay.
artificial-intelligence artificial-neural-networks atari cuda deep-learning deep-learning-algorithms deep-q-learning deeplearning gymnasium gymnasium-environment python pytorch
Last synced: 15 Apr 2026
https://github.com/ramyacp14/document-based-question-and-answers
Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.
chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit
Last synced: 07 Mar 2026
https://github.com/cscfi/csc-env-julia
Julia language environment including MPI.jl, CUDA.jl and AMDGPU.jl preferences for HPC clusters at CSC.
amdgpu ansible cuda hpc julia julia-language mpi
Last synced: 01 Feb 2026
https://github.com/storterald/neural-network
Simple neural network implementation in C++ and CUDA
asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network
Last synced: 28 Mar 2025
https://github.com/amypad/miutil
Basic functionality needed for AMYPAD
cuda matlab medical-imaging python
Last synced: 13 May 2025
https://github.com/teambipartite/bipartite-gemm
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores
Last synced: 17 Apr 2026
https://github.com/m-torhan/cuda-fractals
CUDA C++ implementation of fractal visualization
Last synced: 25 Feb 2026
https://github.com/actepukc/uv-app-starter-pack
Bootstrap PySide6 GUI apps quickly using uv, with built-in PyTorch/CUDA handling.
astral-uv cross-platform cuda gui pyside6 python pytorch qt6 starter-kit template
Last synced: 30 Apr 2026
https://github.com/ivanfioravanti/tflops_mps
TFLOPs testing on MPS and CUDA
Last synced: 19 May 2026
https://github.com/grindelfp/cuda-n-body-simulation
Simulation of N-Body movement using CUDA.
Last synced: 06 Apr 2025
https://github.com/drilonaliu/parallel-fractal-tree
GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.
cuda fractal-tree fractals gpu
Last synced: 19 May 2026
https://github.com/xza85hrf/flag_prediction_project
This application predicts the name of a country (or countries) based on an input flag image. It uses advanced image processing techniques and deep learning models built with PyTorch to classify flags accurately.
cross-validation cuda data-augmentation docker efficientnetb0 flag-recognition image-classification machine-learning mixed-precision-training mobilenetv2 python pytorch resnet resnet-50 transfer-learning
Last synced: 15 Apr 2026
https://github.com/patriciobcs/mini-aevol
Parallel implementation of a reduced version of the Aevol simulator
Last synced: 19 May 2026
https://github.com/muneeb706/cuda
Sample programs implemented using CUDA (GPU)
cplusplus cuda gpu-programming
Last synced: 19 May 2026
https://github.com/hnthap/vietnamese-word-segment
Vietnamese word segmentation package.
cuda torch transformers vietnamese vietnamese-nlp vietnamese-tokenizer word-segmentation
Last synced: 19 May 2026
https://github.com/fieldcure/fieldcure-whisper-runtimes
Pre-built Whisper.net native runtime binaries (CPU/CUDA/Vulkan) for the FieldCure software ecosystem.
cuda dotnet native-binaries nuget redistributable vulkan whisper whisper-net
Last synced: 01 Jun 2026
https://github.com/chiragajain/gpu-optimization-roadmap
This repository is part of a structured curriculum designed to master GPU optimization, Triton, Deep Learning, and LLMs. This section focuses on GPU fundamentals, CUDA programming, and PyTorch optimizations.
cuda deeplearning gpu-acceleration learning python pytorch triton
Last synced: 18 Feb 2026
https://github.com/kar-dim/CAS-2D
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 01 Nov 2025
https://github.com/baremetalrt/baremetalrt
BareMetalRT — edge GPU compute mesh
cuda distributed-computing gpu inference llm nvidia tensorrt windows
Last synced: 18 Apr 2026
https://github.com/mxm-tr/docker-darknet-opencv
Accelerated object detection on streams and files, using a Docker Darknet YOLO container
cuda docker docker-compose object-recognition opencv-python python3 yolo
Last synced: 10 Apr 2026
https://github.com/joe-mruz/hgvisualizer
An interactive simulation and visualization tool for evolving hypergraphs, inspired by the Wolfram Physics Project.
cpp cuda hypergraph physics simulator wolfram
Last synced: 02 May 2026
https://github.com/kirubhakaranm/vision-pipeline-cuda
High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)
computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry
Last synced: 12 Apr 2026
https://github.com/sangioai/sph
CUDA and OpenMP versions of SPH (Smoothed Particle Hydrodynamics) serial algorithm.
Last synced: 27 Apr 2026
https://github.com/lruizap/testcuda
Guide to installing and using CUDA for programming
Last synced: 12 May 2026
https://github.com/amitkumarj441/deep-learning-on-your-finger
A rich collection of Dockerfiles for installing deep learning dependencies your way
Last synced: 18 Apr 2026
https://github.com/debanjan06/spatial-streamio
An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.
asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch
Last synced: 11 Jun 2026
https://github.com/matteopolak/stock-predict
Stock prediction with LSTM using TensorFlow and TypeScript.
ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript
Last synced: 09 May 2026
https://github.com/muppetsg2/cudaraytracer
A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.
cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project
Last synced: 16 Apr 2026
https://github.com/xstupi00/N-Body-CUDA
PCG - Parallel Computations on GPU - Project - N-Body-CUDA
cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit
Last synced: 11 Mar 2025
https://github.com/farukalamai/cpp-for-cuda
A structured C++ learning path designed specifically for developers preparing to learn CUDA programming.
Last synced: 09 Jun 2026
https://github.com/simonschoelly/poisson-solver
A solver for a modified Poisson equation using CUDA.
cpp cuda finite-difference gpgpu pgc poisson-equation preconditioned-conjugate-gradient thomas-algorithm
Last synced: 18 May 2026
https://github.com/brendanm12345/simple_renderer_cs149
Simple CUDA renderer implementation. 19th most efficient out of 150+ submissions
Last synced: 18 May 2026
https://github.com/rajshrestha86/kmeans-clusterize-cuda
Implementation of K-Means algorithm from scratch using CUDA.
Last synced: 18 May 2026
https://github.com/yashpotdar-py/flood-vision
Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.
cuda deep-learning flood-detection iot python
Last synced: 16 Apr 2026
https://github.com/kentakoong/mtnlog
A simple multinode performance logger for Python
cuda lanta nvitop python slurm-cluster
Last synced: 11 Jan 2026
https://github.com/amruthapatil/nyu-cudaconvolution
Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN
Last synced: 13 Mar 2025
https://github.com/jiriklepl/bits-knn-jpdc2024
Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search
bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k
Last synced: 21 Mar 2025
https://github.com/sferez/sspp_sparse_matrix_cuda
Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA
cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix
Last synced: 30 Apr 2026
https://github.com/equiel-1703/cuhip
Wrapper tool to convert CUDA source code to HIP code and compile it with HIPCC. Useful for learning CUDA programming using AMD devices.
Last synced: 14 May 2026
https://github.com/edcalderin/huggingface_ragflow
This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.
bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation
Last synced: 15 Jul 2025
https://github.com/aayes89/pyllm
Train your own LLM from scratch
cpu cuda llm llm-training pip python3
Last synced: 18 May 2026
https://github.com/aaaastark/nvidia-cuda-google-colab
Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).
c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition
Last synced: 16 Apr 2026
https://github.com/alexjmercer/cuda-npp-assignment
Learning about CUDA and NVIDIA Performance Primitives. Part of a Coursera assignment.
Last synced: 13 Feb 2026
https://github.com/ivanbuccella/sf2bio
Deep reinforcement learning for de novo drug design: a ReLeaSe method execution on a Docker Environment
cuda deep-learning deep-reinforcement-learning docker docker-compose machine-learning nvidia-cuda nvidia-docker reinforcement-learning release release-method
Last synced: 01 May 2026
https://github.com/tlabaltoh/tlab-sharescreen-server-win
Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.
cuda desktop-capture screensharing unity unity3d windows-graphics-capture
Last synced: 14 Feb 2026
https://github.com/avarga1/vllm-hb
vLLM-compatible inference runtime in pure Rust. Zero Python. Zero libtorch. CUDA via candle.
candle cuda inference llm openai-api rust tokio vllm
Last synced: 07 Apr 2026
https://github.com/mrtejas/cv-sandbox
A collection of computer vision mini-projects tuned for a number of tasks, including face detection, object detection, image segmentation, and CLIP. Trained on popular datasets, with a comparative study of the methods. Done as part of the S24 course Computer Vision at IIIT Hyd.
computer-vision cuda ml opencv pytorch yolo
Last synced: 01 May 2026
https://github.com/loveboyme/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 29 Mar 2025
https://github.com/ankhoa1212/cuda-program
This is a GPU program built with CUDA using parallel reduction
cpp cuda curand gpu-programming parallel-reduction
Last synced: 14 Feb 2026
https://github.com/nagharjun17/mlir-to-ptx-cuda
Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR, and generates PTX to run the kernel on a CUDA GPU
cpp cuda deep-learning llvm mlir ptx
Last synced: 18 Apr 2026
https://github.com/wiktor2718/matrix_flow
Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.
adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust
Last synced: 18 May 2026
https://github.com/fikri-rouzan/cuda-c-program-part-3
CUDA C program from NVIDIA course.
Last synced: 01 May 2026
https://github.com/cppshizoids/cuda
My basic CUDA lessons
cuda cuda-demo cuda-programming
Last synced: 15 Jul 2025
https://github.com/mattjesc/gpu-accelerated-fap
GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings
c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing
Last synced: 16 Apr 2026
https://github.com/tfogal/gemm-db
For creating a cacheable GEMM cost model.
Last synced: 18 May 2026
https://github.com/smoke-y/athena
Deep learning library
cuda deep-learning deep-learning-library
Last synced: 01 Mar 2026
https://github.com/demetriantitus/machine-vision---yolov8
This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams
computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8
Last synced: 18 May 2026
https://github.com/obj-wtf/gan-architecture
App for training GAN models on architectural plans
architecture building cuda gan pix2pix-tensorflow plan
Last synced: 18 May 2026
https://github.com/aarid/cuda_operations
This project compares CPU and GPU performance for CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.
conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication
Last synced: 02 Mar 2026
https://github.com/anselm67/cuda_mnist
A CUDA implementation of MNIST - for CUDA beginners.
cuda gpu gpu-computing gpu-programming mnist mnist-classification
Last synced: 02 Mar 2026
https://github.com/moshiba/fmindex
Ultra-fast parallel FM-index generation for DNA reads
Last synced: 18 May 2026
https://github.com/atticuszeller/pytorch-lightning-uv
📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀
classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv
Last synced: 16 Apr 2026
https://github.com/ludgerpaehler/lulesh-enzyme
AD with Enzyme through Lulesh.
automatic-differentiation cuda cuda-programming gpu-computing high-performance-computing llvm-enzyme scientific-computing
Last synced: 15 Jun 2026
https://github.com/ivanbgd/cuda_quad_c
Calculates a definite integral by using three different rules. Compares sequential to parallel implementations.
cuda integrals parallel-implementations
Last synced: 28 Mar 2025
https://github.com/rushirg/cuda-matrix-multiplication
Matrix Multiplication on GPGPU in CUDA
cpu cuda gpu parallel-processing
Last synced: 17 May 2026
https://github.com/puzzlef/vector-max-cuda
Performance of sequential vs CUDA-based vector element max.
basics cuda element experiment max vector
Last synced: 17 May 2026
https://github.com/darshanakgr/meanfiltergpu
A GPU implementation of a mean filter in CUDA
Last synced: 01 May 2026
https://github.com/miferreiro/cdap-cuda
CUDA exercises for the course "Computación Distribuída e de Altas Prestacións" in the Master's Degree in Computer Engineering at the University of Vigo, 2020
Last synced: 17 May 2026
https://github.com/flagro/paralleltasks
CUDA/OpenMP parallel tasks
algorithms compression cpp cuda openmp parallel-computing unique-values
Last synced: 17 May 2026
https://github.com/eagleeee2/ethminer
EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.
cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source
Last synced: 16 Apr 2026
https://github.com/harmeshgv/gpu-powered-bert-finetuning
Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.
bert-model cuda finetuning-llms pytorch
Last synced: 16 Apr 2026
https://github.com/drilonaliu/parallel-s_aes-ccm-xts
aes cryptography cuda gpu parallel-programming saes
Last synced: 21 Mar 2025
https://github.com/drilonaliu/parallel-caesar-cipher
caesar-cipher cryptography cuda gpu parallel-programming
Last synced: 21 Mar 2025
https://github.com/tianzonglin/cloud-control-gui
A tool to compute, visualize, analyse and drag points (high-dimensional data)
cuda interaction-design visualization
Last synced: 25 Apr 2026
https://github.com/versi379/optimized-matrix-multiplication
This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.
cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming
Last synced: 17 May 2026
https://github.com/ergus/algorithms
Set of multiple algorithms implemented in multiple paradigms
algorithms cmake concurrency cpp cuda gpgpu inter-language metaprogramming multithreading pthreads stl testing
Last synced: 17 May 2026