CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-07-01 00:07:09 UTC
https://github.com/miniex/maidenx
Rust-based CUDA library designed for learning purposes and building my AI engines named Maiden Engine
Last synced: 20 Mar 2025
https://github.com/ayoussf/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 14 Apr 2025
https://github.com/gunrock/template
Template repository for essentials applications to get you started asap!
cpp cuda essentials gpu graph-algorithms graph-analytics gunrock
Last synced: 15 May 2026
https://github.com/satyajitghana/gpu-programming
Contains the contents of the GPU Architecture and Programming course taken on NPTEL
c cpp cuda cuda-programming gpu-programming nptel nvidia
Last synced: 09 Mar 2026
https://github.com/blazekill/hello-cuda
Cpp + Vcpkg + CUDA + VsCode starter project.
Last synced: 18 May 2026
https://github.com/adamczykpiotr/cudamatrixlibrary
Matrix operation library using single, n-threads or CUDA supported GPU
agh agh-ust cpp cuda cuda-library matrix matrix-computations matrix-functions matrix-multiplication
Last synced: 19 Apr 2026
https://github.com/sbstndb/grayscott_k
A simple 3D GrayScott simulation using Kokkos enabling CUDA or OpenMP backend
cuda finite-difference grayscott grid kokkos laplacian openmp simulation visualisation
Last synced: 16 May 2026
https://github.com/xza85hrf/ml-framework_checker
ML Framework and CUDA Checker is a Python-based GUI application for checking PyTorch, TensorFlow, and CUDA installations. It provides detailed system specs, compatibility checks, advanced GPU management, and offers options to view instructions, export logs, and update machine learning frameworks.
compatibility cuda gpu-management gui-application machine-learning python pytorch system-checker system-specs tensorflow
Last synced: 28 Apr 2026
https://github.com/fandreuz/parallel-programming-for-hpc
Scientific codes in C/C++ with CUDA, OpenACC, FFTW, (cu)BLAS
Last synced: 20 Apr 2026
https://github.com/sartajbhuvaji/cuda
Developed CUDA kernel functions to load and train a Convolutional Neural Network from scratch.
cuda cuda-programming gpu-programming neural-network nvidia-cuda
Last synced: 30 Mar 2025
https://github.com/ssoehdata/cuda_fortran_sci_eng
Working through examples from the book CUDA Fortran for Scientists and Engineers, 2nd Edition
cuda cuda-fortran fortran hpc nvfortran
Last synced: 21 Aug 2025
https://github.com/croko22/vit-cpp
An implementation of the Transformer model architecture ("Attention Is All You Need") in pure C++17 from scratch
cpp cuda deep-learning machine-learning neural-network transformer
Last synced: 17 Jan 2026
https://github.com/raumberg/hypervision
Neural Network based real-time aimbot system, operating on TensorRT with custom CUDA kernel and C FFI extensions
ai aim cuda cython neural-networks python tensorrt yolo
Last synced: 20 May 2026
https://github.com/syncush/cifar-10-pytorch
cifar10 cuda deep-learning machine-learning python python3 pytroch
Last synced: 19 May 2026
https://github.com/dhruvsrikanth/fastconv
Distributed and serial implementations of the 2D convolution operation in C++ and CUDA.
convolution-filters cpp cuda gpu-programming high-performance-computing hpc image-editor image-processing nvidia parallel-programming
Last synced: 04 May 2026
https://github.com/mhaseeb123/gcb
GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.
cpp cpp23 cuda mpi partitioned-communication st-mpi
Last synced: 17 May 2026
https://github.com/arakiss/hecate-os
Linux distro with automatic hardware detection and per-system optimization. Ubuntu 24.04 base. Alpha.
ai cuda docker gpu hardware-optimization kernel-tuning linux linux-distribution machine-learning nvidia operating-system performance sysctl ubuntu workstation zram
Last synced: 16 Feb 2026
https://github.com/nondairyneutrino/pararealgpu.jl
A distributed and GPU-based implementation of the Parareal algorithm for parallel-in-time integration of equations of motion.
accelerator computational-physics computational-science cuda differential-equation-solvers distributed-computing gpu-computing high-performance-computing julialang ode ordinary-differential-equations parallel-computing parallel-in-time-integration parareal partial-differential-equation pde simulation
Last synced: 21 Apr 2026
https://github.com/alekseyscorpi/vacancies_server
This is a server for vacancy generation using an LLM (Saiga3)
code cuda cuda-toolkit docker dockerfile flask llama3 llamacpp llm ngrok pydantic saiga
Last synced: 06 Feb 2026
https://github.com/jakubriegel/game_of_life_3d
3D game of life implemented in CUDA
concurency cuda gameoflife nvidia put-poznan
Last synced: 21 Apr 2026
https://github.com/brosnanyuen/raybnn_graph
Graph Manipulation Library For GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cuda gpu graph graph-algorithms neural-network neural-networks opencl raybnn rust
Last synced: 06 Feb 2026
https://github.com/patrickm663/localglmnet.jl
This is a WIP implementation of Richman & Wüthrich (2022) using Julia's Flux.jl + CUDA.jl
cuda deep-learning flux julia neural-networks symbolic-regression xai
Last synced: 22 Apr 2026
https://github.com/orgh0/highperformancecnn
Implementation of a High Performance CNN for MNIST dataset
Last synced: 18 May 2026
https://github.com/dafadey/GPGPU_OpenCL_vs_CUDA
This repository contains sample code for testing memory bandwidth, arithmetic latency hiding, and shared/local memory performance on AMD and NVIDIA devices
cuda gpgpu gpgpu-computing opencl
Last synced: 16 May 2025
https://github.com/liuyuweitarek/pytorch-docker-builder
Automate PyTorch Docker image builds with compatible Python, CUDA, and Poetry versions, including CI/CD for testing.
cicd containerd cuda docker docker-image poetry-python python python3 pytorch pytorch-docker
Last synced: 06 Feb 2026
https://github.com/rnabla/cuda-des
Bruteforcing DES using CUDA
bruteforce cuda data des encryption gpu parallel standard
Last synced: 27 Oct 2025
https://github.com/michaelfranzl/image_debian-gpgpu
Dockerfile for a Debian base image with AMD and Nvidia GPGPU support
amd container container-image cuda debian docker gpgpu nvidia opencl
Last synced: 10 May 2026
https://github.com/dvhh/masscorrelation
An exercise in writing an efficient correlation calculator
calculations correlation-calculation cuda matrix multi-threading openmp
Last synced: 15 May 2026
https://github.com/hariprashad-ravikumar/accelerated-computing-in-cuda-c
This repo contains my code for the problem sets in NVIDIA's Getting Started with Accelerated Computing in CUDA C/C++
c cuda cuda-kernels cuda-toolkit
Last synced: 24 Apr 2026
https://github.com/ruturaj4/cuda_nvidia_tutorial
cuda projects
cuda cuda-vector-addition nvidia nvidia-cuda parallel
Last synced: 26 Oct 2025
https://github.com/alegau03/parallel-k-means
Implementation of C programs for the K-Means algorithm for parallel computing.
c c-programming cuda parallel parallel-programming
Last synced: 24 Apr 2026
https://github.com/andih/cuda-fortran-stream
Variant of STREAM Benchmark in CUDA Fortran
cuda cuda-fortran gpu stream-benchmarks variants
Last synced: 02 Mar 2025
https://github.com/david-palma/cuda-programming
Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.
c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads
Last synced: 25 Apr 2026
https://github.com/crcrpar/dev-chainer
Dockerfile for Chainer Development in VSCode
chainer cuda docker nvidia-docker vscode
Last synced: 26 Apr 2026
https://github.com/vishwamartur/btc_recovery
High-performance Bitcoin wallet password recovery system with GPU acceleration and integrated graphics support. Recover Bitcoin Core wallet.dat files without blockchain download using advanced algorithms and blockchain APIs.
bitcoin bitcoin-core blockchain blockchain-api cpp cryptocurrency cuda electrum gpu-acceleration integrated-graphics multithreading opencl password-recovery private-keys recovery-tools wallet-dat wallet-recovery
Last synced: 14 Apr 2026
https://github.com/lhldev/rust-neural-network
neural network implementation in rust
cuda feedforward-neural-network
Last synced: 16 May 2026
https://github.com/steleman/quadratic-assignment
Research on the Quadratic Assignment Problem with CUDA Acceleration
cuda cuda-kernels cuda-programming cuda-programming-project quadratic-assignment quadratic-assignment-problem
Last synced: 07 Apr 2026
https://github.com/gravitytwog/electromagneticfield
Electro-magnetic field simulation made with CUDA
c cuda cuda-kernels cuda-programming
Last synced: 26 Apr 2026
https://github.com/pharmcat/metidacu.jl
CUDA solver for Metida.jl
cuda julia-language metida mixed-models
Last synced: 27 Apr 2026
https://github.com/enp1s0/curand_fp16
FP16 pseudo random number generator on GPU
cuda gpu half-precision random-number-generators
Last synced: 20 Aug 2025
https://github.com/codingrule/cuda-mbrot
Just another Mandelbrot with CUDA
cuda cuda-toolkit cupy fractal mandelbrot mathematics nvidia
Last synced: 27 Apr 2026
https://github.com/katpercent/raytracing
A foundation for ray tracing using CUDA and parallel computing techniques.
3d cuda engine game parrallel-computing ray raytracing
Last synced: 01 Nov 2025
https://github.com/iag-geo/image-classification
Image classification scripts using YOLOv5 with aerial imagery
cuda image-classification python pytorch swimming-pools yolov5
Last synced: 22 Feb 2026
https://github.com/pjueon/cuda_intellisense
A simple Python script to fix CUDA C++ IntelliSense for Visual Studio.
Last synced: 09 Apr 2026
https://github.com/matteogianferrari/qr-decomposition
This project implements different methods of exploiting cache usage, the multicore CPU, and the GPU architecture in the Gram-Schmidt QR decomposition algorithm, and measures the performance of the different implementations.
cuda openmp parallel-computing
Last synced: 12 Apr 2026
https://github.com/axel-ex/seame-ads-autonomous-lane-detection-24-25
🚗 Real-time lane detection and autonomous steering for JetRacer, powered by ROS2 and GPU-accelerated CV on Jetson Nano.
cuda jetson-nano ros2 tensorrt
Last synced: 27 Apr 2026
https://github.com/enriquebdel/clases-cuda-programacion-paralela-en-c-
In this repository you will find several lessons I created on the CUDA library in C. The program I use for programming is MobaXterm.
c cuda cuda-programming gnu-linux googlecolab mobaxterm nvidia parallel-programming ubuntu university
Last synced: 19 May 2026
https://github.com/linux-alex/geep
GEEP (Genetic Evolutionary Engineering Platform) - a C++/Qt framework for genetic programming, optimized with CUDA acceleration. GEEP enables large-scale population-based optimization, ideal for solving high-dimensional problems using evolutionary algorithms and GPU computing.
cpp cuda framework genetic-programming
Last synced: 18 May 2026
https://github.com/denyskryvytskyi/capgemini-cuda
CUDA implementation of vector addition, matrix multiplication, reduction and sorting
bitonic-sort cpp cuda cuda-kernels gpgpu matrix matrix-multiplication matrix-multiplication-parallel matrix-transpose nvidia nvidia-cuda nvidia-gpu reduction-dimension sort sorting-algorithms-implemented vector vector-addition vectorization
Last synced: 14 May 2026
https://github.com/renatomaynard/a-multiple-population-coarse-grained-genetic-algorithm-to-solve-the-quadratic-assignment-problem-
A Multiple-population coarse-grained Genetic Algorithm to solve the Quadratic Assignment Problem
c cuda genetic-algorithm quadratic-assignment-problem
Last synced: 09 May 2026
https://github.com/maelstrom6/mandelpy
A Mandelbrot and Buddhabrot viewer with GPU acceleration
buddhabrot cuda gpu mandelbrot python3
Last synced: 27 Apr 2026
https://github.com/pkestene/mandelbrot_kokkos
cuda gpu gpu-computing kokkos mandelbrot openmp performance-portability
Last synced: 27 Apr 2026
https://github.com/xusworld/tars
Tars is a cool deep learning framework.
avx2 avx512 cuda deep-learning
Last synced: 27 Apr 2026
https://github.com/brosnanyuen/raybnn_sparse
Sparse Matrix Library for GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cpu cuda gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust sparse sparse-coding sparse-matrix sparse-neural-networks
Last synced: 19 Jan 2026
https://github.com/liberxue/parallel_computing
CUDA Algorithm && Hacker's Delight
algorithms cuda cuda-kernels cuda-programming hacker-s-delight nvidia
Last synced: 24 Feb 2026
https://github.com/ezroot/gacc
GIACC - Generate Images, Art, Code and Conversations
ai codegen cuda huggingface image imagegeneration python rust stablediffusion
Last synced: 06 Apr 2026
https://github.com/abhinavsharma07/streamlit
Stable Diffusion
clip cuda denoising diffusers generative-models latent-diffusion latent-space lms-scheduler unet
Last synced: 28 Apr 2026
https://github.com/jonathanraiman/mini_cuda_rtc
Miniature CUDA Array library with Runtime Compilation
cpp11 cuda jit runtime-compilation
Last synced: 14 Apr 2026
https://github.com/mark0011astra/simplecuda
A library that allows users to easily perform GPU operations using CUDA with a NumPy-like interface.
cuda cupy gpu machine-learning numpy python vector
Last synced: 02 May 2026
https://github.com/sunsided/rust-arrayfire-experiments
Toying around with ArrayFire in Rust
arrayfire conways-game-of-life cuda gpgpu gpu-acceleration gpu-computing opencl rust
Last synced: 28 Apr 2026
https://github.com/alpha74/hungarianalgocuda
Hungarian Algorithm for Linear Assignment Problem implemented using CUDA.
cuda nvcc parallel-computing parallel-programming
Last synced: 01 Jun 2026
https://github.com/dolongbien/cuda
CUDA and Caffe/Caffe2 installation Ubuntu 16.04
c3d-intel-caffe caffe caffe2 cuda cudnn deep-learning ubuntu
Last synced: 28 Apr 2026
https://github.com/fblupi/grado_informatica-ppr
Coursework for the Parallel Programming (Programación Paralela) course at UGR
cuda mpi openmp parallel-computing
Last synced: 22 Apr 2026
https://github.com/xavierjiezou/gpu-compute-capability
An application for querying the compute capability of each GPU released by NVIDIA.
Last synced: 28 Apr 2026
https://github.com/SanaeProject/Matrix-for-Cpp
This repository has types that handle matrices.
cpp14 cpp14-library cuda matrix-library
Last synced: 15 May 2025
https://github.com/hartorn/docker-python
Repository to build python image, based on ubuntu and CUDA
cuda docker mkl-dnn onednn python3 ubuntu ubuntu1804
Last synced: 05 May 2026
https://github.com/quantum-integrated-technologies/deepforge
DeepForge: a framework for working with machine learning.
ai artificial-intelligence cuda library machine-learning ml neural-network
Last synced: 31 Jul 2025
https://github.com/leocelente/basic_cuda
My CUDA source files while learning
Last synced: 29 Apr 2026
https://github.com/thunder-compute/thunder-compute-documentation
Documentation for Thunder Compute, a cloud platform creating technology to virtualize GPUs over TCP
ai artificial-intelligence cloud cloud-computing cuda gpu llm machine-learning nvidia pytorch tensorflow thunder-compute virtualization
Last synced: 15 Oct 2025
https://github.com/andrewboessen/bitonic-merge-sort
Bitonic Merge Sort algorithm optimized for GPU execution
bitonic-merge-sort cuda sorting-network
Last synced: 16 May 2026
https://github.com/Programmer-RD-AI/DetectX
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 04 May 2025
https://github.com/asadiahmad/gesture-detection
Real-time Gesture Detection using CUDA-accelerated OpenCV in Python.
computer-vision cuda gesture-recognition gpu-acceleration open-pose opencv opencv-cuda pose-detection real-time
Last synced: 29 Apr 2026
https://github.com/nofaralfasi/parallel-sequence-alignment
A parallelized version of multiple DNA sequence alignment algorithm with MPI, OpenMP and CUDA
cuda mpi openmp sequence-alignment
Last synced: 29 Apr 2026
https://github.com/m15kh/cuda_programming
CUDA programming enables parallel computing on NVIDIA GPUs for high-performance tasks like deep learning and scientific computing
cuda cuda-programming gpu nvidia parallel-computing practice-programming
Last synced: 03 Apr 2025
https://github.com/ismailtekin05/caloriedetectingai
🍎🔍 Smart AI system that identifies food items in photos and calculates their calorie content automatically. Built with TensorFlow, YOLOv8, CUDA and computer vision for accurate nutrition tracking.
ai aimodel calorie-calculator computer-vision cuda data-analysis data-science data-segmentation data-visualization dataset dataset-generation image-processing image-recognition python segmentation-models tensorflow ultralytics yaml yolo yolov8
Last synced: 29 Apr 2026
https://github.com/lightshade12/kittlespt
A hobby CUDA pathtracing renderer.
3d-graphics computer-graphics cuda gpu path-tracing ray-tracing
Last synced: 18 Mar 2025
https://github.com/kartavyaantani/cuda_image_processing
A CUDA-accelerated image processing project featuring multiple GPU-based filters and enhancement techniques. Implements convolution, edge detection, Non-Local Means (NLM) denoising, K-Nearest Neighbors (KNN), and pixelization. Each operation is optimized using CUDA kernels for real-time performance on large images. The project supports command-line
cuda cuda-kernels cuda-programming cuda-toolkit gpu-programming high-performance-computing image-manipulation image-processing nvidia-cuda nvidia-gpu
Last synced: 30 Apr 2026
https://github.com/eric900115/parallelprogramming
The repository contains the coursework for CS5422, NTHU's Parallel Programming Course.
Last synced: 26 May 2026
https://github.com/bolner/totally-diffused
Debian/NVIDIA Docker image for AUTOMATIC1111's Stable Diffusion application.
automatic1111 cuda debian docker-image nvidia stable-diffusion xformers
Last synced: 11 Apr 2026
https://github.com/baonguyen6742/uv-install-torch
Tutorial to install torch/pytorch with cuda using uv
cuda install installation package python pytorch resolver torch torchaudio torchvision tutorial uv
Last synced: 13 Apr 2026
https://github.com/jxlarrea/homeassistant-voice-recipes
GPU/CUDA-accelerated voice control stack for Home Assistant. Runs on x86/x64 and ARM64 (including the NVIDIA DGX Spark). 100% Local - No Cloud, No Subscriptions.
arm64 cuda dgx-spark gb10 gpu-acceleration home-assistant local-llm qwen3 speech-to-text text-to-speech voice-assistant x86-64
Last synced: 26 May 2026
https://github.com/sarah627/horus_eye_fcih_graduation_project
An AI-powered tourism website using YOLOv7 for real-time landmark detection in images. Built with Flask, PyTorch, and Roboflow for seamless tourist interaction.
computer-vision cuda flask jupyter-notebook kaggle matplotlib object-detection opencv python pytorch roboflow
Last synced: 14 Apr 2026
https://github.com/fynv/cudainline
A CUDA interface for Python. A distillation of the engine part of ThrustRTC.
Last synced: 18 May 2026
https://github.com/dotblueshoes/robertscross
The Roberts cross operator is used in image processing and computer vision for edge detection.
cuda edge-detection image-processing
Last synced: 30 Mar 2025
https://github.com/inventwithdean/cuda_mlp
Implementation of a simple Multilayer Perceptron in pure CUDA
cuda cuda-programming deep-learning neural-networks
Last synced: 30 Mar 2025
https://github.com/tudasc/cusan-tests
A test suite for CUDA-aware MPI race detection
Last synced: 03 May 2026
https://github.com/duskvirkus/ofxarrayfire
An openFrameworks addon with pre-compiled binaries of ArrayFire.
arrayfire cuda ofxaddon openframeworks openframeworks-addon
Last synced: 09 May 2026
https://github.com/varun-1703/eu-act-navigator-rag-qabot
An interactive, privacy-first application for querying the European Union’s AI Act using a local Retrieval-Augmented Generation (RAG) pipeline. Combines semantic search (FAISS) and a quantized TinyLlama LLM for fast, accurate, and context-aware answers—all running on your own hardware.
cuda faiss hugging-face-transformers langchain legal-tech local-slm machine-learning nlp open-source privacy rag-chatbot sentence-transformers streamlit tinyllama
Last synced: 03 May 2026
https://github.com/thomasonzhou/minitorch
rebuilding pytorch: from autograd to convolutions in CUDA
Last synced: 02 Feb 2026