CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-07-02 00:07:18 UTC
https://github.com/lordofhyphens/gpu-path-delay-coverage
CUDA-based Path Delay Fault Coverage
Last synced: 04 May 2026
https://github.com/hit07/ml-dl-torch
This repository contains a comprehensive overview of Machine Learning and Deep Learning using PyTorch
computer-vision convolutional-neural-networks cuda neural-networks pytorch
Last synced: 28 Feb 2025
https://github.com/gaaniruddha/mphil-gpu-imager
This repository contains code for project #1 of MPhil: test-version of GPU imager for a single time-step, single-channel and single time-step, multi-channel.
astronomy benchmarks cuda cufft google-sheets gpu-imager imaging-astronomy interferometry radio-astronomy
Last synced: 11 Jun 2026
https://github.com/alan-cooney/python-cuda-starter-template
Python CUDA Starter Template
Last synced: 30 Mar 2025
https://github.com/jesuscopado/parallel-programming
My solutions for the course Programming Parallel Computers at Aalto University (http://ppc.cs.aalto.fi/). Grade: 5/5
cpp cuda image-segmentation median-filter sorting-algorithms
Last synced: 19 Apr 2026
https://github.com/h4ck3r-04/fpassword
Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.
brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing
Last synced: 16 Jan 2026
https://github.com/sonhm3029/setup-experience
This project stores my setup experience and the errors I met and solved while developing end-to-end AI and software projects
ai computer-vision cuda deep-learning software
Last synced: 10 Jun 2026
https://github.com/shineiarakawa/particle-stabilizer
A C++ and CUDA-based program for simulating the motion of particles.
Last synced: 12 May 2026
https://github.com/promptromp/aws-bootstrap-g4dn
Fast and easy bootstrapping of AWS EC2 instances for CUDA development. Use as a CLI, as a programmatic SDK, or as an Agent Skill!
aws cuda ec2 jupyter-notebook machine-learning mlops python
Last synced: 21 Feb 2026
https://github.com/lucatedeschini/feedforwardnn
This project is my submission for the exam "Project Work in Architecture and Platform for Artificial Intelligence"
c cuda neural-networks openmp scratch-implementation
Last synced: 20 Apr 2026
https://github.com/akshaysinhaaa/emova
A deep learning framework designed for emotion and sentiment recognition using text, audio, and video modalities. This project leverages the MELD (Multimodal EmotionLines Dataset) to train a robust and flexible model that reflects human communication more accurately than unimodal models.
bert cnn cuda deep-learning multimodal python pytorch resnet-18 tensorboard transformers
Last synced: 05 May 2026
https://github.com/flavienbwk/tensorflow2-cuda-10.2-docker
TensorFlow 2.3, CUDA 10.2, Docker-compatible image
cuda docker python3 tensorflow ubuntu1804
Last synced: 11 Apr 2026
https://github.com/kmock930/texture-image-comparison
This project aims to build a model that classifies the type of an unseen image as accurately as possible by implementing, evaluating, and comparing two different multi-layer perceptron neural networks.
computer-vision conda confusion-matrix convolutional-neural-networks cuda image-preprocessing keras keras-tensorflow learning-curve-analysis matplotlib multi-layer-perceptron neural-network pickle-file python3 skimage
Last synced: 12 Apr 2026
https://github.com/mjun0812/setup-cuda
Set up a specific version of NVIDIA CUDA in GitHub Actions on Linux x86_64 and arm64 (Debian- and Fedora-based distributions) and Windows
action cuda cuda-toolkit github-actions
Last synced: 13 Jan 2026
https://github.com/cmazakas/cuda-stuff
A CUDA-based playground
cmake cuda delaunay-triangulation vscode
Last synced: 24 Mar 2025
https://github.com/deepschneider/tinygrad-universal
Universal version of Tinygrad with CUDA and OpenCL support
autograd automatic-differentiation cuda pycuda pyopencl tinygrad tinygrad-cuda
Last synced: 06 Mar 2025
https://github.com/bjornmelin/llm-gpu-optimization
🚄 Advanced LLM optimization techniques using CUDA. Features efficient attention mechanisms, custom CUDA kernels for transformers, and memory-efficient training strategies. ⚡
cuda deep-learning gpu-acceleration llm-optimization machine-learning memory-optimization parallel-computing transformers
Last synced: 18 Mar 2025
https://github.com/ngoma1713/rushirb2001
🤖 Explore advanced AI and machine learning solutions for protein modeling and medical applications, developed by a dedicated data science graduate student.
computer-vision-opencv cuda data-science-portfolio deep-learning generative-ai machine-learning medical-ai protein-modeling published-researcher pytorch quantum-ml rag-chatbot tensorflow
Last synced: 02 May 2026
https://github.com/GTruf/Driver-Drowsiness-Detector
Prototype of an intelligent safety system for detecting driver drowsiness
cpp cuda cudnn deep-learning driver-drowsiness-detection driver-drowsiness-detector drowsiness-detection face-recognition image-recognition machine-learning neural-network nvidia-cuda object-recognition opencv qt6 recognition-neural-network yolo yolov10 yolov5 yolov9
Last synced: 14 Mar 2025
https://github.com/Parxd/cuda-optim
Various CUDA kernels optimized for specific ML algorithms
Last synced: 02 Sep 2025
https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is
Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization
ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow
Last synced: 05 Jan 2026
https://github.com/minseoc03/cuda-100-days
A 100-day journey to master CUDA programming, inspired by the CUDA-120-DAYS--CHALLENGE project. This repo contains daily CUDA exercises and code folders, with learning notes hosted on Notion. Practicing on leetgpu.com due to lack of local NVIDIA GPU.
100daysofcode cuda deeplearning gpgpu gpu hpc nvidia parallel-computing
Last synced: 19 Apr 2025
https://github.com/moesio-f/cla
C Linear Algebra (CLA) library. A simple toy library for basic vector/matrix operations with CUDA support and Python bindings.
Last synced: 09 May 2026
https://github.com/ndgigliotti/torch-ipca
GPU-accelerated Incremental PCA for PyTorch
cuda dimensionality-reduction gpu incremental-pca machine-learning pca pytorch
Last synced: 26 Jan 2026
https://github.com/marnovo/cuda-projects
cuda cuda-kernels gpu gpu-programming nvidia-cuda parallel-computing
Last synced: 10 Jun 2025
https://github.com/ionmich/cs149-local-dev
Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments
conda cs149 cuda ispc parallel-computing
Last synced: 31 Mar 2025
https://github.com/jpodivin/gputomata
Cellular automata running on CUDA capable GPUs
cellular-automata cellular-automaton cuda
Last synced: 07 Nov 2025
https://github.com/mathiasotnes/gemm
General Matrix Multiplication (GEMM) optimization in CUDA.
Last synced: 26 Mar 2025
https://github.com/unknownnuts/meshsdk
Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.
cuda dicom electron emscripten mesh modelling pybind11 stl stomatology threejs wasm
Last synced: 10 Apr 2026
https://github.com/ragu-manjegowda/parallel-programming
Assignments and Projects of Udacity's Introduction to Parallel Programming Course
cuda gpu-programming nvidia-cuda nvidia-gpu udacity-parallel-programming
Last synced: 25 May 2026
https://github.com/d-krylov/cuda_to_opengl
Simple examples for CUDA OpenGL interoperability
Last synced: 01 May 2026
https://github.com/wojcikmikolaj/particles-in-a-jar
Collisions between particles simulated on GPU.
algorithms-and-data-structures collision-detection collisions cuda gpu-programming
Last synced: 25 May 2026
https://github.com/xueeinstein/udacity-cs344-cuda8
Code for Udacity CS344 (Intro to Parallel Programming) using CUDA 8.0
cuda cuda-8 parallel-computing
Last synced: 02 May 2026
https://github.com/kenmalik/cuda-dr-bcg
CUDA C++ implementation of the DR-BCG algorithm for numerically solving linear systems.
cpp cuda hpc numerical-methods
Last synced: 19 Apr 2026
https://github.com/nourmorsy/convolution-neural-network-cuda
Code for optimization to CNN using CUDA
Last synced: 13 May 2026
https://github.com/prateekshukla1108/thunderkittens-docs
Documentation for ThunderKittens framework
Last synced: 18 Mar 2025
https://github.com/moshidev/acap
Coursework for the course Arquitectura y Computación de Altas Prestaciones (High-Performance Architecture and Computing)
cuda homework-assignments mpi pthreads
Last synced: 30 Mar 2025
https://github.com/cuda8/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 10 Mar 2025
https://github.com/shtrophic/wicuvanity
Generate wireguard vanity keys on your Nvidia GPU
cuda gpu vanity-address vanity-addresses vanitygen wireguard
Last synced: 10 Mar 2025
https://github.com/cserajdeep/dnn-iris-pytorch
Deep Neural Network with batch normalization for tabular datasets.
batch batch-normalization classification cuda deep-learning dnn iris-dataset
Last synced: 02 May 2026
https://github.com/monajemi-arman/sparkling
Easy to use Spark cluster management panel with GPU support
apache-spark csharp cuda distributed-computing distributed-learning docker gpu javascript nextjs torch typescript
Last synced: 12 Apr 2026
https://github.com/Neuro-Mechatronics-Interfaces/python-intan
Tools and demos for working with EMG data from Intan using Python
circuitpython cuda emg pico python realtime tensorflow
Last synced: 13 Jan 2026
https://github.com/followthesapper/atlas-q
GPU-accelerated quantum tensor network simulator with adaptive MPS
ai cuda gpu-acceleration high-performance-computing matrix-product-states nisq python pytorch qaoa quantum-algorithms quantum-computing quantum-simulator scientific-computing shors-algorithm tensor-networks triton vqe
Last synced: 20 Jan 2026
https://github.com/kataglyphis/machinelearningalgorithms
Basic Machine Learning Algorithms
cuda machine-learning python tensorflow
Last synced: 31 Mar 2025
https://github.com/m-torhan/advent-of-code
🎄 Solutions for the Advent of Code
advent-of-code advent-of-code-2024 cuda
Last synced: 07 Apr 2025
https://github.com/snandasena/courseera_gpu_specilization_capstone_project
Coursera GPU Specialization Capstone Project
cpp cuda gpu-programming imageprocessing linearalgebra
Last synced: 02 May 2026
https://github.com/tdavidcl/cu_intercept
cuda cuda-memory cuda-programming hook massif memory-tracking preload
Last synced: 03 May 2026
https://github.com/codename-detective/cuda_gpgpus_shared_memory_systems_pdp
CUDA GPGPUs Shared Memory Systems Parallel & Distributed Programming
cuda cuda-programming numa parallel-programming
Last synced: 30 Mar 2025
https://github.com/voltr0x/raytracing-cuda
Raytracing in a weekend using CUDA
Last synced: 01 Apr 2026
https://github.com/kylesayrs/pttp
PyTorch Tensor Profiler with fully-supported memory timelines and events
Last synced: 07 May 2026
https://github.com/kis-balazs/cuda-research
CUDA Research & Code. Course-style structured. Inspiration from @Infatoshi.
Last synced: 14 May 2025
https://github.com/bjornmelin/ai-system-design
🎨 Large-scale AI system architectures and implementations. Features distributed training systems, multi-GPU pipelines, and efficient resource management. 🏗️
architecture cuda distributed-systems engineering gpu-computing production scalability system-design
Last synced: 23 Jul 2025
https://github.com/derek-palmer/dvr-scan-file-organizer
DVR-Scan-Organizer is a Dockerized extension for DVR-Scan, designed to process multiple video files and organize output in a structured format.
cuda dvr dvr-scan multimedia opencv opencv-python python video video-processing
Last synced: 01 May 2026
https://github.com/elymsyr/auv_ws
An open-source simulation and control workspace for an Autonomous Underwater Vehicle (AUV) built on ROS 2 Humble and Gazebo. It features a high-fidelity dynamics model and an advanced AI-based motion controller (FossenNet) that uses a pre-trained LibTorch model to imitate a NL-MPC for real-time, high-performance manoeuvring.
autonomous-vehicles auv control-systems cpp cuda deep-learning gazebo imitation-learning libtorch mpc python robotics ros2 simulation
Last synced: 15 Apr 2026
https://github.com/quik-fe/node-nvidia-smi
Node wrapper around nvidia-smi.
cuda gpu nodejs nvidia nvidia-smi typescript
Last synced: 19 Feb 2026
https://github.com/kronbii/thermal-super-resolution
State-of-the-art thermal super-resolution system (IMDN) with RGB→thermal adaptation, custom multi-component loss, 29.6 dB PSNR, 0.713 SSIM, 250+ FPS, production-ready PyTorch + CUDA implementation.
computer-vision cuda deep-learning image-enhancement imdn model-optimization production-machine-learning pytorch real-time real-time-processing research super-resolution thermal-imaging
Last synced: 18 Apr 2026
https://github.com/asadiahmad/100_sports_image_classification
A deep learning project for sport image classification using a custom VGG19-based architecture with integrated Grad-CAM heatmap visualization for model interpretability.
computer-vision cuda data-augmentation deep-learning explainable-ai gpu-acceleration grad-cam heatmap-visualization image-classification mixed-precision-training pytorch pytorch-grad-cam sports-analytics sports-classification transfer-learning vgg19
Last synced: 11 Jun 2025
https://github.com/ysl1016/cudadigitfilter
CUDA-based parallel image filtering system for MNIST dataset
computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing
Last synced: 28 Mar 2025
https://github.com/maneeshsit/pcie
Modifies the code of run:ai and other FOSS projects for use with PCIe card-based AI accelerators for both inference and training
cuda cxl cxl-mem distro exo k3s k8s kestra llamacpp llm-d mpi4py mpio onnxoptimizer opentelemetry-ebpf-profiler paxos-cluster pcie photonics-computing runai visualize vllm
Last synced: 24 Aug 2025
https://github.com/sshoecraft/shepherd
An interactive multi-backend LLM runtime with intelligent cache eviction and persistent retrieval-augmented memory.
anthropic cli cpp cuda gemini grok inference kv-cache llama-cpp llm mcp ollama openai openai-server rag smart-evictions tensorrt tool-calling ulimited-context
Last synced: 10 Apr 2026
https://github.com/camille-004/cusprec
🏁 Sparse signal recovery library written in PyCUDA.
cuda ml python signal-processing sparse-recovery
Last synced: 18 Jan 2026
https://github.com/sid911/neuralnetworkcpp
A small experiment to learn about neural networks and their runtimes in cpp
cpp cuda machine-learning neural-network
Last synced: 20 Aug 2025
https://github.com/pvgupta24/parallel-programming
Basic algorithms for parallel programming in CUDA C++, Java and OpenMP
cuda openmp parallel-programming
Last synced: 19 Aug 2025
https://github.com/dmalexx/cuda_check
How to check whether CUDA is available in TensorFlow
Last synced: 10 Apr 2026
https://github.com/ojeda-e/fokker-planck
Numerical solution of the Fokker-Planck equation in large times using CUDA/C.
Last synced: 17 Aug 2025
https://github.com/alessiobugetti/integral-image-processing
Implements sequential and parallel integral image computation in C++ and Python, utilizing CUDA for parallel computation on GPU
cuda gpu-acceleration integral-image numba parallel-computing pycuda
Last synced: 24 May 2026
https://github.com/i-m-iron-man/abmax
Abmax is an agent-based modelling framework in Jax, focused on dynamic population size
abm agent agent-based agent-based-modeling agent-based-simulation agents cuda jax python
Last synced: 04 Oct 2025
https://github.com/andreeo/parallel-computing-cuda
Terminal programs applying the parallel programming model with the CUDA architecture
c cpp cuda docker lineal-search parallel-computing parallel-reduction rank-sort-algorithm
Last synced: 09 Apr 2026
https://github.com/nwpu66/cookiekiss-engine
CookieKiss Engine includes a renderer and other small techniques related to computer graphics.
compute-graphics cpp cuda opengl vulkan
Last synced: 09 Apr 2026
https://github.com/tomtolleson/cuda-kernel-benchmarking-tool
A benchmarking tool in C++ that creates CUDA kernels and compares overall system performance between CPU and GPU
cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu
Last synced: 30 Mar 2025
https://github.com/ibrar-syed/complete_deep-learning-nvidia_gpu-setup-linux
Full setup for a deep learning environment on Ubuntu Linux with CUDA, cuDNN, TensorRT, and TensorFlow GPU. Includes scripts, test code, and environment configuration
ai bash conda cuda cudnn deep-learning environment-setup gcc gpu jupyter linux machine-learning nvidia-cuda nvidia-gpu pytorch setup-script tensorflow tensorrt
Last synced: 09 Apr 2026
https://github.com/timdev-r/cv-ground-truth-extraction
(Dump) Helper for ground truth extraction, movement analytics and silhouette visual demonstration
computer-vision cuda ground-truth intel-realsense pandas python
Last synced: 18 Apr 2026
https://github.com/datasagess/fic
NLP Hackathon w/ NN + FastAPI + Docker
catboost cuda docker fastapi lstm python pytorch rapidfuzz tensorflow
Last synced: 08 Aug 2025
https://github.com/dmitryyurov/bitonic-cuda
An implementation of bitonic search on CUDA
cuda gpu-programming sorting-algorithms
Last synced: 02 Oct 2025
https://github.com/sephiroth7712/k-nearest-neigbours
Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.
cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark
Last synced: 12 Apr 2026
https://github.com/conan-kiln/kiln
An actively maintained fork of ConanCenter with an emphasis on CV, ML and robotics capabilities on edge devices
computer-vision conan cuda machine-learning oneapi packaging robotics rust scientific-computing
Last synced: 02 Oct 2025
https://github.com/brave-tarnished/gpu-accelerated-opc
Optical Proximity Correction (OPC) is a photolithography technique that modifies photomask geometry to counteract diffraction and process effects, ensuring accurate printing of patterns on the wafer. This work demonstrates a proof of concept showing how using a GPU-based approach can significantly speed up these modifications compared to a CPU.
cpp cuda gpu-acceleration photolithography semiconductors
Last synced: 02 Oct 2025
https://github.com/sankeer28/pptx-text-audio-transcriber
Extract text and transcribe audio from PowerPoint presentations using OpenAI Whisper.
audio-transcription cuda openai-whisper powerpoint pptx-parser
Last synced: 02 Oct 2025
https://github.com/desmondjs/cuda_mceliece_kem
CUDA-Accelerated McEliece KEM 🔑 | Post-Quantum Cryptography on GPU. Implementation of Classic McEliece key encapsulation, encryption, decryption, and decapsulation on CPU & GPU with CUDA, including benchmarking scripts and full FYP2 report
academic-project benchmarking classic-mceliece cuda fyp gpu-acceleration kem pqc
Last synced: 02 Oct 2025
https://github.com/TeamBipartite/bipartite-gemm
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores
Last synced: 14 Jan 2026
https://github.com/nvaranki/cmmx
CUDA matrix multiplication (official guide, modified)
Last synced: 08 Aug 2025
https://github.com/drilonaliu/parallel-mandelbrot-set
GPU-accelerated Mandelbrot Set generation with CUDA and OpenGL interoperability.
cuda fractals gpu mandelbrot-fractal parallel-programming
Last synced: 12 Apr 2026
https://github.com/aurelienperez/gpu-heston-monte-carlo
GPU-accelerated Monte Carlo simulation for option pricing under the Heston model using CUDA.
Last synced: 01 Apr 2025
https://github.com/nikhilrout/thetensorcoreproject
Microarchitecture implementation of Nvidia's Tensor Cores
cuda floating-point gpgpu hybrid-precision-training tensorcore
Last synced: 01 Apr 2025
https://github.com/f-koehler/itesol
WIP: Iterative eigensolvers for C++20, Python and CUDA
cpp20 cuda eigenvalues linear-algebra python
Last synced: 08 Nov 2025
https://github.com/yutakseo/docker_ubuntu-cuda_environment
🐳 A ready-to-use Docker environment for deep learning development with Ubuntu 22.04 and CUDA 11.8.
container cuda docker environment ubuntu
Last synced: 12 Apr 2026