CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-23 00:07:15 UTC
https://github.com/sahil-rajwar-2004/vector-cuda
vector calculation with GPU acceleration using CUDA
c cpp11 cuda cuda-kernels cuda-programming nvcc
Last synced: 15 May 2025
https://github.com/neel-dandiwala/cuda-programs
Miscellaneous programs for grasping the concepts of parallel computing
cuda gpu-programming parallel-programming
Last synced: 16 May 2025
https://github.com/tchung1970/sd-cli-cuda
CUDA-accelerated Stable Diffusion plugin for wavespeed-desktop
cuda gpu linux nvidia stable-diffusion
Last synced: 09 May 2026
https://github.com/bikrammajhi/100-days-of-gpu
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA kernels, Triton spells, and PTX sorcery.
cuda nsight-compute ptx triton
Last synced: 18 Jun 2025
https://github.com/enapiuz/logic-circuit-simulator
Logic circuit (based on NAND gates) simulator using OpenCL
c circuit-simulator cuda digital-logic gpgpu logic-gates opencl simulator
Last synced: 03 May 2026
https://github.com/kar-dim/CAS-2D
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 01 Nov 2025
https://github.com/mohammadshabazuddin/text_to_speech_generation_with_llm_with_hugging_face
Build a text-to-speech generation system using LLMs and Hugging Face to convert text into natural audio speech.
cuda huggingface-transformers llms nlp
Last synced: 03 May 2026
https://github.com/sid911/neuralnetworkcpp
A small experiment to learn about neural networks and their runtimes in cpp
cpp cuda machine-learning neural-network
Last synced: 20 Aug 2025
https://github.com/lk/gpu-nbody
GPU-accelerated n-body engine for t-SNE and physics simulation
cuda gpu n-body n-body-simulator
Last synced: 02 Sep 2025
https://github.com/drilonaliu/parallel-permutation-cipher
cryptography cuda gpu parallel-programming permutation
Last synced: 19 Jul 2025
https://github.com/phantom7knight/cuda-fusion
This project is for learning CUDA to understand the GPU work better.
cuda cuda-programming gpgpu gpu
Last synced: 17 May 2026
https://github.com/9prady9/archdock
Arch Linux Docker image for app development
arch-linux arrayfire cuda docker-image forge opencl
Last synced: 03 May 2026
https://github.com/aaditya29/parallel-computing-and-cuda
Learning about Parallel Computing and GPU programming using CUDA.
c cpp cuda cuda-kernels cuda-programming nvidia-cuda openmp openmpi parallel-computing parallel-programming
Last synced: 18 Jul 2025
https://github.com/h4ck3r-04/fpassword
Fpassword merges Hashcat's hash-cracking precision with Hydra's parallelized network login, offering penetration testers a powerful tool for swift hash deciphering and simultaneous login attempts across diverse protocols.
brute-force brute-force-attacks c cracking cuda gpgpu hashcat hashes hydra network-security opencl password penetration-testing
Last synced: 16 Jan 2026
https://github.com/chensongpoixs/cmedia_transcode
GPU (CUDA) transcoding version of a media service, supporting H.264 and H.265 transcoding
cuda gpu h264 h265 media transcode-media
Last synced: 19 May 2026
https://github.com/chiragajain/gpu-optimization-roadmap
This repository is part of a structured curriculum designed to master GPU optimization, Triton, Deep Learning, and LLMs. This section focuses on GPU fundamentals, CUDA programming, and PyTorch optimizations.
cuda deeplearning gpu-acceleration learning python pytorch triton
Last synced: 18 Feb 2026
https://github.com/pvgupta24/parallel-programming
Basic algorithms for parallel programming in CUDA C++, Java and OpenMP
cuda openmp parallel-programming
Last synced: 19 Aug 2025
https://github.com/lucatedeschini/feedforwardnn
This project is my submission for the exam "Project Work in Architecture and Platform for Artificial Intelligence"
c cuda neural-networks openmp scratch-implementation
Last synced: 20 Apr 2026
https://github.com/akshaysinhaaa/emova
A deep learning framework designed for emotion and sentiment recognition using text, audio, and video modalities. This project leverages the MELD (Multimodal EmotionLines Dataset) to train a robust and flexible model that reflects human communication more accurately than unimodal models.
bert cnn cuda deep-learning multimodal python pytorch resnet-18 tensorboard transformers
Last synced: 05 May 2026
https://github.com/drilonaliu/parallel-image-edge-detection
cuda edge-detection gpu image-processing
Last synced: 17 May 2026
https://github.com/kratugautam99/logiclink-project
LogicLink is a conversational AI chatbot developed by Kratu Gautam (AIML Engineer). Powered by the TinyLlama-1.1B-Chat-v1.0 model, it provides an interactive interface for engaging conversations, query resolution, and task assistance. Version 5 features streaming responses, conversation management, and a sleek GUI.
antd-design chatbot-application conversational-ai cuda gradio graphical-user-interface huggingface-spaces huggingface-transformers jupyter-notebooks keras large-language-models mlops model-service-controller modelscope-studio natural-language-generation natural-language-processing pytorch reasoning-agent tensorflow
Last synced: 07 Apr 2026
https://github.com/cmazakas/cuda-stuff
A CUDA-based playground
cmake cuda delaunay-triangulation vscode
Last synced: 24 Mar 2025
https://github.com/dmalexx/cuda_check
How to check whether CUDA is available in TensorFlow
Last synced: 10 Apr 2026
https://github.com/Parxd/cuda-optim
various CUDA kernels optimized for specific ML algos
Last synced: 02 Sep 2025
https://github.com/mattjesc/federated-learning-simulation-1gpu-mi-is
Federated Learning Simulation on a Single GPU with Model Interpretability and Interactive Visualization
ai cuda deep-learning distributed-systems federated-learning gpu hpc keras machine-learning ml model-interpretability python pytorch simulation streamlit tensorflow
Last synced: 05 Jan 2026
https://github.com/hnthap/vietnamese-word-segment
Vietnamese word segmentation package.
cuda torch transformers vietnamese vietnamese-nlp vietnamese-tokenizer word-segmentation
Last synced: 19 May 2026
https://github.com/minseoc03/cuda-100-days
A 100-day journey to master CUDA programming, inspired by the CUDA-120-DAYS--CHALLENGE project. This repo contains daily CUDA exercises and code folders, with learning notes hosted on Notion. Practicing on leetgpu.com due to lack of local NVIDIA GPU.
100daysofcode cuda deeplearning gpgpu gpu hpc nvidia parallel-computing
Last synced: 19 Apr 2025
https://github.com/moesio-f/cla
C Linear Algebra (CLA) library. A simple toy library for basic vector/matrix operations with CUDA support and Python bindings.
Last synced: 09 May 2026
https://github.com/ndgigliotti/torch-ipca
GPU-accelerated Incremental PCA for PyTorch
cuda dimensionality-reduction gpu incremental-pca machine-learning pca pytorch
Last synced: 26 Jan 2026
https://github.com/marnovo/cuda-projects
cuda cuda-kernels gpu gpu-programming nvidia-cuda parallel-computing
Last synced: 10 Jun 2025
https://github.com/ionmich/cs149-local-dev
Provides `conda` installation instructions for Stanford's CS149 (Parallel Computing) programming assignments
conda cs149 cuda ispc parallel-computing
Last synced: 31 Mar 2025
https://github.com/kar-dim/cas-2d
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA/OpenCL, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 22 Jun 2025
https://github.com/kanchishimono/python-images
Ubuntu based Python container images, including CUDA images
container-image cuda docker dockerfile machine-learning python python3
Last synced: 30 Apr 2026
https://github.com/rkarahul/person-detector-faceverifier
Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.
bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8
Last synced: 07 Apr 2026
https://github.com/muneeb706/cuda
sample programs implemented using cuda (gpu)
cplusplus cuda gpu-programming
Last synced: 19 May 2026
https://github.com/cuda8/brainwords2
GPU brainflayer for sale $250
brain brainflayer brainwords cuda gpu key pass passphrase private
Last synced: 10 Mar 2025
https://github.com/shtrophic/wicuvanity
Generate wireguard vanity keys on your Nvidia GPU
cuda gpu vanity-address vanity-addresses vanitygen wireguard
Last synced: 10 Mar 2025
https://github.com/monajemi-arman/sparkling
Easy to use Spark cluster management panel with GPU support
apache-spark csharp cuda distributed-computing distributed-learning docker gpu javascript nextjs torch typescript
Last synced: 12 Apr 2026
https://github.com/Neuro-Mechatronics-Interfaces/python-intan
Tools and demos for working with EMG data from intan using python
circuitpython cuda emg pico python realtime tensorflow
Last synced: 13 Jan 2026
https://github.com/ojeda-e/fokker-planck
Numerical solution of the Fokker-Planck equation in large times using CUDA/C.
Last synced: 17 Aug 2025
https://github.com/kataglyphis/machinelearningalgorithms
Basic Machine Learning Algorithms
cuda machine-learning python tensorflow
Last synced: 31 Mar 2025
https://github.com/tdavidcl/cu_intercept
cuda cuda-memory cuda-programming hook massif memory-tracking preload
Last synced: 03 May 2026
https://github.com/codename-detective/cuda_gpgpus_shared_memory_systems_pdp
CUDA GPGPUs Shared Memory Systems Parallel & Distributed Programming
cuda cuda-programming numa parallel-programming
Last synced: 30 Mar 2025
https://github.com/voltr0x/raytracing-cuda
Raytracing in a weekend using CUDA
Last synced: 01 Apr 2026
https://github.com/alessiobugetti/integral-image-processing
Implements sequential and parallel integral image computation in C++ and Python, utilizing CUDA for parallel computation on GPU
cuda gpu-acceleration integral-image numba parallel-computing pycuda
Last synced: 24 May 2026
https://github.com/i-m-iron-man/abmax
Abmax is an agent-based modelling framework in Jax, focused on dynamic population size
abm agent agent-based agent-based-modeling agent-based-simulation agents cuda jax python
Last synced: 04 Oct 2025
https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04
Installation of CUDA 11.8 with cuDNN 8.7 on an Ubuntu 20.04 server with an A30 GPU, plus an ONNX GPU installation guide
cuda gpu linux onnxruntime server
Last synced: 16 May 2026
https://github.com/patriciobcs/mini-aevol
Parallel implementation of a reduced version of the Aevol simulator
Last synced: 19 May 2026
https://github.com/kronbii/thermal-super-resolution
State-of-the-art thermal super-resolution system (IMDN) with RGB→thermal adaptation, custom multi-component loss, 29.6 dB PSNR, 0.713 SSIM, 250+ FPS, production-ready PyTorch + CUDA implementation.
computer-vision cuda deep-learning image-enhancement imdn model-optimization production-machine-learning pytorch real-time real-time-processing research super-resolution thermal-imaging
Last synced: 18 Apr 2026
https://github.com/asadiahmad/100_sports_image_classification
A deep learning project for sport image classification using a custom VGG19-based architecture with integrated Grad-CAM heatmap visualization for model interpretability.
computer-vision cuda data-augmentation deep-learning explainable-ai gpu-acceleration grad-cam heatmap-visualization image-classification mixed-precision-training pytorch pytorch-grad-cam sports-analytics sports-classification transfer-learning vgg19
Last synced: 11 Jun 2025
https://github.com/andreeo/parallel-computing-cuda
Terminal programs applying the parallel programming model with the CUDA architecture
c cpp cuda docker lineal-search parallel-computing parallel-reduction rank-sort-algorithm
Last synced: 09 Apr 2026
https://github.com/ysl1016/cudadigitfilter
CUDA-based parallel image filtering system for MNIST dataset
computer-vision cuda deep-learning gpu-acceleration image-processing mnist parallel-computing
Last synced: 28 Mar 2025
https://github.com/flosmume/cpp-cuda-deepvision-rtx-starter
CUDA C++ practice project for RTX 4070 SUPER — explore GPU concurrency, pinned memory, and Nsight profiling. Includes SAXPY and 2D blur kernels to train optimization, stream overlap, and timing analysis for NVIDIA Developer Technology Engineering skillset.
cpp cuda cuda-kernels cuda-streams deep-learning-inference gpu gpu-optimization gpu-profiling high-performance-computing nsight nvidia parrallel-computing pinned-memory
Last synced: 16 May 2026
https://github.com/nwpu66/cookiekiss-engine
CookieKiss Engine includes a renderer and other small techniques related to computer graphics.
compute-graphics cpp cuda opengl vulkan
Last synced: 09 Apr 2026
https://github.com/ahmadrafidev/learn-cuda
A place where I learn about CUDA
cuda cuda-programming gpu os parallel-programming
Last synced: 13 Apr 2025
https://github.com/drilonaliu/parallel-fractal-tree
GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.
cuda fractal-tree fractals gpu
Last synced: 19 May 2026
https://github.com/ibrar-syed/complete_deep-learning-nvidia_gpu-setup-linux
Full setup for a deep learning environment on Ubuntu Linux with CUDA, cuDNN, TensorRT, and TensorFlow GPU. Includes scripts, test code, and environment configuration
ai bash conda cuda cudnn deep-learning environment-setup gcc gpu jupyter linux machine-learning nvidia-cuda nvidia-gpu pytorch setup-script tensorflow tensorrt
Last synced: 09 Apr 2026
https://github.com/tomtolleson/cuda-kernel-benchmarking-tool
A benchmarking tool in C++ that creates CUDA kernels and compares overall system performance between CPU and GPU
cuda cuda-kernels cuda-support cuda-toolkit nvidia nvidia-cuda nvidia-gpu
Last synced: 30 Mar 2025
https://github.com/timdev-r/cv-ground-truth-extraction
(Dump) Helper for ground truth extraction, movement analytics and silhouette visual demonstration
computer-vision cuda ground-truth intel-realsense pandas python
Last synced: 18 Apr 2026
https://github.com/grindelfp/cuda-n-body-simulation
Simulation of N-Body movement using CUDA.
Last synced: 06 Apr 2025
https://github.com/sephiroth7712/k-nearest-neigbours
Implementation of K-Nearest Neighbors algorithm using multiple parallel computing approaches: CUDA (GPU), Hadoop, Spark, MPI, OpenMP, and PThreads. Demonstrates scalable machine learning across different parallel computing paradigms from GPU to distributed frameworks.
cuda cuda-programming hadoop-mapreduce java mpi multiprocessing multithreading openmp pthreads scala spark
Last synced: 12 Apr 2026
https://github.com/datasagess/fic
NLP Hackathon w/ NN + FastAPI + Docker
catboost cuda docker fastapi lstm python pytorch rapidfuzz tensorflow
Last synced: 08 Aug 2025
https://github.com/dmitryyurov/bitonic-cuda
An implementation of bitonic search on CUDA
cuda gpu-programming sorting-algorithms
Last synced: 02 Oct 2025
https://github.com/TeamBipartite/bipartite-gemm
High-throughput data-parallel GEMM implementations in CUDA using CUDA cores and Tensor Cores
Last synced: 14 Jan 2026
https://github.com/aeyage/intraday_prices
GPU-accelerated portfolio optimisation
Last synced: 05 Apr 2025
https://github.com/drilonaliu/parallel-mandelbrot-set
GPU-accelerated Mandelbrot Set generation with CUDA and OpenGL interoperability.
cuda fractals gpu mandelbrot-fractal parallel-programming
Last synced: 12 Apr 2026
https://github.com/farukalamai/cpp-for-cuda
A structured C++ learning path designed specifically for developers preparing to learn CUDA programming.
Last synced: 09 Jun 2026
https://github.com/conan-kiln/kiln
An actively maintained fork of ConanCenter with an emphasis on CV, ML and robotics capabilities on edge devices
computer-vision conan cuda machine-learning oneapi packaging robotics rust scientific-computing
Last synced: 02 Oct 2025
https://github.com/aurelienperez/gpu-heston-monte-carlo
GPU-accelerated Monte Carlo simulation for option pricing under the Heston model using CUDA.
Last synced: 01 Apr 2025
https://github.com/nikhilrout/thetensorcoreproject
Microarchitecture implementation of Nvidia's Tensor Cores
cuda floating-point gpgpu hybrid-precision-training tensorcore
Last synced: 01 Apr 2025
https://github.com/brave-tarnished/gpu-accelerated-opc
Optical Proximity Correction (OPC) is a photolithography technique that modifies photomask geometry to counteract diffraction and process effects, ensuring accurate printing of patterns on the wafer. This work demonstrates a proof of concept showing how using a GPU-based approach can significantly speed up these modifications compared to a CPU.
cpp cuda gpu-acceleration photolithography semiconductors
Last synced: 02 Oct 2025
https://github.com/yutakseo/docker_ubuntu-cuda_environment
🐳 A ready-to-use Docker environment for deep learning development with Ubuntu 22.04 and CUDA 11.8.
container cuda docker environment ubuntu
Last synced: 12 Apr 2026
https://github.com/sankeer28/pptx-text-audio-transcriber
Extract text and transcribe audio from PowerPoint presentations using OpenAI Whisper.
audio-transcription cuda openai-whisper powerpoint pptx-parser
Last synced: 02 Oct 2025
https://github.com/isquicha/cuda-parallel-studies
Learning CUDA programming here =D
cuda cuda-programming cuda-toolkit
Last synced: 03 Jul 2025
https://github.com/desmondjs/cuda_mceliece_kem
CUDA-Accelerated McEliece KEM 🔑 | Post-Quantum Cryptography on GPU. Implementation of Classic McEliece key encapsulation, encryption, decryption, and decapsulation on CPU and GPU with CUDA, including benchmarking scripts and the full FYP2 report.
academic-project benchmarking classic-mceliece cuda fyp gpu-acceleration kem pqc
Last synced: 02 Oct 2025
https://github.com/nvaranki/cmmx
CUDA matrix multiplication (official guide, modified)
Last synced: 08 Aug 2025
https://github.com/jadc/cuda-raytracer
A simple path tracer written in CUDA.
cpp cuda gpu-programming graphics parallel-programming path-tracing raytracing
Last synced: 16 May 2026
https://github.com/fikri-rouzan/cuda-c-program-part-1
CUDA C program from NVIDIA course.
Last synced: 12 Apr 2026
https://github.com/crazyguitar/libefaxx
aws benchmark cpp20-coroutine cuda efa gpu gpu-benchmarks hpc large-language-models llm rdma rdma-benchmarks
Last synced: 16 Jan 2026
https://github.com/toshikinakamura0412/dockerfiles
Development environment using Docker for some Linux distributions
alpine bash cuda debian devcontainer devcontainers docker docker-compose fedora opencv opensuse ros ros-humble ros-noetic ros2 ubuntu ubuntu2004 ubuntu2204 vscode zsh
Last synced: 10 Jul 2025
https://github.com/alpinebuster/meshlib
Mesh processing library with extra `C/C#/JS/TS/PYTHON` bindings.
cuda dicom electron emscripten mesh mesh-modelling pybind11 stl stomatology threejs wasm
Last synced: 03 Jul 2025
https://github.com/ne0nwinds/gpupuzzles
My solutions to srush/GPU-Puzzles using CUDA
Last synced: 16 May 2026
https://github.com/ivanfioravanti/tflops_mps
TFLOPs testing on MPS and CUDA
Last synced: 19 May 2026
https://github.com/pipecruz/cuda-flocking-sim
CPU and GPU (CUDA) implementations of naive/optimized flocking algorithms
Last synced: 07 May 2026