CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: NVIDIA
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-22 00:07:17 UTC
https://github.com/david-palma/cuda-programming
Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.
c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads
Last synced: 25 Apr 2026
https://github.com/crcrpar/dev-chainer
Dockerfile for Chainer Development in VSCode
chainer cuda docker nvidia-docker vscode
Last synced: 26 Apr 2026
https://github.com/haleelrah/Vision-pro-MAX
A Raspberry Pi-based object detection system for assisting visually impaired individuals. This project utilizes YOLO object detection and a Hailo 8L TPU to identify obstacles like manholes, potholes, and bumps, providing real-time audio feedback to aid navigation.
bash computer-vision cuda fine-tuning jupyter-notebook object-detection opencv python pytorch raspberry-pi rpi-camera ssh text-to-speech ultralytics yolo yolov8
Last synced: 30 Dec 2025
https://github.com/gravitytwog/electromagneticfield
Electro-magnetic field simulation made with CUDA
c cuda cuda-kernels cuda-programming
Last synced: 26 Apr 2026
https://github.com/linux-alex/geep
GEEP (Genetic Evolutionary Engineering Platform) - a C++/Qt framework for genetic programming, optimized with CUDA acceleration. GEEP enables large-scale population-based optimization, ideal for solving high-dimensional problems using evolutionary algorithms and GPU computing.
cpp cuda framework genetic-programming
Last synced: 18 May 2026
https://github.com/oaslananka/cv_cuda_cpp_sample
This is a sample project demonstrating how to use OpenCV and CUDA in C++ for detecting people in drone footage with YOLO. The project aims to be simple and understandable for those who want to learn how to use OpenCV and CUDA in C++.
computervision cpp cuda opencv
Last synced: 01 May 2026
https://github.com/baremetalrt/baremetalrt
BareMetalRT — edge GPU compute mesh
cuda distributed-computing gpu inference llm nvidia tensorrt windows
Last synced: 18 Apr 2026
https://github.com/sergiomarquezdev/yt-transcriber
🛠️ CLI tool to transcribe YouTube videos using OpenAI Whisper with CUDA acceleration, generate AI summaries (EN/ES) with Gemini, and create LinkedIn/Twitter content. Supports YouTube, Google Drive, and local files.
ai cli cuda gemini python transcription whisper youtube
Last synced: 15 May 2026
https://github.com/separatrixxx/pgp_labs_7_sem
👓 Laboratory work for the 7th semester of MAI on PGP and PDP
Last synced: 15 May 2026
https://github.com/mohammadshabazuddin/text_to_speech_generation_with_llm_with_hugging_face
Build a text-to-speech generation system using LLMs and Hugging Face to convert text into natural audio speech.
cuda huggingface-transformers llms nlp
Last synced: 03 May 2026
https://github.com/muppetsg2/cudaraytracer
A custom ray tracer originally developed during university studies to run on CPU, now ported to GPU using CUDA. This project was created to explore GPU rendering techniques and to gain hands-on experience with CUDA programming.
cuda mit-license nvidia-cuda nvidia-gpu raytracing sfml stb-image student-project study-project
Last synced: 16 Apr 2026
https://github.com/9prady9/archdock
Arch Linux Docker image for app development
arch-linux arrayfire cuda docker-image forge opencl
Last synced: 03 May 2026
https://github.com/tornikeo/minimal-vscode-cuda-meson
Minimal sample of using VSCode and Meson to build CUDA applications
Last synced: 08 Sep 2025
https://github.com/lablup/backend.ai-accelerator-cuda
The Backend.AI CUDA Accelerator Plugin
Last synced: 16 May 2026
https://github.com/elcruzo/cuda-conv
Lightweight CUDA kernel for 2D image convolution achieving 20x+ speedup. Built with CuPy for the NVIDIA Hackathon.
computer-vision convolution cuda cupy gpu-computing hackathon high-performance-computing image-processing nvidia python
Last synced: 15 May 2026
https://github.com/yashpotdar-py/flood-vision
Flood Vision - A deep learning–based computer vision system for flood mapping and damage assessment using aerial imagery.
cuda deep-learning flood-detection iot python
Last synced: 16 Apr 2026
https://github.com/sferez/sspp_sparse_matrix_cuda
Small Scale Parallel Programming, Sparse Matrix multiplication with CUDA
cpp cuda omp omp-parallel parallel-computing small-scale-parallel-programming sparse-matrix
Last synced: 30 Apr 2026
https://github.com/aaaastark/nvidia-cuda-google-colab
Deployment of NVIDIA CUDA on Google Colab, with example code (vector addition and matrix multiplication).
c cpp cuda googlecolab googlecolaboratory matrix-multiplication nvidia python vector-addition
Last synced: 16 Apr 2026
https://github.com/alexjmercer/cuda-npp-assignment
Learning about CUDA and NVIDIA Performance Primitives. Part of Coursera Assignment.
Last synced: 13 Feb 2026
https://github.com/tlabaltoh/tlab-sharescreen-server-win
Software frame encoder that uses CUDA and casts encoded frames over UDP. An attempt to implement a custom streaming protocol and a shader-based frame encoder/decoder for screencasting.
cuda desktop-capture screensharing unity unity3d windows-graphics-capture
Last synced: 14 Feb 2026
https://github.com/lehoangan2906/cuda_basics
A simple implementation of operations on vectors and matrices, optimized for running on Nvidia GPU with CUDA
Last synced: 16 Jun 2025
https://github.com/ankhoa1212/cuda-program
This is a GPU program built with CUDA using parallel reduction
cpp cuda curand gpu-programming parallel-reduction
Last synced: 14 Feb 2026
https://github.com/nagharjun17/mlir-to-ptx-cuda
Creating an MLIR dialect that fuses Addition + ReLU, lowers to NVVM and LLVM IR and generates PTX to run the kernel on CUDA GPU
cpp cuda deep-learning llvm mlir ptx
Last synced: 18 Apr 2026
https://github.com/mattjesc/gpu-accelerated-fap
GPU-Accelerated Frequency Analysis Prototype using CUDA, Unit Testing, and User-Defined Settings
c cmake cpp cuda cufft googletest gpu gpu-acceleration gpu-computing gpu-programming nvidia signal-processing test test-automation testing unit-testing
Last synced: 16 Apr 2026
https://github.com/smoke-y/athena
Deep learning library
cuda deep-learning deep-learning-library
Last synced: 01 Mar 2026
https://github.com/aarid/cuda_operations
This project compares CPU and GPU performance for CUDA operations. Two simple cases are used: matrix multiplication and 2D convolution.
conv2d cuda cuda-programming gpu gpu-computing matrix-multiplication
Last synced: 02 Mar 2026
https://github.com/anselm67/cuda_mnist
A CUDA implementation of MNIST - for CUDA beginners.
cuda gpu gpu-computing gpu-programming mnist mnist-classification
Last synced: 02 Mar 2026
https://github.com/deltatecs/voses
Volatile Secret Searcher - massively parallel, brute force memory dump analysis for (D)TLS secret extraction
cuda memory-hacking reverse-engineering tls
Last synced: 15 Jun 2025
https://github.com/atticuszeller/pytorch-lightning-uv
📦 Zero-config Deep Learning template with PyTorch Lightning, UV package manager, W&B tracking, and modern Python tooling 🚀
classification cuda deep-learning machine-learning mnist-classification python pytorch pytorch-lightning typer uv
Last synced: 16 Apr 2026
https://github.com/viktor-akusoff/chernabogpy
ChernabogPy is a Python package for visualizing gravitational distortions caused by black holes using nonlinear ray tracing.
cuda gpu physics-simulation python3 relativity-of-space-and-time torch
Last synced: 15 May 2026
https://github.com/eagleeee2/ethminer
EthMiner is a powerful Ethereum mining software optimized for GPU performance using OpenCL and CUDA technologies. It provides easy setup, detailed performance metrics, and robust compatibility with major mining pools, ensuring maximum efficiency and profitability for both novice and experienced miners.
cryptocurrency cuda eth ethash ethereum ethereum-mining gpu-mining mining-pool mining-software open-source
Last synced: 16 Apr 2026
https://github.com/harmeshgv/gpu-powered-bert-finetuning
Efficient fine-tuning of BERT models using CUDA-powered GPUs, optimized for laptops and devices with NVIDIA RTX 3000/4000 series or CUDA-compatible GPUs. Ideal for fast NLP model training with PyTorch and Hugging Face Transformers.
bert-model cuda finetuning-llms pytorch
Last synced: 16 Apr 2026
https://github.com/zury7/parallel-programming
A collection of performance optimizations and comparisons between multiprocessing and multithreading using pthreads, OpenMP, and CUDA. The experiments analyze execution speed, resource usage, and parallelization efficiency across different computational models. (CS 4553: Scientific Computing)
Last synced: 08 May 2026
https://github.com/grizzz13/minimal-cuda
Minimal configuration for setting up CUDA C++ with CMake.
Last synced: 18 Apr 2026
https://github.com/AMYPAD/miutil
Basic functionality needed for AMYPAD
cuda matlab medical-imaging python
Last synced: 10 Apr 2025
https://github.com/phrutis/bip39scan
brute bip39 mnemonic GPU - $250
bip39 brute brute-force bruteforce cuda gpu mnemonic phrases seed
Last synced: 10 Apr 2025
https://github.com/joe-mruz/hgvisualizer
An interactive simulation and visualization tool for evolving hypergraphs, inspired by the Wolfram Physics Project.
cpp cuda hypergraph physics simulator wolfram
Last synced: 02 May 2026
https://github.com/iebeid/cuda-particles
A simple visualization of particles calculated using CUDA
Last synced: 17 Apr 2026
https://github.com/farukalamai/cpp-for-cuda
A structured C++ learning path designed specifically for developers preparing to learn CUDA programming.
Last synced: 09 Jun 2026
https://github.com/jonmarty/pycuda-kmeans
A parallelized PyCuda implementation of the KMeans clustering algorithm.
Last synced: 25 Apr 2026
https://github.com/jdibenes/game_of_life_cuda
OpenGL / CUDA implementation of Conway's Game of Life.
cpp cuda opengl qt6 simulation
Last synced: 02 Apr 2026
https://github.com/chrisdalvit/gpu-matrix-transpose
Implementation and benchmarking of different matrix transpose kernels with CUDA
c cpp cuda cuda-kernels cuda-programming gpu-acceleration gpu-computing gpu-programming matrix-transpose nvidia-gpu
Last synced: 17 Apr 2026
https://github.com/leo27945875/pybind11_cuda_matmul
cpp cuda matrix-multiplication pybind11 python3
Last synced: 17 Apr 2026
https://github.com/antoniakras/semantic-video-search
GPU-optimized semantic search on video transcripts, with benchmarking of FAISS, Pinecone, and PostgreSQL vector databases. Deployed via Docker on FORTH’s GPU infrastructure.
bert-embeddings bert-fine-tuning cuda dokcer embedding-models embeddings-word2vec faiss-vector-database gpu-computing huggingface-transformers nlp-machine-learning pgvector pineconedb postgresql python pytorch retrieval-augmented-generation similarity-search vector-database whisper-ai
Last synced: 03 May 2026
https://github.com/loreloc/triturus
A bunch of Triton kernels with increasing complexity for learning and exploring Triton and GPU programming
Last synced: 17 Apr 2026
https://github.com/stckvrflw/pem-spgemm
pemSpGEMM - An Improved SpGEMM Algorithm
Last synced: 17 Apr 2026
https://github.com/void4main/bifurcation-diagram
These little python scripts plot a bifurcation diagram into a png file (work fine on a raspberry pi and accelerated on a NVIDIA Jetson Nano) - but still a lot of room for improvements ...
bifurcation cuda feigenbaum gpu jetson logistic map nano numba sequence vectorize
Last synced: 17 Apr 2026
https://github.com/bjornmelin/ml-production-engineering
⚙️ End-to-end ML deployment solutions. Focused on model serving, multi-GPU optimization, and production-grade system implementation. 🎯
cuda deployment docker fastapi gpu-computing kubernetes mlops production
Last synced: 17 Apr 2026
https://github.com/bjornmelin/nlp-engineering-hub
📚 Enterprise NLP systems and LLM applications. Features custom language model implementations, distributed training pipelines, and efficient inference systems. 🔤
cuda gpu-optimization huggingface huggingface-transformers langchain language-models large-language-models nlp openai python transformers
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-rvc
GPU service for voice cloning via Retrieval-based Voice Conversion (CUDA + ROCm)
cuda docker gpu rocm rvc tts voice-cloning
Last synced: 17 Apr 2026
https://github.com/vibesmiths/mcp-musicgen
GPU service for text-to-music generation via Meta AudioCraft (CUDA + ROCm)
audiocraft cuda docker gpu musicgen python rocm text-to-music
Last synced: 17 Apr 2026
https://github.com/briiqn/obj2schem
A CUDA enabled .obj model to schematic (Sponge V3) converter
cuda minecraft schematics wavefront-obj worldedit
Last synced: 17 Apr 2026
https://github.com/cs550-epfl/report
EPFL CS-550 project report
cuda formal-verification gpu memory-consistency ptx simt
Last synced: 03 Jun 2026
https://github.com/kentakoong/mtnlog
A simple multinode performance logger for Python
cuda lanta nvitop python slurm-cluster
Last synced: 11 Jan 2026
https://github.com/synapticore-io/torch-cuda
PyTorch CUDA 12.8 project template - GPU-accelerated ML with modern Python packaging (uv), checkpointing, early stopping, and reproducibility utilities.
cuda gpu project-template python pytorch
Last synced: 04 Apr 2026
https://github.com/seieric/pytorch-mpi-singularity
Singularity Container including PyTorch with CUDA and mpi backend for DistributedDataParallel
cuda hpc nvidia openmpi pytorch singularity utokyo
Last synced: 18 Apr 2026
https://github.com/thalesmg/haskell-accelerate-parconc
Example and benchmark of Accelerate-HS from Parallel and Concurrent Programming in Haskell
accelerate cuda gpu-computing haskell parallel-computing
Last synced: 18 Apr 2026
https://github.com/qanastek/concurency-tetravex
This software is a fast and reliable Tetravex solver based on C++ and CUDA.
c-plus-plus cuda parrallel-computing tetravex
Last synced: 18 Apr 2026
https://github.com/abdelrahman-amen/active_learning_in_nlp
I applied active learning to the IMDB dataset for sentiment analysis. Starting with a small labeled subset, I trained a model and used uncertainty sampling to select and label challenging reviews. This iterative process improved performance while reducing labeling effort.
activelearning cuda entropy imdb-dataset margin nlp python sklearnex torch uncertainty
Last synced: 18 Apr 2026
https://github.com/betarixm/csed490c
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cuda gpu parallel-computing postech
Last synced: 19 Apr 2026
https://github.com/flavienbwk/nvidia-cuda-mirror-docker
An all-in-one mirror for installing NVIDIA Docker.
cuda docker linux-mirror mirror nvidia nvidia-docker nvidia-docker2 offline offline-capable
Last synced: 18 Apr 2026
https://github.com/cooliron2311/cumd5bf
CUDA based md5 password bruteforcer
Last synced: 18 Apr 2026
https://github.com/marcellodesales/nvidea-cuda-ubuntu-docker
NVIDIA CUDA base image on Ubuntu Linux, used for running machine learning
ai cuda docker docker-compose machine-learning ml nvidia-docker ubuntu
Last synced: 18 Apr 2026
https://github.com/equiel-1703/cuhip
Wrapper tool to convert CUDA source code to HIP code and compile it with HIPCC. Useful for learning CUDA programming using AMD devices.
Last synced: 14 May 2026
https://github.com/dmmutua/cuda_projects
An Implementation of a variety of Algorithms & Technical Papers Mostly Related to Machine Learning & Deep Learning in CUDA C
c cuda cuda-programming deep-learning machine-learning machine-learning-algorithms
Last synced: 18 Apr 2026
https://github.com/genpat-it/ohe-rs
Ultra-fast one-hot encoding for bioinformatics and ML, powered by Rust + CUDA. Built for cgMLST allele profiles and large-scale categorical data.
bioinformatics cuda machine-learning one-hot-encoding performance pyo3 python rust
Last synced: 04 Jun 2026
https://github.com/ex539/docker-dev-env
A collection of ready-to-use Docker development environments for multiple Linux distributions (Ubuntu, Debian, Alpine, Arch, Kali). Includes shared configurations, utility scripts, and comprehensive documentation for reproducible development setups across teams and CI/CD pipelines.
big-data cpp cuda docker docker-image docker-php docker-setup environment hadoop jenkins kubernetes qtcreator reproducibility x11
Last synced: 05 Apr 2026
https://github.com/aditiisaxena/cuda-accelerated-box-filter-for-texture-image-enhancement
Enhances grayscale texture images using a CUDA-based box filter. Built with CUDA, C++14, and OpenCV for high-performance image processing.
cpp cuda gpu-programming linux nvidia opencv
Last synced: 18 Apr 2026
https://github.com/aledinola/ifp_cuda_mex
Solve the income fluctuation problem on the GPU
Last synced: 14 May 2026
https://github.com/intelav/gpu-agent-opt
AI Agent Framework for GPU Kernel Autotuning & Optimization. Automate CUDA kernel exploration, profiling, and tuning with AI-driven agents for deep learning, geospatial AI, and HPC workloads.
ai-agents autotuning cuda deep-l edge-ai geospatial gpu hpc nvidia optimization performance pytorch
Last synced: 19 Apr 2026
https://github.com/vicen-te/tiny-nn
A tiny neural network framework for fully-connected layers with CPU and CUDA support
backpropagation cplusplus-20 cpu cuda cuda-12-8 kernel multi-threaded neural-network nn
Last synced: 19 Apr 2026
https://github.com/timanema/msc-thesis-public
Repository containing a GPU-accelerated compressor based on FSST
compression cpp cuda gpu thesis
Last synced: 19 Apr 2026
https://github.com/ronaldsg20/compu-paralela
Example code for parallel and distributed computing
cuda opencv openmp posix-threads
Last synced: 14 May 2026
https://github.com/fatlipp/toyslam
SLAM implementation from scratch w/o external graph optimization libs
cuda gpu lidar-slam mapping odometry robotics slam
Last synced: 20 Apr 2026
https://github.com/ydkn/htw-progko-cuda
Parallel processing of image transformations. Part of the "Programmierkonzepte und Algorithmen" course at HTW-Berlin.
cuda image-transformations opencv
Last synced: 20 Apr 2026
https://github.com/rtfirst/voice-to-text
Cross-platform Push-to-Talk speech-to-text — local Whisper transcription (CUDA/MPS) with optional Anthropic API correction and live VU meter overlay. Windows 11 + macOS.
cuda macos push-to-talk python speech-to-text voice-input whisper windows
Last synced: 04 Jun 2026
https://github.com/amirbroker/cupydtw
Use CUDA for Dynamic Time Warping
cuda dtw dynamic-time-warping python
Last synced: 20 Apr 2026
https://github.com/arya2004/parallel-computing
Parallel Computing Uni Course
Last synced: 18 May 2026
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side-by-side to create a cuBLAS-performant GEMM kernel.
cuda cuda-kernels gpu gpu-programming parallel-programming python triton
Last synced: 20 Apr 2026
https://github.com/juntyr/necsim-rust-docs
Documentation of the spatially explicit biodiversity simulation necsim-rust
biodiversity cuda docs mpi necsim rust simulation
Last synced: 14 May 2026
https://github.com/jusqua/dip-benchmark
Departmental undergraduate research project at UFS. Digital image processing benchmark using multiple tools to learn new ways to develop image processors.
benchmark cuda image-processing matlab opencv sycl visiongl
Last synced: 20 Apr 2026
https://github.com/bonevbs/cuknn
CUDA implementation of k-nearest neighbor search
Last synced: 20 Apr 2026
https://github.com/py-sandy/llama.cpp-windows-builder
Automated, reproducible build scripts for llama.cpp on Windows 10/11. Installs prerequisites, configures CMake and builds with CUDA.
ai build-scripts build-tool builder cuda llamacpp script scripts windows windows-10 windows-11
Last synced: 20 Apr 2026
https://github.com/nguyenpanda/gemm
Parallel Computing Assignment - K251 - HCMUT - VNU
cpp23 cuda forkjoin matrix-multiplication mpi openmp openmpi parallel-computing simd simd-instructions strassen-multiplication
Last synced: 14 May 2026