CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-30 00:07:24 UTC
https://github.com/kirubhakaranm/vision-pipeline-cuda
High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)
computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry
Last synced: 12 Apr 2026
https://github.com/sangioai/sph
CUDA and OpenMP versions of the serial SPH (Smoothed Particle Hydrodynamics) algorithm.
Last synced: 27 Apr 2026
https://github.com/phrutis/bip39scan.com
Collective search for old coins
bip39 brute-force client-server cuda gpu mnemonic pass passphrase passphrase-generator passwords
Last synced: 04 Sep 2025
https://github.com/pintamonas4575/rlgan-project-maadm-upm
Neuroevolution to learn the Lunar Lander from Gymnasium and a GAN to learn to color images. Coursework for the ML and BD master's degree at UPM.
cifar10 cuda dcgan deep-learning flappy-bird gan genetic-algorithm lunar-lander machine-learning mlp python3 pytorch reinforcement-learning tensorflow wgan-gp
Last synced: 12 Apr 2026
https://github.com/lruizap/testcuda
Guide to installing and using CUDA for programming
Last synced: 12 May 2026
https://github.com/marcorentap/kokkos-docker-cluster
Deploy Docker containers with Kokkos, OpenMP, OpenMPI and CUDA as a Docker swarm.
Last synced: 10 Mar 2025
https://github.com/amitkumarj441/deep-learning-on-your-finger
A rich collection of Dockerfiles for installing deep learning dependencies on your way :rocket:
Last synced: 18 Apr 2026
https://github.com/debanjan06/spatial-streamio
An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.
asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch
Last synced: 11 Jun 2026
https://github.com/matteopolak/stock-predict
Stock prediction with LSTM using TensorFlow and TypeScript.
ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript
Last synced: 09 May 2026
https://github.com/boohohoo/shamining
Shamining is a cloud mining service that allows users to mine cryptocurrencies without the need for personal hardware. By renting computing power from eco-friendly data centers, users can mine efficiently. The platform offers an easy-to-use interface, flexible contracts, and daily payouts.
cryptocurrency cryptomining cuda gpu-mining mining mining-software open-source opencl
Last synced: 04 Jul 2025
https://github.com/xstupi00/N-Body-CUDA
PCG - Parallel Computations on GPU - Project - N-Body-CUDA
cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit
Last synced: 11 Mar 2025
https://github.com/simonschoelly/poisson-solver
A solver for a modified Poisson equation using CUDA.
cpp cuda finite-difference gpgpu pgc poisson-equation preconditioned-conjugate-gradient thomas-algorithm
Last synced: 18 May 2026
https://github.com/prdai/mnist-digit-recognition
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 12 Apr 2026
https://github.com/occisor2/fluidsimulation
Second project of my parallel algorithms course
cuda high-performance-computing
Last synced: 28 Feb 2025
https://github.com/brendanm12345/simple_renderer_cs149
Simple CUDA renderer implementation. 19th most efficient out of 150+ submissions
Last synced: 18 May 2026
https://github.com/rajshrestha86/kmeans-clusterize-cuda
Implementation of K-Means algorithm from scratch using CUDA.
Last synced: 18 May 2026
https://github.com/boned-fruitwood759/whisperx-asr-with-fastapi
🎤 Enable real-time speech recognition with WhisperX using FastAPI for efficient, scalable audio processing.
asr ctranslate2 cuda fastapi openai python speech-recognition torch transformers whisper whisperx
Last synced: 12 Apr 2026
https://github.com/amruthapatil/nyu-cudaconvolution
Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN
Last synced: 13 Mar 2025
https://github.com/jiriklepl/bits-knn-jpdc2024
Replication package for the paper Towards Optimal GPU-accelerated K-Nearest Neighbors Search
bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k
Last synced: 21 Mar 2025
https://github.com/edcalderin/huggingface_ragflow
This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.
bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation
Last synced: 15 Jul 2025
https://github.com/fmigneault/dockers
Collection of docker setup with common libraries for image processing and machine learning.
boost cuda docker image-processing opencv python
Last synced: 12 Apr 2026
https://github.com/aayes89/pyllm
Train your own LLM from scratch
cpu cuda llm llm-training pip python3
Last synced: 18 May 2026
https://github.com/avarga1/vllm-hb
vLLM-compatible inference runtime in pure Rust. Zero Python. Zero libtorch. CUDA via candle.
candle cuda inference llm openai-api rust tokio vllm
Last synced: 07 Apr 2026
https://github.com/loveboyme/yolov5-tensorrt-accelerator
High-performance YOLOv5 inference framework accelerated by TensorRT with dynamic optimization
cuda dynamic-shapes-cuda-stream fp16 int8 pycuda tensorrt yolov5
Last synced: 29 Mar 2025
https://github.com/emanuelemessina/gigacheck
ABFT Matrix Multiplication of any size in CUDA
abft cuda matrix-multiplication
Last synced: 28 Feb 2025
https://github.com/karusb/2dca-cuda
2 Dimensional Cellular Automata Visualisation (Game of Life)
algorithm-flowchart cellular-automata cuda game game-of-life glut visual-studio
Last synced: 12 Apr 2026
https://github.com/wiktor2718/matrix_flow
Matrix Flow is a simple machine learning library written in Rust and CUDA. It was created as a portfolio project to deepen my understanding of machine learning, GPU programming, and Rust. It provides an API for matrix manipulation and includes specially optimized neural networks.
adam-optimizer benchmarking cuda deep-learning gpu-computing machine-learning matrix-operations neural-networks portfolio-project rust
Last synced: 18 May 2026
https://github.com/redhat-et/triton-cache-performance-comparison
amd-gpu cache cuda gpu nvidia-gpu performance rocm triton
Last synced: 12 Apr 2026
https://github.com/cppshizoids/cuda
These are my basic CUDA lessons
cuda cuda-demo cuda-programming
Last synced: 15 Jul 2025
https://github.com/bjornmelin/cuda-core-projects
🎯 Essential CUDA programming patterns and optimizations. Showcasing parallel computing expertise through matrix operations, memory management, and advanced kernel implementations. 💻
cpp cuda cuda-kernels gpu-computing high-performance-computing nvidia optimization parallel-computing
Last synced: 12 Apr 2026
https://github.com/baro-00/cpp-cuda-lab
Experimental C++ projects using NVIDIA CUDA for parallel computing. Learning & testing GPU kernels
Last synced: 04 May 2026
https://github.com/tfogal/gemm-db
For creating a cacheable GEMM cost model.
Last synced: 18 May 2026
https://github.com/demetriantitus/machine-vision---yolov8
This project provides a comprehensive guide to object detection in cluttered environments using YOLOv8. It demonstrates how to identify and classify objects in both still images and video streams
computer-vision cuda dataset image-classification machine-learning nvidia-gpu object-detection surveillance traffic-monitoring video-analysis yolov8
Last synced: 18 May 2026
https://github.com/lionpsiuc/cflow
A computational model for heat propagation in a cylindrical radiator using both CPU and GPU parallel processing. The simulation uses finite difference methods to model the directional flow of heat through a cylindrical pipe system with specific boundary conditions and cyclic connections between pipe segments.
Last synced: 29 May 2026
https://github.com/0x778/gaussian_filter_using_cuda
Implementation of a Gaussian filter using CUDA
cuda cuda-kernels cuda-programming image-processing
Last synced: 04 May 2026
https://github.com/obj-wtf/gan-architecture
App for training GAN models on architecture plans
architecture building cuda gan pix2pix-tensorflow plan
Last synced: 18 May 2026
https://github.com/moshiba/fmindex
ultra fast parallel FM index generation for DNA reads
Last synced: 18 May 2026
https://github.com/ludgerpaehler/lulesh-enzyme
AD with Enzyme through Lulesh.
automatic-differentiation cuda cuda-programming gpu-computing high-performance-computing llvm-enzyme scientific-computing
Last synced: 15 Jun 2026
https://github.com/ivanbgd/cuda_quad_c
Calculates a definite integral by using three different rules. Compares sequential to parallel implementations.
cuda integrals parallel-implementations
Last synced: 28 Mar 2025
https://github.com/hrolive/data-analytics-in-the-era-of-large-scale-machine-learning
Slides and other material for the Cyprus NCC training event about "Data analytics in the era of large-scale machine learning".
cuda deep-learning gpu-acceleration gradient-boosting large-language-models machine-learning preprocessing python pytorch
Last synced: 13 Apr 2026
https://github.com/rushirg/cuda-matrix-multiplication
Matrix Multiplication on GPGPU in CUDA
cpu cuda gpu parallel-processing
Last synced: 17 May 2026
https://github.com/puzzlef/vector-max-cuda
Performance of sequential vs CUDA-based vector element max.
basics cuda element experiment max vector
Last synced: 17 May 2026
https://github.com/matthewfeickert/report-urssi-fellowship-2025
Report on URSSI 2025 Early-Career Fellowship
Last synced: 17 Jan 2026
https://github.com/ray-chew/modified_ch
Density functional theory (DFT) and self-consistent field theory (SCFT) simulation of diblock copolymers
cuda density-functional-theory diblock-copolymer numerical-analysis numerical-methods self-consistent-field-theory
Last synced: 11 May 2026
https://github.com/hr-fahim/transformer-model-optimization
Sample GPT Transformer Model from Scratch.
cuda few-shot-learning transfomers
Last synced: 02 May 2026
https://github.com/miferreiro/cdap-cuda
CUDA exercises for the subject of "Computación Distribuída e de Altas Prestacións" in the Master Degree of Computer Engineering of the University of Vigo in 2020
Last synced: 17 May 2026
https://github.com/alessiobugetti/histogram-equalization
Implements sequential and parallel histogram equalization in C++ and Python, utilizing CUDA for parallel computation on GPU
cuda gpu-acceleration histogram-equalization parallel-computing pycuda
Last synced: 04 May 2026
https://github.com/eminem5410/devmind-platform
Linux-first CLI for AI environment diagnostics, repair & automation
ai automation cli cuda developer-tools devops docker generative-ai linux local-llm observability ollama python self-hosted system-monitoring
Last synced: 30 May 2026
https://github.com/xza85hrf/flux_pipeline
FluxPipeline is a prototype experimental project that provides a framework for working with the FLUX.1-schnell image generation model. This project is intended for educational and experimental purposes only.
ai cuda docker educational experimental flux1 flux1-schnell flux1ai gradio image-generation model non-commercial python pytorch research transformer-model
Last synced: 05 Jul 2025
https://github.com/doxakis/cosinesimilaritydistancesongpu
Compute cosine similarity distances for all combinations of the dataset on the GPU with CUDA
Last synced: 13 Apr 2026
https://github.com/flagro/paralleltasks
CUDA/OpenMP parallel tasks
algorithms compression cpp cuda openmp parallel-computing unique-values
Last synced: 17 May 2026
https://github.com/drilonaliu/parallel-s_aes-ccm-xts
aes cryptography cuda gpu parallel-programming saes
Last synced: 21 Mar 2025
https://github.com/drilonaliu/parallel-caesar-cipher
caesar-cipher cryptography cuda gpu parallel-programming
Last synced: 21 Mar 2025
https://github.com/eyelor/text-to-image-item-generator
A Python workflow for generating random item images using models from Hugging Face.
ai conda cuda flux-schnell generator huggingface item llama python pytorch text-to-image
Last synced: 13 Apr 2026
https://github.com/tianzonglin/cloud-control-gui
A tool to compute, visualize, analyse and drag points (high-dimensional data)
cuda interaction-design visualization
Last synced: 25 Apr 2026
https://github.com/versi379/optimized-matrix-multiplication
This project utilizes CUDA and cuBLAS to optimize matrix multiplication, achieving up to a 5x speedup on large matrices by leveraging GPU acceleration. It also improves memory efficiency and reduces data transfer times between CPU and GPU.
cublas cuda cuda-programming hpc matrix-multiplication parallel-computing parallel-programming
Last synced: 17 May 2026
https://github.com/ergus/algorithms
Set of multiple algorithms implemented in multiple paradigms
algorithms cmake concurrency cpp cuda gpgpu inter-language metaprogramming multithreading pthreads stl testing
Last synced: 17 May 2026
https://github.com/ubermorgott/morgottalk
Cross-platform desktop push-to-talk voice transcription. Single binary. GPU accelerated (CUDA/Vulkan/Metal/ROCm/OpenCL). Powered by whisper.cpp.
cuda desktop go gpu speech-to-text svelte transcription voice wails whisper
Last synced: 07 Apr 2026
https://github.com/efecaliskannn/pneumonia-detection-with-cnn--vgg16--and-resnet50-deep-learning-models
This project aims to detect pneumonia using deep learning, a subset of artificial intelligence. It examines the performance of deep learning models, including CNN, VGG16, and ResNet50, in detecting pneumonia.
artificial-intelligence convolutional-neural-networks cuda deep-learning keras-tensorflow nvidia-cuda pyhton transfer-learning
Last synced: 13 Jun 2025
https://github.com/programmergnome/kutyai
A graphical Python application that recognizes dog breeds, with 420 breeds and 42,000 images.
cuda deep-learning image-classification python3 qt5-gui tensorflow transfer-learning
Last synced: 11 May 2026
https://github.com/tiktokfnf33/rayleigh-taylor-instability-simulation
CUDA Rayleigh-Taylor Instability Simulation. This repository features a high-performance simulation of the Rayleigh-Taylor instability using CUDA, Python, and C. Explore the implementation and results to understand fluid dynamics in a parallel computing context. 🖥️🚀
c computational-fluid-dynamics cuda euler-method finite-difference gpu-computing hpc numerical-simulation parallel-computing physics-simulation python rayleigh-taylor-instability runge-kutta
Last synced: 04 May 2026
https://github.com/drilonaliu/parallel-permuation-cipher-attack
attack cryptography cuda gpu parallel-computing
Last synced: 21 Mar 2025
https://github.com/drilonaliu/bachelor-thesis
Parallel Programming Fractals
cuda fractals gpu parallel-programming
Last synced: 15 May 2026
https://github.com/mrgkanev/tensorflow-gpu-docker-setup
A Docker environment for TensorFlow GPU development with optimized configurations for WSL2, troubleshooting guides, and common error fixes
cuda cuda-toolkit deep-learning dev-environment development-tools docker gpu-acceleration machine-learning nvidia-docker nvidia-docker-support python tensorflow
Last synced: 13 Apr 2026
https://github.com/drilonaliu/parallel-permutation-cipher
cryptography cuda gpu parallel-programming permutation
Last synced: 19 Jul 2025
https://github.com/hrshl212/custom-cuda-kernels-with-neural-network-implementation
The repository contains custom CUDA kernels for linear layer, softmax and relu which are integrated with python to develop a Neural Network
cuda neural-network python pytorch
Last synced: 08 May 2026
https://github.com/phantom7knight/cuda-fusion
This project is for learning CUDA to better understand how the GPU works.
cuda cuda-programming gpgpu gpu
Last synced: 17 May 2026
https://github.com/aaditya29/parallel-computing-and-cuda
Learning about Parallel Computing and GPU programming using CUDA.
c cpp cuda cuda-kernels cuda-programming nvidia-cuda openmp openmpi parallel-computing parallel-programming
Last synced: 18 Jul 2025
https://github.com/lord-turmoil/cudacmakedemo
A demo for building a CUDA program with CMake
Last synced: 16 Mar 2025
https://github.com/delusionary/histoptimizer
Solves a minimum variance cost of the partition problem.
Last synced: 14 Jan 2026
https://github.com/dgcnz/nvtx-vscode
Create NVIDIA NVTX ranges directly in VS Code, then profile with Nsight Systems without modifying source code.
Last synced: 13 Apr 2026
https://github.com/ran-2012/cuda-practice
CUDA practice code for the NVIDIA programming guide
Last synced: 27 Feb 2025
https://github.com/avicted/hip_fm_synthesis
This project demonstrates FM Synthesis (Frequency Modulation) using HIP (Heterogeneous-computing Interface for Portability), enabling high-performance sound generation on both AMD and NVIDIA GPUs.
amd audio-processing cuda fm-synthesis hip nvidia rocm
Last synced: 16 Mar 2025
https://github.com/nel-s/vein-cracker
Recovers which internal generator states could have generated a provided set of Minecraft Java b1.6-1.12.2 veins. Those can then be used to recover 3/4ths of any worldseeds that could have generated them.
cuda minecraft seedcracking veins
Last synced: 16 Mar 2025
https://github.com/chensongpoixs/cmedia_transcode
GPU (CUDA) transcoding version of a media service, supporting H.264 and H.265 transcoding
cuda gpu h264 h265 media transcode-media
Last synced: 19 May 2026
https://github.com/drilonaliu/parallel-image-edge-detection
cuda edge-detection gpu image-processing
Last synced: 17 May 2026
https://github.com/cripterhack/business-address-scrapper
Python+Scrapy - Distributed scraping system with cache for business information extraction.
cuda ollama postgresql python redis scraper scraping scrapy tesseract
Last synced: 14 Jun 2025
https://github.com/kratugautam99/logiclink-project
LogicLink is a conversational AI chatbot developed by Kratu Gautam (AIML Engineer). Powered by the TinyLlama-1.1B-Chat-v1.0 model, it provides an interactive interface for engaging conversations, query resolution, and task assistance. Version 5 features streaming responses, conversation management, and a sleek GUI.
antd-design chatbot-application conversational-ai cuda gradio graphical-user-interface huggingface-spaces huggingface-transformers jupyter-notebooks keras large-language-models mlops model-service-controller modelscope-studio natural-language-generation natural-language-processing pytorch reasoning-agent tensorflow
Last synced: 07 Apr 2026
https://github.com/kar-dim/cas-2d
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA/OpenCL, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 22 Jun 2025
https://github.com/rugleb/cuda
A simple example of a program that uses parallel GPU computing on an NVIDIA graphics card using CUDA technology
Last synced: 10 Apr 2025
https://github.com/kanchishimono/python-images
Ubuntu based Python container images, including CUDA images
container-image cuda docker dockerfile machine-learning python python3
Last synced: 30 Apr 2026
https://github.com/rkarahul/person-detector-faceverifier
Person-Detector-FaceVerifier is a sophisticated system for detecting and verifying faces in images. Ideal for applications like passport control and security, it combines advanced face detection with precise verification techniques.
bootstrap5 css3 cuda django html5 javascipt opencv-python os python pytorch yolov8
Last synced: 07 Apr 2026
https://github.com/ribin-baby/cuda_cudnn_installation_on_ubuntu20.04
Installation of CUDA 11.8 with cuDNN 8.7 on an Ubuntu 20.04 server with an A30 GPU, plus an ONNX GPU installation guide
cuda gpu linux onnxruntime server
Last synced: 16 May 2026