CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-07-01 00:07:09 UTC
https://github.com/lord-turmoil/cudacmakedemo
A demo of building a CUDA program with CMake
Last synced: 16 Mar 2025
https://github.com/delusionary/histoptimizer
Solves the partition problem under a minimum-variance cost function.
Last synced: 14 Jan 2026
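The listing does not say how histoptimizer works. As a hedged illustration, assume the problem means splitting an ordered sequence into k contiguous, non-empty groups so that the group sums have minimal variance. Under that assumption, a plain-Python dynamic-programming sketch looks like the code below. It is not the repository's code, and the function name is hypothetical. The mean group sum is fixed at total/k, so minimizing the variance is equivalent to minimizing the sum of squared deviations. That sum is additive over groups, which is what lets a DP solve it.

```python
from itertools import accumulate

def min_variance_partition(items, k):
    """Hypothetical sketch, not histoptimizer's API.
    Split `items` (in order) into k contiguous, non-empty groups minimizing the
    variance of the group sums. Returns (variance, cut indices)."""
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(items)")
    prefix = [0] + list(accumulate(items))  # prefix[i] = sum(items[:i])
    mean = prefix[n] / k                    # mean group sum is fixed by k
    inf = float("inf")
    # cost[j][i]: least sum of squared deviations for items[:i] split into j groups
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    prev = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for m in range(j - 1, i):       # last group is items[m:i]
                c = cost[j - 1][m] + (prefix[i] - prefix[m] - mean) ** 2
                if c < cost[j][i]:
                    cost[j][i], prev[j][i] = c, m
    # Walk back through `prev` to recover where each group starts.
    starts, i = [], n
    for j in range(k, 0, -1):
        i = prev[j][i]
        starts.append(i)
    cuts = sorted(starts)[1:]               # drop the leading 0
    return cost[k][n] / k, cuts
```

For example, splitting 1..9 into three groups gives cut points [5, 7], which produces group sums 15, 13 and 17.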
https://github.com/dgcnz/nvtx-vscode
Create NVIDIA NVTX ranges directly in VS Code, then profile with Nsight Systems without modifying source code.
Last synced: 13 Apr 2026
https://github.com/ran-2012/cuda-practice
CUDA practice code for the NVIDIA programming guide
Last synced: 27 Feb 2025
https://github.com/avicted/hip_fm_synthesis
This project demonstrates FM (Frequency Modulation) synthesis using HIP (Heterogeneous-computing Interface for Portability), enabling high-performance sound generation on both AMD and NVIDIA GPUs.
amd audio-processing cuda fm-synthesis hip nvidia rocm
Last synced: 16 Mar 2025
https://github.com/nel-s/vein-cracker
Recovers the internal generator states that could have produced a given set of Minecraft Java Edition b1.6-1.12.2 veins. These states can then be used to recover 3/4 of any world seed that could have generated them.
cuda minecraft seedcracking veins
Last synced: 16 Mar 2025
https://github.com/maltsev-andrey/cuda-nn-inference
GPU-accelerated neural network inference using custom CUDA kernels. Achieves 97.82% accuracy on MNIST.
cuda deep-learning gpu-programming neural-networks numba nvidia parallel-computing parallel-programming performance-optimization python3 pytorch rhel9 tesla-p100
Last synced: 07 Mar 2026
https://github.com/cripterhack/business-address-scrapper
Python + Scrapy: a distributed scraping system with caching for extracting business information.
cuda ollama postgresql python redis scraper scraping scrapy tesseract
Last synced: 14 Jun 2025
https://github.com/andreasholt/cuda-matmul-benchmarking
Implements and benchmarks various matrix multiplication (matmul) kernels in CUDA
Last synced: 01 Nov 2025
https://github.com/nxoti1/points-reader-ocr
🖥️ Extract text from images easily with POINTS-Reader OCR, a high-accuracy application for seamless document conversion and processing.
cuda gradio huggingface-transformers ocr open-source points-reader reportlab spaces tencent vision-language-model vlm
Last synced: 20 May 2026
https://github.com/ludekcizinsky/fast-cg-solver
Implementation of the Conjugate Gradient (CG) algorithm for solving sparse linear systems using MPI and CUDA.
Last synced: 17 May 2026
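Here is a minimal sketch of the CG iteration itself. It assumes a small dense symmetric positive-definite matrix given as a list of rows. The repository's sparse storage, MPI decomposition and CUDA kernels are not shown, and the function name is hypothetical.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Hypothetical sketch: solve A x = b for a symmetric positive-definite A.
    A is a dense list of rows here; a real sparse solver would use e.g. CSR."""
    n = len(b)
    max_iter = max_iter or 10 * n
    matvec = lambda v: [sum(a * x for a, x in zip(row, v)) for row in A]
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))

    x = [0.0] * n          # initial guess x0 = 0
    r = list(b)            # residual r0 = b - A x0 = b
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm small enough: converged
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)                 # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # next A-conjugate direction
        rs_old = rs_new
    return x
```

In exact arithmetic CG converges in at most n iterations. For the 2x2 system [[4, 1], [1, 3]] x = [1, 2], it returns approximately [1/11, 7/11].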
https://github.com/myselfaryan/attention-mechanism
Accelerating Scaled Dot-Product Attention using OpenMP and CUDA
Last synced: 27 Apr 2026
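For reference, scaled dot-product attention computes softmax(Q K^T / sqrt(d_k)) V. The plain-Python sketch below implements that formula. It is a hypothetical CPU reference, not the repository's OpenMP or CUDA code. Subtracting the maximum score inside the softmax is a standard numerical-stability trick.

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Hypothetical sketch: softmax(Q K^T / sqrt(d_k)) V on lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # One score per key: (q . k) / sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        m = max(scores)                        # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]  # softmax over the keys
        # Output row = attention-weighted sum of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

When all keys are identical the weights are uniform, so each output row is simply the mean of the value rows. OpenMP or CUDA versions usually parallelize the independent per-query loop.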
https://github.com/juliankarrer/reyn
CUDA-based implementation of Smoothed Particle Hydrodynamics (SPH) for fluid simulation
cuda fluid lagrangian simulation sph
Last synced: 31 Oct 2025
https://github.com/lu-m-dev/cuda-molecular-simulation
CUDA-accelerated molecular simulation of materials
cuda materials-science molecular-dynamics molecular-simulation monte-carlo
Last synced: 25 Jun 2026
https://github.com/rugleb/cuda
A simple example program that performs parallel GPU computing on an NVIDIA graphics card using CUDA
Last synced: 10 Apr 2025
https://github.com/nabilshadman/cuda-4-dummies
Lecture slides and exercise files of the CUDA 4 Dummies course (2025)
cuda gpu-computing high-performance-computing nsight-systems nvidia-gpu parallel-computing
Last synced: 31 Oct 2025
https://github.com/flosmume/cpp-cuda-streams-and-pinned-mem
A CUDA C++ demo showing how to overlap data transfer and kernel execution using multiple streams and pinned (page-locked) host memory. This project illustrates asynchronous memcpy, event timing, and performance benefits of concurrent GPU execution — essential for building high-throughput pipelines.
asynchronous-execution cuda cuda-streams gpu parallel-programming performance-optimization pinned-memory
Last synced: 13 May 2026
https://github.com/uva-trasgo/controllers
Read-only mirror of the official repository: https://gitlab.com/trasgo-group-valladolid/controllers. Controllers is a library written in C11 that provides a simplified way to program applications that can exploit heterogeneous computational platforms including accelerators and/or multi-core CPUs.
cuda heterogeneous-computing heterogeneous-parallel-programming hip opencl openmp
Last synced: 12 May 2026
https://github.com/mahdi-hasan-shuvo/ml-opensource-project
This is an open-source repository focused on providing practical and educational machine learning resources. The project aims to make learning and applying machine learning more accessible through well-documented code, tutorials, and real-world examples.
cuda machine-learning machine-learning-algorithms ml-projects open-source python
Last synced: 19 May 2026
https://github.com/sneha-at-hub/bruteforce_passwordcracking_in-milliseconds
Last synced: 28 Apr 2026
https://github.com/eastonman/tensorrt-pytorch-wrapper
A wrapper that lets a TensorRT engine accept PyTorch CUDA tensors.
Last synced: 06 May 2026
https://github.com/ramyacp14/document-based-question-and-answers
Developed a document question answering system that utilizes Llama and LangChain for contextual and accurate answers. The system supports .txt documents, intelligent text splitting, and context-aware querying through an easy-to-use Streamlit interface.
chroma cuda hugging-face langchain llama python recursivecharactertextsplitter streamlit
Last synced: 07 Mar 2026
https://github.com/storterald/neural-network
Simple neural network implementation in C++ and CUDA
asm asmx86 c-plus-plus cmake cpp cuda machine-learning neural-network
Last synced: 28 Mar 2025
https://github.com/naidezhujimo/cuda-learning-just-record-the-learning-process-
Just a record of my learning process, including notes. Everyone is welcome to learn from it.
Last synced: 26 Mar 2025
https://github.com/azdavis/parallel-portrait-mode
Parallel Portrait Mode
cuda image-processing ispc openmp
Last synced: 13 Apr 2026
https://github.com/amypad/miutil
Basic functionality needed for AMYPAD
cuda matlab medical-imaging python
Last synced: 13 May 2025
https://github.com/ivanfioravanti/tflops_mps
TFLOPs testing on MPS and CUDA
Last synced: 19 May 2026
https://github.com/grindelfp/cuda-n-body-simulation
Simulation of N-body motion using CUDA.
Last synced: 06 Apr 2025
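The usual starting point for such a simulation is the direct all-pairs force computation. It costs O(N^2) per step and is commonly parallelized with one GPU thread per body. The hypothetical Python sketch below assumes Newtonian gravity with an optional softening length `eps`. Neither detail is stated in the listing.

```python
def nbody_accelerations(pos, mass, G=1.0, eps=0.0):
    """Hypothetical sketch: direct O(N^2) gravitational accelerations.
    pos is a list of coordinate lists (any dimension), mass a list of floats.
    eps is a softening length that avoids singularities at tiny separations."""
    acc = []
    for i, pi in enumerate(pos):          # on a GPU, typically one thread per body i
        a = [0.0] * len(pi)
        for j, pj in enumerate(pos):
            if i == j:
                continue                  # no self-interaction
            d = [xj - xi for xi, xj in zip(pi, pj)]   # vector from body i to body j
            r2 = sum(x * x for x in d) + eps * eps
            s = G * mass[j] / (r2 * r2 ** 0.5)        # G * m_j / r^3
            a = [ak + s * dk for ak, dk in zip(a, d)]
        acc.append(a)
    return acc
```

The accelerations would then feed an integrator such as leapfrog. The sketch leaves that step out.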
https://github.com/drilonaliu/parallel-fractal-tree
GPU-accelerated fractal tree generation with CUDA and OpenGL interoperability.
cuda fractal-tree fractals gpu
Last synced: 19 May 2026
https://github.com/patriciobcs/mini-aevol
Parallel implementation of a reduced version of the Aevol simulator
Last synced: 19 May 2026
https://github.com/muneeb706/cuda
Sample programs implemented using CUDA (GPU).
cplusplus cuda gpu-programming
Last synced: 19 May 2026
https://github.com/hnthap/vietnamese-word-segment
Vietnamese word segmentation package.
cuda torch transformers vietnamese vietnamese-nlp vietnamese-tokenizer word-segmentation
Last synced: 19 May 2026
https://github.com/andresvalle/ocr-extraction
Text extraction from images using EasyOCR and parallelization with PyTorch
Last synced: 01 May 2026
https://github.com/chiragajain/gpu-optimization-roadmap
This repository is part of a structured curriculum designed to master GPU optimization, Triton, Deep Learning, and LLMs. This section focuses on GPU fundamentals, CUDA programming, and PyTorch optimizations.
cuda deeplearning gpu-acceleration learning python pytorch triton
Last synced: 18 Feb 2026
https://github.com/sevilze/folderesque
Python script that processes and upscales images in specified folders using RRDB models.
Last synced: 02 Mar 2026
https://github.com/kar-dim/CAS-2D
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 01 Nov 2025
https://github.com/kenwuqianghao/c4ai-cuda-birds
Homework assignments for C4AI Beginners in Research-Driven Studies
Last synced: 18 Apr 2026
https://github.com/TheodoreAI/monte-carlo-simulator
A CUDA application for Monte Carlo simulation, used to determine the range of possible outcomes for a set of parameters, each of which has a probability distribution describing how likely each value is.
cuda gpu-computing monte-carlo-simulation parallel-computing
Last synced: 06 Oct 2025
https://github.com/mxm-tr/docker-darknet-opencv
Accelerated object detection on streams and files, using a Docker Darknet YOLO container
cuda docker docker-compose object-recognition opencv-python python3 yolo
Last synced: 10 Apr 2026
https://github.com/kirubhakaranm/vision-pipeline-cuda
High-performance camera processing pipeline with CUDA GPU acceleration, CPU multithreading, and real-time TCP/IP telemetry monitoring (1,200+ FPS, <1ms latency)
computer-vision cpp17 cuda edge-detection gpu-acceleration image-processing multithreading networking opencv performance-optimization real-time robotics tcp-ip telemetry
Last synced: 12 Apr 2026
https://github.com/hshshshshsh12e/gpumkat
Gpumkat is a shader debugger for Metal, designed to do what Instruments can't.
alternative api control cuda darwin debugger debugging gpumkat macos management profiler release shaders threads
Last synced: 14 Apr 2026
https://github.com/sangioai/sph
CUDA and OpenMP versions of a serial SPH (Smoothed Particle Hydrodynamics) algorithm.
Last synced: 27 Apr 2026
https://github.com/mmz33/practice-cuda
c cpp cuda cuda-programming gpu-programming parallel-programming
Last synced: 14 Apr 2026
https://github.com/kanttouchthis/cuda_schem
Script for voxelizing 3D models into Minecraft .schem schematics with texture support, powered by Numba CUDA.
cuda minecraft numba voxelization
Last synced: 07 Oct 2025
https://github.com/lruizap/testcuda
Guide to installing and using CUDA for programming
Last synced: 12 May 2026
https://github.com/amitkumarj441/deep-learning-on-your-finger
A rich collection of Dockerfiles for installing deep learning dependencies on your way 🚀
Last synced: 18 Apr 2026
https://github.com/debanjan06/spatial-streamio
An optimized, out-of-core asynchronous data streaming pipeline for high-throughput 3D point cloud training loops. Features low-level numpy.memmap zero-copy reads and multi-threaded ring prefetching to eliminate I/O bottlenecks, delivering a 33.33% throughput efficiency gain on PyTorch CUDA workloads.
asynchronous-programming cuda data-engineering deep-learning-pipelines io-optimization memory-mapping point-cloud pytorch
Last synced: 11 Jun 2026
https://github.com/matteopolak/stock-predict
Stock prediction with LSTM using TensorFlow and TypeScript.
ai artificial-intelligence cuda lstm machine-learning stock tensorflow typescript
Last synced: 09 May 2026
https://github.com/dreoporto/tensorflow-gpu-docker
An example project to run TensorFlow with CUDA-enabled GPU acceleration using Windows, Docker and WSL2.
artificial-intelligence cuda deep-learning docker docker-compose jupyter machine-learning nvidia-docker python windows wsl2
Last synced: 27 Jan 2026
https://github.com/xstupi00/N-Body-CUDA
PCG - Parallel Computations on GPU - Project - N-Body-CUDA
cuda gpu-acceleration gpu-computing nbody-simulation optimization parallel-computing pcg vut vut-fit
Last synced: 11 Mar 2025
https://github.com/simonschoelly/poisson-solver
A solver for a modified Poisson equation using CUDA.
cpp cuda finite-difference gpgpu pgc poisson-equation preconditioned-conjugate-gradient thomas-algorithm
Last synced: 18 May 2026
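The tags mention the Thomas algorithm, which is the O(n) direct solver for tridiagonal systems such as a 1D finite-difference Poisson operator. The plain-Python sketch below is a textbook version, not taken from the repository. It assumes the system needs no pivoting, for example because it is diagonally dominant.

```python
def thomas(a, b, c, d):
    """Hypothetical sketch: solve a tridiagonal system.
    a = sub-diagonal, b = diagonal, c = super-diagonal, d = right-hand side,
    all of length n; a[0] and c[n-1] are ignored. No pivoting is performed."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):               # forward sweep: eliminate the sub-diagonal
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

For example, the system with diagonal 2, off-diagonals -1 and right-hand side [1, 0, 1] has the solution [1, 1, 1].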
https://github.com/marius311/cudadistributedtools.jl
A set of utility tools for multi-GPU + multi-process workflows
Last synced: 01 May 2026
https://github.com/brendanm12345/simple_renderer_cs149
Simple CUDA renderer implementation. Ranked 19th most efficient out of 150+ submissions.
Last synced: 18 May 2026
https://github.com/rajshrestha86/kmeans-clusterize-cuda
Implementation of the k-means algorithm from scratch using CUDA.
Last synced: 18 May 2026
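For orientation, standard k-means (Lloyd's algorithm) alternates two steps until the assignments stop changing: an assignment step and an update step. The sketch below is a hypothetical plain-Python reference that assumes random initialization from the input points. In a GPU version, the per-point assignment step is the natural candidate for one thread per point. That mapping is an assumption here, not a description of the repository.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Hypothetical sketch of Lloyd's algorithm on equal-length numeric tuples.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # init: k distinct input points
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```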
https://github.com/amruthapatil/nyu-cudaconvolution
Implementing convolution operations on an image using CUDA, exploiting different methodologies - basic, tiled, and cuDNN
Last synced: 13 Mar 2025
https://github.com/thanduriel/cuda_hip_comparison
Performance study of atomics on GPUs.
Last synced: 09 Oct 2025
https://github.com/jiriklepl/bits-knn-jpdc2024
Replication package for the paper "Towards Optimal GPU-accelerated K-Nearest Neighbors Search"
bitonic-sort cuda gpu k-nearest-neighbors knn-search top-k
Last synced: 21 Mar 2025
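The tags mention bitonic sort and top-k. As a hedged illustration only (the paper's own method may differ), the sketch below is a textbook bitonic sorting network in Python. It picks the k nearest neighbors by sorting (distance, index) pairs, padded with +inf to a power-of-two length. Every compare-exchange within one inner pass is independent, which is why the network maps well onto GPU threads. The function names are hypothetical.

```python
def bitonic_sort(a):
    """Hypothetical sketch: bitonic sorting network; len(a) must be a power of two."""
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(a)
    k = 2
    while k <= n:                  # size of the bitonic sequences being merged
        j = k // 2
        while j > 0:               # compare distance within this merge stage
            for i in range(n):     # on a GPU, one thread per i
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

def knn(query, points, k):
    """Indices of the k nearest points (squared Euclidean distance) via bitonic sort."""
    pairs = [(sum((q - x) ** 2 for q, x in zip(query, p)), i) for i, p in enumerate(points)]
    size = 1
    while size < len(pairs):
        size *= 2
    pairs += [(float("inf"), -1)] * (size - len(pairs))  # pad to a power of two
    return [i for _, i in bitonic_sort(pairs)[:k]]
```

Sorting everything and keeping k items is the simplest correct approach. Specialized top-k selection, which the paper may use, avoids sorting the whole list.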
https://github.com/enesdoruk/opencv-cpp
OpenCV C++ tutorials
computer-vision cpp cuda opencv
Last synced: 09 Oct 2025
https://github.com/edcalderin/huggingface_ragflow
This project implements a classic Retrieval-Augmented Generation (RAG) system using HuggingFace models with quantization techniques. The system processes PDF documents, extracts their content, and enables interactive question-answering through a Streamlit web application.
bitsandbytes cuda huggingface huggingface-embeddings langchain langchain-community large-language-models llm nf4 python qdrant quantization rag retrieval-augmented-generation ruff streamlit text-generation
Last synced: 15 Jul 2025