CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2026-06-29 00:07:23 UTC
https://github.com/shunk031/nvinfo-go
Rewrite of ikr7/nvinfo, a simple utility for monitoring your CUDA-enabled GPUs, with Golang
cli cuda go golang gpu nvidia nvidia-smi
Last synced: 02 Apr 2025
https://github.com/umitkacar/onnx-tensorrt-optimization
40x faster AI inference: ONNX to TensorRT optimization with FP16/INT8 quantization, multi-GPU support, and deployment
cuda deep-learning edge-computing fp16 gpu-acceleration inference-acceleration int8 latency-optimization mlops model-deployment model-optimization nvidia-gpu onnx onnxruntime production-ai pytorch-to-onnx quantization real-time-inference tensorflow-to-onnx tensorrt
Last synced: 18 Feb 2026
https://github.com/enfiskutensykkel/cuda-rdma-bench
NVIDIA GPU direct RDMA using SISCI API
cuda dma gpudirect-rdma pcie rdma sisci
Last synced: 30 Mar 2025
https://github.com/toruniina/spray
molecular viewer based on ray-tracing
c-plus-plus computer-graphics cuda molecular-graphics molecular-viewer opengl raytracing
Last synced: 19 Jan 2026
https://github.com/BrosnanYuen/RayBNN_Raytrace
Ray tracing library using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cuda gpu gpu-computing opencl parallel parallel-computing ray ray-tracing raybnn raylib raytracer raytracing rust
Last synced: 04 Apr 2025
https://github.com/ivanrs297/pycuda-covariance-matrix
A PyCUDA covariance matrix parallel implementation
Last synced: 25 Oct 2025
https://github.com/statikfintechllc/godcore
All-in-one local AI stack for Mistral-13B and Llama.cpp, with one-step CUDA wheel install, OpenAI-compatible API, and modern web dashboard. Switch between local and cloud chat, run on your own GPU, and deploy instantly—no API keys or paywalls. Designed for easy install, custom builds, and fast remote access. Enjoy!
ai chatbot chatgpt cuda dashboard fastapi llama-cpp llm local-ai mistral openai-compatible react selfhosted webui
Last synced: 25 Jun 2025
https://github.com/gpuengineering/gputils
A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)
cplusplus-17 cplusplus-20 cpp cuda cuda-c cuda-cpp cuda-programming header-only linear-algebra
Last synced: 13 Aug 2025
https://github.com/chiang-yuan/culsm
CUDA C++ code implementing GPU-accelerated Lattice Spring Model (CuLSM) simulations.
cuda gpu parallel-computing particles
Last synced: 07 Sep 2025
https://github.com/pmeier/tox-ltt
Install PyTorch distributions with light-the-torch
cuda install light-the-torch pip plugin pytorch tox
Last synced: 25 Aug 2025
https://github.com/cloudmercato/python-fpb
Python Floating Point Benchmark
benchmark cuda floating-point numpy pandas python
Last synced: 19 Apr 2026
https://github.com/tk-yoshimura/tensorshader
Deep learning .NET library for regression.
complex cuda deep-learning dotnet6 gpgpu net6 quaternion
Last synced: 15 Oct 2025
https://github.com/appsolves/lanepilot
The world's first real-time AI-powered traffic management system, featuring automated vehicle detection, lane allocation optimization, and dynamic control for (autonomous) cars!
ai ai-traffic-management autonomous-driving computer-vision cuda edge-computing embedded-systems jetson-orin-nano-super lane-detection pytorch
Last synced: 29 Apr 2026
https://github.com/yuvix25/py2cuda
Convert Python 3 code to CUDA code.
converter cuda gpu gpu-acceleration python python3
Last synced: 11 Sep 2025
https://github.com/guilt/rocm-programming-masterclass
Udemy's CUDA Programming Masterclass, with examples in ROCm/HIP.
cuda easy hip learning-by-doing masterclass rocm
Last synced: 04 Aug 2025
https://github.com/willigarneau/astar-pathfinding
🗺📌 Implementation of the A* pathfinding algorithm with OpenCV and CUDA in C++ 💪
a-star algorithm axis-camera cuda detection implementation opencv pathfinding
Last synced: 14 Jul 2025
https://github.com/mchatzakis/daisy
The DaiSy Library for Fast and Exact, Data Series and Vector Similarity Search
cuda data-series disk-based distributed-systems dynamic-time-warping euclidean-distance exact-searching gpu-acceleration in-memory-computing mpi pybind11 similarity-search time-series
Last synced: 01 Apr 2026
https://github.com/vorticity-inc/vtensor
VTensor, a C++ library, facilitates tensor manipulation on GPUs, emulating the python-numpy style for ease of use. It leverages RMM (RAPIDS Memory Manager) for efficient device memory management. It also supports xtensor for host memory operations.
cublas cuda curand cusolver gpu numpy rmm tensor xarray xtensor
Last synced: 14 Apr 2025
https://github.com/thomasvonwu/interview-note
Share Interview Questions and Summarize Answers
Last synced: 23 Jun 2025
https://github.com/postmalloc/barycuda
A tiny CUDA library for fast barycentric operations.
3d-graphics barycentric-coordinates cuda python simplex
Last synced: 31 Oct 2025
https://github.com/dayyass/hpc
My experiments with MPI and OpenMP
cpp cuda gpu high-performance-computing hpc mpi nvidia openmp parallel-computing super-computing
Last synced: 07 Mar 2026
https://github.com/zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
cuda gpt inference-engine llama llm llm-serving pytorch
Last synced: 12 Aug 2025
https://github.com/ragibson/cuda-k-means
An implementation of Lloyd's algorithm for data clustering on GPUs and computational accelerators.
clustering cuda gpu k-means unsupervised-clustering
Last synced: 18 Jun 2026
https://github.com/alejandroamat/3dgs-vulkan-cpp
Cross-platform Vulkan 3D Gaussian Splatting renderer - Windows/Mac/Linux, any GPU, with Python binding support
3d 3d-graphics 3d-reconstruction 3dgs apple computer-vision cuda differentiable-rendering gaussian-splatting glfw3 gpu gpu-acceleration linux macos nerf neural-rendering python real-time vulkan windows
Last synced: 15 Jun 2025
https://github.com/thomasjo/cudalicious
C++ header library intended to reduce CUDA boilerplate code
boilerplate cpp cuda header-only
Last synced: 19 May 2026
https://github.com/rocm/rocmds-cmake
This is a collection of CMake modules that are useful for all ROCm-DS projects. By sharing the code in a single place it makes rolling out CMake fixes easier.
amd cmake cuda hip radeon-instinct-mi-series rocm
Last synced: 10 Apr 2025
https://github.com/egororachyov/spbench
Benchmark for sparse linear algebra libraries for CPU and GPU platforms.
benchmark cpp cpu cuda gpu-computing graphblas opencl sparse-matrices
Last synced: 15 May 2025
https://github.com/tawssie/zmpy3d_cp
Python implementation of 3D Zernike moments with CuPy
3d-zernike cuda cupy gpu protein-structure python structural-bioinformatics superposition zernike-moments
Last synced: 15 Apr 2025
https://github.com/bensuperpc/easyai
Make your own AI easily!
ai cuda python python3 tensorflow
Last synced: 16 Feb 2026
https://github.com/mrfoxak/evaluate-lip-reading-using-deep-learning-techniques.
This paper explores Silent Sound Technology, focusing on its potential to enhance communication in noisy environments through lip-reading and deep learning, with applications in hearing aids and security.
bi-lstm cnn cuda deep-learning image-processing lstm machine-learning mathematics neural-networks ovencv python research-paper sklearn tensorflow
Last synced: 03 Sep 2025
https://github.com/ventura8/whisper-pro-asr
A high-performance Docker container that runs OpenAI's Whisper model. Optimized for CPU, Intel NPU, Intel Arc/iGPU, and NVIDIA CUDA GPUs.
asr bazarr ctranslate2 cuda docker faster-whisper hardware-acceleration huggingface intel-npu media-automation openvino speech-to-text uvr vocal-isolation whisper whisper-asr
Last synced: 28 Apr 2026
https://github.com/tristanpenman/cuda-examples
A collection of CUDA example code
Last synced: 10 Apr 2025
https://github.com/lawmurray/gpu-gemm
CUDA kernel for matrix-matrix multiplication on Nvidia GPUs, using a Hilbert curve to improve L2 cache utilization.
cplusplus cuda cuda-kernels cuda-programming gpu gpu-computing gpu-programming matrix-multiplication numerical-methods scientific-computing
Last synced: 01 Mar 2026
https://github.com/potato3d/grid-rt
GPU-accelerated ray tracing using GLSL and CUDA
cuda glsl gpu ray-tracing real-time-rendering
Last synced: 15 Apr 2026
https://github.com/demwafflez/cuda-2d-softbody-physics-simulation
Handcrafted from scratch! Felt and dealt with every single one of those thousand ACCESS_VIOLATION!
cpp cuda gpu-computing opengl physics-2d physics-simulation softbody-physics softbody-simulation verlet-physics
Last synced: 02 Mar 2025
https://github.com/ancry1596/bitlocker-recovery-password-brute-forcer
GPU-accelerated BitLocker recovery password brute-forcer using BitCracker and CUDA
bitcracker bitlocker brute-force cuda gpu nvidia password-recovery python
Last synced: 08 Apr 2026
https://github.com/hiway-media/ffmpeg-nvenc-static
FFmpeg supports NVENC encoding
cuda ffmpeg ffmpeg-cuda ffmpeg-nvenc nvidia-gpu
Last synced: 11 Apr 2026
https://github.com/yashkathe/image-noise-reduction-with-cuda
This project analyzes an image denoising technique, median blur, comparing GPU-accelerated (Numba) and CPU-based (OpenCV) processing speeds.
cuda cuda-programming gpu-programming hardware-speed-analysis image-analysis image-processing numba nvidia nvidia-cuda nvidia-gpu opencv parallel-programming
Last synced: 14 May 2025
https://github.com/aresio/lassie
LASSIE is a black-box deterministic simulator of large-scale mass-action biochemical systems
biochemical cuda gpu-computing large-scale mass-action simulation stiff
Last synced: 21 Feb 2026
https://github.com/weiyu0824/flash-attention-lite
Basic FlashAttention implementation
Last synced: 24 Jun 2025
https://github.com/sashakolpakov/graphem-rapids
Graph embedding for influence maximization in networks
cuda cuda-kernels embeddings graph-algorithms graph-theory pykeops pytorch rapidsai
Last synced: 16 Apr 2026
https://github.com/brosnanyuen/raybnn_diffeq
Differential Equation Solver using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cuda differential differential-equations gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust
Last synced: 09 Apr 2025
https://github.com/arsfiqball/image-sharpen-cpp
Implementation of Image Sharpening algorithm in C++ & CUDA
cuda gpu image-processing image-sharpening-algorithm
Last synced: 22 Apr 2026
https://github.com/brosnanyuen/raybnn_neural
Neural Networks with Sparse Weights in Rust using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cpu cuda deep-learning gpu machine-learning machine-learning-algorithms neural-network neural-networks opencl parallel raybnn rust sparse-network sparse-neural-networks
Last synced: 09 Apr 2025
https://github.com/lu-zero/nvidia-video-codec
Redistributable headers to build cuvid and nvenc
cuda cuvid nvenc nvidia nvidia-video-codec
Last synced: 19 Apr 2025
https://github.com/jtriley/gpucrate
Creates hard-linked GPU driver (currently just NVIDIA) volumes for use with docker, singularity, etc.
container cuda docker gpu singularity
Last synced: 27 Feb 2026
https://github.com/usegalaxy-eu/ansible-cuda
Ansible role to install the CUDA Toolkit, as described in the NVIDIA CUDA Installation Guide, on a Red Hat/CentOS system.
Last synced: 17 Jan 2026
https://github.com/mr-technologies/farsightcpp
Basic MRTech IFF C++ SDK sample application
camera cpp cuda demosaicing dng genicam gpu h264 h265 image-processing jetson json low-latency machine-vision mipi rest-api rtsp sdk tiff vulkan
Last synced: 12 Apr 2025
https://github.com/fabryprog/java-gpu
Support for offloading parallel-for loops in Java to NVIDIA CUDA compatible cards.
cuda gpu java nvidia parallel-computing
Last synced: 15 Apr 2026
https://github.com/cppalliance/crypt
A C++20 module of cryptographic utilities for CPU and GPU
Last synced: 23 Apr 2025
https://github.com/meetps/me-766
Assignment Solutions to course ME766 High Performance Scientific Computing.
cuda gpu-computing opencl openmp parallel-computing
Last synced: 18 May 2026
https://github.com/prince781/libgpublas
Drop-in GPU acceleration for linear algebra.
blas blas-kernels c cblas clblas cuda gpu gpu-acceleration hpc interposition linear-algebra nvidia opencl
Last synced: 29 Apr 2026
https://github.com/btursunbayev/nvsonar
Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes
cuda diagnostics gpu monitoring nvidia performance
Last synced: 02 Apr 2026
https://github.com/mr-technologies/imagebrokercpp
Example of image export from MRTech IFF C++ SDK
camera cpp cuda demosaicing dng genicam gpu h264 h265 image-processing jetson json low-latency machine-vision mipi opencv rest-api rtsp tiff vulkan
Last synced: 12 Apr 2025
https://github.com/pfcclab/open3d
Open3D: A Modern Library for 3D Data Processing
3d 3d-perception arm computer-graphics cpp cuda gpu gui machine-learning mesh-processing odometry opengl paddle pointcloud python reconstruction registration rendering tensorflow visualization
Last synced: 14 Apr 2025
https://github.com/firaja/parallel-floydwarshall
Various parallel implementations of Floyd-Warshall algorithm
algorithms c cuda distributed-computing floyd-warshall gpu-computing mpi multiprocessing openmp parallel-computing parallel-programming
Last synced: 16 Apr 2026
https://github.com/silviopaganini/darknet-docker-nvidia
Docker image to run Darknet on NVIDIA GPUs with CUDA 9.0 and OpenCV 3.4.0
cuda darknet docker nvidia-docker opencv
Last synced: 13 Jul 2025
https://github.com/santhsecurity/vyre
Compiler-grade sequential GPU compute. Workgroup-local stacks, queues, hashmaps, dominator trees, fixed-point dataflow. CUDA + WGPU + SPIR-V with bit-exact conformance gate. Rust.
compute cuda gpgpu gpu gpu-computing parallel-computing rust spir-v wgpu
Last synced: 23 Jun 2026
https://github.com/hrolive/fundamentals-of-accelerated-computing-with-cuda-c-cpp
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
cpp cuda cuda-kernels cuda-programming nsight nvidia profilling
Last synced: 10 Apr 2025
https://github.com/sanastasiou/dictation-service
GPU-accelerated speech-to-text service that types what you say, powered by OpenAI's Whisper AI
accessibility cuda dictation gpu-acceleration linux openai-whisper productivity python pytorch speech-recognition speech-to-text transcription voice-to-text voice-typing whisper
Last synced: 08 Apr 2026
https://github.com/franneck94/cuda-aes
AES Implementation (Counter Mode) in C++, OpenMP and CUDA.
aes c-plus-plus counter cuda encryption openmp parallel
Last synced: 13 Apr 2025
https://github.com/phael-exe/aco-selection-parallel
Parallelization of ACO with CUDA and OpenMP for large-scale instance selection.
cuda openmp parallel-computing
Last synced: 03 Jun 2026
https://github.com/shanthanu9/heterogeneous-parallel-computing-with-cuda
CUDA C/C++ programs
cuda histogram matrix-multiplication stencil thrust
Last synced: 12 Jun 2025
https://github.com/ammaryasirnaich/deeplearning_playland
This repository contains Docker Image files, which support the common frameworks required for Deep learning implementation. The images support both the latest GPU (Nvidia CUDA) and CPU processors.
cuda cuda11 cudnn cudnn8 deep-learning docker docker-image dockerfile gpu kersa opencv pytorch pytorch-cnn scikit-learn tensorflow2
Last synced: 12 Apr 2026
https://github.com/neomatrix369/dl4j-nlp-cuda-example
A git repository containing an NLP example using DL4J (cuda) in Java
cuda cuda-details cudnn deep-learning deeplearning4j dl4j docker-container java jvm machine-learning natural-language-processing nlp nvidia nvidia-drivers nvidia-gpu valohai-cli valohai-platform
Last synced: 22 Feb 2026
https://github.com/mr-technologies/lensprofiler
MRTech IFF SDK lens profiling tool
c calibration camera camera-calibration cuda distortion distortion-correction genicam gpu image-processing jetson json lens low-latency machine-vision mipi opencv python sdk tiff
Last synced: 22 Aug 2025
https://github.com/yosh-matsuda/gpu-array
Maximum GPU performance with Modern C++ syntax. RAII and Range-based abstraction to GPU memory management and data layouts, enabling code safety and performance optimization with zero overhead.
cpp cpp20 cuda gpu header-only hip
Last synced: 08 Jun 2026
https://github.com/mr-technologies/streamadapter
GStreamer integration for MRTech IFF SDK
c camera cuda demosaicing dng genicam gpu gstreamer h264 h265 image-processing jetson json low-latency machine-vision mipi rest-api rtsp tiff vulkan
Last synced: 06 Apr 2026
https://github.com/gurbaaz27/cs433a-design-exercises
Solutions of design exercises in CS433A: Parallel Programming, Spring Semester 2021-22
barriers cuda gpu-programming locks openmp parallel-programming posix-threads semaphores
Last synced: 29 Jan 2026
https://github.com/definetlynotai/llm_data
Source code from a number of well-known repositories, in Python, collected in this repo as plain local docs for training code AI
c code-examples cpp cuda data data-dum jupyter-notebook llm llm-code llm-datasets programming-data programming-data-sets python3
Last synced: 08 Oct 2025
https://github.com/mindstudioofficial/fl_cuda_mandelbrot
Flutter example for visualizing the Mandelbrot Set using CUDA
cuda flutter-examples fractal-rendering
Last synced: 16 May 2026
https://github.com/yunzhu-li/recognizer
An object recognizer mobile app based on deep convolutional neural networks
cnn cuda cudnn gpu ios python swift tensorflow
Last synced: 20 Apr 2026
https://github.com/3zrv/raytracerincpp
A ray tracer that renders in 16-color VGA palette at 640x480 resolution.
Last synced: 18 May 2026
https://github.com/mr-technologies/farsightpy
Basic MRTech IFF Python SDK sample application
camera cuda demosaicing dng genicam gpu h264 h265 image-processing jetson json low-latency machine-vision mipi python rest-api rtsp sdk tiff vulkan
Last synced: 12 Apr 2025
https://github.com/phrb/nvidia-workshop-autotuning
Resources for autotuning CUDA compiler parameters
autotuning compilers cuda gpu julia nodal nvcc
Last synced: 03 May 2026
https://github.com/rkv0id/boltzmanumba
GPU-Parallelization of a sequential Lattice Boltzmann gist on CUDA-capable devices using Numba.
Last synced: 08 Sep 2025
https://github.com/jedbrooke/cuda_bwt
CUDA-accelerated Burrows-Wheeler transform
bioinformatics burrows-wheeler-transform bwt compression cuda
Last synced: 19 May 2026
https://github.com/alejandrogallo/atrip
High Performance library for the CCSD(T) algorithm in quantum chemistry
asynchronous-programming coupled-cluster cuda literate-programming mpi quantum-chemistry
Last synced: 28 Oct 2025
https://github.com/rapidsai/cuvs-lucene
A Lucene codec for vector search and clustering on the GPU
anns cuda gpu hybrid-search information-retrieval lucene nearest-neighbors neighborhood-methods semantic-search vector-database vector-search vector-similarity vector-store
Last synced: 01 Aug 2025
https://github.com/kiwijuice56/cuda-mandelbox
Ray marching renderer of the 3D mandelbox fractal, accelerated with CUDA GPU code
3d 3d-graphics cpp cuda fractal fractal-images fractal-rendering mandelbox nvidia-cuda
Last synced: 02 May 2026
https://github.com/redhat-na-ssa/gpu-workshop
Using GPUs on Red Hat Platforms
Last synced: 30 Jul 2025
https://github.com/eddieoz/bananaforge
🎨 Professional AI-powered multi-layer 3D printing optimization tool that converts 2D images into optimized multi-layer 3D models for color printing with advanced transparency mixing.
3d-printer 3d-printing 3dprinting art cli-app cuda hueforge machine-learning python
Last synced: 17 Aug 2025
https://github.com/nolmoonen/cuda-sdf
CUDA-accelerated path traced Menger sponge using ray marching.
cuda menger path-tracer ray-marching sdf
Last synced: 12 Feb 2026
https://github.com/brosnanyuen/raybnn_optimizer
Gradient Descent Optimizers and Genetic Algorithms using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cuda genetic-algorithm genetic-algorithms gpu gpu-computing gradient gradient-descent parallel parallel-computing raybnn rust
Last synced: 07 Oct 2025
https://github.com/sagdrip/cudarrows
CUDA port of Logic Arrows
cellular-automata cuda gpu-acceleration logic-gates
Last synced: 19 Feb 2026
https://github.com/iitii/useless
Backup of Doubi scripts, plus some personal configuration files and scripts
aria2 bash-script cuda docker doubi ffmpeg frpc frps oh-my-zsh powerlevel10k
Last synced: 10 Apr 2026