Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
![](https://explore-feed.github.com/topics/cuda/cuda.png)
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc,
- Last updated: 2025-02-07 00:07:09 UTC
- JSON Representation
https://github.com/andreabak/whispersubs
Generate subtitles for your video or audio files using the power of AI
ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper
Last synced: 16 Nov 2024
https://github.com/DefTruth/hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉
Last synced: 06 Dec 2024
https://github.com/hanzhi713/bitonic-sort
In-place GPU sort with bitonic sort
bitonic-sort cuda gpu in-place sorting
Last synced: 26 Jan 2025
https://github.com/andreimoraru123/contextcollector
Mixed vision-language Attention Model that gets better by making mistakes
attention attention-mechanism coco-api computer-vision cuda cudnn image-captioning lstm mscoco-dataset multimodal-deep-learning natural-language-processing object-detection opencv pytorch resnet show-and-tell show-attend-and-tell video-inference vision-language yolo
Last synced: 18 Jan 2025
https://github.com/kim-hwiwon/t-espresso
A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data
Last synced: 27 Dec 2024
https://github.com/evanmcclure/hello_gpu
Hello world example for Rust on GPU
apple apple-silicon cuda cuda-programming example-project gpu gpu-programming gpu-support metal rust rust-lang
Last synced: 20 Jan 2025
https://github.com/coreylowman/tenten
A tiny tensor library in rust with fused JIT operations.
Last synced: 07 Jan 2025
https://github.com/nellogan/distributed_compy
Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + mult-GPU), and multi-node support
cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi
Last synced: 12 Jan 2025
https://github.com/arminms/p2rng
A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI
cpp cuda cxx header-only heterogeneous-computing library linux macos multiplatorm oneapi openmp parallel pcg-random prng pseudorandom-number-generator random-number-distributions random-number-generation rocm stl-algorithms windows
Last synced: 05 Nov 2024
https://github.com/csvancea/gpu-hashtable
GPU-backed linear-probing hash table implemented in CUDA. Supports batch operations such as insert and retrieval.
Last synced: 19 Jan 2025
https://github.com/brosnanyuen/raybnn_neural
Neural Networks with Sparse Weights in Rust using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cpu cuda deep-learning gpu machine-learning machine-learning-algorithms neural-network neural-networks opencl parallel raybnn rust sparse-network sparse-neural-networks
Last synced: 13 Nov 2024
https://github.com/samuraibupt/cuda_code
Codes for learning cuda. Implementation of multiple kernels.
Last synced: 23 Oct 2024
https://github.com/dpbm/qml-course
Minicurso de quantum Machine learning
cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow
Last synced: 21 Dec 2024
https://github.com/vorticity-inc/vtensor
VTensor, a C++ library, facilitates tensor manipulation on GPUs, emulating the python-numpy style for ease of use. It leverages RMM (RAPIDS Memory Manager) for efficient device memory management. It also supports xtensor for host memory operations.
cublas cuda curand cusolver gpu numpy rmm tensor xarray xtensor
Last synced: 10 Dec 2024
https://github.com/akhuntsaria/canny-edge-detection
Canny edge detector implemented in CUDA C/C++
cuda image-processing video-processing
Last synced: 23 Oct 2024
https://github.com/programmer-rd-ai/digivis
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 12 Jan 2025
https://github.com/thomasonzhou/minitorch
rebuilding pytorch: from autograd to convolutions in CUDA
Last synced: 30 Dec 2024
https://github.com/hatamiarash7/cuda-python
GPU programming using CUDA & Python
cuda gpu gpu-computing gpu-programming python
Last synced: 03 Feb 2025
https://github.com/anne-andresen/multi-modal-cuda-c-gan
Raw C/cuda implementation of 3d GAN
3d 3d-models attention-mechanism c cross-attention cross-attention-c cuda gan gan-models low-level-programming medical-imaging multimodal-deep-learning pytorch transformer-pytorch transformers transformers-c
Last synced: 05 Nov 2024
https://github.com/dolongbien/cuda
CUDA and Caffe/Caffe2 installation Ubuntu 16.04
c3d-intel-caffe caffe caffe2 cuda cudnn deep-learning ubuntu
Last synced: 21 Jan 2025
https://github.com/giorgiogamba/parallel_programming
Experimenting with parallel programming
cuda cuda-kernels cuda-programming cuda-toolkit parallel parallel-computing parallel-processing parallel-programming visual-studio
Last synced: 30 Dec 2024
https://github.com/fandreuz/parallel-programming-for-hpc
Scientific codes in C/C++ with CUDA, OpenACC, FFTW, (cu)BLAS
Last synced: 21 Jan 2025
https://github.com/bl33h/pythagoreantheorem
A program that calculates the Pythagorean theorem for a large number of elements using GPU parallel processing.
arrays cuda kernel parallel-programming pythagoras pythagorean-theorem
Last synced: 21 Jan 2025
https://github.com/bl33h/productoftwovectors
This code utilizes CUDA for parallel vector multiplication on a GPU, demonstrating GPU's acceleration capabilities.
cuda gpu kernel paralelism parallel-programming product vector
Last synced: 21 Jan 2025
https://github.com/crcrpar/dev-chainer
Dockerfile for Chainer Development in VSCode
chainer cuda docker nvidia-docker vscode
Last synced: 17 Dec 2024
https://github.com/daelsepara/hipmandelbrot
GPU Implementation of Mandelbrot Fractal Generator with Benchmarking
amd cuda fractal gpu gpu-compute gpu-computing hip mandelbrot parallel-computing rocm sdk
Last synced: 07 Nov 2024
https://github.com/lightshade12/kittlespt
A hobby CUDA pathtracing renderer.
3d-graphics computer-graphics cuda gpu path-tracing ray-tracing
Last synced: 24 Jan 2025
https://github.com/ashwanirathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 08 Jan 2025
https://github.com/stanczakdominik/cuda_poisson
A 2D poisson solver via CUDA
Last synced: 04 Feb 2025
https://github.com/donpablonows/coin
🪙 Crypto Optimization Interface Network (aka COIN) is a high-performance Bitcoin address generator using CUDA acceleration and multi-threading. It optimizes GPU and CPU resources for fast address generation, ensures secure private key creation, and includes real-time monitoring and automatic system optimizations.
bitcoin blockchain cryptography cuda gpu-acceleration
Last synced: 07 Jan 2025
https://github.com/nellogan/makefileexamples
Makefile examples of how to automate testing and building of applications/systems that use multiple: languages, compilers, and testing tools.
automated-testing c cuda makefile python valgrind
Last synced: 21 Jan 2025
https://github.com/daelsepara/hipslm
CPU and GPU (using HIP) implementations of phase pattern generators for use with spatial light modulators
computer-generated-holography cuda gpu hip hologram holography phase phase-pattern slm spatial-light-modulator
Last synced: 29 Dec 2024
https://github.com/meirbek-dev/face-mask_detector
Обнаружие маски на лице в реальном времени
artificial-intelligence covid-19 cuda cudnn deep-learning face-mask graduation-project jupyter-notebook keras machine-learning mask-detection mobilnet-v2 object-detection object-recognition object-tracking opencv4-python python real-time supervised-learning tensorflow2-gpu
Last synced: 11 Jan 2025
https://github.com/whutao/artificial-art
Image approximation with triangles using evolutionary algorithm.
cuda evolutionary-algorithm python3
Last synced: 16 Jan 2025
https://github.com/tlabaltoh/tlab-sharescreen-server-win
Software frame encoder using CUDA and cast encoded frames over UDP. Trying to implement a custom streaming protocol and shader based frame encoder/decoder for screencast.
cuda desktop-capture screensharing unity unity3d windows-graphics-capture
Last synced: 28 Jan 2025
https://github.com/michaelfranzl/image_debian-gpgpu
Dockerfile for a Debian base image with AMD and Nvidia GPGPU support
amd container container-image cuda debian docker gpgpu nvidia opencl
Last synced: 21 Jan 2025
https://github.com/sohhamseal/scalable-systems-programs
A little less effort to learn parallel programming...
Last synced: 13 Jan 2025
https://github.com/andih/cuda-fortran-stream
Variant of STREAM Benchmark in CUDA Fortran
cuda cuda-fortran gpu stream-benchmarks variants
Last synced: 12 Jan 2025
https://github.com/sanaeprj/matrix-for-cpp
This repository has types that handle matrices.
cpp14 cpp14-library cuda matrix-library
Last synced: 19 Nov 2024
https://github.com/franciscoda/psvm
R package and C++ library that allows training SVM models in a GPU using CUDA and predicting out-of-sample data. A support vector machine (SVM) is a type of machine learning model that is trained using supervised data to classify samples.
cpp cpp17 cuda machine-learning r svm-classifier svm-training
Last synced: 28 Jan 2025
https://github.com/abhisheknair10/occupancy.nn
An multi-step pipeline to train and inference Occupancy Networks
Last synced: 13 Jan 2025
https://github.com/orgh0/highperformancecnn
Implementation of a High Performance CNN for MNIST dataset
Last synced: 22 Jan 2025
https://github.com/garciparedes/cuda-examples
Cuda examples who I develop to learn HPC based on GPU
c c-plus-plus cuda examples gpgpu gpu hpc
Last synced: 16 Jan 2025
https://github.com/wallneradam/docker-ccminer
CCMiner (tpruvot version) Docker Builder
ccminer cuda docker gpu litecoin miner monero nvidia nvidia-docker
Last synced: 01 Feb 2025
https://github.com/jonathanraiman/mini_cuda_rtc
Miniature CUDA Array library with Runtime Compilation
cpp11 cuda jit runtime-compilation
Last synced: 22 Jan 2025
https://github.com/alpha74/hungarianalgocuda
Hungarian Algorithm for Linear Assignment Problem implemented using CUDA.
cuda nvcc parallel-computing parallel-programming
Last synced: 16 Jan 2025
https://github.com/david-palma/cuda-programming
Educational CUDA C/C++ programming repository with commented examples on GPU parallel computing, matrix operations, and performance profiling. Requires a CUDA-enabled NVIDIA GPU.
c-cpp cpp cuda cuda-toolkit education gpu gpu-programming kernel matrix-operations nvcc nvidia parallel-computing parallel-programming practice profiling threads
Last synced: 31 Jan 2025
https://github.com/kchristin22/ising_model
Implementation of a cellular automaton on GPU using different features of CUDA
cellular-automaton cuda gpu-programming hpc ising-model parallel-computing
Last synced: 22 Jan 2025
https://github.com/sleeepyjack/multisplit
Simple multisplit for CUDA accelerators
cpp cuda gpu nvidia parallel-programming primitive split
Last synced: 22 Jan 2025
https://github.com/piyush26c/cuda-programming
c cuda ipynb-jupyter-notebook mathematics sppu-computer-engineering
Last synced: 12 Jan 2025
https://github.com/dafadey/GPGPU_OpenCL_vs_CUDA
This is a repository with sample codes for testing memory bandwidth, arithmetic latency hiding and shared/local memory performance on AMD and nVidia devices
cuda gpgpu gpgpu-computing opencl
Last synced: 19 Nov 2024
https://github.com/jessetg/cuda-practice
Working through the chapters of Cuda by Example
c cpp cuda cuda-by-example gpgpu
Last synced: 14 Jan 2025
https://github.com/brosnanyuen/raybnn_sparse
Sparse Matrix Library for GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cpu cuda gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust sparse sparse-coding sparse-matrix sparse-neural-networks
Last synced: 13 Nov 2024
https://github.com/bjornmelin/deep-learning-evolution
🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀
ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers
Last synced: 05 Feb 2025
https://github.com/alegau03/parallel-k-means
Implementation of C programs for the K-Means algorithm for parallel computing.
c c-programming cuda parallel parallel-programming
Last synced: 05 Feb 2025
https://github.com/brosnanyuen/raybnn_graph
Graph Manipulation Library For GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cuda gpu graph graph-algorithms neural-network neural-networks opencl raybnn rust
Last synced: 13 Nov 2024
https://github.com/emmanuelmess/firstcollisiontimesteprarefiedgassimulator
This simulator computes all possible intersections for a very small timestep for a particle model
Last synced: 15 Jan 2025
https://github.com/sunsided/rust-arrayfire-experiments
Toying around with ArrayFire in Rust
arrayfire conways-game-of-life cuda gpgpu gpu-acceleration gpu-computing opencl rust
Last synced: 20 Dec 2024
https://github.com/ruturaj4/cuda_nvidia_tutorial
cuda projects
cuda cuda-vector-addition nvidia nvidia-cuda parallel
Last synced: 16 Jan 2025
https://github.com/poodarchu/vision-lab
Computer Vision Experiments in all.
computer-vision cuda object-detection
Last synced: 28 Jan 2025
https://github.com/speedcell4/torchdevice
Setup CUDA_VISIBLE_DEVICES
cuda deep-learning gpu machine-learning pytorch
Last synced: 06 Jan 2025
https://github.com/aliyoussef97/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 05 Feb 2025
https://github.com/abhinavsharma07/streamlit
Stable Diffusion
clip cuda denoising diffusers generative-models latent-diffusion latent-space lms-scheduler unet
Last synced: 05 Feb 2025
https://github.com/gordonkoerner1/gordo_cuda
Library of Cython Wrappers for the NVIDIA API
cuda python sparse-linear-algebra sparse-linear-solver sparse-linear-systems
Last synced: 05 Feb 2025
https://github.com/5had3z/torch-discounted-cumsum-nd
PyTorch Discounted Cumsum with Autograd (CPU + CUDA)
Last synced: 05 Feb 2025
https://github.com/antonioberna/nn-gpu-logic-gates
Neural Network implementation on GPU using CUDA C++ to learn logic gates operations
cpp cuda gpu logic-gates neural-networks nvidia
Last synced: 05 Feb 2025
https://github.com/sbstndb/grayscott_k
A simple 3D GrayScott simulation using Kokkos enabling CUDA or OpenMP backend
cuda finite-difference grayscott grid kokkos laplacian openmp simulation visualisation
Last synced: 05 Feb 2025
https://github.com/fynv/cudainline
A CUDA interface for Python. A distillation of the engine part of ThrustRTC.
Last synced: 05 Feb 2025
https://github.com/nolmoonen/cuda-sdf
CUDA-accelerated path traced Menger sponge using ray marching.
cuda menger path-tracer ray-marching sdf
Last synced: 05 Feb 2025
https://github.com/gogolb/ee147
Intro to GPU Computing
c cuda cuda-kernels cuda-toolkit gpu-computing gpu-programming university-course
Last synced: 29 Jan 2025
https://github.com/pabvald/parallel-computing
Parallel computing practise with OpenMP, MPICH and CUDA
cuda mpich openmp parallel-computing
Last synced: 29 Jan 2025
https://github.com/mortafix/quickshift
A working implementation of Quickshift algorithm in CUDA, GPU-compatible.
Last synced: 13 Jan 2025
https://github.com/liberxue/parallel_computing
CUDA Algorithm && Hacker's Delight
algorithms cuda cuda-kernels cuda-programming hacker-s-delight nvidia
Last synced: 31 Dec 2024
https://github.com/tensorbfs/cutropicalgemm.jl
The fastest Tropical number matrix multiplication on GPU
Last synced: 20 Dec 2024
https://github.com/dhruvsrikanth/cudann
A distributed implementation of a deep learning framework in CUDA.
cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming
Last synced: 25 Dec 2024
https://github.com/maelstrom6/mandelpy
A Mandelbrot and Buddhabrot viewer with GPU acceleration
buddhabrot cuda gpu mandelbrot python3
Last synced: 05 Feb 2025
https://github.com/komorra/blackmagicengine
Nextgen, Classic/VR/AR Game Engine
core cuda dx12 game-development gameengine gpu net nvidia vulcan
Last synced: 31 Dec 2024
https://github.com/m-torhan/cuda-stl-renderer
CUDA C++ implementation of STL file renderer using ray tracing method
Last synced: 31 Dec 2024
https://github.com/brosnanyuen/raybnn_dataloader
Data Loader for RayBNN
arrayfire cpu csv csv-parser cuda data-structures gpu-computing oneapi opencl parallel parallel-computing rust
Last synced: 13 Jan 2025
https://github.com/mhaseeb123/gcb
GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.
cpp cpp23 cuda mpi partitioned-communication st-mpi
Last synced: 24 Jan 2025
https://github.com/ashwani-rathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 21 Nov 2024
https://github.com/nickolasrm/gpuvscpumatrixmultiplication
CPU and GPU optimized matrix multiplication (AVX, transposition, CUDA and other)
avx comparison cuda hpc matrix multiplication
Last synced: 28 Dec 2024
https://github.com/dvhh/masscorrelation
An exercise in writing an efficient correlation calculator
calculations correlation-calculation cuda matrix multi-threading openmp
Last synced: 25 Jan 2025