Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
![](https://explore-feed.github.com/topics/cuda/cuda.png)
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-02-15 00:06:58 UTC
https://github.com/ilyasmoutawwakil/optimum-whisper-autobenchmark
A set of benchmarks on OpenAI's Whisper model, using AutoBenchmark and Optimum's OnnxRuntime Optimizations.
Last synced: 30 Jan 2025
https://github.com/yomi4486/zundamon_v3
Master, a shot of cold water, please.
cuda discord-bot discord-py docker docker-compose python tts voicevox zundamon
Last synced: 27 Nov 2024
https://github.com/teodutu/asc
Computer Systems Architecture (Arhitectura Sistemelor de Calcul) - UPB 2020
cache-optimization cuda parallel-programming profiling python-threading
Last synced: 30 Jan 2025
https://github.com/deftruth/hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉
Last synced: 04 Dec 2024
https://github.com/infotrend-inc/ctpo-demo_projects
Jupyter Notebook examples using CTPO as their source container.
cuda opencv pytroch tensorflow2
Last synced: 05 Feb 2025
https://github.com/bensuperpc/easyai
Make your own AI easily!
ai cuda python python3 tensorflow
Last synced: 17 Jan 2025
https://github.com/abaksy/cuda-examples
A repository of examples coded in CUDA C/C++
Last synced: 17 Jan 2025
https://github.com/elftausend/sliced
Array operations with automatic differentiation on CPU and GPU
autograd automatic-differentiation cuda custos matrix opencl
Last synced: 14 Feb 2025
https://github.com/potato3d/grid-rt
GPU-accelerated ray tracing using GLSL and CUDA
cuda glsl gpu ray-tracing real-time-rendering
Last synced: 10 Jan 2025
https://github.com/dqbd/cuda-btree
Implementation of B-Trees on NVIDIA CUDA
Last synced: 13 Feb 2025
https://github.com/kagof/julia-image-processing
Image processing programs written in Julia
Last synced: 12 Feb 2025
https://github.com/zeloe/rtconvolver
A realtime convolution VST3
c convolution cplusplus cuda juce
Last synced: 25 Dec 2024
https://github.com/soran-ghaderi/torchebm
⚡ Energy-Based Modeling library for PyTorch, offering tools for sampling, inference, and learning in complex distributions.
contrastive-divergence cuda diffusion-models energy-based-model generative-ai langevin-dynamics noise-contrastive-estimation probabilistic-machine-learning reasoning sampling-methods score-matching variational-inference
Last synced: 26 Dec 2024
https://github.com/dujonwalker/nixos-config-x86_64-cuda
This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.
cuda flatpak nix nixos nixos-configuration ollama
Last synced: 26 Dec 2024
https://github.com/andreasholt/cusmc
A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata
Last synced: 02 Jan 2025
https://github.com/tthebc01/cudaconda3
Lightweight container environment with CUDA, Miniconda3, and Jupyter Lab.
cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application
Last synced: 03 Jan 2025
https://github.com/kar-dim/watermarking-gpu
Code for my Diploma thesis at Information and Communication Systems Engineering (University of the Aegean, School of Engineering) with title "Efficient implementation of watermark and watermark detection algorithms for image and video using the graphics processing unit". Part 2 / GPU
arrayfire cpp cuda gpu image-processing opencl parallel-computing video-processing watermark-image watermarking
Last synced: 04 Jan 2025
https://github.com/szymon423/tsp-cpu-vs-gpu
Simple brute force approach to solve travelling salesman problem with CPU and GPU
Last synced: 18 Jan 2025
https://github.com/andreimoraru123/contextcollector
Mixed vision-language Attention Model that gets better by making mistakes
attention attention-mechanism coco-api computer-vision cuda cudnn image-captioning lstm mscoco-dataset multimodal-deep-learning natural-language-processing object-detection opencv pytorch resnet show-and-tell show-attend-and-tell video-inference vision-language yolo
Last synced: 18 Jan 2025
https://github.com/dito97/gol
High-performance Computing (90535) final project at UniGe
Last synced: 14 Feb 2025
https://github.com/nellogan/distributed_compy
Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + multi-GPU), and multi-node support.
cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi
Last synced: 12 Jan 2025
https://github.com/rogerallen/jmandelbrotr
Java CUDA Mandelbrot explorer
cuda cuda-opengl java jcuda joml lwjgl3 mandelbrot-viewer opengl
Last synced: 25 Jan 2025
https://github.com/pd2871/high-performance-computing
This repo contains the logs of the High Performance Computing module's final assignment.
blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api
Last synced: 25 Jan 2025
https://github.com/orlandopalmeira/trabalho-cp-2023-2024
Repository for the practical assignment of the Parallel Computing (CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - University of Minho (UMinho)
computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei
Last synced: 25 Jan 2025
https://github.com/xmas7/cudampi
A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.
cpu cuda gpu hybrid mpi network
Last synced: 01 Feb 2025
https://github.com/fixstars/cuda-multi-view-stereo
C++/CUDA library for Multi-View Stereo
3d-reconstruction computer-vision cuda multi-view-stereo structure-from-motion
Last synced: 09 Feb 2025
https://github.com/raad-labs/raad-video
A high-performance video loading library for machine learning, designed for efficient training data preparation.
cuda machine-learning training-data
Last synced: 09 Feb 2025
https://github.com/kilamper/matrix-multiplication
AC - Matrix multiplication using OpenMP, MPI and CUDA
Last synced: 26 Jan 2025
https://github.com/brosnanyuen/raybnn_graph
Graph Manipulation Library For GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cuda gpu graph graph-algorithms neural-network neural-networks opencl raybnn rust
Last synced: 13 Feb 2025
https://github.com/liuyuweitarek/pytorch-docker-builder
Automate PyTorch Docker image builds with compatible Python, CUDA, and Poetry versions, including CI/CD for testing.
cicd containerd cuda docker docker-image poetry-python python python3 pytorch pytorch-docker
Last synced: 24 Jan 2025
https://github.com/programmer-rd-ai/detectx
A Pythonic approach to object detection using Detectron2, a clean, modular framework for training and deploying computer vision models. DetectX simplifies the complexity of object detection while maintaining high performance and extensibility.
coco-dataset computer-vision computer-vision-library cuda deep-learning detectron2 faster-rcnn gpu-accelerated machine-learning ml-framework object-detection object-recognition python3 pytorch retinanet
Last synced: 12 Jan 2025
https://github.com/programmer-rd-ai/digivis
A PyTorch-based deep learning implementation for MNIST digit recognition featuring CNNs, GPU acceleration, experiment tracking, and comprehensive testing capabilities.
cnn computer-vision cuda data-science deep-learning digit-recognition image-classification machine-learning mnist neural-networks python pytorch wandb
Last synced: 12 Jan 2025
https://github.com/michaelfranzl/image_debian-gpgpu
Dockerfile for a Debian base image with AMD and Nvidia GPGPU support
amd container container-image cuda debian docker gpgpu nvidia opencl
Last synced: 21 Jan 2025
https://github.com/kaierikniermann/hpc-uzh-notes
These are some notes for the High Performance Computing course taught at UZH
cuda high-performance-computing mpi openacc openmp
Last synced: 12 Jan 2025
https://github.com/fandreuz/parallel-programming-for-hpc
Scientific codes in C/C++ with CUDA, OpenACC, FFTW, (cu)BLAS
Last synced: 21 Jan 2025
https://github.com/maelstrom6/mandelpy
A Mandelbrot and Buddhabrot viewer with GPU acceleration
buddhabrot cuda gpu mandelbrot python3
Last synced: 05 Feb 2025
https://github.com/ergonomech/comfyui-windows-installer
Automated setup for ComfyUI on Windows with CUDA, custom plugins, and optimized PyTorch settings. Made to run as a server and to error-correct. Easy installation and launch using Miniconda.
automation comfy conda conda-environment cuda hosting-deployment setup windows
Last synced: 06 Feb 2025
https://github.com/whutao/artificial-art
Image approximation with triangles using evolutionary algorithm.
cuda evolutionary-algorithm python3
Last synced: 16 Jan 2025
https://github.com/speedcell4/torchdevice
Set up CUDA_VISIBLE_DEVICES
cuda deep-learning gpu machine-learning pytorch
Last synced: 08 Feb 2025
https://github.com/alpha74/hungarianalgocuda
Hungarian Algorithm for Linear Assignment Problem implemented using CUDA.
cuda nvcc parallel-computing parallel-programming
Last synced: 16 Jan 2025
https://github.com/mortafix/quickshift
A working, GPU-compatible implementation of the Quickshift algorithm in CUDA.
Last synced: 13 Jan 2025
https://github.com/dhruvsrikanth/cudann
A distributed implementation of a deep learning framework in CUDA.
cpp cuda deep-learning deep-learning-framework gpu-programming high-performance-computing hpc parallel-programming
Last synced: 25 Dec 2024
https://github.com/giorgiogamba/parallel_programming
Experimenting with parallel programming
cuda cuda-kernels cuda-programming cuda-toolkit parallel parallel-computing parallel-processing parallel-programming visual-studio
Last synced: 30 Dec 2024
https://github.com/jonathanraiman/mini_cuda_rtc
Miniature CUDA Array library with Runtime Compilation
cpp11 cuda jit runtime-compilation
Last synced: 22 Jan 2025
https://github.com/m-torhan/cuda-stl-renderer
CUDA C++ implementation of STL file renderer using ray tracing method
Last synced: 31 Dec 2024
https://github.com/aliyoussef97/triton-hub
A container of various PyTorch neural network modules written in Triton.
cuda deep-learning openai pytorch triton triton-lang
Last synced: 05 Feb 2025
https://github.com/meirbek-dev/face-mask_detector
Real-time face mask detection
artificial-intelligence covid-19 cuda cudnn deep-learning face-mask graduation-project jupyter-notebook keras machine-learning mask-detection mobilnet-v2 object-detection object-recognition object-tracking opencv4-python python real-time supervised-learning tensorflow2-gpu
Last synced: 11 Jan 2025
https://github.com/dolongbien/cuda
CUDA and Caffe/Caffe2 installation on Ubuntu 16.04
c3d-intel-caffe caffe caffe2 cuda cudnn deep-learning ubuntu
Last synced: 21 Jan 2025
https://github.com/abhinavsharma07/streamlit
Stable Diffusion
clip cuda denoising diffusers generative-models latent-diffusion latent-space lms-scheduler unet
Last synced: 05 Feb 2025
https://github.com/vietdoo/seam-carving-cuda
CUDA Seam Carving: Accelerating Image Resizing with GPU Computing
cc cuda cuda-programming gpu-computing parrallel-computing seam-carving
Last synced: 07 Feb 2025
https://github.com/adamczykpiotr/cudamatrixlibrary
Matrix operation library using a single thread, n threads, or a CUDA-supported GPU
agh agh-ust cpp cuda cuda-library matrix matrix-computations matrix-functions matrix-multiplication
Last synced: 19 Jan 2025
https://github.com/brosnanyuen/raybnn_dataloader
Data Loader for RayBNN
arrayfire cpu csv csv-parser cuda data-structures gpu-computing oneapi opencl parallel parallel-computing rust
Last synced: 13 Jan 2025
https://github.com/jessetg/cuda-practice
Working through the chapters of CUDA by Example
c cpp cuda cuda-by-example gpgpu
Last synced: 14 Jan 2025
https://github.com/wallneradam/docker-ccminer
CCMiner (tpruvot version) Docker Builder
ccminer cuda docker gpu litecoin miner monero nvidia nvidia-docker
Last synced: 01 Feb 2025
https://github.com/mala13f/statistical-learning-in-finance
This repository contains all the code, papers, and related data for assignments done during the course.
cuda gpu-acceleration jupyter-notebook machine-learning python statistical-learning
Last synced: 31 Jan 2025
https://github.com/dotblueshoes/robertscross
The Roberts cross operator is used in image processing and computer vision for edge detection.
cuda edge-detection image-processing
Last synced: 05 Feb 2025
https://github.com/mhaseeb123/gcb
GCB includes a suite of benchmarks and basic tests for CUDA-aware MPI and C++ compilers.
cpp cpp23 cuda mpi partitioned-communication st-mpi
Last synced: 24 Jan 2025
https://github.com/inventwithdean/cuda_mlp
Implementation of a simple Multilayer Perceptron in pure CUDA
cuda cuda-programming deep-learning neural-networks
Last synced: 05 Feb 2025
https://github.com/nickolasrm/gpuvscpumatrixmultiplication
CPU and GPU optimized matrix multiplication (AVX, transposition, CUDA and other)
avx comparison cuda hpc matrix multiplication
Last synced: 28 Dec 2024
https://github.com/poodarchu/vision-lab
Computer Vision Experiments in all.
computer-vision cuda object-detection
Last synced: 28 Jan 2025
https://github.com/neoblizz/cupti-plus-plus
CUPTI++ is a C++ interface to the CUDA Profiling Tools Interface (CUPTI).
cpp cuda cuda-profiler cupti profiler
Last synced: 09 Feb 2025
https://github.com/bjornmelin/deep-learning-evolution
🧠 Deep-Learning Evolution: Unified collection of TensorFlow & PyTorch projects, featuring custom CUDA kernels, distributed training, memory‑efficient methods, and production‑ready pipelines. Showcases advanced GPU optimizations, from foundational models to cutting‑edge architectures. 🚀
ai-research cuda data-science deep-learning distributed-training gan gpu-acceleration machine-learning model-optimization neural-networks python pytorch tensorflow training-pipeline transformers
Last synced: 05 Feb 2025
https://github.com/tlabaltoh/tlab-sharescreen-server-win
Software frame encoder that uses CUDA and casts encoded frames over UDP. Trying to implement a custom streaming protocol and shader-based frame encoder/decoder for screencasting.
cuda desktop-capture screensharing unity unity3d windows-graphics-capture
Last synced: 28 Jan 2025
https://github.com/alextmjugador/rust-cuda-quickstart
Bring the Rust-CUDA project back to life under modern Linux environments.
cuda cuda-programming cuda-rust cuda-support docker rust
Last synced: 26 Jan 2025
https://github.com/zeloe/juce_cuda_convolution
Linear realtime convolution using CUDA
audio audio-processing convolution cuda dsp juce
Last synced: 25 Dec 2024
https://github.com/lcsb-biocore/cufluxsampler.jl
GPU-accelerated algorithms for flux sampling in CUDA.jl
cobra cuda gpu julia metabolic-network metabolism sampling
Last synced: 30 Jan 2025
https://github.com/enp1s0/curand_fp16
FP16 pseudo random number generator on GPU
cuda gpu half-precision random-number-generators
Last synced: 26 Dec 2024
https://github.com/garciparedes/cuda-examples
CUDA examples that I develop to learn GPU-based HPC
c c-plus-plus cuda examples gpgpu gpu hpc
Last synced: 16 Jan 2025
https://github.com/ruturaj4/cuda_nvidia_tutorial
CUDA projects
cuda cuda-vector-addition nvidia nvidia-cuda parallel
Last synced: 16 Jan 2025
https://github.com/dansolombrino/gphungarian
A GPU-accelerated implementation of the Hungarian Algorithm, written in CUDA
Last synced: 07 Feb 2025
https://github.com/anne-andresen/multi-modal-cuda-c-gan
Raw C/CUDA implementation of a 3D GAN
3d 3d-models attention-mechanism c cross-attention cross-attention-c cuda gan gan-models low-level-programming medical-imaging multimodal-deep-learning pytorch transformer-pytorch transformers transformers-c
Last synced: 05 Nov 2024
https://github.com/tyler-hilbert/cuda-linearregression
Linear Regression written from scratch in CUDA
ai cublas cuda gpu linear-regression nsight
Last synced: 05 Feb 2025
https://github.com/orgh0/highperformancecnn
Implementation of a High Performance CNN for MNIST dataset
Last synced: 22 Jan 2025
https://github.com/kar-dim/fidelityfx-cas-cuda
Implementation of the AMD FidelityFX CAS (Contrast Adaptive Sharpening) algorithm on CUDA, for sharpening static images.
cpp cuda dll fidelityfx gpu image-processing parallel-computing sharpen
Last synced: 26 Dec 2024
https://github.com/weiyu0824/flash-attention-lite
Basic Flash Attention implementation
Last synced: 05 Feb 2025
https://github.com/pjueon/cuda_intellisense
A simple Python script to fix CUDA C++ IntelliSense for Visual Studio.
Last synced: 23 Oct 2024
https://github.com/galaxies99/inception-cuda
CUDA Implementation of Inception
Last synced: 07 Nov 2024
https://github.com/rhysdg/whisper-onnx-python
A low-footprint GPU accelerated Speech to Text Python package for the Jetpack 5 era bolstered by an optimized graph
ai chatbot cuda machine-learning onnxruntime speech-to-text whisper
Last synced: 08 Feb 2025
https://github.com/brendanbignell/cuda_montecarlooptionpricer
CUDA Monte Carlo Barrier Option Pricing Demo & JupyterLab ML models
cuda deep-learning ml pytorch quantitative-finance xgboost-regression
Last synced: 05 Feb 2025
https://github.com/bokutotu/cudnn_graph_api_example
cuDNN graph API example
Last synced: 14 Feb 2025
https://github.com/pratikvn/nla4hpc-exercises-framework
The exercises framework for the Numerical Linear Algebra for HPC course at Karlsruhe Institute of Technology.
cuda ginkgo homeworks hpc-course teaching
Last synced: 26 Jan 2025
https://github.com/mayukhdeb/patrick
Tiny neural net library written from scratch with CuPy ⚠️ under construction ⚠️
cuda deep-learning gpu-computing machine-learning neural-network regression
Last synced: 12 Feb 2025
https://github.com/duskvirkus/ofxarrayfire
An openFrameworks addon with pre-compiled binaries of ArrayFire.
arrayfire cuda ofxaddon openframeworks openframeworks-addon
Last synced: 25 Jan 2025
https://github.com/miniex/maidenx
Rust-based CUDA library designed for learning purposes and building my AI engines named Maiden Engine
Last synced: 28 Oct 2024
https://github.com/le-ander/msc_bioinfo-experimental_design
Using information theory to inform experimental design with GPU acceleration. Computing group project as part of the MSc in Bioinformatics and Theoretical Systems Biology at Imperial College London 2016/2017.
cuda experimental-design gpu-computing information-theory pycuda systems-biology
Last synced: 31 Jan 2025
https://github.com/sartajbhuvaji/cuda
Developed CUDA kernel functions to load and train a Convolutional Neural Network from scratch.
cuda cuda-programming gpu-programming neural-network nvidia-cuda
Last synced: 05 Feb 2025
https://github.com/pharmcat/metidacu.jl
CUDA solver for Metida.jl
cuda julia-language metida mixed-models
Last synced: 09 Feb 2025
https://github.com/stanczakdominik/cuda_poisson
A 2D Poisson solver via CUDA
Last synced: 04 Feb 2025
https://github.com/thisalmandula/gpu_accelerated_lpt_cfd_code
This repository contains a GPU-accelerated version of the particle tracking model developed by Merel Kooi for biofouled microplastic particles (available at: https://pubs.acs.org/doi/10.1021/acs.est.6b04702), written in CUDA Fortran and CUDA Python. This repository is intended as a learning tool for GPU programming.
biofouling computational-fluid-dynamics cuda fortran lagrangian-particle-tracking microplastics python
Last synced: 02 Feb 2025
https://github.com/thomasonzhou/minitorch
Rebuilding PyTorch: from autograd to convolutions in CUDA
Last synced: 30 Dec 2024
https://github.com/ashwanirathee/imagesgpu.jl
Image Processing on GPU in Julia
cuda gpu image image-processing julia
Last synced: 08 Jan 2025