Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-02-13 00:07:16 UTC
https://github.com/jedbrooke/cuda_bwt
CUDA-accelerated Burrows-Wheeler transform
bioinformatics burrows-wheeler-transform bwt compression cuda
Last synced: 19 Jan 2025
https://github.com/egororachyov/spbench
Benchmark for sparse linear algebra libraries for CPU and GPU platforms.
benchmark cpp cpu cuda gpu-computing graphblas opencl sparse-matrices
Last synced: 19 Nov 2024
https://github.com/garciparedes/parallel-scan-sky
Parallel Computing work
c cuda high-performance-computing hpc mpi openmp parallel parallel-algorithm parallel-computing parallel-processing parallel-programming parallelism parallelization university-of-valladolid
Last synced: 16 Jan 2025
https://github.com/zhangge6/how-to-optimize-playground
High-performance computing (HPC) demos written since I was a freshman.
Last synced: 12 Feb 2025
https://github.com/definetlynotai/llm_data
The source code of several very famous repos, in Python, gathered in this repo as pure localdocs to train code AI
c code-examples cpp cuda data data-dum jupyter-notebook llm llm-code llm-datasets programming-data programming-data-sets python3
Last synced: 26 Jan 2025
https://github.com/bokutotu/curs
CUDA, cuBLAS, and cuDNN wrapper for Rust
cuda deep-learning high-performance-computing hpc rust
Last synced: 22 Dec 2024
https://github.com/ammaryasirnaich/deeplearning_playland
This repository contains Docker image files that support the common frameworks required for deep learning. The images support both the latest GPUs (NVIDIA CUDA) and CPUs.
cuda cuda11 cudnn cudnn8 deep-learning docker docker-image dockerfile gpu kersa opencv pytorch pytorch-cnn scikit-learn tensorflow2
Last synced: 08 Feb 2025
https://github.com/brosnanyuen/raybnn_diffeq
Differential Equation Solver using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
arrayfire cuda differential differential-equations gpu gpu-computing opencl parallel parallel-computing parallel-programming raybnn rust
Last synced: 13 Nov 2024
https://github.com/bruce-lee-ly/cuda_back2back_hgemm
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.
back2back-gemm back2back-hgemm cublas cuda fused-gemm fused-hgemm gemm gpu hgemm matrix-multiply nvidia tensor-core
Last synced: 15 Nov 2024
https://github.com/mr-technologies/iff
MRTech IFF SDK documentation
basler camera cuda demosaicing dng gpu h264 h265 image-processing jetson json low-latency machine-vision mipi nvidia-gpu rest-api rtsp sdk tiff ximea
Last synced: 24 Nov 2024
https://github.com/deftruth/ptx-isa-8.2-zh
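🎉 Continuously updated: CUDA 12.2 PTX ISA 8.2 study notes, with partial Chinese translation, personal commentary, and inline assembly examples, explaining the CUDA 12.2 PTX ISA 8.2 assembly instructions; work in progress.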
Last synced: 09 Feb 2025
https://github.com/mr-technologies/imagebrokerpy
Example of image export from MRTech IFF Python SDK
camera cuda demosaicing dng genicam gpu h264 h265 image-processing jetson json low-latency machine-vision mipi opencv python rest-api rtsp tiff vulkan
Last synced: 05 Feb 2025
https://github.com/gurbaaz27/cs433a-design-exercises
Solutions of design exercises in CS433A: Parallel Programming, Spring Semester 2021-22
barriers cuda gpu-programming locks openmp parallel-programming posix-threads semaphores
Last synced: 14 Nov 2024
https://github.com/mirzaim/cuda-devcontainer
CUDA Development Container
cuda devcontainer devcontainers docker remote-development
Last synced: 09 Jan 2025
https://github.com/dereklstinson/nccl
Go wrapper for NCCL
cuda deep-learning go nccl parallel-computing
Last synced: 15 Jan 2025
https://github.com/lmlsna/install-scripts
Ubuntu install scripts
cuda do-release-upgrade eol nvidia tailscale ubuntu
Last synced: 25 Nov 2024
https://github.com/ilyasmoutawwakil/optimum-whisper-autobenchmark
A set of benchmarks on OpenAI's Whisper model, using AutoBenchmark and Optimum's OnnxRuntime Optimizations.
Last synced: 30 Jan 2025
https://github.com/andreabak/whispersubs
Generate subtitles for your video or audio files using the power of AI
ai cuda deep-learning gpu-acceleration machine-learning srt subtitles transcribe transcription translate whisper
Last synced: 16 Nov 2024
https://github.com/betarixm/cuecc
POSTECH: Heterogeneous Parallel Computing (Fall 2023)
cryptography ctypes cuda ecc postech secp256k1
Last synced: 18 Nov 2024
https://github.com/evanmcclure/hello_gpu
Hello world example for Rust on GPU
apple apple-silicon cuda cuda-programming example-project gpu gpu-programming gpu-support metal rust rust-lang
Last synced: 20 Jan 2025
https://github.com/cppalliance/crypt
A C++20 module of cryptographic utilities for CPU and GPU
Last synced: 09 Jan 2025
https://github.com/stdogpkg/cukuramoto
A Python/CUDA package that numerically solves the Kuramoto model using Heun's method
complex-networks cuda kuramoto-model
Last synced: 29 Jan 2025
https://github.com/trilliwon/cuda-examples
CUDA examples
cuda gpu-computing nvidia-cuda parallel parallel-computing parallel-programming
Last synced: 30 Jan 2025
https://github.com/pedro-avalos/gpu-burn-snap
Unofficial snap for GPU Burn
cuda gpu gpu-burn linux package snap snapcraft stress-test stress-testing
Last synced: 10 Feb 2025
https://github.com/akhuntsaria/canny-edge-detection
Canny edge detector implemented in CUDA C/C++
cuda image-processing video-processing
Last synced: 23 Oct 2024
https://github.com/xusworld/tars
Tars is a cool deep learning framework.
avx2 avx512 cuda deep-learning
Last synced: 05 Feb 2025
https://github.com/peri044/cuda
GPU implementations of algorithms
cuda gauss-jordan parallel-programming
Last synced: 08 Feb 2025
https://github.com/isazi/aoflagger
AOFlagger Radio Frequency Interference mitigation algorithm.
Last synced: 30 Jan 2025
https://github.com/yingding/applyllm
A python package for applying LLM with LangChain and Hugging Face on local CUDA/MPS host
accelerator batch cuda framework inference kubeflow langchain llm mps pipeline slurm transformers
Last synced: 22 Dec 2024
https://github.com/coreylowman/tenten
A tiny tensor library in rust with fused JIT operations.
Last synced: 07 Jan 2025
https://github.com/vorticity-inc/vtensor
VTensor, a C++ library, facilitates tensor manipulation on GPUs, emulating the python-numpy style for ease of use. It leverages RMM (RAPIDS Memory Manager) for efficient device memory management. It also supports xtensor for host memory operations.
cublas cuda curand cusolver gpu numpy rmm tensor xarray xtensor
Last synced: 10 Dec 2024
https://github.com/raad-labs/raad-video
A high-performance video loading library for machine learning, designed for efficient training data preparation.
cuda machine-learning training-data
Last synced: 09 Feb 2025
https://github.com/DefTruth/hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API. 🎉🎉
Last synced: 06 Dec 2024
https://github.com/amypad/numcu
Numerical CUDA-based Python library
array buffer c cpp cpython cpython-api cpython-extensions cuda cxx hacktoberfest numpy python vector
Last synced: 25 Nov 2024
https://github.com/xmas7/cudampi
A large hybrid CPU/GPU sorting network using CUDA and MPI. The sorting network uses a standard Quicksort for CPUs and a custom Bitonic Sort for GPUs. These two algorithms were the fastest in a number of prior benchmarks.
cpu cuda gpu hybrid mpi network
Last synced: 01 Feb 2025
https://github.com/liny18/image-to-ascii
🖼️ A command-line tool for converting images to ASCII art
ascii ascii-art cli command-line cpp cuda docker image-processing image-to-ascii mpi opencv terminal
Last synced: 21 Nov 2024
https://github.com/superlinear-ai/scipy-notebook-gpu
jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT
cuda cudnn docker nccl scipy-notebook tensorflow tensorrt
Last synced: 09 Jan 2025
https://github.com/mrglaster/cuda-acfcalc
Calculation of the smallest ACF for signals of length N using CUDA technology.
acf c calculations cpp cuda google-colaboratory google-colaboratory-notebooks isu
Last synced: 15 Jan 2025
https://github.com/brosnanyuen/raybnn_neural
Neural Networks with Sparse Weights in Rust using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
cpu cuda deep-learning gpu machine-learning machine-learning-algorithms neural-network neural-networks opencl parallel raybnn rust sparse-network sparse-neural-networks
Last synced: 13 Nov 2024
https://github.com/fixstars/cuda-multi-view-stereo
C++/CUDA library for Multi-View Stereo
3d-reconstruction computer-vision cuda multi-view-stereo structure-from-motion
Last synced: 09 Feb 2025
https://github.com/samuraibupt/cuda_code
Code for learning CUDA. Implementations of multiple kernels.
Last synced: 23 Oct 2024
https://github.com/gjbex/gpu-programming
Material for a training on portable GPU programming
cuda gpu kokkos openmp openmp-off stl thrust
Last synced: 22 Nov 2024
https://github.com/hanzhi713/bitonic-sort
In-place GPU sort with bitonic sort
bitonic-sort cuda gpu in-place sorting
Last synced: 26 Jan 2025
https://github.com/kim-hwiwon/t-espresso
A CUDA Library for Low-overhead Host-to-Device Transmission of Patterned Profile Data
Last synced: 27 Dec 2024
https://github.com/orlandopalmeira/trabalho-cp-2023-2024
Repository for the practical assignment of the Parallel Computing (CP) course unit - Master's in Informatics Engineering (MEI/MIEI) - University of Minho (UMinho)
computacao-paralela cp cuda cuda-programming mei miei nvidia nvidia-cuda openmp optimization optimization-problem parallelism performance uminho uminho-mei uminho-miei
Last synced: 25 Jan 2025
https://github.com/kishore-narendran/eecs221-highperformancecomputing
Assignments done during the graduate course EECS 221 - Introduction to HPC that I took in the Spring Quarter of 2016 at University of California, Irvine. Involves assignments that use OpenMP, MPI and CUDA.
Last synced: 05 Jan 2025
https://github.com/sagdrip/cudarrows
CUDA port of Logic Arrows
cellular-automata cuda gpu-acceleration logic-gates
Last synced: 21 Jan 2025
https://github.com/pfcclab/open3d
Open3D: A Modern Library for 3D Data Processing
3d 3d-perception arm computer-graphics cpp cuda gpu gui machine-learning mesh-processing odometry opengl paddle pointcloud python reconstruction registration rendering tensorflow visualization
Last synced: 21 Jan 2025
https://github.com/pd2871/high-performance-computing
This repo contains the logs of the High Performance Computing module's final assignment
blurred-images c cuda gaussian-blur matrix-multiplication multi-threading parallel-computing pthreads pthreads-api
Last synced: 25 Jan 2025
https://github.com/pothosware/pothosgpu
Pothos toolkit for ArrayFire API support
arrayfire cuda dataflow dataflow-programming gpu opencl pothos
Last synced: 15 Jan 2025
https://github.com/rogerallen/jmandelbrotr
Java CUDA Mandelbrot explorer
cuda cuda-opengl java jcuda joml lwjgl3 mandelbrot-viewer opengl
Last synced: 25 Jan 2025
https://github.com/dzimiks/cuda-matrix-multiplication
CUDA Matrix Multiplication
cuda matrix matrix-multiplication python
Last synced: 03 Jan 2025
https://github.com/muhac/jupyter-pytorch-docker
JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.
conda-environment cuda docker jupyterlab pytorch
Last synced: 21 Jan 2025
https://github.com/brocbyte/realtime-deformations
Snow simulation (Material Point Method)
cuda glm material-point-method opengl
Last synced: 09 Nov 2024
https://github.com/neoblizz/spmv
Efficient Sparse Matrix-Vector Multiplication (SpMV) using ModernGPU (MTX + CSR formats).
csr cuda gpgpu load-balancing mtx spmv
Last synced: 09 Feb 2025
https://github.com/B1-663R/docker-mining
Dockerfiles to build docker images to start mining with an NVIDIA Docker architecture
cryptocurrency cuda docker-image docker-nvidia mining
Last synced: 31 Oct 2024
https://github.com/dark-art108/artistic-style-transfer-cnn
cnn-architecture colab-notebooks cuda pil vgg19
Last synced: 11 Jan 2025
https://github.com/lchsk/ney
A header-only parallel functions library for Intel Xeon/Xeon Phi/GPUs
cuda gpu linux parallel phi scientific xeon xeonphi
Last synced: 08 Jan 2025
https://github.com/juntyr/necsim-rust
Spatially explicit biodiversity simulations using a parallel library written in Rust
biodiversity cuda mpi necsim rust simulation
Last synced: 28 Oct 2024
https://github.com/arminms/p2rng
A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI
cpp cuda cxx header-only heterogeneous-computing library linux macos multiplatorm oneapi openmp parallel pcg-random prng pseudorandom-number-generator random-number-distributions random-number-generation rocm stl-algorithms windows
Last synced: 05 Nov 2024
https://github.com/nellogan/distributed_compy
Distributed_compy is a distributed computing library that offers multi-threading, heterogeneous (CPU + multi-GPU), and multi-node support
cluster cuda heterogeneous-parallel-programming multi-threading multigpu openmp openmpi
Last synced: 12 Jan 2025
https://github.com/andreimoraru123/contextcollector
Mixed vision-language Attention Model that gets better by making mistakes
attention attention-mechanism coco-api computer-vision cuda cudnn image-captioning lstm mscoco-dataset multimodal-deep-learning natural-language-processing object-detection opencv pytorch resnet show-and-tell show-attend-and-tell video-inference vision-language yolo
Last synced: 18 Jan 2025
https://github.com/szymon423/tsp-cpu-vs-gpu
Simple brute force approach to solve travelling salesman problem with CPU and GPU
Last synced: 18 Jan 2025
https://github.com/kar-dim/watermarking-gpu
Code for my Diploma thesis at Information and Communication Systems Engineering (University of the Aegean, School of Engineering) with title "Efficient implementation of watermark and watermark detection algorithms for image and video using the graphics processing unit". Part 2 / GPU
arrayfire cpp cuda gpu image-processing opencl parallel-computing video-processing watermark-image watermarking
Last synced: 04 Jan 2025
https://github.com/kohulan/tensorflow-2.0-installation-with-cuda-support
A detailed step-by-step guide to installing Tensorflow-2.0-gpu with CUDA drivers on Ubuntu Server/Desktop LTS
Last synced: 30 Nov 2024
https://github.com/alpha74/cuda_basics
Nvidia NVCC CUDA programs for beginners.
c cpp cuda cuda-programs nvcc nvidia parallel-computing parallel-programming
Last synced: 16 Jan 2025
https://github.com/mindstudioofficial/fl_cuda_mandelbrot
Flutter example for visualizing the Mandelbrot Set using CUDA
cuda flutter-examples fractal-rendering
Last synced: 11 Jan 2025
https://github.com/tthebc01/cudaconda3
Lightweight container environment with Cuda, Miniconda3, and Jupyter Lab.
cuda docker gpu jupyterlab marimo-notebook miniconda3 reverse-proxy-application
Last synced: 03 Jan 2025
https://github.com/navdeep-g/dimreduce4gpu
Dimensionality reduction ("dimreduce") on GPUs ("4gpu")
cplusplus cuda dimensionality-reduction gpu linear-algebra pca python svd unsupervised-learning
Last synced: 24 Dec 2024
https://github.com/nachovizzo/saxpy_openacc_cpp
My way of thinking about OpenACC, C++, and Parallel computing in general
Last synced: 30 Jan 2025
https://github.com/frozenassassine/neuralnetwork-fromscratch
Neural Network from scratch in C# with CUDA support
ai classification csharp cuda gpu gpu-acceleration neural-network neural-networks nvidia
Last synced: 13 Nov 2024
https://github.com/mulx10/firefly
Enhancing object detection using thermal imaging for hard-to-identify objects with thin cross-sections (e.g. cyclists, pedestrians).
autonomous-cars autonomous-navigation autonomous-vehicles c cuda object-detection thermal-camera yolov3
Last synced: 30 Dec 2024
https://github.com/andreasholt/cusmc
A CUDA-accelerated Statistical Model Checker for Stochastic Timed Automata
Last synced: 02 Jan 2025
https://github.com/dujonwalker/nixos-config-x86_64-cuda
This repository contains my NixOS configuration optimized for 64-bit x86 systems with NVIDIA CUDA support, featuring a Plasma 6 desktop environment and a variety of essential applications for development, multimedia, and productivity. It serves as a backup for easy restoration and setup on new installations.
cuda flatpak nix nixos nixos-configuration ollama
Last synced: 26 Dec 2024
https://github.com/qin-yu/julia-svm-gpu-cuda
2019 [Julia] GPU CUDAnative SVM: a stochastic decomposition implementation of support-vector machine training
cpp cuda cuda-programming gpu gpu-computing gpu-programming julia julia-language julia-package machine-learning machine-learning-algorithms machine-learning-library online-learning supervised-learning svm svm-classifier svm-learning svm-library svm-model svm-training
Last synced: 22 Jan 2025
https://github.com/acrlakshman/gradient-augmented-levelset-cuda
Implementation of Gradient Augmented Levelset method for CPU and GPU
Last synced: 13 Feb 2025
https://github.com/LKohlhepp/Ito-Monte-Carlo
MC-Simulation of the Ito-SDE (Krülls 1994)
astronomy astrophysics cuda gpu-acceleration monte-carlo physics-simulation simulation stochastic-differential-equations
Last synced: 23 Oct 2024
https://github.com/soran-ghaderi/torchebm
⚡ Energy-Based Modeling library for PyTorch, offering tools for sampling, inference, and learning in complex distributions.
contrastive-divergence cuda diffusion-models energy-based-model generative-ai langevin-dynamics noise-contrastive-estimation probabilistic-machine-learning reasoning sampling-methods score-matching variational-inference
Last synced: 26 Dec 2024
https://github.com/zeloe/rtconvolver
A realtime convolution VST3
c convolution cplusplus cuda juce
Last synced: 25 Dec 2024
https://github.com/dpbm/qml-course
Short course on quantum machine learning
cuda cuda-q cuquantum docker ml python qml quantum quantum-computing tensorflow
Last synced: 21 Dec 2024
https://github.com/babak2/optimizedsum
Optimized Parallel Sum program demonstrating CPU vs GPU performance
cuda cuda-programming gpu-acceleration gpu-computing gpu-parallelism visual-studio
Last synced: 01 Feb 2025
https://github.com/kagof/julia-image-processing
Image processing programs written in Julia
Last synced: 12 Feb 2025
https://github.com/dqbd/cuda-btree
Implementation of B-Trees on NVIDIA CUDA
Last synced: 13 Feb 2025
https://github.com/yomi4486/zundamon_v3
Bartender, a shot of ice water, please.
cuda discord-bot discord-py docker docker-compose python tts voicevox zundamon
Last synced: 27 Nov 2024
https://github.com/potato3d/grid-rt
GPU-accelerated ray tracing using GLSL and CUDA
cuda glsl gpu ray-tracing real-time-rendering
Last synced: 10 Jan 2025
https://github.com/markdtw/parallel-programming
Basic Pthread, OpenMP, CUDA examples
cuda openmp parallel-programming pthreads
Last synced: 12 Jan 2025
https://github.com/sean-bradley/cudalookupsha256
SHA256 lookup using parallel processing on an NVIDIA CUDA-compatible graphics card
cuda parallel-processing sha256
Last synced: 13 Nov 2024
https://github.com/cfries/javagpuexperiments
Repository used to demo OpenCL, JOCL, JCuda.
Last synced: 27 Dec 2024
https://github.com/trick-17/backends
Interchangeable backends in C++, OpenMP, CUDA, OpenCL, OpenACC
c-plus-plus cross-platform cuda cuda-backend header-only openacc openacc-backend opencl opencl-backend openmp openmp-backend
Last synced: 13 Jan 2025
https://github.com/elftausend/sliced
Array operations with automatic differentiation on CPU and GPU
autograd automatic-differentiation cuda custos matrix opencl
Last synced: 14 Feb 2025
https://github.com/abaksy/cuda-examples
A repository of examples coded in CUDA C/C++
Last synced: 17 Jan 2025