Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-01-31 00:06:47 UTC
https://github.com/baderlab/ecuda
STL-like containers (array, vector, matrix, cube) usable in device code.
Last synced: 05 Nov 2024
https://github.com/huangcongqing/cuda-learning
Introduction to learning CUDA programming
cuda cuda-kernels cuda-programming
Last synced: 01 Nov 2024
https://github.com/gjbex/python-for-hpc
Repository for participants of the "Python for HPC" training
cuda cython dask gpu hpc mpi numba python python-training scientific-computing swig training
Last synced: 22 Nov 2024
https://github.com/seonglae/llama2gptq
Chat with LLaMa 2 and get responses backed by reference documents from a vector database. Uses a locally available model with GPTQ 4-bit quantization.
chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers
Last synced: 12 Dec 2024
https://github.com/ahdhn/cudatemplate
Template for starting a CUDA/C++ project using CMake, with GitHub Actions for CI
Last synced: 08 Nov 2024
https://github.com/niftypet/nipet
High-throughput PET image reconstruction with high quantitative accuracy and precision
analysis cuda gpu image-reconstruction medical-imaging mlem pet processing python
Last synced: 22 Jan 2025
https://github.com/marianhlavac/fft-cuda
Fast Fourier Transform implementation, computable on CUDA platform. Seminar project for MI-PRC course at FIT CTU.
c-plus-plus coursework cuda fast-fourier-transform fit-ctu nvidia python
Last synced: 08 Nov 2024
https://github.com/LatticeQCD/SIMULATeQCD
SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it easy for physicists to implement lattice QCD formulas while still providing competitive performance.
cuda gpu hip hpc lattice lattice-qcd mpi parallel physics
Last synced: 30 Oct 2024
https://github.com/kwea123/python-ray-tracing-with-cuda-example
An example of CUDA ray tracing in pure Python syntax.
Last synced: 30 Oct 2024
https://github.com/williamvenner/squad-mortar-helper
💣 SMH – a computer vision project for automatic, precision mortar strike calculations in Squad
computer-vision cuda cv gpu mortar mortars rust smh squad squad-game squadgame
Last synced: 02 Nov 2024
https://github.com/llnl/fpchecker
A dynamic analysis tool to detect floating-point errors in HPC applications.
cancellation clang cuda exceptions floating-point floating-point-arithmetic infinity llvm overflow overflow-detection underflow-detection
Last synced: 11 Nov 2024
https://github.com/pawbz/GeoPhyInv.jl
A Julia Toolbox for Geophysical Modeling and Inverse Problems
cuda elastic finite-difference-method gpu julia parallel-computing poisson seismic-inversion seismic-tomography seismic-waves staggeredgrid
Last synced: 14 Nov 2024
https://github.com/deftruth/cuffpa-py
📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)🎉GPU SRAM complexity for headdim > 256, ~1.5x🎉faster than SDPA EA.
attention cuda flash-attention mlsys sdpa tensor-cores
Last synced: 09 Jan 2025
https://github.com/rodrgo/OpenPH
Parallel reduction of boundary matrices for Persistent Homology with CUDA
cuda gpu-computing numerical-computation parallel-computing persistent-homology topological-data-analysis
Last synced: 12 Nov 2024
https://github.com/miguelcarcamov/gpuvmem
GPU Framework for Radio Astronomical Image Synthesis
alma astronomical-algorithms astronomical-images astrophysics complex-systems cuda gpu gpu-acceleration gpu-computing image-synthesis maximum-entropy multi-gpu optimization-methods radio-imaging radio-interferometry radioastronomy ska vla
Last synced: 01 Nov 2024
https://github.com/idealab-isu/GPView
GPU Accelerated Voxelization Framework for 3D CAD models.
Last synced: 27 Oct 2024
https://github.com/prg-titech/dynasoar
CUDA Dynamic Memory Allocator for SOA Data Layout
cuda memory-allocation simd smmo
Last synced: 18 Nov 2024
https://github.com/SafeAILab/zkDL
zkDL, an open source toolkit for zero-knowledge proofs of deep learning powered by CUDA
cuda deep-neural-networks gpu-acceleration privacy-enhancing-technologies zero-knowledge-proof
Last synced: 12 Nov 2024
https://github.com/pkestene/ppkmhd
MPI+Kokkos implementation of spectral difference method (SDM) high order schemes
cea cfd cpp cuda finite-volume finite-volume-method finite-volumes gpu gpu-computing hpc hydrodynamics kokkos magnetohydrodynamics mpi parallel-computing performance-portability
Last synced: 18 Dec 2024
https://github.com/anibali/docker-torch
A Docker image for Lua Torch
cuda docker docker-image lua torch
Last synced: 11 Oct 2024
https://github.com/cair/fast-tsetlin-machine-in-cuda-with-imdb-demo
A CUDA implementation of the Tsetlin Machine based on bitwise operators
bitwise-operators cuda explainable-artificial-intelligence gpu-computing pattern-recognition tsetlin-machine
Last synced: 10 Dec 2024
https://github.com/conradsnicta/bandicoot-code
Bandicoot: C++ library for GPU linear algebra & scientific computing - https://coot.sourceforge.io
armadillo c-plus-plus clblas cublas cuda cuda-kernels cusolver gpu gpu-accelerated-library gpu-acceleration gpu-computing linear-algebra linear-algebra-library machine-learning matrix-functions matrix-library opencl opencl-kernels scientific-computing
Last synced: 02 Nov 2024
https://github.com/dereklstinson/gocudnn
Go bindings for cuDNN and other CUDA packages.
convolutional-neural-networks cuda cudnn go golang machine-learning neural-network
Last synced: 24 Jan 2025
https://github.com/coderonion/cuda-beginner-course-cpp-version
Companion code for the bilibili video course "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"
cpp cublas cuda cuda-programming cudnn gpu gpu-programming nvcc nvidia parallel-programming python rust
Last synced: 19 Nov 2024
https://github.com/fclc/multi-plexer
Goal: Low power cluster capable of serving 24+ streams of 4KHDR60 source transcodes while consuming no more than 100W at peak and idling at less than 10W
arm64 clustering cuda decoding encoding ffmpeg hardware hdr jetson jetson-nano jetson-xavier-nx opencl plex pocl raspberry-pi-4 rockpro64 transcode zfs zfsonlinux
Last synced: 11 Oct 2024
https://github.com/blurgyy/jaxngp
JAX implementation of instant-ngp (NeRF part)
cuda hashgrid instant-ngp jax nerf neural-radiance-field nix python wsl
Last synced: 13 Nov 2024
https://github.com/Deyht/CIANNA
Convolutional Interactive Artificial Neural Networks by/for Astrophysicists
astronomy astrophysics convolutional-neural-networks cuda deep-learning deep-neural-networks gpu machine-learning ml neural-network object-detection yolo
Last synced: 09 Nov 2024
https://github.com/enot-autodl/onnx-runtime-with-tensorrt-and-openvino
Docker scripts for building ONNX Runtime with TensorRT and OpenVINO in manylinux environment
cuda nvidia onnx onnxruntime openvino tensorrt
Last synced: 07 Nov 2024
https://github.com/kamalkraj/swin-transformer-serve
Deploy Swin Transformer using TorchServe
cuda deploy docker eager-load microsoft python3 pytorch swin-transformer torch torchscript torchserve
Last synced: 07 Nov 2024
https://github.com/johnh2o2/cuvarbase
Python library for fast time-series analysis on CUDA GPUs
cuda fourier-methods gpu gpu-computing lomb-scargle-periodogram nfft python python-3 time-series
Last synced: 02 Nov 2024
https://github.com/juliagpu/nccl.jl
A Julia wrapper for the NVIDIA Collective Communications Library.
Last synced: 12 Nov 2024
https://github.com/m-pilia/disptools
Generate displacement fields with known volume changes
cuda image-processing jacobian python3
Last synced: 12 Nov 2024
https://github.com/whizzzkid/opencv-complete-build-cuda
Full build script for OpenCV with/without CUDA and Bumblebee support
build build-automation cuda cudnn install installer opencv opencv-library opencv-python opencv3 opencv3-python
Last synced: 26 Dec 2024
https://github.com/jimouris/parallel-convolution
🖼️ Parallel Image Convolution, applying a blur filter to images. Written in C, optimized in three different ways: MPI, MPI & OpenMP and CUDA.
blur-filter cuda image-convolution image-processing mpi mpi-library parallel-processing
Last synced: 06 Nov 2024
https://github.com/sandialabs/lgrtk
Tool Kit for Lagrangian Grid Reconnection
cuda gpu hpc physics sandia-national-laboratories scr-2300 snl-applications
Last synced: 12 Nov 2024
https://github.com/carpentries-incubator/lesson-gpu-programming
GPU Programming with Python and CUDA.
beta carpentries-incubator cuda cupy english gpu lesson lesson-gpu-programming numba parallel-programming programming python
Last synced: 29 Dec 2024
https://github.com/illuhad/hipCPU
Implementation of AMD HIP for CPUs
cuda gpgpu hip hpc openmp openmp-parallelization
Last synced: 09 Nov 2024
https://github.com/ktaletsk/NCCV
Short course on computer vision and image processing using Numba+CUDA+OpenCV
computer-vision cuda jupyter-notebook numba
Last synced: 15 Nov 2024
https://github.com/nasa-jpl/flightview
Real-time tools for Imaging Spectroscopy Data
aviris camera cameralink cuda hyperspectral hyperspectral-analysis hyperspectral-data rtp spectroscopy
Last synced: 10 Nov 2024
https://github.com/kostyaev/sentence2vec
Deep sentence embedding using Sequence to Sequence learning
cuda sentence2vec seq2seq torch
Last synced: 28 Oct 2024
https://github.com/heavyai/heavyai.jl
Julia client for OmniSci GPU-accelerated SQL engine and analytics platform
cuda data-science database gpu julia-language julia-package julialang sql
Last synced: 31 Oct 2024
https://github.com/tjyuyao/cutex
PyCUDA based PyTorch Extension Made Easy
cuda customized extension operator pycuda pytorch
Last synced: 06 Jan 2025
https://github.com/microsoft/svirl
Svirl is a GPU-accelerated solver of complex Ginzburg-Landau equations for superconductivity. It consists of a time-dependent solver to describe vortex dynamics and a free energy minimizer to accurately find static configurations.
cuda ginzburg-landau gpu python scientific-computing superconductivity vortex
Last synced: 04 Dec 2024
https://github.com/sevagh/zen
optimized realtime harmonic/percussive source separation using the GPU (NVIDIA CUDA) and CPU (Intel IPP)
audio cuda digital-signal-processing dsp real-time source-separation thrust
Last synced: 23 Dec 2024
https://github.com/bluescarni/rakau
C++17 N-body Barnes-Hut on heterogeneous hardware architectures
astronomy astrophyics astrophysical-simulation avx avx2 avx512 cpp17 cuda n-body n-body-simulator nbody nbody-gravity-simulation nbody-problem nbody-sim nbody-simulation rocm simd vectorization
Last synced: 27 Oct 2024
https://github.com/mxpv/nvml-go
golang wrapper for NVIDIA Management Library (NVML)
cuda golang golang-wrapper gpu nvidia nvidia-smi nvml
Last synced: 11 Oct 2024
https://github.com/prg-titech/ikra-cpp
C++ Library for Object-oriented Programming with Structure of Arrays Layout
Last synced: 18 Nov 2024
https://github.com/ShahriarRezghi/Spyker
High-performance Spiking Neural Networks Library Written From Scratch with C++ and Python Interfaces.
computational-neuroscience cuda cudnn cxx high-performance neuroscience onednn python r-stdp snn stdp
Last synced: 05 Nov 2024
https://github.com/minhhn2910/cuda-half2
Convert CUDA programs from float data type to half or half2 with SIMDization
Last synced: 26 Dec 2024
https://github.com/pkestene/ms-hpc-ai-gpu
Resources for the introductory GPU programming course of the HPC-AI specialized master's program
cuda deep-learning gpu gpu-computing machine-learning physics-informed-neural-networks pinn pinns
Last synced: 18 Dec 2024
https://github.com/bruce-lee-ly/flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core
Last synced: 15 Nov 2024
https://github.com/TristanBilot/mlx-GCN
MLX implementation of GCN, with benchmark on MPS, CUDA and CPU (M1 Pro, M2 Ultra, M3 Max).
apple cuda deep-learning gnn mlx pytorch
Last synced: 30 Oct 2024
https://github.com/pinto0309/20220228_intel_deeplearning_day_hitnet_demo
Special presentation demo and presentation materials for the special lecture at Intel IoT Planet 2021 DeepLearning Day. https://www.intel.co.jp/content/www/jp/ja/now/iot-planet/deep-learning-day.html
cuda docker intel onnx openvino
Last synced: 23 Oct 2024
https://github.com/yhtang/graphdot
GPU-accelerated Marginalized Graph Kernel with customizable node and edge features; Gaussian process regression.
cheminformatics cuda gpu graph-algorithms machine-learning python
Last synced: 08 Nov 2024
https://github.com/kyegomez/neva
The open source implementation of "NeVA: NeMo Vision and Language Assistant"
artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics
Last synced: 09 Nov 2024
https://github.com/bdhu/gpuinfo
A minimal command-line utility written in Rust for querying GPU status
command-line-tool cuda gpu nvidia nvidia-smi nvml rust rust-lang
Last synced: 07 Nov 2024
https://github.com/tree-sitter-grammars/tree-sitter-cuda
CUDA grammar for tree-sitter
Last synced: 30 Oct 2024
https://github.com/tylerjthomas9/rapids.jl
An unofficial Julia wrapper for the RAPIDS.ai ecosystem using PythonCall.jl
Last synced: 14 Jan 2025
https://github.com/bwohlberg/sporco-cuda
CUDA extension for the SPORCO project
convolutional-sparse-coding cuda gpu
Last synced: 09 Dec 2024
https://github.com/niftypet/nimpa
NiftyPET: Neuro-Image Manipulation, Processing and Analysis
analysis cuda gpu medical-imaging mr pet processing python
Last synced: 13 Jan 2025
https://github.com/iowar/kecmatch-gpu
Finds matching solidity function signatures using GPU
Last synced: 23 Nov 2024
https://github.com/dbklim/docker_image_with_cuda10_cudnn7
Dockerfiles and a manual for easily building a Docker image with CUDA 10.x and cuDNN 7.6 to run TensorFlow/PyTorch on an NVIDIA GPU in a Docker container.
cuda cudnn docker docker-gpu docker-image docker-nvidia dockerfile gpu gpu-docker nvidia nvidia-docker pytorch pytorch-gpu tensorflow tensorflow-examples tensorflow-gpu tensorflow-gpu-docker torch
Last synced: 10 Oct 2024
https://github.com/denzp/rust-inline-cuda-tutorial
Let's jump into CUDA development with Rust
Last synced: 28 Oct 2024
https://github.com/rocm/hip-python
HIP Python Low-level Bindings
ai cuda cython gpu hip hpc interoperability ml python radeon-instinct-mi-series
Last synced: 07 Nov 2024
https://github.com/harrism/cuda_event_benchmark
Unit benchmarks of CUDA event APIs.
Last synced: 28 Oct 2024
https://github.com/jundaf2/eigenmha
Forward and backward attention DNN operators implemented with LibTorch, cuDNN, and Eigen.
backpropagation cuda cudnn cudnn-v8 dnn inference pytorch
Last synced: 15 Nov 2024
https://github.com/moldyn/clustering
Robust and stable clustering of molecular dynamics simulation trajectories.
biophysics clustering cpp cuda molecular-dynamics
Last synced: 11 Dec 2024
https://github.com/primitiv/primitiv-python
Python binding of primitiv.
cuda cython deep-learning framework gpu neural-network numpy opencl python
Last synced: 28 Oct 2024
https://github.com/anroshka/snake-ai
🐍 A Snake game AI that learns to play through Deep Q-Learning. Built with PyTorch and Pygame, featuring CUDA acceleration and real-time visualization of the learning process.
artificial-intelligence collaborate collaboration cuda deep-learning deep-q-learning dqn game-ai gpu-acceleration machine-learning neural-network pygame python pytorch q-learning reinforcement-learning snake-game
Last synced: 26 Jan 2025
https://github.com/uga-ssrl/SSRLCV
The UGA SSRL's Computer Vision Software Collection
3d-reconstruction computer-vision computer-vision-algorithms computervision cubesat cubesat-payload cubesatellite cuda gis gis-application jetson jetson-nano jetson-tx2 jetson-tx2i satellite-data
Last synced: 27 Oct 2024
https://github.com/pleiszenburg/gravitation
n-body-simulation performance test suite
benchmark cuda gpgpu gpgpu-computing high-performance-computing n-body numerical-computation openmp openmp-parallelization parallel-computing parallelization simd simd-parallelism test-suite
Last synced: 01 Dec 2024
https://github.com/linonetwo/moss-dockerfile
For running Fudan's MOSS language model in Docker, with a WebUI provided by Gradio.
ai chatglm chatgpt cuda deeplearning docker gpu moss pytorch
Last synced: 23 Dec 2024
https://github.com/OMEGAMAX10/Face-Mask-Detection-Using-YOLOv4
Because of the COVID-19 pandemic of 2020, more and more people are concerned with protecting themselves using masks, hence the need for software capable of monitoring whether people are wearing masks. That is why I created a Python application using OpenCV (with CUDA support) based on the YOLOv4 algorithm, capable of monitoring the safety level of a space under video surveillance.
computer-vision covid-19 cuda cuda-support face-mask-detection gui gui-application masks monitoring opencv pyqt5 python safety-level video-surveillance wearing-masks yolov4 yolov4-algorithm
Last synced: 09 Nov 2024
https://github.com/willprice/flowty
The swiss army knife for extracting optical flow
brox cuda cython dense-inverse-search dis docker farneback lucas-kanade nvidia-docker opencv optic-flow optical-flow pyramidal tv-l1 tvl1 variational-refinement
Last synced: 14 Nov 2024
https://github.com/stellar-group/blaze_cuda
WIP · CUDA compatibility for Blaze · https://bitbucket.org/blaze-lib/blaze
blaze cpp cpp14 cuda gpu hpc linear-algebra metaprogramming
Last synced: 12 Nov 2024
https://github.com/ashvardanian/cpp-cuda-python-starter-kit
Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python without a single line of CMake using PyBind11
cmake cuda cuda-programming hip hpc matrix-multiplication openmp parallel-computing parallel-programming pybind pybind11 python starter-kit starter-template tutorial
Last synced: 28 Oct 2024
https://github.com/gunrock/mini
mini is mini
cuda gpu graph-primitives gunrock mini-gunrock traversal-operators workload-mapping-strategies
Last synced: 11 Nov 2024
https://github.com/matthewfeickert/nvidia-gpu-ml-library-test
Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being properly picked up
cuda cudnn gpu jax nvidia pytorch setup tensorflow torch
Last synced: 10 Jan 2025
https://github.com/yujun-shi/cfmatting_cuda_mkl
A CUDA and MKL implementation of closed-form matting
Last synced: 14 Nov 2024
https://github.com/albertstarfield/project-zephyrine
Introducing Project Zephyrine: Elevating Your Interaction Plug and Play, and Employing GPU Acceleration within a Modernized Automata Local Graphical User Interface.
chatgpt cuda electron falcon gemma ggml gguf gpt-3 gui llama llama-2 llama-3 llm metal opencl
Last synced: 30 Nov 2024
https://github.com/ahmetfurkandemir/nvidia-gpu-benchmark
NVIDIA GPU benchmark
aws c colab-notebook cpp cuda cuda-programming gpu gpu-computing gpu-programming linux nvidia nvidia-gpu tesla
Last synced: 16 Nov 2024
https://github.com/mnicely/nvml_examples
Examples showing how to utilize the NVML library for GPU monitoring
Last synced: 15 Oct 2024
https://github.com/jtschwar/tomo_tv
C++ library for Regularized 2D and 3D Tomography Reconstructions.
3d-reconstruction cuda inverse-problems regularization tomography
Last synced: 10 Nov 2024
https://github.com/triagemd/tensorflow-builds
Tensorflow binaries and Docker images compiled with GPU support and CPU optimizations.
bazel cuda cudnn docker gpu machine-learning nvidia python tensorflow tensorflow-serving
Last synced: 20 Nov 2024
https://github.com/krassowski/gsea-api
Pandas API for multiple Gene Set Enrichment Analysis implementations in Python (GSEApy, cudaGSEA, GSEA)
bioinformatics cuda enrichment gene-set-enrichment gene-sets gsea pandas pathway-analysis python3 transcriptomics
Last synced: 13 Jan 2025
https://github.com/yashassamaga/convolutionbuildingblocks
GEMM and Winograd based convolutions using CUTLASS
convolution cuda cutlass deep-learning
Last synced: 03 Dec 2024