Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
CUDA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs.
- GitHub: https://github.com/topics/cuda
- Wikipedia: https://en.wikipedia.org/wiki/CUDA
- Created by: Nvidia
- Released: June 23, 2007
- Related Topics: nvcc
- Last updated: 2025-01-31 00:06:47 UTC
https://github.com/openmlsys/openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
Last synced: 16 Nov 2024
https://github.com/pinto0309/dmhead
Dual model head pose estimation. Fusion of SOTA models. 360° 6D HeadPose detection. All pre-processing and post-processing are fused together, allowing end-to-end processing in a single inference.
6d cuda head-pose-estimation headpose-detection headpose-estimation models onnx tensorrt
Last synced: 09 Nov 2024
https://github.com/dr-noob/peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
assembly avx cpu cpu-frequency cpu-microarchitecture cuda gflop gpu intrinsics microarchitecture microbenchmark nvidia performance
Last synced: 30 Jan 2025
https://paragroup.github.io/WindFlow/
A C++17 Data Stream Processing Parallel Library for Multicores and GPUs
cuda gpu gpu-acceleration gpu-computing gpu-programming multi-core multicore multithreading parallel-computing parallel-patterns parallel-programming parallelism sliding-windows stream stream-api stream-processing streaming streaming-api streaming-data streams
Last synced: 18 Nov 2024
https://github.com/xmartlabs/cuda-calculator
Online CUDA Occupancy Calculator
cuda gpgpu gpu gpu-computing gpu-kernels gpu-programming kernel nvidia occupancy
Last synced: 23 Oct 2024
https://github.com/sh1ng/arboretum
Gradient Boosting powered by GPU (NVIDIA CUDA)
arboretum cuda gpu gradient-boosting gradient-boosting-machine machine-learning python
Last synced: 16 Nov 2024
https://github.com/larc/gproshan
geometry processing and shape analysis framework
computational-geometry cpp cuda dictionary-learning geometry-processing opengl shape-analysis sparse-coding
Last synced: 08 Nov 2024
https://github.com/Dr-Noob/peakperf
Achieve peak performance on x86 CPUs and NVIDIA GPUs
assembly avx cpu cpu-frequency cpu-microarchitecture cuda gflop gpu intrinsics microarchitecture microbenchmark nvidia performance
Last synced: 09 Nov 2024
https://github.com/ztxtech/Time-Evidence-Fusion-Network
Official implementation of "Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting" (https://arxiv.org/abs/2405.06419)
cuda deep-learning machine-learning macos neural-network neural-networks pytorch time-series time-series-analysis time-series-forecasting time-series-prediction uestc
Last synced: 02 Nov 2024
https://github.com/open-atmos/pysdm
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab
atmospheric-modelling atmospheric-physics cuda gpu gpu-computing monte-carlo-simulation numba nvrtc particle-system physics-simulation pint pypi-package python research simulation thrust
Last synced: 29 Jan 2025
https://github.com/elftausend/custos
A minimal OpenCL, CUDA, Vulkan and host CPU array manipulation engine / framework.
array-manipulations autograd automatic-differentiation cpu cuda cuda-support custos framework gpu lazy-evaluation no-std opencl rust vulkan wgsl
Last synced: 10 Jan 2025
https://github.com/jpuigcerver/pytorch-baidu-ctc
PyTorch bindings for Baidu's Warp-CTC
Last synced: 25 Nov 2024
https://github.com/saddam213/llamastack
ASP.NET Core Web, WebApi & WPF implementations for LLama.cpp & LLamaSharp
alpaca chatgpt cuda huggingface llama llama2 llamacpp llamasharp llm
Last synced: 20 Jan 2025
https://github.com/fynv/thrustrtc
CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.
Last synced: 06 Nov 2024
https://github.com/goldsborough/k-means
Code accompanying my blog post on k-means in Python, C++ and CUDA
cpp cuda k-means machine-learning parallel python
Last synced: 29 Jan 2025
https://github.com/lukeyeager/cmake-cuda-example
Example of how to use CUDA with CMake >= 3.8
Last synced: 29 Oct 2024
https://github.com/dakenf/stable-diffusion-nodejs
GPU-accelerated javascript runtime for StableDiffusion. Uses modified ONNX runtime to support CUDA and DirectML.
cuda directml nodejs stable-diffusion typescript
Last synced: 08 Nov 2024
https://github.com/pkestene/ramsesgpu
Astrophysics MHD simulation code optimized for large GPU clusters
astrophysics cea cfd conservation-law cuda euler-equations finite-volume gpu gpu-computing hdf5 hpc kelvin-helmholtz-instability magnetohydrodynamics mhd muscl-hancock parallel-computing pnetcdf rayleigh-taylor shearing-box turbulence
Last synced: 23 Jan 2025
https://github.com/NickKarpowicz/LightwaveExplorer
An efficient, user-friendly solver for nonlinear light-matter interaction
c-plus-plus cuda nonlinear-optics oneapi optics-simulation simulation sycl
Last synced: 05 Nov 2024
https://github.com/open-atmos/PySDM
Pythonic particle-based (super-droplet) warm-rain/aqueous-chemistry cloud microphysics package with box, parcel & 1D/2D prescribed-flow examples in Python, Julia and Matlab
atmospheric-modelling atmospheric-physics cuda gpu gpu-computing monte-carlo-simulation numba nvrtc particle-system physics-simulation pint pypi-package python research simulation thrust
Last synced: 05 Nov 2024
https://github.com/brickray/gpu-pathtracer
physically based path tracer on gpu
cuda gpu pathtracing raytracing tracing
Last synced: 14 Nov 2024
https://github.com/rokibulislaam/colab-ffmpeg-cuda
FFmpeg build with CUDA support for Linux (especially for Google Colab)
colab-notebook cuda ffmpeg ffmpeg-installer h264 h265 hevc-encoder nvenc ubuntu1804
Last synced: 08 Nov 2024
https://github.com/tomrunia/pytorchsteerablepyramid
PyTorch implementation of the Complex Steerable Pyramid
batch computer-vision cuda image-processing mkl pyramid pytorch
Last synced: 13 Nov 2024
https://github.com/loeeeee/immich-in-lxc
Install Immich in LXC with optional CUDA support
bare-metal cuda guide immich install-script lxc machine-learning proxmox-ve ubuntu
Last synced: 20 Jan 2025
https://github.com/DefTruth/ffpa-attn-mma
📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️GPU SRAM complexity for headdim > 256, 1.8x~3x↑🎉faster vs SDPA EA.
attention cuda flash-attention mlsys sdpa tensor-cores
Last synced: 27 Jan 2025
https://github.com/denzp/rust-ptx-builder
Convenient `build.rs` helper for NVPTX crates
Last synced: 27 Oct 2024
https://github.com/gunrock/loops
🎃 GPU load-balancing library for regular and irregular computations.
cuda gpu gpu-computing hpc load-balancing parallel
Last synced: 11 Nov 2024
https://github.com/emptysoal/cuda-image-preprocess
Speed up image preprocessing with CUDA when handling images or running TensorRT inference
cnn cuda cuda-demo cuda-kernels cuda-programming deep-learning image-processing tensorrt
Last synced: 06 Dec 2024
https://github.com/jeng1220/openacc_fortran_examples
Simple OpenACC Fortran Examples
Last synced: 28 Oct 2024
https://github.com/rbaygildin/learn-gpgpu
Algorithms implemented in CUDA + resources about GPGPU
cublas cuda curand gpgpu gpu gpu-computing image-processing nvidia opencl parallel-computing pycuda
Last synced: 19 Nov 2024
https://github.com/par4all/par4all
Par4All is an automatic parallelizing and optimizing compiler (workbench) for C and Fortran sequential programs
abstract-interpretation automatic-parallelization c99 cuda fortran interprocedural opencl parallelization polyhedral-model
Last synced: 12 Oct 2024
https://github.com/ctuning/ctuning-programs
Collective Knowledge extension with unified and customizable benchmarks (with extensible JSON meta information) to be easily integrated with customizable and portable Collective Knowledge workflows. You can easily compile and run these benchmarks using different compilers, environments, hardware and OS (Linux, MacOS, Windows, Android). More info:
c collaborative-benchmarking collaborative-optimization collective-knowledge common-benchmarks cpp crowd-benchmarking crowd-tuning cuda customizable-benchmarking fortran json-api json-metadata open-benchmarks opencl reproducible-research reproducible-workflows
Last synced: 13 Nov 2024
https://github.com/Par4All/par4all
Par4All is an automatic parallelizing and optimizing compiler (workbench) for C and Fortran sequential programs
abstract-interpretation automatic-parallelization c99 cuda fortran interprocedural opencl parallelization polyhedral-model
Last synced: 09 Nov 2024
https://github.com/wizyoung/optical-flow-gpu-docker
Compute dense optical flow using TV-L1 algorithm with NVIDIA GPU acceleration.
Last synced: 17 Nov 2024
https://github.com/bruce-lee-ly/cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
cublas cuda cuda-core gemm gemv gpu hgemm hgemv matrix-multiply nvidia tensor-core
Last synced: 19 Dec 2024
https://github.com/khrylx/dsgpuraytracing
A GPU-based ray tracer using CUDA
Last synced: 21 Nov 2024
https://github.com/Bruce-Lee-LY/cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
cublas cuda cuda-core gemm gemv gpu hgemm hgemv matrix-multiply nvidia tensor-core
Last synced: 19 Nov 2024
https://github.com/1ytic/warp-rna
Recurrent Neural Aligner
cuda forward-backward rna rnn-transducer
Last synced: 15 Dec 2024
https://github.com/mantasu/glasses-detector
Glasses detection, classification and segmentation
classification computer-vision cuda detection detector eyeglasses eyes frames glasses gpu lenses mps pytorch segmentation sunglasses
Last synced: 01 Nov 2024
https://github.com/ingonyama-zk/fast-danksharding
Danksharding Builder with GPU acceleration
Last synced: 14 Nov 2024
https://github.com/jefflarkin/openacc-interoperability
Interoperability examples for OpenACC.
Last synced: 05 Dec 2024
https://github.com/adityashrm21/book-recommender-system-rbm
A book recommender system created using simple Restricted Boltzmann Machines in TensorFlow
book-recommender books cuda geoffrey-hinton hopfield-network neural-networks python3 rbm recommender-system restricted-boltzmann-machines tensorflow
Last synced: 11 Nov 2024
https://github.com/kevinzakka/learn-cuda
Learning some parallel programming with CUDA
Last synced: 28 Oct 2024
https://github.com/goldbattle/libelas-gpu
Implementation of LIBELAS in cuda.
cpu cuda depth-maps gpu libelas libelas-gpu
Last synced: 06 Nov 2024
https://github.com/stellar-group/octotiger
Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl
Last synced: 12 Nov 2024
https://github.com/abraham-ai/eden
Eden converts your Python function into a hosted endpoint with minimal changes to your existing code
celery cuda fastapi python redis-client task-queue
Last synced: 09 Oct 2024
https://github.com/STEllAR-GROUP/octotiger
Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
astrophysics cuda cuda-kernels hpx kokkos simd stellar-mergers sycl
Last synced: 05 Nov 2024
https://github.com/enp1s0/ozimmu
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
cuda gemm mixed-precision tensorcore tensorcores
Last synced: 06 Nov 2024
https://github.com/govertb/GPUGraphLayout
An experimental GPU accelerated implementation of ForceAtlas2
cuda forceatlas2 gephi graph-algorithms graph-layout social-network-analysis visualization
Last synced: 05 Nov 2024
https://github.com/kibae/pg_onnx
pg_onnx: ONNX Runtime integrated with PostgreSQL. Perform ML inference with data in your database.
ai contributions-welcome cuda deep-learning inference machine-learning onnx onnxruntime postgresql postgresql-extension
Last synced: 21 Nov 2024
https://github.com/lucasdelimanogueira/PyNorch
Recreating PyTorch from scratch (C/C++, CUDA and Python, with GPU support and automatic differentiation!)
c cuda deep-learning neural-network python pytorch
Last synced: 08 Jan 2025
https://github.com/enp1s0/ozIMMU
FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme
cuda gemm mixed-precision tensorcore tensorcores
Last synced: 05 Nov 2024
https://github.com/luisagroup/luisa-compute-rs
Rust frontend to LuisaCompute and more!
computer-graphics cuda differentiable-programming differentiable-rendering directx dsl dx gpu gpu-programming graphics raytracing rendering rust shading-language vulkan
Last synced: 19 Dec 2024
https://github.com/chiehpower/Setup-deeplearning-tools
Set up CI in DL/ cuda/ cudnn/ TensorRT/ onnx2trt/ onnxruntime/ onnxsim/ Pytorch/ Triton-Inference-Server/ Bazel/ Tesseract/ PaddleOCR/ NVIDIA-docker/ minIO/ Supervisord on AGX or PC from scratch.
agx ci cuda cudnn deep-learning docker installation minio nvidia onnx-simplifier onnx2trt onnxruntime paddleocr pytorch supervisord tensorrt tensorrt-inference-server tesseract-ocr triton-inference-server triton-server
Last synced: 28 Oct 2024
https://github.com/lucidrains/autoregressive-linear-attention-cuda
CUDA implementation of autoregressive linear attention, with all the latest research findings
artificial-intelligence attention-mechanisms cuda deep-learning linear-attention
Last synced: 22 Oct 2024
https://github.com/yehengchen/ubuntu-deep-learning-environment-setup
Guide to installing TensorFlow with NVIDIA GPU support and setting up a deep learning environment: NVIDIA drivers, CUDA, cuDNN, tensorflow-gpu (documentation in Chinese)
cuda cudnn deep-learning nvidia-gpu tensorflow tensorflow-gpu ubuntu
Last synced: 30 Nov 2024
https://github.com/Natsu-Akatsuki/RangeNetTrt8
TensorRT 8, CUDA, and LibTorch implementation of RangeNet++
cuda libtorch semantic-segmentation tensorrt
Last synced: 27 Oct 2024
https://github.com/AstroAccelerateOrg/astro-accelerate
AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.
Last synced: 02 Nov 2024
https://github.com/js1010/cusim
Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)
cuda gensim gpu lda topic-modeling w2v word-embedding
Last synced: 02 Nov 2024
https://github.com/autodesk/neon
Multi-GPU Framework for Voxel Grid Computations
cuda gpu gpu-acceleration grid hpc lbm parallel parallel-computing
Last synced: 19 Dec 2024
https://github.com/deftruth/cuhgemm-py
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance
Last synced: 09 Jan 2025
https://github.com/ProjectPhysX/PTXprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
cuda gpu gpu-acceleration gpu-computing gpu-programming hpc nvidia nvidia-cuda nvidia-gpu opencl profiler ptx ptx-utils roofline-model sycl
Last synced: 05 Nov 2024
https://github.com/projectphysx/ptxprofiler
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
cuda gpu gpu-acceleration gpu-computing gpu-programming hpc nvidia nvidia-cuda nvidia-gpu opencl profiler ptx ptx-utils roofline-model sycl
Last synced: 08 Nov 2024
https://github.com/abhisheknair10/llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
Last synced: 21 Jan 2025
https://github.com/weft/warp
Continuous-energy Monte Carlo neutron transport in general geometries on GPUs
carlo cuda gpu monte monte-carlo neutron transport
Last synced: 05 Nov 2024
https://github.com/cair/pytsetlinmachinecuda
Massively Parallel and Asynchronous Architecture for Logic-based AI
classification convolution cuda gpu learning-automata logic-based-artificial-intelligence regression tsetlin-machine
Last synced: 10 Dec 2024
https://github.com/lwYeo/SoliditySHA3Miner
All-in-one mixed multi-GPU (nVidia, AMD, Intel) & CPU miner solves proof of work to mine supported EIP918 tokens in a single instance (with API).
0xbitcoin amdminer cpuminer cuda ethos gpu-miner gpu-mining gpumining hiveos igpu linux miner nvidia-miner opencl solo-mining windows-10
Last synced: 13 Nov 2024
https://github.com/shredengineer/magneticalc
MagnetiCalc calculates the magnetic field of arbitrary coils.
coil cuda current education engineering field-calculation flux-density gui inductance interactive jit linux magnetic-field magnetostatics metric python simulation-modeling vector-potential visualization wire
Last synced: 02 Dec 2024
https://github.com/r00tman/eventhands
Real-Time Neural 3D Hand Pose Estimation from an Event Stream [ICCV 2021]
computer-vision cuda dataset deep-learning event-camera hand-pose hand-pose-estimation hand-tracking iccv2021 mano opengl pytorch smpl
Last synced: 09 Dec 2024
https://github.com/andi611/apriori-and-eclat-frequent-itemset-mining
Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions.
apriori apriori-algorithm cuda data-mining data-mining-algorithms eclat eclat-algorithm frequent-itemset-mining frequent-itemsets frequent-pattern-mining gcc gpu gpu-acceleration gpu-programming plot pycuda python transaction transactions
Last synced: 07 Nov 2024
https://github.com/deftruth/ffpa-attn-mma
📚[WIP] FFPA: Yet another Faster Flash Prefill Attention with O(1)🎉GPU SRAM complexity for headdim > 256, 1.5x~2x🎉faster vs SDPA EA.
attention cuda flash-attention mlsys sdpa tensor-cores
Last synced: 13 Jan 2025
https://github.com/harrism/ranger
Generate simple index ranges in C++ and CUDA C++
Last synced: 28 Oct 2024
https://github.com/pkestene/euler2d_kokkos
Simple 2D finite-volume solver for Euler equations using the C++ Kokkos library
cea cfd cpp cuda euler finite-volume gpu gpu-computing hydrodynamics kokkos miniapp multithreading openmp parallelism parallelization performance-portability
Last synced: 18 Dec 2024
https://github.com/sskorol/vosk-api-gpu
Vosk ASR Docker images with GPU for Jetson boards, PCs, M1 laptops and GCP
asr cuda docker gcp gpu jetson jetson-nano jetson-xavier-nx m1 nvidia nvidia-docker vosk vosk-api
Last synced: 28 Oct 2024
https://github.com/termoshtt/link_cuda_kernel
HowTo: Compile CUDA with nvcc, and link to Rust
Last synced: 10 Nov 2024
https://github.com/mravanelli/pytorch_MLP_for_ASR
This code implements a basic MLP for speech recognition. The MLP is trained with pytorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.
asr cuda deep-learning deep-neural-networks feedforward-neural-network kaldi kaldi-asr mlp multilayer-perceptron neural-networks python pytorch speech-recognition timit
Last synced: 27 Nov 2024
https://github.com/mravanelli/pytorch_mlp_for_asr
This code implements a basic MLP for speech recognition. The MLP is trained with pytorch, while feature extraction, alignments, and decoding are performed with Kaldi. The current implementation supports dropout and batch normalization. An example for phoneme recognition using the standard TIMIT dataset is provided.
asr cuda deep-learning deep-neural-networks feedforward-neural-network kaldi kaldi-asr mlp multilayer-perceptron neural-networks python pytorch speech-recognition timit
Last synced: 02 Dec 2024
https://github.com/gabrielscabrera/nbody
GPU-accelerated N-Body particle simulator with visualizer.
cuda cuda-support nbody nbody-gravity nbody-gravity-simulation nbody-sim nbody-simulation nbody-simulations particle-system particles particles-animations simulations sphere
Last synced: 02 Nov 2024
https://github.com/andravin/spio
Efficient CUDA kernels for training convolutional neural networks with PyTorch.
convolutional-neural-networks cuda pytorch
Last synced: 22 Nov 2024
https://github.com/gangliao/VS-Code-Cuda
support cuda grammars in Visual Studio Code
cuda vs vs-code vscode-extension
Last synced: 23 Oct 2024
https://github.com/gangliao/vs-code-cuda
support cuda grammars in Visual Studio Code
cuda vs vs-code vscode-extension
Last synced: 02 Dec 2024
https://github.com/wdmapp/gtensor
GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.
cpp cpp14 cuda gpu hacktoberfest rocm sycl
Last synced: 05 Nov 2024
https://github.com/davidalgis/interopunitycuda
Demonstrate interoperability between Unity Engine and CUDA
cpp cuda dx11 gpu gpu-acceleration native-plugin opengl unity unity3d
Last synced: 10 Nov 2024
https://github.com/pkestene/euler_kokkos
Compressible hydro and magneto-hydrodynamics (2nd order Godunov) implemented with MPI+Kokkos
cea cfd cmake cpp cuda finite-volume finite-volume-method fluid-dynamics gpu kokkos magnetohydrodynamics mpi parallel-computing parallelism performance-portability
Last synced: 18 Dec 2024
https://github.com/star-hengxing/cs149-xmake
CS149 xmake version
cuda hpc ispc parrallel-computing xmake
Last synced: 24 Oct 2024
https://github.com/AkashiSN/ffmpeg-docker
ffmpeg build in docker
aribb24 crossbuild cuda docker ffmpeg intel-qsv mingw64 vaapi
Last synced: 25 Nov 2024
https://github.com/pinto0309/facemesh_onnx_tensorrt
Verify that the post-processing merged into FaceMesh works correctly. The object detection model can be anything other than BlazeFace. YOLOv4 and FaceMesh committed to this repository have modified post-processing.
cuda facemesh onnx python tensorrt
Last synced: 22 Oct 2024
https://github.com/pkestene/euler2d_cudafortran
2nd order Godunov solver for 2D Euler equations written in CUDA Fortran and stdpar (standard parallelism)
cea conservation-laws cuda cuda-fortran euler-equations fortran gpu gpu-computing hydrodynamics nvfortran nvhpc stdpar
Last synced: 18 Dec 2024