Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

CUDA

CUDA® is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can dramatically speed up computing applications by harnessing the power of GPUs.

https://github.com/cudamat/cudamat

Python module for performing basic dense linear algebra computations on the GPU using CUDA.

cuda linear-algebra python

Last synced: 25 Oct 2024

https://github.com/hpcaitech/fastfold

Optimizing AlphaFold Training and Inference on GPU Clusters

alphafold2 cuda evoformer gpu habana-gaudi parallelism protein-folding protein-structure pytorch

Last synced: 26 Jan 2025

https://github.com/cern/tigre

TIGRE: Tomographic Iterative GPU-based Reconstruction Toolbox

cuda gpus image-reconstruction matlab python tigre tomography toolbox x-ray

Last synced: 17 Jan 2025

https://github.com/hpcaitech/FastFold

Optimizing AlphaFold Training and Inference on GPU Clusters

alphafold2 cuda evoformer gpu habana-gaudi parallelism protein-folding protein-structure pytorch

Last synced: 12 Nov 2024

https://github.com/gprmax/gprmax

gprMax is open source software that simulates electromagnetic wave propagation using the Finite-Difference Time-Domain (FDTD) method for numerical modelling of Ground Penetrating Radar (GPR)

antenna cuda electromagnetic fdtd gpr gpu modelling nvidia python simulation soil

Last synced: 25 Jan 2025

https://github.com/gprMax/gprMax

gprMax is open source software that simulates electromagnetic wave propagation using the Finite-Difference Time-Domain (FDTD) method for numerical modelling of Ground Penetrating Radar (GPR)

antenna cuda electromagnetic fdtd gpr gpu modelling nvidia python simulation soil

Last synced: 17 Nov 2024

https://github.com/laugh12321/TensorRT-YOLO

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.

cuda cuda-graph cuda-kernels cuda-programming detection onnx ppyoloe tensorrt yolov10 yolov3 yolov5 yolov6 yolov7 yolov8 yolov9

Last synced: 27 Oct 2024

https://github.com/stochasticai/x-stable-diffusion

Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6

aitemplate automl cuda docker inference notebook nvfuser onnx onnxruntime pytorch stable-diffusion tensorrt

Last synced: 26 Jan 2025

https://github.com/tencent/forward

A library for high performance deep learning inference on NVIDIA GPUs.

cuda deep-learning forward gpu inference inference-engine keras neural-network onnx pytorch tensorflow tensorrt

Last synced: 26 Jan 2025

https://github.com/nvidia/nvbench

CUDA Kernel Benchmarking Library

benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance

Last synced: 25 Jan 2025

https://github.com/Tencent/Forward

A library for high performance deep learning inference on NVIDIA GPUs.

cuda deep-learning forward gpu inference inference-engine keras neural-network onnx pytorch tensorflow tensorrt

Last synced: 09 Nov 2024

https://github.com/zhihu/cubert

Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL

bert cuda deep-learning inference mkl predict tensorflow transformer

Last synced: 27 Jan 2025

https://github.com/kwea123/gaussian_splatting_notes

A detailed explanation of the formulae behind Gaussian splatting

cuda gaussian-splatting

Last synced: 26 Jan 2025

https://github.com/nvidia/jitify

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).

cpp cuda jit-compilation nvrtc runtime-compilation single-header

Last synced: 25 Jan 2025

https://github.com/zhihu/cuBERT

Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL

bert cuda deep-learning inference mkl predict tensorflow transformer

Last synced: 02 Nov 2024

https://github.com/NVIDIA/nvbench

CUDA Kernel Benchmarking Library

benchmark cuda cuda-kernels gpu kernel-benchmark nvidia performance

Last synced: 19 Nov 2024

https://github.com/gridhead/nvidia-auto-installer-for-fedora-linux

A CLI tool that lets you easily install proprietary NVIDIA drivers and much more on Fedora Linux (32 or above, and Rawhide)

cuda fedora hacktoberfest nvidia optimus rpmfusion

Last synced: 25 Jan 2025

https://github.com/openhackathons-org/gpubootcamp

This repository consists of GPU bootcamp material for HPC and AI

ai4hpc cuda data-science deep-learning deepstream gpu hpc machine-learning mpi openacc openmp rapidsai

Last synced: 30 Oct 2024

https://github.com/Kaixhin/dockerfiles

Compilation of Dockerfiles with automated builds enabled on the Docker Registry

cuda deep-learning docker dockerfiles machine-learning vnc

Last synced: 28 Oct 2024

https://github.com/gorgonia/cu

package cu provides an idiomatic interface to the CUDA Driver API.

cuda cuda-driver-api go golang

Last synced: 26 Jan 2025

https://github.com/salesforce/warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)

cuda deep-learning gpu high-throughput multiagent-reinforcement-learning numba pytorch reinforcement-learning

Last synced: 24 Jan 2025

https://github.com/mumax/3

GPU-accelerated micromagnetic simulator

cuda finite-difference-time-domain go micromagnetics scientific-computing

Last synced: 31 Oct 2024

https://github.com/cnstark/pytorch-docker

Pure Pytorch Docker Images.

centos cuda deep-learning docker nvidia pytorch ubuntu

Last synced: 25 Jan 2025

https://github.com/huggingface/large_language_model_training_playbook

An open collection of implementation tips, tricks and resources for training large language models

cuda large-language-models llm nccl nlp performance python pytorch scalability troubleshooting

Last synced: 11 Nov 2024

https://github.com/termoshtt/accel

(Mirror of GitLab) GPGPU Framework for Rust

cuda gpgpu rust-lang

Last synced: 25 Jan 2025

https://github.com/petercunha/Pine

🌲 Aimbot powered by real-time object detection with neural networks, GPU accelerated with Nvidia. Optimized for use with CS:GO.

aimbot csgo cuda darknet detection fortnite fps game-hacking hacking neural-network neural-networks nvidia object-detection opencl opencv overwatch pine python yolo yolov3

Last synced: 08 Nov 2024

https://github.com/cryinkfly/solidworks-for-linux

A project that gives you a way to use SOLIDWORKS on Linux!

archlinux cuda fedora international linux linuxmint manjaro nvidia opengl opensuse ubuntu wine

Last synced: 26 Jan 2025

https://github.com/MegviiRobot/MegBA

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

bundleadjustment cuda distributed gpu-acceleration graph-optimization high-performance

Last synced: 14 Nov 2024

https://github.com/petercunha/pine

🌲 Aimbot powered by real-time object detection with neural networks, GPU accelerated with Nvidia. Optimized for use with CS:GO.

aimbot csgo cuda darknet detection fortnite fps game-hacking hacking neural-network neural-networks nvidia object-detection opencl opencv overwatch pine python yolo yolov3

Last synced: 06 Nov 2024

https://github.com/patwie/tensorflow-cmake

TensorFlow examples in C, C++, Go and Python without bazel but with cmake and FindTensorFlow.cmake

c cmake cpp cuda deep-learning golang inference opencv tensorflow tensorflow-cc tensorflow-cmake tensorflow-examples tensorflow-gpu

Last synced: 20 Jan 2025

https://github.com/tlkh/ai-lab

All-in-one AI container for rapid prototyping

cuda data-science deep-learning docker jupyter nvidia pytorch tensorflow

Last synced: 26 Jan 2025

https://github.com/ccsb-scripps/autodock-gpu

AutoDock for GPUs and other accelerators

autodock4 cuda gpu-computing molecular-docking multicore-cpu opencl

Last synced: 25 Jan 2025

https://zielon.github.io/insta/

INSTA - Instant Volumetric Head Avatars [CVPR2023]

3dmm avatars cuda flame instant-ngp nerf neural-network volumetric-rendering

Last synced: 30 Oct 2024

https://github.com/rapidsai/rmm

RAPIDS Memory Manager

cuda memory-allocation memory-management rapids

Last synced: 25 Jan 2025

https://github.com/cloudcores/cuassembler

An unofficial cuda assembler, for all generations of SASS, hopefully :)

assembler cuda nvidia sass

Last synced: 20 Jan 2025

https://github.com/arrayfire/arrayfire-python

Python bindings for ArrayFire: A general purpose GPU library.

arrayfire cuda gpgpu gpu hpc opencl python python-bindings

Last synced: 03 Nov 2024

https://github.com/uncomplicate/deep-diamond

A fast Clojure Tensor & Deep Learning library

clojure cuda deep-learning deep-neural-networks dnnl gpu java nvidia

Last synced: 27 Jan 2025

https://github.com/alicevision/popsift

PopSift is an implementation of the SIFT algorithm in CUDA.

computer-vision cuda feature-extraction gpu image-processing sift

Last synced: 27 Jan 2025

https://github.com/toverainc/willow-inference-server

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

cuda deep-learning llama llm privacy speech-recognition speech-to-text text-to-speech vicuna webrtc whisper willow

Last synced: 25 Jan 2025

https://github.com/cloudcores/CuAssembler

An unofficial cuda assembler, for all generations of SASS, hopefully :)

assembler cuda nvidia sass

Last synced: 28 Oct 2024

https://github.com/shi-labs/natten

Neighborhood Attention Extension. Bringing attention to a neighborhood near you!

cuda neighborhood-attention pytorch

Last synced: 24 Jan 2025

https://github.com/DerryHub/BEVFormer_tensorrt

BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).

bevformer cuda int8-inference pytorch quantization tensorrt-plugins

Last synced: 28 Oct 2024

https://github.com/JuliaGPU/CUDAnative.jl

Julia support for native CUDA programming

cuda cuda-toolkit julia julia-library

Last synced: 29 Nov 2024

https://github.com/xmrig/xmrig-cuda

NVIDIA CUDA plugin for XMRig miner

cryptonight cuda randomx xmrig

Last synced: 26 Jan 2025

https://github.com/libocca/occa

Portable and vendor neutral framework for parallel programming on heterogeneous platforms.

c cpp cuda dpcpp fortran gpgpu gpu hip hpc jit metal multithreading oneapi opencl openmp sycl

Last synced: 05 Nov 2024

https://github.com/vectorch-ai/ScaleLLM

A high-performance inference system for large language models, designed for production environments.

cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer

Last synced: 16 Nov 2024

https://github.com/vectorch-ai/scalellm

A high-performance inference system for large language models, designed for production environments.

cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer

Last synced: 24 Jan 2025

https://github.com/dfm/extending-jax

Extending JAX with custom C++ and CUDA code

cuda jax xla

Last synced: 26 Jan 2025

https://github.com/osai-ai/tensor-stream

A library for real-time video stream decoding to CUDA memory

c-plus-plus cuda python pytorch video video-processing

Last synced: 14 Nov 2024

https://github.com/luoyetx/mini-caffe

Minimal runtime core of Caffe, Forward only, GPU support and Memory efficiency.

android caffe cuda cudnn forward-only linux mini-caffe openblas windows

Last synced: 26 Oct 2024

https://github.com/nosferalatu/SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock free techniques

cuda cuda-programming data-structures gpu gpu-cuda-programs

Last synced: 14 Nov 2024

https://github.com/ibm/aihwkit

IBM Analog Hardware Acceleration Kit

ai analog-devices cuda neural-networks pytorch

Last synced: 20 Jan 2025

https://github.com/ingonyama-zk/icicle

A hardware acceleration library for compute intensive cryptography 🧊

cpu cryptography cuda golang msm ntt rust zero-knowledge

Last synced: 24 Jan 2025

https://github.com/nvidia/cuquantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples

cuda cuquantum custatevec cutensornet nvidia quantum-computing

Last synced: 23 Jan 2025

https://github.com/nersc/timemory

Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.

analysis c cplusplus cpp cross-language cross-platform cuda cupti gotcha hardware-counters instrumentation-api memory-measurements modular-design mpi papi performance performance-measurement python roofline

Last synced: 23 Jan 2025

https://github.com/alpaka-group/alpaka

Abstraction Library for Parallel Kernel Acceleration 🦙

cpp cpp17 cuda gpu header-only heterogeneous-parallel-programming hip hpc openacc openmp rocm tbb

Last synced: 25 Jan 2025

https://github.com/NERSC/timemory

Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template API is essentially a framework for creating tools: it is designed to provide a unifying interface for recording various performance measurements alongside data logging and interfaces to other tools.

analysis c cplusplus cpp cross-language cross-platform cuda cupti gotcha hardware-counters instrumentation-api memory-measurements modular-design mpi papi performance performance-measurement python roofline

Last synced: 14 Nov 2024

https://github.com/ekondis/mixbench

A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)

benchmark cuda gpu hip opencl openmp sycl

Last synced: 05 Nov 2024

https://github.com/luisagroup/luisarender

High-Performance Cross-Platform Monte Carlo Renderer Based on LuisaCompute

cpp cuda gpu high-performance ispc metal optix path-tracing ray-tracing renderer rendering siggraph-asia-2022

Last synced: 23 Jan 2025

https://github.com/NVIDIA/cuQuantum

Home for cuQuantum Python & NVIDIA cuQuantum SDK C++ samples

cuda cuquantum custatevec cutensornet nvidia quantum-computing

Last synced: 03 Nov 2024

https://github.com/harrism/hemi

Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.

c-plus-plus cuda cuda-device cuda-kernels gpu hemi

Last synced: 21 Jan 2025

https://github.com/IBM/aihwkit

IBM Analog Hardware Acceleration Kit

ai analog-devices cuda neural-networks pytorch

Last synced: 17 Nov 2024

https://github.com/lambdalabsml/distributed-training-guide

Best practices & guides on how to write distributed pytorch training code

cluster cuda deepspeed distributed-training fsdp gpu gpu-cluster kuberentes lambdalabs mpi nccl pytorch sharding slurm

Last synced: 27 Jan 2025

https://github.com/bruce-lee-ly/cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

cublas cuda gemm gpu hgemm matrix-multiply nvidia tensor-core

Last synced: 27 Jan 2025

https://github.com/a2flo/floor

A C++ Compute/Graphics Library and Toolchain enabling same-source CUDA/Host/Metal/OpenCL/Vulkan C++ programming and execution.

c-plus-plus compiler compute cuda graphics ios linux macos metal opencl openxr rendering spir spir-v virtual-reality vulkan windows

Last synced: 27 Jan 2025

https://github.com/UoB-HPC/BabelStream

STREAM, for lots of devices written in many programming models

benchmark cuda gpgpu gpu hpc kokkos memory-bandwidth openacc opencl openmp parallel-processing raja sycl

Last synced: 09 Nov 2024

https://github.com/omlins/ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs

cuda gpu julia multi-gpu multi-xpu parallel staggered-grids stencil-codes xpu

Last synced: 30 Oct 2024

https://github.com/knightcrawler25/optix-pathtracer

Simple physically based path tracer based on Nvidia's Optix Ray Tracing Engine

brdf cuda disney gpu optix pathtracing raytracing

Last synced: 24 Jan 2025

https://github.com/charles-r-earp/autograph

A machine learning library for Rust.

cuda machine-learning neural-networks rust

Last synced: 19 Nov 2024

https://github.com/lattice/quda

QUDA is a library for performing calculations in lattice QCD on GPUs.

c c-plus-plus cuda gpu mpi multi-gpu qcd

Last synced: 25 Jan 2025

https://github.com/Bruce-Lee-LY/cuda_hgemm

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

cublas cuda gemm gpu hgemm matrix-multiply nvidia tensor-core

Last synced: 19 Nov 2024

https://github.com/QMCPACK/qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support

c-plus-plus cuda electronic-structure gpu high-performance-computing hpc mpi quantum-chemistry quantum-monte-carlo

Last synced: 30 Oct 2024

https://github.com/sekwiatkowski/Komputation

Komputation is a neural network framework for the Java Virtual Machine written in Kotlin and CUDA C.

artificial-intelligence convolutional-neural-networks cuda framework gpu jvm kotlin machine-learning neural-networks nlp nvidia recurrent-neural-networks seq2seq

Last synced: 02 Nov 2024

https://github.com/nvidia-genomics-research/genomeworks

SDK for GPU accelerated genome assembly and analysis

alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api

Last synced: 26 Jan 2025

https://github.com/clara-parabricks/GenomeWorks

SDK for GPU accelerated genome assembly and analysis

alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api

Last synced: 26 Dec 2024

https://github.com/NVIDIA-Genomics-Research/GenomeWorks

SDK for GPU accelerated genome assembly and analysis

alignment cuda genomics gpu mapping nvidia partial-order-alignment poa python-api

Last synced: 15 Nov 2024