An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with gpu-acceleration

A curated list of projects in awesome lists tagged with gpu-acceleration.

https://github.com/tensorflow/tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.

deep-learning deep-neural-network gpu-acceleration javascript machine-learning neural-network typescript wasm web-assembly webgl

Last synced: 09 Sep 2025

https://github.com/nvidia/tensorrt

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

deep-learning gpu-acceleration inference nvidia tensorrt

Last synced: 09 Sep 2025

https://github.com/NVIDIA/TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

deep-learning gpu-acceleration inference nvidia tensorrt

Last synced: 20 Mar 2025

https://github.com/tensorflow/tfjs-core

WebGL-accelerated ML // linear algebra // automatic differentiation for JavaScript.

deep-learning deep-neural-networks gpu-acceleration javascript machine-learning neural-network typescript webgl

Last synced: 30 Sep 2025

https://github.com/raphamorim/rio

A hardware-accelerated GPU terminal emulator designed to run on desktops and in browsers.

gpu-acceleration rio rio-terminal rust rust-lang terminal terminal-emulator terminal-emulators terminal-ui vte wgpu

Last synced: 15 Dec 2025

https://github.com/cornellius-gp/gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

gaussian-processes gpu-acceleration pytorch

Last synced: 13 May 2025

https://github.com/nvidia/generativeaiexamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

gpu-acceleration large-language-models llm llm-inference microservice nemo rag retrieval-augmented-generation tensorrt triton-inference-server

Last synced: 13 May 2025

https://github.com/NVIDIA/GenerativeAIExamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

gpu-acceleration large-language-models llm llm-inference microservice nemo rag retrieval-augmented-generation tensorrt triton-inference-server

Last synced: 28 Mar 2025

https://github.com/Hedgehog-Computing/hedgehog-lab

Run, compile and execute JavaScript for Scientific Computing and Data Visualization TOTALLY TOTALLY TOTALLY in your BROWSER! An open source scientific computing environment for JavaScript TOTALLY in your browser, matrix operations with GPU acceleration, TeX support, data visualization and symbolic computation.

computer-algebra data-visualization gpu-acceleration javascript latex machine-learning matrix-library scientific-computing symbolic-computation tex webgl webgl2

Last synced: 30 Mar 2025

https://github.com/hedgehog-computing/hedgehog-lab

Run, compile and execute JavaScript for Scientific Computing and Data Visualization TOTALLY TOTALLY TOTALLY in your BROWSER! An open source scientific computing environment for JavaScript TOTALLY in your browser, matrix operations with GPU acceleration, TeX support, data visualization and symbolic computation.

computer-algebra data-visualization gpu-acceleration javascript latex machine-learning matrix-library scientific-computing symbolic-computation tex webgl webgl2

Last synced: 15 May 2025

https://github.com/emacs-ng/emacs-ng

A new approach to Emacs - Including TypeScript, Threading, Async I/O, and WebRender.

async deno emacs emacs-ng gpu gpu-acceleration javascript rust wasm webassembly webrender webworkers

Last synced: 14 May 2025

https://github.com/calebwin/emu

The write-once-run-anywhere GPGPU library for Rust

emu gpgpu gpu gpu-acceleration gpu-computing gpu-programming rust

Last synced: 14 May 2025

https://calebwin.github.io/emu/

The write-once-run-anywhere GPGPU library for Rust

emu gpgpu gpu gpu-acceleration gpu-computing gpu-programming rust

Last synced: 30 Apr 2025

https://github.com/beehive-lab/tornadovm

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl parallel-computing parallel-programming spirv

Last synced: 02 Dec 2025

https://github.com/beehive-lab/TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl parallel-computing parallel-programming spirv

Last synced: 04 Apr 2025

https://github.com/liu-xiandong/how_to_optimize_in_gpu

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.

elementwise gpu-acceleration high-performance-computing hpc reduce sgemm sgemv

Last synced: 03 Oct 2025

https://github.com/NVIDIA-Merlin/HugeCTR

HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models

cpp deep-learning gpu-acceleration recommendation-system recommender-system

Last synced: 20 Jul 2025

https://github.com/nvidia-merlin/hugectr

HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models

cpp deep-learning gpu-acceleration recommendation-system recommender-system

Last synced: 14 May 2025

https://github.com/dgasmith/opt_einsum

⚡️ Optimizing einsum functions in NumPy, TensorFlow, Dask, and more with contraction order optimization.

contraction einsum gpu-acceleration performance python tensor tensor-contraction

Last synced: 13 May 2025

https://github.com/Liu-xiandong/How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.

elementwise gpu-acceleration high-performance-computing hpc reduce sgemm sgemv

Last synced: 14 May 2025

https://github.com/nvidia-merlin/merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

deep-learning end-to-end gpu-acceleration machine-learning recommendation-system recommender-system

Last synced: 13 Apr 2025

https://github.com/NVIDIA-Merlin/Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

deep-learning end-to-end gpu-acceleration machine-learning recommendation-system recommender-system

Last synced: 30 Jul 2025

https://github.com/iot-salzburg/gpu-jupyter

GPU-Jupyter: Your GPU-accelerated JupyterLab with a rich data science toolstack, TensorFlow and PyTorch for your reproducible deep learning experiments.

docker environment gpu-acceleration gpu-computing jupyter jupyter-server jupyterlab pytorch reproducible-research tensorflow

Last synced: 15 May 2025

https://github.com/ttddee/Cascade

Node-based image editor with GPU acceleration.

gpu-acceleration image-editor node-based vulkan

Last synced: 01 Apr 2025

https://github.com/limbo018/DREAMPlace

Deep learning toolkit-enabled VLSI placement

deep-learning gpu-acceleration pytorch vlsi vlsi-physical-design vlsi-placement

Last synced: 08 May 2025

https://github.com/philferriere/dlwin

GPU-accelerated Deep Learning on Windows 10 native

cntk cudnn deep-learning gpu-acceleration gpu-mode keras tensorflow theano

Last synced: 05 Apr 2025

https://github.com/DavidDiazGuerra/gpuRIR

Python library for Room Impulse Response (RIR) simulation with GPU acceleration

acoustics gpu-acceleration image-source-model python-library rir room-impulse-responses

Last synced: 01 Apr 2025

https://github.com/megviirobot/megba

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

bundleadjustment cuda distributed gpu-acceleration graph-optimization high-performance

Last synced: 24 Jun 2025

https://github.com/MegviiRobot/MegBA

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

bundleadjustment cuda distributed gpu-acceleration graph-optimization high-performance

Last synced: 07 May 2025

https://github.com/projectphysx/opencl-wrapper

OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.

gpgpu gpgpu-computing gpu gpu-acceleration gpu-computing gpu-programming opencl vector-processor vectorization

Last synced: 16 May 2025

https://github.com/quiver-team/torch-quiver

PyTorch Library for Low-Latency, High-Throughput Graph Learning on GPUs.

distributed-computing geometric-deep-learning gpu-acceleration graph-learning graph-neural-networks pytorch

Last synced: 04 Apr 2025

https://github.com/marian-nmt/marian-dev

Fast Neural Machine Translation in C++ - development repository

cpp11 cuda fast gpu-acceleration neural-machine-translation

Last synced: 15 May 2025

https://github.com/clesperanto/pyclesperanto_prototype

GPU-accelerated bio-image analysis focusing on 3D+t microscopy image data

bioimage-analysis gpu-acceleration microscopy

Last synced: 21 Oct 2025

https://github.com/audiokit/waveform

GPU-accelerated waveform view

audio audio-visualizer gpu-acceleration metal

Last synced: 17 Jun 2025

https://github.com/bh107/bohrium

Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX

cuda gpu gpu-acceleration multi-core numpy opencl parallel-computing

Last synced: 21 Oct 2025

https://github.com/clEsperanto/pyclesperanto_prototype

GPU-accelerated bio-image analysis focusing on 3D+t microscopy image data

bioimage-analysis gpu-acceleration microscopy

Last synced: 21 Mar 2025

https://github.com/BasBuller/PySNN

Efficient Spiking Neural Network framework, built on top of PyTorch for GPU acceleration

deep-learning dynamic gpu-acceleration gpu-computing machine-learning neural-networks python3 pytorch spiking-neural-networks stdp

Last synced: 07 May 2025

https://github.com/AudioKit/Waveform

GPU-accelerated waveform view

audio audio-visualizer gpu-acceleration metal

Last synced: 16 Jul 2025

https://github.com/yzhao062/pytod

TOD: GPU-accelerated Outlier Detection via Tensor Operations

anomaly-detection gpu-acceleration gpu-systems machine-learning outlier-detection unsupervised-learning

Last synced: 23 Oct 2025

https://github.com/eth-cscs/cosma

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

communication-optimal cuda gpu-acceleration linear-algebra matmul matrix-multiplication mpi pdgemm rocm scalapack

Last synced: 04 Apr 2025

https://github.com/ucl-bug/jwave

A JAX-based research framework for differentiable and parallelizable acoustic simulations, on CPU, GPUs and TPUs

acoustics differentiable-simulations gpu gpu-acceleration jax kwave physics-informed-neural-networks scientific-machine-learning simulation tpu-acceleration ultrasound wave-equation

Last synced: 14 May 2025

https://github.com/juliahealth/komamri.jl

Koma is a Pulseq-compatible framework to efficiently simulate Magnetic Resonance Imaging (MRI) acquisitions. The main focus of this package is to simulate general scenarios that could arise in pulse sequence development.

cardiac diffusion diffusion-mri gpu-acceleration mri simulation

Last synced: 11 Dec 2025

https://github.com/mightycow/Sluggish

Toy CPU and GPU implementations of the Slug rendering algorithm

font gpu-acceleration rendering-algorithm slug

Last synced: 16 Oct 2025

https://github.com/arceryz/raylib-gpu-particles

Raylib 100% GPU particles example in 3D. Uses compute shaders and is fully documented. Millions of particles at 60 fps on a laptop.

c compute-shader example glsl gpu gpu-acceleration gui lorenz-attractor raygui raylib raylib-examples tutorial

Last synced: 11 Apr 2025

https://github.com/IntelPython/dpnp

Data Parallel Extension for NumPy

dpcpp gpu gpu-acceleration intel mkl numpy oneapi pstl python3 sycl

Last synced: 01 May 2025

https://github.com/icl-utk-edu/slate

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP).

gpu-acceleration hpc linear-algebra

Last synced: 29 Dec 2025

https://github.com/intelpython/dpnp

Data Parallel Extension for NumPy

dpcpp gpu gpu-acceleration intel mkl numpy oneapi pstl python3 sycl

Last synced: 16 May 2025

https://github.com/ucbrise/piranha

Piranha: A GPU Platform for Secure Computation

gpu-acceleration multi-party-computation privacy-preserving-machine-learning

Last synced: 22 Jun 2025

https://github.com/ashvardanian/parallelreductionsbenchmark

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!

apple avx512 cuda glsl gpgpu gpu gpu-acceleration gpu-computing hpc intel metal nvidia opencl openmp parallel simd stl tbb thrust

Last synced: 06 Apr 2025

https://github.com/guillaume-chevalier/glove-as-a-tensorflow-embedding-layer

Takes a pretrained GloVe model and uses it as a TensorFlow embedding weight layer inside the GPU, so only the word indices need to be sent over the GPU data transfer bus, reducing data transfer overhead.

cosine-similarity glove glove-embeddings gpu gpu-acceleration gpu-tensorflow neural-network tensorflow tensorflow-layers word-embeddings word2vec

Last synced: 30 Apr 2025

https://github.com/larsgeb/m1-gpu-cpp

Metal Shading Language on Apple M1's GPU for scientific C++.

clang cpp cpp17 gpu-acceleration gpu-computing m1-mac metal metal-cpp objective-c scientific-computing

Last synced: 17 Mar 2025

https://github.com/aestream/aestream

Efficient streaming of sparse event data supporting files, network I/O, GPU peripherals (via Torch/Jax/Numpy) and neuromorphic protocols

coroutines event-camera gpu-acceleration neuromorphic pytorch

Last synced: 19 Nov 2025

https://github.com/oalieno/asm2vec-pytorch

Unofficial implementation of asm2vec using PyTorch (with GPU acceleration)

asm2vec gpu-acceleration machine-learning neural-language-processing python pytorch unofficial

Last synced: 10 May 2025

https://github.com/kunitoki/yup

YUP is an open-source library dedicated to empowering developers with advanced tools for cross-platform application development.

application-framework audio gpu-acceleration graphics gui juce rive

Last synced: 08 May 2025

https://github.com/selkies-project/selkies-vdi

WebRTC & Xpra desktops on Selkies

gke gpu-acceleration kubernetes selkies vdi webrtc xpra

Last synced: 23 Jul 2025