Projects in Awesome Lists tagged with gpu-acceleration

https://github.com/tensorflow/tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.

deep-learning deep-neural-network gpu-acceleration javascript machine-learning neural-network typescript wasm web-assembly webgl

Last synced: 09 Sep 2025

https://github.com/nvidia/tensorrt

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

deep-learning gpu-acceleration inference nvidia tensorrt

Last synced: 09 Sep 2025

https://github.com/NVIDIA/TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

deep-learning gpu-acceleration inference nvidia tensorrt

Last synced: 20 Mar 2025

https://github.com/tensorflow/tfjs-core

WebGL-accelerated ML // linear algebra // automatic differentiation for JavaScript.

deep-learning deep-neural-networks gpu-acceleration javascript machine-learning neural-network typescript webgl

Last synced: 30 Sep 2025

https://github.com/raphamorim/rio

A hardware-accelerated GPU terminal emulator designed to run on desktops and in browsers.

gpu-acceleration rio rio-terminal rust rust-lang terminal terminal-emulator terminal-emulators terminal-ui vte wgpu

Last synced: 28 Apr 2026

https://github.com/cornellius-gp/gpytorch

A highly efficient implementation of Gaussian Processes in PyTorch

gaussian-processes gpu-acceleration pytorch

Last synced: 28 Feb 2026

https://github.com/nvidia/generativeaiexamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

gpu-acceleration large-language-models llm llm-inference microservice nemo rag retrieval-augmented-generation tensorrt triton-inference-server

Last synced: 13 May 2025

https://github.com/NVIDIA/GenerativeAIExamples

Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

gpu-acceleration large-language-models llm llm-inference microservice nemo rag retrieval-augmented-generation tensorrt triton-inference-server

Last synced: 28 Mar 2025

https://github.com/Hedgehog-Computing/hedgehog-lab

Run, compile and execute JavaScript for Scientific Computing and Data Visualization TOTALLY TOTALLY TOTALLY in your BROWSER! An open source scientific computing environment for JavaScript TOTALLY in your browser, matrix operations with GPU acceleration, TeX support, data visualization and symbolic computation.

computer-algebra data-visualization gpu-acceleration javascript latex machine-learning matrix-library scientific-computing symbolic-computation tex webgl webgl2

Last synced: 30 Mar 2025

https://github.com/hedgehog-computing/hedgehog-lab

Run, compile and execute JavaScript for Scientific Computing and Data Visualization TOTALLY TOTALLY TOTALLY in your BROWSER! An open source scientific computing environment for JavaScript TOTALLY in your browser, matrix operations with GPU acceleration, TeX support, data visualization and symbolic computation.

computer-algebra data-visualization gpu-acceleration javascript latex machine-learning matrix-library scientific-computing symbolic-computation tex webgl webgl2

Last synced: 15 May 2025

https://github.com/blazingdb/blazingsql

BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python, built on RAPIDS cuDF.

arrow artificial-intelligence blazingsql conda-environment cudf data-science gpu gpu-acceleration gpu-dataframes machine-learning machine-learning-workflow python rapids rapidsai sql sql-engine

Last synced: 15 May 2025

https://github.com/BlazingDB/blazingsql

BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python, built on RAPIDS cuDF.

arrow artificial-intelligence blazingsql conda-environment cudf data-science gpu gpu-acceleration gpu-dataframes machine-learning machine-learning-workflow python rapids rapidsai sql sql-engine

Last synced: 25 Mar 2025

https://github.com/tianzerl/anime4kcpp

A high performance anime upscaler

anime anime4k anime4kcpp avisynth avisynthplus-plugin cnn computer-graphics cpp directshow-filter gpu-acceleration machine-learning upscaling vapoursynth vapoursynth-plugin video-processing

Last synced: 14 May 2025

https://github.com/TianZerL/Anime4KCPP

A high performance anime upscaler

anime anime4k anime4kcpp avisynth avisynthplus-plugin cnn computer-graphics cpp directshow-filter gpu-acceleration machine-learning upscaling vapoursynth vapoursynth-plugin video-processing

Last synced: 06 May 2025

https://github.com/coreylowman/dfdx

Deep learning in Rust, with shape checked tensors and neural networks

autodiff autodifferentiation autograd backpropagation cuda cuda-kernels cuda-support cuda-toolkit cudnn deep-learning deep-neural-networks gpu gpu-acceleration gpu-computing machine-learning neural-network rust rust-lang tensor

Last synced: 14 May 2025

https://github.com/emacs-ng/emacs-ng

A new approach to Emacs - Including TypeScript, Threading, Async I/O, and WebRender.

async deno emacs emacs-ng gpu gpu-acceleration javascript rust wasm webassembly webrender webworkers

Last synced: 14 May 2025

https://github.com/nvidia/cccl

CUDA Core Compute Libraries

accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming

Last synced: 05 Feb 2026

https://github.com/calebwin/emu

The write-once-run-anywhere GPGPU library for Rust

emu gpgpu gpu gpu-acceleration gpu-computing gpu-programming rust

Last synced: 14 May 2025

https://calebwin.github.io/emu/

The write-once-run-anywhere GPGPU library for Rust

emu gpgpu gpu gpu-acceleration gpu-computing gpu-programming rust

Last synced: 30 Apr 2025

https://github.com/emi-group/evox

Distributed GPU-Accelerated Framework for Evolutionary Computation. Comprehensive Library of Evolutionary Algorithms & Benchmark Problems.

black-box-optimization brax derivative-free-optimization evolutionary-algorithms evolutionary-computation evolutionary-optimization evolutionary-reinforcement-learinig evolutionary-strategies gpu-acceleration gradient-free-optimization gym jax metaheuristics multi-objective-optimization neuroevolution population-based-optimization pytorch ray

Last synced: 06 Nov 2025

https://github.com/beehive-lab/tornadovm

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl parallel-computing parallel-programming spirv

Last synced: 02 Apr 2026

https://github.com/NVlabs/sionna

Sionna: An Open-Source Library for Research on Communication Systems

5g 6g communications deep-learning differentiable-simulation gpu-acceleration link-level-simulation machine-learning open-source raytracing reproducible-research system-level-simulation

Last synced: 28 Feb 2026

https://github.com/nvlabs/sionna

Sionna: An Open-Source Library for Research on Communication Systems

5g 6g communications deep-learning differentiable-simulation gpu-acceleration link-level-simulation machine-learning open-source raytracing reproducible-research system-level-simulation

Last synced: 02 Apr 2026

https://github.com/NVIDIA/cccl

CUDA Core Compute Libraries

accelerated-computing cpp cpp-programming cuda cuda-cpp cuda-kernels cuda-library cuda-programming gpu gpu-acceleration gpu-computing gpu-programming hpc modern-cpp nvidia nvidia-gpu parallel-algorithm parallel-computing parallel-programming

Last synced: 14 May 2025

https://github.com/beehive-lab/TornadoVM

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

ai cuda gpu-acceleration gpu-computing gpus graalvm java levelzero multi-core opencl parallel-computing parallel-programming spirv

Last synced: 04 Apr 2025

https://github.com/stotko/stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

cpp cpp17 cpp20 cuda data-structures gpgpu gpu gpu-acceleration gpu-computing hip modern-cpp openmp rocm stl stl-containers stl-like

Last synced: 14 May 2025

https://github.com/jaysmito101/terraforge3d

Cross Platform Professional Procedural Terrain Generation & Texturing Tool

3d cpp game-development gamedev gpu-acceleration hacktoberfest imgui nodeeditor open-source opengl opensource precedural-textures procedural-generation terrain-generation

Last synced: 16 May 2025

https://jaysmito101.github.io/TerraForge3D/

Cross Platform Professional Procedural Terrain Generation & Texturing Tool

3d cpp game-development gamedev gpu-acceleration hacktoberfest imgui nodeeditor open-source opengl opensource precedural-textures procedural-generation terrain-generation

Last synced: 07 May 2025

https://github.com/liu-xiandong/how_to_optimize_in_gpu

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.

elementwise gpu-acceleration high-performance-computing hpc reduce sgemm sgemv

Last synced: 03 Oct 2025

https://github.com/NVIDIA-Merlin/HugeCTR

HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models

cpp deep-learning gpu-acceleration recommendation-system recommender-system

Last synced: 20 Jul 2025

https://github.com/nvidia-merlin/hugectr

HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models

cpp deep-learning gpu-acceleration recommendation-system recommender-system

Last synced: 14 May 2025

https://github.com/chelsea0x3b/cudarc

Safe Rust wrapper around the CUDA toolkit

cublas cuda cuda-kernels cuda-programming cuda-toolkit cudnn curand gpu gpu-acceleration nccl nvrtc rust

Last synced: 09 Feb 2026

https://github.com/hughperkins/VeriGPU

Open-source GPU, in Verilog, loosely based on the RISC-V ISA

asic-design gpu gpu-acceleration hardware-designs machine-learning risc-v risc-v-assembly verification verilog

Last synced: 02 Apr 2025

https://github.com/hughperkins/verigpu

Open-source GPU, in Verilog, loosely based on the RISC-V ISA

asic-design gpu gpu-acceleration hardware-designs machine-learning risc-v risc-v-assembly verification verilog

Last synced: 20 Feb 2026

https://github.com/Jaysmito101/TerraForge3D

Cross Platform Professional Procedural Terrain Generation & Texturing Tool

3d cpp game-development gamedev gpu-acceleration hacktoberfest imgui nodeeditor open-source opengl opensource precedural-textures procedural-generation terrain-generation

Last synced: 01 Apr 2025

https://github.com/dgasmith/opt_einsum

⚡️Optimizing einsum functions in NumPy, TensorFlow, Dask, and more with contraction order optimization.

contraction einsum gpu-acceleration performance python tensor tensor-contraction

Last synced: 13 May 2025

https://github.com/eszdman/PhotonCamera

Android camera app that uses enhanced image processing

android android-camera camera2-api computational-photography computer-vision gpu-acceleration image-processing photography

Last synced: 27 Mar 2025

https://github.com/Liu-xiandong/How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.

elementwise gpu-acceleration high-performance-computing hpc reduce sgemm sgemv

Last synced: 14 May 2025

https://github.com/nvidia-merlin/merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

deep-learning end-to-end gpu-acceleration machine-learning recommendation-system recommender-system

Last synced: 13 Apr 2025

https://github.com/NVIDIA-Merlin/Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

deep-learning end-to-end gpu-acceleration machine-learning recommendation-system recommender-system

Last synced: 30 Jul 2025

https://github.com/iot-salzburg/gpu-jupyter

GPU-Jupyter: Your GPU-accelerated JupyterLab with a rich data science toolstack, TensorFlow and PyTorch for your reproducible deep learning experiments.

docker environment gpu-acceleration gpu-computing jupyter jupyter-server jupyterlab pytorch reproducible-research tensorflow

Last synced: 15 May 2025

https://github.com/ttddee/Cascade

Node-based image editor with GPU-acceleration.

gpu-acceleration image-editor node-based vulkan

Last synced: 01 Apr 2025

https://github.com/limbo018/DREAMPlace

Deep learning toolkit-enabled VLSI placement

deep-learning gpu-acceleration pytorch vlsi vlsi-physical-design vlsi-placement

Last synced: 08 May 2025

https://github.com/sergio0694/neuralnetwork.net

A TensorFlow-inspired neural network library built from scratch in C# 7.3 for .NET Standard 2.0, with GPU support through cuDNN

ai backpropagation-algorithm classification-algorithims cnn convolutional-neural-networks csharp cuda gpu-acceleration gradient-descent machine-learning net-framework netstandard neural-network supervised-learning visual-studio

Last synced: 16 May 2025

https://github.com/Sergio0694/NeuralNetwork.NET

A TensorFlow-inspired neural network library built from scratch in C# 7.3 for .NET Standard 2.0, with GPU support through cuDNN

ai backpropagation-algorithm classification-algorithims cnn convolutional-neural-networks csharp cuda gpu-acceleration gradient-descent machine-learning net-framework netstandard neural-network supervised-learning visual-studio

Last synced: 02 Apr 2025

https://github.com/philferriere/dlwin

GPU-accelerated Deep Learning on Windows 10 native

cntk cudnn deep-learning gpu-acceleration gpu-mode keras tensorflow theano

Last synced: 05 Apr 2025

https://github.com/DavidDiazGuerra/gpuRIR

Python library for Room Impulse Response (RIR) simulation with GPU acceleration

acoustics gpu-acceleration image-source-model python-library rir room-impulse-responses

Last synced: 01 Apr 2025

https://github.com/megviirobot/megba

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

bundleadjustment cuda distributed gpu-acceleration graph-optimization high-performance

Last synced: 24 Jun 2025

https://github.com/MegviiRobot/MegBA

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

bundleadjustment cuda distributed gpu-acceleration graph-optimization high-performance

Last synced: 07 May 2025

https://github.com/projectphysx/opencl-wrapper

OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents many people from getting started. I created this lightweight OpenCL-Wrapper to greatly simplify OpenCL software development with C++ while keeping functionality and performance.

gpgpu gpgpu-computing gpu gpu-acceleration gpu-computing gpu-programming opencl vector-processor vectorization

Last synced: 16 May 2025

https://github.com/uncomplicate/bayadera

High-performance Bayesian Data Analysis on the GPU in Clojure

bayesian bayesian-data-analysis bayesian-inference clojure clojure-library cuda gpu gpu-acceleration gpu-computing high-performance-computing machine-learning markov-chain-monte-carlo mcmc opencl statistics

Last synced: 09 Apr 2025

https://github.com/datacanvasio/hypergbm

A full pipeline AutoML tool for tabular data

adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost

Last synced: 15 May 2025

https://github.com/jatinkrmalik/vocalinux

Free, open-source, 100% offline voice dictation for Linux. Speak and type anywhere via whisper.cpp, Whisper & VOSK engines, GPU-accelerated, works on X11 + Wayland!

accessibility dictation gpu-acceleration linux offline-first privacy-first python speech-recognition speech-to-text voice voice-typing vosk wayland whisper whisper-cpp

Last synced: 06 Jun 2026

https://github.com/DataCanvasIO/HyperGBM

A full pipeline AutoML tool for tabular data

adversarial-validation automl catboost dask dask-distributed datacleaning distributed-training ensemble-learning fullpipeline gbm gpu-acceleration lightgbm preprocessing pseudo-labeling rapidsai semi-supervised-learning sklearn tabular-data xgboost

Last synced: 09 May 2025

https://github.com/andrewmilson/ministark

🏃‍♂️💨 GPU accelerated STARK prover built on @arkworks-rs

apple-silicon arkworks arkworks-rs crypto cryptography fft finite-fields gpu gpu-acceleration gpu-computing gpu-programming m1 metal optimization polynomials rust starks virtual-machine zero-knowledge zkstarks

Last synced: 12 Dec 2025

https://github.com/favreau/Sol-R

Open-Source CUDA/OpenCL Speed Of Light Ray-tracer

3d 3d-graphics-engine cuda gpgpu gpu-acceleration gpu-computing graphics-engine interactive opencl path-tracing pathtracing ray-tracing raytracer raytracing raytracing-engine realtime-rendering rendering science virtual-reality vr

Last synced: 30 Apr 2025

https://github.com/quiver-team/torch-quiver

PyTorch Library for Low-Latency, High-Throughput Graph Learning on GPUs.

distributed-computing geometric-deep-learning gpu-acceleration graph-learning graph-neural-networks pytorch

Last synced: 04 Apr 2025

https://github.com/nvidia/cuopt-resources

A collection of NVIDIA cuOpt samples and other resources

cvrp cvrptw gpu gpu-acceleration gpu-optimization intralogistics last-mile-delivery logistics nvidia-gpu operations-research optimization optimization-algorithms optimization-tools pickup-and-delivery route-optimization traveling-salesman-problem tsp-solver vehicle-routing-problem vrp vrp-solver

Last synced: 04 Apr 2025

https://github.com/baggepinnen/montecarlomeasurements.jl

Propagation of distributions by Monte-Carlo sampling: Real number types with uncertainty represented by samples.

error-analysis error-propagation gpu-acceleration gpu-computing monte-carlo monte-carlo-sampling monte-carlo-simulation numeric-types particle-filter physical-quantities probability-distributions robust-optimization uncertainties uncertainty-propagation uncertainty-sampling

Last synced: 25 Jan 2026

https://github.com/baggepinnen/MonteCarloMeasurements.jl

Propagation of distributions by Monte-Carlo sampling: Real number types with uncertainty represented by samples.

error-analysis error-propagation gpu-acceleration gpu-computing monte-carlo monte-carlo-sampling monte-carlo-simulation numeric-types particle-filter physical-quantities probability-distributions robust-optimization uncertainties uncertainty-propagation uncertainty-sampling

Last synced: 27 Mar 2025

https://github.com/marian-nmt/marian-dev

Fast Neural Machine Translation in C++ - development repository

cpp11 cuda fast gpu-acceleration neural-machine-translation

Last synced: 15 May 2025

https://github.com/AdrianAntico/RemixAutoML

R package for automation of machine learning, forecasting, model evaluation, and model interpretation

automated-machine-learning automl catboost classification gpu-acceleration h2o lightgbm multiclass-classification panel-data r regression supervised-learning timeseries unsupervised-learning xgboost

Last synced: 14 Mar 2025

https://github.com/AdrianAntico/AutoQuant

R package for automation of machine learning, forecasting, model evaluation, and model interpretation

automated-machine-learning automl catboost classification gpu-acceleration h2o lightgbm multiclass-classification panel-data r regression supervised-learning timeseries unsupervised-learning xgboost

Last synced: 26 Apr 2025

https://github.com/clesperanto/pyclesperanto_prototype

GPU-accelerated bio-image analysis focusing on 3D+t microscopy image data

bioimage-analysis gpu-acceleration microscopy

Last synced: 21 Oct 2025

https://github.com/denosaurs/netsaur

Powerful Machine Learning library with GPU, CPU and WASM backends

ai artificial-intelligence deep-learning deep-neural-networks deno edge-computing gpu-acceleration gpu-computing hacktoberfest machine-learning ml neural-network rust safetensors serverless typescript wasm webassembly webgpu

Last synced: 04 Apr 2025

https://github.com/audiokit/waveform

GPU accelerated waveform view

audio audio-visualizer gpu-acceleration metal

Last synced: 17 Jun 2025

https://github.com/bh107/bohrium

Automatic parallelization of Python/NumPy, C, and C++ codes on Linux and MacOSX

cuda gpu gpu-acceleration multi-core numpy opencl parallel-computing

Last synced: 21 Oct 2025

https://github.com/clEsperanto/pyclesperanto_prototype

GPU-accelerated bio-image analysis focusing on 3D+t microscopy image data

bioimage-analysis gpu-acceleration microscopy

Last synced: 21 Mar 2025

https://github.com/BasBuller/PySNN

Efficient Spiking Neural Network framework, built on top of PyTorch for GPU acceleration

deep-learning dynamic gpu-acceleration gpu-computing machine-learning neural-networks python3 pytorch spiking-neural-networks stdp

Last synced: 07 May 2025

https://github.com/AudioKit/Waveform

GPU accelerated waveform view

audio audio-visualizer gpu-acceleration metal

Last synced: 16 Jul 2025

https://github.com/emi-group/evomo

EvoMO is a GPU-accelerated library for evolutionary multiobjective optimization (EMO)

evolutionary-algorithms evolutionary-computation gpu-acceleration gpu-computing multiobjective-optimization pytorch

Last synced: 26 Jan 2026

https://github.com/ROCm/Tensile

Stretching GPU performance for GEMMs and tensor contractions.

amd assembly auto-tuning blas dnn gemm gpu gpu-acceleration gpu-computing hip machine-learning matrix-multiplication neural-networks opencl python radeon tensor-contraction tensors

Last synced: 23 Jul 2025

https://github.com/rocm/tensile

Stretching GPU performance for GEMMs and tensor contractions.

amd assembly auto-tuning blas dnn gemm gpu gpu-acceleration gpu-computing hip machine-learning matrix-multiplication neural-networks opencl python radeon tensor-contraction tensors

Last synced: 12 Jun 2026

https://github.com/juliahealth/komamri.jl

Koma is a Pulseq-compatible framework to efficiently simulate Magnetic Resonance Imaging (MRI) acquisitions. The main focus of this package is to simulate general scenarios that could arise in pulse sequence development.

cardiac diffusion diffusion-mri gpu-acceleration mri simulation

Last synced: 29 Apr 2026

https://github.com/ucl-bug/jwave

A JAX-based research framework for differentiable and parallelizable acoustic simulations, on CPU, GPUs and TPUs

acoustics differentiable-simulations gpu gpu-acceleration jax kwave physics-informed-neural-networks scientific-machine-learning simulation tpu-acceleration ultrasound wave-equation

Last synced: 04 Apr 2026

https://github.com/uncomplicate/clojurecuda

Clojure library for CUDA development

clojure clojure-library cuda cuda-development gpu-acceleration gpu-computing high-performance java

Last synced: 16 May 2025

https://github.com/peculiarventures/gammacv

GammaCV is a WebGL accelerated Computer Vision library for browser

computer-vision feature-extraction gpu gpu-acceleration image-analysis image-processing machine-learning machine-vision object-detection opencv webgl

Last synced: 08 Apr 2025

https://github.com/ertis-research/kafka-ml

Kafka-ML: connecting the data stream with ML/AI frameworks (now TensorFlow and PyTorch!)

data-stream deep-learning docker gpu-acceleration iot kafka keras keras-tensorflow kubernetes machine-learning pytorch tensorflow

Last synced: 28 Feb 2026

https://github.com/yzhao062/pytod

TOD: GPU-accelerated Outlier Detection via Tensor Operations

anomaly-detection gpu-acceleration gpu-systems machine-learning outlier-detection unsupervised-learning

Last synced: 23 Oct 2025

https://github.com/eth-cscs/cosma

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

communication-optimal cuda gpu-acceleration linear-algebra matmul matrix-multiplication mpi pdgemm rocm scalapack

Last synced: 02 Mar 2026

https://github.com/merzlab/QUICK

QUICK: A GPU-enabled ab initio quantum chemistry software package

chemistry computational-chemistry cuda density-functional-theory electronic-structure-calculations gpu gpu-acceleration hartree-fock parallel-computing quantum-chemistry

Last synced: 09 Jul 2025

https://github.com/nexusgpu/tensor-fusion

Tensor Fusion is a state-of-the-art GPU virtualization and pooling solution designed to optimize GPU cluster utilization to its fullest potential.

ai amd-gpu autoscaling dynamic-resource-allocation gpu gpu-acceleration gpu-pooling gpu-scheduling gpu-usage gpu-virtualization inference karpenter kubernetes llm-serving nvidia pytorch rcuda remote-gpu vgpu

Last synced: 24 May 2026

https://github.com/ysh329/opencl-101

Learn OpenCL step by step.

gpu-acceleration gpu-programming guides opencl scratch tutorial-code tutorials

Last synced: 13 Apr 2025

https://github.com/matteospanio/torchfx

A GPU accelerated and torch based audio DSP library

audio dsp filters gpu gpu-acceleration torch

Last synced: 26 May 2026

https://github.com/mightycow/Sluggish

Toy CPU and GPU implementations of the Slug rendering algorithm

font gpu-acceleration rendering-algorithm slug

Last synced: 16 Oct 2025

https://github.com/arceryz/raylib-gpu-particles

Raylib 100% GPU particles example in 3D. Uses compute shaders and is fully documented. Millions of particles at 60 fps on a laptop.

c compute-shader example glsl gpu gpu-acceleration gui lorenz-attractor raygui raylib raylib-examples tutorial

Last synced: 11 Apr 2025

https://github.com/tianzerl/pyanime4k

An easy way to use Anime4K in Python

anime anime4k anime4kcpp gpu-acceleration machine-learning python python3 upscale upscaling video-processing

Last synced: 12 Apr 2025

https://github.com/IntelPython/dpnp

Data Parallel Extension for NumPy

dpcpp gpu gpu-acceleration intel mkl numpy oneapi pstl python3 sycl

Last synced: 01 May 2025

https://github.com/mitulgarg/env-doctor

Debug your GPU, CUDA, and AI stacks across local, Docker, and CI/CD (CLI and MCP server)

compatibility-tool cuda cuda-library cuda-support cuda-toolkit cudnn gpu-acceleration mcp-server nvidia-driver nvidia-gpu nvidia-smi pytorch wsl2

Last synced: 02 Apr 2026

https://github.com/icl-utk-edu/slate

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP).

gpu-acceleration hpc linear-algebra

Last synced: 29 Dec 2025

https://github.com/mitmath/juliacomputation

Repository for Common Ground C25

automatic-differentiation climate-science gpu-acceleration image-processing julia julia-language machine-learning optimization pluto-notebooks

Last synced: 30 Oct 2025

https://github.com/Heteroflow/Heteroflow

Concurrent CPU-GPU Programming using Task Models

cpu-gpu-scheduling cuda gpu gpu-acceleration gpu-computing gpu-programming heterogeneous-computing heterogeneous-parallel-programming heterogeneous-systems multithreaded multithreading task-parallelism

Last synced: 01 Apr 2025

https://github.com/intelpython/dpnp

Data Parallel Extension for NumPy

dpcpp gpu gpu-acceleration intel mkl numpy oneapi pstl python3 sycl

Last synced: 16 May 2025

https://github.com/kklmn/xrt

Package xrt (XRayTracer) is a Python software library for ray tracing and wave propagation in the X-ray regime. It is primarily meant for modeling synchrotron sources, beamlines and beamline elements.

beamline crystal-optics gpu-acceleration optics partial-coherence propagation pyopencl python qt-gui ray-tracing synchrotron undulator visualization wave wiggler x-ray

Last synced: 06 Feb 2026

https://github.com/ucbrise/piranha

Piranha: A GPU Platform for Secure Computation

gpu-acceleration multi-party-computation privacy-preserving-machine-learning

Last synced: 22 Jun 2025

https://github.com/ashvardanian/parallelreductionsbenchmark

Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast!

apple avx512 cuda glsl gpgpu gpu gpu-acceleration gpu-computing hpc intel metal nvidia opencl openmp parallel simd stl tbb thrust

Last synced: 06 Apr 2025

https://github.com/guillaume-chevalier/glove-as-a-tensorflow-embedding-layer

Taking a pretrained GloVe model, and using it as a TensorFlow embedding weight layer **inside the GPU**. Therefore, you only need to send the index of the words through the GPU data transfer bus, reducing data transfer overhead.