Projects in Awesome Lists tagged with rocm
A curated list of projects in awesome lists tagged with rocm.
https://github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
amd cuda deepseek gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch qwen rocm tpu trainium transformer xpu
Last synced: 29 Jan 2026
https://github.com/apache/tvm
Open Machine Learning Compiler Framework
compiler deep-learning gpu javascript machine-learning metal opencl performance rocm spirv tensor tvm vulkan
Last synced: 17 Mar 2026
https://github.com/tracel-ai/burn
Burn is a next-generation deep learning framework that doesn't compromise on flexibility, efficiency, or portability.
autodiff cross-platform cuda deep-learning kernel-fusion machine-learning metal ndarray neural-network onnx pytorch rocm rust scientific-computing tensor vulkan wasm webgpu
Last synced: 07 May 2026
https://github.com/gpustack/gpustack
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
ascend cuda deepseek distributed-inference genai high-performance-inference inference llama llm llm-inference llm-serving maas mindie openai qwen rocm sglang vllm
Last synced: 20 Apr 2026
https://github.com/lemonade-sdk/lemonade
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk
ai amd genai gpu llama llm llm-inference local-server mcp mcp-server mistral npu onnxruntime openai-api qwen radeon rocm ryzen vulkan
Last synced: 02 Apr 2026
https://github.com/dmlc/nnvm
computation-graph cuda deep-learning deployment metal nnvm opencl optimization rocm tvm
Last synced: 04 May 2025
https://github.com/deepmodeling/deepmd-kit
A deep learning package for many-body potential energy representation and molecular dynamics
ase c computational-chemistry cpp cuda deep-learning deepmd ipi jax lammps materials-science molecular-dynamics nodejs paddle potential-energy python pytorch rocm tensorflow
Last synced: 13 May 2025
https://github.com/aphrodite-engine/aphrodite-engine
Large-scale LLM inference engine
api-rest cuda inference-engine inferentia intel lora machine-learning rocm speculative-decoding tpu
Last synced: 14 May 2025
https://github.com/stotko/stdgpu
stdgpu: Efficient STL-like Data Structures on the GPU
cpp cpp17 cpp20 cuda data-structures gpgpu gpu gpu-acceleration gpu-computing hip modern-cpp openmp rocm stl stl-containers stl-like
Last synced: 14 May 2025
https://github.com/ROCm/ROCm-docker
Dockerfiles for the various software layers defined in the ROCm software platform
Last synced: 03 Apr 2025
https://github.com/rocm/rocm-docker
Dockerfiles for the various software layers defined in the ROCm software platform
Last synced: 04 Apr 2025
https://github.com/rocm/rocblas
[DEPRECATED] Moved to ROCm/rocm-libraries repo
Last synced: 02 Apr 2026
https://github.com/alpaka-group/alpaka
Abstraction Library for Parallel Kernel Acceleration 🦙
cpp cpp17 cuda gpu header-only heterogeneous-parallel-programming hip hpc openacc openmp rocm tbb
Last synced: 15 May 2025
https://github.com/agenium-scale/nsimd
Agenium Scale vectorization library for CPUs and GPUs
aarch64 avx avx2 avx512 cpp20 cpp20-library cuda hpc neon neon128 rocm simd simd-instructions simd-library simd-programming sse2 sse42 sve vectorization-library
Last synced: 09 Apr 2025
https://github.com/MFlowCode/MFC
Exascale multiphase flow solver — 2025 Gordon Bell Prize Finalist | 200T grid points on 43K+ GPUs
amd-gpu cfd computational-fluid-dynamics cuda exascale fluid-dynamics fortran gpu gpu-computing hpc mpi multiphase nvidia-gpu openacc openmp parallel-computing physics-simulation rocm scientific-computing simulation
Last synced: 01 Mar 2026
https://github.com/rocm/k8s-device-plugin
Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster
k8s kubernetes kubernetes-device-plugins rocm
Last synced: 18 May 2026
https://github.com/QMCPACK/qmcpack
Main repository for QMCPACK, an open-source, production-level, many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids, with full performance-portable GPU support
c-plus-plus cuda electronic-structure gpu high-performance-computing hpc mpi oneapi quantum-chemistry quantum-monte-carlo rocm
Last synced: 26 Mar 2025
https://github.com/juliagpu/amdgpu.jl
AMD GPU (ROCm) programming in Julia
amdgpu gpu gpu-programming julia rocm
Last synced: 12 Jan 2026
https://github.com/rocm/aomp
AOMP is an open source Clang/LLVM based compiler with added support for the OpenMP® API on Radeon™ GPUs. Use this repository for releases, issues, documentation, packaging, and examples.
amd clang fortran-compiler llvm openmp rocm
Last synced: 16 May 2025
https://github.com/llnl/hiop
HPC solver for nonlinear optimization problems
acopf bfgs constrained-optimization cuda gpu-support hpc interior-point-method interior-point-optimizer math-physics mpi nonlinear-optimization nonlinear-programming nonlinear-programming-algorithms nonsmooth-optimization optimization parallel-programming quasi-newton radiuss rocm solver
Last synced: 16 May 2025
https://github.com/supranational/sppark
Zero-knowledge template library
bls12-377 bls12-381 cuda ntt pasta-curves rocm zero-knowledge zero-knowledge-proofs zk-snarks zk-starks
Last synced: 12 Apr 2025
https://github.com/rocm/mivisionx
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
amd-opencl amd-opencv amd-openvx computer-vision inference inference-engine khronos-openvx machine-learning neural-network nnef onnx opencl openvx openvx-extensions openvx-neural-network rocm ryzen virtual-reality windows-machine-learning winml
Last synced: 16 May 2025
https://github.com/ROCm/MIVisionX
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
amd-opencl amd-opencv amd-openvx computer-vision inference inference-engine khronos-openvx machine-learning neural-network nnef onnx opencl openvx openvx-extensions openvx-neural-network rocm ryzen virtual-reality windows-machine-learning winml
Last synced: 18 Jul 2025
https://github.com/eth-cscs/cosma
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
communication-optimal cuda gpu-acceleration linear-algebra matmul matrix-multiplication mpi pdgemm rocm scalapack
Last synced: 02 Mar 2026
https://github.com/rocm/gpufort
GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm
Last synced: 21 Jun 2025
https://github.com/ROCm/gpufort
GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm
Last synced: 11 Mar 2025
https://github.com/rocm/hipblas
[DEPRECATED] Moved to ROCm/rocm-libraries repo
Last synced: 02 Apr 2026
https://github.com/electronic-structure/SIRIUS
Domain specific library for electronic structure calculations
cuda density-functional-theory electronic-structure-calculations full-potential gpu lapw mpi planewave pseudopotential rocm
Last synced: 09 Jul 2025
https://github.com/rocm/rocsolver
[DEPRECATED] Moved to ROCm/rocm-libraries repo
Last synced: 02 Apr 2026
https://github.com/rocm/hipblaslt
[DEPRECATED] Moved to ROCm/rocm-libraries repo
amd assembly blas gemm gpu-computing hip machine-learning matrix-multiplication rocm
Last synced: 05 May 2026
https://github.com/juliagpu/acceleratedkernels.jl
Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.
amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library
Last synced: 04 Apr 2025
https://github.com/pennylaneai/pennylane-lightning
The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.
cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm
Last synced: 15 May 2025
https://github.com/PennyLaneAI/pennylane-lightning
The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.
cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm
Last synced: 11 May 2025
https://github.com/JuliaGPU/AcceleratedKernels.jl
Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.
amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library
Last synced: 17 Mar 2025
https://github.com/Grench6/RX580-rocM-tensorflow-ubuntu20.4-guide
Installation guide for ROCm and TensorFlow on Ubuntu for the RX 580
Last synced: 19 Jul 2025
https://github.com/sukhmeetbawa/opencl-amd-fedora
AMD OpenCL userspace drivers for Fedora. Currently not working on Fedora 37.
amd fedora-workstation linux opencl rocm
Last synced: 04 Oct 2025
https://github.com/gpuopen-tools/radeon_compute_profiler
The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.
Last synced: 30 Apr 2025
https://github.com/GPUOpen-Tools/radeon_compute_profiler
The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.
Last synced: 08 May 2025
https://github.com/pika-org/pika
pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support.
concurrency cplusplus cpp cuda gpu hip mpi p2300 parallelism rocm stdexec
Last synced: 30 Jan 2026
https://github.com/srohit0/trafficvision
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit.
amd-gpu amd-modules amd-opencl amd-openvx artificial-intelligence artificial-neural-networks convolutional-neural-networks machine-intelligence machine-learning mivision mivisionx object-detection opencl openvx openvx-nn-extension rocm tiny-yolo tiny-yolo-network yolov2
Last synced: 10 Mar 2026
https://github.com/rocm/rpp
AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends.
agumentation amd bitwise channel-extract computer-vision contrast cpu gpu hip histogram hpc mivisionx opencl openvx radeon-performance-primitives rocm rpp warp-affine
Last synced: 11 Apr 2025
https://github.com/ROCm/rpp
AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends.
agumentation amd bitwise channel-extract computer-vision contrast cpu gpu hip histogram hpc mivisionx opencl openvx radeon-performance-primitives rocm rpp warp-affine
Last synced: 14 Mar 2025
https://github.com/eth-cscs/spfft
Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support
cuda fft fft-library gpu-acceleration hpc mpi rocm
Last synced: 17 Jun 2025
https://quokka-astro.github.io/quokka/
Two-moment AMR radiation hydrodynamics (with self-gravity, particles, and chemistry) on CPUs/GPUs for astrophysics
adaptive-mesh-refinement astrochemistry astrophysics cuda gpu hip hydrodynamics particles rocm self-gravity
Last synced: 09 Mar 2025
https://github.com/evshiron/rocm_lab
gfx1100 rocm tensorflow torch torchaudio torchvision
Last synced: 17 Feb 2026
https://github.com/beinsezii/comfyui-amd-go-fast
Simple monkeypatch to boost AMD Navi 3 GPUs
Last synced: 20 Jun 2025
https://github.com/stampby/halo-ai-core
Bare-metal AI platform for AMD Strix Halo. One script. Everything works. Lego blocks — snap in what you need.
agent-framework ai amd arch-linux bare-metal caddy gaia gpu inference lemonade llama-cpp local-ai privacy rocm ryzen-ai self-hosted strix-halo systemd
Last synced: 18 Apr 2026
https://github.com/wdmapp/gtensor
GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.
cpp cpp14 cuda gpu hacktoberfest rocm sycl
Last synced: 04 Apr 2025
https://github.com/geramy/odinlink-five
A high-performance RCCL / NCCL (ROCm Communication Collectives Library) plugin for Thunderbolt 5 that enables GPU-to-GPU communication across Thunderbolt connections with RDMA support.
ai amd apple driver high-performance-computing hip linux-kernel nvidia rocm thunderbolt
Last synced: 23 May 2026
https://github.com/okuvshynov/cubestat
Horizon chart for CPU/GPU/Neural Engine utilization monitoring. Supports Apple M1-M4, Nvidia GPUs, AMD GPUs
apple-silicon command-line-tool gpu horizon monitoring neural-engine nvidia-gpu rocm
Last synced: 11 Mar 2026
https://github.com/eth-cscs/spla
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.
cuda gemm linear-algebra mpi rocm
Last synced: 14 Apr 2025
https://github.com/bluescarni/rakau
C++17 N-body Barnes-Hut on heterogeneous hardware architectures
astronomy astrophyics astrophysical-simulation avx avx2 avx512 cpp17 cuda n-body n-body-simulator nbody nbody-gravity-simulation nbody-problem nbody-sim nbody-simulation rocm simd vectorization
Last synced: 04 Jul 2025
https://github.com/ptsolvers/chmy.jl
Finite differences and staggered grids on CPUs and GPUs
cuda gpu julialang metal mpi parallel rocm staggeredgrid stencil
Last synced: 23 Apr 2025
https://github.com/vokegpu/bicudo
Separating Axis Theorem (SAT) physics engine library accelerated via a GPGPU API (ROCm/OpenCL/CUDA) or run on the CPU
opengl opengl4 physics physics-2d physics-simulation rocm rocm-kernel sat sdl separation-axis-theorem
Last synced: 10 Apr 2025
https://github.com/shivaraj-bh/ollama-flake
Run Ollama natively, powered by Nix
cuda flakes nix ollama open-webui rocm services
Last synced: 01 May 2025
https://github.com/pccr10001/comfyui-gfx1151-fa
ComfyUI with Flash Attention for the Ryzen AI Max+ 395 (gfx1151)
amd comfyui flash-attention gfx1151 pytorch rocm
Last synced: 11 Mar 2026
https://github.com/yalue/cudabrot
A CUDA renderer for the Buddhabrot fractal
amd buddhabrot buddhabrot-fractal cuda gpu hip mandelbrot mandelbrot-fractal rocm
Last synced: 07 May 2025
https://github.com/landslidesim/materialpointsolver.jl
🧮 High-performance Material Point Method (MPM) Solver in Julia.
backend-agnostic cluster cuda hpc material-point-method metal mpm oneapi parallel-computing rocm
Last synced: 12 Apr 2025
https://github.com/ai-dock/python
Python docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.
ai cuda docker machine-learning python rocm runpod vast
Last synced: 28 Aug 2025
https://github.com/ulyssesrr/docker-rocm-xtra
ROCm docker images with fixes/support for extra architectures, such as gfx803/gfx1010.
docker gfx1010 gfx803 pytorch rocm stable-diffusion-webui
Last synced: 26 Feb 2025
https://github.com/pennylaneai/lightning-on-hpc
"Hybrid quantum programming with PennyLane Lightning on HPC platforms" accompanying data and workloads
cpp20 cuda gpu hpc mpi openmp python quantum quantum-computing rocm supercomputing
Last synced: 10 Jun 2025
https://github.com/hec-ovi/vllm-qwen
vLLM + Qwen3.6-27B (BF16) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). Vision input, 256K context, /v1/responses with separated reasoning, via TheRock ROCm.
amd docker gfx1151 inference-server llm-serving local-llm multimodal-llm openai-compatible qwen qwen3 rocm ryzen-ai self-hosted strix-halo vllm
Last synced: 01 May 2026
https://github.com/neoblizz/hip_template
🖤 Template for starting a HIP/C++ project using CMake, with GitHub Actions for CI.
cpp cuda cuda-programming gpgpu gpu hip rocm template-project template-repository
Last synced: 26 Mar 2025
https://github.com/alpaka-group/bactria
Broadly Applicable C++ Tracing and Instrumentation API 🐫
cuda hardware-counters instrumentation-api metrics rocm tracing-events
Last synced: 21 Apr 2025
https://github.com/mrowan137/stable-diffusion-v1-5-radeon-pro-vii
Notes for Stable Diffusion v1.5 setup on a Radeon Pro VII (AMD GPU).
amdgpu pytorch-rocm radeon-pro-vii rocm stable-diffusion ubuntu
Last synced: 17 Feb 2026
https://github.com/microsoft/hat
TOML-annotated C header file format for packaging binary files, from Microsoft Research
benchmarking cpp cprogramming cuda metadata platform-independent python-library rocm toml
Last synced: 10 Apr 2025
https://github.com/nikelborm/amd-amdgpu-rocm-ollama-gfx90c-ati-radeon-vega-ryzen7-5800h-arch-linux
Run Ollama on an AMD Ryzen 7 5800H CPU with the integrated AMD ATI Radeon Vega GPU (gfx90c), with optimizations
amd amd-gpu amdgpu archlinux avx2 bash bash-scripting cuda linux llama llama3 llm ollama oneapi radeon rocm ssse3 vega
Last synced: 30 Apr 2025
https://github.com/ROCm/hipMM
HIP Memory Manager (ROCm-DS)
amd cuda gpu hip memory-management radeon-instinct-mi-series rocm
Last synced: 12 Apr 2025
https://github.com/amd-agi/gpt-fast
GPT-Fast for multimodal models on AMD GPUs
amd gptfast inference llama llava multimodal multimodal-large-language-models qwen rocm
Last synced: 10 Sep 2025
https://github.com/eliranwong/multiamdgpu_aidev_ubuntu
Multi-AMD-GPU setup for AI development on Ubuntu with ROCm
ai amd amd-gpu amdgpu freegenius gpu rocm ubuntu
Last synced: 08 Apr 2025
https://github.com/arminms/p2rng
A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI
cpp cuda cxx gpu header-only library linux macos multiplatorm oneapi openmp parallel pcg-random prng pseudorandom-number-generator random-number-distributions random-number-generation rocm stl-algorithms windows
Last synced: 04 Apr 2025
https://github.com/dereklstinson/hip
Go bindings for HIP
amdgpu bindings go gpu-acceleration gpu-computing gpu-programming hip rocm
Last synced: 18 Mar 2025
https://github.com/guilt/rocm-programming-masterclass
Udemy's CUDA Programming Masterclass, with examples in ROCm/HIP.
cuda easy hip learning-by-doing masterclass rocm
Last synced: 04 Aug 2025
https://github.com/kiritigowda/mivisionx-setup
This project has scripts to set up, build and test installation of AMD ROCm MIVisionX
amd amd-gpu amdovx-modules gpu loom machine-learning mivision-setup mivisionx mivisionx-profile mivisionx-setup nnef opencl opencv openvx openvx-neural-net openvx-nn-extension radeon radeon-mivisionx rocm
Last synced: 11 Apr 2025
https://github.com/rocm/rocmds-cmake
This is a collection of CMake modules that are useful for all ROCm-DS projects. Sharing the code in a single place makes rolling out CMake fixes easier.
amd cmake cuda hip radeon-instinct-mi-series rocm
Last synced: 10 Apr 2025
https://github.com/stargate01/aidungeon2-docker-rocm
Runs an AIDungeon2 fork in Docker on AMD ROCm hardware.
aidungeon2 amd docker gpt-2 pytorch rocm
Last synced: 09 May 2026
https://github.com/benjaminhottell/nix-rocm-pytorch
Compiles PyTorch with ROCm on NixOS
Last synced: 08 Sep 2025
https://github.com/video-analysis-opensource/docker_hub_sync
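Syncs Docker images commonly used in AI development to the Alibaba Cloud image registry so they can be pulled quickly from within China. Example: pytorch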
centos docker-image openjdk pytorch rocm tensorflow torch ubuntu
Last synced: 05 Jan 2026
https://github.com/kiritigowda/mivisionx-inference-analyzer
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
amd amdgpu caffe docker-images inceptionv4 inference inference-engine inference-optimization mivisionx mivisionx-inference-analyzer nnef nnir onnx opencl openvx resnet resnet-50 rocm squeezenet vgg
Last synced: 11 Apr 2025