An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with rocm

A curated list of projects in awesome lists tagged with rocm.

https://github.com/vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda deepseek gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch qwen rocm tpu trainium transformer xpu

Last synced: 29 Jan 2026

https://github.com/tracel-ai/burn

Burn is a next generation Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.

autodiff cross-platform cuda deep-learning kernel-fusion machine-learning metal ndarray neural-network onnx pytorch rocm rust scientific-computing tensor vulkan wasm webgpu

Last synced: 07 May 2026

https://github.com/lmcache/lmcache

Supercharge Your LLM with the Fastest KV Cache Layer

amd cuda fast inference kv-cache llm pytorch rocm speed vllm

Last synced: 30 May 2026

https://github.com/gpustack/gpustack

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

ascend cuda deepseek distributed-inference genai high-performance-inference inference llama llm llm-inference llm-serving maas mindie openai qwen rocm sglang vllm

Last synced: 20 Apr 2026

https://github.com/lemonade-sdk/lemonade

Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk

ai amd genai gpu llama llm llm-inference local-server mcp mcp-server mistral npu onnxruntime openai-api qwen radeon rocm ryzen vulkan

Last synced: 02 Apr 2026

https://github.com/deepmodeling/deepmd-kit

A deep learning package for many-body potential energy representation and molecular dynamics

ase c computational-chemistry cpp cuda deep-learning deepmd ipi jax lammps materials-science molecular-dynamics nodejs paddle potential-energy python pytorch rocm tensorflow

Last synced: 13 May 2025

https://github.com/ROCm/ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform

docker rocm

Last synced: 03 Apr 2025

https://github.com/rocm/rocm-docker

Dockerfiles for the various software layers defined in the ROCm software platform

docker rocm

Last synced: 04 Apr 2025

https://github.com/rocm/rocblas

[DEPRECATED] Moved to ROCm/rocm-libraries repo

blas hip rocm

Last synced: 02 Apr 2026

https://github.com/alpaka-group/alpaka

Abstraction Library for Parallel Kernel Acceleration 🦙

cpp cpp17 cuda gpu header-only heterogeneous-parallel-programming hip hpc openacc openmp rocm tbb

Last synced: 15 May 2025

https://github.com/rocm/k8s-device-plugin

Kubernetes (k8s) device plugin that enables registration of AMD GPUs with a container cluster

k8s kubernetes kubernetes-device-plugins rocm

Last synced: 18 May 2026

https://github.com/QMCPACK/qmcpack

Main repository for QMCPACK, an open-source, production-level, many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids, with full performance-portable GPU support

c-plus-plus cuda electronic-structure gpu high-performance-computing hpc mpi oneapi quantum-chemistry quantum-monte-carlo rocm

Last synced: 26 Mar 2025

https://github.com/juliagpu/amdgpu.jl

AMD GPU (ROCm) programming in Julia

amdgpu gpu gpu-programming julia rocm

Last synced: 12 Jan 2026

https://github.com/JuliaGPU/AMDGPU.jl

AMD GPU (ROCm) programming in Julia

amdgpu julia rocm

Last synced: 14 May 2025

https://github.com/rocm/aomp

AOMP is an open source Clang/LLVM based compiler with added support for the OpenMP® API on Radeon™ GPUs. Use this repository for releases, issues, documentation, packaging, and examples.

amd clang fortran-compiler llvm openmp rocm

Last synced: 16 May 2025

https://github.com/rocm/rocfft

[DEPRECATED] Moved to ROCm/rocm-libraries repo

amd fast fft fourier gpu hip rocm transform

Last synced: 02 Apr 2026

https://github.com/rocm/mivisionx

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

amd-opencl amd-opencv amd-openvx computer-vision inference inference-engine khronos-openvx machine-learning neural-network nnef onnx opencl openvx openvx-extensions openvx-neural-network rocm ryzen virtual-reality windows-machine-learning winml

Last synced: 16 May 2025

https://github.com/ROCm/MIVisionX

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

amd-opencl amd-opencv amd-openvx computer-vision inference inference-engine khronos-openvx machine-learning neural-network nnef onnx opencl openvx openvx-extensions openvx-neural-network rocm ryzen virtual-reality windows-machine-learning winml

Last synced: 18 Jul 2025

https://github.com/eth-cscs/cosma

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

communication-optimal cuda gpu-acceleration linear-algebra matmul matrix-multiplication mpi pdgemm rocm scalapack

Last synced: 02 Mar 2026

https://github.com/rocm/rocprim

ROCm Parallel Primitives

amd cuda gpu hip parallel primitive rocm

Last synced: 02 Apr 2026

https://github.com/rocm/gpufort

GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify

cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm

Last synced: 21 Jun 2025

https://github.com/ROCm/gpufort

GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify

cuda cuda-fortran fortran gpgpu gpu hip interoperability openacc openmp rocm

Last synced: 11 Mar 2025

https://github.com/rocm/hipblas

[DEPRECATED] Moved to ROCm/rocm-libraries repo

blas cuda hip rocm

Last synced: 02 Apr 2026

https://github.com/ROCm/hipBLAS

ROCm BLAS marshalling library

blas cuda hip rocm

Last synced: 23 Jul 2025

https://github.com/rocm/rocrand

RAND library for the HIP programming language

cuda gpu hip random rng rocm

Last synced: 02 Apr 2026

https://github.com/rocm/rocsolver

[DEPRECATED] Moved to ROCm/rocm-libraries repo

lapack linear-algebra rocm

Last synced: 02 Apr 2026

https://github.com/rocm/hipblaslt

[DEPRECATED] Moved to ROCm/rocm-libraries repo

amd assembly blas gemm gpu-computing hip machine-learning matrix-multiplication rocm

Last synced: 05 May 2026

https://github.com/juliagpu/acceleratedkernels.jl

Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 04 Apr 2025

https://github.com/pennylaneai/pennylane-lightning

The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 15 May 2025

https://github.com/PennyLaneAI/pennylane-lightning

The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.

cuda distributed-computing gpu hpc mpi openmp parallel quantum-computing quantum-machine-learning rocm

Last synced: 11 May 2025

https://github.com/ROCm/rocRAND

RAND library for the HIP programming language

cuda gpu hip random rng rocm

Last synced: 18 Aug 2025

https://github.com/JuliaGPU/AcceleratedKernels.jl

Cross-architecture parallel algorithms for Julia's CPU and GPU backends. Targets multithreaded CPUs, and GPUs via Intel oneAPI, AMD ROCm, Apple Metal, Nvidia CUDA.

amd apple cuda gpu intel julia metal nvidia oneapi parallel rocm standard-library

Last synced: 17 Mar 2025

https://github.com/Grench6/RX580-rocM-tensorflow-ubuntu20.4-guide

Installation guide for ROCm and TensorFlow on Ubuntu for the RX 580

rocm tensorflow-rocm

Last synced: 19 Jul 2025

https://github.com/sukhmeetbawa/opencl-amd-fedora

AMD OpenCL userspace drivers for Fedora. Currently not working on Fedora 37.

amd fedora-workstation linux opencl rocm

Last synced: 04 Oct 2025

https://github.com/gpuopen-tools/radeon_compute_profiler

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.

opencl profiler rocm

Last synced: 30 Apr 2025

https://github.com/GPUOpen-Tools/radeon_compute_profiler

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to discover bottlenecks in the application and to find ways to optimize the application's performance.

opencl profiler rocm

Last synced: 08 May 2025

https://github.com/pika-org/pika

pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support.

concurrency cplusplus cpp cuda gpu hip mpi p2300 parallelism rocm stdexec

Last synced: 30 Jan 2026

https://github.com/rocm/hipfort

Fortran interfaces for ROCm libraries

blas cuda fft fortran gpgpu gpu hip interoperability random rocm solver sparse

Last synced: 05 Apr 2025

https://github.com/rocm/rpp

AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends.

agumentation amd bitwise channel-extract computer-vision contrast cpu gpu hip histogram hpc mivisionx opencl openvx radeon-performance-primitives rocm rpp warp-affine

Last synced: 11 Apr 2025

https://github.com/ROCm/rpp

AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends.

agumentation amd bitwise channel-extract computer-vision contrast cpu gpu hip histogram hpc mivisionx opencl openvx radeon-performance-primitives rocm rpp warp-affine

Last synced: 14 Mar 2025

https://github.com/eth-cscs/spfft

Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support

cuda fft fft-library gpu-acceleration hpc mpi rocm

Last synced: 17 Jun 2025

https://quokka-astro.github.io/quokka/

Two-moment AMR radiation hydrodynamics (with self-gravity, particles, and chemistry) on CPUs/GPUs for astrophysics

adaptive-mesh-refinement astrochemistry astrophysics cuda gpu hip hydrodynamics particles rocm self-gravity

Last synced: 09 Mar 2025

https://github.com/beinsezii/comfyui-amd-go-fast

Simple monkeypatch to boost performance on AMD Navi 3 GPUs

amd comfyui rocm

Last synced: 20 Jun 2025

https://github.com/stampby/halo-ai-core

Bare-metal AI platform for AMD Strix Halo. One script. Everything works. Lego blocks: snap in what you need.

agent-framework ai amd arch-linux bare-metal caddy gaia gpu inference lemonade llama-cpp local-ai privacy rocm ryzen-ai self-hosted strix-halo systemd

Last synced: 18 Apr 2026

https://github.com/wdmapp/gtensor

GTensor is a header-only C++14 multi-dimensional array library for hybrid GPU development.

cpp cpp14 cuda gpu hacktoberfest rocm sycl

Last synced: 04 Apr 2025

https://github.com/eth-cscs/tiled-mm

Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.

amd cublas cublasxt cuda gpu matmul matrix-multiplication nvidia rocblas rocblasxt rocm

Last synced: 19 Jul 2025

https://github.com/geramy/odinlink-five

A high-performance RCCL (ROCm Communication Collectives Library) / NCCL plugin for Thunderbolt 5 that enables GPU-to-GPU communication across Thunderbolt connections with RDMA support.

ai amd apple driver high-performance-computing hip linux-kernel nvidia rocm thunderbolt

Last synced: 23 May 2026

https://github.com/okuvshynov/cubestat

Horizon chart for CPU/GPU/Neural Engine utilization monitoring. Supports Apple M1-M4, NVIDIA GPUs, and AMD GPUs.

apple-silicon command-line-tool gpu horizon monitoring neural-engine nvidia-gpu rocm

Last synced: 11 Mar 2026

https://github.com/eth-cscs/spla

Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.

cuda gemm linear-algebra mpi rocm

Last synced: 14 Apr 2025

https://github.com/ptsolvers/chmy.jl

Finite differences and staggered grids on CPUs and GPUs

cuda gpu julialang metal mpi parallel rocm staggeredgrid stencil

Last synced: 23 Apr 2025

https://github.com/tracel-ai/cubek

CubeK: high-performance multi-platform kernels in CubeCL

cuda gpu hpc rocm vulkan

Last synced: 13 Jan 2026

https://github.com/vokegpu/bicudo

Separation Axis Theorem (SAT) physics engine library, accelerated via a GPGPU API (ROCm/OpenCL/CUDA) or run on the CPU

opengl opengl4 physics physics-2d physics-simulation rocm rocm-kernel sat sdl separation-axis-theorem

Last synced: 10 Apr 2025

https://github.com/tracel-ai/cubecl-hip-sys

Rust system bindings for AMD ROCm HIP runtime used by CubeCL

bindings hip rocm runtime rust

Last synced: 17 Feb 2026

https://github.com/shivaraj-bh/ollama-flake

Run ollama natively - powered by Nix

cuda flakes nix ollama open-webui rocm services

Last synced: 01 May 2025

https://github.com/pccr10001/comfyui-gfx1151-fa

ComfyUI with Flash Attention for the Ryzen AI Max+ 395 (gfx1151)

amd comfyui flash-attention gfx1151 pytorch rocm

Last synced: 11 Mar 2026

https://github.com/ai-dock/pytorch

PyTorch docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

ai cuda docker jupyter machine-learning python pytorch rocm runpod syncthing vast

Last synced: 09 May 2025

https://github.com/yalue/cudabrot

A CUDA renderer for the Buddhabrot fractal

amd buddhabrot buddhabrot-fractal cuda gpu hip mandelbrot mandelbrot-fractal rocm

Last synced: 07 May 2025

https://github.com/psygreg/rocm-ubuntu

Automated installation for ROCm and OpenCL for RDNA 2/3 cards on Ubuntu.

amdgpu mesa3d opencl rocm ubuntu

Last synced: 23 Jul 2025

https://github.com/landslidesim/materialpointsolver.jl

🧮 High-performance Material Point Method (MPM) Solver in Julia.

backend-agnostic cluster cuda hpc material-point-method metal mpm oneapi parallel-computing rocm

Last synced: 12 Apr 2025

https://github.com/abuccts/rocm-container-runtime

ROCm container runtime

container docker rocm runtime

Last synced: 11 Jul 2025

https://github.com/jatinx/pyhip

Python interface to the HIP and hipRTC libraries

bindings cuda gpu hip hiprtc python rocm

Last synced: 21 Sep 2025

https://github.com/ai-dock/python

Python docker images for use in GPU cloud and local environments. Includes AI-Dock base for authentication and improved user experience.

ai cuda docker machine-learning python rocm runpod vast

Last synced: 28 Aug 2025

https://github.com/ulyssesrr/docker-rocm-xtra

ROCm docker images with fixes/support for extra architectures, such as gfx803/gfx1010.

docker gfx1010 gfx803 pytorch rocm stable-diffusion-webui

Last synced: 26 Feb 2025

https://github.com/pennylaneai/lightning-on-hpc

"Hybrid quantum programming with PennyLane Lightning on HPC platforms" accompanying data and workloads

cpp20 cuda gpu hpc mpi openmp python quantum quantum-computing rocm supercomputing

Last synced: 10 Jun 2025

https://github.com/hec-ovi/vllm-qwen

vLLM + Qwen3.6-27B (BF16) OpenAI-compatible inference server on AMD Strix Halo (Ryzen AI Max+ 395, gfx1151). Vision input, 256K context, /v1/responses with separated reasoning, via TheRock ROCm.

amd docker gfx1151 inference-server llm-serving local-llm multimodal-llm openai-compatible qwen qwen3 rocm ryzen-ai self-hosted strix-halo vllm

Last synced: 01 May 2026

https://github.com/neoblizz/hip_template

🖤 Template for starting a HIP/C++ project using CMake, with GitHub Actions for CI.

cpp cuda cuda-programming gpgpu gpu hip rocm template-project template-repository

Last synced: 26 Mar 2025

https://github.com/alpaka-group/bactria

Broadly Applicable C++ Tracing and Instrumentation API 🐫

cuda hardware-counters instrumentation-api metrics rocm tracing-events

Last synced: 21 Apr 2025

https://github.com/mrowan137/stable-diffusion-v1-5-radeon-pro-vii

Notes for Stable Diffusion v1.5 setup on a Radeon Pro VII (AMD GPU).

amdgpu pytorch-rocm radeon-pro-vii rocm stable-diffusion ubuntu

Last synced: 17 Feb 2026

https://github.com/acai66/pytorch_rocm_whl

PyTorch compiled with ROCm.

pytorch rocm

Last synced: 03 Mar 2025

https://github.com/microsoft/hat

TOML-annotated C header file format for packaging binary files, from Microsoft Research

benchmarking cpp cprogramming cuda metadata platform-independent python-library rocm toml

Last synced: 10 Apr 2025

https://github.com/nikelborm/amd-amdgpu-rocm-ollama-gfx90c-ati-radeon-vega-ryzen7-5800h-arch-linux

Run Ollama, with optimizations, on an AMD Ryzen 7 5800H CPU using its integrated AMD ATI Radeon Vega GPU (gfx90c)

amd amd-gpu amdgpu archlinux avx2 bash bash-scripting cuda linux llama llama3 llm ollama oneapi radeon rocm ssse3 vega

Last synced: 30 Apr 2025

https://github.com/ROCm/hipMM

HIP Memory Manager (ROCm-DS)

amd cuda gpu hip memory-management radeon-instinct-mi-series rocm

Last synced: 12 Apr 2025

https://github.com/rocm/hipmm

HIP Memory Manager (ROCm-DS)

amd cuda gpu hip memory-management radeon-instinct-mi-series rocm

Last synced: 12 Apr 2025

https://github.com/rocm/numba-hip

HIP backend patch for Numba, the NumPy aware dynamic Python compiler using LLVM.

ai compiler cuda gpu hip hpc jit ml numba python radeon-instinct-mi-series rocm

Last synced: 31 Aug 2025

https://github.com/amd-agi/gpt-fast

GPT-Fast for multimodal models on AMD GPUs

amd gptfast inference llama llava multimodal multimodal-large-language-models qwen rocm

Last synced: 10 Sep 2025

https://github.com/eliranwong/multiamdgpu_aidev_ubuntu

Multi-AMD-GPU setup for AI development on Ubuntu with ROCm

ai amd amd-gpu amdgpu freegenius gpu rocm ubuntu

Last synced: 08 Apr 2025

https://github.com/arminms/p2rng

A modern header-only C++ library for parallel algorithmic (pseudo) random number generation supporting OpenMP, CUDA, ROCm and oneAPI

cpp cuda cxx gpu header-only library linux macos multiplatorm oneapi openmp parallel pcg-random prng pseudorandom-number-generator random-number-distributions random-number-generation rocm stl-algorithms windows

Last synced: 04 Apr 2025

https://github.com/guilt/rocm-programming-masterclass

Udemy's CUDA Programming Masterclass, with examples in ROCm/HIP.

cuda easy hip learning-by-doing masterclass rocm

Last synced: 04 Aug 2025

https://github.com/wasd-tech/guide-amd-hip-rocm

Guides about AMD HIP/ROCm

ai amd-gpu comfyui guide hip llamacpp rocm torch

Last synced: 06 Mar 2026

https://github.com/rocm/rocmds-cmake

This is a collection of CMake modules that are useful for all ROCm-DS projects. Keeping the shared code in a single place makes rolling out CMake fixes easier.

amd cmake cuda hip radeon-instinct-mi-series rocm

Last synced: 10 Apr 2025

https://github.com/stargate01/aidungeon2-docker-rocm

Runs an AIDungeon2 fork in Docker on AMD ROCm hardware.

aidungeon2 amd docker gpt-2 pytorch rocm

Last synced: 09 May 2026

https://github.com/benjaminhottell/nix-rocm-pytorch

Compiles PyTorch with ROCm on NixOS

nix nixos pytorch rocm

Last synced: 08 Sep 2025

https://github.com/video-analysis-opensource/docker_hub_sync

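Syncs Docker images commonly used in AI development (e.g. pytorch) to the Alibaba Cloud container registry, so they can be pulled quickly from within China.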

centos docker-image openjdk pytorch rocm tensorflow torch ubuntu

Last synced: 05 Jan 2026

https://github.com/kiritigowda/mivisionx-inference-analyzer

MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results

amd amdgpu caffe docker-images inceptionv4 inference inference-engine inference-optimization mivisionx mivisionx-inference-analyzer nnef nnir onnx opencl openvx resnet resnet-50 rocm squeezenet vgg

Last synced: 11 Apr 2025

https://github.com/han-minhee/sgemm_hip

SGEMM implementations in HIP for NVIDIA / AMD GPUs

cuda gpgpu gpu hip rocm

Last synced: 27 Apr 2026