https://github.com/skobkin/amdgputop-web

AMD GPU status monitor web panel written in Go using LLMs

Host: GitHub
URL: https://github.com/skobkin/amdgputop-web
Owner: skobkin
License: mit
Created: 2025-10-12T16:01:54.000Z (8 months ago)
Default Branch: master
Last Pushed: 2026-03-21T17:43:32.000Z (2 months ago)
Last Synced: 2026-03-22T07:41:20.770Z (2 months ago)
Topics: amd, amdgpu, dashboard, gpu, gpu-monitoring, linux, metrics, metrics-collection, metrics-collector, metrics-exporter, metrics-gathering, metrics-visualization, monitoring, resource-usage, top, utilization
Language: Go
Homepage:
Size: 628 KB
Stars: 4
Watchers: 0
Forks: 0
Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md

# amdgpu_top-web

[![CI](https://github.com/skobkin/amdgputop-web/actions/workflows/ci.yml/badge.svg)](https://github.com/skobkin/amdgputop-web/actions/workflows/ci.yml)

Read-only web UI for live AMD GPU telemetry inspired by the `amdgpu_top` CLI.
The backend is pure Go (stdlib HTTP + WebSockets) and the frontend is a compact
Preact single-page app.

![AMD GPU telemetry UI](docs/screenshot.webp "Current UI snapshot")

## Features

- 🖥️ Enumerates DRM GPUs and streams utilization, clocks, temps, VRAM/GTT usage.
- 🧾 Optional “process top” view sourced from `/proc/*/fdinfo` with engine-time
deltas when exposed by the kernel.
- 📈 Historical charts (uPlot) for the selected GPU with hover tooltips.
- 🌐 REST endpoints for `/api/gpus`, `/api/gpus//metrics`, and `/api/gpus//procs`
alongside a WebSocket feed (`/ws`).
- 📊 Optional Prometheus `/metrics` export with per-GPU telemetry (no per-process data).
- ⚙️ Configuration via environment variables (`APP_*`), including sampler cadence,
process scanner limits, and allowed origins.

## Quick Start (host build)

```bash
cd web && npm ci && npm run build
go build ./cmd/amdgputop-web
./amdgputop-web # listens on :8080 by default

# Alternatively, run the default build pipeline:
# make
```

The frontend build output is generated into `internal/httpserver/assets/` and is
embedded at compile time; those files are not committed to the repository.

On AMD hardware you can sanity-check the sampler without the web UI:

```bash
go run ./cmd/sampler-test -sample
```

## Docker

The official image, built by GitHub Actions, is available at [`ghcr.io/skobkin/amdgputop-web`](https://github.com/skobkin/amdgputop-web/pkgs/container/amdgputop-web).

### Docker Compose

Example Docker stack: https://git.skobk.in/skobkin/docker-stacks/src/branch/master/amdgputop-web

### Running manually

An Alpine-based multi-stage image is defined in `Dockerfile`.

```bash
docker build -t amdgputop-web:dev .

VID_GID=$(getent group video | cut -d: -f3)
RENDER_GID=$(getent group render | cut -d: -f3)

docker run --rm -p 8080:8080 \
  --device=/dev/dri \
  --device=/dev/kfd \
  --group-add "${VID_GID}" \
  --group-add "${RENDER_GID}" \
  --pid=host \
  --cap-add SYS_PTRACE \
  --user root \
  amdgputop-web:dev
```

### Important notes

> **GPU names**: the image bundles Alpine's `/usr/share/hwdata/pci.ids`, so GPU
> model names resolve without any extra volume mounts. If you want to override
> the bundled database with the host's copy, bind-mount it explicitly.

> **Why root + `SYS_PTRACE`?** Reading `/proc/*/fdinfo` for host workloads
> requires elevated privileges and the `CAP_SYS_PTRACE` capability. Running the
> container as `root` with `--cap-add SYS_PTRACE` is the simplest way to let the
> process scanner observe GPU clients outside the container. If you only need
> device-level metrics, you can omit `--pid=host`, `--user root`, and the extra
> capability and run with the default non-root user.

Refer to `docs/DOCKER.md` for more detail, including why `--pid=host` is needed
to observe host processes.

#### Troubleshooting & permissions

- The [permissions matrix](docs/DOCKER.md#permissions-matrix) explains which
flags, groups, and capabilities are required for device-only metrics versus
host process telemetry.
- If the UI shows empty process tables or partial metrics, consult the
[troubleshooting section](docs/DOCKER.md#troubleshooting) for the most common
container permission fixes.

## Configuration

| Variable                   | Default             | Description                                                    |
|----------------------------|---------------------|----------------------------------------------------------------|
| `APP_LISTEN_ADDR`          | `:8080`             | HTTP listen address.                                           |
| `APP_LOG_LEVEL`            | `INFO`              | Log verbosity (`DEBUG`, `INFO`, `WARN`, `ERROR`).              |
| `APP_ALLOWED_ORIGINS`      | `*`                 | Comma-separated origins allowed for WebSocket/HTTP.            |
| `APP_DEFAULT_GPU`          | `auto`              | GPU pre-selected on connect (`auto` = first detected).         |
| `APP_ENABLE_PROMETHEUS`    | `false`             | Enable `/metrics` endpoint with per-GPU telemetry when `true`. |
| `APP_ENABLE_PPROF`         | `false`             | Expose Go pprof handlers on `/debug/pprof/*`.                  |
| `APP_CHARTS_ENABLE`        | `true`              | Toggle historical charts feature.                              |
| `APP_CHARTS_MAX_POINTS`    | `7200`              | Maximum data points retained per chart.                        |
| `APP_SAMPLE_INTERVAL`      | `2s`                | Metrics sampling cadence.                                      |
| `APP_PROC_ENABLE`          | `true`              | Toggle process scanner feature.                                |
| `APP_PROC_SCAN_INTERVAL`   | `2s`                | Interval between process snapshot scans.                       |
| `APP_PROC_MAX_PIDS`        | `5000`              | Upper bound on tracked process count per scan.                 |
| `APP_PROC_MAX_FDS_PER_PID` | `64`                | Max file descriptors per PID to inspect.                       |
| `APP_WS_MAX_CLIENTS`       | `1024`              | Maximum concurrent WebSocket clients.                          |
| `APP_WS_WRITE_TIMEOUT`     | `3s`                | WebSocket write timeout.                                       |
| `APP_WS_READ_TIMEOUT`      | `30s`               | WebSocket read timeout.                                        |
| `APP_SYSFS_ROOT`           | `/sys`              | Override sysfs root (test-only).                               |
| `APP_DEBUGFS_ROOT`         | `/sys/kernel/debug` | Override debugfs root (test-only).                             |
| `APP_PROC_ROOT`            | `/proc`             | Override procfs root (test-only).                              |

See `internal/config/config.go` for the full list, including test-only roots
(`APP_SYSFS_ROOT`, `APP_DEBUGFS_ROOT`, `APP_PROC_ROOT`).

## Prometheus

Set `APP_ENABLE_PROMETHEUS=true` to expose `GET /metrics`. The exporter
publishes WebSocket counters along with the latest per-GPU telemetry pulled from
the sampler. Each gauge is labeled with `gpu_id` and includes:

- Busy percentages for graphics and memory engines.
- Current SCLK/MCLK frequencies, temperature, fan RPM, and power draw.
- VRAM/GTT usage and capacity.
- Timestamps and age for the most recent sample.

Per-process statistics stay out of the Prometheus surface area.

## Development

```bash
# Backend
go test ./...

# Frontend
cd web && npm ci && npm run build
```

CI (see `.github/workflows/ci.yml`) enforces `gofmt`, `go vet`, Go tests,
frontend build, and publishes tagged releases with Linux binaries and Docker
images.
