https://github.com/crazyguitar/rdmatop
htop-like TUI for real-time RDMA network monitoring.
- Host: GitHub
- URL: https://github.com/crazyguitar/rdmatop
- Owner: crazyguitar
- License: apache-2.0
- Created: 2026-03-14T05:28:49.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-21T07:46:53.000Z (about 2 months ago)
- Last Synced: 2026-04-21T09:39:05.695Z (about 2 months ago)
- Topics: efa, hpc, infiniband, linux, monitoring, networking, rdma, rust, tui
- Language: Rust
- Size: 4.41 MB
- Stars: 10
- Watchers: 0
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# rdmatop
`htop`, but for RDMA traffic — a real-time TUI monitor for RDMA network interfaces.
Monitors per-device throughput (Gbps, packets/s, drops), RDMA read/write counters,
retransmits, health events, and shows which processes are using each RDMA device —
all via RDMA netlink, the same interface used by [rdma statistic](https://github.com/iproute2/iproute2/blob/main/rdma/stat.c).
## Requirements
- **Linux** (netlink-based — macOS/Windows are not supported)
- RDMA-capable NICs (e.g., Mellanox/NVIDIA ConnectX, AWS EFA)
## Installation
### Ubuntu (PPA)
On Ubuntu 22.04 (jammy) or 24.04 (noble):
```bash
sudo add-apt-repository ppa:crazyguitar/rdmatop
sudo apt update
sudo apt install rdmatop
```
### Cargo
```bash
cargo install rdmatop
```
### From source
```bash
make # cargo build
make install # cargo install
```
## Usage
```bash
rdmatop
```
## Examples
Use `rdmatop` to monitor RDMA traffic while running GPU
communication benchmarks:
- [NCCL](examples/nccl/) — collective communication
- [NIXL](examples/nixl/) — point-to-point KV cache transfer
- [NVSHMEM](examples/nvshmem/) — one-sided GPU communication
- [PPLX Kernels](examples/pplx/) — MoE all-to-all dispatch/combine
- [RDMA Statistics](examples/rdma/) — shell-based RDMA stats
- [Kubernetes](examples/kubernetes/) — DaemonSet deployment for Kubernetes
## How It Works
1. **Device enumeration** — `RDMA_NLDEV_CMD_GET` via netlink to discover all RDMA devices
2. **HW counters** — `RDMA_NLDEV_CMD_STAT_GET` per device/port, same as `rdma statistic show`
3. **Process detection** — `RDMA_NLDEV_CMD_RES_QP_GET` to map QPs → PIDs, enriched with `/proc` data
4. **Throughput** — Two snapshots per interval, delta / elapsed for rates
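The throughput step above can be sketched as follows. This is a minimal illustration, not rdmatop's actual code: the `Snapshot` and `Rates` types and the `rates` function are hypothetical names, and the sketch assumes the counters are cumulative and already expressed in bytes. How rdmatop itself handles counter resets is not stated in this README; here a counter that goes backwards is simply treated as a zero delta.

```rust
use std::time::{Duration, Instant};

/// One reading of a device's cumulative counters.
/// Hypothetical layout for illustration; assumes values are in bytes.
#[derive(Clone, Copy, Debug)]
struct Snapshot {
    taken_at: Instant,
    rx_bytes: u64,
    tx_bytes: u64,
}

/// Rates derived from two consecutive snapshots.
#[derive(Debug)]
struct Rates {
    rx_gbps: f64,
    tx_gbps: f64,
}

/// Computes delta / elapsed for each counter and converts bytes/s to Gbps.
/// Returns `None` if no time has passed between the two snapshots.
fn rates(prev: &Snapshot, curr: &Snapshot) -> Option<Rates> {
    let elapsed = curr
        .taken_at
        .checked_duration_since(prev.taken_at)?
        .as_secs_f64();
    if elapsed <= 0.0 {
        return None;
    }
    let to_gbps = |before: u64, after: u64| {
        // Assumption: a counter that decreased was reset, so report zero.
        let delta = after.saturating_sub(before);
        delta as f64 * 8.0 / elapsed / 1e9
    };
    Some(Rates {
        rx_gbps: to_gbps(prev.rx_bytes, curr.rx_bytes),
        tx_gbps: to_gbps(prev.tx_bytes, curr.tx_bytes),
    })
}

fn main() {
    let t0 = Instant::now();
    let prev = Snapshot { taken_at: t0, rx_bytes: 1_000, tx_bytes: 500 };
    // Two seconds later, 2.5 GB more have been received and nothing sent.
    let curr = Snapshot {
        taken_at: t0 + Duration::from_secs(2),
        rx_bytes: 1_000 + 2_500_000_000,
        tx_bytes: 500,
    };
    if let Some(r) = rates(&prev, &curr) {
        println!("rx {:.2} Gbps, tx {:.2} Gbps", r.rx_gbps, r.tx_gbps);
    }
}
```

Using the measured elapsed time rather than the nominal refresh interval keeps the rate accurate when a refresh is delayed.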
## License
Apache-2.0