Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jelmd/nvmex
Metrics exporter for Nvidia GPUs (Prometheus exposition format)
https://github.com/jelmd/nvmex
c gpu gpu-monitoring grafana metrics metrics-exporter monitoring nvidia nvidia-gpu nvidia-smi prometheus prometheus-exporter
Last synced: about 2 months ago
JSON representation
Metrics exporter for Nvidia GPUs (Prometheus exposition format)
- Host: GitHub
- URL: https://github.com/jelmd/nvmex
- Owner: jelmd
- Created: 2021-03-28T02:20:58.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2023-10-02T14:24:39.000Z (about 1 year ago)
- Last Synced: 2024-08-02T15:25:58.625Z (5 months ago)
- Topics: c, gpu, gpu-monitoring, grafana, metrics, metrics-exporter, monitoring, nvidia, nvidia-gpu, nvidia-smi, prometheus, prometheus-exporter
- Language: C
- Homepage:
- Size: 505 KB
- Stars: 11
- Watchers: 2
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# nvmex
nvmex is a *m*etrics *ex*porter for *nv*idia graphic processing units (GPUs).
To monitor the health and resource consumption of GPUs it utilizes the nvidia
management library (libnividia-ml.so.1) usually distributed with the nvidia
driver or a similar package (e.g. Solaris: driver/graphics/nvidia,
Ubuntu: libnvidia-compute-\*). Collected data can be exposed via HTTP
in [Prometheuse exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/)
using the endpoint URL http://_hostname:9400_/metrics (port and IP are
customizable of course) and thus visualized e.g. using [Grafana](https://grafana.com/), [Netdata](https://www.netdata.cloud/), or [Zabbix](https://www.zabbix.com/).In contrast to Nvidia's dcgm-exporter *nvmex* is written in plain C and thus
it is compared to dcgm-exporter extremely lightweight (virtual memory size:
120 MiB vs. 5.7 GiB, resident set size: 6.5 MiB vs. 23.5..246 MiB),
does not trash your disks with error logs, or hogs any cpu.
You may run it on bare metal, or in any zone, container, or pod.## Requirements
Beside Nvidia's libnividia-ml.so.1 (usually provided by the libnvidia-compute-XYZ package) *nvmex* requires [libprom](https://github.com/jelmd/libprom) and [libmicrohttpd](https://github.com/Karlson2k/libmicrohttpd).
The **nvml.h** (usually provided by the cuda-nvml-dev-U-V package) used to compile this utility must match the **libnividia-ml.so.1** library used on your machines and this in turn the version of the nvdia kernel module in use. The management library is usually backward compatible, so compiling against an older version and using it on machines with more recent versions of the NVML should work (but you may miss some metrics).
## Build
Adjust the **Makefile** as needed, optionally set related environment variables
(e.g. `export CUDA_VERS=10.1 CC=gcc`) and run **make**.## Repo
The official repository for *nvmex* is https://github.com/jelmd/nvmex .
If you need some new features (or bug fixes), please feel free to create an
issue there using https://github.com/jelmd/nvmex/issues .## Versioning
*nvmex* follows the basic idea of semantic versioning, but having the real world
in mind. Therefore official releases have always THREE numbers (A.B.C), not
more and not less! For nightly, alpha, beta, RC builds, etc. a *.0* and
possibly more dot separated digits will be append, so that one is always able
to overwrite this one by using a 4th digit > 0.## License
[CDDL 1.1](https://spdx.org/licenses/CDDL-1.1.html)