https://github.com/imcf/nvsmi-prometheus-textfile
A zero-dependencies metrics collector for Prometheus based on "nvidia-smi" written in Python.
grafana metrics monitoring nvidia nvidia-smi prometheus prometheus-exporter vgpu
- Host: GitHub
- URL: https://github.com/imcf/nvsmi-prometheus-textfile
- Owner: imcf
- License: gpl-3.0
- Created: 2021-08-06T11:30:15.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-17T11:48:10.000Z (almost 2 years ago)
- Last Synced: 2024-11-11T03:12:50.979Z (3 months ago)
- Topics: grafana, metrics, monitoring, nvidia, nvidia-smi, prometheus, prometheus-exporter, vgpu
- Language: Python
- Homepage:
- Size: 186 KB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Prometheus textfile collector for `nvidia-smi`
![Python: 2.7](https://img.shields.io/badge/python-2.7-yellow) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![License: GPL](https://img.shields.io/badge/license-GPL-blue)](https://github.com/imcf/nvsmi-prometheus-textfile/blob/main/LICENSE)
This is a zero-dependencies (see below for details) standalone tool collecting metrics
using the [`nvidia-smi`][1] (NVIDIA System Management Interface) command and formatting
them in a [Prometheus][2] compatible style that can be used through the
[node_exporter][3]'s `textfile` collector.

The example below shows a visualization generated by [Grafana][9] from the
metrics of two GPUs, showing:

* temperature as *solid lines*
* power draw as *dotted lines*
* the (intended) fan speed as *dashed lines*

![Example using Grafana to visualize GPU metrics](/resources/nvsmi-grafana.png)
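To illustrate the general idea of turning `nvidia-smi` output into Prometheus text lines, here is a minimal sketch using only the Python standard library. It is not taken from this tool: the use of `--query-gpu`, the choice of fields, the metric names (`nvsmi_temperature_celsius` etc.) and the `gpu` label are all assumptions made for the example, and the actual collector may query and name things differently.

```python
import csv
import subprocess

# Assumed mapping of nvidia-smi query fields to hypothetical metric names.
FIELDS = [
    ("temperature.gpu", "nvsmi_temperature_celsius"),
    ("power.draw", "nvsmi_power_draw_watts"),
    ("fan.speed", "nvsmi_fan_speed_percent"),
]


def query():
    """Run nvidia-smi and return its CSV output (requires the NVIDIA driver)."""
    names = ",".join(["index"] + [field for field, _ in FIELDS])
    command = ["nvidia-smi", "--query-gpu=" + names, "--format=csv,noheader,nounits"]
    return subprocess.check_output(command, universal_newlines=True)


def parse_csv(text):
    """Turn CSV rows like '0, 45, 32.50, 30' into (gpu_index, values) pairs."""
    gpus = []
    for row in csv.reader(text.splitlines(), skipinitialspace=True):
        if not row:
            continue
        values = dict(zip([field for field, _ in FIELDS], row[1:]))
        gpus.append((row[0], values))
    return gpus


def render(gpus):
    """Format the parsed values as Prometheus text exposition lines."""
    lines = []
    for field, metric in FIELDS:
        lines.append("# TYPE %s gauge" % metric)
        for index, values in gpus:
            try:
                value = float(values[field])
            except (KeyError, ValueError):
                continue  # skip missing or non-numeric values such as "[N/A]"
            lines.append('%s{gpu="%s"} %s' % (metric, index, value))
    return "\n".join(lines) + "\n"
```

On a machine with the NVIDIA driver installed, `render(parse_csv(query()))` would produce text suitable for a `.prom` file in the textfile collector directory. Values that `nvidia-smi` cannot report are simply omitted rather than written as invalid samples.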
## Zero Dependencies
Or: **"Why not use the official Prometheus Python Client?"**
The tool is intended to work on minimalistic installations, e.g. we are using it on our
[Xen][4] / [Citrix Hypervisor][5] instances. Those setups come with very basic installs
(currently based on [CentOS][6]) and the installation of additional tools like `pip`
(which would be required for the Python Client) is not always possible / desirable.

Therefore the only *actual* dependencies of this collector are already satisfied
on the relevant systems:

* Python 2.7 - comes with the base OS installation
* `nvidia-smi` - available as soon as the NVIDIA driver package is installed

## Permissions
No *root permissions* are required to collect the metrics through `nvidia-smi`. A user
with write permissions to the textfile collector directory of `node_exporter` (or, to
be precise, to just a single file therein) is sufficient.

One simple solution is to run the script under the same account that is also used for
the `node_exporter`. A possible setup could look like this:

```bash
adduser \
    --home-dir /var/lib/node_exporter \
    --comment "Prometheus Node Exporter daemon" \
    --system \
    node_exporter

mkdir -pv /var/lib/node_exporter/textfile_collector
chown -R node_exporter:node_exporter /var/lib/node_exporter
```

## Installation
Assuming you have followed the strategy for the user account outlined above, you can
simply clone this repo to `/opt/nvsmi-prometheus-textfile/` and use the *service* file
provided in the `resources` directory to run metrics collection via *systemd*:

```bash
cd /opt/
git clone https://github.com/imcf/nvsmi-prometheus-textfile
cd nvsmi-prometheus-textfile/resources
cp -v nvsmi-prometheus-textfile.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now nvsmi-prometheus-textfile.service
```

## Seriously, Python 2.7? In 2021??
Well, that's what is available on the Citrix Hypervisor default installation that we're
running. Let's re-evaluate the situation with the next version.

## Metric and Label Naming
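Prometheus metric and label names are restricted to letters, digits and underscores and must not start with a digit. By convention metric names carry an application prefix and a base-unit suffix, and label names starting with `__` are reserved for internal use. The following is a small sketch of a check for these rules. The helper functions and the example names are hypothetical and not part of this collector.

```python
import re

# Colons are also legal in metric names, but are reserved for recording rules,
# so this sketch deliberately rejects them for exporter-side metrics.
NAME_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def is_valid_metric_name(name):
    """Check a metric name against the character rules described above."""
    return bool(NAME_PATTERN.match(name))


def is_valid_label_name(name):
    """Check a label name; names starting with '__' are reserved."""
    return bool(NAME_PATTERN.match(name)) and not name.startswith("__")
```

For example, a raw `nvidia-smi` field name such as `temperature.gpu` would be rejected because of the dot, while a hypothetical `nvsmi_temperature_celsius` passes.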
See the official Prometheus instructions on [writing exporters][7] and [metric and
label naming][8] for more information.

[1]: https://developer.nvidia.com/nvidia-system-management-interface
[2]: https://prometheus.io/
[3]: https://github.com/prometheus/node_exporter
[4]: https://xenproject.org/
[5]: https://docs.citrix.com/en-us/citrix-hypervisor.html
[6]: https://centos.org/
[7]: https://prometheus.io/docs/instrumenting/writing_exporters/
[8]: https://prometheus.io/docs/practices/naming/
[9]: https://grafana.com/