{"id":19426488,"url":"https://github.com/beuth-erdelt/prometheus_nvlink_exporter","last_synced_at":"2025-04-14T16:34:13.144Z","repository":{"id":105072106,"uuid":"188201097","full_name":"Beuth-Erdelt/prometheus_nvlink_exporter","owner":"Beuth-Erdelt","description":"This script collects some informations about NVLink and PCI bus traffic of NVidia GPUs. Results are published as prometheus metrics via a websocket.","archived":false,"fork":false,"pushed_at":"2019-07-29T09:31:28.000Z","size":11,"stargazers_count":6,"open_issues_count":4,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T05:12:21.381Z","etag":null,"topics":["gpu","nvidia-cuda","nvidia-docker","nvidia-gpu","nvlink","prometheus","prometheus-exporter","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Beuth-Erdelt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-05-23T09:15:57.000Z","updated_at":"2024-06-24T02:37:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"f6cccb2f-2243-4b51-9c6a-c1512bc4999a","html_url":"https://github.com/Beuth-Erdelt/prometheus_nvlink_exporter","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beuth-Erdelt%2Fprometheus_nvlink_exporter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beuth-Erdelt%2Fprometheus_nvlink_exporter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/Beuth-Erdelt%2Fprometheus_nvlink_exporter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Beuth-Erdelt%2Fprometheus_nvlink_exporter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Beuth-Erdelt","download_url":"https://codeload.github.com/Beuth-Erdelt/prometheus_nvlink_exporter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248916897,"owners_count":21182885,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gpu","nvidia-cuda","nvidia-docker","nvidia-gpu","nvlink","prometheus","prometheus-exporter","python"],"created_at":"2024-11-10T14:07:46.917Z","updated_at":"2025-04-14T16:34:13.137Z","avatar_url":"https://github.com/Beuth-Erdelt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# prometheus_nvlink_exporter\n\nThis script collects information about NVLink and PCI bus traffic of NVIDIA GPUs.\nResults are published as Prometheus metrics via a websocket.\n\n## Usage\n\nWe also provide a Dockerfile.\nIt is based on NVIDIA's CUDA container, adds a Python installation, and runs the exporter script.\n\nThe basic usage is `docker run -d ...`\n\nThe metrics can be scraped from port 8001.\n\nThe Docker image is compatible with Kubernetes environments.\n\n## Prerequisites\n\nThe Docker image requires Docker, NVLink-capable NVIDIA GPUs, and the basic drivers to be installed.\n\nThe script expects the GPUs to be configured via\n```\nnvidia-smi nvlink -sc 0bz\nnvidia-smi nvlink -sc 
1pz\n```\nThe script uses `nvidia-smi` and some Python libraries, in particular https://github.com/prometheus/client_python\n\n## Working examples\n\nThe script runs `nvidia-smi` commands and transforms their output into a format that Prometheus can scrape.\n\n### Collecting NVLink Information\n\nThis automatically runs `nvidia-smi nvlink -g 0`:\n```\nGPU 0: Tesla V100-SXM2-16GB (UUID: GPU-8dfc570f-9ee4-bdf1-abcd-192837465abc)\n         Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 1: Rx0: 100 KBytes, Tx0: 0 KBytes\n         Link 2: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes\nGPU 1: Tesla V100-SXM2-16GB (UUID: GPU-29123255-8aab-d30e-abcd-192837465abc)\n         Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 2: Rx0: 50 KBytes, Tx0: 0 KBytes\n         Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes\nGPU 2: Tesla V100-SXM2-16GB (UUID: GPU-7db3a1e6-6150-9c24-abcd-192837465abc)\n         Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 2: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 4: Rx0: 0 KBytes, Tx0: 0 KBytes\nGPU 3: Tesla V100-SXM2-16GB (UUID: GPU-22ea33c7-5a76-9747-abcd-192837465abc)\n         Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 2: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes\n         Link 4: Rx0: 0 KBytes, Tx0: 0 KBytes\n```\n\n### Collecting PCI Information\n\nThis automatically runs `nvidia-smi dmon -s t -c 1`:\n```\n# gpu rxpci txpci\n# Idx  MB/s  MB/s\n    1     0     0\n    2     0     0\n```\n\n### Publishing Metrics\n\nThe output is similar to:\n```\n# HELP gpu_nvlink_tx_kbytes Transmitted KBytes via NVLink\n# TYPE gpu_nvlink_tx_kbytes gauge\ngpu_nvlink_tx_kbytes{GPUID=\"0\",LinkID=\"2\"} 27598895329.0\ngpu_nvlink_tx_kbytes{GPUID=\"0\",LinkID=\"1\"} 
31602715771.0\ngpu_nvlink_tx_kbytes{GPUID=\"4\",LinkID=\"2\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"7\",LinkID=\"0\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"4\",LinkID=\"3\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"5\",LinkID=\"1\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"0\",LinkID=\"3\"} 31602715771.0\ngpu_nvlink_tx_kbytes{GPUID=\"5\",LinkID=\"0\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"7\",LinkID=\"2\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"2\",LinkID=\"3\"} 1019788145.0\ngpu_nvlink_tx_kbytes{GPUID=\"7\",LinkID=\"1\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"3\",LinkID=\"2\"} 1017047660.0\ngpu_nvlink_tx_kbytes{GPUID=\"2\",LinkID=\"0\"} 1014424036.0\ngpu_nvlink_tx_kbytes{GPUID=\"2\",LinkID=\"1\"} 1017028693.0\ngpu_nvlink_tx_kbytes{GPUID=\"1\",LinkID=\"2\"} 1017047660.0\ngpu_nvlink_tx_kbytes{GPUID=\"6\",LinkID=\"2\"} 49.0\ngpu_nvlink_tx_kbytes{GPUID=\"5\",LinkID=\"3\"} 2986639.0\ngpu_nvlink_tx_kbytes{GPUID=\"0\",LinkID=\"0\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"3\",LinkID=\"3\"} 1017028657.0\ngpu_nvlink_tx_kbytes{GPUID=\"6\",LinkID=\"1\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"5\",LinkID=\"2\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"6\",LinkID=\"0\"} 2555441.0\ngpu_nvlink_tx_kbytes{GPUID=\"3\",LinkID=\"0\"} 1014357462.0\ngpu_nvlink_tx_kbytes{GPUID=\"6\",LinkID=\"3\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"1\",LinkID=\"3\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"3\",LinkID=\"1\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"1\",LinkID=\"0\"} 1014341346.0\ngpu_nvlink_tx_kbytes{GPUID=\"1\",LinkID=\"1\"} 5022027981.0\ngpu_nvlink_tx_kbytes{GPUID=\"4\",LinkID=\"0\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"4\",LinkID=\"1\"} 0.0\ngpu_nvlink_tx_kbytes{GPUID=\"2\",LinkID=\"2\"} 4007720847.0\ngpu_nvlink_tx_kbytes{GPUID=\"7\",LinkID=\"3\"} 0.0\n# HELP gpu_pci_rx_mb_per_s Received MBytes per second via PCI\n# TYPE gpu_pci_rx_mb_per_s gauge\ngpu_pci_rx_mb_per_s{GPUID=\"2\"} 0.0\ngpu_pci_rx_mb_per_s{GPUID=\"5\"} 0.0\ngpu_pci_rx_mb_per_s{GPUID=\"7\"} 0.0\ngpu_pci_rx_mb_per_s{GPUID=\"3\"} 0.0\ngpu_pci_rx_mb_per_s{GPUID=\"6\"} 
0.0\ngpu_pci_rx_mb_per_s{GPUID=\"4\"} 0.0\ngpu_pci_rx_mb_per_s{GPUID=\"0\"} 0.0\ngpu_pci_rx_mb_per_s{GPUID=\"1\"} 0.0\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeuth-erdelt%2Fprometheus_nvlink_exporter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeuth-erdelt%2Fprometheus_nvlink_exporter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeuth-erdelt%2Fprometheus_nvlink_exporter/lists"}