{"id":13716401,"url":"https://github.com/treydock/infiniband_exporter","last_synced_at":"2025-04-09T23:19:23.023Z","repository":{"id":43005598,"uuid":"361212660","full_name":"treydock/infiniband_exporter","owner":"treydock","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-12T17:44:45.000Z","size":205,"stargazers_count":61,"open_issues_count":0,"forks_count":8,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-02T21:11:15.430Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/treydock.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-24T16:34:08.000Z","updated_at":"2025-03-06T02:41:50.000Z","dependencies_parsed_at":"2024-01-19T22:05:02.954Z","dependency_job_id":"b6c7025e-1005-4150-b2c4-e9d23392af11","html_url":"https://github.com/treydock/infiniband_exporter","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treydock%2Finfiniband_exporter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treydock%2Finfiniband_exporter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treydock%2Finfiniband_exporter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/treydock%2Finfiniband_exporter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/treydock","download_url":"h
ttps://codeload.github.com/treydock/infiniband_exporter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248125781,"owners_count":21051802,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T00:01:10.111Z","updated_at":"2025-04-09T23:19:22.997Z","avatar_url":"https://github.com/treydock.png","language":"Go","funding_links":[],"categories":["Monitoring"],"sub_categories":["Prometheus Based"],"readme":"[![Build Status](https://circleci.com/gh/treydock/infiniband_exporter/tree/master.svg?style=shield)](https://circleci.com/gh/treydock/infiniband_exporter)\n[![GitHub release](https://img.shields.io/github/v/release/treydock/infiniband_exporter?include_prereleases\u0026sort=semver)](https://github.com/treydock/infiniband_exporter/releases/latest)\n![GitHub All Releases](https://img.shields.io/github/downloads/treydock/infiniband_exporter/total)\n[![Go Report Card](https://goreportcard.com/badge/github.com/treydock/infiniband_exporter)](https://goreportcard.com/report/github.com/treydock/infiniband_exporter)\n[![codecov](https://codecov.io/gh/treydock/infiniband_exporter/branch/master/graph/badge.svg)](https://codecov.io/gh/treydock/infiniband_exporter)\n\n# InfiniBand Prometheus exporter\n\nThe InfiniBand exporter collects counters from InfiniBand switches and HCAs.\nThe exporter supports the `/metrics` endpoint to gather InfiniBand metrics and metrics about the exporter.\n\nThis exporter must be run on a host that has an active interface on the InfiniBand fabric you wish to monitor.\nBy 
default this exporter will collect counters from all switch ports on the fabric connected to the host running this exporter.\n\nThe InfiniBand diagnostic tools of `ibnetdiscover` and `perfquery` must also be present on the host running this exporter.\nThese are commonly installed via the `infiniband-diags` package.\n\n## Usage\n\nCollectors are enabled or disabled via `--collector.\u003cname\u003e` and `--no-collector.\u003cname\u003e` flags.\n\nName | Description | Default\n-----|-------------|--------\nswitch | Collect switch port counters | Enabled\nibswinfo | Collect data on unmanaged switches via ibswinfo (BETA) | Disabled\nhca | Collect HCA port counters | Disabled\n\nIf you have a node name map file typically used with Subnet Managers, you can provide that file to the  `--ibnetdiscover.node-name-map` flag.  This will use friendly names for switches.\n\n\nIf you wish to run the exporter as a user other than root and do not want to use sudo, you must make the UMAD device read/write to all users with something like the following:\n\n```\n$ cat /etc/udev/rules.d/99-ib.rules \nKERNEL==\"umad*\", NAME=\"infiniband/%k\" MODE=\"0666\"\n```\n\nIf you wish to use sudo you will need to run with the `--sudo` flag.  Below is an example of the sudo rules necessary if the exporter runs as the `infiniband_exporter` user (adjust paths to `perfquery` and `ibnetdiscover` as needed):\n\n```\nDefaults:infiniband_exporter !syslog\nDefaults:infiniband_exporter !requiretty\ninfiniband_exporter ALL=(ALL) NOPASSWD: /usr/sbin/ibnetdiscover\ninfiniband_exporter ALL=(ALL) NOPASSWD: /usr/sbin/perfquery\n```\n\nIf `ibnetdiscover` and `perfquery` are not in PATH then their paths need to be provided via the `--ibnetdiscover.path` and `--perfquery.path` flags.\n\n### Collect switch information using ibswinfo (BETA)\n\nThe tool [ibswinfo](https://github.com/stanford-rc/ibswinfo) can be used to collect information from unmanaged InfiniBand switches such as power supply and fan health.  
To enable this collection pass the `--collector.ibswinfo` flag and ensure either `ibswinfo` is in $PATH or define the path to that executable via the `--ibswinfo.path` flag.\n\nThis feature is considered BETA as it relies on parsing non-machine readable data.\nIn the future this exporter may collect the unmanaged switch information directly in a similar way to what ibswinfo is doing.\n\nThe collection of `ibswinfo` takes about 2-3 seconds per switch so consider increasing Prometheus scrape timeout or running using `--exporter.runonce` per [Large fabric considerations](#large-fabric-considerations).  Also consider increasing the `--ibswinfo.max-concurrent` to a value greater than the default of 1, but be aware that a value too high will cause timeouts executing concurrent `ibswinfo` commands.\n\n### Large fabric considerations\n\nIf you have a large fabric where collection times are too long for Prometheus scrapes, the exporter can instead write metrics to a file that can be collected by node_exporter textfile collection.\n\nThis exporter has been tested on a fabric with 109 switches each having around 36 ports and collecting only switches takes ~10 seconds.\n\nTo collect the metrics from a file pass the `--collector.textfile.directory` flag to node_exporter like so: `--collector.textfile.directory=/var/lib/node_exporter/textfile_collector`.  
Add this exporter to be executed via cron using flags like the following:\n\n* `--exporter.runonce`\n* `--exporter.output=/var/lib/node_exporter/textfile_collector/infiniband_exporter.prom`\n\nThe collection time of `--collector.switch.rcv-err-details` can take much longer than base metrics due to having to execute `perfquery` once per port.\nOne way to collect these metrics is to collect base metrics with Prometheus scrapes and collect `--collector.switch.rcv-err-details` with runonce using the following flags (example on 8 core system, adjust `--perfquery.max-concurrent` as needed):\n\n* `--exporter.runonce`\n* `--exporter.output=/var/lib/node_exporter/textfile_collector/infiniband_exporter.prom`\n* `--no-collector.switch.base-metrics`\n* `--collector.switch.rcv-err-details`\n* `--perfquery.max-concurrent=8`\n\n## Docker\n\nExample of running the Docker container\n\n```\ndocker run -d -p 9315:9315 \\\n--name infiniband_exporter \\\n--cap-add=IPC_LOCK \\\n--device=/dev/infiniband/umad0 \\\ntreydock/infiniband_exporter\n```\n\n## Install\n\nDownload the [latest release](https://github.com/treydock/infiniband_exporter/releases)\n\nAdd the user that will run `infiniband_exporter`\n\n```\ngroupadd -r infiniband_exporter\nuseradd -r -d /var/lib/infiniband_exporter -s /sbin/nologin -M -g infiniband_exporter infiniband_exporter\n```\n\nInstall compiled binaries after extracting tar.gz from release page.\n\n```\ncp /tmp/infiniband_exporter /usr/sbin/infiniband_exporter\n```\n\nAdd systemd unit file and start service. 
Modify the `ExecStart` or `OPTIONS` in `/etc/sysconfig/infiniband_exporter` with desired flags.\nThe unit file uses [systemd's service templating feature](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Service%20Templates) and [specifiers](https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#Specifiers).\n```\ncp systemd/infiniband_exporter@.service /etc/systemd/system/infiniband_exporter@.service\nsystemctl daemon-reload\nsystemctl start infiniband_exporter@infiniband_exporter.service\n```\n\n## Build from source\n\nTo produce the `infiniband_exporter` binary:\n\n```\nmake build\n```\n\nOr\n\n```\ngo get github.com/treydock/infiniband_exporter\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftreydock%2Finfiniband_exporter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftreydock%2Finfiniband_exporter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftreydock%2Finfiniband_exporter/lists"}