{"id":13581644,"url":"https://github.com/amd/amd_smi_exporter","last_synced_at":"2025-04-06T10:32:40.860Z","repository":{"id":64308281,"uuid":"468282214","full_name":"amd/amd_smi_exporter","owner":"amd","description":"The AMD SMI Exporter exports AMD EPYC CPU  \u0026 Datacenter GPU metrics to the Prometheus server. ","archived":false,"fork":false,"pushed_at":"2024-10-25T09:59:15.000Z","size":55,"stargazers_count":49,"open_issues_count":9,"forks_count":9,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-03T16:48:34.283Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amd.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-10T09:49:30.000Z","updated_at":"2025-02-18T13:38:57.000Z","dependencies_parsed_at":"2024-01-16T21:29:59.098Z","dependency_job_id":"40094cd1-5fd7-4487-bfa3-a20f66ceb732","html_url":"https://github.com/amd/amd_smi_exporter","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amd%2Famd_smi_exporter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amd%2Famd_smi_exporter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amd%2Famd_smi_exporter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amd%2Famd_smi_exporter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amd","dow
nload_url":"https://codeload.github.com/amd/amd_smi_exporter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247470384,"owners_count":20944146,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:02:09.081Z","updated_at":"2025-04-06T10:32:40.064Z","avatar_url":"https://github.com/amd.png","language":"Go","funding_links":[],"categories":["Go"],"sub_categories":[],"readme":"AMD SMI Prometheus Exporter\n----------------------------\n\nThe AMD SMI Exporter is a standalone app that can be run as a daemon, written in GO Language,\nthat exports AMD CPU \u0026 GPU metrics to the Prometheus server. The AMD SMI Prometheus Exporter\nemploys\n* [AMDSMI Library](https://github.com/ROCm/amdsmi.git) for its data acquisition and \n* [GO binding](https://github.com/ROCm/amdsmi/blob/amd-staging/goamdsmi.go) that provides an interface between the amdsmi and the GO exporter code.\n\n### Important note about Versioning and Backward Compatibility\n\nThe AMD SMI Exporter follows the AMDSMI library in its releases, as it is dependent on the underlying libraries for its data. 
The Exporter is currently under development, and therefore subject to change in the features it offers and at the interface with the GO binding.\n\nWhile every effort will be made to ensure stable backward compatibility in software releases\nwith a major version greater than 0, any code/interface may be subject to revision/change while the major version remains 0.\n\n## Building the exporter\n\nThe standalone GO Exporter may be built from the src directory as follows:\n\n### Downloading the source\n\nThe source code for the GO Exporter is available at [AMD SMI Exporter](https://github.com/amd/amd_smi_exporter.git).\n\n### Directory structure of the source\n\nOnce the exporter source has been cloned to a local Linux machine, the directory structure of\nthe source is as below:\n* `$ src/` Contains exporter source for package main\n* `$ src/collect/` Contains the implementation of the Scan function of the collector.\n* `$ grafana/` Contains the JSON files for Grafana dashboard.\n\n* Change the directory to amd_smi_exporter/src\n\n\t```$ cd amd_smi_exporter/src```\n\n* Execute \"make clean\" to clean pre-existing binaries and GO module files\n\n\t```amd_smi_exporter/src$ make clean```\n\n* Execute \"make\" to perform a \"go get\" of dependent modules such as\n\t* github.com/prometheus/client_golang\n\t* github.com/prometheus/client_golang/prometheus\n\t* github.com/prometheus/client_golang/prometheus/promhttp\n\t* github.com/ROCm/amdsmi\n\n\t```amd_smi_exporter/src$ make```\n\nThe aforementioned steps will create the \"amd_smi_exporter\" GO binary file.\nTo install the binary in /usr/local/bin, and install the service file in\nthe /etc/systemd/system directory, one may execute:\n\n\t```$ sudo make install```\n\n## Building the container for the GO Exporter\n\nOnce the GO Exporter is built, one may proceed to create a containerized micro service of the go executable by executing the following commands:\n\nPrerequisite: docker version 20.10.12 or later must be installed on the build server 
for the\ncontainer build to succeed.\n\n* Execute \"make container_clean\" to clean pre-existing images and configuration of the container image.\n\n\t```amd_smi_exporter/src$ make container_clean```\n\n* Build the fresh container image with the following command:\n\n\t```amd_smi_exporter/src$ make container```\n\n  This command builds the container image, which is then listed when the user issues the\n```sudo docker images``` command.\nA tarball of the container image, \"k8/amd_smi_exporter_container.tar\", is also saved in\nthe \"k8\" directory, and this may be used to deploy the container manually on the respective\nnodes of the kubernetes cluster using the \"k8/daemonset.yaml\" file.\n\n## Grafana Dashboard:\n\nJSON files for the Grafana dashboards are available under grafana/ of this repo:\n* AMDSmiExporter_CPU_GrafanaDashboard.json\n* AMDSmiExporter_GPU_GrafanaDashboard.json\n\n## Dependencies\n\nPlease ensure the following are in place:\n1. amdsmi library with goamdsmi_shim bindings installed under \"/opt/rocm\"\n2. GO v1.20\n3. 
Docker (tested on v20.10.12 or later)\n\n### GO Installation:\n\nTo run on AMD rocm dockers, GO installation through apt install on Linux is only supported up to version 1.18.\nManual installation can be done from here: \u003chttps://go.dev/dl/\u003e\nBelow is an example of installing version 1.20.12 of go.\n\n\t$ wget -L \"https://golang.org/dl/go1.20.12.linux-amd64.tar.gz\"\n\t$ tar -xf \"go1.20.12.linux-amd64.tar.gz\"\n\t$ cd go/\n\t$ ls -l\n\t$ cd ..\n\t$ sudo chown -R root:root ./go\n\t$ sudo mv -v go /usr/local\n\t$ export GOPATH=$HOME/go\n\t$ export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin\n\t\n\nAdd the amdsmi library path to the LD_LIBRARY_PATH environment variable and export it.\n\n\t$ export LD_LIBRARY_PATH=\u003cpath_to_amdsmi_library\u003e\n\n## Running the GO Exporter\n\nNOTE: Only one instance of the GO Exporter may be run on the server, either as a \nstandalone service, or as a containerized micro service (started with \"docker run\"\nor as a daemonSet of a kubernetes deployment).\n\nPrerequisite: To ensure that AMD custom parameters defined in the \namd-smi-custom-rules.yml file are found in the promql queries, add \nthe following rule_files and scrape_configs to the \n/etc/prometheus/prometheus.yml file:\n\nrule_files:\n  - \"amd-smi-custom-rules.yml\"\n\nscrape_configs:\n  - job_name: \"prometheus\"\n  - job_name: \"amd-smi-exporter\"\n    static_configs:\n      - targets: [\"localhost:2021\"]\n\n### Custom rules\n\nThe prometheus query language allows users to customize queries based on their requirements.\nThe customizations may be added to the /usr/local/bin/prometheus/amd-smi-custom-rules.yml file.\nHere are a few sample queries that may be built over the aforementioned objects:\n\n* \u003e ### amd_core_energy{thread=\"101\"}/1000000\n\tDisplays the core energy reported for thread 101, divided by 1000000 (shifted by six decimal places).\n\n* \u003e ### amd_socket_power/100 \u003e 650.00\n\tRule to check if socket power consumption has gone over 650.00\n\n* \u003e ### amd_prochot_status != 
0\n\tAlert to check if PROC_HOT status has been triggered\n\n## Executing the Go Exporter\n\nThe GO exporter may be run manually in the following ways:\n\n### 1. Executing the \"amd_smi_exporter\" GO binary:\n\n\t```amd_smi_exporter/src$ ./amd_smi_exporter```\n\n### 2. As a systemd daemon:\n\n\t```$ sudo systemctl daemon-reload```\n\n\t```$ sudo service prometheus restart```\n\n\t```$ sudo service amd-smi-exporter start```\n\n### 3. As a containerized micro service that may be started manually or as a kubernetes daemonSet:\n\nThis assumes the user has a running docker daemon and a kubernetes cluster.\n\n#### On a server node that is not a part of a kubernetes cluster, one may execute the following command:\n\n\t```$ sudo docker run -d --name amd-exporter --device=/dev/cpu --device=/dev/kfd\n           --device=/dev/dri --privileged -p 2021:2021 amd_smi_exporter_container:0.1```\n\n   Alternatively, the docker image tarball of the container may be copied to each\n   kubernetes cluster node and loaded on the worker node. The daemonSet may then be applied\n   from the master node as follows:\n\n#### On the worker node, copy the amd_smi_exporter_container.tar image file and execute:\n\n\t```$ sudo docker load -i amd_smi_exporter_container.tar```\n\n\tOn the master node, copy the daemonset.yaml file and execute:\n\n\t```$ kubectl apply -f daemonset.yaml```\n\n\tThis will deploy a single running instance of the AMD SMI Exporter container micro\n\tservice on each of the worker nodes of the kubernetes cluster. The daemonset.yaml file may\n\tbe edited to apply taints for nodes where the exporter is not expected to run in\n\tthe cluster.\n\n## Supported hardware\n\nAMD EPYC(TM) line of server CPU Families:\n\n* AMD CPU Family `19h` Models `0h-Fh` (Milan), `10h-1Fh` (Genoa), `A0h-AFh`.\n* AMD CPU Family `1Ah` Models `0h-Fh` (Turin), `10h-1Fh`.\n* AMD APU Family `19h` Models `90h-9Fh`.\n* AMD GPUs MI200 and MI300.\n\n## Examples\n\n### CPU core metrics\n\n#### 1. 
amd_core_energy\n\t### Description: Displays the per-core energy consumption of the processor so far.\nThis object may be queried at the core level or the thread level. The values reported by\nthe threads in a hyperthreaded core will be the same. This object query will report the\nenergy counter values for all threads. To query a single thread (let's say the thread number\nis 101), the user may use the following query:\n\n\tamd_core_energy{thread=\"101\"}\n\n\t### Type: Counter\n\t### Property: Read-only\n\n#### 2. amd_boost_limit\n\t### Description: Displays the per-core boost limit that the core is operating at.\n\t### Type: Gauge\n\t### Property: Read-only\n\n### CPU Socket metrics\n\n#### 3. amd_socket_energy\n\t### Description: Displays the per-socket cumulative energy consumed by all cores\nso far. This value excludes the energy consumed by the AID (Active Interposer Die). To query\na single socket (let's say socket 2), the user may use the following query:\n\n\tamd_socket_energy{socket=\"2\"}\n\n\t### Type: Counter\n\t### Property: Read-only\n\n#### 4. amd_socket_power\n\t### Description: Displays the per-socket power consumed. This is a real-time gauge\nvalue that is queried at the scrape interval.\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 5. amd_power_limit\n\t### Description: Displays the power limit at which the processor is operating.\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 6. amd_prochot_status\n\t### Description: Displays a binary value of \"0\" or \"1\", where \"1\" implies that the\nPROC_HOT status of the processor has been triggered.\n\t### Type: Gauge\n\t### Property: Read-only\n\n### System\n\n#### 7. amd_num_sockets\n\t### Description: Displays the number of processor sockets in the system.\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 8. 
amd_num_threads\n\t### Description: Displays the total number of threads (logical CPUs) in the system.\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 9. amd_num_threads_per_core\n\t### Description: Displays the number of threads (logical CPUs) per core.\n\t### Type: Gauge\n\t### Property: Read-only\n\n### GPU Metrics\n\n#### 10. amd_num_gpus\n\t### Description: Displays the number of GPUs\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 11. amd_gpu_dev_id\n\t### Description: Displays the device ID of the GPU\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 12. amd_gpu_power_cap\n\t### Description: Displays the GPU power cap\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 13. amd_gpu_power_avg\n\t### Description: Displays the average power consumed by the GPU\n\t### Type: Counter\n\t### Property: Read-only\n\n#### 14. amd_gpu_current_temperature\n\t### Description: Displays the current temperature of the GPU\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 15. amd_gpu_SCLK\n\t### Description: Displays the GPU SCLK frequency\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 16. amd_gpu_MCLK\n\t### Description: Displays the GPU MCLK frequency\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 17. amd_gpu_Usage\n\t### Description: Displays the GPU usage percentage\n\t### Type: Gauge\n\t### Property: Read-only\n\n#### 18. 
amd_gpu_memory_busy percent\n\t### Description: Displays the GPU memory busy percentage\n\t### Type: Gauge\n\t### Property: Read-only\n\n## FAQs:\n\n* If the prometheus service fails to start properly,\n   run the command ```journalctl -u prometheus -f --no-pager``` to follow its logs.\n\n* If an issue is related to \"Web listener busy\" or \"Port is already in use\",\n  change the port from 9090 to 9091 in the following files\n\n\t* /etc/systemd/system/prometheus.service file\n\t\t- under line \"--web.listen-address=0.0.0.0:9090\"\n\t* /etc/prometheus/prometheus.yml file\n\t\t- under line \"targets: [\"localhost:9090\"]\"\n\n\tand restart the systemd service using the command \"service prometheus restart\".\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famd%2Famd_smi_exporter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famd%2Famd_smi_exporter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famd%2Famd_smi_exporter/lists"}