{"id":21915605,"url":"https://github.com/clustercockpit/go-rocm-smi","last_synced_at":"2025-10-09T14:30:22.418Z","repository":{"id":38826119,"uuid":"493854812","full_name":"ClusterCockpit/go-rocm-smi","owner":"ClusterCockpit","description":"Golang interface for the AMD ROCm SMI library","archived":false,"fork":false,"pushed_at":"2023-06-12T14:15:17.000Z","size":134,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-29T05:08:38.530Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ClusterCockpit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-18T23:24:10.000Z","updated_at":"2024-07-08T16:17:11.000Z","dependencies_parsed_at":"2024-06-20T07:12:25.021Z","dependency_job_id":"792a387b-2dfe-4d39-9fb7-c24ab5fb5686","html_url":"https://github.com/ClusterCockpit/go-rocm-smi","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fgo-rocm-smi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fgo-rocm-smi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fgo-rocm-smi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ClusterCockpit%2Fgo-rocm-smi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Clus
terCockpit","download_url":"https://codeload.github.com/ClusterCockpit/go-rocm-smi/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235827195,"owners_count":19051205,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-28T19:13:04.087Z","updated_at":"2025-10-09T14:30:17.010Z","avatar_url":"https://github.com/ClusterCockpit.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\n\nThis is an unofficial interface to the AMD ROCM SMI library for Golang applications. It is heavily\ninspired by [`go-nvml`](https://github.com/NVIDIA/go-nvml) by also using [`cgo`](https://golang.org/cmd/cgo/), [`c-for-go`](https://c.for-go.com/) and its [`dlopen` wrapper](https://github.com/NVIDIA/go-nvml/tree/main/pkg/dl).\n\nThis Golang interface is planned to be used in [cc-metric-collector](https://github.com/ClusterCockpit/cc-metric-collector).\n\n**Disclaimer**: These bindings are created without any collaboration with AMD. Use them as you like but we, the developers of these bindings, are not responsible for any damage or anything that was caused by them. 
If you want official Golang bindings for the ROCm SMI library, use [this](https://github.com/amd/go_amd_smi) package.\n\n# Usage\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\n\t\"github.com/ClusterCockpit/go-rocm-smi/pkg/rocm_smi\"\n)\n\nfunc main() {\n\tret := rocm_smi.Init()\n\tif ret != rocm_smi.STATUS_SUCCESS {\n\t\tlog.Fatalf(\"Unable to initialize ROCM SMI: %v\", rocm_smi.StatusStringNoError(ret))\n\t}\n\tdefer func() {\n\t\tret := rocm_smi.Shutdown()\n\t\tif ret != rocm_smi.STATUS_SUCCESS {\n\t\t\tlog.Fatalf(\"Unable to shutdown ROCM SMI: %v\", rocm_smi.StatusStringNoError(ret))\n\t\t}\n\t}()\n\n\tcount, ret := rocm_smi.NumMonitorDevices()\n\tif ret != rocm_smi.STATUS_SUCCESS {\n\t\tlog.Fatalf(\"Unable to get device count: %v\", rocm_smi.StatusStringNoError(ret))\n\t}\n\n\tfor i := 0; i \u003c count; i++ {\n\t\tdevice, ret := rocm_smi.DeviceGetHandleByIndex(i)\n\t\tif ret != rocm_smi.STATUS_SUCCESS {\n\t\t\tlog.Fatalf(\"Unable to get device at index %d: %v\", i, rocm_smi.StatusStringNoError(ret))\n\t\t}\n\n\t\tuuid, ret := device.GetUniqueId()\n\t\tif ret != rocm_smi.STATUS_SUCCESS {\n\t\t\tlog.Fatalf(\"Unable to get uuid of device at index %d: %v\", i, rocm_smi.StatusStringNoError(ret))\n\t\t}\n\n\t\tfmt.Printf(\"%v\\n\", uuid)\n\t}\n}\n```\n\nThe `librocm_smi64.so` is dynamically loaded by the `rocm_smi` package. Make sure that the directory containing this library is in your `LD_LIBRARY_PATH`.\n\n# Documentation\nSee [pkg.go.dev](https://pkg.go.dev/github.com/ClusterCockpit/go-rocm-smi).\n\n# Generating the bindings\n\n## ROCm SMI Headers\n\nThere are three ROCm SMI Headers, all located at `rocm_smi/rocm_smi`\n- `rocm_smi.h`\n- `rocm_smi64Config.h`\n- `kfd_ioctl.h`\n\nThe files are copied from ROCm 5.1.0. 
For the generation, the `rocm_smi.h` header is modified so that the [`c-for-go`](https://c.for-go.com/) parser can handle it.\n- All occurrences of `uint64_t` are changed to `unsigned long long`, otherwise [`c-for-go`](https://c.for-go.com/) would not use Golang's `uint64` type.\n- All occurrences of `int64_t` are changed to `long long`, otherwise [`c-for-go`](https://c.for-go.com/) would not use Golang's `int64` type.\n- The `union id` is renamed to `union id_rename` to avoid problems with clang. The type is never referenced by the name `id`, only by its `typedef` name.\n\n## Generation\n\nThe bindings are generated by calling [`c-for-go`](https://c.for-go.com/) with `rocm_smi.yml` as input.\n\n## Post processing\n\nAfter the generation, the `types.go` file still contains the C types, but Golang types are more suitable\nfor them. Luckily, [`cgo`](https://golang.org/cmd/cgo/) has a bootstrapping option `-godefs` to\ngenerate the Go types.\n\nBefore:\n```go\ntype RSMI_pcie_bandwidth C.rsmi_pcie_bandwidth_t\n```\nAfter:\n```go\ntype RSMI_pcie_bandwidth struct {\n\tRate\tRSMI_frequencies\n\tLanes\t[32]uint32\n}\n```\n\n## Manual labor\n\nIn the end, the generated functions are wrapped to give them a more idiomatic Golang style. This is similar to the\nwrappers created in [`go-nvml`](https://github.com/NVIDIA/go-nvml). 
Most of them are straightforward\nwith a little bit of casting.\n\n```go\n// rocm_smi.DeviceGetSerial()\nfunc DeviceGetSerial(Device DeviceHandle) (string, RSMI_status) {\n\tvar Serial []byte = make([]byte, 100)\n\tsptr := \u0026Serial[0]\n\tret := rsmi_dev_serial_number_get(Device.index, sptr, 100)\n\treturn bytes2String(Serial), ret\n}\n\nfunc (Device DeviceHandle) DeviceGetSerial() (string, RSMI_status) {\n\treturn DeviceGetSerial(Device)\n}\n```\n\n\n\n# The device index and the \"device handle\"\n\nIn most libraries that handle multiple devices ([`go-nvml`](https://github.com/NVIDIA/go-nvml) is an example), the user first requests a handle for each device, usually through its logical index in the list of available devices. The official `rocm_smi` library uses the logical index directly, but to get everything right you have to do quite some work to find out what is supported. The `rocm_smi` library provides a feature (`APISupport` in `rocm_smi.h`) to determine which functions are supported for a device and, if a function accepts arguments, which ones are valid for this device. An example is the function to get the firmware version, together with the list of GPU parts that provide such a version. The `go-rocm-smi` bindings introduce a virtual type `DeviceHandle`, retrievable through the logical index with `DeviceGetHandleByIndex()` (so similar to [`go-nvml`](https://github.com/NVIDIA/go-nvml)), which encapsulates the `APISupport` lookup. The `DeviceHandle` is used for all device-related calls in `go-rocm-smi`. You can get the logical index with `deviceHandle.Index()`, the **not** unique ID of a GPU with `deviceHandle.ID()` and the list of supported functions with `deviceHandle.Supported()`.\n\n\n# Problems\n- One big problem currently is that [`c-for-go`](https://c.for-go.com/) does not generate `uint64` types for the C type `uint64_t`. It is one of the main data types used in the ROCm SMI headers. 
While I was able to generate underlying code for `uint64_t`, the Golang function still uses `uint32`:\n  ```C\n  rsmi_status_t rsmi_dev_unique_id_get(uint32_t dv_ind, uint64_t *id);\n  ```\n  Output:\n  ```go\n  func rsmi_dev_unique_id_get(Dv_ind uint32, Id *uint32) RSMI_status {\n\tcDv_ind, cDv_indAllocMap := (C.uint32_t)(Dv_ind), cgoAllocsUnknown\n\tcId, cIdAllocMap := (*C.uint64_t)(unsafe.Pointer(Id)), cgoAllocsUnknown\n\t__ret := C.rsmi_dev_unique_id_get(cDv_ind, cId)\n\truntime.KeepAlive(cIdAllocMap)\n\truntime.KeepAlive(cDv_indAllocMap)\n\t__v := (RSMI_status)(__ret)\n\treturn __v\n  }\n  ```\n  One can see that `cId` is cast to `*C.uint64_t`, but the `Id` parameter of the function is `*uint32`. I was not able to persuade [`c-for-go`](https://c.for-go.com/) to use `uint64`. See also https://github.com/xlab/c-for-go/issues/120. As a workaround, `uint64_t` is replaced by `unsigned long long` and `int64_t` is replaced by `long long`, see `Makefile`. Interestingly, the translation of the C types to Golang types with [`cgo`](https://golang.org/cmd/cgo/) generates `uint64` even without the type replacement in the header. If we did not use `unsigned long long`, the `uint32` generated by [`c-for-go`](https://c.for-go.com/) would clash with the `uint64` generated by [`cgo`](https://golang.org/cmd/cgo/).\n\n- The symbol `rsmi_dev_sku_get` is declared in the `rocm_smi.h` header, but on the test system with ROCm 5.1.0 the symbol lookup fails. There is now an `updateFunctionPointers()` function that is called at `Init()`. This is quite similar to the function `updateVersionedSymbols()` in [`go-nvml`](https://github.com/NVIDIA/go-nvml). 
The `APISupport` feature of the `rocm_smi` library reports that `rsmi_dev_sku_get` is supported by the device.\n\n- The function `rsmi_status_string` cannot use the wrapper generated by [`c-for-go`](https://c.for-go.com/) because it requires a pointer to a `char` array while [`c-for-go`](https://c.for-go.com/) wants to use the `char` array directly. A manually written version, `StatusString()`, returns the status string instead. Using it in print statements (see the usage example) is awkward because `rsmi_status_string` accepts a status and returns both a new status and the string. To drop the new status, use `StatusStringNoError()`.\n\n- I haven't found a way to access the `Build` field in `RSMI_version`. It is a `char*` in `rocm_smi` but [`c-for-go`](https://c.for-go.com/) generates an `*int8` entry for it.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclustercockpit%2Fgo-rocm-smi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclustercockpit%2Fgo-rocm-smi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclustercockpit%2Fgo-rocm-smi/lists"}