https://github.com/ohsawa0515/gcp-gpu-stackdriver-reporting
This repository provides a tool that sends metrics on GPU utilization on Google Compute Engine (GCE) to Stackdriver.
- Host: GitHub
- URL: https://github.com/ohsawa0515/gcp-gpu-stackdriver-reporting
- Owner: ohsawa0515
- License: mit
- Created: 2019-11-15T04:00:07.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-12-16T07:14:46.000Z (over 6 years ago)
- Last Synced: 2024-06-20T11:52:05.177Z (almost 2 years ago)
- Topics: gce, gcp, go, golang, nvidia, nvml, stackdriver
- Language: Go
- Homepage:
- Size: 21.5 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# gcp-gpu-stackdriver-reporting
This repository provides a tool that sends metrics on GPU utilization on Google Compute Engine (GCE) to Stackdriver.
This tool supports Linux only:
- Ubuntu 16.04/18.04
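
To make the reporting idea concrete, the sketch below shows one way a per-GPU sample could be mapped to Stackdriver custom metric types. It is illustrative only and is not taken from this repository's source: the `GPUSample` type, the `metricType` and `memUsedPercent` helpers, and the metric names `gpu/utilization` and `gpu/memory_used_percent` are all hypothetical. The only convention assumed is that user-defined Stackdriver metrics live under the `custom.googleapis.com/` prefix.

```go
package main

import "fmt"

// GPUSample is a hypothetical snapshot of one GPU, for example as it
// might be read through NVML. It is not a type from this tool.
type GPUSample struct {
	Index         int    // GPU index on the host
	UtilPercent   uint   // GPU utilization, 0-100
	MemUsedBytes  uint64 // device memory in use
	MemTotalBytes uint64 // total device memory
}

// customMetricPrefix is the prefix used for user-defined Stackdriver metrics.
const customMetricPrefix = "custom.googleapis.com/"

// metricType builds a full custom metric type from a short name.
// The short names passed in below are assumptions for illustration.
func metricType(name string) string {
	return customMetricPrefix + name
}

// memUsedPercent returns memory usage as a percentage of total memory.
// It returns 0 when the total is unknown, to avoid dividing by zero.
func memUsedPercent(s GPUSample) float64 {
	if s.MemTotalBytes == 0 {
		return 0
	}
	return float64(s.MemUsedBytes) / float64(s.MemTotalBytes) * 100
}

func main() {
	// Example values only; a real reporter would read these from the GPU.
	s := GPUSample{Index: 0, UtilPercent: 42, MemUsedBytes: 4 << 30, MemTotalBytes: 16 << 30}

	fmt.Printf("%s gpu=%d value=%d\n", metricType("gpu/utilization"), s.Index, s.UtilPercent)
	fmt.Printf("%s gpu=%d value=%.1f\n", metricType("gpu/memory_used_percent"), s.Index, memUsedPercent(s))
}
```

Keeping the sample type separate from the metric naming means the collection side (NVML) and the reporting side (Stackdriver) can be changed independently.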
## Installation
### Download binary
Download the archive from the [releases page](https://github.com/ohsawa0515/gcp-gpu-stackdriver-reporting/releases) and extract the binary to `/usr/local/bin`.
```console
$ curl -L -O https://github.com/ohsawa0515/gcp-gpu-stackdriver-reporting/releases/download//gcp-gpu-stackdriver-reporting_linux_amd64.tar.gz
$ tar zxf gcp-gpu-stackdriver-reporting_linux_amd64.tar.gz
$ mv ./gcp-gpu-stackdriver-reporting /usr/local/bin/
$ chmod +x /usr/local/bin/gcp-gpu-stackdriver-reporting
```
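
The download URL in the command above has an empty path segment between `download/` and the file name, which is where a release tag normally appears in a GitHub release asset URL. As a minimal sketch, assuming the asset name stays as shown, the URL can be assembled from a tag picked on the releases page. The `releaseURL` helper and the `TAG` placeholder are illustrative and not part of this tool.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	repoURL   = "https://github.com/ohsawa0515/gcp-gpu-stackdriver-reporting"
	assetName = "gcp-gpu-stackdriver-reporting_linux_amd64.tar.gz"
)

// releaseURL returns the asset download URL for the given release tag.
// It rejects an empty tag, which would produce the "download//" form.
func releaseURL(tag string) (string, error) {
	if tag == "" {
		return "", errors.New("release tag must not be empty")
	}
	return repoURL + "/releases/download/" + tag + "/" + assetName, nil
}

func main() {
	// "TAG" is a placeholder; substitute a real tag from the releases page.
	url, err := releaseURL("TAG")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```

The printed URL can then be passed to `curl -L -O` in place of the one shown above.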
### go get
```console
$ go get github.com/ohsawa0515/gcp-gpu-stackdriver-reporting
$ mv $GOPATH/bin/gcp-gpu-stackdriver-reporting /usr/local/bin/
$ chmod +x /usr/local/bin/gcp-gpu-stackdriver-reporting
```
## Run as a systemd service
```console
$ cat <<-EOH > /lib/systemd/system/gcp-gpu-stackdriver-reporting.service
[Unit]
Description=GPU Utilization Metric Reporting
[Service]
Type=simple
PIDFile=/run/gcp-gpu-stackdriver-reporting.pid
ExecStart=/usr/local/bin/gcp-gpu-stackdriver-reporting
User=root
Group=root
WorkingDirectory=/
Restart=always
[Install]
WantedBy=multi-user.target
EOH
$ systemctl daemon-reload
$ systemctl enable gcp-gpu-stackdriver-reporting.service
$ systemctl start gcp-gpu-stackdriver-reporting.service
```
## Run as a Docker container
The NVIDIA driver and the NVIDIA container runtime are required. Install them by following the [nvidia-docker quickstart](https://github.com/NVIDIA/nvidia-docker#quickstart).
```console
$ docker pull ohsawa0515/gcp-gpu-stackdriver-reporting:latest
$ docker run -d --runtime=nvidia --rm ohsawa0515/gcp-gpu-stackdriver-reporting:latest
```
## Run on Google Kubernetes Engine (GKE)
### Create a GKE cluster and node pools with GPUs
See the [GKE documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus?hl=en#gpu_pool).
### Install NVIDIA GPU device drivers
```console
$ kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```
### Deploy gcp-gpu-stackdriver-reporting to GKE as a DaemonSet
```console
$ kubectl apply -f daemonset-sample.yaml
```