https://github.com/djenriquez/cudamon
NVIDIA Cuda Monitor
- Host: GitHub
- URL: https://github.com/djenriquez/cudamon
- Owner: djenriquez
- License: mit
- Created: 2018-01-25T05:12:05.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-01-26T04:19:11.000Z (over 7 years ago)
- Last Synced: 2025-02-01T18:42:48.991Z (3 months ago)
- Language: Python
- Size: 19.5 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# cudamon
NVIDIA Cuda Monitor

# Summary
CUDAMon is an NVIDIA GPU monitor that can notify you of problematic events. It monitors a metric against a configured threshold and notifies you when that threshold is crossed. Supported metrics:
- Temperature
- Utilization

## Dependencies:
- Docker-CE (17.02+ recommended)
- [NVIDIA-Docker](https://github.com/NVIDIA/nvidia-docker)
- AWS Account (for access to SNS)

## Environment Variables:
- `TIMEOUT_MINS`: The length of time in minutes between notification alerts. Default 10.
- `POLL_SECS`: The length of time in seconds between each nvidia-smi poll. Default 60.
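- `GPU_UTIL_`: The utilization threshold that the specific GPU model must stay above. The GPU model follows the prefix, as in `GPU_UTIL_1070`. Default 80.
- `GPU_TEMP_`: The temperature threshold that the specific GPU model must stay below. The GPU model follows the prefix, as in `GPU_TEMP_1080_TI`. Default 75.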
- `SNS_TOPIC_ARN`: The AWS SNS topic ARN to publish to. Note: SNS messages are published in `raw` message structure.
- `AWS_ACCESS_KEY_ID`: The AWS Access Key ID with publish access to the SNS topic `${SNS_TOPIC_ARN}`.
- `AWS_SECRET_ACCESS_KEY`: The AWS Secret Access Key associated with `${AWS_ACCESS_KEY_ID}`.

# Run
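
As background for the command in this section, here is a minimal sketch of how the per-model threshold variables described above might be interpreted. It is not taken from the CUDAMon source. The helper names (`parse_thresholds`, `threshold_for`, `find_violations`), the name-matching rule, and the shape of the readings are all assumptions made for illustration. Only the `GPU_UTIL_` and `GPU_TEMP_` prefixes, the defaults of 80 and 75, and the "stay above" and "stay below" semantics come from this README.

```python
import os

DEFAULT_UTIL = 80  # documented default for GPU_UTIL_
DEFAULT_TEMP = 75  # documented default for GPU_TEMP_


def parse_thresholds(env, prefix):
    """Collect {model_suffix: threshold} from variables that start with prefix."""
    return {
        key[len(prefix):].upper(): int(value)
        for key, value in env.items()
        if key.startswith(prefix) and key != prefix
    }


def threshold_for(gpu_name, thresholds, default):
    """Pick the threshold whose model suffix matches the end of a GPU name.

    Assumption: names are normalized ("GeForce GTX 1080 Ti" becomes
    "GEFORCE_GTX_1080_TI") and the longest matching suffix wins.
    """
    normalized = gpu_name.upper().replace(" ", "_")
    matches = [model for model in thresholds if normalized.endswith(model)]
    if not matches:
        return default
    return thresholds[max(matches, key=len)]


def find_violations(readings, env=os.environ):
    """Return a message for every reading that crosses its threshold."""
    util = parse_thresholds(env, "GPU_UTIL_")
    temp = parse_thresholds(env, "GPU_TEMP_")
    problems = []
    for r in readings:
        min_util = threshold_for(r["name"], util, DEFAULT_UTIL)
        max_temp = threshold_for(r["name"], temp, DEFAULT_TEMP)
        if r["util"] < min_util:  # utilization must stay above its threshold
            problems.append(f"{r['name']}: utilization {r['util']} is below {min_util}")
        if r["temp"] > max_temp:  # temperature must stay below its threshold
            problems.append(f"{r['name']}: temperature {r['temp']} is above {max_temp}")
    return problems


if __name__ == "__main__":
    # Hard-coded stand-ins for values that would be parsed from nvidia-smi output.
    sample_env = {"GPU_UTIL_1070": "90", "GPU_TEMP_1080_TI": "85"}
    sample_readings = [
        {"name": "GeForce GTX 1070", "util": 50, "temp": 70},
        {"name": "GeForce GTX 1080 Ti", "util": 95, "temp": 88},
    ]
    for message in find_violations(sample_readings, sample_env):
        print(message)
```

With the sample values, the 1070 is flagged for low utilization and the 1080 Ti for high temperature. The 1080 Ti has no utilization override in `sample_env`, so it falls back to the default of 80.
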
```
docker run -d --name cudamon \
--runtime=nvidia \
-e TIMEOUT_MINS=10 \
-e POLL_SECS=15 \
-e GPU_UTIL_1070=90 \
-e GPU_UTIL_1080=90 \
-e GPU_TEMP_1080_TI=85 \
-e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
-e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
-e SNS_TOPIC_ARN=${SNS_TOPIC_ARN} \
djenriquez/cudamon
```

## Donations
- BTC - 33DyXVuy3R5jfLZRRpEQcXXAJ1Xz5rkGxE
- LTC - MUaov1JidbnpfeuQiSR3mtJhN3CN8Wj5g9
- ETH - 0xCBBC579Ac1Bc4868823fbBb2D8dDaFF93D619ceD
- DASH - Xy4cgJVAiHsrbeBB53NeQWk2iXKoWjBvJp
- ZEC - t1gYs8Zn2ZCFZWKZsTmZWd5bgXa9eD8M87K
- BCH - LOL