https://github.com/abinthomasonline/gpu-stat
A simple tool to monitor NVIDIA GPU usage on remote machines
- Host: GitHub
- URL: https://github.com/abinthomasonline/gpu-stat
- Owner: abinthomasonline
- License: mit
- Created: 2025-03-22T12:23:28.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-25T18:09:28.000Z (7 months ago)
- Last Synced: 2025-07-20T16:42:13.591Z (3 months ago)
- Topics: click, gpu, mlops, nvidia-smi, paramiko, python, ssh, streamlit
- Language: Python
- Homepage:
- Size: 141 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GPU Stat Monitor
A simple tool to monitor NVIDIA GPU usage on remote machines with minimal setup.

## Overview
GPU Stat Monitor is a Python-based tool that:
1. Connects to remote machines via SSH
2. Collects GPU stats by running `nvidia-smi` commands remotely
3. Stores the data locally in CSV format
4. Provides a web-based dashboard for visualization

## Features
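Collection only shells out to `nvidia-smi`, which is why nothing needs to be installed on the remote side beyond the NVIDIA driver. The collection and storage steps from the overview can be sketched with `paramiko` and the Python standard library. This is a hedged illustration, not gpu-stat's actual code: the helper names (`run_remote`, `parse_csv`, `append_samples`), the chosen field lists, and the one-timestamp-per-poll CSV layout are all assumptions. The `nvidia-smi` flags used (`--query-gpu`, `--query-compute-apps`, `--format=csv,noheader,nounits`) are standard.

```python
import csv
import io
import os
from datetime import datetime, timezone

# Assumed field lists: they mirror the metrics named in this README but are
# not taken from the gpu-stat source. All are standard nvidia-smi query fields.
GPU_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total",
              "temperature.gpu", "power.draw"]
PROC_FIELDS = ["gpu_uuid", "pid", "process_name", "used_memory"]

# "nounits" drops suffixes such as "%", "MiB" and "W" so values parse as numbers.
GPU_CMD = ("nvidia-smi --query-gpu=" + ",".join(GPU_FIELDS)
           + " --format=csv,noheader,nounits")
PROC_CMD = ("nvidia-smi --query-compute-apps=" + ",".join(PROC_FIELDS)
            + " --format=csv,noheader,nounits")


def run_remote(host, user, key_path, command, port=22):
    """Run one command on a remote machine over SSH and return its stdout."""
    import paramiko  # lazy import: the parsing helpers below do not need it

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # host must already be in ~/.ssh/known_hosts
    try:
        client.connect(host, port=port, username=user,
                       key_filename=os.path.expanduser(key_path))
        _, stdout, stderr = client.exec_command(command)
        out = stdout.read().decode()
        if stdout.channel.recv_exit_status() != 0:
            raise RuntimeError(stderr.read().decode().strip())
        return out
    finally:
        client.close()


def _to_number(value):
    """Convert '87' -> 87 and '245.30' -> 245.3; keep e.g. '[N/A]' as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_csv(text, fields):
    """Parse `--format=csv,noheader,nounits` output into one dict per line."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:  # skip blank lines
            rows.append({f: _to_number(v.strip()) for f, v in zip(fields, record)})
    return rows


def append_samples(path, server, rows, fields):
    """Append timestamped rows to a local CSV file, writing the header once."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["timestamp", "server"] + fields)
        if new_file:
            writer.writeheader()
        for row in rows:
            writer.writerow({"timestamp": stamp, "server": server, **row})
```

A polling loop along these lines would call `run_remote(...)` with `GPU_CMD` and `PROC_CMD` every `interval` seconds for each configured server, then pass the parsed rows to `append_samples`; how files are named inside `data_dir` is left open here.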
- Single command to start monitoring: `gpu-stat run`
- Multiple remote machine support
- Simple SSH configuration
- Zero setup on remote machines (only requires NVIDIA drivers)
- Metrics tracked:
  - GPU utilization
  - Memory usage
  - Temperature
  - Power consumption
  - Running processes

## Installation
```bash
# install from git url
pip install git+https://github.com/abinthomasonline/gpu-stat.git
```

```bash
# or clone and install
git clone https://github.com/abinthomasonline/gpu-stat.git
cd gpu-stat
pip install .
```

## Usage
### Basic Usage
```bash
# Start monitoring the servers listed in config.yaml
gpu-stat run --config config.yaml
```

### Configuration File
Create a `config.yaml` file:
```yaml
servers:
  - name: ML Server 1
    host: ml1.example.com
    user: researcher
    key_path: ~/.ssh/ml_key
    interval: 5  # seconds
    port: 12889

  - name: ML Server 2
    host: ml2.example.com
    user: researcher
    key_path: ~/.ssh/ml_key
    interval: 10
    port: 22

settings:
  data_dir: ./data
  default_interval: 5
```

## Project Structure
```
gpu-stat/
├── gpu_stat/
│   ├── __init__.py
│   ├── cli.py             # Command-line interface
│   ├── ssh_client.py      # SSH connection handler
│   ├── data_collector.py  # Remote data collection
│   ├── data_store.py      # Local data storage client
│   └── dashboard.py       # Streamlit dashboard
└── setup.py
```

## Contributing
Contributions are welcome! Please open an issue or submit a pull request.
## License
MIT License. See [LICENSE](LICENSE) for details.