{"id":30624524,"url":"https://github.com/inureyes/all-smi","last_synced_at":"2025-08-30T17:17:41.003Z","repository":{"id":303241844,"uuid":"840533937","full_name":"inureyes/all-smi","owner":"inureyes","description":"Command-line utility for monitoring GPU hardware.","archived":false,"fork":false,"pushed_at":"2025-08-29T06:53:09.000Z","size":4011,"stargazers_count":81,"open_issues_count":2,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-08-29T08:32:35.313Z","etag":null,"topics":["backend-ai","cluster-computing","gpu","monitoring-tool"],"latest_commit_sha":null,"homepage":"https://github.com/inureyes/all-smi","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/inureyes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-08-10T00:02:11.000Z","updated_at":"2025-08-29T06:53:12.000Z","dependencies_parsed_at":"2025-07-06T15:27:08.391Z","dependency_job_id":"331431dd-3827-472b-867b-fb568a542640","html_url":"https://github.com/inureyes/all-smi","commit_stats":null,"previous_names":["inureyes/all-smi"],"tags_count":24,"template":false,"template_full_name":null,"purl":"pkg:github/inureyes/all-smi","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inureyes%2Fall-smi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inureyes%2Fall-smi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inureyes%2Fall-smi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inureyes
%2Fall-smi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/inureyes","download_url":"https://codeload.github.com/inureyes/all-smi/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/inureyes%2Fall-smi/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272878337,"owners_count":25008340,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["backend-ai","cluster-computing","gpu","monitoring-tool"],"created_at":"2025-08-30T17:17:40.496Z","updated_at":"2025-08-30T17:17:40.986Z","avatar_url":"https://github.com/inureyes.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# all-smi\n\n[![Crates.io version](https://img.shields.io/crates/v/all-smi.svg?style=flat-square)](https://crates.io/crates/all-smi)\n[![Crates.io downloads](https://img.shields.io/crates/d/all-smi.svg?style=flat-square)](https://crates.io/crates/all-smi)\n![CI](https://github.com/inureyes/all-smi/workflows/CI/badge.svg)\n[![dependency status](https://deps.rs/repo/github/inureyes/all-smi/status.svg)](https://deps.rs/repo/github/inureyes/all-smi)\n\n\n`all-smi` is a command-line utility for monitoring GPU and NPU hardware across multiple systems. 
It provides a real-time view of accelerator utilization, memory usage, temperature, power consumption, and other metrics. The tool is designed to be a cross-platform alternative to `nvidia-smi`, with support for NVIDIA GPUs, NVIDIA Jetson platforms, Apple Silicon GPUs, Tenstorrent NPUs, Rebellions NPUs, and Furiosa NPUs.\n\nThe application presents a terminal-based user interface with cluster overview, interactive sorting, and both local and remote monitoring capabilities. It also provides an API mode for Prometheus metrics integration.\n\n![screenshot](screenshots/all-smi-all-tab.png)\n\n\u003cp align=\"center\"\u003eAll-node view (remote mode)\u003c/p\u003e\n\n![screenshot](screenshots/all-smi-node-tab.png)\n\n\u003cp align=\"center\"\u003eNode view (remote mode)\u003c/p\u003e\n\n## Installation\n\n### Option 1: Install via Homebrew (macOS/Linux)\n\nThe easiest way to install all-smi on macOS and Linux is through Homebrew:\n\n```bash\nbrew tap lablup/tap\nbrew install all-smi\n```\n\n### Option 2: Install via Ubuntu PPA\n\nFor Ubuntu users, all-smi is available through the official PPA:\n\n```bash\n# Add the PPA repository\nsudo add-apt-repository ppa:lablup/backend-ai\nsudo apt update\n\n# Install all-smi\nsudo apt install all-smi\n```\n\nThe PPA provides automatic updates and is maintained for Ubuntu 22.04 (Jammy) and 24.04 (Noble).\n\n### Option 3: Install via Debian Package\n\nFor Debian and other Debian-based distributions, download the `.deb` package from the [releases page](https://github.com/inureyes/all-smi/releases):\n\n```bash\n# Download the latest .deb package (replace VERSION with the actual version)\nwget https://github.com/inureyes/all-smi/releases/download/vVERSION/all-smi_VERSION_OS_ARCH.deb\n# Example: all-smi_0.7.0_ubuntu24.04.noble_amd64.deb\n\n# Install the package\nsudo dpkg -i all-smi_VERSION_OS_ARCH.deb\n\n# If there are dependency issues, fix them with:\nsudo apt-get install -f\n```\n\n### Option 4: Download Pre-built Binary\n\nDownload 
the latest release from the [GitHub releases page](https://github.com/inureyes/all-smi/releases):\n\n1. Go to https://github.com/inureyes/all-smi/releases\n2. Download the appropriate binary for your platform\n3. Extract the archive and place the binary in your `$PATH`\n\n### Option 5: Install from Cargo\n\nInstall all-smi through Cargo:\n\n```bash\ncargo install all-smi\n```\n\nAfter installation, the binary will be available in your `$PATH` as `all-smi`.\n\n### Option 6: Build from Source\n\nSee [Building from Source](DEVELOPERS.md#building-from-source) in the developer documentation.\n\n## Usage\n\n### Command Overview\n\n```bash\n# Show help\nall-smi --help\n\n# Local monitoring (requires sudo on macOS) - default when no command specified\nall-smi\nsudo all-smi local\n\n# Remote monitoring (requires API endpoints)\nall-smi view --hosts http://node1:9090 http://node2:9090\nall-smi view --hostfile hosts.csv\n\n# API mode (expose metrics server)\nall-smi api --port 9090\n```\n\n### Local Mode (Monitor Local Hardware)\n\nThe `local` mode monitors your local GPUs/NPUs with a terminal-based interface. This is the default when no command is specified.\n\n```bash\n# Monitor local GPUs (requires sudo on macOS)\nall-smi              # Default to local mode\nsudo all-smi local   # Explicit local mode\n\n# With custom refresh interval\nsudo all-smi local --interval 5\n```\n\n### Remote View Mode (Monitor Remote Nodes)\n\nThe `view` mode monitors multiple remote systems that are running in API mode. This mode requires specifying remote endpoints.\n\n```bash\n# Direct host specification (required)\nall-smi view --hosts http://gpu-node1:9090 http://gpu-node2:9090\n\n# Using host file (required)\nall-smi view --hostfile hosts.csv --interval 2\n```\n\n**Note:** The `view` command requires either `--hosts` or `--hostfile`. 
For local monitoring, use `all-smi local` instead.\n\nHost file format (CSV):\n```\nhttp://gpu-node1:9090\nhttp://gpu-node2:9090\nhttp://gpu-node3:9090\n```\n\n## Features\n\n### GPU Monitoring\n- **Real-time Metrics:** Displays comprehensive GPU information including:\n  - GPU Name and Driver Version\n  - Utilization Percentage with color-coded status\n  - Memory Usage (Used/Total in GB)\n  - Temperature in Celsius (or Thermal Pressure for Apple Silicon)\n  - Clock Frequency in MHz\n  - Power Consumption in Watts (2 decimal precision for Apple Silicon)\n- **Multi-GPU Support:** Handles multiple GPUs per system with individual monitoring\n- **Interactive Sorting:** Sort GPUs by utilization, memory usage, or default (hostname+index) order\n- **Platform-Specific Features:**\n  - NVIDIA: PCIe info, performance states, power limits\n  - NVIDIA Jetson: DLA utilization monitoring\n  - Apple Silicon: ANE power monitoring, thermal pressure levels\n  - Tenstorrent NPUs: Real-time telemetry via luwen library, board-specific TDP calculations\n  - Rebellions NPUs: Performance state monitoring, KMD version tracking, device status\n  - Furiosa NPUs: Per-core PE utilization, power governor modes, firmware version tracking\n  \n### CPU Monitoring\n- **Comprehensive CPU Metrics:**\n  - Real-time CPU utilization with per-socket breakdown\n  - Core and thread counts\n  - Frequency monitoring (P+E format for Apple Silicon)\n  - Temperature and power consumption\n- **Apple Silicon Enhanced:**\n  - P-core and E-core utilization tracking\n  - P-cluster and E-cluster frequency monitoring\n  - Integrated GPU core count\n\n### Memory Monitoring\n- **System Memory Tracking:**\n  - Total, used, available, and free memory\n  - Memory utilization percentage\n  - Swap space monitoring\n  - Linux: Buffer and cache memory tracking\n- **Visual Indicators:** Color-coded memory usage bars\n\n### Process Monitoring\n- **Enhanced GPU Process View:**\n  - Process ID (PID) and Parent PID\n  - Process 
Name and Command Line\n  - GPU Memory Usage with per-column coloring\n  - CPU usage percentage\n  - User and State Information\n- **Advanced Features:**\n  - Mouse click sorting on column headers\n  - Multi-criteria sorting (PID, memory, GPU memory, CPU usage)\n  - Per-column color coding for better visibility\n  - Full process tree integration\n\n### Cluster Management\n- **Cluster Overview Dashboard:** Real-time statistics showing:\n  - Total nodes and GPUs across the cluster\n  - Average utilization and memory usage\n  - Temperature statistics with standard deviation\n  - Total and average power consumption\n- **Live Statistics History:** Visual graphs showing utilization, memory, and temperature trends\n- **Tabbed Interface:** Switch between \"All\" view and individual host tabs\n- **Adaptive Update Intervals:**\n  - Local monitoring: 1 second (Apple Silicon) or 2 seconds (others)\n  - 1-10 remote nodes: 3 seconds\n  - 11-50 nodes: 4 seconds\n  - 51-100 nodes: 5 seconds\n  - 101+ nodes: 6 seconds\n\n### Cross-Platform Support\n- **Linux:** \n  - NVIDIA GPUs via NVML and nvidia-smi (fallback)\n  - CPU monitoring via /proc filesystem\n  - Memory monitoring with detailed statistics\n  - Tenstorrent NPUs (Grayskull, Wormhole, Blackhole) via luwen library\n  - Rebellions NPUs (ATOM, ATOM+, ATOM Max) via rbln-stat\n  - Furiosa NPUs (RNGD) via furiosa-smi\n- **macOS:** \n  - Apple Silicon (M1/M2/M3/M4) GPUs via powermetrics and Metal framework\n  - ANE (Apple Neural Engine) power tracking\n  - Thermal pressure monitoring\n  - P/E core architecture support\n- **NVIDIA Jetson:** \n  - Special support for Tegra-based systems\n  - DLA (Deep Learning Accelerator) monitoring\n\n### Remote Monitoring\n- **Multi-Host Support:** Monitor up to 256+ remote systems simultaneously\n- **Connection Management:** Optimized networking with:\n  - Connection pooling (200 idle connections per host)\n  - Concurrent connection limiting (64 max)\n  - Automatic retry with exponential 
backoff\n  - TCP keepalive for persistent connections\n  - Connection staggering to prevent overload\n- **Storage Monitoring:** Disk usage information for all hosts\n- **High Availability:** Resilient to connection failures with automatic recovery\n\n### Interactive UI\n- **Enhanced Controls:**\n  - Keyboard: Arrow keys, Page Up/Down, Tab switching\n  - Mouse: Click column headers to sort (process view)\n  - Sorting: 'd' (default), 'u' (utilization), 'g' (GPU memory), 'p' (PID), 'm' (memory), 'c' (CPU)\n  - Interface: '1'/'h' (help), 'q' (quit), ESC (close help)\n- **Visual Design:**\n  - Color-coded status: Green (≤60%), Yellow (60-80%), Red (\u003e80%)\n  - Per-column coloring in process view\n  - Responsive layout adapting to terminal size\n  - Double-buffered rendering for flicker-free display\n- **Help System:** Context-sensitive help with all keyboard shortcuts\n\n### Development \u0026 Testing\n- **Mock Server:** Built-in mock server for testing and development\n  - Simulates realistic GPU clusters with 8 GPUs per node\n  - Configurable port ranges for multiple instances\n  - Failure simulation for resilience testing\n  - Platform-specific metric generation (NVIDIA, Apple Silicon, Jetson, Tenstorrent, Rebellions, Furiosa)\n  - Background metric updates with realistic variations\n- **Performance Optimized:**\n  - Template-based response generation\n  - Efficient memory management\n  - Minimal CPU overhead\n\n### API Mode (Prometheus Metrics)\n\nExpose hardware metrics in Prometheus format for integration with monitoring systems:\n\n```bash\n# Start API server\nall-smi api --port 9090\n\n# Custom update interval (default: 3 seconds)\nall-smi api --port 9090 --interval 5\n\n# Include process information\nall-smi api --port 9090 --processes\n```\n\nMetrics are available at `http://localhost:9090/metrics` and include comprehensive hardware monitoring for:\n- **GPUs:** Utilization, memory, temperature, power, frequency (NVIDIA, Apple Silicon, Tenstorrent)\n- 
**CPUs:** Utilization, frequency, temperature, power (with P/E core metrics for Apple Silicon)\n- **Memory:** System and swap memory statistics\n- **Storage:** Disk usage information\n- **Processes:** GPU process metrics (with --processes flag)\n\nFor a complete list of all available metrics, see [API.md](API.md).\n\n### Quick Start with Make Commands\n\nFor development and testing, you can use the provided Makefile:\n\n```bash\n# Run local monitoring\nmake local\n\n# Run remote view mode with hosts file\nmake remote\n\n# Start mock server for testing\nmake mock\n\n# Build release version\nmake release\n\n# Run tests\nmake test\n```\n\n## Development\n\nFor development documentation including building from source, testing with mock servers, architecture details, and technology stack information, see [DEVELOPERS.md](DEVELOPERS.md).\n\n## Testing\n\nFor comprehensive testing documentation including unit tests, integration tests, and shell script tests, see [TESTING.md](TESTING.md).\n\n### Quick Test Commands\n```bash\n# Run all unit tests (no sudo required)\ncargo test\n\n# Run tests including those requiring sudo (macOS only)\nsudo cargo test -- --include-ignored\n\n# Run shell script tests for containers and real-world scenarios\ncd tests \u0026\u0026 make all\n```\n\n## Contributing\n\nContributions are welcome! Areas for contribution include:\n\n- **Platform Support:** Additional GPU vendors or operating systems\n- **Features:** New metrics, visualization improvements, or monitoring capabilities\n- **Performance:** Optimization for larger clusters or resource usage\n- **Documentation:** Examples, tutorials, or API documentation\n\nPlease submit pull requests or open issues for bugs, feature requests, or questions.\n\n## Acknowledgments\n\nThis project is being developed with tremendous help from [Claude Code](https://claude.ai/code) and [Gemini CLI](https://github.com/google-gemini/gemini-cli). 
These AI-powered development tools have been instrumental in accelerating the development process, improving code quality, and implementing complex features across multiple hardware platforms.\n\nThe journey of building all-smi with AI assistance has been a fascinating exploration of how domain expertise guides AI capabilities. From the initial three-day Rust learning sprint with Google AI Studio and ChatGPT to the recent development with Gemini CLI and Claude Code, this project demonstrates that the boundary of AI coding capability is tightly bound by the expertise of the person guiding it. [Read the full development story here](docs/AI_DEVELOPMENT_STORY.md).\n\n## License\n\nThis project is licensed under the Apache License 2.0.  \nSee the [LICENSE](./LICENSE) file for details.\n\n## Changelog\n\n### Recent Updates\n- **v0.9.0 (2025/08/29):** Separate local/remote monitoring commands, Backend.AI cluster auto-discovery, modular refactoring for better maintainability, and Prometheus metric fixes\n- **v0.8.0 (2025/08/08):** Container-aware resource monitoring, enhanced ARM CPU frequency detection, UI improvements for process list, license change to Apache 2.0, and PPA build enhancements\n- **v0.7.2 (2025/08/06):** Reorganize man page location in release archives, add GPU core count for Apple Silicon, animated loading progress bar, and fix display issues\n- **v0.7.1 (2025/08/03):** Add manpage for Debian/Ubuntu package, updated installation guide with PPA support, and fixed debian_build workflow\n- **v0.7.0 (2025/08/02):** Add Furiosa RNGD NPU support, Debian/Ubuntu PPA packaging, scrolling device names, and improved CI/CD workflows\n- **v0.6.3 (2025/07/28):** Add Rebellions ATOM NPU support with secure container monitoring\n- **v0.6.2 (2025/07/25):** Added multi-segment bar visualization with stacked memory display, CPU temperature for Linux, CPU cache detection, per-core CPU metrics, and fixed-width CPU display formatting\n- **v0.6.1 (2025/07/19):** Fixed 
multi-node view hanging, improved hostname handling, optimized network fetch, and updated Ubuntu release workflows\n- **v0.6.0 (2025/07/18):** Added Tenstorrent NPU support, improved UI alignment and terminal resize handling, modularized API metrics, and enhanced disk filtering\n- **v0.5.0 (2025/07/12):** Enhanced Apple Silicon support with ANE power in watts, P+E frequency display, thermal pressure text, interactive process sorting, and configurable PowerMetrics intervals\n- **v0.4.3 (2025/07/11):** Fix P-CPU/E-CPU gauges for all Apple Silicon variants (M1/M2/M3/M4) including M1 Pro hybrid format\n- **v0.4.2 (2025/07/10):** Eliminate PowerMetrics temp file growth with in-memory buffer, Homebrew installation support\n- **v0.4.1 (2025/07/10):** Mock server improvements, efficient Apple Silicon and NVIDIA GPU support\n- **v0.4.0 (2025/07/08):** Architectural refactoring, smart sudo detection, and comprehensive unit testing\n- **v0.3.3 (2025/07/07):** CPU, Memory, and ANE support, and UI fixes\n- **v0.3.2 (2025/07/06):** Cargo.toml for publishing and release process\n- **v0.3.1 (2025/07/06):** GitHub actions and Dockerfile, and UI fixes\n- **v0.3.0 (2025/07/06):** Multi-architecture support, optimized space allocation, enhanced UI\n- **v0.2.2 (2025/07/06):** GPU sorting functionality with hotkeys\n- **v0.2.1 (2025/07/05):** Help system improvements and code refactoring\n- **v0.2.0 (2025/07/05):** Remote monitoring and cluster management features\n- **v0.1.1 (2025/07/04):** ANE (Apple Neural Engine) support, page navigation keys, and scrolling fixes\n- **v0.1.0 (2024/08/11):** Initial release with local GPU monitoring","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finureyes%2Fall-smi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finureyes%2Fall-smi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finureyes%2Fall-smi/lists"}