An open API service indexing awesome lists of open source software.

https://github.com/princetonuniversity/jobstats

Jobstats is a job monitoring platform for CPU and GPU clusters
https://github.com/princetonuniversity/jobstats

Last synced: 6 months ago
JSON representation

Jobstats is a job monitoring platform for CPU and GPU clusters

Awesome Lists containing this project

README

          

[![License: GPL v2](https://img.shields.io/badge/License-GPL_v2-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
[![Tests](https://github.com/PrincetonUniversity/jobstats/actions/workflows/tests.yml/badge.svg)](https://github.com/PrincetonUniversity/jobstats/actions?workflow=Tests)
[![DOI](https://img.shields.io/badge/doi-10.1145/3569951.3604396-blue.svg?style=flat&labelColor=whitesmoke&logo=data%3Aimage%2Fpng%3Bbase64%2CiVBORw0KGgoAAAANSUhEUgAAAB8AAAAfCAYAAAAfrhY5AAAJsklEQVR42qWXd1DTaRrHf%2BiB2Hdt5zhrAUKz4IKEYu9IGiGFFJJQ0gkJCAKiWFDWBRdFhCQUF3UVdeVcRQEBxUI3yY9iEnQHb3bdW1fPubnyz%2F11M7lvEHfOQee2ZOYzPyDv%2B3yf9%2Fk95YX4fx%2BltfUt08GcFEuPR4U9hDDZ%2FVngIlhb%2FSiI6InkTgLzgDcgfvtnovhH4BzoVlrbwr55QnhCtBW4QHXnFrZbPBaQoBh4%2FSYH2EnpBEtqcDMVzB93wA%2F8AFwa23XFGcc8CkT3mxz%2BfXWtq9T9IQlLIXYEuHojudb%2BCM7Hgdq8ydi%2FAHiBXyY%2BLjwFlAEnS6Jnar%2FvnQVhvdzasad0eKvWZKe8hvDB2ofLZ%2FZEcWsh%2BhyIuyO5Bxs2iZIE4nRv7NWAb0EO8AC%2FWPxjYAWuOEX2MSXZVgPxzmRL3xKz3ScGpx6p6QnOx4mDIFqO0w6Q4fEhO5IzwxlSwyD2FYHzwAW%2BAZ4fEsf74gCumykwNHskLM7taQxLYjjIyy8MUtraGhTWdkfhkFJqtvuVl%2F9l2ZquDfEyrH8B0W06nnpH3JtIyRGpH1iJ6SfxDIHjRXHJmdQjLpfHeN54gnfFx4W9QRnovx%2FN20aXZeTD2J84hn3%2BqoF2Tqr14VqTPUCIcP%2B5%2Fly4qC%2BUL3sYxSvNj1NwsVYPsWdMUfomsdkYm3Tj0nbV0N1wRKwFe1MgKACDIBdMAhPE%2FwicwNWxll8Ag40w%2BFfhibJkGHmutjYeQ8gVlaN%2BjO51nDysa9TwNUFMqaGbKdRJZFfOJSp6mkRKsv0rRIpEVWjAvyFkxNOEpwvcAVPfEe%2Bl8ojeNTx3nXLBcWRrYGxSRjDEk0VlpxYrbe1ZmaQ5xuT0u3r%2B2qe5j0J5uytiZPGsRL2Jm32AldpxPUNJ3jmmsN4x62z1cXrbedXBQf2yvIFCeZrtyicZZG2U2nrrBJzYorI2EXLrvTfCSB43s41PKEvbZDEfQby6L4JTj%2FfIwam%2B4%2BwucBu%2BDgNK05Nle1rSt9HvR%2FKPC4U6LTfvUIaip1mjIa8fPzykii23h2eanT57zQ7fsyYH5QjywwlooAUcAdOh5QumgTHx6aAO7%2FL52eaQNEShrxfhL6albEDmfhGflrsT4tps8gTHNOJbeDeBlt0WJWDHSgxs6cW6lQqyg1FpD5ZVDfhn1HYFF1y4Eiaqa18pQf3zzYMBhcanlBjYfgWNayAf%2FASOgklu8bmgD7hADrk4cRlOL7NSOewEcbqSmaivT33QuFdHXj5sdvjlN5yMDrAECmdgDWG2L8P%2BAKLs9ZLZ7dJda%2BB4Xl84t7QvnKfvpXJv9obz2KgK8dXyqISyV0sXGZ0U47hOA%2FAiigbEMECJxC9aoKp86re5O5prxOlHkcksutSQJzxZRlPZmrOKhsQBF5zEZKybUC0vVjG8PqOnhOq46qyDTDnj5gZBriWCk4DvXrudQnXQmnXblebhAC2cCB6zIbM4PYgGl0elPSgIf3iFEA21aLdHYLHUQuVkpgi02SxFdrG862Y8ymYGMvXDzUmiX8DS5vKZyZlGmsSgQqfLub5RyLNS4zfDiZc9Edzh%2FtCE%2BX8j9k%2FqWB071rcZyMImne1SLkL4GRw4UPHMV3jjwEYpPG5uW5fAEot0aTSJnsGAwHJi2nvF1Y5OIqWziVCQd5NT7t6Q8guOSpgS%2Fa1dSRn8JGGaCD3BPXDyQRG4Bqhu8XrgAp0yy8DMSvvyVXDgJcJTcr1wQ2BvFKf65jqhvmxXUuDpGBlRvV36XvGjQzLi8KAKT2lYOnmxQPGorURSV0NhyTIuIyqOmKTMhQ%2BieEsgOgpc4KBbfDM4B3SIgFljvfHF6cef7qpyLBXAiQcXvg5l3Iunp%2FWv4dH6qFziO%2BL9PbrimQ9RY6MQphEfGUpOmma7KkGzuS8sPUFnCtIYcKCaI9EXo4HlQLgGrBjbiK5EqMj2AKWt9QWcIFMtnVvQVDQV9lXJJqdPVtUQpbh6gCI2Ov1nvZts7yYdsnvRgxiWFOtNJcOMVLn1vgptVi6qrNiFOfEjHCDB3J%2BHDLqUB77YgQGwX%2Fb1eYna3hGKdlqJKIyiE4nSbV8VFgxmxR4b5mVkkeUhMgs5YTi4ja2XZ009xJRHdkfwMi%2BfocaancuO7h%2FMlcLOa0V%2FSw6Dq47CumRQAKhgbOP8t%2BMTjuxjJGhXCY6XpmDDFqWlVYbQ1aDJ5Cptdw4oLbf3Ck%2BdWkVP0LpH7s9XLPXI%2FQX8ws%2Bj2In63IcRvOOo%2BTTjiN%2BlssfRsanW%2B3REVKoavBOAPTXABW4AL7e4NygHdpAKBscmlDh9Jysp4wxbnUNna3L3xBvyE1jyrGIkUHaqQMuxhHElV6oj1picvgL1QEuS5PyZTEaivqh5vUCKJqOuIgPFGESns8kyFk7%2FDxyima3cYxi%2FYOQCj%2F%2B9Ms2Ll%2Bhn4FmKnl7JkGXQGDKDAz9rUGL1TIlBpuJr9Be2JjK6qPzyDg495UxXYF7JY1qKimw9jWjF0iV6DRIqE%2B%2FeWG0J2ofmZTk0mLYVd4GLiFCOoKR0Cg727tWq981InYynvCuKW43aXgEjofVbxIqrm0VL76zlH3gQzWP3R3Bv9oXxclrlO7VVtgBRpSP4hMFWJ8BrUSBCJXC07l40X4jWuvtc42ofNCxtlX2JH6bdeojXgTh5TxOBKEyY5wvBE%2BACh8BtOPNPkApjoxi5h%2B%2FFMQQNpWvZaMH7MKFu5Ax8HoCQdmGkJrtnOiLHwD3uS5y8%2F2xTSDrE%2F4PT1yqtt6vGe8ldMBVMEPd6KwqiYECHDlfbvzphcWP%2BJiZuL5swoWQYlS%2Br7Yu5mNUiGD2retxBi9fl6RDGn4Ti9B1oyYy%2BMP5G87D%2FCpRlvdnuy0PY6RC8BzTA40NXqckQ9TaOUDywkYsudxJzPgyDoAWn%2BB6nEFbaVxxC6UXjJiuDkW9TWq7uRBOJocky9iMfUhGpv%2FdQuVVIuGjYqACbXf8aa%2BPeYNIHZsM7l4s5gAQuUAzRUoT51hnH3EWofXf2vkD5HJJ33vwE%2FaEWp36GHr6GpMaH4AAPuqM5eabH%2FhfG9zcCz4nN6cPinuAw6IHwtvyB%2FdO1toZciBaPh25U0ducR2PI3Zl7mokyLWKkSnEDOg1x5fCsJE9EKhH7HwFNhWMGMS7%2BqxyYsbHHRUDUH4I%2FAheQY7wujJNnFUH4KdCju83riuQeHU9WEqNzjsJFuF%2FdTDAZ%2FK7%2F1WaAU%2BAWymT59pVMT4g2AxcwNa0XEBDdBDpAPvgDIH73R25teeuAF5ime2Ul0OUIiG4GpSAEJeYW9wDTf43wfwHgHLKJoPznkwAAAABJRU5ErkJggg%3D%3D)](https://doi.org/10.1145/3569951.3604396)

# Jobstats

Jobstats is a free and open-source job monitoring platform designed for CPU and GPU clusters that use the Slurm workload manager.

The `jobstats` command provides users with a job efficiency report:

```
$ jobstats 39798795

================================================================================
Slurm Job Statistics
================================================================================
Job ID: 39798795
User/Account: aturing/math
Job Name: sys_logic_ordinals
State: COMPLETED
Nodes: 2
CPU Cores: 48
CPU Memory: 256GB (5.3GB per CPU-core)
GPUs: 4
QOS/Partition: della-gpu/gpu
Cluster: della
Start Time: Fri Mar 4, 2022 at 1:56 AM
Run Time: 18:41:56
Time Limit: 4-00:00:00

Overall Utilization
================================================================================
CPU utilization [||||| 10%]
CPU memory usage [||| 6%]
GPU utilization [|||||||||||||||||||||||||||||||||| 68%]
GPU memory usage [||||||||||||||||||||||||||||||||| 66%]

Detailed Utilization
================================================================================
CPU utilization per node (CPU time used/run time)
della-i14g2: 1-21:41:20/18-16:46:24 (efficiency=10.2%)
della-i14g3: 1-18:48:55/18-16:46:24 (efficiency=9.5%)
Total used/runtime: 3-16:30:16/37-09:32:48, efficiency=9.9%

CPU memory usage per node - used/allocated
della-i14g2: 7.9GB/128.0GB (335.5MB/5.3GB per core of 24)
della-i14g3: 7.8GB/128.0GB (334.6MB/5.3GB per core of 24)
Total used/allocated: 15.7GB/256.0GB (335.1MB/5.3GB per core of 48)

GPU utilization per node
della-i14g2 (GPU 0): 65.7%
della-i14g2 (GPU 1): 64.5%
della-i14g3 (GPU 0): 72.9%
della-i14g3 (GPU 1): 67.5%

GPU memory usage per node - maximum used/total
della-i14g2 (GPU 0): 26.5GB/40.0GB (66.2%)
della-i14g2 (GPU 1): 26.5GB/40.0GB (66.2%)
della-i14g3 (GPU 0): 26.5GB/40.0GB (66.2%)
della-i14g3 (GPU 1): 26.5GB/40.0GB (66.2%)

Notes
================================================================================
* This job only used 6% of the 256GB of total allocated CPU memory. For
future jobs, please allocate less memory by using a Slurm directive such
as --mem-per-cpu=1G or --mem=10G. This will reduce your queue times and
make the resources available to other jobs. For more info:
https://researchcomputing.princeton.edu/support/knowledge-base/memory

* This job only needed 19% of the requested time which was 4-00:00:00. For
future jobs, please request less time by modifying the --time Slurm
directive. This will lower your queue times and allow the Slurm job
scheduler to work more effectively for all users. For more info:
https://researchcomputing.princeton.edu/support/knowledge-base/slurm

* For additional job metrics including metrics plotted against time:
https://mydella.princeton.edu/pun/sys/jobstats (VPN required off-campus)
```

### Getting Started

Begin with [What is Jobstats?](https://princetonuniversity.github.io/jobstats/) in the documentation.

### Users of the Jobstats Platform

- Brown University - Center for Computation and Visualization
- Free University of Berlin - High-Performance Computing
- George Mason University - Office of Research Computing
- Monash University - e-Research
- Princeton University - Computer Science Department
- Princeton University - Research Computing
- University of Queensland - Research Computing Centre
- University of Virginia - Research Computing
- Yale University - Center for Research Computing
- and more