Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/nbuchwitz/check_pve
Icinga check command for Proxmox VE via API
https://github.com/nbuchwitz/check_pve
hacktoberfest icinga2 monitoring proxmox proxmox-ve pve
Last synced: 2 days ago
JSON representation
Icinga check command for Proxmox VE via API
- Host: GitHub
- URL: https://github.com/nbuchwitz/check_pve
- Owner: nbuchwitz
- License: gpl-2.0
- Created: 2018-01-09T16:45:57.000Z (almost 7 years ago)
- Default Branch: main
- Last Pushed: 2024-06-30T10:23:13.000Z (6 months ago)
- Last Synced: 2024-12-24T19:08:45.694Z (9 days ago)
- Topics: hacktoberfest, icinga2, monitoring, proxmox, proxmox-ve, pve
- Language: Python
- Homepage:
- Size: 165 KB
- Stars: 113
- Watchers: 10
- Forks: 36
- Open Issues: 9
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# check_pve
Icinga check command for Proxmox VE via API![Linter](https://github.com/nbuchwitz/check_pve/actions/workflows/lint.yml/badge.svg)
## Setup
### Requirements
This check command depends on **Python 3** and the following modules:
* requests
* argparse
* packaging**Installation on Debian / Ubuntu**
```
apt install python3 python3-requests python3-packaging
```**Installation on Rocky / Alma Linux 9**
```
yum install python3 python3-requests python3-packaging
```**Installation on FreeBSD**
```
pkg install python3 py39-requests py39-packaging
```**Installation from requirements file**
```
pip3 install -r requirements.txt
```**Installation as Docker container**
```
docker build -t check_pve .
```
After this, you can start the container like so:
```
docker run -d --name check_pve --rm check_pve
```
The container will keep running without having the need for any of the requirements listed above (for environments that do not support this).
Running a check is as simple as:
```
docker exec check_pve python check_pve.py ....rest of the default arguments listed below....
```### Create a API user in Proxmox VE
Create a role named ``Monitoring`` and assign necessary privileges:
```
pveum roleadd Monitoring
pveum rolemod Monitoring --privs VM.Monitor,Sys.Audit,Sys.Modify,Datastore.Audit,VM.Audit
```Create a user named ``monitoring`` and set password:
```
pveum useradd monitoring@pve --comment "The ICINGA 2 monitoring user"
```#### Use token based authorization (recommended)
Create an API token named `monitoring` for the user `monitoring`:
```
pveum user token add monitoring@pve monitoring
```Please save the token secret as there isn't any way to fetch it at a later point.
Assign role `monitoring` to token `monitoring` and the user `monitoring@pve`:
```
pveum acl modify / --roles Monitoring --user 'monitoring@pve'
pveum acl modify / --roles Monitoring --tokens 'monitoring@pve!monitoring'
```#### Use password based authorization
Set password for the user `monitoring`:
```
pveum passwd monitoring@pve
```Assign ``monitoring`` role to user ``monitoring``
```
pveum acl modify / --users monitoring@pve --roles Monitoring
```For further information about the Proxmox VE privilege system have a look into the [documentation](https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_strong_pveum_strong_proxmox_ve_user_manager).
## Usage
The ``icinga2`` folder contains the command definition and service examples for use with Icinga2.
```
usage: check_pve.py [-h] -e API_ENDPOINT [--api-port API_PORT] -u API_USER (-p API_PASSWORD | -t API_TOKEN) [-k] -m
{cluster,version,cpu,memory,swap,storage,io_wait,io-wait,updates,services,subscription,vm,vm_status,vm-status,replication,disk-health,ceph-health,zfs-health,zfs-fragmentation,backup}
[-n NODE] [--name NAME] [--vmid VMID] [--expected-vm-status {running,stopped,paused}] [--ignore-vm-status] [--ignore-service NAME] [--ignore-disk NAME] [--ignore-pools NAME]
[-w THRESHOLD_WARNING] [-c THRESHOLD_CRITICAL] [-M] [-V MIN_VERSION] [--unit {GB,MB,KB,GiB,MiB,KiB,B}]Check command for PVE hosts via API
options:
-h, --help show this help message and exitAPI Options:
-e API_ENDPOINT, -H API_ENDPOINT, --api-endpoint API_ENDPOINT
PVE api endpoint hostname or ip address (no additional data like paths)
--api-port API_PORT PVE api endpoint port
-u API_USER, --username API_USER
PVE api user (e.g. icinga2@pve or icinga2@pam, depending on which backend you have chosen in proxmox)
-p API_PASSWORD, --password API_PASSWORD
PVE API user password
-t API_TOKEN, --api-token API_TOKEN
PVE API token (format: TOKEN_ID=TOKEN_SECRET
-k, --insecure Don't verify HTTPS certificateCheck Options:
-m {cluster,version,cpu,memory,swap,storage,io_wait,io-wait,updates,services,subscription,vm,vm_status,vm-status,replication,disk-health,ceph-health,zfs-health,zfs-fragmentation,backup}, --mode {cluster,version,cpu,memory,swap,storage,io_wait,io-wait,updates,services,subscription,vm,vm_status,vm-status,replication,disk-health,ceph-health,zfs-health,zfs-fragmentation,backup}
Mode to use.
-n NODE, --node NODE Node to check (necessary for all modes except cluster, version and backup)
--name NAME Name of storage, vm, or container
--vmid VMID ID of virtual machine or container
--expected-vm-status {running,stopped,paused}
Expected VM status
--ignore-vm-status Ignore VM status in checks
--ignore-service NAME
Ignore service NAME in checks
--ignore-disk NAME Ignore disk NAME in health check
--ignore-pools NAME Ignore vms and containers in pool(s) NAME in checks
-w THRESHOLD_WARNING, --warning THRESHOLD_WARNING
Warning threshold for check value. Mutiple thresholds with name:value,name:value
-c THRESHOLD_CRITICAL, --critical THRESHOLD_CRITICAL
Critical threshold for check value. Mutiple thresholds with name:value,name:value
-M Values are shown in the unit which is set with --unit (if available). Thresholds are also treated in this unit
-V MIN_VERSION, --min-version MIN_VERSION
The minimal pve version to check for. Any version lower than this will return CRITICAL.
--unit {GB,MB,KB,GiB,MiB,KiB,B}
Unit which is used for performance data and other values
```## Check examples
**Check cluster health**
```
./check_pve.py -u -t -e -m cluster
OK - Cluster 'proxmox1' is healthy'
```**Check PVE version**
```
./check_pve.py -u -p -e -m version -V 5.0.0
OK - Your pve instance version '5.2' (0fcd7879) is up to date
```**Check CPU load**
```
./check_pve.py -u -p -e -m cpu -n node1
OK - CPU usage is 2.4%|usage=2.4%;;
```**Check memory usage**
```
./check_pve.py -u -p -e -m memory -n node1
OK - Memory usage is 37.44%|usage=37.44%;; used=96544.72MB;;;257867.91
```**Check disk-health**
```
./check_pve.py -u -p -e -m disk-health -n node1
OK - All disks are healthy|wearout_sdb=96%;; wearout_sdc=96%;; wearout_sdd=96%;; wearout_sde=96%;;
```**Check storage usage**
```
./check_pve.py -u -p -e -m storage -n node1 --name local
OK - Storage usage is 54.23%|usage=54.23%;; used=128513.11MB;;;236980.36./check_pve.py -u -p -e -m storage -n node1 --name vms-disx
CRITICAL - Storage 'vms-disx' doesn't exist on node 'node01'
```**Check subscription status**
```
./check_pve.py -u -p -e -m subscription -n node1 -w 50 -c 10
OK - Subscription of level 'Community' is valid until 2019-01-09
```**Check VM status**
Without specifying a node name:
```
./check_pve.py -u -p -e -m vm --name test-vm
OK - VM 'test-vm' is running on 'node1'|cpu=1.85%;; memory=8.33%;;
```You can also pass a container name for the VM check:
```
./check_pve.py -u -p -e -m vm --name test-lxc
OK - LXC 'test-lxc' on node 'node1' is running|cpu=0.11%;; memory=13.99%;;
```With memory thresholds:
```
./check_pve.py -u -p -e -m vm --name test-vm -w 50 -c 80
OK - VM 'test-vm' is running on 'node1'|cpu=1.85%;; memory=40.33%;50.0;80.0
```With a specified node name, the check plugin verifies on which node the VM runs.
```
./check_pve.py -u -p -e -m vm -n node1 --name test-vm
OK - VM 'test-vm' is running on node 'node1'|cpu=1.85%;; memory=8.33%;;./check_pve.py -u -p -e -m vm -n node1 --name test-vm
WARNING - VM 'test-vm' is running on node 'node2' instead of 'node1'|cpu=1.85%;; memory=8.33%;;
```If you only want to gather metrics and don't care about the vm status add the ``--ignore-vm-status`` flag:
```
./check_pve.py -u -p -e -m vm --name test-vm --ignore-vm-status
OK - VM 'test-vm' is not running
```Specify the expected VM status:
```
./check_pve.py -u -p -e -m vm --name test-vm --expected-vm-status stopped
OK - VM 'test-vm' is not running```
For hostalive checks without gathering performance data use ``vm_status`` instead of ``vm``. The parameters are the same as with ``vm``.
**Check swap usage**
```
./check_pve.py -u -p -e -m swap -n pve
OK - Swap usage is 0.0 %|usage=0.0%;; used=0.0MB;;;8192.0
```**Check storage replication status**
```
./check_pve.py -u -p -e -m replication -n node1
OK - No failed replication jobs on node1
```**Check ceph cluster health**
```
./check_pve.py -u -p -e -m ceph-health
WARNING - Ceph Cluster is in warning state
```**Check ZFS pool health**
```
./check_pve.py -u -p -e -m zfs-health -n pve
OK - All ZFS pools are healthy
```Check for specific pool:
```
./check_pve.py -u -p -e -m zfs-health -n pve --name rpool
OK - ZFS pool 'rpool' is healthy
```**Check ZFS pool fragmentation**
```
./check_pve.py -u -p -e -m zfs-fragmentation -n pve -w 40 -c 60
CRITICAL - 2 of 2 ZFS pools are above fragmentation thresholds:- rpool (71 %) is CRITICAL
- diskpool (50 %) is WARNING
|fragmentation_diskpool=50%;40.0;60.0 fragmentation_rpool=71%;40.0;60.0```
Check for specific pool:
```
./check_pve.py -u -p -e -m zfs-fragmentation -n pve --name diskpool -w 40 -c 60
WARNING - Fragmentation of ZFS pool 'diskpool' is above thresholds: 50 %|fragmentation=50%;40.0;60.0
```**Check VZDump Backups**
Check task history on all nodes:
```
./check_pve.py -u -p -e -m backup
CRITICAL - 8 backup tasks successful, 3 backup tasks failed
```Check for specific node and time frame:
```
./check_pve.py -u -p -e -m backup -n pve -c 86400
OK - 2 backup tasks successful, 0 backup tasks failed within the last 86400.0s
```## FAQ
### Individual thresholds per metric
You can either specify a threshold for warning or critical which is applied to all metrics or define individual thresholds like this (`name:value,name:value,...`):
```
./check_pve.py -u -p -e -m vm --name test-vm -w memory:50 -c cpu:50,memory:80
OK - VM 'test-vm' is running on 'node1'|cpu=1.85%;50.0; memory=40.33%;50.0;80.0
```### Could not connect to PVE API: Failed to resolve hostname
Verify that your DNS server is working and can resolve your hostname. If everything is fine check for proxyserver environment variables (HTTP_PROXY,HTTPS_PROXY), which maybe not allow communication to port 8006.
## Contributors
Thank you to everyone, who is contributing to `check_pve`: https://github.com/nbuchwitz/check_pve/graphs/contributors.